
SWAT+ calibration and validation on AWS EC2 | SWATGenX

SWATGenX runs particle-swarm calibration and validation on AWS EC2. Each job gets its own dedicated c7i instance, sized to the model, which evaluates the swarm in parallel, streams live progress to your dashboard, and shuts down the moment results are fetched. Short jobs run on interruptible spot capacity; longer jobs run on a non-reclaimable on-demand instance. This page documents the pipeline, the instance options, and measured wall times.

Per-job dedicated compute · parallel PSO · auto-terminate


Calibrating a large SWAT+ model is compute-bound: a particle swarm runs the model thousands of times to fit observed streamflow, which can take days to weeks on a laptop. SWATGenX instead dispatches the swarm to a dedicated AWS EC2 instance and evaluates the forward runs in parallel.

The figures below are grounded in measurement: the wall-time examples come from a six-model runtime sweep on the production SWAT+ engine, and the instance size and spot/on-demand choice are made automatically from each model's size and estimated run length.

Read this first

  1. Default instance is a c7i.8xlarge (32 vCPU / 64 GiB); the swarm runs in parallel across the cores, with RAM setting how many run at once.
  2. Pricing model is automatic: short jobs (under ~1 h) use interruptible spot, longer jobs use a non-reclaimable on-demand box so they finish through validation.
  3. Every instance auto-terminates the moment results sync — no idle compute. Cloud calibration is in preview, offered to Pro users by request.

Key takeaways

  • SWATGenX runs PSO (particle-swarm) calibration and validation on AWS EC2 compute-optimized instances, not on a shared queue — each job gets its own dedicated machine.
  • The instance size is chosen automatically from the model: the swarm's forward runs are spread across the cores, and the box must hold enough of them in RAM to run at full parallelism.
  • The pricing model is chosen automatically from run length — jobs estimated under about an hour use interruptible spot; longer jobs run on a dedicated on-demand instance that AWS cannot reclaim mid-run, so multi-hour calibrations finish through validation. If a short spot job is interrupted mid-run, SWATGenX automatically relaunches it on a non-reclaimable on-demand instance — warm-started from the last saved iteration — so an interruption costs a little time but never the run.
  • You can monitor the run live on your dashboard — objective metrics and the global-best and per-particle convergence curves update every iteration — and stop it yourself at any time based on your own read of the fit; you keep the best parameters found so far.
  • Every instance launches on demand and auto-terminates the moment results are fetched — no idle compute.
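To make the swarm mechanics concrete, here is a minimal, synchronous PSO sketch. It is not SWATGenX's implementation: the two-parameter `objective` is a hypothetical stand-in for a SWAT+ forward run, Nash-Sutcliffe efficiency (NSE) is assumed as the objective metric, treating the initial random positions as the warm-up samples is an assumption, and `should_stop` is a hypothetical hook playing the role of the dashboard's stop button. Because the global best only ever improves, stopping early still leaves the best parameters found so far.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency; 1.0 is a perfect fit (assumed objective)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical stand-in for a SWAT+ forward run: flow = a * rain + b.
RAIN = [0.0, 5.0, 12.0, 3.0, 0.0, 8.0, 20.0, 1.0]
OBS = [0.6 * r + 2.0 for r in RAIN]


def objective(params: Sequence[float]) -> float:
    a, b = params
    return nse([a * r + b for r in RAIN], OBS)


def pso(
    evaluate: Callable[[Sequence[float]], float],
    bounds: Bounds,
    n_particles: int = 32,
    n_iter: int = 50,
    should_stop: Callable[[int, float], bool] = lambda it, best: False,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Maximize `evaluate`; returns (global-best params, its score, best-score history)."""
    rng = random.Random(seed)
    # Warm-up: one random sample per particle.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [evaluate(p) for p in pos]
    g = max(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    history = [gbest_score]  # global-best convergence curve
    for it in range(1, n_iter + 1):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clip to bounds
        # Independent forward runs: the step a multi-core box can evaluate in parallel.
        scores = [evaluate(p) for p in pos]
        for i, s in enumerate(scores):
            if s > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], s
                if s > gbest_score:
                    gbest, gbest_score = pos[i][:], s
        history.append(gbest_score)
        if should_stop(it, gbest_score):  # e.g. the user judges the fit good enough
            break
    return gbest, gbest_score, history


if __name__ == "__main__":
    best, score, history = pso(objective, [(0.0, 2.0), (0.0, 5.0)],
                               should_stop=lambda it, best: best > 0.999)
    print(best, score, len(history))
```

The swarm is updated synchronously (all particles move, then all are scored), which is what makes each iteration a batch of independent forward runs that can be spread across cores.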

c7i.8xlarge: default instance (32 vCPU / 64 GiB)
Spot < 1 h: short jobs use interruptible spot
On-demand ≥ 1 h: long jobs use a non-reclaimable box
Auto-terminate: the instance stops the moment results sync

How it works

How a calibration job reaches AWS

  1. Bundle

    Your built model plus the pinned SWAT+ engine binary are packaged into a ~32 MB bundle on the SWATGenX server.

  2. Select

    The instance size is picked from the model (cores plus the RAM needed to hold the parallel swarm), and the pricing model from the estimated run length — interruptible spot for short jobs, dedicated on-demand for jobs of roughly an hour or more so they can't be reclaimed mid-run.

  3. Run & monitor

    The PSO swarm runs on the box — forward runs are evaluated in parallel across the cores. Live progress streams back to your dashboard every ~20 seconds: objective metrics, the initial-stage hydrograph, and the global-best and per-particle convergence curves. You can watch the swarm converge and stop the run whenever you judge the fit is good enough — you keep the best result so far. If AWS reclaims a spot box mid-run, the job restarts automatically on an on-demand instance from the last checkpoint.

  4. Fetch & terminate

    Calibrated parameters, metrics and hydrographs sync to your dashboard, then the instance terminates automatically (a shell trap guarantees it, even on error).
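The Bundle and Fetch & terminate steps can be sketched as follows, assuming a gzipped tar as the bundle format. The functions and the callables passed to `run_job` (`run_calibration`, `fetch_results`, `terminate_instance`) are hypothetical placeholders for the remote work, not SWATGenX's code. Python's `try`/`finally` stands in for the shell trap: termination runs whether the job succeeds or raises.

```python
import tarfile
from pathlib import Path
from typing import Any, Callable


def make_bundle(model_dir: Path, engine_binary: Path, out_path: Path) -> Path:
    """Pack the built model and the pinned engine binary into one gzipped tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(model_dir, arcname="model")  # recursive: every file under the model dir
        tar.add(engine_binary, arcname=f"engine/{engine_binary.name}")
    return out_path


def run_job(
    bundle: Path,
    run_calibration: Callable[[Path], None],
    fetch_results: Callable[[], Any],
    terminate_instance: Callable[[], None],
) -> Any:
    """Run the calibration, fetch results, and always terminate the instance.

    The `finally` clause mirrors a shell `trap ... EXIT`: it runs on success
    and when an exception propagates out of the job.
    """
    try:
        run_calibration(bundle)
        return fetch_results()
    finally:
        terminate_instance()
```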


Server options

Instance options (c7i, us-east-1)

Instance | vCPU | RAM | Role
c7i.8xlarge | 32 | 64 GiB | Default; fits almost every basin
c7i.16xlarge | 64 | 128 GiB | Large swarms / faster big-basin turnaround
c7i.48xlarge | 192 | 384 GiB | Maximum parallelism

The instance is chosen automatically so the parallel swarm fits in memory and finishes fastest. RAM is the binding limit: each forward run needs roughly 0.5–4 GiB, so at the upper end (about 4 GiB per run) a 64 GiB box runs about 16 in parallel and a 128 GiB box about 32. Bigger swarms and big basins benefit from the larger instances.
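The memory arithmetic above can be written as a small helper. This is a sketch under stated assumptions: `parallel_slots` and `pick_instance` are hypothetical names, concurrency is taken as the minimum of vCPUs, whole runs that fit in RAM, and the swarm size, and the selection rule (the smallest c7i that runs the whole swarm at once, otherwise the one with the most slots) is an illustration rather than SWATGenX's actual policy.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Instance:
    name: str
    vcpu: int
    ram_gib: int


# Ordered smallest to largest, matching the table above.
C7I = [
    Instance("c7i.8xlarge", 32, 64),
    Instance("c7i.16xlarge", 64, 128),
    Instance("c7i.48xlarge", 192, 384),
]


def parallel_slots(inst: Instance, gib_per_run: float, swarm_size: int) -> int:
    """Forward runs that can execute at once: limited by cores, RAM, and the swarm."""
    by_ram = math.floor(inst.ram_gib / gib_per_run)
    return max(1, min(inst.vcpu, by_ram, swarm_size))  # always allow at least one run


def pick_instance(gib_per_run: float, swarm_size: int,
                  options: Sequence[Instance] = C7I) -> Instance:
    """Hypothetical rule: smallest instance that runs the whole swarm in parallel."""
    for inst in options:
        if parallel_slots(inst, gib_per_run, swarm_size) >= swarm_size:
            return inst
    # Nothing fits the full swarm: take the instance with the most parallel slots.
    return max(options, key=lambda i: parallel_slots(i, gib_per_run, swarm_size))
```

With 4 GiB per run, `parallel_slots` gives 16 on the 64 GiB box and 32 on the 128 GiB box, matching the figures above; at 0.5 GiB per run the c7i.8xlarge is core-limited at 32.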


How long it takes

Compute time by basin size (measured models)

Reference workload: 7-year calibration window, 32-particle swarm x 50 PSO iterations + 32 warm-up samples (1,632 forward runs).

Basin | Model | HRUs | Channels | Per run | Wall time | Runs on
Small | 03080102b | 1,202 | 100 | 0.33 min | 17 min | Spot
Medium | 09471300 | 11,284 | 1,371 | 2.89 min | 2.5 h | On-demand
Large | 03100101 | 94,303 | 8,181 | 30.80 min | 26.7 h | On-demand

Each row is a real SWATGenX model from the runtime sweep (engine 247e95b, NetCDF + print-filter), run over a 7-year window with a 32-particle × 50-iteration PSO plus 32 warm-up samples (1,632 forward runs) on one c7i.8xlarge. "Runs on" is the pricing model SWATGenX selects from the estimated wall time. Big basins finish faster on a larger instance or with a smaller swarm; parallelism here is capped at the 32-particle pool.
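The table's arithmetic can be checked with a back-of-the-envelope model: 32 × 50 + 32 = 1,632 runs, evaluated 32 at a time, is 51 rounds, so wall time is roughly 51 × the per-run time. That gives about 16.8 min, 2.46 h and 26.2 h for the three rows, slightly under the listed 17 min, 2.5 h and 26.7 h, because this sketch models only the forward runs and no other overhead. The 60-minute cutoff is taken from this page's "under about an hour"; the exact threshold SWATGenX uses, and all function names here, are assumptions.

```python
import math

# Assumed cutoff: the page says jobs "under about an hour" run on spot.
SPOT_CUTOFF_MIN = 60.0


def forward_runs(swarm: int, iterations: int, warmup: int) -> int:
    """Total SWAT+ forward runs: one per particle per iteration, plus warm-up samples."""
    return swarm * iterations + warmup


def estimate_wall_min(per_run_min: float, runs: int, parallel: int) -> float:
    """Rough wall time: runs execute in rounds of `parallel`; no overhead is modeled."""
    rounds = math.ceil(runs / parallel)
    return rounds * per_run_min


def pricing_for(wall_min: float) -> str:
    """Map an estimated wall time to the "Runs on" column."""
    return "Spot" if wall_min < SPOT_CUTOFF_MIN else "On-demand"


if __name__ == "__main__":
    runs = forward_runs(32, 50, 32)
    for basin, per_run in [("Small", 0.33), ("Medium", 2.89), ("Large", 30.80)]:
        wall = estimate_wall_min(per_run, runs, parallel=32)
        print(f"{basin}: {wall:.1f} min ({wall / 60:.2f} h) -> {pricing_for(wall)}")
```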


The fine print

Availability and accuracy

  • Cloud calibration is in preview and offered to Pro users by request — it is not yet a self-serve checkout.
  • Spot capacity is not guaranteed: a short job may wait briefly for a slot, and AWS can reclaim a spot box mid-run. SWATGenX surfaces the wait on your dashboard and, on a reclaim, automatically resumes the job on a non-reclaimable on-demand instance (warm-started from the latest checkpoint) — so spot interruptions never lose work. Long jobs run on on-demand capacity from the start.
  • Runtime constants were fit (R²=0.996) to a six-model sweep and translated to c7i; your exact wall time depends on basin size, channel count, the calibration window, and swarm settings.
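The reclaim-and-resume behavior described above can be sketched as a single fallback. Everything here is hypothetical: `SpotInterruption` stands in for however the reclaim is detected, `run_on(market, start_iteration)` stands in for launching the job on a given capacity type, and the checkpoint is modeled as the last completed PSO iteration. Because on-demand capacity is not reclaimed, one fallback is enough; an interruption reported on on-demand is re-raised rather than retried.

```python
from typing import Any, Callable


class SpotInterruption(Exception):
    """Hypothetical signal that AWS reclaimed the spot instance mid-run."""

    def __init__(self, last_checkpoint: int) -> None:
        super().__init__(f"spot instance reclaimed; last checkpoint at iteration {last_checkpoint}")
        self.last_checkpoint = last_checkpoint


def run_with_fallback(run_on: Callable[[str, int], Any], market: str) -> Any:
    """Run a job on `market`; if a spot run is reclaimed, resume once on on-demand."""
    try:
        return run_on(market, 0)
    except SpotInterruption as exc:
        if market != "spot":
            raise  # on-demand is not reclaimed, so surface this instead of looping
        # Warm-start on non-reclaimable capacity from the last saved iteration.
        return run_on("on-demand", exc.last_checkpoint)
```

Resuming from `exc.last_checkpoint` rather than iteration 0 is what keeps an interruption from costing more than the work done since the last saved iteration.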

Engine 61.0.2.61-353-g247e95b · reference instances us-east-1.

FAQ

  • What AWS instance does SWATGenX use for calibration?

    A c7i.8xlarge (32 vCPU / 64 GiB, Sapphire Rapids) by default, on spot or on-demand capacity depending on the estimated run length. Larger swarms or big basins can use c7i.16xlarge (64 vCPU) or c7i.48xlarge (192 vCPU). Each calibration job gets its own dedicated instance, not a slot in a shared queue.

  • Does SWATGenX use spot or on-demand instances?

    Both — chosen automatically from the estimated run length. Jobs that finish in under about an hour run on low-cost interruptible spot. Longer jobs run on a dedicated on-demand instance that AWS cannot reclaim mid-run, so a multi-hour calibration completes through validation instead of being interrupted partway. If a short spot job is reclaimed mid-run, SWATGenX automatically relaunches it on a non-reclaimable on-demand instance, warm-started from the last saved iteration, so the run still finishes. Either way the instance auto-terminates the moment results are fetched, so there is no idle compute.

  • How long does a calibration take on AWS?

    On the reference workload (7-year window, 1,632 forward runs, one 32-core c7i.8xlarge): about 17 minutes for a small basin, about 2.5 hours for a mid-size model, and roughly a day for the largest basins (26.7 h for the largest model in the sweep). Wall time scales with HRU and channel count, the calibration window, and the swarm size; larger instances finish large basins faster.

  • Can I watch the calibration live and stop it myself?

    Yes. Progress streams to your dashboard every ~20 seconds — objective metrics, the initial-stage hydrograph, and the global-best and per-particle convergence curves — so you can see how the swarm is converging in real time. You can stop the run at any point based on your own judgment of the fit (for example, once the global best has plateaued); SWATGenX keeps the best parameters found so far and shuts the instance down.

  • Do I have to manage any AWS infrastructure?

    No. SWATGenX bundles your model with the pinned SWAT+ engine, launches the instance, runs the PSO swarm, streams live progress to your dashboard, fetches the results, and terminates the machine automatically — you only see the calibrated parameters, metrics and hydrographs.

  • Is cloud calibration available now?

    It is in preview and offered to Pro users by request. The infrastructure is live and validated; contact us to enable it for your account while we roll out self-serve access.

Related guides

Calibration & validation
SWAT+ runtime benchmark (measured on real models)
SWAT+ production engine
Access Level
Methodology


Last updated 2026-06-11.
