SWAT+ production engine

The SWAT+ engine we run in production

SWATGenX runs a pinned SWAT+ build: stock swat-model/swatplus plus a NetCDF backend, print-filtering, and two engine fixes, now unified with opt-in multi-core, PFAS, and MODFLOW 6 support. Each build deploys only after a ship-gate certification shows it output-equivalent on plain models (worst-relative difference ≤ 1e-5).

Pinned per release: currently rafiei-vahid/swatplus@6a4b7f1 (deployed 2026-06-28). Ladder measured on Peace River HUC-8, 94,303 HRUs.

Current pin: fork@6a4b7f1
341× smaller output vs stock
~2× faster engine core vs stock
Ship-gate certified ≤ 1e-5 on plain models

Stock → production wall time · 94k HRUs · 90 days

Byte-identical results across all rungs. The unified production engine (fork@6a4b7f1) inherits this measured core; its multi-core scaling is on the parallel-engine page.

A recurring question is which exact SWAT+ binary builds SWATGenX models, and how it differs from the stock distribution. This page pins it down — every engine is a commit, pinned per release, and every speed/size claim is measured against the stock original.

It is an overview, not a re-derivation. The detailed evidence for each contribution lives on its own deep-dive page; here we show the cumulative stock-to-production ladder and a changelog of every contribution and its state — the four output/performance layers are each an open upstream PR, and the multi-core, PFAS, and MODFLOW 6 layers live on the open fork.

The current production build is the unified engine (fork@6a4b7f1, deployed 2026-06-28): the measured fast engine core plus three opt-in capabilities, namely shared-memory OpenMP parallelism, PFAS fate-and-transport, and a daily two-way MODFLOW 6 coupling. All three are inert at the plain-model serial default, and a pre-production ship gate certifies each swap output-equivalent to the previous engine on ordinary models (worst-relative difference ≤ 1e-5) before it deploys.

Key takeaways

The production engine is pinned per release — currently rafiei-vahid/swatplus@6a4b7f1, the unified engine deployed 2026-06-28.
NetCDF output + the channel_sd print-filter cut output size 341× (695 MB → 2.0 MB) versus stock.
The two engine fixes then attack compute: the measured engine core runs ~2.0× faster than stock, with byte-identical results — and the unified build is ship-gate certified output-equivalent to that core on plain models (≤ 1e-5).
The unified engine adds opt-in OpenMP parallelism, PFAS transport, and a MODFLOW 6 coupling — all inert at the plain-model serial default, all deployed today.

Production commit: 6a4b7f1
Output size vs stock: 341× smaller
Wall time vs stock: 2.0× faster
Deployed engine: in production

Motivation

What SWATGenX runs, and how far it is from stock SWAT+

"Production" should never be a vague word. The SWAT+ engine SWATGenX builds models with is a specific fork commit — pinned per release — and it differs from the stock distribution in ways we can point to and measure. This page pins down exactly what we run and how far it is from the original.

This page is both the changelog and the summary: a stock → fork-production ladder ending at the current unified engine, plus a table of every contribution and its state. The detailed proof for each contribution lives on its own deep-dive page; here we show the cumulative effect, anchored to commits.

Methods

Every engine, pinned to a commit

Four engines are pinned. Baseline is stock swat-model/swatplus@5ccf6f0. Two fork rungs were each production in turn: 768f1d1 (NetCDF backend + channel_sd print-filter) and 247e95b (the same plus the two engine fixes). The current production build is rafiei-vahid/swatplus@6a4b7f1 — the unified engine, deployed 2026-06-28: the 247e95b core consolidated with opt-in OpenMP parallelism (HRU land phase + routing wavefront), PFAS fate-and-transport, and a daily two-way MODFLOW 6 coupling. At its serial default all of that is inert, so a plain model runs exactly as before.

The three measured rungs were run on the same basin and window: Peace River HUC-8 (03100101), 94,303 HRUs, 90 simulated days, built with ifx -O3 -ipo. Each ran in its own native output mode (stock writes formatted-text channel_sd for all channels; the fork builds write gauge-filtered NetCDF), so the ladder reflects how each engine would actually be used, not an artificial common setting. The unified rung is deliberately not re-timed here: instead of a new benchmark, it carries a certification.

Before the 6a4b7f1 swap, the candidate binary passed the pre-production ship gate (scripts/ship_gate), which has three tiers. Tier 1 is a regression against the then-live 247e95b production engine on plain benchmark models, requiring flow/nutrient worst-relative differences ≤ 1e-5 (the compiler-noise band). Tier 2 validates the coupling and PFAS transport against a committed golden on the Rogue SWAT+/MODFLOW 6 model, with MODFLOW mass-balance discrepancy < 1%. Tier 3 covers deploy-time safety: no in-flight builds, a timestamped backup of the previous binary, a smoke test, and auto-rollback. The new physics is inert unless a model configures it, so ordinary SWAT+ builds are unaffected.
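The Tier 1 criterion can be illustrated with a minimal sketch of a worst-relative comparison. This is not the code in scripts/ship_gate: the function names, the near-zero `floor` guard, and the sample values are all assumptions made for illustration. Only the 1e-5 tolerance comes from this page.

```python
TOLERANCE = 1e-5  # Tier 1 threshold quoted on this page


def worst_relative(candidate, reference, floor=1e-12):
    """Largest |c - r| / max(|r|, floor) over paired values.

    `floor` is a hypothetical guard against division by zero; the real
    gate may treat near-zero reference values differently.
    """
    if len(candidate) != len(reference):
        raise ValueError("series must have the same length")
    worst = 0.0
    for c, r in zip(candidate, reference):
        worst = max(worst, abs(c - r) / max(abs(r), floor))
    return worst


def passes_tier1(candidate, reference, tol=TOLERANCE):
    """True when the worst relative difference is within tolerance."""
    return worst_relative(candidate, reference) <= tol


# Made-up flow values: one entry differs by about 6e-7 relative.
reference = [12.50, 0.84, 103.2]
candidate = [12.50, 0.8400005, 103.2]
print(passes_tier1(candidate, reference))
```

Taking the maximum over all values, not a mean, means a single outlying value can fail the check.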

Role	Repo @ commit	What it is
Stock	swat-model/swatplus@5ccf6f0	stock SWAT+ (upstream/main): formatted-text output, no filter
First fork production	rafiei-vahid/swatplus@768f1d1	NetCDF backend + channel_sd print-filter (deployed until 2026-06-06)
Second fork production	rafiei-vahid/swatplus@247e95b	NetCDF + print-filter + the two engine fixes (deployed 2026-06-06 → 2026-06-28)
Current production	rafiei-vahid/swatplus@6a4b7f1	unified engine: the 247e95b core plus opt-in OpenMP parallelism (HRU land phase + routing wavefront), PFAS fate-and-transport, and a MODFLOW 6 coupling (deployed 2026-06-28; engine rev 61.0.2.61-385-g6a4b7f1)

Baseline: swat-model/swatplus@5ccf6f0 (stock).
Fork productions, in order: 768f1d1 (NetCDF + filter) → 247e95b (+ two engine fixes).
Current production (deployed 2026-06-28): rafiei-vahid/swatplus@6a4b7f1 — unified engine, serial default, opt-in parallelism/PFAS/MF6.
Each measured number on this page is sourced from the release manifest, not hand-edited.
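The engine rev quoted for the current build, 61.0.2.61-385-g6a4b7f1, has the shape of `git describe` output (tag, commits since the tag, then `g` plus an abbreviated hash). Assuming that reading is correct, which this page does not state, a pin check could be sketched as follows. The function names are hypothetical.

```python
import re

# Assumed layout: <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<ahead>\d+)-g(?P<sha>[0-9a-f]{4,40})$")


def parse_engine_rev(rev):
    """Split a git-describe style revision into (tag, commits_ahead, sha)."""
    m = DESCRIBE_RE.match(rev)
    if m is None:
        raise ValueError(f"not a git-describe style revision: {rev!r}")
    return m.group("tag"), int(m.group("ahead")), m.group("sha")


def matches_pin(rev, pinned_sha):
    """True when the hash embedded in `rev` agrees with the pinned commit.

    Both sides may be abbreviated, so one must be a prefix of the other.
    """
    _, _, sha = parse_engine_rev(rev)
    return pinned_sha.startswith(sha) or sha.startswith(pinned_sha)


tag, ahead, sha = parse_engine_rev("61.0.2.61-385-g6a4b7f1")
print(tag, ahead, sha)
print(matches_pin("61.0.2.61-385-g6a4b7f1", "6a4b7f1"))
```

A check like this lets a deployed binary's reported revision be compared with the pin recorded for the release.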

Results and discussion

Stock → fork productions → current unified engine

Peace River HUC-8 (03100101) — 94,303 HRUs · 90 simulated days.

Figure 1. Wall time and output size across the three measured rungs on Peace River HUC-8 (03100101) — 94,303 HRUs (90 simulated days). NetCDF + filtering crushed size; the two fixes crushed runtime. The current unified engine (6a4b7f1) is certified output-equivalent to the fastest measured rung on plain models, so it inherits these numbers at its serial default.

Table 1. Stock → fork productions → current unified engine, each pinned to a commit, with wall time, output size, and ratios versus stock for the measured rungs.

Stage	Commit	Wall (s)	Output	vs stock (wall)	vs stock (size)	State
Stock	5ccf6f0	462	695.10 MB	1.0× (baseline)	1.0× (baseline)	upstream original
NetCDF + print-filter	768f1d1	414	2.04 MB	1.1×	341× smaller	first fork production (superseded 2026-06-06)
Two engine fixes	247e95b	227	2.04 MB	2.0×	341× smaller	second fork production (superseded 2026-06-28); measured core of the current engine
Unified engine	6a4b7f1	—	—	≡ 247e95b (certified)	same filtered NetCDF	deployed — current (2026-06-28)

Unified engine (6a4b7f1): Not re-timed on this ladder. Ship-gate certified output-equivalent (worst-relative ≤ 1e-5) to the 247e95b rung on plain models at the serial default; its opt-in multi-core scaling is measured separately on the parallel-engine page.

The contribution families move different axes. NetCDF + print-filtering (stock → 768f1d1) cut output ~341× (695 MB → 2.0 MB) while wall time changed only ~1.1× — because once output is small, the bottleneck is compute, not I/O. The two engine fixes (768f1d1 → 247e95b) then attack the compute: the same run drops to 227 s, ~2.0× faster than stock with byte-identical results. The unified engine (6a4b7f1) keeps that measured core — certified output-equivalent on plain models (≤ 1e-5) — and adds the opt-in capabilities: multi-core scaling (measured on the parallel-engine page), PFAS transport, and the MODFLOW 6 coupling.
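The ratios in Table 1 follow directly from the measured wall times and output sizes. The short sketch below recomputes them from the table values; the dictionary layout and function name are illustrative only.

```python
# (wall time in seconds, output size in MB) per measured rung, from Table 1
RUNGS = {
    "5ccf6f0": (462.0, 695.10),  # stock
    "768f1d1": (414.0, 2.04),    # NetCDF + print-filter
    "247e95b": (227.0, 2.04),    # plus the two engine fixes
}


def ratios_vs_stock(rungs, stock="5ccf6f0"):
    """Return {commit: (wall-time speedup, output-size reduction)} vs stock."""
    stock_wall, stock_mb = rungs[stock]
    return {
        commit: (stock_wall / wall, stock_mb / mb)
        for commit, (wall, mb) in rungs.items()
    }


for commit, (speedup, shrink) in ratios_vs_stock(RUNGS).items():
    print(f"{commit}: {speedup:.1f}x faster, {shrink:.0f}x smaller")
# 5ccf6f0: 1.0x faster, 1x smaller
# 768f1d1: 1.1x faster, 341x smaller
# 247e95b: 2.0x faster, 341x smaller
```

The two ratios move independently: the size reduction is already complete at 768f1d1, while most of the wall-time gain arrives only at 247e95b.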

Contributions and their state

Table 2. Every engine contribution: the axis it moves, its headline effect, its state (output/perf contributions have open upstream PRs; the parallel/PFAS/MF6 layers live on the open fork), and a link to its deep-dive evidence.

Contribution	Axis	Headline	State	Upstream PR	Details
NetCDF output backend	output format	enables compact NetCDF output	in production; upstream PR open	#213	runtime benchmark
channel_sd print-filter	output scope	gauge-only channel_sd → 341× smaller	in production; upstream PR open	#214	runtime benchmark
hru_read O(1) name index	runtime (startup)	string name-matching 75 → 7 s	in production; upstream PR open	#219	performance profiling
varinit per-row reset	runtime (daily loop)	array zeroing 28 → 2 s	in production; upstream PR open	#220	performance profiling
OpenMP parallelism (HRU land phase + routing wavefront)	runtime (multi-core)	5.33× at 24 threads on a 32-core node; byte-identical at 1 thread	in production (opt-in; serial default); fork main	fork only	parallel engine
PFAS fate-and-transport	new physics	watershed-scale PFAS transport in SWAT+ (Freundlich sorption)	in production (inert unless configured); fork main	fork only	PFAS fate & transport
MODFLOW 6 coupling	new physics	daily two-way recharge/baseflow exchange with MODFLOW 6	in production (inert without mf6 config); fork main	fork only	SWAT+ × MODFLOW 6
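The hru_read row replaces repeated string name-matching with an O(1) name index. The sketch below shows only the general idea, a linear scan versus a prebuilt name-to-position map. It is written in Python for brevity and is not the code in PR #219; the engine itself is Fortran, and the names and data here are hypothetical.

```python
def find_linear(names, target):
    """O(n) lookup: compare the target against every name in turn."""
    for i, name in enumerate(names):
        if name == target:
            return i
    return -1


def build_index(names):
    """Build a name -> position map once, so each later lookup is O(1)."""
    index = {}
    for i, name in enumerate(names):
        index.setdefault(name, i)  # keep the first occurrence, as the scan would
    return index


# Hypothetical HRU names, one per HRU in a 94,303-HRU basin.
names = [f"hru{i:06d}" for i in range(1, 94304)]
index = build_index(names)

target = "hru094303"
assert index[target] == find_linear(names, target)
print(index[target])
```

With one lookup per HRU, the linear scan costs on the order of n² string comparisons in total, while the indexed version costs on the order of n, which is why a fix like this matters most on large basins.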

Conclusion

Production is pinned per release — currently rafiei-vahid/swatplus@6a4b7f1 (deployed 2026-06-28), the unified engine: the measured 247e95b core plus opt-in OpenMP parallelism, PFAS fate-and-transport, and a MODFLOW 6 coupling, all inert at the plain-model default.
Versus stock on a 94k-HRU basin, the measured engine core writes ~341× smaller output and runs ~2.0× faster, with byte-identical results; the unified build is ship-gate certified output-equivalent (≤ 1e-5) to that core on plain models.
Output size and runtime are independent axes: NetCDF + filtering won on size, the two fixes won on speed, and both are in the shipped engine. The four output/perf contributions each have an open upstream PR against swat-model/swatplus (#213/#214/#219/#220); the parallelism, PFAS, and MODFLOW 6 layers live on the open fork's main branch and are documented on their own pages.

FAQ

Which SWAT+ engine does SWATGenX actually run in production?
A pinned fork commit — currently rafiei-vahid/swatplus@6a4b7f1, the unified engine deployed 2026-06-28. Its engine core is rafiei-vahid/swatplus@247e95b (stock swat-model/swatplus@5ccf6f0 plus a NetCDF output backend, channel_sd print-filtering, and two engine fixes, PRs #219/#220), consolidated with three opt-in capabilities: OpenMP parallelism (multi-core HRU land phase + wavefront routing), PFAS fate-and-transport, and a daily two-way MODFLOW 6 coupling. Production is pinned per release, not a moving label — and at the plain-model serial default the parallel/PFAS/MF6 code is inert, so an ordinary SWAT+ build runs exactly as the 247e95b core did.
How does production compare to stock SWAT+?
On a 3-month Peace River run (94,303 HRUs), stock writes a 695 MB channel_sd text file in 462 s; the measured engine core writes a 2.0 MB filtered NetCDF file in 227 s — output size ~341× smaller and wall time ~2.0× faster, with byte-identical results. The current unified build (6a4b7f1) is ship-gate certified output-equivalent to that core on plain models (worst-relative ≤ 1e-5), so it carries the same numbers at its serial default; its opt-in multi-core scaling (up to ~7.1× end-to-end) is measured on the parallel-engine page.
How do you know the unified engine is safe for ordinary models?
Every engine binary passes a pre-production ship gate before it can replace the deployed one. Tier 1 regresses the candidate against the live production engine on plain benchmark models — flow and nutrient worst-relative differences must be ≤ 1e-5 (compiler-noise band). Tier 2 validates the coupling and PFAS transport against a committed golden on the Rogue SWAT+/MODFLOW 6 model, with MODFLOW mass-balance discrepancy < 1%. Tier 3 handles deploy safety: it refuses to swap while any model build or calibration is in flight, backs up the previous binary, smoke-tests, and auto-rolls-back on failure. The 6a4b7f1 engine passed all three.
Are the engine contributions upstream in official SWAT+?
The four output/performance contributions are each open as an independent pull request against swat-model/swatplus — NetCDF backend (#213), channel_sd print-filter (#214), hru_read O(1) name lookup (#219), and varinit per-row reset (#220) — proposed but not yet merged into the official release, so SWATGenX runs them from the pinned fork today. The parallelism, PFAS, and MODFLOW 6 layers live on the open fork’s main branch (github.com/rafiei-vahid/swatplus) and are documented on their own pages.
Where do the detailed numbers live?
Each contribution has its own deep-dive page that owns its evidence: the performance-profiling page for the two engine fixes, the runtime-benchmark page for output format and print scope, and the parallel-engine page for the multi-core scaling. This page is the changelog/overview that links to them.