Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Gabriel Melo; Leonardo Santiago; Peter Y. Lu

arXiv: 2604.21097 · v2 · pith:L7OVMBI4new · submitted 2026-04-22 · stat.ML · cs.LG

Pith reviewed 2026-05-09 22:45 UTC · model grok-4.3

classification: stat.ML · cs.LG

keywords: chaos emulation · optimal transport · adversarial regularization · neural operators · dynamical systems · statistical fidelity · attractors

The pith

Adversarial optimal transport regularization trains neural emulators to match chaotic attractor statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chaotic dynamical systems are sensitive to initial conditions, so exact long-term forecasts are impossible and squared-error losses fail when training data-driven emulators on noisy data. The paper introduces a family of adversarial optimal transport objectives to regularize training so that emulators reproduce the statistical properties of the chaotic attractor rather than pointwise trajectories. It analyzes and tests a Sinkhorn divergence formulation based on the 2-Wasserstein distance together with a WGAN-style dual formulation based on the 1-Wasserstein distance; both jointly learn the emulator and high-quality summary statistics. Experiments on multiple chaotic systems, including those with high-dimensional attractors, show that the resulting emulators achieve significantly better long-term statistical fidelity than methods relying on handcrafted features or fixed summary statistics.

Core claim

A family of adversarial optimal transport objectives, including Sinkhorn divergence for 2-Wasserstein matching and a WGAN-style dual for 1-Wasserstein matching, jointly learns summary statistics and a physically consistent emulator that reproduces the statistical properties of chaotic attractors.
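For orientation, the two discrepancies named in the core claim have standard textbook definitions, written out below. The last line is only a schematic of how such a term might be attached to a prediction loss: the symbols $\mathcal{L}_{\mathrm{pred}}$, $\lambda$, $\mu_{\mathrm{data}}$, $\mu_\theta$, and $D$ are assumptions introduced here for illustration, and the exact objective should be taken from the paper itself.

```latex
% Entropic OT with squared-Euclidean cost (the 2-Wasserstein setting)
\mathrm{OT}_\varepsilon(\mu,\nu)
  = \min_{\pi \in \Pi(\mu,\nu)} \int \lVert x - y \rVert^2 \, d\pi(x,y)
  + \varepsilon \, \mathrm{KL}\!\left(\pi \,\Vert\, \mu \otimes \nu\right)

% Sinkhorn divergence: entropic OT with its self-transport bias removed
S_\varepsilon(\mu,\nu)
  = \mathrm{OT}_\varepsilon(\mu,\nu)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\mu,\mu)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\nu,\nu)

% Kantorovich-Rubinstein dual of the 1-Wasserstein distance (the WGAN setting)
W_1(\mu,\nu)
  = \sup_{\lVert f \rVert_{\mathrm{Lip}} \le 1}
    \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)]

% Schematic only (assumed form): emulator parameters \theta, summary map f_\varphi
\min_{\theta} \max_{\varphi} \;
  \mathcal{L}_{\mathrm{pred}}(\theta)
  + \lambda \, D\!\left( (f_\varphi)_{\#}\,\mu_{\mathrm{data}},\; (f_\varphi)_{\#}\,\mu_{\theta} \right)
```

In the schematic, $D$ stands for either discrepancy above, applied to the distributions of summary values computed from real and emulated trajectories.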

What carries the argument

Adversarial optimal transport objectives that enforce distributional matching between emulator trajectories and the true chaotic attractor while learning summary statistics.
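As a rough illustration of what distributional matching means computationally, the sketch below estimates a Sinkhorn divergence between two small point clouds using log-domain Sinkhorn iterations. It is a minimal standard-library sketch, not the authors' implementation: the function names, the value of `eps`, the iteration count, and the Gaussian test data are all assumptions chosen for readability.

```python
import math
import random


def sq_dist(x, y):
    """Squared Euclidean cost, the ground cost behind the 2-Wasserstein distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def entropic_ot(xs, ys, eps=0.5, iters=100):
    """Approximate entropic OT cost between uniform empirical measures on xs and ys."""
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Log-domain Sinkhorn updates of the two dual potentials.
        f = [-eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # Dual objective evaluated right after a g-update.
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.5, iters=100):
    """Debiased entropic OT: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (entropic_ot(xs, ys, eps, iters)
            - 0.5 * entropic_ot(xs, xs, eps, iters)
            - 0.5 * entropic_ot(ys, ys, eps, iters))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins for summary values of real and emulated trajectories.
    real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
    matched = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
    shifted = [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(20)]
    print("same distribution:   ", sinkhorn_divergence(real, matched))
    print("shifted distribution:", sinkhorn_divergence(real, shifted))
```

In a training loop the point clouds would be produced by a differentiable summary map, so an autodiff framework would replace the plain lists used here.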

If this is right

Emulators exhibit significantly improved long-term statistical fidelity across a variety of chaotic systems.
The method succeeds even for systems with high-dimensional chaotic attractors.
Joint learning of summary statistics and the emulator removes the need for handcrafted local features.
Both the Sinkhorn divergence and WGAN-style formulations are theoretically analyzed and experimentally validated for this task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The regularization may allow neural operator architectures to handle a wider range of complex dynamical systems where only statistical behavior is observable.
Applications such as weather or power-grid modeling could use these emulators for ensemble forecasting without pointwise accuracy.
The approach could be combined with other regularization terms that encode known physical invariants.

Load-bearing premise

The adversarial optimal transport regularization produces physically consistent emulators without introducing artifacts, instabilities, or distribution mismatches that affect downstream use.

What would settle it

Train an emulator on a chaotic system using the proposed regularization, then generate long trajectories and measure whether their statistical properties (for example, state distributions or attractor dimensions) match those of the true system or whether unphysical artifacts appear.
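A check of this kind can be sketched on Lorenz-63 by comparing the long-run distribution of one coordinate between a reference trajectory and a candidate trajectory. The sketch below is an assumption-laden illustration: it uses the exact 1D Wasserstein-1 distance between equal-sized samples as the statistic, and it substitutes a deliberately mis-specified Lorenz system (`rho=14.0`) for a trained emulator, which is not available here. The function names, step size, and trajectory lengths are hypothetical choices.

```python
def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt, rho):
    """One classical Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz63(state, rho=rho)
    k2 = lorenz63(shifted(k1, dt / 2), rho=rho)
    k3 = lorenz63(shifted(k2, dt / 2), rho=rho)
    k4 = lorenz63(shifted(k3, dt), rho=rho)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def z_samples(init, rho=28.0, dt=0.01, steps=10000, transient=2000):
    """Samples of the z coordinate along one trajectory, after discarding a transient."""
    state, samples = init, []
    for i in range(steps + transient):
        state = rk4_step(state, dt, rho)
        if i >= transient:
            samples.append(state[2])
    return samples


def wasserstein1_1d(a, b):
    """Exact 1-Wasserstein distance between two equal-sized 1D empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(p - q) for p, q in zip(sorted(a), sorted(b))) / len(a)


if __name__ == "__main__":
    truth = z_samples((1.0, 1.0, 1.0))
    # A second true trajectory from a different initial condition.
    truth_other = z_samples((-3.0, 2.0, 20.0))
    # Hypothetical stand-in for a poor emulator: the same equations with rho = 14.
    bad_model = z_samples((1.0, 1.0, 1.0), rho=14.0)
    print("truth vs truth:    ", wasserstein1_1d(truth, truth_other))
    print("truth vs bad model:", wasserstein1_1d(truth, bad_model))
```

A single marginal is a weak test on its own; a fuller check would also compare joint statistics such as spectra or attractor dimension estimates, as the paragraph above suggests.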

Figures

Figures reproduced from arXiv: 2604.21097 by Gabriel Melo, Leonardo Santiago, Peter Y. Lu.

**Figure 1.** Adversarial optimal transport regularization for emulating chaotic dynamics. (a) Emulator training via one-step prediction loss with OT regularization. (b) Adversarial learning of summary statistics that maximize the discrepancy between real and generated trajectory distributions while minimizing the full loss.

**Figure 2.** KS full roll-out evaluation (clean data). Our WGAN-style emulator most faithfully replicates the diagonal wave patterns and spatial structure of the ground truth (numerical simulation) across the full evaluation rollout.

**Figure 3.** L63 emulator geometry at increasing noise level σ. At σ = 0.10, the MSE baseline (No OT) underestimates the spatial extent of the attractor; at σ = 0.15, it collapses to a limit cycle, losing the bilobal structure entirely. WGAN maintains coverage of both lobes at both noise levels, directly illustrating how distributional regularization prevents attractor collapse under noise. More results on L63 are prov… [Plot: "Lorenz-63 Attractor Geometry Comparison"; panels "Baseline (No OT)" and "WGAN (Learnable)" at σ = 0.1; axes x, y, z; legend: Ground truth, Emulator.]

**Figure 4.** Lipschitz bounds during training for the WGAN summary map f on L96, under three regularization settings. Upper bounds (dashed) are computed as $\prod_\ell \lVert W_\ell \rVert_2$; lower bounds (solid) are estimated from the mean Jacobian spectral norm over a batch subset. Prescribed thresholds $L_{\max} = 4$ and $L_{\max} = 10$ are shown as horizontal dash-dotted lines.
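The two bounds described in the Figure 4 caption can be sketched for a toy network as follows. This is an illustrative sketch under stated assumptions, not the paper's code: it assumes a two-layer scalar-valued map `f(x) = w2 · tanh(W1 x + b1)` (tanh is 1-Lipschitz, so the product of layer spectral norms is a valid upper bound), estimates spectral norms by power iteration, and uses the mean gradient norm over a batch as the lower bound. All names and weights are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, iters=200, seed=0):
    """Largest singular value of W via power iteration on W^T W.

    Power iteration approaches the true value from below, so the result is
    only as tight as the iteration is converged.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in W[0]]
    Wt = transpose(W)
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        size = norm(v)
        if size == 0.0:
            return 0.0
        v = [x / size for x in v]
    return norm(matvec(W, v))


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms (valid for 1-Lipschitz activations)."""
    return math.prod(spectral_norm(W) for W in weights)


def grad_norm(W1, b1, w2, x):
    """Euclidean norm of the gradient of f(x) = w2 . tanh(W1 x + b1)."""
    hidden = [a + b for a, b in zip(matvec(W1, x), b1)]
    scaled = [w * (1.0 - math.tanh(h) ** 2) for w, h in zip(w2, hidden)]
    return norm(matvec(transpose(W1), scaled))


def lipschitz_lower_bound(W1, b1, w2, batch):
    """Mean gradient norm over a batch; never exceeds the true Lipschitz constant."""
    return sum(grad_norm(W1, b1, w2, x) for x in batch) / len(batch)


if __name__ == "__main__":
    # Hypothetical weights for a 2 -> 3 -> 1 network.
    W1 = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.3]]
    b1 = [0.1, -0.2, 0.0]
    w2 = [0.7, -0.2, 1.1]
    rng = random.Random(0)
    batch = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(64)]
    print("upper bound:", lipschitz_upper_bound([W1, [w2]]))
    print("lower bound:", lipschitz_lower_bound(W1, b1, w2, batch))
```

The gap between the two numbers indicates how loose the product bound is for a given set of weights, which is the quantity the figure tracks over training.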

**Figure 5.** The Lorenz–63 attractor colored by the learned summary value s = f_φ(u) across the three canonical projections (x, y), (x, z), and (y, z), alongside the projection induced by the dominant eigenvector of the displacement covariance C_T (Proposition A.1). [Panels: (x, y), (x, z), (y, z); color scale: f_…]

**Figure 6.** Long-run state-space visitation histograms, comparing the ground truth system against the emulator trained with the adversarial OT objective. Both histograms are computed from a single long trajectory after transient removal. [Panels: Density vs. x(t), Density vs. y(t), …; legend: Predicted, True.]

**Figure 7.** Distribution of the learned summary statistic s extracted from ground truth trajectory segments, compared against emulator-generated segments. [Plot: "Summary Histogram Comparison"; Density vs. f(u(t)); legend: Predicted, True.]

**Figure 8.** Space-time plots of u(x, t) for the Lorenz–96 system (d = 60), comparing ground truth numerical simulations (left) against emulator rollouts (right) over 1,500 timesteps. Each row corresponds to a different method. As expected for chaotic systems, pointwise trajectory agreement is not maintained beyond the Lyapunov time; the relevant comparison is the statistical structure of the attractor over long horizo…

**Figure 9.** Space-time plots of u(x, t) for the Kuramoto–Sivashinsky equation (d = 256), comparing ground truth numerical simulations (left) against emulator rollouts (right) over 1,000 timesteps. Each row corresponds to a different method. Trajectory-level correspondence is not expected beyond the Lyapunov time due to chaos; the relevant comparison is the statistical structure of the attractor rather than pointwise a…

**Figure 10.** Vorticity rollouts for 2D Kolmogorov flow ($\mathrm{Re} = 10^4$, α = 0.1) at selected timesteps. Each row shows autoregressive predictions from a different model alongside the ground truth (top row). For quantitative analysis, see Table 2.

Original abstract

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model with data-driven methods such as machine learning emulators. While emulators are promising tools for accelerating simulations and solving inverse problems, they still struggle to learn chaotic dynamics, where sensitivity to initial conditions renders exact long-term forecasts infeasible, especially given noisy data. Recent work instead trains emulators to match the statistical properties of chaotic attractors, but these approaches often rely on handcrafted summary statistics or large, diverse multi-environment datasets. In this work, we propose a family of adversarial optimal transport objectives that can jointly learn high-quality summary statistics and a physically consistent emulator from a single noisy trajectory. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein) of our approach. Numerical experiments across a variety of chaotic systems, including ones with high-dimensional spatiotemporal chaos, show that emulators trained using our proposed objectives have significantly improved long-term statistical fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper introduces a joint adversarial OT approach to learn summary statistics and train emulators that better match chaotic attractor statistics.

The main point is that the authors frame emulator training for chaotic systems as distribution matching via adversarial optimal transport, so the model learns useful summary statistics and the dynamics at the same time. This differs from prior work that either uses fixed handcrafted features or learns statistics separately from a dataset of trajectories. They work out two concrete versions, one based on the Sinkhorn divergence for the 2-Wasserstein distance and one that follows the WGAN dual for the 1-Wasserstein distance, and claim support from both theory and experiments. The experiments cover several chaotic systems, including higher-dimensional attractors, and report better long-term statistical fidelity than the baselines.

That joint formulation is the clearest new piece, and it makes sense as a way to avoid brittle choices about which statistics to track. The theoretical analysis is mentioned but not shown in detail here, so its depth is hard to judge without the derivations. On the experimental side, the abstract gives no effect sizes, no description of the exact metrics or of how the baselines were implemented, and no discussion of whether the emulators remain stable or introduce artifacts when rolled out. Those gaps make it difficult to know how general the gains are.

This is the kind of paper that would interest people building neural operators or reduced-order models for weather, fluids, or power systems. It is grounded enough in the existing OT and chaos literature to merit peer review, even if the current version needs more concrete evidence on the improvements and any limitations.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a family of adversarial optimal transport objectives to jointly learn summary statistics and train neural emulators for chaotic dynamical systems. Two formulations are developed: a Sinkhorn divergence based on the 2-Wasserstein distance and a WGAN-style dual for the 1-Wasserstein distance. The central claim is that this regularization yields emulators with significantly improved long-term statistical fidelity to the attractors of chaotic systems, outperforming baselines that rely on handcrafted local features or on statistics learned from trajectory datasets. The claim is supported by theoretical analysis and by experiments on a variety of chaotic systems, including ones with high-dimensional attractors.

Significance. If the central claims hold, the work provides a principled, automatic alternative to handcrafted or pre-learned statistics for regularizing data-driven emulators of chaotic dynamics. This could improve the reliability of long-term statistical predictions in applications such as weather modeling and power-grid simulation, where exact trajectory matching is infeasible due to sensitivity to initial conditions. The joint learning of statistics and emulator via optimal transport is a notable strength relative to prior regularization approaches.

minor comments (3)

The abstract and introduction would benefit from a brief, explicit statement of the precise baseline methods (handcrafted features and learned-statistic approaches) and the quantitative metrics used to assess long-term statistical fidelity, to allow readers to immediately gauge the scope of the claimed improvements.
In the experimental section, additional detail on the number of independent runs, standard deviations or confidence intervals for the reported fidelity metrics, and the precise definition of 'long-term' (e.g., integration horizon relative to Lyapunov time) would strengthen reproducibility and interpretation of the results.
Notation for the adversarial objectives (Sinkhorn and dual formulations) should be introduced with a short table or inline reminder of the key variables (e.g., the role of the critic network and the regularization parameter) to improve readability for readers less familiar with optimal transport.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The referee's description accurately reflects the manuscript's contributions regarding adversarial optimal transport regularization for emulators of chaotic systems. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper defines its core adversarial optimal transport objectives (Sinkhorn 2-Wasserstein divergence and WGAN-style 1-Wasserstein dual) directly from standard optimal transport theory and applies them to jointly optimize summary statistics and the emulator. No load-bearing step in the abstract or described approach reduces the claimed predictions or statistical fidelity improvements to quantities fitted from the target data by construction, nor relies on self-citations for uniqueness theorems, ansatzes, or renaming of known results. The central claim rests on experimental comparison to handcrafted and learned-statistic baselines, which supplies independent validation rather than tautological equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method relies on standard optimal transport and adversarial training concepts from prior literature.

pith-pipeline@v0.9.0 · 5469 in / 962 out tokens · 119384 ms · 2026-05-09T22:45:48.433096+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Data-Adaptive Learning of Dynamical Systems by Matching Transfer Operators and Invariant Measures
math.NA · 2026-07 · unverdicted · novelty 7.0

A data-adaptive operator-matching approach learns vector fields by aligning induced transition matrices and invariant measures on an unstructured partition, outperforming pointwise losses under noise.