pith. machine review for the scientific record.

arxiv: 2602.11229 · v2 · submitted 2026-02-11 · 💻 cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Latent Generative Solvers for Generalizable Long-Term Physics Simulation

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 05:18 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords latent generative solver · physics simulation · PDE solvers · flow matching · long-term stability · generalization · autoregressive rollout · VAE

The pith

A single pretrained latent model simulates 16 different physics systems stably over long rollouts and adapts quickly to new ones with far less compute than baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Latent Generative Solver to address two shortcomings in neural physics simulators: poor generalization across unrelated PDE families and rapid error growth during extended autoregressive prediction. It compresses twelve PDE families into one shared latent space, then uses flow matching inside a pyramidal transformer to generate future states, while input noising during training, for which a contraction bound is derived, keeps long rollouts stable. Pretrained on 2.5 million trajectories spanning 16 systems, the model matches or beats strong deterministic baselines at one step, outperforms them on nearly all systems at five- and ten-step rollouts, nearly halves the twenty-step error, and requires thirteen to seventy-seven times less recurrent computation. It further adapts to a higher-resolution flow never seen in pretraining after only five fine-tuning epochs.

Core claim

The central claim is that a Physics VAE mapping diverse PDE trajectories onto a shared latent manifold, paired with a Pyramidal Flow-Forcing Transformer that predicts the next latent via flow matching conditioned on the model's own prior outputs, plus input noising whose contraction property is formally bounded, yields a solver that generalizes across heterogeneous systems and remains stable under long-horizon autoregressive rollout while using substantially less recurrent dynamics compute.

What carries the argument

The Latent Generative Solver (LGS), whose three coupled pieces are a Physics VAE that compresses multiple PDE families into one latent manifold, a Pyramidal Flow-Forcing Transformer that generates the next latent state by flow matching, and a sufficient-condition contraction bound derived from input noising during training.
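As a concrete reading of how these pieces compose at inference time, here is a minimal rollout sketch. Everything in it is an assumption for illustration: encoder/decoder stand in for the PhyVAE, velocity_net and context_update for the PFlowFT and its context head, and a plain Euler integrator with invented step counts and noise level replaces whatever sampler the paper actually uses.

```python
# Hypothetical sketch of the LGS autoregressive rollout; module interfaces
# are stand-ins, not the paper's actual classes or hyperparameters.
import torch

def rollout(encoder, decoder, velocity_net, context_update,
            frame0, context, horizon=20, flow_steps=8, noise_std=0.05):
    """Generate `horizon` future frames by flow matching in latent space."""
    z = encoder(frame0)                       # compress onto the shared manifold
    frames = []
    for _ in range(horizon):
        # Input noising: perturb the conditioning latent, so the model is
        # trained to contract off-manifold states like its own rollouts.
        z_cond = z + noise_std * torch.randn_like(z)
        # Flow matching: transport Gaussian noise to the next latent by
        # Euler-integrating the learned velocity field over t in [0, 1].
        x = torch.randn_like(z)
        for k in range(flow_steps):
            t = torch.full((x.shape[0],), k / flow_steps)
            x = x + (1.0 / flow_steps) * velocity_net(x, t, z_cond, context)
        z = x                                 # predicted next latent state
        context = context_update(context, z)  # refresh per-trajectory context
        frames.append(decoder(z))
    return torch.stack(frames, dim=1)         # (batch, horizon, ...)
```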

If this is right

  • Fifteen of sixteen systems show lower error than the deterministic baseline at both five- and ten-step rollouts.
  • Twenty-step L2 relative error drops from 56.1 percent to 30.2 percent while recurrent compute falls by factors of 13 to 77.
  • The same pretrained weights adapt to an unseen 256² Kolmogorov flow, cutting one-step error from 0.398 to 0.129 in five fine-tuning epochs.
  • The contraction bound supplies an explicit guarantee that explains why long-horizon rollouts do not diverge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The shared manifold might allow transfer of learned dynamics between physically dissimilar systems without retraining from scratch.
  • Extending the same architecture to three-dimensional or multi-physics problems could reduce the need for separate simulators per domain.
  • If the contraction bound generalizes, similar noising schedules might stabilize other autoregressive generative models outside physics.
  • Real-time control applications could exploit the low recurrent cost for closed-loop prediction over dozens of steps.

Load-bearing premise

That one shared latent manifold plus the derived contraction bound from noising suffices for stable long-horizon behavior across twelve distinct PDE families without hidden per-system tuning.
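To see why this premise is load-bearing, compare the two error recursions at stake. This is an illustrative sketch with generic constants, not the paper's exact statement: a Lipschitz constant L > 1 compounds per-step error geometrically, while a contraction factor ρ < 1, which input noising is argued to induce, caps it.

```latex
% Illustrative only: L, \rho, \varepsilon are generic constants.
e_{n+1} \le L\, e_n + \varepsilon
  \;\Longrightarrow\;
  e_n \le L^n e_0 + \varepsilon \, \frac{L^n - 1}{L - 1}
  \quad \text{(geometric blow-up when } L > 1\text{)}

e_{n+1} \le \rho\, e_n + \varepsilon, \quad \rho < 1
  \;\Longrightarrow\;
  e_n \le \rho^n e_0 + \frac{\varepsilon}{1 - \rho}
  \quad \text{(uniformly bounded rollout error)}
```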

What would settle it

An experiment showing that, on a held-out PDE family, twenty-step L2 relative error under LGS exceeds the error of the strongest deterministic baseline would falsify the claimed generalization and stability.
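Operationally, that test reduces to comparing twenty-step rollout errors between the two models. A minimal sketch of the metric and loop, assuming a one-step model(x) interface and ground-truth trajectories shaped (batch, time, ...), neither of which the paper specifies:

```python
import torch

def l2_relative_error(pred, target):
    """L2RE: ||pred - target||_2 / ||target||_2, averaged over the batch."""
    dims = tuple(range(1, pred.ndim))
    num = torch.linalg.vector_norm(pred - target, dim=dims)
    den = torch.linalg.vector_norm(target, dim=dims)
    return (num / den).mean().item()

def rollout_l2re(model, initial_frame, reference, steps=20):
    """Roll `model` forward `steps` times; score the final frame against
    the reference trajectory (batch, time, ...)."""
    x = initial_frame
    with torch.no_grad():
        for _ in range(steps):
            x = model(x)              # one autoregressive step
    return l2_relative_error(x, reference[:, steps - 1])
```

The falsification then reads: rollout_l2re(lgs, x0, ref) > rollout_l2re(baseline, x0, ref) on a held-out family would break the claim.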

Figures

Figures reproduced from arXiv: 2602.11229 by Sili Deng, Zituo Chen.

Figure 1. Overview of the Latent Generative Solver. All physics states are encoded into a unified latent space, where initial conditions x0 are perturbed with noise to cover off-manifold self-rollout states, then guided back to the clean next state x1. The predicted states are used to generate a system-dynamics descriptor ("context") that differentiates heterogeneous physical dynamics. view at source ↗
Figure 2. Model Architecture. (a) Essential components of the Flow-Forcing Transformer, which first generates the next state xs+1 given intermediate state x̂s, diffusion time ts, and physics context cs; a gated cross-attention unit then leverages the generated state to update the physics condition to cs+1. (b) A set of FFTs realizes next-state prediction during training, where intermediate states are transf… view at source ↗
Figure 3. t-SNE visualization of inferred physics contexts c for different PDE systems at the beginning and end of prediction. Each system forms a stable cluster and exhibits minimal drift over the diffusion/transport steps, indicating that context updates remain on the learned manifold. view at source ↗
Figure 4. (a) Average L2RE over 16 systems versus training throughput. Marker area denotes per-sample forward FLOPs. Latent-space models (LGS and ablations) have substantially lower FLOPs. Physics context and temporal pyramids improve throughput with limited loss of accuracy, while generative modeling and the uncertainty knob improve rollout accuracy. (b) Example 10-step rollout on FNO-v3 comparing LGS and U-AFNO.… view at source ↗
Figure 5. OOD adaptation on 256² Kolmogorov flow. Left: 1-step predictions after finetuning. Right: 10-step autoregressive rollouts of the finetuned models. LGS better preserves coherent vorticity structures and exhibits reduced long-horizon drift. view at source ↗
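The gated cross-attention context update named in the Figure 2 caption is a recognizable pattern; a minimal PyTorch sketch follows. The zero-initialized gate and the head count are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class GatedContextUpdate(nn.Module):
    """Context tokens attend to the freshly generated latent state and the
    result is blended in through a learned gate (zero-initialized, so the
    update starts as the identity)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(dim))

    def forward(self, context, latent_state):
        # context: (B, Nc, dim) physics descriptor; latent_state: (B, Nz, dim)
        update, _ = self.attn(query=context, key=latent_state, value=latent_state)
        return context + torch.tanh(self.gate) * update   # -> c_{s+1}
```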
read the original abstract

Reliable physics simulation demands two capabilities that today's neural PDE solvers do not deliver together: generalization across heterogeneous PDE families, and stability under long autoregressive rollouts. Deterministic operators accumulate error geometrically, while existing probabilistic solvers are confined to a single PDE family or short horizons. We close this gap with the Latent Generative Solver (LGS), three coupled components: (i) a Physics VAE (PhyVAE) compressing twelve PDE families into a shared latent manifold; (ii) a Pyramidal Flow-Forcing Transformer (PFlowFT) that generates the next latent by flow matching, conditioned on a per-trajectory context updated on the model's own predictions; and (iii) input noising during training, for which we derive a sufficient-condition contraction bound explaining the observed long-horizon stability. Pretrained on a 2.5M-trajectory, 16-system corpus at 128², LGS matches the strongest deterministic baseline at one step, wins on 15/16 systems at both 5- and 10-step rollout, cuts 20-step L2RE from 56.1% to 30.2%, and uses 13–77× less recurrent dynamics-step compute. It also adapts efficiently to a 256² Kolmogorov flow held out from the pretraining corpus, dropping 1-step L2RE from 0.398 to 0.129 in five finetune epochs, against U-AFNO's 0.653→0.343.
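The abstract's recipe, flow matching on the next latent conditioned on a noised previous one, admits a compact training-loss sketch. The straight-line (rectified-flow) interpolant and the noise level are assumptions; velocity_net and the latent shapes are hypothetical.

```python
import torch

def flow_matching_loss(velocity_net, z_prev, z_next, context, noise_std=0.05):
    """One conditional flow-matching step with input noising (a sketch)."""
    x0 = torch.randn_like(z_next)                 # source: Gaussian noise
    t = torch.rand(z_next.shape[0])               # transport time in [0, 1]
    t_b = t.view(-1, *([1] * (z_next.ndim - 1)))  # broadcastable shape
    x_t = (1 - t_b) * x0 + t_b * z_next           # straight-line interpolant
    target_v = z_next - x0                        # its constant velocity
    z_cond = z_prev + noise_std * torch.randn_like(z_prev)  # input noising
    pred_v = velocity_net(x_t, t, z_cond, context)
    return ((pred_v - target_v) ** 2).mean()
```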

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces the Latent Generative Solver (LGS) with three components: a Physics VAE (PhyVAE) that compresses twelve PDE families into a shared latent manifold, a Pyramidal Flow-Forcing Transformer (PFlowFT) that performs flow-matching generation of the next latent state conditioned on a self-updated per-trajectory context, and input noising during training for which a sufficient-condition contraction bound is derived to explain long-horizon stability. Pretrained on a 2.5M-trajectory corpus spanning 16 systems at 128² resolution, LGS matches the strongest deterministic baseline at one step, outperforms on 15/16 systems at 5- and 10-step rollouts, reduces 20-step L2RE from 56.1% to 30.2%, and requires 13–77× less recurrent compute; it also adapts to a held-out 256² Kolmogorov flow in five finetuning epochs.

Significance. If the shared-manifold premise and contraction bound hold without hidden per-system tuning, the work would constitute a meaningful step toward generalizable, stable neural PDE solvers that operate across heterogeneous families with substantially lower long-horizon compute than recurrent baselines.

major comments (3)
  1. [abstract / methods (contraction bound)] The derivation of the sufficient-condition contraction bound from input noising (abstract and methods) does not explicitly state the latent-norm or Lipschitz-constant hypotheses required for uniformity across the twelve PDE families; without these, it is unclear whether the bound is satisfied by the single shared manifold or only after family-specific scaling of noise variance or context-update rate, directly affecting the explanation for the 20-step L2RE reduction.
  2. [results] Table reporting 5- and 10-step results (results section): the 15/16 win count is presented without full baseline implementation details, exact train/validation splits, or per-system error bars; this weakens the cross-family generalization claim because the strongest deterministic baseline may have been tuned differently per system.
  3. [results (adaptation)] Adaptation experiment on 256² Kolmogorov flow (results): the reported drop from 0.398 to 0.129 L2RE after five epochs is encouraging, yet the manuscript supplies no analysis of how the pretrained latent manifold enables this transfer (e.g., latent-space distance metrics or frozen vs. unfrozen components), leaving the shared-manifold premise load-bearing but unverified.
minor comments (3)
  1. [abstract] The abstract states “twelve PDE families” while the corpus is described as “16-system”; clarify the exact mapping between families and systems.
  2. [abstract] L2RE is used without an explicit definition on first appearance; add the expansion (e.g., L2 relative error) in the abstract and methods.
  3. [figures] Figure captions for rollout visualizations should include the exact number of steps shown and the color scale used for error fields.
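Major comment 1 turns on whether the one-step map contracts uniformly across families. One cheap probe, a hypothetical check rather than the authors' procedure, is to estimate an empirical (lower-bound) Lipschitz constant of the learned step by sampling perturbation directions and verifying that the expansion ratio stays below one:

```python
import torch

def empirical_lipschitz(step_fn, latents, n_probes=1024, eps=1e-3):
    """Lower-bound estimate of step_fn's Lipschitz constant near `latents`
    (a tensor of sampled latent states). Contraction needs a value < 1."""
    worst = 0.0
    with torch.no_grad():
        for _ in range(n_probes):
            z = latents[torch.randint(len(latents), (1,))]
            d = torch.randn_like(z)
            d = eps * d / d.norm()                 # tiny random direction
            ratio = (step_fn(z + d) - step_fn(z)).norm().item() / eps
            worst = max(worst, ratio)
    return worst
```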

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of the work. We address each major point below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: [abstract / methods (contraction bound)] The derivation of the sufficient-condition contraction bound from input noising (abstract and methods) does not explicitly state the latent-norm or Lipschitz-constant hypotheses required for uniformity across the twelve PDE families; without these, it is unclear whether the bound is satisfied by the single shared manifold or only after family-specific scaling of noise variance or context-update rate, directly affecting the explanation for the 20-step L2RE reduction.

    Authors: We agree that the hypotheses underlying the contraction bound should be stated explicitly for clarity. In the revised methods section we will add the precise assumptions on latent-norm bounds and Lipschitz constants of the flow-matching operator, and we will show that these assumptions are satisfied uniformly by the shared manifold without requiring per-family adjustments to noise variance or context-update rate. Supporting derivations and numerical verification of the Lipschitz constants will be moved to the appendix. revision: yes

  2. Referee: [results] Table reporting 5- and 10-step results (results section): the 15/16 win count is presented without full baseline implementation details, exact train/validation splits, or per-system error bars; this weakens the cross-family generalization claim because the strongest deterministic baseline may have been tuned differently per system.

    Authors: We acknowledge that additional transparency is required. The revised results section will include complete implementation details and hyper-parameter settings for every baseline, the exact train/validation splits used across all 16 systems, and per-system error bars computed as standard deviation over three independent random seeds. These additions will allow direct verification that the reported 15/16 win rate reflects consistent generalization rather than differential tuning. revision: yes

  3. Referee: [results (adaptation)] Adaptation experiment on 256² Kolmogorov flow (results): the reported drop from 0.398 to 0.129 L2RE after five epochs is encouraging, yet the manuscript supplies no analysis of how the pretrained latent manifold enables this transfer (e.g., latent-space distance metrics or frozen vs. unfrozen components), leaving the shared-manifold premise load-bearing but unverified.

    Authors: We will add a dedicated paragraph and accompanying figure in the results section that quantifies the transfer. This will report (i) average Euclidean distances in the pretrained latent space between the held-out 256² Kolmogorov trajectories and the nearest pretraining systems, and (ii) an ablation specifying which components were frozen (PhyVAE encoder/decoder) versus fine-tuned (PFlowFT) during the five-epoch adaptation. These metrics will directly support the claim that the shared manifold facilitates rapid transfer. revision: yes
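The latent-distance evidence promised in response 3 is easy to pin down. A sketch of the nearest-system distance metric, under assumed interfaces (an encoder returning pretrained latents, and pretraining frames grouped by system):

```python
import torch

def nearest_system_distances(encoder, heldout_frames, pretrain_frames_by_system):
    """Mean distance from each held-out latent to its nearest pretraining
    latent, reported per pretraining system."""
    with torch.no_grad():
        z_new = encoder(heldout_frames).flatten(1)        # (N, D)
        out = {}
        for name, frames in pretrain_frames_by_system.items():
            z_ref = encoder(frames).flatten(1)            # (M, D)
            pairwise = torch.cdist(z_new, z_ref)          # (N, M)
            out[name] = pairwise.min(dim=1).values.mean().item()
    return out
```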

Circularity Check

0 steps flagged

No load-bearing circularity; contraction bound presented as independent derivation

full rationale

The paper defines the PhyVAE, PFlowFT, and input-noising strategy as modeling choices, then separately derives a sufficient-condition contraction bound from the noising procedure to explain observed rollout stability. Performance numbers (e.g., 20-step L2RE reduction, Kolmogorov adaptation) are reported against external deterministic baselines rather than being fitted or renamed within the same equations. No self-citation chains, uniqueness theorems imported from prior author work, or ansatz smuggling appear in the provided text. The shared latent manifold is a definitional premise, but the central stability claim rests on the derived bound plus empirical validation, keeping circularity mild and non-load-bearing.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the existence of a compressible shared latent manifold across PDE families and on the validity of the derived contraction bound; no explicit numerical free parameters, new physical entities, or non-standard axioms are named in the abstract.

pith-pipeline@v0.9.0 · 5591 in / 1334 out tokens · 29438 ms · 2026-05-16T05:18:08.589013+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Flow Learners for PDEs: Toward a Physics-to-Physics Paradigm for Scientific Computing

cs.LG · 2026-04 · unverdicted · novelty 6.0

    Flow learners parameterize transport vector fields to generate PDE trajectories through integration, offering a physics-to-physics organizing principle for learned solvers.

    A1: No pyramids.Same modeling objective as the full method, but conditioning uses the full-resolution history (no temporal downsampling), increasing attention cost. p(xs+1 | ˜xs, t, c)p(c| ˆx1:s)(A15) where ˜xs is ak-softenedx s at intermediate transport timet. 15 Latent Generative Solvers A2: No physics context maintenance.We remove the latent dynamics v...