Physics-Informed Transformer operator for the prediction of three-dimensional turbulence

Huiyu Yang; Jianchun Wang; Sunan Zhao; Yunpeng Wang; Zhihong Guo

arxiv: 2601.19351 · v7 · submitted 2026-01-27 · ⚛️ physics.flu-dyn

Pith reviewed 2026-05-16 11:10 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn

keywords turbulence prediction · physics-informed learning · transformer operator · large-eddy simulation · homogeneous isotropic turbulence · subgrid-scale model · long-term extrapolation

The pith

Physics-informed Transformer operators forecast 3D turbulence with stability beyond 25 times the training horizon using far less memory than Fourier methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This work proposes two physics-informed Transformer operators, PITO and its implicit variant PIITO, for predicting the evolution of three-dimensional turbulence. The models are built on a vision Transformer backbone and are trained by embedding the large-eddy simulation (LES) equations into the loss function, which removes the need for labeled data. In decaying homogeneous isotropic turbulence (HIT), both operators deliver accurate statistical and structural predictions when run more than twenty-five times longer than the training horizon, and they outperform the physics-informed Fourier neural operator (PIFNO). PITO also predicts forced HIT accurately, a case where PIFNO fails. Relative to PIFNO, PITO and PIITO use 79.5% and 91.3% less GPU memory and need only 31.5% and 3.1% of the parameters, respectively, and both run much faster than a traditional LES solver.

Core claim

The authors show that a vision-Transformer operator, when trained with the large-eddy simulation equations inside the loss, can learn the time-stepping operator for three-dimensional turbulence, automatically determine the subgrid-scale coefficient from a single dataset, and produce stable, accurate forecasts of decaying homogeneous isotropic turbulence for extrapolation horizons exceeding twenty-five times the training horizon. The explicit variant, PITO, also remains accurate on forced homogeneous isotropic turbulence.
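The long-horizon claim refers to applying a one-step operator repeatedly to its own output. A minimal Python sketch of such an autoregressive rollout follows. It is an illustration only: `toy_step` is a hypothetical stand-in for a trained operator (a uniform damping, not turbulence physics), and `train_steps = 10` is an assumed number, since this summary does not state how many steps the training horizon contains.

```python
import numpy as np

def rollout(step_fn, u0, n_steps):
    """Apply a one-step operator repeatedly, feeding each output back in."""
    states = [u0]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)  # shape (n_steps + 1, 3, N, N, N)

def toy_step(u, decay=0.99):
    # Hypothetical stand-in for a trained operator: uniform damping only.
    return decay * u

train_steps = 10              # assumed training horizon, in steps
horizon = 25 * train_steps    # extrapolate to 25x the training horizon
u0 = np.ones((3, 4, 4, 4))    # tiny placeholder velocity field (u, v, w)

traj = rollout(toy_step, u0, horizon)
# Volume-averaged kinetic energy at every step of the rollout.
energy = 0.5 * np.mean(np.sum(traj ** 2, axis=1), axis=(1, 2, 3))
```

With a real operator in place of `toy_step`, the `energy` curve is one of the simplest checks for whether a rollout stays bounded or drifts once it leaves the training horizon.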

What carries the argument

PITO, the physics-informed Transformer operator, which partitions the velocity field into patches, processes them with self-attention, and advances the field by one time step while the loss enforces consistency with the filtered Navier-Stokes equations.
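The patch-and-attend step described above can be sketched in NumPy. This is a rough illustration, not the authors' architecture: the grid size, patch size, head width, and random weights are all assumptions, and the embedding, positional encoding, and network depth used in the paper are not given in this summary.

```python
import numpy as np

def patchify_3d(u, p):
    """Split a field of shape (3, N, N, N) into non-overlapping p^3 patches.

    Returns tokens of shape (num_patches, 3 * p**3)."""
    c, nx, ny, nz = u.shape
    assert nx % p == 0 and ny % p == 0 and nz % p == 0
    gx, gy, gz = nx // p, ny // p, nz // p
    # (c, gx, p, gy, p, gz, p) -> (gx, gy, gz, c, p, p, p)
    t = u.reshape(c, gx, p, gy, p, gz, p).transpose(1, 3, 5, 0, 2, 4, 6)
    return t.reshape(gx * gy * gz, c * p ** 3)

def unpatchify_3d(tokens, p, n, c=3):
    """Inverse of patchify_3d for a cubic grid with n points per side."""
    g = n // p
    t = tokens.reshape(g, g, g, c, p, p, p).transpose(3, 0, 4, 1, 5, 2, 6)
    return t.reshape(c, n, n, n)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, p, d = 16, 4, 32                      # hypothetical grid, patch, head sizes
u = rng.standard_normal((3, n, n, n))    # stand-in velocity field (u, v, w)
tokens = patchify_3d(u, p)               # 4^3 = 64 patches, 3 * 4^3 = 192 values each
wq, wk, wv = (rng.standard_normal((tokens.shape[1], d)) / np.sqrt(tokens.shape[1])
              for _ in range(3))
mixed = self_attention(tokens, wq, wk, wv)
```

The patch size sets the sequence length: on this 16^3 grid, patches of 4^3 give 64 tokens, while patches of 2^3 would give 512, and the attention matrix grows with the square of that count.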

If this is right

PITO and PIITO maintain stability and accuracy on statistical properties and flow structures for long-term extrapolation exceeding 25 times the training horizon in decaying HIT.
Both models outperform PIFNO on decaying HIT, and PITO succeeds on forced HIT where PIFNO fails.
PITO cuts GPU memory use by 79.5% and parameters to 31.5% of PIFNO levels; PIITO cuts memory by 91.3% and parameters to 3.1%.
The Transformer models run substantially faster than conventional large-eddy simulation computations.
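The percentages in the list above mix two conventions: memory is quoted as a reduction, parameters as a remaining fraction. The short Python calculation below puts both on the same footing as fold reductions relative to PIFNO. Only the four percentages come from the abstract; the helper name `fold_reduction` is illustrative.

```python
# Relative figures quoted in the abstract, with PIFNO as the 1.0 baseline.
memory_reduction = {"PITO": 0.795, "PIITO": 0.913}   # fraction of memory saved
param_fraction = {"PITO": 0.315, "PIITO": 0.031}     # fraction of parameters kept

def fold_reduction(remaining_fraction):
    """Convert 'fraction remaining' into an 'N times smaller' factor."""
    return 1.0 / remaining_fraction

for name in ("PITO", "PIITO"):
    mem_left = 1.0 - memory_reduction[name]
    print(f"{name}: memory {mem_left:.1%} of PIFNO "
          f"(~{fold_reduction(mem_left):.1f}x smaller), "
          f"parameters ~{fold_reduction(param_fraction[name]):.1f}x fewer")
```

On these figures PITO needs roughly a fifth of PIFNO's memory and about a third of its parameters, while PIITO needs under a tenth of the memory and about one thirty-second of the parameters.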

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The drastic reduction in memory and parameters may allow turbulence forecasting on devices with constrained compute budgets, such as in embedded flow-control systems.
Automatic extraction of the SGS coefficient from one dataset indicates potential applicability to varied flow regimes provided the embedded equations remain representative.
Attention mechanisms appear better suited than Fourier layers for capturing the strong nonlinearities present in forced turbulence cases.

Load-bearing premise

Embedding the large-eddy simulation equations in the loss function suffices to let the model extract a generalizable subgrid-scale coefficient from a single dataset and to keep predictions accurate and non-divergent far beyond the training horizon.

What would settle it

A direct test that trains the model on one decaying HIT realization and then measures whether statistical moments and energy spectra remain physically plausible when the simulation is continued for more than twenty-five training horizons in a forced HIT configuration at a different Reynolds number.

Original abstract

Data-driven turbulence prediction methods often face challenges related to data dependency and lack of physical interpretability. In this paper, we propose a physics-informed Transformer operator (PITO) and its implicit variant (PIITO) for predicting three-dimensional (3D) turbulence, which are developed based on the vision Transformer (ViT) architecture with an appropriate patch size. Given the current flow field, the Transformer operator computes its prediction for the next time step. By embedding the large-eddy simulation (LES) equations into the loss function, PITO and PIITO can learn solution operators without using labeled data. Furthermore, PITO can automatically learn the subgrid scale (SGS) coefficient using a single set of flow data during training. Both PITO and PIITO exhibit excellent stability and accuracy on the predictions of various statistical properties and flow structures for the situation of long-term extrapolation exceeding 25 times the training horizon in decaying homogeneous isotropic turbulence (HIT), and outperform the physics-informed Fourier neural operator (PIFNO). Furthermore, PITO exhibits a remarkable accuracy on the predictions of forced HIT where PIFNO fails. Notably, PITO and PIITO reduce GPU memory consumption by 79.5% and 91.3% while requiring only 31.5% and 3.1% of the parameters, respectively, compared to PIFNO. Moreover, both PITO and PIITO models are much faster compared to traditional LES method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The vision-Transformer operator learns an SGS coefficient from one dataset and delivers big memory cuts plus stable 25x-horizon rollouts in 3D turbulence, but the single-trajectory fit leaves the coefficient's generality unproven.

The main thing here is a vision-Transformer operator for 3D turbulence prediction that uses a physics-informed loss based on LES equations and learns the subgrid-scale coefficient from a single dataset. It reports stable forecasts over 25 times the training length in decaying homogeneous isotropic turbulence and cuts memory use by 80-90 percent compared to the physics-informed Fourier neural operator. The new part is adapting the patch-based Transformer to handle 3D flow fields as an operator, with both explicit and implicit versions.

The efficiency gains look solid on paper: much lower GPU memory, fewer parameters, and faster inference than traditional LES. It also claims better results on forced turbulence cases where the Fourier version breaks down. These practical improvements in resource use are worth noting for anyone running repeated simulations.

The weaker area is the claim that the model discovers a generalizable SGS coefficient. Since it trains on one set of flow data, the coefficient might simply adjust to match the energy decay in that particular run rather than reflecting a universal physics term. The abstract gives no number for the learned coefficient, no comparison to the usual Smagorinsky value, and no ablation that turns off the physics loss to test its contribution. Long extrapolations are impressive if true, but without those checks it's possible the stability comes from overfitting the training trajectory's statistics.

This paper targets people in computational fluid dynamics who want faster surrogates for turbulence. Readers working on operator learning or reduced-order models will see value in the memory savings and the stability results, assuming the experiments check out. The work shows clear thinking on combining Transformers with physics constraints, so it deserves a serious referee to examine the full validation and any code. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a physics-informed Transformer operator (PITO) and its implicit variant (PIITO) for three-dimensional turbulence prediction. Built on a Vision Transformer architecture with appropriate patch size, the models predict the next time step from the current flow field. By embedding the large-eddy simulation (LES) equations into the loss function, they learn solution operators without labeled data and automatically determine the subgrid-scale (SGS) coefficient from a single flow dataset. The authors report excellent long-term stability and accuracy for extrapolations exceeding 25 times the training horizon in decaying homogeneous isotropic turbulence (HIT), outperforming the physics-informed Fourier neural operator (PIFNO), with additional strong results on forced HIT and substantial reductions in GPU memory and parameter count relative to PIFNO.

Significance. If the reported long-term extrapolation performance and automatic SGS learning prove robust, the work would constitute a meaningful advance in physics-informed neural operators for turbulence. The combination of Transformer efficiency with embedded LES constraints could enable more scalable, data-light modeling of turbulent flows compared to existing Fourier-based operators, with potential implications for reduced-order modeling and real-time simulation in fluid dynamics.

major comments (3)

[Abstract] Abstract: The central claim that PITO 'can automatically learn the subgrid scale (SGS) coefficient using a single set of flow data during training' is load-bearing for the physics-informed contribution. However, the manuscript reports neither the numerical value of the learned coefficient nor any comparison to standard literature values (e.g., Smagorinsky C_s ≈ 0.1–0.18). Without this, it remains unclear whether the coefficient reflects a transferable physical parameter or a dataset-specific fit.
[Results (long-term extrapolation)] Results section on long-term extrapolation: The assertion of 'excellent stability and accuracy' for predictions exceeding 25 times the training horizon in decaying HIT is a strong claim that requires quantitative support. Specific metrics—such as time-dependent L2 errors on velocity fields, kinetic energy decay rates, or energy spectra at multiple extrapolation horizons—must be provided to demonstrate that performance does not degrade and to substantiate outperformance over PIFNO.
[Method (loss function)] Method description of the loss function: The embedding of LES equations is described at a high level, but the exact mathematical form of the physics-informed loss (including the parameterization and optimization of the SGS term) is not given. An explicit equation for the residual loss would clarify whether the SGS coefficient is constrained in a manner that guarantees generalizability beyond the single training trajectory.

minor comments (2)

[Abstract] The efficiency comparisons (79.5% and 91.3% memory reduction, 31.5% and 3.1% parameter counts) are presented in the abstract but would be clearer if accompanied by a dedicated table listing absolute values for PITO, PIITO, and PIFNO.
[Abstract] The abstract states that PITO 'exhibits a remarkable accuracy on the predictions of forced HIT where PIFNO fails,' but no quantitative metrics or figures for the forced case are referenced; adding these would strengthen the comparative claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our work on the physics-informed Transformer operator. We agree that additional details on the learned SGS coefficient, quantitative metrics for long-term predictions, and an explicit loss formulation will improve clarity and substantiation of the claims. We will revise the manuscript accordingly.

Point-by-point responses

Referee: [Abstract] Abstract: The central claim that PITO 'can automatically learn the subgrid scale (SGS) coefficient using a single set of flow data during training' is load-bearing for the physics-informed contribution. However, the manuscript reports neither the numerical value of the learned coefficient nor any comparison to standard literature values (e.g., Smagorinsky C_s ≈ 0.1–0.18). Without this, it remains unclear whether the coefficient reflects a transferable physical parameter or a dataset-specific fit.

Authors: We agree that explicitly reporting the learned SGS coefficient value and its comparison to literature is necessary to support the claim. In the revised manuscript, we will add the optimized numerical value of the coefficient obtained from training on the single dataset and compare it directly to standard Smagorinsky constants (C_s ≈ 0.1–0.18), along with a brief discussion of its physical consistency and implications for transferability.
revision: yes

Referee: [Results (long-term extrapolation)] Results section on long-term extrapolation: The assertion of 'excellent stability and accuracy' for predictions exceeding 25 times the training horizon in decaying HIT is a strong claim that requires quantitative support. Specific metrics—such as time-dependent L2 errors on velocity fields, kinetic energy decay rates, or energy spectra at multiple extrapolation horizons—must be provided to demonstrate that performance does not degrade and to substantiate outperformance over PIFNO.

Authors: We acknowledge that the current results section would benefit from more granular quantitative metrics to rigorously support the long-term stability claims. In the revised manuscript, we will include time-dependent L2 error curves for the velocity fields, plots of kinetic energy decay rates throughout the extrapolation window, and energy spectra evaluated at multiple horizons (e.g., 5×, 15×, and 25× the training horizon). These will be shown alongside corresponding PIFNO results to quantify the sustained accuracy and outperformance.
revision: yes

Referee: [Method (loss function)] Method description of the loss function: The embedding of LES equations is described at a high level, but the exact mathematical form of the physics-informed loss (including the parameterization and optimization of the SGS term) is not given. An explicit equation for the residual loss would clarify whether the SGS coefficient is constrained in a manner that guarantees generalizability beyond the single training trajectory.

Authors: We agree that an explicit equation will enhance reproducibility and clarify the optimization of the SGS term. In the revised methods section, we will provide the full mathematical expression for the physics-informed loss, including the residual based on the filtered LES equations, the parameterization of the SGS coefficient as a trainable scalar, and the optimization procedure. This will also include a short discussion on how the single-trajectory training constrains the coefficient while enabling the observed extrapolation performance.
revision: yes
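
The diagnostics requested in the exchange above (time-dependent L2 error and energy spectra at several horizons) are standard for periodic-box turbulence. The Python sketch below shows one common way to compute them. It assumes a cubic 2π-periodic grid and integer-shell binning, and it is not taken from the paper; the function names are illustrative.

```python
import numpy as np

def relative_l2_error(pred, ref):
    """Relative L2 error between predicted and reference velocity fields."""
    return np.linalg.norm(pred - ref) / np.linalg.norm(ref)

def energy_spectrum(u):
    """Shell-summed kinetic energy spectrum E(k) of a periodic field.

    u has shape (3, N, N, N) on a 2*pi box; entry k of the result holds
    the energy of all modes whose |k| rounds to the integer k."""
    n = u.shape[1]
    u_hat = np.fft.fftn(u, axes=(1, 2, 3)) / n ** 3
    e_density = 0.5 * np.sum(np.abs(u_hat) ** 2, axis=0)
    k1d = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int)
    return np.bincount(shell.ravel(), weights=e_density.ravel())

# Single-mode check field: u = (sin(3y), 0, 0) puts all energy in shell k = 3.
n = 16
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
_, yy, _ = np.meshgrid(x, x, x, indexing="ij")
u = np.stack([np.sin(3.0 * yy), np.zeros_like(yy), np.zeros_like(yy)])
spectrum = energy_spectrum(u)
```

By Parseval's theorem the spectrum sums to the volume-averaged kinetic energy, which gives a quick sanity check before comparing spectra from different models at, say, 5×, 15×, and 25× the training horizon.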

Circularity Check

1 step flagged

SGS coefficient fitted from single dataset via embedded LES loss, so long-term extrapolation predictions reduce to use of that fitted parameter

specific steps

fitted input called prediction [Abstract]
"Furthermore, PITO can automatically learn the subgrid scale (SGS) coefficient using a single set of flow data during training. Both PITO and PIITO exhibit excellent stability and accuracy on the predictions of various statistical properties and flow structures for the situation of long-term extrapolation exceeding 25 times the training horizon in decaying homogeneous isotropic turbulence (HIT)"

The SGS coefficient is obtained by fitting to the single training flow dataset (via the embedded LES loss); the subsequent 'predictions' of statistical properties and flow structures at horizons >25x training then employ this fitted coefficient. The reported long-term accuracy is therefore statistically tied to the training fit rather than constituting an independent derivation or parameter-free forecast.

full rationale

The paper's core claim is that embedding LES equations in the loss allows PITO to learn a generalizable SGS coefficient from one decaying HIT trajectory and then deliver accurate statistics and structures for extrapolations >25x the training horizon. This step matches the fitted-input-called-prediction pattern: the coefficient is optimized on the training data through the physics loss, after which the same coefficient is used for the reported long-term rollouts. No independent first-principles derivation or external validation of the coefficient is shown; the extrapolation accuracy therefore depends on the quality of the fit to the original dataset rather than emerging separately from it. No other circularity patterns (self-definition, self-citation load-bearing, etc.) are evident from the provided text.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the standard LES equations being sufficient to constrain the operator and on the assumption that a single flow dataset supplies enough information to determine the SGS coefficient without additional regularization or validation.

free parameters (1)

SGS coefficient = data-dependent
Automatically learned from a single set of flow data during training; this parameter directly controls the unresolved scales in the loss.

axioms (1)

domain assumption: Large-eddy simulation (LES) equations govern the resolved scales
Embedded into the loss function to enable training without labeled data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "By embedding the large-eddy simulation (LES) equations into the loss function, PITO and PIITO can learn solution operators without using labeled data. Furthermore, PITO can automatically learn the subgrid scale (SGS) coefficient using a single set of flow data during training."

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The SM model assumes that the SGS stress is proportional to the resolved strain rate tensor... τ_ij − (1/3) τ_kk δ_ij = −2 (C_smag Δ)² |S̄| S̄_ij"
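
The Smagorinsky relation quoted in the passage above can be evaluated directly on a periodic grid. The Python sketch below is one way to do it, using spectral derivatives. It is an illustration under stated assumptions: a 2π-periodic cubic box, |S̄| = sqrt(2 S̄_ij S̄_ij), a filter width Δ set to the grid spacing, and C_smag = 0.16 picked from the 0.1–0.18 range cited in the referee report. None of these values is reported as the paper's setting.

```python
import numpy as np

def strain_rate(u, length=2.0 * np.pi):
    """Resolved strain-rate tensor S_ij of a periodic field u, shape (3, N, N, N)."""
    n = u.shape[1]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    k = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    u_hat = np.fft.fftn(u, axes=(1, 2, 3))
    grad = np.empty((3, 3) + u.shape[1:])
    for i in range(3):
        for j in range(3):
            # grad[i, j] = d u_i / d x_j via spectral differentiation
            grad[i, j] = np.fft.ifftn(1j * k[j] * u_hat[i]).real
    return 0.5 * (grad + grad.transpose(1, 0, 2, 3, 4))

def smagorinsky_stress(u, c_smag, delta, length=2.0 * np.pi):
    """Deviatoric SGS stress: -2 (C_smag * delta)^2 |S| S_ij."""
    s = strain_rate(u, length)
    s_mag = np.sqrt(2.0 * np.einsum("ij...,ij...->...", s, s))
    return -2.0 * (c_smag * delta) ** 2 * s_mag * s

# Divergence-free Taylor-Green-type test field on a 16^3 grid.
n = 16
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
xx, yy, zz = np.meshgrid(x, x, x, indexing="ij")
u = np.stack([np.sin(xx) * np.cos(yy) * np.cos(zz),
              -np.cos(xx) * np.sin(yy) * np.cos(zz),
              np.zeros_like(xx)])

c_smag = 0.16               # assumed value inside the cited 0.1-0.18 range
delta = 2.0 * np.pi / n     # assumed filter width: one grid spacing
s = strain_rate(u)
tau = smagorinsky_stress(u, c_smag, delta)
```

In a setup like the one the paper describes, `c_smag` would be a trainable scalar rather than a fixed constant; here it is held fixed so the stress can be inspected on its own.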

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large-eddy simulation nets (LESnets) based on physics-informed neural operator for wall-bounded turbulence
physics.flu-dyn · 2026-04 · unverdicted · novelty 6.0

LESnets integrates LES equations and the law of the wall into F-FNO to enable data-free, stable long-term predictions of wall-bounded turbulence at Re_tau up to 1000 on coarse grids, matching traditional LES accuracy ...