Pith · machine review for the scientific record

arxiv: 2402.02366 · v2 · submitted 2024-02-04 · 💻 cs.LG · cs.NA · math.NA

Recognition: 2 Lean theorem links

Transolver: A Fast Transformer Solver for PDEs on General Geometries

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:37 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA
keywords PDE solver · Transformer · Physics-Attention · mesh discretization · general geometries · scientific machine learning · fluid dynamics · industrial simulation

The pith

By grouping mesh points with similar physical states into learnable slices, Transolver lets Transformers solve PDEs on arbitrary geometries in linear time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard Transformers struggle to model PDEs because discretized meshes contain millions of points whose individual interactions are hard to capture directly. Transolver replaces point-wise attention with Physics-Attention, which partitions the domain into a small number of flexible slices that collect points sharing comparable physical states. Attention is then performed only among the tokens that represent these slices. The resulting solver works across geometries without retraining and scales linearly with mesh size. If the grouping is faithful, it would let accurate simulations run on industrial-scale designs such as car bodies and airfoils at far lower cost than current mesh-based methods.
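
To make the mechanism concrete, the following is a minimal sketch of the slice-then-attend pattern in PyTorch. It is not the authors' implementation (the released code at github.com/thuml/Transolver is the reference); the class name, the soft-assignment normalization, and the default of 64 slices are illustrative assumptions.

```python
# Minimal sketch of Physics-Attention-style slicing (illustrative, not the official code).
# Idea: softly assign N mesh points to M learnable slices, attend among the M slice
# tokens only, then broadcast the attended tokens back to the points.
import torch
import torch.nn as nn

class PhysicsAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_slices: int = 64, num_heads: int = 8):
        super().__init__()
        self.slice_logits = nn.Linear(dim, num_slices)                  # per-point slice scores
        self.token_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, dim) features on N mesh points of arbitrary geometry.
        w = torch.softmax(self.slice_logits(x), dim=-1)                 # (B, N, M) soft assignment
        w_norm = w / (w.sum(dim=1, keepdim=True) + 1e-6)                # weights sum to 1 per slice
        tokens = torch.einsum("bnm,bnd->bmd", w_norm, x)                # (B, M, dim) slice tokens
        tokens, _ = self.token_attn(tokens, tokens, tokens)             # attention over M tokens only
        out = torch.einsum("bnm,bmd->bnd", w, tokens)                   # broadcast back to points
        return self.out_proj(out)                                       # (B, N, dim)

layer = PhysicsAttentionSketch(dim=128, num_slices=64)
y = layer(torch.randn(1, 50_000, 128))   # 50k mesh points in one pass; cost is linear in N
```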

Core claim

Transolver introduces Physics-Attention that adaptively splits the discretized domain into a series of learnable slices of flexible shapes; mesh points under similar physical states are assigned to the same slice. Attention is computed among the physics-aware tokens encoded from these slices rather than among every individual mesh point. This captures intricate physical correlations under complex geometries, confers end-to-end geometry-general modeling capacity, and reduces computation to linear complexity.
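
Read as equations, the mechanism amounts to a soft assignment, a weighted pooling, attention over the pooled tokens, and a weighted broadcast. The notation below is an editorial paraphrase under assumptions; the paper's §3 fixes its own symbols and adds multi-head structure and projections.

```latex
% Illustrative notation only. x_i \in \mathbb{R}^C is the feature of mesh point i
% (i = 1, \dots, N), M is the number of slices, W_s \in \mathbb{R}^{C \times M} is learned.
w_{ij} = \operatorname{softmax}_j\bigl( x_i W_s \bigr), \qquad
s_j = \frac{\sum_{i=1}^{N} w_{ij}\, x_i}{\sum_{i=1}^{N} w_{ij}}
      \quad (\text{slice token } j = 1, \dots, M),
\qquad
s'_{1:M} = \operatorname{Attention}(s_{1:M}),
\qquad
x'_i = \sum_{j=1}^{M} w_{ij}\, s'_j .
% Cost: O(NM) for slicing and broadcasting plus O(M^2) for token attention,
% i.e. linear in the number of mesh points N for a fixed slice count M.
```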

What carries the argument

Physics-Attention that partitions the mesh into learnable slices of similar physical states and performs attention over the resulting slice tokens.

If this is right

  • Delivers consistent state-of-the-art accuracy with a 22 percent relative gain across six standard PDE benchmarks.
  • Extends directly to large-scale industrial meshes such as full car exteriors and airfoil configurations.
  • Maintains linear complexity, enabling simulations on meshes too large for quadratic-attention Transformers (see the cost sketch after this list).
  • Requires no geometry-specific preprocessing or retraining when the domain shape changes.
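
A back-of-envelope comparison of pairwise-interaction counts illustrates the complexity claim. The slice count M = 64 and the mesh sizes below are illustrative choices, not figures reported in the paper.

```python
# Rough count of attention interactions (illustrative numbers only).
def pointwise_pairs(n: int) -> int:
    return n * n                          # quadratic: every mesh point attends to every other

def sliced_pairs(n: int, m: int = 64) -> int:
    return 2 * n * m + m * m              # assign + broadcast (N*M each) plus token attention (M^2)

for n in (10_000, 100_000, 1_000_000):
    print(f"N={n:>9,}  pointwise={pointwise_pairs(n):.2e}  sliced={sliced_pairs(n):.2e}")
# For fixed M the sliced cost grows linearly with mesh size N, which is what makes
# million-point industrial meshes (car bodies, airfoils) tractable for attention.
```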

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same slice-grouping idea could be applied to other mesh-based tasks such as structural mechanics or electromagnetic simulations.
  • If slice tokens prove stable across resolutions, coarser meshes might suffice for equivalent accuracy, lowering memory use.
  • Combining the slice mechanism with existing fast attention variants could further reduce wall-clock time on very large problems.
  • The geometry-general property suggests the model might serve as a drop-in surrogate in design optimization loops that vary shape parameters.

Load-bearing premise

Points that share similar physical states can be grouped into slices whose tokens preserve all critical local interactions without omission.

What would settle it

A benchmark PDE whose solution contains sharp local discontinuities or isolated features that the learned slices systematically merge with neighboring regions, producing measurable error increase relative to point-wise attention baselines.
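
One way such a probe could be scored, sketched under assumptions: take the model's soft slice assignments at mesh points just on either side of a known discontinuity and measure how often the two sides share a dominant slice. The function name and the random demo inputs are hypothetical stand-ins, not a diagnostic from the paper.

```python
# Hypothetical diagnostic for the merging failure described above (not from the paper).
import numpy as np

def dominant_slice_overlap(assign_left: np.ndarray, assign_right: np.ndarray) -> float:
    """assign_*: (num_points, num_slices) soft slice weights for paired points just left
    and right of a discontinuity. Returns the fraction of pairs whose argmax slice matches;
    values near 1.0 would indicate the feature is merged with its surroundings."""
    left = assign_left.argmax(axis=1)
    right = assign_right.argmax(axis=1)
    n = min(left.size, right.size)
    return float((left[:n] == right[:n]).mean())

# Demo with random assignments; real inputs would come from a trained model's slice weights.
rng = np.random.default_rng(0)
print(dominant_slice_overlap(rng.dirichlet(np.ones(64), 128), rng.dirichlet(np.ones(64), 128)))
```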

Original abstract

Transformers have empowered many milestones across various fields and have recently been applied to solve partial differential equations (PDEs). However, since PDEs are typically discretized into large-scale meshes with complex geometries, it is challenging for Transformers to capture intricate physical correlations directly from massive individual points. Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. Specifically, we propose a new Physics-Attention to adaptively split the discretized domain into a series of learnable slices of flexible shapes, where mesh points under similar physical states will be ascribed to the same slice. By calculating attention to physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations under complex geometrics, which also empowers the solver with endogenetic geometry-general modeling capacity and can be efficiently computed in linear complexity. Transolver achieves consistent state-of-the-art with 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations, including car and airfoil designs. Code is available at https://github.com/thuml/Transolver.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents Transolver, a transformer-based solver for PDEs on general geometries. It introduces Physics-Attention, which adaptively partitions discretized meshes into a series of learnable slices that group points sharing similar intrinsic physical states, encodes these slices into physics-aware tokens, and computes attention over the tokens to capture physical correlations. The approach is claimed to operate in linear complexity, provide end-to-end geometry-general modeling, and deliver consistent state-of-the-art performance with a 22% relative gain across six standard benchmarks while also succeeding on large-scale industrial simulations such as car and airfoil designs.

Significance. If the central mechanism proves robust, the work could meaningfully advance transformer-based PDE solvers by moving beyond direct point-wise processing of large meshes to an intrinsic-state representation that scales to complex geometries. The reported empirical gains on both academic benchmarks and industrial cases, together with public code release, would strengthen the case for practical adoption in computational science and engineering.

major comments (2)
  1. [Abstract and §3] Physics-Attention definition: the grouping of mesh points into slices relies exclusively on learned similarity of intrinsic states, with no spatial locality or neighborhood constraint. For PDEs exhibiting sharp gradients or shocks (explicitly tested in the airfoil and car benchmarks), points assigned to the same slice can be arbitrarily distant; the subsequent token attention then aggregates non-local information while discarding mesh-adjacent gradients that conventional discretizations preserve. This directly threatens the claim that the slices capture 'intricate physical correlations' without missing critical local interactions.
  2. [§4 and §5] Experimental validation: the manuscript reports consistent benchmark gains but provides no ablation or sensitivity analysis demonstrating that slice grouping remains stable under changes in mesh resolution or discretization. The abstract likewise offers no derivation or controlled experiment showing that attention operates on intrinsic physical states rather than learned proxies, which is load-bearing for the 'physics-aware' interpretation of the method.
minor comments (1)
  1. [§3] Notation for the slice encoding and token generation in §3 could be made more explicit to distinguish learned parameters from any physical quantities.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §3] Physics-Attention definition: the grouping of mesh points into slices relies exclusively on learned similarity of intrinsic states, with no spatial locality or neighborhood constraint. For PDEs exhibiting sharp gradients or shocks (explicitly tested in the airfoil and car benchmarks), points assigned to the same slice can be arbitrarily distant; the subsequent token attention then aggregates non-local information while discarding mesh-adjacent gradients that conventional discretizations preserve. This directly threatens the claim that the slices capture 'intricate physical correlations' without missing critical local interactions.

    Authors: We appreciate the referee's careful reading of the Physics-Attention mechanism. The absence of explicit spatial locality constraints is intentional: by grouping points according to learned intrinsic physical states, the method can capture long-range correlations that are physically meaningful even across distant mesh locations, which is particularly useful for complex geometries. The airfoil and car benchmarks contain sharp gradients and shocks, and Transolver still reports state-of-the-art accuracy on these tasks, indicating that the learned slices do not simply discard local information. Local gradients are preserved through the per-slice token encoding step before attention is computed. To make this reasoning more explicit, we will revise §3 to clarify how the token representation and subsequent attention together maintain both local and non-local physical interactions, and we will add a short discussion of this point in the abstract. revision: partial

  2. Referee: [§4 and §5] Experimental validation: the manuscript reports consistent benchmark gains but provides no ablation or sensitivity analysis demonstrating that slice grouping remains stable under changes in mesh resolution or discretization. The abstract likewise offers no derivation or controlled experiment showing that attention operates on intrinsic physical states rather than learned proxies, which is load-bearing for the 'physics-aware' interpretation of the method.

    Authors: We agree that additional empirical support would strengthen the claims. In the revised manuscript we will add, in §4 and §5, new ablation studies that vary mesh resolution and discretization type on representative benchmarks and report the resulting stability of the learned slice assignments (including quantitative metrics and visualizations). We will also expand the abstract and §3 with a clearer motivation for the intrinsic-state interpretation, supported by the new controlled experiments and by qualitative analysis of slice groupings on the airfoil and car cases. These additions will be included in the next version. revision: yes
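
One plausible instantiation of the proposed stability metric, sketched under assumptions (the rebuttal does not specify the protocol): transfer coarse-mesh slice assignments onto the fine mesh by nearest neighbour and report the mean cosine similarity against the fine-mesh assignments. The function name and the use of SciPy's KD-tree are illustrative choices, not the authors' method.

```python
# Hypothetical slice-assignment stability metric across mesh resolutions (illustrative).
import numpy as np
from scipy.spatial import cKDTree

def slice_stability(coarse_pts, coarse_assign, fine_pts, fine_assign) -> float:
    """coarse_pts: (Nc, d) coordinates, coarse_assign: (Nc, M) soft slice weights from the
    coarse mesh; fine_pts: (Nf, d), fine_assign: (Nf, M) from the fine mesh. Returns the
    mean cosine similarity in [0, 1]; values near 1 indicate stable slice groupings."""
    _, nearest = cKDTree(coarse_pts).query(fine_pts)        # nearest coarse point per fine point
    transferred = coarse_assign[nearest]                     # (Nf, M) coarse weights on fine mesh
    num = (transferred * fine_assign).sum(axis=1)
    den = np.linalg.norm(transferred, axis=1) * np.linalg.norm(fine_assign, axis=1) + 1e-12
    return float((num / den).mean())
```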

Circularity Check

0 steps flagged

No circularity: Physics-Attention is a learned operator with no self-referential reductions

Full rationale

The paper defines Physics-Attention as an adaptive grouping of mesh points into learnable slices via a parameterized neural mechanism trained end-to-end. No equation defines a quantity in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing step reduces to a self-citation or prior ansatz by the same authors. Performance is reported via empirical benchmarks rather than a closed derivation, and the architecture is self-contained with respect to external data.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 1 invented entity

The approach rests on the empirical hypothesis that physical states are discoverable from mesh data via learned slicing; no explicit free parameters beyond standard Transformer hyperparameters are named, but the number of slices is an implicit free quantity and the slice shapes themselves are learned.

free parameters (1)
  • number of slices
    Hyperparameter controlling how many learnable physical-state groups are created; the value is set by the practitioner rather than learned from data.
invented entities (1)
  • physics-aware tokens from slices (no independent evidence)
    purpose: Compact representation of groups of mesh points sharing similar physical states
    New token type introduced to enable attention over physical rather than geometric neighborhoods.

pith-pipeline@v0.9.0 · 5511 in / 1062 out tokens · 24235 ms · 2026-05-15T21:37:13.966590+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A meshfree exterior calculus for generalizable and data-efficient learning of physics from point clouds

    cs.LG 2026-05 unverdicted novelty 8.0

    MEEC equips point clouds with a discrete exterior calculus that satisfies exact conservation and is differentiable in point positions, allowing a single trained kernel to produce compatible physics on unseen geometrie...

  2. Discovering Physical Directions in Weight Space: Composing Neural PDE Experts

    cs.LG 2026-05 unverdicted novelty 7.0

    Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.

  3. CATO: Charted Attention for Neural PDE Operators

    cs.AI 2026-05 unverdicted novelty 7.0

    CATO learns a continuous latent chart for efficient axial attention on PDE meshes and adds derivative-aware supervision to improve accuracy and reduce oversmoothing on general geometries.

  4. PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics

    cs.LG 2026-05 unverdicted novelty 7.0

    PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...

  5. Learning Neural Operator Surrogates for the Black Hole Accretion Code

    astro-ph.HE 2026-04 unverdicted novelty 7.0

    Physics-informed Fourier neural operators recover plasmoid formation in sparse SRRMHD vortex data where data-only models fail, and transformer operators approximate AMR jet evolution, marking first reported uses in th...

  6. Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD

    cs.LG 2026-04 unverdicted novelty 7.0

    A graph-based neural operator trained on expert-validated race-car CFD data reaches accuracy levels usable for early-stage interactive aerodynamic design exploration.

  7. Fluids You Can Trust: Property-Preserving Operator Learning for Incompressible Flows

    physics.flu-dyn 2026-02 conditional novelty 7.0

    A kernel operator learning framework constructs property-preserving bases so that predicted incompressible velocity fields satisfy divergence-free and periodicity conditions exactly, delivering up to six orders lower ...

  8. U-HNO: A U-shaped Hybrid Neural Operator with Sparse-Point Adaptive Routing for Non-stationary PDE Dynamics

    cs.LG 2026-05 unverdicted novelty 6.0

    U-HNO uses adaptive per-point routing in a U-shaped hybrid architecture to achieve state-of-the-art accuracy on PDE benchmarks with sharp localized features.

  9. ShardTensor: Domain Parallelism for Scientific Machine Learning

    cs.DC 2026-05 unverdicted novelty 6.0

    ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.

  10. Don't Fix the Basis -- Learn It: Spectral Representation with Adaptive Basis Learning for PDEs

    cs.LG 2026-05 unverdicted novelty 6.0

    ABLE learns a spatially adaptive Parseval frame from data via an ancillary density to replace fixed bases in spectral neural operators for PDEs.

  11. PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting

    cs.AI 2026-05 unverdicted novelty 6.0

    PnP-Corrector decouples physics simulation from error correction to counter reciprocal error amplification in coupled spatiotemporal forecasting, cutting error by 29% in a 300-day ocean-atmosphere test.

  12. PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting

    cs.AI 2026-05 unverdicted novelty 6.0

    PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.

  13. CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation

    cs.LG 2026-05 unverdicted novelty 6.0

    CarCrashNet releases a large-scale open benchmark dataset of structural crash simulations and a hierarchical neural solver for data-driven full-vehicle crash prediction.

  14. AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling

    cs.LG 2026-05 unverdicted novelty 6.0

    AeroJEPA applies joint-embedding predictive learning to produce scalable, semantically organized latent representations for 3D aerodynamic fields that support both field reconstruction and downstream design tasks.

  15. A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting

    cs.LG 2026-04 unverdicted novelty 6.0

    A hybrid transformer-FEM integrator provides provable discrete energy preservation and gradient bounds for stable autoregressive forecasting of chaotic systems, with 65x fewer parameters and 9000x speedup in a fusion ...

  16. FLARE: A Data-Efficient Surrogate for Predicting Displacement Fields in Directed Energy Deposition

    cs.LG 2026-04 unverdicted novelty 6.0

    FLARE predicts post-cooling displacement fields in directed energy deposition by encoding simulations as implicit neural fields whose weights are regularized to follow an affine structure in parameter space, enabling ...

  17. A Structure-Preserving Graph Neural Solver for Parametric Hyperbolic Conservation Laws

    physics.comp-ph 2026-04 unverdicted novelty 6.0

    A structure-preserving GNN solver for parametric hyperbolic conservation laws achieves superior long-horizon stability and orders-of-magnitude speedups over high-resolution simulations on supersonic flow benchmarks.

  18. Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations

    cs.LG 2026-05 unverdicted novelty 5.0

    Di-BiLPS combines a variational autoencoder, latent diffusion, and contrastive learning to achieve state-of-the-art accuracy on PDE problems with as little as 3% observations while supporting zero-shot super-resolutio...

  19. Replay-Based Continual Learning for Physics-Informed Neural Operators

    cs.LG 2026-05 unverdicted novelty 4.0

    A replay-based continual learning strategy for physics-informed neural operators mitigates catastrophic forgetting on prior physical problems while enabling efficient adaptation to new data using only physical constraints.

  20. RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics

    eess.IV 2026-04 unverdicted novelty 4.0

    RETO achieves relative L2 errors of 0.063 on ShapeNet and 0.089/0.097 on DrivAerML surface pressure/velocity, outperforming Transolver and other baselines.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 19 Pith papers · 4 internal anchors

  1. [1]

    GPT-4 Technical Report

    Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

  2. [2]

    ShapeNet: An Information-Rich 3D Model Repository

    Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.

  3. [3]

    Training Deep Nets with Sublinear Memory Cost

    Chen, T., Xu, B., Zhang, C., and Guestrin, C. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016.

  4. [4]

    Neural Operator: Graph Kernel Network for Partial Differential Equations

    Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020.

  5. [5]

    Geometry-Informed Neural Operator for Large-Scale 3D PDEs

    Li, Z., Kovachki, N. B., Choy, C., Li, B., Kossaifi, J., Otta, S. P., Nabian, M. A., Stadler, M., Hundt, C., Azizzadenesheli, K., and Anandkumar, A. Geometry-informed neural operator for large-scale 3D PDEs. In NeurIPS, 2023.

  6. [6]

    HT-Net: Hierarchical Transformer Based Operator Learning Model for Multiscale PDEs

    Liu, X., Xu, B., and Zhang, L. HT-Net: Hierarchical transformer based operator learning model for multiscale PDEs. arXiv preprint arXiv:2210.10890, 2022.

  7. [7]

    Patches Are All You Need?

    Trockman, A. and Kolter, J. Z. Patches are all you need? arXiv preprint arXiv:2201.09792, 2022.

  8. [8]

    ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

    Xue, L., Yu, N., Zhang, S., Li, J., Martín-Martín, R., Wu, J., Xiong, C., Xu, R., Niebles, J. C., and Savarese, S. ULIP-2: Towards scalable multimodal pre-training for 3D understanding. arXiv preprint arXiv:2305.08275, 2023.
