pith. machine review for the scientific record.

arxiv: 2605.00062 · v2 · submitted 2026-04-30 · 📡 eess.IV · cs.LG

Recognition: unknown

RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 20:34 UTC · model grok-4.3

classification 📡 eess.IV cs.LG
keywords rotary positional encoding · neural operator · transformer · aerodynamic prediction · automotive design · spatial awareness mechanism · computational fluid dynamics · machine learning for physics

The pith

RETO improves vehicle aerodynamics predictions by adding rotary positional encodings to transformer operators for better spatial correlation capture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the rotary-enhanced transformer operator, or RETO, as a neural solver for rapid aerodynamic evaluation around complex vehicle shapes. It combines sinusoidal-cosine encodings to reference global positions with rotary positional encodings that represent relative displacements through unitary rotations, aiming to enforce translation invariance and sharpen local gradient resolution. On the ShapeNet car dataset, this yields a relative L2 error of 0.063, a 16 percent improvement over the Transolver baseline. On the high-fidelity DrivAerML benchmark, it reaches relative L2 errors of 0.089 for surface pressure and 0.097 for velocity, corresponding to 23 percent and 19 percent gains over the same baseline. Lower attention entropy values further indicate that the mechanism keeps focus on localized flow features rather than diffusing across the entire domain.

Core claim

RETO achieves relative L2 errors of 0.063 on ShapeNet, and 0.089 for surface pressure and 0.097 for velocity on DrivAerML, by using a dual-stage spatial awareness mechanism: sinusoidal-cosine encodings for global referencing combined with rotary positional encodings that encode spatial relations via unitary rotations. This setup enforces translation invariance while improving resolution of local gradients in the flow field. The approach outperforms the RegDGCNN, AB-UBT, and Transolver baselines across both benchmarks, and information-theoretic analysis shows its attention entropy peaks at 0.35 compared with 0.75 for Transolver at 10^4 resolution, confirming more focused attention that preserves localized gradients against global diffusion.
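
A worked sketch of why unitary rotations yield translation invariance (this restates the standard RoPE identity from the language-model literature for a single frequency $\theta$; carrying the position index over to a spatial coordinate is an assumption about how the paper constructs its encodings): for a query $q$ at position $m$ and a key $k$ at position $n$,

$\langle R_{m\theta}\,q,\; R_{n\theta}\,k \rangle = \langle R_{(m-n)\theta}\,q,\; k \rangle$

so the attention logit depends only on the displacement $m - n$. Shifting every position by the same constant leaves the logits unchanged, and because each $R$ is a rotation, feature norms are preserved rather than rescaled.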

What carries the argument

Dual-stage spatial awareness mechanism consisting of sinusoidal-cosine encodings for global referencing and rotary positional encodings (RoPE) for relative displacements via unitary rotations.
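
A minimal sketch of how such a rotary stage could look for continuous point coordinates (hypothetical code, not the authors' implementation; the function name, tensor shapes, and the one-axis-at-a-time treatment are assumptions):

    import torch

    def rope_rotate(x: torch.Tensor, coords: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
        """Rotate feature pairs of x by angles proportional to point coordinates.

        x:      (N, D) per-point query or key features, D even
        coords: (N,) coordinate of each point along one spatial axis
        """
        n_points, dim = x.shape
        # One frequency per feature pair, following the standard RoPE schedule.
        freqs = base ** (-torch.arange(0, dim, 2, dtype=x.dtype) / dim)   # (D/2,)
        angles = coords[:, None] * freqs[None, :]                          # (N, D/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = torch.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin   # each pair undergoes a 2x2 rotation,
        out[:, 1::2] = x1 * sin + x2 * cos   # which is unitary and norm-preserving
        return out

    # Attention logits between rotated queries and keys depend only on coordinate
    # differences, so translating the whole geometry leaves them unchanged.
    q = rope_rotate(torch.randn(1024, 64), torch.rand(1024))
    k = rope_rotate(torch.randn(1024, 64), torch.rand(1024))
    logits = (q @ k.T) / 64 ** 0.5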

If this is right

  • More accurate surface pressure and velocity field predictions for automotive shapes without full computational fluid dynamics runs.
  • Faster iterative vehicle design cycles through reliable neural-operator evaluations on high-fidelity meshes.
  • Improved preservation of localized flow gradients due to translation-invariant relative encoding.
  • Lower attention entropy that reduces global diffusion of focus during high-resolution inference.
  • Potential extension to other mesh-based physical simulation tasks that require precise spatial relations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same rotary enhancement could be tested on fluid problems outside automotive design, such as aircraft wakes or wind around buildings.
  • If the focused attention pattern generalizes, it may improve robustness of neural operators to small changes in input geometry.
  • Controlled ablations that isolate RoPE from other architecture choices would clarify whether the mechanism scales to larger models.
  • Integration into real-time design optimization loops could become feasible if the error reductions hold on additional independent datasets.

Load-bearing premise

The accuracy gains and lower entropy stem specifically from the dual-stage sinusoidal-cosine plus RoPE mechanism rather than from differences in model capacity, training details, or dataset specifics.

What would settle it

Train an otherwise identical transformer operator without the rotary positional encoding component but with matched capacity on the DrivAerML dataset and check whether the relative L2 errors for pressure and velocity match or exceed RETO's reported values of 0.089 and 0.097.
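
A sketch of how that check could be scored, assuming the common field-wise definition of the relative L2 error (the paper's exact normalization is not stated in the abstract); the model and data-loader names below are placeholders, not the authors' code:

    import torch

    def relative_l2(pred: torch.Tensor, ref: torch.Tensor) -> float:
        """||pred - ref||_2 / ||ref||_2 over a flattened field."""
        return (torch.linalg.norm(pred - ref) / torch.linalg.norm(ref)).item()

    def mean_relative_l2(model, loader) -> float:
        """Average the per-sample relative L2 error over a held-out split."""
        errors = []
        with torch.no_grad():
            for geometry, field_ref in loader:   # hypothetical (input, target) pairs
                errors.append(relative_l2(model(geometry), field_ref))
        return sum(errors) / len(errors)

    # If the capacity-matched, no-RoPE variant lands at or below RETO's reported
    # 0.089 (pressure) and 0.097 (velocity), the attribution of the gains to the
    # rotary encodings would not hold.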

read the original abstract

Rapid aerodynamic evaluation is crucial for modern vehicle design, yet existing neural operators struggle to capture intricate spatial correlations. We propose the rotary-enhanced transformer operator (RETO), a novel neural solver featuring a dual-stage spatial awareness mechanism: sinusoidal-cosine encodings for global referencing and rotary positional encodings (RoPE) for relative displacements. RoPE encodes spatial relations via unitary rotations, enforcing translation invariance and enhancing local gradient resolution. RETO is validated on ShapeNet and the high-fidelity DrivAerML benchmark. On ShapeNet, RETO achieves a relative $L_2$ error of 0.063, outperforming RegDGCNN at 0.125 and representing a 16\% improvement over the Transolver baseline, which yields an error of 0.075. These performance gains are further amplified on the DrivAerML dataset, where RETO achieves relative $L_2$ errors of 0.089 for surface pressure and 0.097 for velocity. In comparison, Transolver results in errors of 0.116 and 0.121 for the same metrics, indicating that RETO achieves precision enhancements of 23\% and 19\%, respectively. For comprehensive comparison, the surface pressure and velocity errors for AB-UBT are 0.102 and 0.124, while RegDGCNN yields 0.235 and 0.312, respectively. Information-theoretical analysis shows that the entropy peak of RETO at 0.35 is significantly lower than that of Transolver at 0.75 under $10^4$ resolution, indicating a focused attentional mechanism capable of preserving localized gradients against global diffusion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the Rotary-Enhanced Transformer Operator (RETO), a neural operator for automotive aerodynamics that augments a transformer backbone with a dual-stage spatial awareness mechanism: sinusoidal-cosine encodings for global position referencing combined with rotary positional encodings (RoPE) to enforce translation invariance and improve local gradient resolution. It reports concrete performance gains on public benchmarks—relative L2 error of 0.063 on ShapeNet (16% better than Transolver at 0.075) and 0.089/0.097 for surface pressure/velocity on DrivAerML (23%/19% better than Transolver)—plus lower attention entropy (0.35 vs. 0.75), attributing these to the RoPE component.

Significance. If the reported error reductions prove robust under controlled ablations and matched-capacity baselines, RETO could meaningfully advance neural-operator methods for high-fidelity CFD in automotive design by better preserving localized flow features. The empirical evaluation on established benchmarks (ShapeNet, DrivAerML) against named baselines is a positive step toward reproducible claims in the field.

major comments (3)
  1. [Abstract] The central performance claims (0.063 vs. 0.075 on ShapeNet; 0.089/0.097 vs. 0.116/0.121 on DrivAerML) are presented without error bars, multiple-run statistics, or any mention of training details (parameter counts, FLOPs, optimizer settings, epoch counts, or data augmentation). This absence prevents verification that the 16–23% gains arise from the dual-stage RoPE mechanism rather than from unstated differences in model capacity or training.
  2. [Abstract and Experiments] No ablation is described that removes only the RoPE stage while retaining the sinusoidal-cosine encodings, nor any matched-capacity control against Transolver. Without these isolating experiments, the attribution of the lower entropy (0.35 vs. 0.75) and of the error reductions specifically to translation invariance and local gradient resolution remains unsecured.
  3. [Abstract] The entropy comparison at 10^4 resolution is reported without the definition or formula used to compute attention entropy, making it impossible to assess whether the lower value directly demonstrates focused preservation of localized gradients.
minor comments (2)
  1. [Abstract] Clarify the precise definition of 'relative L2 error' (e.g., reference the equation or normalization used) so readers can reproduce the reported numbers.
  2. [Experiments] Consider adding a summary table in the experiments section that includes parameter counts and training hyperparameters for all baselines to facilitate fair comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We address each of the major comments below and have made revisions to improve clarity and rigor.

read point-by-point responses
  1. Referee: [Abstract] The central performance claims (0.063 vs. 0.075 on ShapeNet; 0.089/0.097 vs. 0.116/0.121 on DrivAerML) are presented without error bars, multiple-run statistics, or any mention of training details (parameter counts, FLOPs, optimizer settings, epoch counts, or data augmentation). This absence prevents verification that the 16–23% gains arise from the dual-stage RoPE mechanism rather than from unstated differences in model capacity or training.

    Authors: We agree that providing statistical robustness and implementation details is essential for reproducibility. In the revised manuscript, we will include error bars derived from multiple independent training runs (using different random seeds) along with standard deviations for the reported relative L2 errors. Additionally, we will add a comprehensive 'Implementation Details' section detailing parameter counts, computational complexity (FLOPs), optimizer settings, number of training epochs, batch sizes, and any data augmentation techniques employed. This will enable verification that the performance gains are due to the proposed dual-stage spatial awareness mechanism. revision: yes

  2. Referee: [Abstract and Experiments] No ablation is described that removes only the RoPE stage while retaining the sinusoidal-cosine encodings, nor any matched-capacity control against Transolver. Without these isolating experiments, the attribution of the lower entropy (0.35 vs. 0.75) and of the error reductions specifically to translation invariance and local gradient resolution remains unsecured.

    Authors: We concur that controlled ablations are necessary to securely attribute the improvements to the RoPE component. In the revised version, we will expand the Experiments section with new ablation studies. Specifically, we will report results for a RETO variant without the RoPE stage (using only the sinusoidal-cosine encodings) and a capacity-matched Transolver baseline with equivalent model size and computational budget. These additions will provide direct evidence that the lower attention entropy and error reductions arise from the translation invariance and enhanced local gradient resolution provided by the rotary encodings. revision: yes

  3. Referee: [Abstract] The entropy comparison at 10^4 resolution is reported without the definition or formula used to compute attention entropy, making it impossible to assess whether the lower value directly demonstrates focused preservation of localized gradients.

    Authors: We thank the referee for pointing out this omission. The attention entropy is defined as the Shannon entropy of the attention probability distribution, averaged across heads and samples: $H = -\sum_i p_i \log p_i$, where $p_i$ denotes the softmax-normalized attention weights. At 10^4 resolution, this yields the reported values of 0.35 for RETO versus 0.75 for Transolver. Lower entropy indicates a more peaked distribution, corresponding to focused attention on localized flow features rather than uniform diffusion. We will incorporate the explicit formula and this interpretation into the revised manuscript. revision: yes
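
A minimal sketch of the computation the rebuttal describes (tensor shapes and the averaging order are assumptions; the paper may normalize differently):

    import torch

    def attention_entropy(attn: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
        """Shannon entropy of softmax attention weights.

        attn: (batch, heads, queries, keys), rows already softmax-normalized.
        Returns H = -sum_j p_j log p_j over the key axis, averaged across
        queries, heads, and batch.
        """
        h = -(attn * (attn + eps).log()).sum(dim=-1)
        return h.mean()

    # A lower value means a more peaked distribution: attention concentrated on
    # a few nearby points rather than diffused over the whole domain.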

Circularity Check

0 steps flagged

No significant circularity; claims rest on external empirical benchmarks

full rationale

The paper proposes the RETO architecture with a dual-stage spatial awareness mechanism (sinusoidal-cosine encodings plus RoPE) and validates it via direct measurements of relative L2 error on the external ShapeNet and DrivAerML datasets against independent baselines (Transolver, RegDGCNN, AB-UBT). Reported metrics (0.063 on ShapeNet, 0.089/0.097 on DrivAerML) and entropy values are obtained from model runs on these benchmarks, not derived by algebraic reduction or fitting that loops back to the mechanism definition. No equations, uniqueness theorems, or self-citations appear in the provided text that would make any central claim equivalent to its inputs by construction. The performance deltas are presented as measured outcomes, leaving the attribution question as one of experimental controls rather than logical circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of neural operator applicability to fluid problems and the utility of positional encodings; no new physical entities or ad-hoc constants are introduced beyond typical neural network training.

axioms (1)
  • domain assumption: Neural operators can learn mappings from geometry to fluid fields on the given benchmarks.
    Invoked implicitly by training and evaluating the model on ShapeNet and DrivAerML for aerodynamic prediction.

pith-pipeline@v0.9.0 · 5627 in / 1227 out tokens · 53438 ms · 2026-05-09T20:34:22.090816+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

21 extracted references · 15 canonical work pages · 4 internal anchors

  1. [1]

    Global EV Outlook 2022: Securing Supplies for an Electric Future

Ekta Meena Bibra, Elizabeth Connelly, Shobhan Dhir, Michael Drtil, Pauline Henriot, Inchan Hwang, Jean‑Baptiste Le Marois, Sarah McBain, Leonardo Paoli, and Jacob Teter. Global EV Outlook 2022: Securing Supplies for an Electric Future. Technical report, International Energy Agency.

  2. [2]

    Large-eddy simulations: theory and applications

Ugo Piomelli and Jeffrey Robert Chasnov. Large-eddy simulations: theory and applications. In Turbulence and Transition Modelling: Lecture Notes from the ERCOFTAC/IUTAM Summerschool held in Stockholm, 12–20 June, 1995, pages 269–336. Springer.

  3. [3]

    Fourier Neural Operator for Parametric Partial Differential Equations

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.

  4. [4]

Transformer for partial differential equations’ operator learning

Zijie Li, Kazem Meidani, and Amir Barati Farimani. Transformer for partial differential equations’ operator learning. arXiv preprint arXiv:2205.13671.

  5. [5]

AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers

Benedikt Alkin, Maurits Bleeker, Richard Kurle, Tobias Kronlachner, Reinhard Sonnleitner, Matthias Dorfer, and Johannes Brandstetter. AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers. arXiv preprint arXiv:2502.09692.

  6. [6]

    DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193.

  7. [7]

Factorized Fourier neural operators

Alasdair Tran, Alexander Mathews, Lexing Xie, and Cheng Soon Ong. Factorized Fourier neural operators. arXiv preprint arXiv:2111.13802.

  8. [8]

    LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

  9. [9]

Flowformer: Linearizing transformers with conservation flows

Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Flowformer: Linearizing transformers with conservation flows. arXiv preprint arXiv:2202.06258.

  10. [10]

Transolver: A fast transformer solver for PDEs on general geometries

Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, and Mingsheng Long. Transolver: A fast transformer solver for PDEs on general geometries. arXiv preprint arXiv:2402.02366.

  11. [11]

Transolver++: An accurate neural solver for PDEs on million-scale geometries

Huakun Luo, Haixu Wu, Hang Zhou, Lanxiang Xing, Yichen Di, Jianmin Wang, and Mingsheng Long. Transolver++: An accurate neural solver for PDEs on million-scale geometries. arXiv preprint arXiv:2502.02414.

  12. [12]

Transolver-3: Scaling Up Transformer Solvers to Industrial-Scale Geometries

Hang Zhou, Haixu Wu, Haonan Shangguan, Yuezhou Ma, Huikun Weng, Jianmin Wang, and Mingsheng Long. Transolver-3: Scaling Up Transformer Solvers to Industrial-Scale Geometries. arXiv preprint arXiv:2602.04940.

  13. [13]

GeoTransolver: Learning Physics on Irregular Domains Using Multi-scale Geometry Aware Physics Attention Transformer

Corey Adams, Rishikesh Ranade, Ram Cherukuri, and Sanjay Choudhry. GeoTransolver: Learning Physics on Irregular Domains Using Multi-scale Geometry Aware Physics Attention Transformer. arXiv preprint arXiv:2512.20399.

  14. [14]

    GeoFormer: Mesh-Free Geometry-to-Flow Alignment Framework for Real-Time Aerodynamics on Non-Watertight CAD

Jianghang Gu, Yuntian Chen, Yuanwei Bin, and Shiyi Chen. GeoFormer: Mesh-Free Geometry-to-Flow Alignment Framework for Real-Time Aerodynamics on Non-Watertight CAD. Available at SSRN 5601950. Peimian Du, Jiabin Liu, Xiaowei Jin, Wangmeng Zuo, and Hui Li. Spatiotemporal Field Generation Based on Hybrid Mamba-Transformer with Physics-informed Fine-tun...

  15. [15]

MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention

Pedro MP Curvo, Jan-Willem van de Meent, and Maksim Zhdanov. MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention. arXiv preprint arXiv:2512.01738.

  16. [16]

DrivAerML: High-Fidelity Computational Fluid Dynamics Dataset for Road-Car External Aerodynamics

Neil Ashton, Charles Mockett, Marian Fuchs, Louis Fliessbach, Hendrik Hetmann, Thilo Knacke, Norbert Schonwald, Vangelis Skaperdas, Grigoris Fotiadis, Astrid Walle, et al. DrivAerML: High-fidelity computational fluid dynamics dataset for road-car external aerodynamics. arXiv preprint arXiv:2408.11969.

  17. [17]

    ShapeNet: An Information-Rich 3D Model Repository

Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012.

  18. [18]

    Transformer dissection: An unified understanding for transformer’s attention via the lens of kernel

Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. Transformer dissection: An unified understanding for transformer’s attention via the lens of kernel. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural lang...

  19. [19]

On the aerodynamics of the notchback open cooling DrivAer: A detailed investigation of wind tunnel data for improved correlation and reference

Burkhard Hupertz, Karel Chalupa, Lothar Krueger, Kevin Howard, Hans-Dieter Glueck, Neil Lewington, Jin-Hyuck Chang, and Yong-su Shin. On the aerodynamics of the notchback open cooling DrivAer: A detailed investigation of wind tunnel data for improved correlation and reference. SAE International Journal of Advances and Current Practices in Mobili...

  20. [20]

    Calibration, Entropy Rates, and Memory in Language Models

Mark Braverman, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, and Yi Zhang. Calibration, Entropy Rates, and Memory in Language Models. In 37th International Conference on Machine Learning: ICML 2020, Online, 13-18 July 2020, Part 2 of 15.

  21. [21]

Integrating locality-aware attention with transformers for general geometry PDEs

Minsu Koh, Beom-Chul Park, Heejo Kong, and Seong-Whan Lee. Integrating locality-aware attention with transformers for general geometry PDEs. In 2025 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2025.