Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction
Pith reviewed 2026-05-21 02:50 UTC · model grok-4.3
The pith
Matching a graph transformer's symmetry to the line-of-sight direction improves galaxy velocity reconstruction accuracy by 35 percent over linear theory.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Velocityformer is an equivariant graph transformer whose inductive bias is matched to the broken symmetry created by the observer's line-of-sight direction. When this architecture is conditioned on the physics-based long-wavelength velocity field, the correlation r with true velocities rises 35 percent above the standard linear-theory prediction and exceeds other machine-learning baselines at all data volumes. The resulting velocity maps improve the signal-to-noise ratio of kinematic Sunyaev-Zel'dovich measurements by the same factor and remain accurate when applied zero-shot to high-fidelity catalogs.
What carries the argument
An equivariant graph transformer whose attention and message-passing layers respect translational and rotational equivariance except along the explicit line-of-sight direction, thereby matching the dominant symmetry breaking present in observational data.
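The symmetry pattern described above can be illustrated with a toy check. The sketch below is not the paper's architecture; it assumes a plane-parallel line of sight along the z axis, and the helper names `rotate` and `edge_features` are hypothetical. It shows that a pair feature built from the separation length and its LOS component is unchanged by translations and by rotations about the LOS, but changes under a rotation about any other axis.

```python
import math

LOS = (0.0, 0.0, 1.0)  # assumption: plane-parallel line of sight along z


def rotate(v, axis, angle):
    """Rotate 3-vector v about a coordinate axis ('x' or 'z') by angle in radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    if axis == "z":
        return (c * x - s * y, s * x + c * y, z)
    if axis == "x":
        return (x, c * y - s * z, s * y + c * z)
    raise ValueError("axis must be 'x' or 'z'")


def edge_features(pos_i, pos_j, los=LOS):
    """Return (separation length, separation component along the LOS)."""
    d = tuple(b - a for a, b in zip(pos_i, pos_j))
    length = math.sqrt(sum(c * c for c in d))
    parallel = sum(c * n for c, n in zip(d, los))
    return length, parallel


# Two hypothetical galaxy positions (arbitrary units).
a, b = (1.0, 2.0, 3.0), (4.0, 0.0, 5.0)

base = edge_features(a, b)
# Rotation about the LOS: both features are preserved.
about_los = edge_features(rotate(a, "z", 0.7), rotate(b, "z", 0.7))
# Rotation about x: the length is preserved, the LOS component is not.
about_x = edge_features(rotate(a, "x", 0.7), rotate(b, "x", 0.7))
```

A model that consumes only such features inherits exactly this reduced symmetry, which is the sense in which the inductive bias is matched to the data rather than to the full rotation group.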
If this is right
- A 35 percent rise in r produces a matching 35 percent rise in signal-to-noise ratio for kinematic Sunyaev-Zel'dovich measurements on the same galaxy sample.
- The model reaches usable accuracy after training on only four low-fidelity simulations, reducing the computational cost of producing velocity reconstructions.
- Zero-shot generalization across survey geometry, cosmological parameters, and galaxy selection removes the need for retraining when analyzing new data releases.
- Performance gains remain consistent when the model size or training volume is varied, indicating that the symmetry match itself drives the improvement rather than scale alone.
Where Pith is reading between the lines
- The same broken-symmetry matching strategy could be applied to other line-of-sight-dependent cosmological fields such as redshift-space distortions or integrated Sachs-Wolfe reconstructions.
- Because the architecture stays data-efficient, it may become practical to retrain periodically on the latest simulation suites as they become available.
- If the line-of-sight symmetry match proves robust, similar inductive-bias alignment may reduce data requirements in other sparse scientific domains where observations break continuous symmetries.
Load-bearing premise
The dominant symmetry breaking in the observations arises solely from the line-of-sight direction, and training on a small number of low-fidelity simulations is enough for the model to generalize to high-fidelity catalogs and real survey data.
What would settle it
Measure the correlation coefficient r on a held-out set of high-fidelity simulated galaxy catalogs whose input geometry and cosmological parameters were never seen during training; if the reported 30 percent gain in r over the linear baseline on such catalogs disappears, the central claim fails.
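The test above comes down to two numbers per catalog: r for the model and r for the baseline. The sketch below assumes r is the Pearson correlation between reconstructed and true line-of-sight velocities and that the quoted gains are relative (r_model / r_baseline - 1); the paper's exact definitions may differ. The function names and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_gain(r_model, r_baseline):
    """Fractional improvement of r_model over r_baseline."""
    return r_model / r_baseline - 1.0


# Hypothetical velocities in km/s, for illustration only.
v_true = [120.0, -300.0, 45.0, 210.0, -80.0]
v_baseline = [60.0, -90.0, 100.0, 20.0, -150.0]
v_model = [100.0, -250.0, 10.0, 180.0, -120.0]

gain = relative_gain(pearson_r(v_model, v_true), pearson_r(v_baseline, v_true))
```

Under the stated assumption that the signal-to-noise ratio scales directly with r, the same fractional gain would carry over to the kSZ measurement.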
Original abstract
Precise measurement of the kinematic Sunyaev-Zel'dovich (kSZ) effect - a probe of the large-scale distribution of baryonic matter, a key observable for cosmological inference - requires accurate reconstruction of galaxy velocities from spectroscopic surveys. The signal-to-noise ratio (SNR) of kSZ measurements scales directly with the correlation coefficient $r$ between reconstructed and true velocities. We introduce Velocityformer, an equivariant graph transformer architecture designed to match the specific symmetry of the observational data. While the underlying physics is equivariant with respect to translations and rotations, observational effects break this symmetry due to the preferred line-of-sight direction. Matching the model's inductive bias to the data's broken symmetry consistently improves performance across all model sizes and training volumes, with Velocityformer improving $r$ by 35% over the standard linear theory baseline and outperforming ML baselines at every data volume. By matching the model's inductive bias to the data and conditioning on the physics-based long-wavelength solution, Velocityformer is highly data-efficient, training to high accuracy on as few as 4 low-fidelity simulations, and generalises zero-shot across input geometry, cosmological parameters, and galaxy sample. On high-fidelity simulated galaxy catalogues, this yields a 30% improvement in $r$ over the physical baseline, directly translating to the same SNR gain on observational data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Velocityformer, an equivariant graph transformer architecture for reconstructing galaxy velocities from spectroscopic surveys to improve kSZ effect measurements. The central claim is that matching the model's inductive bias to the broken rotational symmetry induced by the line-of-sight direction yields consistent performance gains, including a 35% improvement in the correlation coefficient r over the linear-theory baseline, data efficiency sufficient for training on only 4 low-fidelity simulations, and zero-shot generalization across input geometry, cosmological parameters, galaxy samples, and to high-fidelity catalogs (where a 30% r improvement over the physical baseline is reported).
Significance. If the reported gains and generalization hold, the work would meaningfully advance the SNR of kSZ-based probes of baryonic matter and cosmology by providing a symmetry-aware ML method that is unusually data-efficient. The explicit focus on matching broken symmetries and conditioning on long-wavelength physics is a constructive contribution to the growing literature on equivariant architectures in cosmology.
major comments (2)
- [Abstract and §4] Abstract and §4 (results): The headline 35% r improvement and 30% high-fidelity gain are stated without error bars, without the number of independent realizations used for the mean, and without an ablation that isolates the contribution of the broken-symmetry matching from other architectural choices. Because these numbers are load-bearing for the central claim, the absence of statistical characterization makes it impossible to judge whether the quoted gains survive modest changes in data cuts or simulation fidelity.
- [§5] §5 (generalization experiments): The zero-shot transfer from training on 4 low-fidelity simulations to high-fidelity catalogs is presented as a key strength, yet no quantitative measure of the domain gap (e.g., ratio of velocity power spectra, Wasserstein distance on pairwise velocity PDFs, or resolution-dependent dispersion statistics) is supplied. Without such diagnostics it remains possible that the inductive-bias matching is fitting low-fidelity artifacts rather than the true broken symmetry, which directly undermines the claim of applicability to real observations.
minor comments (2)
- [Methods] The definition of the correlation coefficient r and the precise baseline linear-theory estimator should be stated explicitly in the methods section rather than assumed from prior literature.
- [Figures] Figure captions for the performance-vs-data-volume plots should include the exact number of independent test realizations and the precise definition of the error bands shown.
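The statistical characterization requested in these comments can be as simple as a mean and a standard error over independent test realizations. The sketch below is a minimal illustration; the per-realization values are invented and the function name is hypothetical.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) over independent realizations."""
    if len(values) < 2:
        raise ValueError("need at least two realizations for an error bar")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical r values from five independent test realizations.
r_per_realization = [0.60, 0.62, 0.58, 0.61, 0.59]
mean_r, sem_r = mean_and_standard_error(r_per_realization)
print(f"r = {mean_r:.3f} +/- {sem_r:.3f} (N = {len(r_per_realization)})")
```

Reporting N alongside the error bar, as in the printed line, is what lets a reader judge whether a quoted gain is larger than the realization-to-realization scatter.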
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have helped us identify areas where additional statistical characterization and diagnostics will strengthen the manuscript. We address each major comment below and describe the revisions we plan to implement.
-
Referee: [Abstract and §4] Abstract and §4 (results): The headline 35% r improvement and 30% high-fidelity gain are stated without error bars, without the number of independent realizations used for the mean, and without an ablation that isolates the contribution of the broken-symmetry matching from other architectural choices. Because these numbers are load-bearing for the central claim, the absence of statistical characterization makes it impossible to judge whether the quoted gains survive modest changes in data cuts or simulation fidelity.
Authors: We agree that error bars, the number of independent realizations, and an ablation isolating the broken-symmetry matching are necessary to substantiate the reported gains. In the revised manuscript we will add error bars computed across the independent realizations used for each mean value, explicitly state the number of realizations, and include an ablation study comparing the full Velocityformer architecture against a variant that removes the line-of-sight-specific positional encodings and attention biases while retaining all other components. These changes will allow readers to assess the robustness of the 35% and 30% improvements.
Revision: yes
-
Referee: [§5] §5 (generalization experiments): The zero-shot transfer from training on 4 low-fidelity simulations to high-fidelity catalogs is presented as a key strength, yet no quantitative measure of the domain gap (e.g., ratio of velocity power spectra, Wasserstein distance on pairwise velocity PDFs, or resolution-dependent dispersion statistics) is supplied. Without such diagnostics it remains possible that the inductive-bias matching is fitting low-fidelity artifacts rather than the true broken symmetry, which directly undermines the claim of applicability to real observations.
Authors: We acknowledge that explicit quantification of the domain gap would further support the generalization claims. In the revised §5 we will add quantitative diagnostics including the ratio of velocity power spectra between the low- and high-fidelity simulations, Wasserstein distances between pairwise velocity PDFs, and resolution-dependent velocity dispersion statistics. These measures will help demonstrate that the model captures the underlying broken symmetry rather than low-fidelity artifacts. The observed 30% improvement on high-fidelity catalogs, which incorporate different resolution and additional physical effects, already indicates that the inductive bias is learning the relevant observational symmetry.
Revision: partial
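One of the domain-gap diagnostics named in this exchange, the Wasserstein distance between velocity distributions, is easy to sketch for one-dimensional samples. The version below assumes two samples of equal size, in which case the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values. The function name and the velocity values are hypothetical; for unequal sample sizes or weighted samples, `scipy.stats.wasserstein_distance` would be a natural choice.

```python
def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-size 1D samples."""
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size empirical distributions, optimal transport pairs
    # the i-th smallest value of one sample with the i-th of the other.
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Hypothetical pairwise velocities in km/s from two simulation suites.
low_fidelity = [-310.0, -120.0, -15.0, 90.0, 260.0]
high_fidelity = [-350.0, -140.0, -10.0, 110.0, 300.0]

gap = wasserstein_1d(low_fidelity, high_fidelity)
```

The result has the units of the input (here km/s), so it can be compared directly with the velocity dispersion to judge how large the gap between the two suites is.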
Circularity Check
No circularity: empirical ML results on independent simulation suites
The paper trains an equivariant graph transformer on a small number of low-fidelity simulations and reports correlation-coefficient improvements against an external linear-theory baseline and other ML models on held-out high-fidelity catalogs. Performance metrics are obtained by direct comparison on separate data volumes and geometries; no equation or self-citation reduces the quoted r gains or zero-shot claims to a fitted hyperparameter or definitional identity. The central claim remains falsifiable on external benchmarks and does not rely on load-bearing self-citations or ansatzes smuggled from prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The underlying physics is equivariant with respect to translations and rotations, but observational effects break this symmetry due to the preferred line-of-sight direction.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
"While the underlying gravitational physics is equivariant under the Euclidean group E(3) of translations and rotations, the observed data is not: the aforementioned RSDs induce a preferred direction along the LOS... we embed the LOS coordinate as a scalar feature"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
DESI DR2 Results II: Measurements of Baryon Acoustic Oscillations and Cosmological Constraints
Abdul Karim, M. et al. (Oct. 2025). “DESI DR2 results. II. Measurements of baryon acoustic oscillations and cosmological constraints”. In: Phys. Rev. D 112.8, 083515, p. 083515. eprint: 2503.14738. Adame, A. G. et al. (July 2025). “DESI 2024 VII: cosmological constraints from the full-shape modeling of clustering measurements”. In: JCAP 2025.7, 028, p
-
[2]
DESI 2024 VII: Cosmological Constraints from the Full-Shape Modeling of Clustering Measurements
eprint:2411.12022. ATLAS Collaboration (2026).Carpe Datum: Scaling behavior of transformers for heavy hadron flavor identification. Tech. rep. All figures including auxiliary figures are available at https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-SOFT-PUB-2026-002. Geneva: CERN. Brandstetter, J. et al. (2022). “Geometric and Physical Quantiti...
-
[3]
Curran Associates, Inc., pp. 35472–35496. eprint: 2305.18415. Chen, H. et al. (Aug. 2024). “Estimation of line-of-sight velocities of individual galaxies using neural networks - I. Modelling redshift-space distortions at large scales”. In: MNRAS 532.4, pp. 3947–3960. eprint:2312.03469. Chisari, N. E. et al. (2019). “Modelling baryonic feedback for survey ...
-
[4]
Modelling baryonic feedback for survey cosmology
eprint:1905.06082. Cohen, T. and M. Welling (2016). “Group Equivariant Convolutional Networks”. In: Proceedings of The 33rd International Conference on Machine Learning. Ed. by M. F. Balcan and K. Q. Weinberger. Vol
-
[5]
Group Equivariant Convolutional Networks
Proceedings of Machine Learning Research. New York, New York, USA: PMLR, pp. 2990–2999. eprint:1602.07576. Dai, B. and U. Seljak (Oct. 2022). “Translation and rotation equivariant normalizing flow (TRENF) for optimal cosmological analysis”. In: MNRAS 516.2, pp. 2363–2373. eprint:2202.05282. Dawson, K. S. et al. (Jan. 2013). “The Baryon Oscillation Spectro...
-
[6]
The Baryon Oscillation Spectroscopic Survey of SDSS-III
eprint:1208.0022. Defferrard, M. et al. (2020). “DeepSphere: a graph-based spherical CNN”. In: International Conference on Learning Representations. eprint:2012.15000. Dekel, A. et al. (July 1993). “IRAS Galaxies versus POTENT Mass: Density Fields, Biasing, and Omega”. In: ApJ 412, p
-
[7]
The AEMULUS Project. I. Numerical Simulations for Precision Cosmology
DeRose, J. et al. (2019). “The AEMULUS Project. I. Numerical Simulations for Precision Cosmology”. In: ApJ 875.1, 69, p
-
[8]
The Aemulus Project I: Numerical Simulations for Precision Cosmology
eprint:1804.05865. DESI Collaboration et al. (Oct. 2016). “The DESI Experiment Part I: Science, Targeting, and Survey Design”. In: arXiv e-prints, arXiv:1611.00036. eprint:1611.00036. DESI Collaboration et al. (May 2026). “Data Release 1 of the Dark Energy Spectroscopic Instrument”. In: AJ 171.5, 285, p
-
[9]
Data Release 1 of the Dark Energy Spectroscopic Instrument
eprint:2503.14745. Fuchs, F. et al. (2020). “SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks”. In: Advances in Neural Information Processing Systems. Ed. by H. Larochelle et al. Vol
-
[10]
Large-scale density and velocity field reconstructions with neural networks
Curran Associates, Inc., pp. 1970–1981. eprint:2006.10503. Ganeshaiah Veena, P. et al. (July 2023). “Large-scale density and velocity field reconstructions with neural networks”. In: MNRAS 522.4, pp. 5291–5307. eprint:2212.06439. Helly, J. C. et al. (Apr. 2026). “The FLAMINGO simulations data release”. In: arXiv e-prints. eprint: 2604.24324. Hoffmann, J...
-
[11]
Proceedings of Machine Learning Research. PMLR, pp. 2747–2755. eprint:1802.03690. Kugel, R. et al. (Dec. 2023). “FLAMINGO: calibrating large cosmological hydrodynamical simulations with machine learning”. In: MNRAS 526.4, pp. 6103–6127. eprint:2306.05492. Liao, Y.-L. and T. Smidt (2023). “Equiformer: Equivariant Graph Attention Transformer for 3D Atomi...
-
[12]
On the Prediction of Velocity Fields from Redshift Space Galaxy Samples
02828. Nusser, A. and M. Davis (Jan. 1994). “On the Prediction of Velocity Fields from Redshift Space Galaxy Samples”. In: ApJ 421, p. L1. eprint:astro-ph/9309009. Nusser, A. et al. (Sept. 1991). “Cosmological Velocity-Density Relation in the Quasi-linear Regime”. In: ApJ 379, p
-
[13]
Reducing SO(3) convolutions to SO(2) for efficient equivariant GNNs
Passaro, S. and C. L. Zitnick (2023). “Reducing SO(3) convolutions to SO(2) for efficient equivariant GNNs”. In:Proceedings of the 40th International Conference on Machine Learning. ICML’23. Honolulu, Hawaii, USA: JMLR.org. eprint:2302.03655. Perraudin, N. et al. (2019). “DeepSphere: Efficient spherical convolutional neural network with HEALPix sampling f...
-
[14]
eprint: 1909.05273. Wang, Y. and X. Yang (July 2024). “Peculiar Velocity Reconstruction from Simulations and Observations Using Deep Learning Algorithms”. In: ApJ 969.2, 76, p
-
[15]
eprint:2406.14101. Wu, Z. et al. (July 2023). “AI-assisted reconstruction of cosmic velocity field from redshift-space spatial distribution of haloes”. In: MNRAS 522.3, pp. 4748–4765. eprint:2301.04586. Xiao, X. et al. (Dec. 2025). “AI-powered Reconstruction of Dark Matter Velocity Fields from Redshift-space Halo Distribution”. In: ApJ 994.2, 204, p
-
[16]
eprint:2411.11280. Yahil, A. et al. (May 1991). “A Redshift Survey of IRAS Galaxies. II. Methods for Determining Self-consistent Velocity and Density Fields”. In: ApJ 372, p
- [17]
-
[18]
Curran Associates, Inc. eprint:1703.06114. Zaroubi, S. et al. (Aug. 1995). “Wiener Reconstruction of the Large-Scale Structure”. In: ApJ 449, p
-
[19]
Wiener Reconstruction of The Large Scale Structure
eprint:astro-ph/9410080. Zheng, Z. et al. (Oct. 2007). “Galaxy Evolution from Halo Occupation Distribution Modeling of DEEP2 and SDSS Galaxy Clustering”. In: ApJ 667.2, pp. 760–779. eprint:astro-ph/0703457. A Scientific background A.1 Spectroscopic galaxy surveys, redshift-space distortions, and symmetries The standard model of cosmology rests on thecosmo...
-
[20]
measure the angular positions of galaxies on the sky and their redshifts, from which comoving distances are inferred given a model of the cosmic expansion history. Peculiar velocities along the line of sight (LOS) induce a Doppler shift indistinguishable from the cosmological redshift, displacing the inferred position of each galaxy along the LOS by s=x+ ...
-
[21]
W(k, r_smooth). (6) Here, k = |k|, µ is the cosine of the LOS angle with respect to the mode k, µ = n̂·k/k, and W(k, r_smooth) is a Gaussian smoothing filter to suppress small-scale noise. Several higher-order correction methods to improve upon the linear reconstruction exist (see Kitaura et al., 2012, for an overview) but they tend to not lead to an impro...
-
[22]
optimiser, and 0.01 weight decay. Removing weight decay, adding learning rate warm-up, and using a cosine learning rate schedule were tested but found to have only negligible effects on our experiments, particularly for VELOCITYFORMER. For the four and 38 simulation boxes training regimes early stopping was employed. For the 3800 simulation boxes case tra...
-
[23]
D.3 GNN baseline The GNN baseline uses the implementation from Huang et al
By default, the model can attend to all galaxies in the input, but it can also work in graph mode by using the graph adjacency matrix as an attention mask. D.3 GNN baseline The GNN baseline uses the implementation from Huang et al. (2025). While that implementation already supports the velocity reconstruction task, we adapt it to accept the linear velocit...
-
[24]
Table 5: Hyperparameters of the broken-E(3) VELOCITYFORMER.
                 0.05M    0.2M     0.8M     3M       12M      48M      200M
Parameters       61.8K    228K     790K     2.94M    11.5M    44.5M    203M
batch_size       256      128      64       128      64       24       10
learning_rate    3·10^-3  3·10^-3  3·10^-3  3·10^-3  10^-3    10^-3    3·10^-4
num_layers       3        4        4        4        4        6        7
lmax_list        1        2        2        4        4        6        6
mmax_list        1        2        2        4        4        4        4
sphere_channels  16       16       32       32       64       64       128
attn_hidden_c...
-
[25]
Results are reported in the main text; this appendix describes the implementation
as a baseline. Results are reported in the main text; this appendix describes the implementation. The input point cloud, consisting of a varying number of galaxy positions and associated linear velocity estimates, is painted onto a 64^3 grid using cloud-in-cell (CIC) assignment, producing a four-channel grid (one density channel and three velocity channels...
-
[26]
The shift of the two cosmologies with respect to the fiducial set that was used for training is roughly consistent with the uncertainty on cosmology from DESI (Adame et al., 2025; Abdul Karim et al., 2025). VELOCITYFORMER can thus generalise across our uncertainty on the true cosmological parameters of the Universe. Table 12: Cosmological parameters for...