EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers
Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3
The pith
EquiformerV3 improves SE(3)-equivariant graph attention transformers with software speedups, SwiGLU-S² activations, and smooth-cutoff attention to reach state-of-the-art results on large atomistic benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EquiformerV3 achieves a 1.75× speedup through software optimizations, adds equivariant merged layer normalization and refined feedforward hyperparameters, introduces attention with a smooth radius cutoff, and proposes SwiGLU-S² activations that incorporate many-body interactions while preserving strict SE(3)-equivariance and reducing the complexity of S² grid sampling. Together, the SwiGLU-S² activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces, supporting energy-conserving simulations and higher-order derivatives. Trained with the auxiliary task of denoising non-equilibrium structures (DeNS), the model delivers state-of-the-art results on OC20, OMat24, and Matbench Discovery.
What carries the argument
SwiGLU-S² activations paired with smooth radius cutoff attention, which add many-body interactions for greater expressivity, preserve strict SE(3)-equivariance, reduce S² grid sampling complexity, and support smooth potential energy surface modeling.
If this is right
- The model supports energy-conserving simulations and computation of higher-order derivatives of potential energy surfaces.
- Software optimizations deliver a measured 1.75 times speedup without changing the underlying architecture.
- Training with denoising non-equilibrium structures as an auxiliary task produces state-of-the-art performance on OC20, OMat24, and Matbench Discovery.
- Smooth radius cutoff attention and SwiGLU-S² activations together reduce the risk of discontinuities in learned energy landscapes.
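The discontinuity risk named in the last point can be illustrated with a toy envelope. The sketch below compares a hard radius cutoff with a cosine envelope. The cosine form is an assumed, commonly used choice and not necessarily the function EquiformerV3 uses; `hard_cutoff`, `cosine_cutoff`, `jump_at_cutoff`, and the radius value are hypothetical names and numbers chosen for illustration.

```python
import math


def hard_cutoff(r: float, r_cut: float) -> float:
    """Weight 1 inside the radius and 0 outside: jumps at r == r_cut."""
    return 1.0 if r < r_cut else 0.0


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Assumed smooth envelope: decays to 0 with zero slope at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def jump_at_cutoff(envelope, r_cut: float, eps: float = 1e-6) -> float:
    """Change in the envelope between just inside and just outside r_cut."""
    return abs(envelope(r_cut - eps, r_cut) - envelope(r_cut + eps, r_cut))


R_CUT = 6.0  # hypothetical cutoff radius in angstroms
hard_jump = jump_at_cutoff(hard_cutoff, R_CUT)
smooth_jump = jump_at_cutoff(cosine_cutoff, R_CUT)
print(f"hard: {hard_jump:.3e}, smooth: {smooth_jump:.3e}")
```

With the hard cutoff, a neighbor crossing the radius changes its weight by a full unit, which would show up as a step in any quantity built from it. With the smooth envelope the contribution fades to zero before the neighbor leaves the graph.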
Where Pith is reading between the lines
- The smoother energy surfaces could improve stability when the model is used inside long molecular-dynamics runs.
- Faster inference may make on-the-fly property prediction feasible for larger unit cells or ensembles of structures.
- The same architectural pieces could be applied to other symmetry-preserving tasks such as spin systems or crystal structure generation.
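The link between conservative forces and stable long runs can be seen in a toy system. The sketch below is not the paper's model: it integrates a unit-mass particle in a 2D harmonic well with velocity Verlet, once with the exact negative gradient and once with a small rotational term added that is not the gradient of any energy. All function names and the values of `eps`, `dt`, and `steps` are assumptions made for illustration.

```python
def total_energy(pos, vel):
    """Kinetic plus harmonic potential energy for a unit-mass particle in 2D."""
    kinetic = 0.5 * (vel[0] ** 2 + vel[1] ** 2)
    potential = 0.5 * (pos[0] ** 2 + pos[1] ** 2)
    return kinetic + potential


def conservative_force(pos):
    """Exact negative gradient of the harmonic potential."""
    return (-pos[0], -pos[1])


def nonconservative_force(pos, eps=0.01):
    """Same force plus a small rotational term that is not a gradient."""
    return (-pos[0] - eps * pos[1], -pos[1] + eps * pos[0])


def energy_drift(force_fn, steps=10_000, dt=0.01):
    """Run velocity Verlet and return |E_final - E_initial|."""
    pos, vel = (1.0, 0.0), (0.0, 1.0)
    start = total_energy(pos, vel)
    f = force_fn(pos)
    for _ in range(steps):
        # Half-step velocity update, full-step position update.
        half = (vel[0] + 0.5 * dt * f[0], vel[1] + 0.5 * dt * f[1])
        pos = (pos[0] + dt * half[0], pos[1] + dt * half[1])
        # Recompute the force at the new position and finish the velocity step.
        f = force_fn(pos)
        vel = (half[0] + 0.5 * dt * f[0], half[1] + 0.5 * dt * f[1])
    return abs(total_energy(pos, vel) - start)


conservative_drift = energy_drift(conservative_force)
nonconservative_drift = energy_drift(nonconservative_force)
print(conservative_drift, nonconservative_drift)
```

In this toy setting the gradient force keeps the energy error small and bounded, while the rotational term pumps energy into the system steadily. A learned force field whose forces are not the exact derivative of its predicted energy can behave like the second case.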
Load-bearing premise
The new activations and smooth-cutoff attention preserve strict SE(3)-equivariance while enabling accurate modeling of smoothly varying potential energy surfaces without physical inconsistencies or reduced expressivity.
What would settle it
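Two direct checks would settle it. First, rotate an input structure by an arbitrary rotation and confirm that predicted energies are unchanged and predicted forces rotate by exactly the same rotation; any deviation beyond numerical precision would contradict the strict-equivariance claim. Second, run molecular dynamics with the model: a systematic drift in total energy would show that the predicted forces are not conservative.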
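A rotation check of this kind can be sketched with a stand-in model. In the code below, `toy_energy` is a hypothetical placeholder for a trained network (it depends only on interatomic distances, so it is invariant by construction), and forces are taken by finite differences rather than automatic differentiation. The helper names, step size `h`, and the random seed are assumptions.

```python
import math
import random


def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation by `angle` about a unit `axis`."""
    x, y, z = axis
    c, s, t = math.cos(angle), math.sin(angle), 1.0 - math.cos(angle)
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def rotate(rot, vec):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(rot[i][j] * vec[j] for j in range(3)) for i in range(3)]


def toy_energy(positions):
    """Stand-in model: sum of Gaussian pair terms of interatomic distance."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            total += math.exp(-math.dist(positions[i], positions[j]) ** 2)
    return total


def forces(energy_fn, positions, h=1e-5):
    """Forces as the negative central-difference gradient of the energy."""
    result = []
    for i in range(len(positions)):
        f = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            f.append(-(energy_fn(plus) - energy_fn(minus)) / (2 * h))
        result.append(f)
    return result


random.seed(0)
atoms = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
axis = [random.gauss(0, 1) for _ in range(3)]
norm = math.sqrt(sum(a * a for a in axis))
rot = rotation_matrix([a / norm for a in axis], random.uniform(0, 2 * math.pi))

rotated_atoms = [rotate(rot, p) for p in atoms]
# Energies should match; forces on rotated atoms should equal rotated forces.
energy_error = abs(toy_energy(rotated_atoms) - toy_energy(atoms))
force_error = max(
    abs(a - b)
    for f_rot, f in zip(forces(toy_energy, rotated_atoms), forces(toy_energy, atoms))
    for a, b in zip(f_rot, rotate(rot, f))
)
print(energy_error, force_error)
```

For a real model the same comparison would be repeated over many random rotations and structures, with the tolerance set by the floating-point precision in use.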
Original abstract
As $SE(3)$-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consistency has become a central challenge for large-scale applications. In this work, we introduce EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, designed to advance all three dimensions: efficiency, expressivity, and generality. Building on EquiformerV2, we have the following three key advances. First, we optimize the software implementation, achieving $1.75\times$ speedup. Second, we introduce simple and effective modifications to EquiformerV2, including equivariant merged layer normalization, improved feedforward network hyper-parameters, and attention with smooth radius cutoff. Third, we propose SwiGLU-$S^2$ activations to incorporate many-body interactions for better theoretical expressivity and to preserve strict equivariance while reducing the complexity of sampling $S^2$ grids. Together, SwiGLU-$S^2$ activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces (PES), generalizing EquiformerV3 to tasks requiring energy-conserving simulations and higher-order derivatives of PES. With these improvements, EquiformerV3 trained with the auxiliary task of denoising non-equilibrium structures (DeNS) achieves state-of-the-art results on OC20, OMat24, and Matbench Discovery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EquiformerV3, the third generation of SE(3)-equivariant graph attention Transformers for 3D atomistic modeling. Building on EquiformerV2, it reports three advances: (1) a 1.75× speedup from an optimized software implementation; (2) architectural modifications, namely equivariant merged layer normalization, improved FFN hyperparameters, and attention with smooth radius cutoff; and (3) new SwiGLU-S² activations that incorporate many-body interactions while preserving strict equivariance and reducing S² sampling complexity. These changes, combined with DeNS auxiliary training, are claimed to enable accurate modeling of smoothly varying potential energy surfaces and to deliver state-of-the-art results on the OC20, OMat24, and Matbench Discovery benchmarks.
Significance. If the empirical claims hold under detailed scrutiny, the work would advance practical scaling of equivariant Transformers for large-scale materials and molecular simulations by improving efficiency and the fidelity of PES modeling without sacrificing symmetry constraints. The focus on enabling energy-conserving simulations and higher-order derivatives addresses a key need in the field.
major comments (2)
- [Results section] The central claim of state-of-the-art performance on OC20, OMat24, and Matbench Discovery is load-bearing for the paper's contribution, yet the provided description lacks full baseline comparison tables, ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged LN, etc.), and error bars or run statistics. Without these, the attribution of gains to the proposed changes cannot be rigorously assessed.
- [Methods section on SwiGLU-S² activations] The claim that SwiGLU-S² preserves strict SE(3)-equivariance while incorporating many-body interactions and reducing S² grid sampling complexity is central to the expressivity and generality arguments. The text does not provide explicit equations or a verification argument showing how the activation is applied to tensor features without breaking equivariance or introducing parameter-dependent artifacts beyond the listed free parameters (FFN hyperparameters and S² sampling).
minor comments (1)
- [Abstract] The statement that the modifications 'enable accurate modeling of smoothly varying potential energy surfaces' would benefit from a brief parenthetical reference to the specific metrics (e.g., energy MAE or force MAE) used to support this on the cited benchmarks.
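For readers unfamiliar with the two metrics named in this comment, a minimal sketch follows. The averaging convention (per structure for energy, per force component for forces) is an assumption, since benchmarks differ on details such as per-atom normalization, and the numbers are made-up toy values, not results from any benchmark.

```python
def energy_mae(predicted, reference):
    """Mean absolute error over per-structure energies (e.g. in eV)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def force_mae(predicted, reference):
    """Mean absolute error over all force components (e.g. in eV/angstrom)."""
    errors = [
        abs(p - r)
        for atom_pred, atom_ref in zip(predicted, reference)
        for p, r in zip(atom_pred, atom_ref)
    ]
    return sum(errors) / len(errors)


# Made-up toy values for two structures and two atoms; purely illustrative.
toy_energy_mae = energy_mae([-1.0, -2.5], [-1.5, -2.0])
toy_force_mae = force_mae(
    [[0.0, 0.5, -0.5], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]],
)
print(toy_energy_mae, toy_force_mae)
```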
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.
- Referee: [Results section] The central claim of state-of-the-art performance on OC20, OMat24, and Matbench Discovery is load-bearing for the paper's contribution, yet the provided description lacks full baseline comparison tables, ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged LN, etc.), and error bars or run statistics. Without these, the attribution of gains to the proposed changes cannot be rigorously assessed.
  Authors: We agree that expanded empirical support is valuable for rigorously attributing gains. The manuscript already contains baseline comparisons to prior Equiformer versions and other SOTA methods on the three benchmarks, along with some component ablations. However, to strengthen the claims, we will add full expanded baseline tables, comprehensive ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged layer normalization, improved FFN hyperparameters, and the 1.75× implementation speedup), and error bars/statistics from multiple independent runs with different random seeds in the revised results section.
  Revision: yes
- Referee: [Methods section on SwiGLU-S² activations] The claim that SwiGLU-S² preserves strict SE(3)-equivariance while incorporating many-body interactions and reducing S² grid sampling complexity is central to the expressivity and generality arguments. The text does not provide explicit equations or a verification argument showing how the activation is applied to tensor features without breaking equivariance or introducing parameter-dependent artifacts beyond the listed free parameters (FFN hyperparameters and S² sampling).
  Authors: We acknowledge the need for more formal detail here. The current text describes the motivation and high-level properties of SwiGLU-S², but we will revise the methods section to include the explicit equations for applying the activation to the tensor features (operating on the irreducible representations), along with a verification argument demonstrating preservation of strict SE(3)-equivariance. This will also clarify the incorporation of many-body interactions, the reduction in S² sampling complexity, and the limited parameter dependencies (FFN hyperparameters and S² sampling points) without introducing additional artifacts.
  Revision: yes
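For orientation, the property that such a verification argument would need to establish can be written in its standard form. The notation below is generic and assumed here, not taken from the manuscript: $f$ stands for the activation, $x$ for a feature made of irreducible-representation blocks up to degree $L_{\max}$, and $D^{(\ell)}(R)$ for the Wigner D-matrix of degree $\ell$. Translation invariance is usually obtained separately by building features from relative positions.

```latex
% Standard statement of rotation equivariance for irreps features.
% Notation is assumed for illustration, not the paper's own.
f\bigl(D(R)\,x\bigr) = D(R)\,f(x)
\quad \text{for all } R \in SO(3),
\qquad
D(R) = \bigoplus_{\ell=0}^{L_{\max}} D^{(\ell)}(R).
```

An argument for strict equivariance would show that this identity holds exactly for the activation as implemented, including any finite S² grid it uses, rather than only in the limit of dense sampling.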
Circularity Check
No significant circularity
The paper proposes concrete architectural and implementation changes to EquiformerV2 (SwiGLU-S² activations, smooth-cutoff attention, merged layer norm, FFN hyper-parameter tweaks, 1.75× software speedup) plus an auxiliary DeNS training task, then reports empirical SOTA results on OC20, OMat24, and Matbench Discovery. No equations, uniqueness theorems, or predictions are shown that reduce by construction to the paper's own fitted inputs or self-citations; the central claims rest on external benchmark comparisons and stated preservation of SE(3)-equivariance, which are independently testable. This is the normal non-circular case for an empirical architecture paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- FFN hyperparameters
- S² sampling parameters
axioms (1)
- Domain assumption: all layers must preserve strict SE(3)-equivariance for physical consistency.
invented entities (1)
- SwiGLU-S² activations: no independent evidence
Forward citations
Cited by 1 Pith paper
- Fast contracted Clebsch–Gordan tensor products for equivariant graph neural networks: An O(L^3) algorithm computes contracted Clebsch–Gordan tensor products for equivariant ML potentials using a structured angular grid and spherical Poisson bracket to handle parity-odd terms at fixed CP rank.