arXiv: 2604.09130 · v1 · submitted 2026-04-10 · cs.LG · cs.AI · physics.comp-ph

EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers

Yi-Lun Liao, Alexander J. Hoffman, Sabrina C. Shen, Alexandre Duval, Sam Walton Norwood, Tess Smidt

Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3

classification: cs.LG · cs.AI · physics.comp-ph

keywords: SE(3)-equivariant graph neural networks · graph attention transformers · atomistic modeling · potential energy surfaces · OC20 benchmark · materials discovery · equivariant activations

The pith

EquiformerV3 improves SE(3)-equivariant graph attention transformers with software speedups, SwiGLU-S² activations, and smooth-cutoff attention to reach state-of-the-art results on large atomistic benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EquiformerV3 as the next iteration of SE(3)-equivariant graph attention transformers for 3D atomic systems. It combines code-level optimizations that yield a 1.75× speedup, architectural changes such as merged layer normalization and retuned feedforward hyperparameters, and new components that add many-body expressivity while keeping exact symmetry. These changes allow the model to represent smoothly varying potential energy surfaces more accurately. When the model is trained with an auxiliary denoising task on non-equilibrium structures, it sets new state-of-the-art results on established benchmarks for catalysis and materials discovery.

Core claim

EquiformerV3 achieves a 1.75× speedup through software improvements, adds equivariant merged layer normalization and refined feedforward hyperparameters, introduces attention with a smooth radius cutoff, and proposes SwiGLU-S² activations that incorporate many-body interactions while preserving strict SE(3)-equivariance and simplifying S² grid sampling. Together these changes enable accurate modeling of potential energy surfaces, support energy-conserving simulations and higher-order derivatives, and deliver state-of-the-art results on OC20, OMat24, and Matbench Discovery when combined with training on the auxiliary task of denoising non-equilibrium structures (DeNS).
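For context, the standard SwiGLU gated unit that the name SwiGLU-S² builds on can be written as below. This is the generic form from the Transformer literature, not the paper's S² variant, whose exact equations are not reproduced on this page; the symbols x, W, and V are illustrative.

```latex
\mathrm{Swish}(z) = z\,\sigma(z), \qquad
\mathrm{SwiGLU}(x) = \mathrm{Swish}(xW) \odot (xV)
```

The elementwise product of two learned projections is what makes the unit multiplicative. According to the paper's Figure 3 caption, SwiGLU-S² applies both the nonlinearity and the multiplication to grid features on S².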

What carries the argument

SwiGLU-S² activations paired with smooth radius cutoff attention, which add many-body interactions for greater expressivity, preserve strict SE(3)-equivariance, reduce S² grid sampling complexity, and support smooth potential energy surface modeling.
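A minimal sketch of why a smooth radial cutoff matters, assuming a cosine envelope. The cosine form is a common choice in atomistic models and is an assumption here; this page does not state which envelope EquiformerV3 uses. The function names `hard_cutoff` and `cosine_cutoff` are hypothetical.

```python
import numpy as np

def hard_cutoff(r, r_cut):
    """Weight 1 inside the cutoff radius, 0 outside: discontinuous at r_cut."""
    return (np.asarray(r, dtype=float) < r_cut).astype(float)

def cosine_cutoff(r, r_cut):
    """Assumed envelope: 0.5 * (cos(pi * r / r_cut) + 1) for r < r_cut, else 0.
    Both the value and the first derivative go to zero at r_cut."""
    r = np.asarray(r, dtype=float)
    inside = 0.5 * (np.cos(np.pi * r / r_cut) + 1.0)
    return np.where(r < r_cut, inside, 0.0)

r_cut = 5.0
eps = 1e-6
just_inside = np.array([r_cut - eps])
just_outside = np.array([r_cut + eps])

# Change in neighbor weight as an atom crosses the cutoff radius.
hard_jump = abs(hard_cutoff(just_inside, r_cut)[0] - hard_cutoff(just_outside, r_cut)[0])
smooth_jump = abs(cosine_cutoff(just_inside, r_cut)[0] - cosine_cutoff(just_outside, r_cut)[0])

print("hard cutoff jump:  ", hard_jump)
print("smooth cutoff jump:", smooth_jump)
```

With the hard cutoff, a neighbor's contribution switches on or off abruptly as it crosses the radius, which would show up as a jump in the predicted energy. Scaling each neighbor's contribution by a smooth envelope removes that jump.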

If this is right

The model supports energy-conserving simulations and computation of higher-order derivatives of potential energy surfaces.
Software optimizations deliver a measured 1.75 times speedup without changing the underlying architecture.
Training with the auxiliary task of denoising non-equilibrium structures (DeNS) produces state-of-the-art performance on OC20, OMat24, and Matbench Discovery.
Smooth radius cutoff attention and SwiGLU-S² activations together reduce the risk of discontinuities in learned energy landscapes.
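The energy-conserving property mentioned in this list amounts to forces being the negative gradient of a single scalar energy, F = -∂E/∂r. The sketch below illustrates this with a hypothetical toy pair energy standing in for a learned potential energy surface and central finite differences standing in for automatic differentiation; none of it is taken from the paper's code.

```python
import numpy as np

def toy_energy(pos):
    """Hypothetical smooth pair energy: sum over atom pairs of exp(-d^2)."""
    diff = pos[:, None, :] - pos[None, :, :]
    d2 = (diff ** 2).sum(-1)
    iu = np.triu_indices(len(pos), k=1)   # count each pair once
    return np.exp(-d2[iu]).sum()

def forces_from_energy(energy_fn, pos, h=1e-5):
    """Forces as the negative gradient of the energy, via central differences."""
    forces = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for k in range(pos.shape[1]):
            step = np.zeros_like(pos)
            step[i, k] = h
            forces[i, k] = -(energy_fn(pos + step) - energy_fn(pos - step)) / (2 * h)
    return forces

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 3))             # 4 atoms in 3D
forces = forces_from_energy(toy_energy, pos)

# The energy depends only on relative positions, so the net force vanishes.
net_force = forces.sum(axis=0)
print("net force:", net_force)
```

A model that predicts forces with a separate output head has no such guarantee, which is why a smooth, differentiable energy is a prerequisite for this kind of simulation.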

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The smoother energy surfaces could improve stability when the model is used inside long molecular-dynamics runs.
Faster inference may make on-the-fly property prediction feasible for larger unit cells or ensembles of structures.
The same architectural pieces could be applied to other symmetry-preserving tasks such as spin systems or crystal structure generation.

Load-bearing premise

The new activations and smooth-cutoff attention preserve strict SE(3)-equivariance while enabling accurate modeling of smoothly varying potential energy surfaces without physical inconsistencies or reduced expressivity.

What would settle it

A direct check that rotating an input structure by an arbitrary rotation leaves the predicted energy unchanged and rotates the predicted forces by exactly the same rotation would support the equivariance claim. Conversely, a demonstration that the energy predictions yield non-conservative forces in molecular dynamics trajectories would undercut the claim of physical consistency.
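The rotation check described above can be sketched numerically as follows. The `toy_model` here is a hypothetical stand-in that returns an invariant energy and equivariant per-atom forces; to test a real model, one would swap in its prediction function. The helper name `random_rotation` is also an assumption.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation (det = +1) from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))       # fix the sign ambiguity of each column
    if np.linalg.det(q) < 0:          # flip one axis to exclude reflections
        q[:, 0] = -q[:, 0]
    return q

def toy_model(pos):
    """Hypothetical stand-in model: pair energy exp(-d^2) and its exact forces."""
    diff = pos[:, None, :] - pos[None, :, :]      # diff[i, j] = r_i - r_j
    w = np.exp(-(diff ** 2).sum(-1))
    np.fill_diagonal(w, 0.0)                      # no self-interaction
    energy = 0.5 * w.sum()                        # each pair counted once
    forces = 2.0 * (w[:, :, None] * diff).sum(axis=1)   # F_i = -dE/dr_i
    return energy, forces

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
R = random_rotation(rng)

energy, forces = toy_model(pos)
energy_rot, forces_rot = toy_model(pos @ R.T)     # rotate every atom by R

energy_error = abs(energy_rot - energy)                # invariance of the scalar
force_error = np.abs(forces_rot - forces @ R.T).max()  # equivariance of the vectors
print("energy error:", energy_error)
print("force error: ", force_error)
```

For a strictly equivariant model both errors should sit at floating-point round-off for any rotation, not merely be small on average.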

Figures

Figures reproduced from arXiv: 2604.09130 by Alexander J. Hoffman, Alexandre Duval, Sabrina C. Shen, Sam Walton Norwood, Tess Smidt, Yi-Lun Liao.

**Figure 1.** EquiformerV3 architecture. The proposed improvements are highlighted in red. We encode input 3D atomistic graphs with atom and edge-degree embeddings and process them with Transformer blocks, which consist of equivariant merged layer normalization (LN), equivariant graph attention, and feedforward networks.

**Figure 2.** Illustration of how different normalizations calculate statistics. In the proposed equivariant merged layer normalization, we share the merged RMS across all degrees, as highlighted in red.
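A minimal sketch of the difference between normalizing each degree independently and sharing one merged RMS across degrees, as the Figure 2 caption describes. The exact statistic used in the paper is not given on this page; here the merged RMS is assumed to be taken over all components of all degrees, and the function names are hypothetical.

```python
import numpy as np

def rms(x):
    """Root mean square over every entry of x."""
    return np.sqrt(np.mean(x ** 2))

def per_degree_norm(feats, eps=1e-12):
    """Each degree is divided by its own RMS (statistics computed independently)."""
    return [x / (rms(x) + eps) for x in feats]

def merged_norm(feats, eps=1e-12):
    """Assumed form: one RMS over all components of all degrees, shared by every degree."""
    merged = rms(np.concatenate([x.ravel() for x in feats]))
    return [x / (merged + eps) for x in feats]

rng = np.random.default_rng(0)
channels = 8
# Degree l has 2l + 1 components per channel; degree 1 is deliberately made 10x larger.
feats = [
    1.0 * rng.normal(size=(channels, 1)),    # l = 0
    10.0 * rng.normal(size=(channels, 3)),   # l = 1
    0.1 * rng.normal(size=(channels, 5)),    # l = 2
]

ratio_before = rms(feats[1]) / rms(feats[0])
sep = per_degree_norm(feats)
mer = merged_norm(feats)
ratio_per_degree = rms(sep[1]) / rms(sep[0])   # collapses to about 1
ratio_merged = rms(mer[1]) / rms(mer[0])       # unchanged from ratio_before

print(ratio_before, ratio_per_degree, ratio_merged)
```

Dividing every degree by the same scalar keeps the relative magnitudes between degrees, whereas independent normalization forces them all to the same scale.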

**Figure 3.** Illustration of different activation functions. The colorful circles are grid features. One circle represents one channel, which contains R_ϕ × R_θ grid points on S². In the proposed SwiGLU-S² activation, we apply both nonlinearity and multiplication to grid features, as highlighted in red.

Original abstract

As $SE(3)$-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consistency has become a central challenge for large-scale applications. In this work, we introduce EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, designed to advance all three dimensions: efficiency, expressivity, and generality. Building on EquiformerV2, we have the following three key advances. First, we optimize the software implementation, achieving $1.75\times$ speedup. Second, we introduce simple and effective modifications to EquiformerV2, including equivariant merged layer normalization, improved feedforward network hyper-parameters, and attention with smooth radius cutoff. Third, we propose SwiGLU-$S^2$ activations to incorporate many-body interactions for better theoretical expressivity and to preserve strict equivariance while reducing the complexity of sampling $S^2$ grids. Together, SwiGLU-$S^2$ activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces (PES), generalizing EquiformerV3 to tasks requiring energy-conserving simulations and higher-order derivatives of PES. With these improvements, EquiformerV3 trained with the auxiliary task of denoising non-equilibrium structures (DeNS) achieves state-of-the-art results on OC20, OMat24, and Matbench Discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

EquiformerV3 adds concrete tweaks like SwiGLU-S² activations and smooth-cutoff attention to EquiformerV2 that deliver reported SOTA on OC20 and related benchmarks while targeting efficiency and PES smoothness.

EquiformerV3 refines the prior EquiformerV2 model with a handful of targeted changes that focus on speed, expressivity, and physical consistency for 3D atomic systems. The main pieces are a 1.75x software optimization, equivariant merged layer norm, adjusted feedforward hyperparameters, attention with smooth radius cutoff, and the new SwiGLU-S² activation that aims to capture many-body effects more cleanly while keeping strict SE(3)-equivariance and easing S² sampling. Combined with DeNS auxiliary training, these yield the claimed state-of-the-art numbers on OC20, OMat24, and Matbench Discovery. That is the core advance: practical engineering steps that address efficiency and smoothness gaps without reinventing the architecture from scratch.

The paper does a solid job explaining why these modifications matter for large-scale simulations where you need accurate higher-order derivatives of the potential energy surface. The smooth cutoff and SwiGLU variant look like reasonable responses to known limitations in earlier equivariant transformers, and the emphasis on preserving equivariance while improving PES modeling is consistent with the goals.

The soft spots are mainly in the supporting evidence. The abstract states the SOTA results and the rationale for each change, but the level of detail on ablations, error bars, and component-wise contributions is light, so it is not yet clear how much each piece drives the gains or whether the equivariance holds without hidden inconsistencies in practice. Full tables and code would help pin that down.

This work is aimed at groups already using or extending equivariant graph transformers for materials and chemistry modeling. Readers who care about deployable speedups and better handling of smooth energy landscapes will get direct value from the reported numbers and the implementation notes.
It deserves a serious referee because the benchmarks are standard, the modifications are specific and testable, and the empirical claims are strong enough to warrant checking the details rather than a desk rejection.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces EquiformerV3, the third generation of SE(3)-equivariant graph attention Transformers for 3D atomistic modeling. Building on EquiformerV2, it reports three advances: a 1.75× software implementation speedup, modifications including equivariant merged layer normalization, improved FFN hyperparameters, and attention with smooth radius cutoff, plus the new SwiGLU-S² activations to incorporate many-body interactions while preserving strict equivariance and reducing S² sampling complexity. These changes, combined with DeNS auxiliary training, are claimed to enable accurate modeling of smoothly varying potential energy surfaces and to deliver state-of-the-art results on the OC20, OMat24, and Matbench Discovery benchmarks.

Significance. If the empirical claims hold under detailed scrutiny, the work would advance practical scaling of equivariant Transformers for large-scale materials and molecular simulations by improving efficiency and the fidelity of PES modeling without sacrificing symmetry constraints. The focus on enabling energy-conserving simulations and higher-order derivatives addresses a key need in the field.

major comments (2)

[Results section] The central claim of state-of-the-art performance on OC20, OMat24, and Matbench Discovery is load-bearing for the paper's contribution, yet the provided description lacks full baseline comparison tables, ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged LN, etc.), and error bars or run statistics. Without these, the attribution of gains to the proposed changes cannot be rigorously assessed.

[Methods section on SwiGLU-S² activations] The claim that SwiGLU-S² preserves strict SE(3)-equivariance while incorporating many-body interactions and reducing S² grid sampling complexity is central to the expressivity and generality arguments. The text does not provide explicit equations or a verification argument showing how the activation is applied to tensor features without breaking equivariance or introducing parameter-dependent artifacts beyond the listed free parameters (FFN hyperparameters and S² sampling).

minor comments (1)

[Abstract] The statement that the modifications 'enable accurate modeling of smoothly varying potential energy surfaces' would benefit from a brief parenthetical reference to the specific metrics (e.g., energy MAE or force MAE) used to support this on the cited benchmarks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.

Referee: [Results section] The central claim of state-of-the-art performance on OC20, OMat24, and Matbench Discovery is load-bearing for the paper's contribution, yet the provided description lacks full baseline comparison tables, ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged LN, etc.), and error bars or run statistics. Without these, the attribution of gains to the proposed changes cannot be rigorously assessed.

Authors: We agree that expanded empirical support is valuable for rigorously attributing gains. The manuscript already contains baseline comparisons to prior Equiformer versions and other SOTA methods on the three benchmarks, along with some component ablations. However, to strengthen the claims, we will add full expanded baseline tables, comprehensive ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged layer normalization, improved FFN hyperparameters, and the 1.75× implementation speedup), and error bars/statistics from multiple independent runs with different random seeds in the revised results section.
revision: yes

Referee: [Methods section on SwiGLU-S² activations] The claim that SwiGLU-S² preserves strict SE(3)-equivariance while incorporating many-body interactions and reducing S² grid sampling complexity is central to the expressivity and generality arguments. The text does not provide explicit equations or a verification argument showing how the activation is applied to tensor features without breaking equivariance or introducing parameter-dependent artifacts beyond the listed free parameters (FFN hyperparameters and S² sampling).

Authors: We acknowledge the need for more formal detail here. The current text describes the motivation and high-level properties of SwiGLU-S², but we will revise the methods section to include the explicit equations for applying the activation to the tensor features (operating on the irreducible representations), along with a verification argument demonstrating preservation of strict SE(3)-equivariance. This will also clarify the incorporation of many-body interactions, the reduction in S² sampling complexity, and the limited parameter dependencies (FFN hyperparameters and S² sampling points) without introducing additional artifacts.
revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper proposes concrete architectural and implementation changes to EquiformerV2 (SwiGLU-S² activations, smooth-cutoff attention, merged layer norm, FFN hyper-parameter tweaks, 1.75× software speedup) plus an auxiliary DeNS training task, then reports empirical SOTA results on OC20, OMat24, and Matbench Discovery. No equations, uniqueness theorems, or predictions are shown that reduce by construction to the paper's own fitted inputs or self-citations; the central claims rest on external benchmark comparisons and stated preservation of SE(3)-equivariance, which are independently testable. This is the normal non-circular case for an empirical architecture paper.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The work relies on standard assumptions of SE(3) symmetry in atomistic systems and typical ML hyperparameter tuning; the new activation is introduced without external validation beyond benchmark performance.

free parameters (2)

FFN hyperparameters: Improved feedforward network hyper-parameters are introduced and selected to optimize performance.
S² sampling parameters: Parameters controlling spherical grid sampling in the new activations are adjusted to reduce complexity.

axioms (1)

Domain assumption: All layers must preserve strict SE(3) equivariance for physical consistency.
Invoked when claiming the new activations and attention preserve equivariance while adding expressivity.

invented entities (1)

SwiGLU-S² activations (no independent evidence)
Purpose: Incorporate many-body interactions while preserving strict equivariance and lowering S² sampling cost.
New activation function family proposed to address expressivity limits of prior equivariant layers.

pith-pipeline@v0.9.0 · 5593 in / 1441 out tokens · 64076 ms · 2026-05-10T16:53:55.459578+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fast contracted Clebsch-Gordan tensor products for equivariant graph neural networks
physics.comp-ph · 2026-05 · unverdicted · novelty 7.0

An O(L^3) algorithm computes contracted Clebsch-Gordan tensor products for equivariant ML potentials using a structured angular grid and spherical Poisson bracket to handle parity-odd terms at fixed CP rank.
