pith. machine review for the scientific record.

arxiv: 2604.09130 · v1 · submitted 2026-04-10 · 💻 cs.LG · cs.AI · physics.comp-ph

Recognition: unknown

EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · physics.comp-ph
keywords SE(3)-equivariant graph neural networks · graph attention transformers · atomistic modeling · potential energy surfaces · OC20 benchmark · materials discovery · equivariant activations

The pith

EquiformerV3 improves SE(3)-equivariant graph attention transformers with software speedups, SwiGLU-S² activations, and smooth-cutoff attention to reach state-of-the-art results on large atomistic benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EquiformerV3 as the next iteration of SE(3)-equivariant graph attention transformers for 3D atomic systems. It combines code-level optimizations for faster training, architectural tweaks like merged layer normalization and improved feedforward networks, and new components that add many-body expressivity while keeping exact symmetry. These changes allow the model to represent smoothly varying potential energy surfaces more accurately. When trained with an auxiliary denoising task on non-equilibrium structures, the result is new performance highs on established benchmarks for catalysis and materials discovery.

Core claim

EquiformerV3 achieves a 1.75× speedup through software improvements, adds equivariant merged layer normalization and refined feedforward hyperparameters, introduces attention with a smooth radius cutoff, and proposes SwiGLU-S² activations that incorporate many-body interactions while preserving strict SE(3)-equivariance and simplifying S² grid sampling. Together these enable accurate modeling of potential energy surfaces, supporting energy-conserving simulations and higher-order derivatives, and deliver state-of-the-art results on OC20, OMat24, and Matbench Discovery when combined with training on the auxiliary task of denoising non-equilibrium structures (DeNS).

What carries the argument

SwiGLU-S² activations paired with smooth radius cutoff attention, which add many-body interactions for greater expressivity, preserve strict SE(3)-equivariance, reduce S² grid sampling complexity, and support smooth potential energy surface modeling.
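
The review does not reproduce the cutoff function itself; the sketch below is a minimal illustration, assuming the cosine envelope that is common in atomistic ML (the paper may use a different form), of how attention weights for neighbors near the cutoff radius are damped smoothly to zero so the learned energy does not jump when an atom crosses the cutoff sphere.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Smooth envelope: 1 at r = 0, 0 at r >= r_cut, with zero slope at the cutoff.
    The cosine form is an illustrative assumption, not the paper's stated choice."""
    env = 0.5 * (np.cos(np.pi * r / r_cut) + 1.0)
    return np.where(r < r_cut, env, 0.0)

# Hypothetical attention scores for one atom and its neighbors.
rng = np.random.default_rng(0)
r_ij = np.array([1.2, 3.4, 5.9, 6.1])   # neighbor distances (angstrom); the last lies beyond the cutoff
logits = rng.normal(size=r_ij.shape)    # unnormalized attention scores

# Damping the exponentiated scores by the envelope before normalizing means a
# neighbor leaving the cutoff sphere fades out continuously instead of being
# dropped from the softmax abruptly.
weights = np.exp(logits) * cosine_cutoff(r_ij, r_cut=6.0)
weights = weights / weights.sum()
print(weights.round(3))
```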

If this is right

  • The model supports energy-conserving simulations and computation of higher-order derivatives of potential energy surfaces (see the sketch after this list).
  • Software optimizations deliver a measured 1.75 times speedup without changing the underlying architecture.
  • Training with denoising non-equilibrium structures as an auxiliary task produces state-of-the-art performance on OC20, OMat24, and Matbench Discovery.
  • Smooth radius cutoff attention and SwiGLU-S² activations together reduce the risk of discontinuities in learned energy landscapes.
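
A minimal sketch of the first bullet above, with a toy rotation-invariant energy standing in for the network (an illustrative assumption, not the paper's model): once the predicted energy is a smooth function of atomic positions, conservative forces and higher-order derivatives follow directly from automatic differentiation.

```python
import torch

def toy_energy(pos):
    """Smooth, rotation-invariant stand-in for the model's predicted energy."""
    diff = pos.unsqueeze(0) - pos.unsqueeze(1)          # (N, N, 3) pairwise displacements
    dist2 = (diff ** 2).sum(-1)                         # squared pairwise distances
    mask = ~torch.eye(pos.shape[0], dtype=torch.bool)   # drop self-pairs
    return torch.exp(-dist2[mask]).sum()

pos = torch.randn(5, 3, requires_grad=True)
energy = toy_energy(pos)

# Forces as the negative gradient of a single scalar energy are conservative by
# construction, which is what energy-conserving molecular dynamics needs.
(grad_e,) = torch.autograd.grad(energy, pos, create_graph=True)
forces = -grad_e

# One row of the Hessian: higher-order derivatives are only meaningful if the
# learned energy surface is smooth, which is where the smooth cutoff and
# SwiGLU-S2 activations are claimed to matter.
hessian_row = torch.autograd.grad(forces[0, 0], pos, retain_graph=True)[0]
print(forces.shape, hessian_row.shape)
```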

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The smoother energy surfaces could improve stability when the model is used inside long molecular-dynamics runs.
  • Faster inference may make on-the-fly property prediction feasible for larger unit cells or ensembles of structures.
  • The same architectural pieces could be applied to other symmetry-preserving tasks such as spin systems or crystal structure generation.

Load-bearing premise

The new activations and smooth-cutoff attention preserve strict SE(3)-equivariance while enabling accurate modeling of smoothly varying potential energy surfaces without physical inconsistencies or reduced expressivity.

What would settle it

A direct check that rotating an input structure by an arbitrary angle produces outputs that transform exactly as the inputs do under the same rotation (invariant energies, forces rotated with the structure), or a test of whether forces derived from the predicted energies remain conservative along molecular-dynamics trajectories.
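
A minimal sketch of that check, using a toy invariant energy built from pairwise distances rather than the paper's network (an illustrative assumption): apply a random rotation to the coordinates and verify that the predicted energy is unchanged while the forces rotate exactly with the structure.

```python
import numpy as np

def toy_energy(pos):
    """Rotation-invariant stand-in for a learned energy: a sum of pair terms."""
    diff = pos[None, :, :] - pos[:, None, :]
    dist = np.sqrt((diff ** 2).sum(-1) + np.eye(len(pos)))  # eye keeps the diagonal away from zero
    return np.exp(-dist).sum()

def numerical_forces(pos, eps=1e-5):
    """Central-difference forces F = -dE/dx, used only for this check."""
    forces = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for k in range(3):
            plus, minus = pos.copy(), pos.copy()
            plus[i, k] += eps
            minus[i, k] -= eps
            forces[i, k] = -(toy_energy(plus) - toy_energy(minus)) / (2 * eps)
    return forces

def random_rotation(rng):
    """Proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
pos = rng.normal(size=(6, 3))
R = random_rotation(rng)

# Energy must be invariant; forces must transform like the positions themselves.
assert np.isclose(toy_energy(pos), toy_energy(pos @ R.T))
assert np.allclose(numerical_forces(pos) @ R.T, numerical_forces(pos @ R.T), atol=1e-6)
print("invariance of the energy and equivariance of the forces hold for the toy model")
```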

Figures

Figures reproduced from arXiv: 2604.09130 by Alexander J. Hoffman, Alexandre Duval, Sabrina C. Shen, Sam Walton Norwood, Tess Smidt, Yi-Lun Liao.

Figure 1
Figure 1: EquiformerV3 architecture. The proposed improvements are highlighted in red. We encode input 3D atomistic graphs with atom and edge-degree embeddings and process them with Transformer blocks, which consist of equivariant merged layer normalization (LN), equivariant graph attention, and feedforward networks.
Figure 2
Figure 2: Illustration of how different normalizations calculate statistics. In the proposed equivariant merged layer normalization, we share the merged RMS across all degrees, as highlighted in red.
Figure 3
Figure 3: Illustration of different activation functions. The colorful circles are grid features. One circle represents one channel, which contains Rϕ × Rθ grid points on S². In the proposed SwiGLU-S² activation, we apply both nonlinearity and multiplication to grid features, as highlighted in red.
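
A minimal sketch of the normalization idea in Figure 2, assuming a node feature laid out as concatenated degree-l blocks of width 2l + 1 per channel (the paper's exact layout, learnable scales, and treatment of the degree-0 part are not reproduced here): one RMS is computed over all degrees and shared, rather than each degree being normalized to the same magnitude.

```python
import numpy as np

def merged_rms_norm(x, eps=1e-6):
    """Share a single RMS across every degree and channel. Dividing by one
    rotation-invariant scalar keeps the operation equivariant and preserves the
    relative magnitudes of different degrees."""
    return x / np.sqrt(np.mean(x ** 2) + eps)

def per_degree_rms_norm(x, lmax, eps=1e-6):
    """Contrast case: normalizing each degree block separately equalizes their
    magnitudes, which the figure caption argues can hurt training dynamics."""
    out = np.empty_like(x)
    start = 0
    for l in range(lmax + 1):
        end = start + 2 * l + 1
        block = x[:, start:end]
        out[:, start:end] = block / np.sqrt(np.mean(block ** 2) + eps)
        start = end
    return out

rng = np.random.default_rng(0)
lmax = 2
scales = np.array([3.0] + [1.0] * 3 + [0.3] * 5)            # degrees differ in typical magnitude
x = rng.normal(size=(8, (lmax + 1) ** 2)) * scales           # 8 channels, lmax = 2
merged, per_degree = merged_rms_norm(x), per_degree_rms_norm(x, lmax)
print(np.abs(merged[:, :1]).mean() / np.abs(merged[:, 4:]).mean())       # degree-0 / degree-2 ratio preserved
print(np.abs(per_degree[:, :1]).mean() / np.abs(per_degree[:, 4:]).mean())  # ratio flattened toward 1
```
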
original abstract

As $SE(3)$-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consistency has become a central challenge for large-scale applications. In this work, we introduce EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, designed to advance all three dimensions: efficiency, expressivity, and generality. Building on EquiformerV2, we have the following three key advances. First, we optimize the software implementation, achieving $1.75\times$ speedup. Second, we introduce simple and effective modifications to EquiformerV2, including equivariant merged layer normalization, improved feedforward network hyper-parameters, and attention with smooth radius cutoff. Third, we propose SwiGLU-$S^2$ activations to incorporate many-body interactions for better theoretical expressivity and to preserve strict equivariance while reducing the complexity of sampling $S^2$ grids. Together, SwiGLU-$S^2$ activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces (PES), generalizing EquiformerV3 to tasks requiring energy-conserving simulations and higher-order derivatives of PES. With these improvements, EquiformerV3 trained with the auxiliary task of denoising non-equilibrium structures (DeNS) achieves state-of-the-art results on OC20, OMat24, and Matbench Discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces EquiformerV3, the third generation of SE(3)-equivariant graph attention Transformers for 3D atomistic modeling. Building on EquiformerV2, it reports three advances: a 1.75× software implementation speedup, modifications including equivariant merged layer normalization, improved FFN hyperparameters, and attention with smooth radius cutoff, plus the new SwiGLU-S² activations to incorporate many-body interactions while preserving strict equivariance and reducing S² sampling complexity. These changes, combined with DeNS auxiliary training, are claimed to enable accurate modeling of smoothly varying potential energy surfaces and to deliver state-of-the-art results on the OC20, OMat24, and Matbench Discovery benchmarks.

Significance. If the empirical claims hold under detailed scrutiny, the work would advance practical scaling of equivariant Transformers for large-scale materials and molecular simulations by improving efficiency and the fidelity of PES modeling without sacrificing symmetry constraints. The focus on enabling energy-conserving simulations and higher-order derivatives addresses a key need in the field.

major comments (2)
  1. [Results section] The central claim of state-of-the-art performance on OC20, OMat24, and Matbench Discovery is load-bearing for the paper's contribution, yet the provided description lacks full baseline comparison tables, ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged LN, etc.), and error bars or run statistics. Without these, the attribution of gains to the proposed changes cannot be rigorously assessed.
  2. [Methods section on SwiGLU-S² activations] The claim that SwiGLU-S² preserves strict SE(3)-equivariance while incorporating many-body interactions and reducing S² grid sampling complexity is central to the expressivity and generality arguments. The text does not provide explicit equations or a verification argument showing how the activation is applied to tensor features without breaking equivariance or introducing parameter-dependent artifacts beyond the listed free parameters (FFN hyperparameters and S² sampling).
minor comments (1)
  1. [Abstract] The statement that the modifications 'enable accurate modeling of smoothly varying potential energy surfaces' would benefit from a brief parenthetical reference to the specific metrics (e.g., energy MAE or force MAE) used to support this on the cited benchmarks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.

point-by-point responses
  1. Referee: [Results section] The central claim of state-of-the-art performance on OC20, OMat24, and Matbench Discovery is load-bearing for the paper's contribution, yet the provided description lacks full baseline comparison tables, ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged LN, etc.), and error bars or run statistics. Without these, the attribution of gains to the proposed changes cannot be rigorously assessed.

    Authors: We agree that expanded empirical support is valuable for rigorously attributing gains. The manuscript already contains baseline comparisons to prior Equiformer versions and other SOTA methods on the three benchmarks, along with some component ablations. However, to strengthen the claims, we will add expanded baseline comparison tables, comprehensive ablation studies isolating each modification (SwiGLU-S², smooth-cutoff attention, merged layer normalization, improved FFN hyperparameters, and the 1.75× implementation speedup), and error bars and statistics from multiple independent runs with different random seeds in the revised results section. revision: yes

  2. Referee: [Methods section on SwiGLU-S² activations] The claim that SwiGLU-S² preserves strict SE(3)-equivariance while incorporating many-body interactions and reducing S² grid sampling complexity is central to the expressivity and generality arguments. The text does not provide explicit equations or a verification argument showing how the activation is applied to tensor features without breaking equivariance or introducing parameter-dependent artifacts beyond the listed free parameters (FFN hyperparameters and S² sampling).

    Authors: We acknowledge the need for more formal detail here. The current text describes the motivation and high-level properties of SwiGLU-S², but we will revise the methods section to include the explicit equations for applying the activation to the tensor features (operating on the irreducible representations), along with a verification argument demonstrating preservation of strict SE(3)-equivariance. This will also clarify the incorporation of many-body interactions, the reduction in S² sampling complexity, and the limited parameter dependencies (FFN hyperparameters and S² sampling points) without introducing additional artifacts. revision: yes
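
The rebuttal promises the explicit equations; in the meantime, the sketch below illustrates only the gating pattern described in Figure 3, with the irreps-to-grid and grid-to-irreps transforms left out of scope and all weight shapes chosen purely for illustration. A SwiGLU-style gate (a nonlinearity multiplied by a value branch) is applied independently at every S² grid point and mixes only channels, which is why the construction can remain compatible with strict equivariance once it is wrapped in exact grid transforms, the step the referee asks to see spelled out.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_s2_gate(grid_feats, w_value, w_gate, w_out):
    """SwiGLU-style gating applied pointwise on an S^2 grid.

    grid_feats: (channels, r_phi, r_theta) samples of a node feature on a
    spherical grid. Only channels are mixed; every grid point is treated the
    same way, so a rotation of the input merely relocates values on the sphere.
    The weight shapes and the surrounding grid transforms are illustrative
    assumptions, not the paper's parameterization.
    """
    c, r_phi, r_theta = grid_feats.shape
    flat = grid_feats.reshape(c, -1)                 # (channels, number of grid points)
    value = w_value @ flat                           # linear value branch over channels
    gate = silu(w_gate @ flat)                       # nonlinearity ...
    gated = value * gate                             # ... and multiplication, as in Figure 3
    return (w_out @ gated).reshape(c, r_phi, r_theta)

rng = np.random.default_rng(0)
channels, hidden = 16, 32
feats = rng.normal(size=(channels, 12, 6))           # hypothetical R_phi x R_theta grid
w_value = rng.normal(size=(hidden, channels)) / np.sqrt(channels)
w_gate = rng.normal(size=(hidden, channels)) / np.sqrt(channels)
w_out = rng.normal(size=(channels, hidden)) / np.sqrt(hidden)
print(swiglu_s2_gate(feats, w_value, w_gate, w_out).shape)   # (16, 12, 6)
```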

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes concrete architectural and implementation changes to EquiformerV2 (SwiGLU-S² activations, smooth-cutoff attention, merged layer norm, FFN hyper-parameter tweaks, 1.75× software speedup) plus an auxiliary DeNS training task, then reports empirical SOTA results on OC20, OMat24, and Matbench Discovery. No equations, uniqueness theorems, or predictions are shown that reduce by construction to the paper's own fitted inputs or self-citations; the central claims rest on external benchmark comparisons and stated preservation of SE(3)-equivariance, which are independently testable. This is the normal non-circular case for an empirical architecture paper.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The work relies on standard assumptions of SE(3) symmetry in atomistic systems and typical ML hyperparameter tuning; the new activation is introduced without external validation beyond benchmark performance.

free parameters (2)
  • FFN hyperparameters
    Improved feedforward network hyper-parameters are introduced and selected to optimize performance.
  • S² sampling parameters
    Parameters controlling spherical grid sampling in the new activations are adjusted to reduce complexity.
axioms (1)
  • domain assumption: All layers must preserve strict SE(3)-equivariance for physical consistency
    Invoked when claiming the new activations and attention preserve equivariance while adding expressivity.
invented entities (1)
  • SwiGLU-S² activations (no independent evidence)
    purpose: Incorporate many-body interactions while preserving strict equivariance and lowering S² sampling cost
    New activation function family proposed to address expressivity limits of prior equivariant layers.

pith-pipeline@v0.9.0 · 5593 in / 1441 out tokens · 64076 ms · 2026-05-10T16:53:55.459578+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Fast contracted Clebsch--Gordan tensor products for equivariant graph neural networks

    physics.comp-ph · 2026-05 · unverdicted · novelty 7.0

    An O(L^3) algorithm computes contracted Clebsch-Gordan tensor products for equivariant ML potentials using a structured angular grid and spherical Poisson bracket to handle parity-odd terms at fixed CP rank.

Reference graph

Works this paper leans on

7 extracted references · 4 canonical work pages · cited by 1 Pith paper
