pith. machine review for the scientific record.

arxiv: 2605.13262 · v1 · submitted 2026-05-13 · 💻 cs.LG · q-bio.QM

Recognition: 2 theorem links · Lean Theorem

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:48 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords Chem-GMNet · sphere-native transformer · molecular property prediction · MoleculeNet · geometric transformer · SMILES · Gegenbauer polynomials · multipole expansion

The pith

Chem-GMNet, a sphere-native geometric transformer, outperforms same-sized ChemBERTa-2 on 7 of 10 MoleculeNet endpoints with about 35 percent fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether chemistry's rich structural information calls for a generic text-based transformer rescued by massive pretraining or a domain-native architecture built from geometric priors. It answers by replacing every transformer component with sphere-native modules that embed tokens as directions on a sphere, use gated sphere-flow recurrences whose state equals a truncated multipole expansion, and apply sphere-kernel attention. Randomly initialized Chem-GMNet wins on most endpoints against same-shape baselines, while the version pretrained on the same ZINC corpus matches or exceeds the public ChemBERTa-2 release on six of eight shared tasks. An ablation further shows that raising sphere dimension alone can beat a pretrained baseline on one endpoint without any pretraining.

Core claim

GM-Net is a transformer family in which every module is sphere-native: SH-Embedding treats tokens as learnable directions on S^{k-1} lifted through a Gegenbauer feature map; DualSKA fuses a linear-time gated Sphere-Flow recurrence (whose persistent state is the truncated multipole expansion of the input distribution) with a softmax Sphere-Kernel branch over the same Schoenberg-valid kernel; and SH-FFN performs sphere projection followed by Gegenbauer lift and moment readout. Instantiated as Chem-GMNet and evaluated on canonical DeepChem scaffold splits against same-shape ChemBERTa-2 baselines under the chemberta3-faithful protocol, the randomly initialized version wins on 7 of 10 MoleculeNet tasks.

What carries the argument

Sphere-native modules (SH-Embedding, DualSKA, SH-FFN) that replace standard transformer blocks with operations on spheres using spherical harmonics, Gegenbauer lifts, and multipole expansions.
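A minimal sketch of the shared ingredient these modules lean on, assuming the Schoenberg characterization of positive-definite zonal kernels on the sphere (nonnegative Gegenbauer coefficients); the coefficient values and helper name below are illustrative, not the paper's:

```python
import numpy as np
from scipy.special import eval_gegenbauer

def sphere_kernel(u, v, k, coeffs):
    """Schoenberg-valid zonal kernel on S^{k-1}: a nonnegative combination of
    Gegenbauer polynomials C_n^{(alpha)}, alpha = (k - 2) / 2, evaluated at the
    cosine similarity of two unit directions (illustrative sketch)."""
    alpha = (k - 2) / 2.0
    t = float(np.clip(np.dot(u, v), -1.0, 1.0))
    return sum(c * eval_gegenbauer(n, alpha, t) for n, c in enumerate(coeffs))

# Token embeddings as rows of a unit-normalised V x k direction table (cf. Figure 1).
rng = np.random.default_rng(0)
table = rng.normal(size=(100, 8))
table /= np.linalg.norm(table, axis=1, keepdims=True)

# Truncation order L = 3: four nonnegative coefficients, so the kernel is positive definite.
print(sphere_kernel(table[3], table[7], k=8, coeffs=[1.0, 0.5, 0.25, 0.125]))
```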

If this is right

  • Random initialization suffices to beat same-shape baselines on seven of ten MoleculeNet endpoints without any pretraining.
  • Raising the sphere dimension from k=8 to k=10 at fixed truncation order L=3 lowers ESOL RMSE below that of pretrained ChemBERTa-2 MLM-10M.
  • The pretrained Chem-GMNet matches or exceeds the public ChemBERTa-2 release on six of eight shared endpoints.
  • The architecture maintains its edge on scaffold splits that test generalization to unseen molecular structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The multipole-expansion view of the internal state could enable direct extraction of interpretable physical quantities from trained models.
  • Similar sphere-native replacements could be tested on other structured-sequence tasks such as protein property prediction.
  • Increasing sphere dimension may offer a more parameter-efficient route to capacity than adding layers or heads.
  • The approach may reduce the volume of pretraining data required to reach competitive accuracy compared with text-only models.

Load-bearing premise

The performance gains arise from the sphere-native inductive biases rather than from differences in training protocol, hyperparameter tuning, or data handling that are not fully detailed.

What would settle it

A side-by-side retraining of ChemBERTa-2 and Chem-GMNet from scratch under identical optimizer schedules, data preprocessing, and hyperparameter search to determine whether the gap on the seven winning endpoints disappears.
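One concrete piece of such an audit is cheap to specify: confirm that both arms evaluate on byte-identical scaffold splits before comparing endpoint scores. A minimal sketch, assuming each arm can dump its test-set SMILES (the helper name is hypothetical):

```python
import hashlib

def split_fingerprint(smiles_list):
    """SHA-256 over the sorted test-set SMILES; matching fingerprints across both
    arms rule out split or preprocessing drift on that endpoint (hypothetical helper)."""
    joined = "\n".join(sorted(smiles_list)).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()

# Usage: compare split_fingerprint(gmnet_test_smiles) against
# split_fingerprint(chemberta2_test_smiles) for every MoleculeNet endpoint.
```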

Figures

Figures reproduced from arXiv: 2605.13262 by Deepak Warrier, Raja Sekhar Pappala.

Figure 1. Chem-GMNet pipeline. Token IDs index a V×k table of unit directions on S^{k-1}; the SH-Embedding lifts each direction through the Gegenbauer feature map Φ, and the residual stream flows through a stack of three DualSKA + SH-FFN blocks. The persistent state of the Gated SFA branch inside DualSKA is, by Theorem 2, the truncated multipole expansion of the input distribution on the sphere. No absolute pos… view at source ↗
Figure 2. DualSKA block. Shared projections W_K, W_Q, W_P feed two branches that operate on the same Gegenbauer features: the bidirectional Gated SFA recurrence (linear in T, multipole-readout state) and the SKA softmax (quadratic, Schoenberg-positive-definite). A per-head learned gate α_h = σ(β_h) convex-combines the two, after which W_O projects back to the residual stream. The fusion vector β ∈ R^H is the only DualSKA-… view at source ↗
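Taken at face value, the caption reduces the fusion step to a per-head convex combination of the two branch outputs; a minimal sketch of that step alone, with assumed (H, T, d) branch shapes and illustrative names:

```python
import numpy as np

def dual_ska_fuse(sfa_out, ska_out, beta):
    """Per-head gate alpha_h = sigmoid(beta_h) convex-combines the linear-time
    Gated SFA branch with the softmax SKA branch (branch shapes assumed (H, T, d))."""
    alpha = 1.0 / (1.0 + np.exp(-beta))   # (H,)
    alpha = alpha[:, None, None]          # broadcast over sequence and feature axes
    return alpha * sfa_out + (1.0 - alpha) * ska_out

H, T, d = 4, 16, 32
rng = np.random.default_rng(0)
fused = dual_ska_fuse(rng.normal(size=(H, T, d)), rng.normal(size=(H, T, d)), np.zeros(H))
# With beta = 0 each head mixes the two branches equally (alpha = 0.5).
```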
read the original abstract

Modern SMILES-based chemical language models obtain strong MoleculeNet performance by treating SMILES as generic text and compensating with multi-million-molecule self-supervised pretraining. We ask: when a domain carries structural priors as rich as chemistry's, does it warrant a domain-native transformer rather than a generic one rescued by scale? We answer affirmatively with \textbf{GM-Net} (Geometric Measure Network), a transformer family in which every module is replaced by a sphere-native counterpart, and instantiate it as \textbf{Chem-GMNet}. Three blocks follow: SH-Embedding (tokens as learnable directions on $S^{k-1}$ lifted through a Gegenbauer feature map); DualSKA (a per-head fusion of a linear-time gated Sphere-Flow recurrence whose persistent state we prove is the truncated multipole expansion of the input distribution, and a softmax Sphere-Kernel branch over the same Schoenberg-valid kernel); and SH-FFN (sphere projection $\to$ Gegenbauer lift $\to$ moment readout). On canonical DeepChem scaffold splits, against same-shape ChemBERTa-2 baselines under the chemberta3-faithful protocol: (i) random-initialised, Chem-GMNet wins on 7 of 10 MoleculeNet endpoints at $\sim\!35\%$ fewer parameters; (ii) pretrained on the same 10M-SMILES ZINC corpus as ChemBERTa-2 MLM-10M, it matches or beats the public release on 6 of 8 shared endpoints (5/7 excluding a known ClinTox release anomaly). A $(k,L)$ ablation shows that increasing the sphere dimension from $k\!=\!8$ to $k\!=\!10$ at fixed $L\!=\!3$ lowers ESOL RMSE to $0.938$ at scratch, beating pretrained ChemBERTa-2 MLM-10M on this endpoint without any pretraining at all.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Chem-GMNet, a sphere-native geometric transformer for molecular property prediction in which every module is replaced by a counterpart built from spherical harmonics and Gegenbauer polynomials. The architecture consists of SH-Embedding (tokens lifted to directions on S^{k-1}), DualSKA (a gated Sphere-Flow recurrence, whose state is proved to be the truncated multipole expansion, fused with a softmax Sphere-Kernel branch), and SH-FFN. On canonical DeepChem scaffold splits the model is reported to win on 7 of 10 MoleculeNet endpoints at random initialization with ~35% fewer parameters than same-shape ChemBERTa-2 baselines, and to match or beat the public ChemBERTa-2 MLM-10M release on 6 of 8 shared endpoints after identical ZINC pretraining; a (k,L) ablation is also presented.

Significance. If the reported gains are shown to arise from the sphere-native inductive biases rather than from uncontrolled differences in training protocol, the work would constitute a meaningful demonstration that domain-specific geometric structure can be exploited to reduce reliance on large-scale self-supervised pretraining in cheminformatics. The explicit proof relating the recurrence state to the multipole expansion is a positive technical feature that distinguishes the contribution from purely empirical architecture search.

major comments (2)
  1. [Abstract and §4 (Experiments)] The central attribution of performance gains to the sphere-native modules requires that the ChemBERTa-2 baselines were trained under an exactly matching protocol (optimizer, LR schedule, batch size, tokenization, and scaffold-split handling). The manuscript states that comparisons follow the “chemberta3-faithful protocol” but does not supply hyperparameter tables, code, or explicit verification that every detail matches; this is load-bearing for the claim that the geometric biases, rather than implementation differences, explain the 7/10 and 6/8 wins.
  2. [DualSKA / Sphere-Flow recurrence description] The proof that the persistent state of the Sphere-Flow recurrence equals the truncated multipole expansion of the input distribution must be shown to hold exactly under the implemented truncation order L (e.g., L=3 in the ablation). The abstract presents the relation as proved, yet no numerical verification or stability analysis under finite truncation is referenced; this directly affects the claimed interpretability of DualSKA.
minor comments (1)
  1. [Abstract] The abstract introduces the symbols k and L only in the final ablation sentence; defining them at first use would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on experimental documentation and theoretical verification. We address each major point below.

read point-by-point responses
  1. Referee: [Abstract and §4 (Experiments)] The central attribution of performance gains to the sphere-native modules requires that the ChemBERTa-2 baselines were trained under an exactly matching protocol (optimizer, LR schedule, batch size, tokenization, and scaffold-split handling). The manuscript states that comparisons follow the “chemberta3-faithful protocol” but does not supply hyperparameter tables, code, or explicit verification that every detail matches; this is load-bearing for the claim that the geometric biases, rather than implementation differences, explain the 7/10 and 6/8 wins.

    Authors: We agree that explicit documentation is required to support the attribution of performance differences to the sphere-native inductive biases. In the revised manuscript we will add a detailed hyperparameter table in the appendix that lists optimizer, learning-rate schedule, batch size, tokenization procedure, and scaffold-split handling for both Chem-GMNet and the ChemBERTa-2 baselines. We will also include a direct link to the public code repository containing the exact training scripts, allowing independent verification that the protocol is identical to the chemberta3-faithful setup. revision: yes

  2. Referee: [DualSKA / Sphere-Flow recurrence description] The proof that the persistent state of the Sphere-Flow recurrence equals the truncated multipole expansion of the input distribution must be shown to hold exactly under the implemented truncation order L (e.g., L=3 in the ablation). The abstract presents the relation as proved, yet no numerical verification or stability analysis under finite truncation is referenced; this directly affects the claimed interpretability of DualSKA.

    Authors: The algebraic proof in the manuscript is exact for any finite truncation order L because it relies only on the orthogonality of Gegenbauer polynomials and the linear recurrence definition; it does not depend on the specific value of L. To address the request for empirical confirmation at the implemented L=3, we will add an appendix section containing a numerical verification on representative inputs showing that the persistent state matches the truncated multipole expansion to machine precision, together with a brief stability analysis under small input perturbations. revision: yes
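A minimal version of the proposed check, assuming the persistent state is a gated linear recurrence over truncated Gegenbauer features; the probe-direction feature map and gate values below are illustrative stand-ins for the paper's exact Φ and gating:

```python
import numpy as np
from scipy.special import eval_gegenbauer

rng = np.random.default_rng(0)
k, L, T, P = 8, 3, 32, 5                      # sphere dim, truncation order, length, probes
alpha = (k - 2) / 2.0

probes = rng.normal(size=(P, k)); probes /= np.linalg.norm(probes, axis=1, keepdims=True)
xs = rng.normal(size=(T, k));     xs /= np.linalg.norm(xs, axis=1, keepdims=True)
gates = rng.uniform(0.8, 1.0, size=T)         # per-step scalar gates (illustrative)

def phi(x):
    # Degrees 0..L of Gegenbauer evaluations against fixed probe directions:
    # a stand-in for the paper's exact feature map Phi.
    t = np.clip(probes @ x, -1.0, 1.0)
    return np.concatenate([eval_gegenbauer(n, alpha, t) for n in range(L + 1)])

# Gated recurrence: s_t = g_t * s_{t-1} + phi(x_t), s_0 = 0.
s = np.zeros((L + 1) * P)
for g, x in zip(gates, xs):
    s = g * s + phi(x)

# Closed form: a gate-weighted sum of per-token features, i.e. weighted truncated moments.
weights = np.array([np.prod(gates[t + 1:]) for t in range(T)])
moments = (weights[:, None] * np.stack([phi(x) for x in xs])).sum(axis=0)

assert np.allclose(s, moments)                # agrees to machine precision
```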

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

full rationale

The paper's central claims are empirical performance comparisons on MoleculeNet tasks under stated protocols, not mathematical predictions or first-principles derivations that reduce to inputs by construction. The Sphere-Flow recurrence is described as having a proved equivalence to truncated multipole expansion, but this is presented as a derived property of the defined recurrence rather than a tautological redefinition of the target quantity. No load-bearing step (e.g., performance gains, ablation results) is shown to be statistically forced by fitting or self-citation chains. The architecture definitions (SH-Embedding, DualSKA, SH-FFN) introduce new inductive biases whose effects are measured experimentally against external baselines, so the argument is anchored in benchmark results rather than in its own definitions.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the mathematical validity of the sphere-native replacements and on the fairness of the ChemBERTa-2 baseline comparison; no new physical entities are postulated.

free parameters (2)
  • sphere dimension k
    Chosen as 8 or 10 in the reported ablation; controls the dimension of the embedding sphere.
  • truncation order L
    Fixed at 3 in the ablation; controls the order of the multipole expansion inside DualSKA.
axioms (2)
  • standard math Gegenbauer polynomials provide a valid feature map for functions on the sphere that preserves rotational equivariance.
    Invoked in the definition of SH-Embedding and SH-FFN.
  • domain assumption The gated Sphere-Flow recurrence maintains a persistent state exactly equal to the truncated multipole expansion of the input distribution.
    Stated as a proved property in the DualSKA description.

pith-pipeline@v0.9.0 · 5662 in / 1493 out tokens · 78984 ms · 2026-05-14T19:48:15.988785+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 5 internal anchors

  1. [1]

    Kendall Atkinson and Weimin Han.Spherical Harmonics and Approximations on the Unit Sphere: An Introduc- tion, volume 2044 ofLecture Notes in Mathematics

    URLhttps://proceedings.neurip s.cc/paper_files/paper/2024/hash/d2c50c5b2e3 fcb33e8554d44b84eb52f-Abstract-Conference.ht ml. Kendall Atkinson and Weimin Han.Spherical Harmonics and Approximations on the Unit Sphere: An Introduc- tion, volume 2044 ofLecture Notes in Mathematics. Springer,

  2. [2]

    Ilyes Batatia, David Peter Kovacs, Gregor N

    URL https://doi.org/10.1007/97 8-3-642-25983-8. Ilyes Batatia, David Peter Kovacs, Gregor N. C. Simm, Christoph Ortner, and Gábor Csányi. MACE: Higher order equivariant message passing neural networks. In Advances in Neural Information Processing Systems (NeurIPS),

  3. [3]

    Andreas Bender, Nadine Schneider, Marwin Segler, W

    URLhttps: //doi.org/10.1038/s41467-022-29939-5. Andreas Bender, Nadine Schneider, Marwin Segler, W. Patrick Walters, Ola Engkvist, and Tiago Rodrigues. Evaluation guidelines for machine learning tools in the chemical sciences.Nature Reviews Chemistry, 6:428– 442,

  4. [4]

    Boris Bonev, Thorsten Kurth, Tom Kölbl, et al

    URL https://doi.org/10.1038/s41570 -022-00391-9. Boris Bonev, Thorsten Kurth, Tom Kölbl, et al. Attention on the sphere. InAdvances in Neural Information Processing Systems (NeurIPS),

  5. [5]

    Seyone Chithrananda, Gabriel Grand, and Bharath Ram- sundar

    URL https: //arxiv.org/abs/2505.11157. Seyone Chithrananda, Gabriel Grand, and Bharath Ram- sundar. ChemBERTa: Large-scale self-supervised pre- training for molecular property prediction.arXiv preprint arXiv:2010.09885,

  6. [6]

    Seyone Chithrananda, Gabriel Grand, Bharath Ramsun- dar, et al

    URL https://ar xiv.org/abs/2010.09885. Seyone Chithrananda, Gabriel Grand, Bharath Ramsun- dar, et al. ChemBERTa-2: Towards chemical founda- tion models.arXiv preprint arXiv:2209.01712,

  7. [7]

    Seyone Chithrananda, Gabriel Grand, Bharath Ramsun- dar, et al

    URLhttps://arxiv.org/abs/2209.01712. Seyone Chithrananda, Gabriel Grand, Bharath Ramsun- dar, et al. ChemBERTa-3: An open-source training framework for chemical foundation models.Chem- Rxiv preprint; Digital Discovery (RSC),

  8. [8]

    Krzysztof Choromanski, Valerii Likhosherstov, David Do- han, et al

    URL https://doi.org/10.1039/D5DD00348B. Krzysztof Choromanski, Valerii Likhosherstov, David Do- han, et al. Rethinking attention with Performers. In International Conference on Learning Representations (ICLR),

  9. [9]

    URL https://arxiv.org/abs/2009.1

  10. [10]

    Benedek Fabian, Thomas Edlich, Heloisa Gaspar, Marwin Segler, Joshua Meyers, Marco Fiscato, and Mohamed H

    URL https://doi.org/10.1016/j.dr udis.2023.103563. Benedek Fabian, Thomas Edlich, Heloisa Gaspar, Marwin Segler, Joshua Meyers, Marco Fiscato, and Mohamed H. S. Segler. Molecular representation learning with language models and domain-relevant auxiliary tasks (MolBERT).arXiv preprint arXiv:2011.13230,

  11. [11]

    Fabian B

    URLhttps://arxiv.org/abs/2011.13230. Fabian B. Fuchs, Daniel E. Worrall, Volker Fischer, and Max Welling. SE(3)-transformers: 3D roto-translation equivariant attention networks. InAdvances in Neural Information Processing Systems (NeurIPS),

  12. [12]

    Johannes Gasteiger, Janek Groß, and Stephan Gün- nemann

    URL https://arxiv.org/abs/2006.10503. Johannes Gasteiger, Janek Groß, and Stephan Gün- nemann. Directional message passing for molecular graphs (DimeNet). InInternational Conference on Learning Representations (ICLR),

  13. [13]

    Albert Gu and Tri Dao

    URLhttps: //arxiv.org/abs/2003.03123. Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752,

  14. [14]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    URL https://arxiv.org/ab s/2312.00752. Kexin Huang, Tianfan Fu, Wenhao Gao, et al. Ther- apeutics data commons: Machine learning datasets and tasks for drug discovery and development. In Advances in Neural Information Processing Systems 9 Datasets and Benchmarks Track (NeurIPS Datasets and Benchmarks),

  15. [15]

    Huang, T

    URLhttps://arxiv.org/ab s/2102.09548. Shuqi Ji, Xinhua Ren, Xinwei Li, et al. Uni-Mol2: An im- proved 3D molecular foundation model.arXiv preprint arXiv:2406.14005,

  16. [16]

    Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret

    URL https://arxiv.org/ab s/2406.14005. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning (ICML),

  17. [17]

    Transformers are

    URLhttps://arxiv.org/abs/2006.16236. Yi-Lun Liao, Brandon Wood, Abhishek Das, and Tess Smidt. EquiformerV2: Improved equivariant trans- former for scaling to higher-degree representations. In International Conference on Learning Representations (ICLR),

  18. [19]

    Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, and Stanisław Jastrzęb- ski

    URL https://arxiv.org/ab s/2410.01131. Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, and Stanisław Jastrzęb- ski. Molecule attention transformer.arXiv preprint arXiv:2002.08264,

  19. [20]

    Michael Poli, Stefano Massaroli, Eric Nguyen, et al

    URL https://arxiv.org/ab s/2002.08264. Michael Poli, Stefano Massaroli, Eric Nguyen, et al. Hyena hierarchy: Towards larger convolutional language mod- els. InInternational Conference on Machine Learning (ICML),

  20. [21]

    Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, and Payel Das

    URL https://arxiv.org/abs/2007.02835. Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, and Payel Das. Large- scale chemical language representations capture molec- ular structure and properties (MoLFormer-XL).Na- ture Machine Intelligence, 4:1256–1264,

  21. [22]

    URL https://doi.org/10.1038/s42256-022-00580-7. Isaac J. Schoenberg. Positive definite functions on spheres. Duke Mathematical Journal, 9(1):96–108,

  22. [23]

    Kristof T

    URL https://doi.org/10.1215/S0012-7094-42-00908 -6. Kristof T. Schütt, Pieter-Jan Kindermans, Huziel E. Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: A continuous-filter con- volutional neural network for modeling quantum inter- actions. InAdvances in Neural Information Processing Systems (NeurIPS),

  23. [24]

    SchNet: A continuous-filter convolutional neural network for modeling quantum interactions

    URLhttps://arxiv.org/ abs/1706.08566. Yutao Sun, Li Dong, Shaohan Huang, et al. Retentive network: A successor to Transformer for large language models.arXiv preprint arXiv:2307.08621,

  24. [25]

    Retentive Network: A Successor to Transformer for Large Language Models

    URL https://arxiv.org/abs/2307.08621. Maciej Sypetkowski, Frederik Wenkel, Farimah Pour- safaei, et al. On the scalability of GNNs for molec- ular graphs (MolGPS). InAdvances in Neural In- formation Processing Systems (NeurIPS),

  25. [26]

    Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley

    URL https://arxiv.org/abs/2404.11568. Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds.arXiv preprint arXiv:1802.08219,

  26. [27]

    Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

    URL https://arxiv.org/ab s/1802.08219. Fabio Urbina, Filippa Lentzos, Cédric Invernizzi, and Sean Ekins. Dual use of artificial-intelligence-powered drug discovery.Nature Machine Intelligence, 4(3):189– 191,

  27. [28]

    Hanchen Wang, Jean Kaddour, Shengchao Liu, et al

    URL https://doi.org/10.1038/s42256 -022-00465-9. Hanchen Wang, Jean Kaddour, Shengchao Liu, et al. Evaluating self-supervised learning for molecular graph embeddings. InAdvances in Neural Information Pro- cessing Systems (NeurIPS),

  28. [29]

    Sheng Wang, Yuzhi Guo, Yuhong Wang, Hongmao Sun, and Junzhou Huang

    URLhttps://arxi v.org/abs/2206.08005. Sheng Wang, Yuzhi Guo, Yuhong Wang, Hongmao Sun, and Junzhou Huang. SMILES-BERT: Large scale unsu- pervised pre-training for molecular property prediction. InProceedings of the 10th ACM International Confer- ence on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB), pages 429–436,

  29. [30]

    Andrew Gordon Wilson and Pavel Izmailov

    URLhttps://doi.org/10.1145/3307339.3342186. Andrew Gordon Wilson and Pavel Izmailov. Bayesian deep learning and a probabilistic perspective of general- ization. InAdvances in Neural Information Processing Systems (NeurIPS),

  30. [31]

    Zhenqin Wu, Bharath Ramsundar, Evan N

    URLhttps://arxiv.org/ abs/2002.08791. Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: A benchmark for molecular machine learning.Chemical Science, 9 (2):513–530,

  31. [32]

    Gated Delta Networks: Improving Mamba2 with Delta Rule

    URL https://doi.org/10.1021/acs.jcim.9b00237. Songlin Yang, Jan Kautz, Ali Hatamizadeh, et al. Gated delta networks: Improving Mamba2 with delta rule. arXiv preprint arXiv:2412.06464, 2024a. URLhttps: //arxiv.org/abs/2412.06464. Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, and Yoon Kim. Gated linear attention trans- formers with hardware-effici...

  32. [33]

    • Splits.ThecanonicalDeepChem2.8.0 ScaffoldSplitter 80/10/10splitsareproducedontheflyby scripts/downstream/data.py::load_split— the sameDummyFeaturizer + scaffold + 200-character SMILES filter that the chemberta3prepare_data.py uses. Downstream users can verify test-set membership matches DeepChem with a single hash check against the loader output; no pre...

  33. [34]

    Compute infrastructure

    convention and the de facto MoleculeNet standard but below the threshold at which standard deviations on small-benchmark RMSE are them- selves reliably estimated. Additional seed runs are inex- pensive on our hardware (∼15–30minutes per seed per endpoint on a single H100; see “Compute infrastructure” below) and can be provided during the rebuttal upon rev...

  34. [35]

    The headline observations—the geometric arm wins the per-endpoint race on a clear majority un- der both protocols—are robust to this caveat

    are too sparse at three seeds to claim per-endpoint sig- nificance; several margins are within one geometric-arm standard deviation, and we treat per-endpoint numbers as preliminary. The headline observations—the geometric arm wins the per-endpoint race on a clear majority un- der both protocols—are robust to this caveat. Expanded multi-seed runs are avai...

  35. [36]

    Classification labels are coarser than continuous regression targets, so the geometric prior is less starved at the smallerD⋆associated with smaller (k,L)

    a close second. Classification labels are coarser than continuous regression targets, so the geometric prior is less starved at the smallerD⋆associated with smaller (k,L). •L= 4is not uniformly better thanL=3:at k=8, ESOL degrades from1.010to1 .042. Higher harmonic degrees introduce features that are mostly noise on small-data tasks; the sweet spot in thi...

  36. [37]

    2.253 49.754 1.105 1.212 0.812 0.6970.9060.719 Random Forest1.31852.077 1.741 0.962 0.8510.719 0.783 0.724 GCN 1.645 51.227 0.885 0.781 0.818 0.676 0.907 0.688 ChemBERTa-1 (Chithrananda et al.,

  37. [38]

    Same hidden width, depth, head count, vocabulary, and tokenizer in both arms; the only difference is the architecture inside each block

    — — — — — 0.643 0.733 0.728 ChemBERTa-2 MLM- and MTR-pretrained (as published in (Chithrananda et al., 2022)) ChemBERTa-2 MLM-5M 1.451 54.601 0.946 0.986 0.793 0.701 0.341 0.762 ChemBERTa-2 MLM-10M 1.611 53.859 0.961 1.009 0.729 0.696 0.349 0.748 ChemBERTa-2 MLM-77M 1.509 52.754 1.025 0.987 0.735 0.698 0.239 0.749 ChemBERTa-2 MTR-5M 1.477 50.154 0.874 0.7...