
arXiv: 2510.18900 · v2 · submitted 2025-10-20 · ⚛️ physics.chem-ph · cond-mat.mtrl-sci · cs.LG

Foundation Models for Discovery and Exploration in Chemical Space

Pith reviewed 2026-05-18 05:47 UTC · model grok-4.3

classification ⚛️ physics.chem-ph · cond-mat.mtrl-sci · cs.LG
keywords molecular foundation models · chemical space · structure-property prediction · olfactory perception · Smirk tokenizer · neural scaling laws · materials discovery · hyperbolic geometry

The pith

Molecular foundation models called MIST predict over 400 chemical properties and generalize to mapping molecular scents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce MIST, a family of molecular foundation models with up to an order of magnitude more parameters and training data than earlier efforts. These models rely on a new tokenizer named Smirk that encodes nuclear, electronic, and geometric details from molecular structures. After fine-tuning, the models handle more than 400 structure-property prediction tasks and match or exceed existing best results on benchmarks spanning physiology and electrochemistry. They also address concrete applications such as screening electrolyte solvents, reasoning about organometallic stereochemistry, and estimating mixture properties. The clearest sign of foundation-model capability appears when the same models predict scent profiles for molecules and organize olfactory space in a hierarchical manner consistent with hyperbolic geometry, even though olfaction was not an explicit target of the models' original training (the scent results in Figure 2 come from a variant fine-tuned on scent classification).

Core claim

MIST models use the Smirk tokenizer to comprehensively capture nuclear, electronic, and geometric information from molecules, enabling them to learn diverse representations across chemical space. Fine-tuned versions predict more than 400 structure-property relationships with performance at or above the state of the art on diverse benchmarks. The models address real-world challenges such as multiobjective electrolyte solvent screening, stereochemical reasoning for organometallics, and mixture property prediction. They also accurately predict scent profiles and form a hierarchical representation of olfactory space that is consistent with hyperbolic geometry. Hyperparameter-aware Bayesian neural scaling laws remove the need for hyperparameter sweeps at every scale.

What carries the argument

The Smirk tokenizer, which encodes nuclear, electronic, and geometric information from molecular structures to support learning of broad representations in the MIST foundation models.

If this is right

  • The models enable multiobjective screening of electrolyte solvents for desired performance criteria.
  • Stereochemical reasoning tasks for organometallic compounds become tractable without custom model development.
  • Properties of chemical mixtures can be estimated from component structures alone.
  • New problems in chemical space can be solved without explicit training on those exact tasks.
  • The models learn hierarchical representations that align with hyperbolic geometry in perceptual domains such as olfaction.
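
For the multiobjective screening item above, one common way to combine several predicted properties is a Pareto (non-dominated) filter. The sketch below is illustrative only: the solvent names, the two objectives, and every number are made up, and the paper's actual screening criteria are not stated in this summary.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one. All objectives are maximized."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, score in candidates.items():
        dominated = any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical model predictions, higher is better for both entries:
# (stability window, negated viscosity). These are NOT values from the paper.
predicted = {
    "solvent_A": (4.5, -1.2),
    "solvent_B": (4.0, -0.8),
    "solvent_C": (3.9, -1.5),
    "solvent_D": (4.5, -0.9),
}
front = pareto_front(predicted)
print(front)
```

Objectives that should be minimized are negated before filtering, which keeps the dominance test to a single direction.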

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the models truly capture general molecular features, they could extend to other sensory or biological domains that share underlying physical principles.
  • The Bayesian scaling laws may let researchers train still larger models on modest compute budgets by removing the need for repeated hyperparameter searches.
  • A single foundation model might eventually reduce reliance on many narrow, task-specific predictors in chemistry and materials science.
  • The observed hyperbolic structure in olfactory space raises the possibility that similar geometries appear in other learned representations of molecular or biological data.

Load-bearing premise

The Smirk tokenizer captures all relevant nuclear, electronic, and geometric information from molecular structures without significant loss or bias.

What would settle it

Direct comparison of MIST scent-profile predictions against measured human sensory data for a large set of molecules outside any training distribution, or quantitative verification that the learned olfactory embeddings exhibit negative curvature consistent with hyperbolic geometry.
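
One possible diagnostic for the second test is Gromov's four-point condition, which measures how tree-like a set of pairwise distances is (a tree metric gives δ = 0). The sketch below is an assumption about how such a check could be run, not the procedure used in the paper; the two toy distance matrices are invented for illustration.

```python
from itertools import combinations
from typing import Sequence


def four_point_delta(dist: Sequence[Sequence[float]]) -> float:
    """Gromov delta of a finite metric via the four-point condition.

    For every quadruple, form the three pairwise-sum pairings, sort them,
    and take half the gap between the two largest. The maximum over all
    quadruples is delta. Cost is O(n^4), so this suits small samples only.
    """
    n = len(dist)
    worst = 0.0
    for x, y, z, w in combinations(range(n), 4):
        sums = sorted((
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ))
        worst = max(worst, (sums[2] - sums[1]) / 2.0)
    return worst


# Toy tree metric: leaves a, b hang off one internal node and c, d off
# another, with unit-length edges throughout.
tree = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Toy non-tree metric: shortest-path distances on a 4-cycle.
cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

delta_tree = four_point_delta(tree)
delta_cycle = four_point_delta(cycle)
print(delta_tree, delta_cycle)
```

In practice δ would be compared against the diameter of the embedding distances, since only a small ratio indicates tree-like, hyperbolic-style structure.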

Figures

Figures reproduced from arXiv: 2510.18900 by Alexander Brace, Alexander B. Wiltschko, Alexius Wadell, Andrew J. Stier, Anoushka Bhutani, Anuj K. Nayak, Arvind Ramanathan, Austin R. Ellis-Mohr, Benjamin Amorelli, Bharath Ramsundar, Celia Kelly, Dimitrios Simatos, Hancheng Zhao, Hongyi Lin, Jack Wells, Kareem Hegazy, Karthik Duraisamy, Kevin Gering, Lav R. Varshney, Melisa Alkan, Michael W. Mahoney, Murali Emani, Richard C. Gerkin, Tom Gibbs, Venkatasubramanian Viswanathan, Venkatram Vishwanath, Victor Azumah, Wesley W. Qian, Yuhan Chen.

Figure 1. MIST models molecules across chemical space, accelerating a broad range of materials design tasks. (a) Molecular Insight SMILES Transformers (MIST) is a family of molecular foundation models (FMs) which match or exceed state-of-the-art performance across diverse benchmarks, from physiology to quantum chemistry. The models solve real-world problems across chemical space, ranging from multiobjective electrolyte solvent scree…
Figure 2. MIST rapidly explores chemical space. (a) Hierarchical clustering of logit correlations from a MIST-1.8B variant fine-tuned on scent classification recovers human-interpretable scent relationships; for example “roasted”, “meaty” and “beefy” are all highly correlated (brown cluster). (b) MIST-28M accurately predicts quantum, chemical, and thermodynamic descriptors for electrolyte design, including orbital …
Figure 3. Exploring chemical space beyond organic molecules: organometallics, isotopes, and mixtures. (a) To fine-tune MIST models on binary mixture properties, we use a specialized permutation-invariant task network. This task network uses MIST embeddings $\vec{e}_i$ to compute the linear mixing $P_L$ and excess $P_E$ contributions to the mixture property $P_{\mathrm{mix}}$. (b) To fine-tune models on mixture excess properties, we curated a d…
Figure 4. Interpretability analysis reveals MIST’s potential as a robust tool for exploration and discovery. (a) Interpretable chemical features were extracted from every stage of MIST models, from token embeddings to downstream predictions. (b) A t-SNE projection of token embeddings shows dataset-specific structure. All three datasets shown cluster the halogens and digits. The transition metal Quantum Mechanics dataset…
Figure 5. Compute-optimal training was critical to efficiently scaling MIST. (a) We adopted MCMC sampling to parameterize neural scaling laws, with penalty terms accounting for off-optimal hyperparameter selection, enabling robust predictions of the compute-optimal frontier. (b) Covariance plot of posterior samples for E and α/β after fitting penalized neural scaling laws shows low variance in α/β but higher varian…
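
The Figure 3(a) caption describes a permutation-invariant task network that splits a binary mixture property into a linear mixing term and an excess term. The sketch below shows one way such a decomposition can be made symmetric under swapping the two components. It is an assumption-laden toy: the function names, the toy linear heads, and the choice of an x1·x2 prefactor on the excess term are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(weights: Vector, values: Vector) -> float:
    return sum(w * v for w, v in zip(weights, values))


def mixture_property(
    e1: Vector,
    e2: Vector,
    x1: float,
    pure_head: Callable[[Vector], float],
    excess_head: Callable[[List[float], List[float]], float],
) -> float:
    """P_mix = P_L + P_E for a binary mixture with mole fraction x1 of
    component 1. Both terms are unchanged if (e1, x1) and (e2, x2) swap."""
    x2 = 1.0 - x1
    # Linear mixing: mole-fraction-weighted average of per-component terms.
    p_linear = x1 * pure_head(e1) + x2 * pure_head(e2)
    # Excess term: built only from symmetric combinations of the two
    # embeddings, and scaled by x1 * x2 so it vanishes for a pure component.
    sym_sum = [a + b for a, b in zip(e1, e2)]
    sym_prod = [a * b for a, b in zip(e1, e2)]
    p_excess = x1 * x2 * excess_head(sym_sum, sym_prod)
    return p_linear + p_excess


# Toy stand-ins for learned heads (hypothetical weights).
def pure_head(e: Vector) -> float:
    return dot([0.5, -1.0], e)


def excess_head(s: List[float], p: List[float]) -> float:
    return dot([0.1, 0.2], s) + dot([1.0, -0.5], p)


emb_a = [0.2, 1.0]  # placeholder embeddings, not real MIST outputs
emb_b = [-0.4, 0.3]

forward = mixture_property(emb_a, emb_b, 0.3, pure_head, excess_head)
swapped = mixture_property(emb_b, emb_a, 0.7, pure_head, excess_head)
pure_a = mixture_property(emb_a, emb_b, 1.0, pure_head, excess_head)
print(forward, swapped, pure_a)
```

Swapping the component order together with the mole fractions leaves the prediction unchanged, which is the property the caption calls permutation invariance.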
Original abstract

Accurate prediction of atomistic, thermodynamic, and kinetic properties from molecular structures underpins materials innovation. Existing computational and experimental approaches lack the scalability required to navigate chemical space efficiently. Scientific foundation models trained on large unlabelled datasets offer a path towards navigating chemical space across application domains. Here, we develop MIST, a family of molecular foundation models with up to an order of magnitude more parameters and data than prior works. Trained using a novel tokenizer, Smirk, which comprehensively captures nuclear, electronic, and geometric information, MIST learns a diverse range of molecules. MIST models have been fine-tuned to predict more than 400 structure-property relationships and have been shown to match or exceed state-of-the-art performance across diverse benchmarks, from physiology to electrochemistry. We demonstrate the ability of these models to solve real-world problems across chemical space from multiobjective electrolyte solvent screening to stereochemical reasoning for organometallics and mixture property prediction. The clearest demonstration of a foundation model is its ability to solve problems that were neither explicit targets of training nor central to the intentions of its developers. We identify olfactory perception mapping as such a problem, and show that MIST accurately predicted scent profiles and learned a hierarchical representation of olfactory space consistent with hyperbolic geometry. We formulated hyperparameter aware Bayesian neural scaling laws which eliminate the need for hyperparameter sweeps at every scale, making training large compute-optimal models feasible on a limited compute budget. The methods and findings presented here represent a significant step towards accelerating materials discovery, design, and optimization using foundation models.
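
To make the compute-optimal idea concrete, the sketch below uses the widely used parametric form L(N, D) = E + A·N^(-α) + B·D^(-β) with the common approximation C ≈ 6·N·D for training compute. This is an assumption: the summary mentions E, α, and β but does not give the paper's functional form, and the hyperparameter penalty terms are omitted here. The coefficient values are placeholders, not fitted MIST values.

```python
from typing import Tuple


def loss(n_params: float, n_tokens: float, E: float, A: float, B: float,
         alpha: float, beta: float) -> float:
    """Parametric scaling law: irreducible loss plus model- and data-limited terms."""
    return E + A * n_params ** -alpha + B * n_tokens ** -beta


def compute_optimal(compute: float, A: float, B: float, alpha: float,
                    beta: float, flops_per_param_token: float = 6.0
                    ) -> Tuple[float, float]:
    """Minimize the loss subject to flops_per_param_token * N * D = compute.

    Setting the derivative to zero along the constraint gives
    N_opt = (alpha * A / (beta * B))^(1 / (alpha + beta)) * K^(beta / (alpha + beta)),
    where K = compute / flops_per_param_token, and D_opt = K / N_opt.
    """
    budget = compute / flops_per_param_token
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = g * budget ** (beta / (alpha + beta))
    d_opt = budget / n_opt
    return n_opt, d_opt


# Placeholder coefficients for illustration only.
coeffs = dict(A=400.0, B=410.0, alpha=0.34, beta=0.28)
E = 1.7
budget_flops = 1e21

n_opt, d_opt = compute_optimal(budget_flops, **coeffs)
best = loss(n_opt, d_opt, E, **coeffs)
# Same compute budget, but with a model twice as large or half as large.
too_big = loss(2.0 * n_opt, d_opt / 2.0, E, **coeffs)
too_small = loss(0.5 * n_opt, 2.0 * d_opt, E, **coeffs)
print(n_opt, d_opt, best)
```

In a Bayesian treatment, the point estimates for E, A, B, α, and β would be replaced by posterior samples, and the optimum recomputed per sample to get an uncertainty band on the frontier.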

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MIST, a family of large molecular foundation models trained on extensive unlabeled data using a novel Smirk tokenizer asserted to comprehensively encode nuclear, electronic, and geometric features from molecular structures. The models are fine-tuned on more than 400 structure-property tasks and reported to match or exceed SOTA performance across benchmarks spanning physiology to electrochemistry. Applications are demonstrated in electrolyte solvent screening, stereochemical reasoning, and mixture property prediction. The central evidence for foundation-model behavior is zero-shot accurate prediction of scent profiles together with a learned hierarchical representation of olfactory space consistent with hyperbolic geometry. The work also formulates hyperparameter-aware Bayesian neural scaling laws to enable compute-optimal training without exhaustive sweeps.

Significance. If the performance and generalization claims are substantiated with detailed metrics and ablations, MIST would constitute a meaningful advance in scaling foundation models for chemical space navigation, with potential to accelerate materials discovery. The Bayesian scaling-law formulation that removes the need for per-scale hyperparameter sweeps is a concrete methodological strength that could be adopted more broadly. The zero-shot olfactory result, if shown to be robust rather than spurious, would provide a strong falsifiable test of emergent property capture. These elements, taken together, would support the paper's positioning as a step toward practical foundation-model use in chemistry.

major comments (3)
  1. [Abstract] Abstract: the claim that MIST models 'match or exceed state-of-the-art performance across diverse benchmarks' on >400 tasks is presented without any tabulated metrics, error bars, benchmark identifiers, or ablation results, rendering the central performance assertion impossible to evaluate from the given text.
  2. [Smirk tokenizer description] Description of the Smirk tokenizer: the assertion that it 'comprehensively captures nuclear, electronic, and geometric information' is load-bearing for the zero-shot olfactory generalization, yet no reconstruction-error statistics, mutual-information scores with electronic properties (partial charges, HOMO/LUMO), or ablation on 3D conformer recovery are supplied; without these, the risk that olfactory predictions rest on incomplete or biased encodings cannot be quantified.
  3. [Olfactory perception mapping] Olfactory perception results: the reported accurate scent-profile prediction and hyperbolic embedding hierarchy are presented as the clearest demonstration of foundation-model behavior, but the manuscript does not include controls or ablations showing that these outcomes survive removal of auxiliary electronic or geometric features; this omission directly affects the claim that the representation is comprehensive rather than correlational.
minor comments (2)
  1. [Abstract] Abstract: the statement 'up to an order of magnitude more parameters and data than prior works' would be strengthened by explicit numerical comparison to the largest previously published molecular foundation models.
  2. [Bayesian neural scaling laws] Scaling-laws section: the precise functional form of the hyperparameter-aware Bayesian neural scaling laws and the validation procedure against held-out empirical runs should be stated more explicitly to allow independent reproduction.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments highlight important areas where additional evidence and clarity will strengthen the manuscript. We address each major comment below and have revised the manuscript accordingly to provide the requested metrics, statistics, and controls.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that MIST models 'match or exceed state-of-the-art performance across diverse benchmarks' on >400 tasks is presented without any tabulated metrics, error bars, benchmark identifiers, or ablation results, rendering the central performance assertion impossible to evaluate from the given text.

    Authors: We agree that the abstract would benefit from greater specificity to allow direct evaluation of the performance claims. In the revised manuscript we have expanded the abstract to reference key benchmark identifiers and representative metrics (with error bars) drawn from the main results and supplementary tables. Detailed tabulated results, including per-task metrics, error bars, and ablation summaries across the >400 tasks, are now explicitly signposted in the main text and supplementary information. revision: yes

  2. Referee: [Smirk tokenizer description] Description of the Smirk tokenizer: the assertion that it 'comprehensively captures nuclear, electronic, and geometric information' is load-bearing for the zero-shot olfactory generalization, yet no reconstruction-error statistics, mutual-information scores with electronic properties (partial charges, HOMO/LUMO), or ablation on 3D conformer recovery are supplied; without these, the risk that olfactory predictions rest on incomplete or biased encodings cannot be quantified.

    Authors: We acknowledge that quantitative validation of the tokenizer's encoding capacity was not provided in the initial submission. In the revision we have added a dedicated supplementary section reporting (i) reconstruction-error statistics for molecular graphs and 3D structures, (ii) mutual-information scores between tokenizer-derived representations and electronic properties including partial charges and HOMO/LUMO energies, and (iii) results from an ablation study measuring 3D conformer recovery accuracy. These additions allow readers to assess the completeness of the encoding directly. revision: yes

  3. Referee: [Olfactory perception mapping] Olfactory perception results: the reported accurate scent-profile prediction and hyperbolic embedding hierarchy are presented as the clearest demonstration of foundation-model behavior, but the manuscript does not include controls or ablations showing that these outcomes survive removal of auxiliary electronic or geometric features; this omission directly affects the claim that the representation is comprehensive rather than correlational.

    Authors: We agree that explicit controls are necessary to distinguish comprehensive representation from spurious correlations. We have performed the requested ablation experiments and report the results in the revised manuscript: zero-shot olfactory prediction accuracy and the hyperbolic geometry of the learned embedding space are re-evaluated after systematic removal or masking of electronic and geometric features. The outcomes remain consistent, supporting the claim that the integrated representation drives the observed foundation-model behavior. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical training and external benchmarks drive results

Full rationale

The paper's core claims rest on training MIST models with the Smirk tokenizer on large unlabeled molecular datasets, followed by fine-tuning to predict over 400 structure-property relationships and evaluation on diverse external benchmarks spanning physiology, electrochemistry, and zero-shot olfactory tasks. These outcomes are data-driven performance measurements rather than algebraic reductions, self-definitional mappings, or predictions forced by fitted inputs. The hyperparameter-aware Bayesian neural scaling laws are presented as a training optimization method to avoid exhaustive sweeps, but they do not reduce any reported predictions to the inputs by construction. No load-bearing self-citations, imported uniqueness theorems, or ansatzes smuggled via prior work are invoked to justify the central results; the olfactory hyperbolic geometry finding is an observed empirical pattern on held-out tasks. The derivation chain is therefore self-contained against external validation data.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

Central claims depend on the effectiveness of the new tokenizer and the assumption that pre-training on unlabeled molecular data yields transferable representations; many training hyperparameters and model architectural choices are implicit free parameters.

free parameters (2)
  • model parameter count and training data volume
    Chosen up to an order of magnitude larger than prior works to achieve claimed performance.
  • hyperparameters in Bayesian neural scaling laws
    Formulated to avoid sweeps; specific values fitted or chosen during development.
axioms (1)
  • [domain assumption] Large-scale pre-training on unlabeled molecular structures produces generalizable representations usable for downstream property prediction and zero-shot tasks.
    Invoked to justify training MIST on unlabelled datasets before fine-tuning.
invented entities (1)
  • Smirk tokenizer (no independent evidence)
    purpose: To encode nuclear, electronic, and geometric information comprehensively from molecular structures.
    Newly introduced component whose completeness is central to the claimed generalization.




Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery

    cs.LG · 2026-05 · conditional · novelty 7.0

    fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent...

  2. Energy-Aware Routing to Large Reasoning Models

    cs.AI · 2025-12 · unverdicted · novelty 4.0

    In the critical regime for energy provisioning to large reasoning models, performance is volatility-limited, motivating variance-aware routing policies based on training and inference compute scaling laws.

Reference graph

Works this paper leans on

252 extracted references · 252 canonical work pages · cited by 2 Pith papers · 15 internal anchors

  1. [1]

    Chemical Space and Biology

    Christopher M. Dobson. “Chemical Space and Biology”. In:Nature432.7019 (Dec. 1, 2004), pp. 824– 828

  2. [2]

    How Machine Learning Will Revolutionize Electrochemical Sciences

    Aashutosh Mistry et al. “How Machine Learning Will Revolutionize Electrochemical Sciences”. In: ACS Energy Lett.6 (Mar. 23, 2021), pp. 1422–1431

  3. [4]

    July 24, 2024

    Eduardo Soares et al.A Large Encoder-Decoder Family of Foundation Models For Chemical Language. July 24, 2024. arXiv:2407.20267. Pre-published

  4. [5]

    Chemical Space

    Peter Kirkpatrick and Clare Ellis. “Chemical Space”. In:Nature432.7019 (Dec. 1, 2004), pp. 823–823

  5. [6]

    Up–down Approach for Expanding the Chemical Space of Metal–Organic Frame- works

    Jiyeon Kim et al. “Up–down Approach for Expanding the Chemical Space of Metal–Organic Frame- works”. In:Nature Synthesis3.12 (Dec. 2024), pp. 1518–1528

  6. [7]

    Navigating Chemical Space for Biology and Medicine

    Christopher Lipinski and Andrew Hopkins. “Navigating Chemical Space for Biology and Medicine”. In:Nature432.7019 (Dec. 2004), pp. 855–861

  7. [8]

    Generative AI for Navigating Synthesizable Chem- ical Space

    Wenhao Gao, Shitong Luo, and Connor W. Coley. “Generative AI for Navigating Synthesizable Chem- ical Space”. In:Proceedings of the National Academy of Sciences122.41 (Oct. 14, 2025), e2415665122

  8. [9]

    Why Big Data and Compute Are Not Necessarily the Path to Big Materials Science

    Naohiro Fujinuma et al. “Why Big Data and Compute Are Not Necessarily the Path to Big Materials Science”. In:Commun Mater3.1 (Aug. 30, 2022), p. 59

  9. [12]

    Alec Radford et al.Learning Transferable Visual Models From Natural Language Supervision. Feb. 26,

  10. [13]

    Learning Transferable Visual Models From Natural Language Supervision

    arXiv:2103.00020 [cs].url:http://arxiv.org/abs/2103.00020. Pre-published

  11. [15]

    A Vision–Language Foundation Model for Precision Oncology

    Jinxi Xiang et al. “A Vision–Language Foundation Model for Precision Oncology”. In:Nature638.8051 (Feb. 2025), pp. 769–778

  12. [16]

    Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts

    Charles O’Neill et al. “Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts”. In: Neurips 2024 Workshop Foundation Models for Science: Progress, Opportunities, and Challenges. Nov. 2, 2024

  13. [18]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani et al.On the Opportunities and Risks of Foundation Models. July 12, 2022. arXiv: 2108.07258 [cs].url:http://arxiv.org/abs/2108.07258. Pre-published

  14. [24]

    Version 1

    Yinhan Liu et al.RoBERTa: A Robustly Optimized BERT Pretraining Approach. Version 1. 2019. Pre-published. 24

  15. [28]

    Accelerating Electrolyte Discovery for Energy Storage with High-Throughput Screening

    Lei Cheng et al. “Accelerating Electrolyte Discovery for Energy Storage with High-Throughput Screening”. In:Journal of Physical Chemistry Letters6.2 (2015), pp. 283–291

  16. [31]

    Flammability of Li-Ion Battery Electrolytes: Flash Point and Self-Extinguishing Time Measurements

    Steffen Hess, Margret Wohlfahrt-Mehrens, and Mario Wachtler. “Flammability of Li-Ion Battery Electrolytes: Flash Point and Self-Extinguishing Time Measurements”. In:J. Electrochem. Soc.162.2 (2015), A3084–A3097

  17. [32]

    Petrucci.General Chemistry: Principles and Modern Applications

    R.H. Petrucci.General Chemistry: Principles and Modern Applications. Pearson Education. Pearson Education International, 2007

  18. [33]

    Predicting Human Olfactory Perception from Chemical Features of Odor Molecules

    Andreas Keller et al. “Predicting Human Olfactory Perception from Chemical Features of Odor Molecules”. In:Science355.6327 (Feb. 24, 2017), pp. 820–826

  19. [34]

    Combinatorial Receptor Codes for Odors

    Bettina Malnic et al. “Combinatorial Receptor Codes for Odors”. In:Cell96.5 (Mar. 5, 1999), pp. 713–

  20. [35]

    Application of Artificial Intelligence to Decode the Relationships between Smell, Olfactory Receptors and Small Molecules

    Rayane Achebouche et al. “Application of Artificial Intelligence to Decode the Relationships between Smell, Olfactory Receptors and Small Molecules”. In:Scientific Reports12.1 (Nov. 5, 2022), p. 18817

  21. [37]

    $$\alpha$$-Decay Half-Life Predictions with Support Vector Machine

    Amir Jalili et al. “$$\alpha$$-Decay Half-Life Predictions with Support Vector Machine”. In:Sci- entific Reports14.1 (Dec. 28, 2024), p. 30776

  22. [38]

    Catalysis in the Excited State: Bringing Innate Transition Metal Photochemistry into Play

    Fabio Juli´ a. “Catalysis in the Excited State: Bringing Innate Transition Metal Photochemistry into Play”. In:ACS Catal.15.6 (Mar. 21, 2025), pp. 4665–4680

  23. [39]

    Molecular Design Principles for Photoac- tive Transition Metal Complexes: A Guide for “Photo-Motivated

    Giacomo Morselli, Christian Reber, and Oliver S. Wenger. “Molecular Design Principles for Photoac- tive Transition Metal Complexes: A Guide for “Photo-Motivated” Chemists”. In:J. Am. Chem. Soc. 147.14 (Apr. 9, 2025), pp. 11608–11624

  24. [43]

    Murtaza Zohair et al.Chemical Foundation Model Guided Design of High Ionic Conductivity Elec- trolyte Formulations. Mar. 20, 2025. arXiv:2503.14878 [cond-mat].url:http://arxiv.org/abs/ 2503.14878. Pre-published

  25. [44]

    MD simulations explain the excess molar enthalpies in pseudo- binary mixtures of a choline chloride-based deep eutectic solvent with water or methanol

    Leon de Villiers Engelbrecht et al. “MD simulations explain the excess molar enthalpies in pseudo- binary mixtures of a choline chloride-based deep eutectic solvent with water or methanol”. In:Fron- tiers in Chemistry10 (2022), p. 983281

  26. [45]

    Algebraic Representation of Thermodynamic Properties and the Classification of Solutions

    Otto Redlich and A. T. Kister. “Algebraic Representation of Thermodynamic Properties and the Classification of Solutions”. In:Ind. Eng. Chem.40.2 (Feb. 1948), pp. 345–348. 25

  27. [47]

    Definitions, Methods, and Applications in Interpretable Machine Learning

    W. James Murdoch et al. “Definitions, Methods, and Applications in Interpretable Machine Learning”. In:Proceedings of the National Academy of Sciences116.44 (Oct. 29, 2019), pp. 22071–22080

  28. [48]

    Transformer Circuits Thread

    Adly Templeton et al.Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread. May 21, 2024.url:https://transformer- circuits.pub/ 2024/scaling-monosemanticity/index.html

  29. [49]

    Sonia Joseph et al.Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video. 2025. arXiv:2504.19475 [cs.CV]

  30. [50]

    GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics

    Maxim Zvyagin et al. “GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics”. In:The International Journal of High Performance Computing Applications37.6 (Nov. 2023), pp. 683–705

  31. [51]

    Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings

    Christopher A. Lipinski et al. “Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings”. In:Advanced Drug Delivery Reviews 23.1–3 (Jan. 1997), pp. 3–25

  32. [53]

    A History of the Structural Theory of Benzene - The Aromatic Sextet Rule and Huckel’s Rule

    Shigeaki Kikuchi. “A History of the Structural Theory of Benzene - The Aromatic Sextet Rule and Huckel’s Rule”. In:Journal of Chemical Education74.2 (Feb. 1, 1997), p. 194

  33. [54]

    Utkarsh Sharma and Jared Kaplan.A Neural Scaling Law from the Dimension of the Data Manifold. Apr. 22, 2020. arXiv:2004.10802 [cs, stat]. Pre-published

  34. [55]

    Sequence Modeling and Design from Molecular to Genome Scale with Evo

    Eric Nguyen et al. “Sequence Modeling and Design from Molecular to Genome Scale with Evo”. In: Science386.6723 (Nov. 15, 2024), eado9336

  35. [60]

    Scaling Laws from the Data Manifold Dimension

    Utkarsh Sharma and Jared Kaplan. “Scaling Laws from the Data Manifold Dimension”. In:J. Mach. Learn. Res.23.1 (Jan. 1, 2022), 9:343–9:376

  36. [64]

    Ben Sorscher et al.Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning. Apr. 21, 2023. arXiv:2206.14486 [cs].url:http://arxiv.org/abs/2206.14486. Pre-published

  37. [65]

    Language Models Are Few-Shot Learners

    Tom Brown et al. “Language Models Are Few-Shot Learners”. In:Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc., 2020, pp. 1877–1901. [64]Huggingface/Transformers: Transformers: State-of-the-art Machine Learning for Pytorch, Tensor- Flow, and JAX.Version 4.40.2. May 6, 2024. [65]Microsoft/DeepSpeed. Microsoft, May 15, 2024. 26

  38. [66]

    Zenodo, June 7, 2023

    William Falcon and The PyTorch Lightning team.PyTorch Lightning. Zenodo, June 7, 2023

  39. [67]

    Jerry Ma and Denis Yarats.On the Adequacy of Untuned Warmup for Adaptive Optimization. Mar. 19,

  40. [68]

    Pre-published

    arXiv:1910.04209 [cs, stat].url:http://arxiv.org/abs/1910.04209. Pre-published

  41. [69]

    Ilya Loshchilov and Frank Hutter.Decoupled Weight Decay Regularization. Jan. 4, 2019. arXiv:1711. 05101 [cs]. Pre-published. [69]Lightning-AI Torchmetrics. Version 1.4.0. Lightning AI, May 6, 2024

  42. [70]

    On the Art of Compiling and Using ’Drug-Like’ Chemical Fragment Spaces

    J¨ org Degen et al. “On the Art of Compiling and Using ’Drug-Like’ Chemical Fragment Spaces”. In: ChemMedChem3.10 (Oct. 20, 2008), pp. 1503–1507

  43. [73]

    Comparative Study on Transport Properties for LiFAP and LiPF6 in Alkyl- Carbonates as Electrolytes through Conductivity, Viscosity and NMR Self-Diffusion Measurements

    Patrice Porion et al. “Comparative Study on Transport Properties for LiFAP and LiPF6 in Alkyl- Carbonates as Electrolytes through Conductivity, Viscosity and NMR Self-Diffusion Measurements”. In:Electrochimica Acta114 (Dec. 30, 2013), pp. 95–104

  44. [74]

    Predicting Electrolyte Con- ductivity Directly from Molecular-Level Interactions

    Yumin Zhang, Imanuel Bier, and Venkatasubramanian Viswanathan. “Predicting Electrolyte Con- ductivity Directly from Molecular-Level Interactions”. In:ACS Energy Lett.7.11 (Nov. 11, 2022), pp. 4061–4070

  45. [75]

    Ionic conduction and solution structure in LiPF6 and LiBF4 propylene car- bonate electrolytes

    Sunwook Hwang et al. “Ionic conduction and solution structure in LiPF6 and LiBF4 propylene car- bonate electrolytes”. In:The Journal of Physical Chemistry C122.34 (2018), pp. 19438–19446

  46. [78]

    The COMPAS Project : A Computational Database of Polycyclic Aro- matic Systems . Phase 1: Cata - Condensed Polybenzenoid Hydrocarbons

    Alexandra Wahab et al. “The COMPAS Project : A Computational Database of Polycyclic Aro- matic Systems . Phase 1: Cata - Condensed Polybenzenoid Hydrocarbons”. In:Journal of Chemical Information and Modeling62.16 (Aug. 22, 2022), pp. 3704–3713

  47. [79]

    COMPAS-2: A Dataset of Cata-Condensed Hetero-Polycyclic Aromatic Systems

    Eduardo Mayo Yanes, Sabyasachi Chakraborty, and Renana Gershoni-Poranne. “COMPAS-2: A Dataset of Cata-Condensed Hetero-Polycyclic Aromatic Systems”. In: Scientific Data 11.1 (Jan. 19, 2024), p. 97

  48. [80]

    Alexandra Wahab and Renana Gershoni-Poranne. COMPAS-3: A Data Set of Peri-Condensed Polybenzenoid Hydrocarbons. Feb. 26, 2024. url: https://chemrxiv.org/engage/chemrxiv/article-details/65d8c60ae9ebbb4db90f6276. Pre-published

  49. [81]

    Hückel theory and aromaticity

    L. J. Schaad and B. A. Hess Jr. “Hückel theory and aromaticity”. In: Journal of Chemical Education 51.10 (1974), p. 640. eprint: https://doi.org/10.1021/ed051p640

  50. [87]

    Lukasz Maziarka et al. Molecule Attention Transformer. Feb. 19, 2020. arXiv:2002.08264 [cs]. Pre-published

  51. [88]

    Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction

    Juncai Li and Xiaofei Jiang. “Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction”. In: Wireless Communications and Mobile Computing 2021.1 (Jan. 2021). Ed. by Yulin Wang, p. 7181815

  52. [89]

    SELFormer: Molecular Representation Learning via SELFIES Language Models

    Atakan Yüksel et al. “SELFormer: Molecular Representation Learning via SELFIES Language Models”. In: Mach. Learn.: Sci. Technol. 4.2 (June 1, 2023), p. 025035

  53. [90]

    A Fingerprints Based Molecular Property Prediction Method Using the BERT Model

    Naifeng Wen et al. “A Fingerprints Based Molecular Property Prediction Method Using the BERT Model”. In: J Cheminform 14.1 (Oct. 21, 2022), p. 71

  54. [92]

    Afnan Sultan et al. Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years. Apr. 5, 2024. arXiv:2404.03969 [cs]. Pre-published

  55. [93]

    SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery

    Shion Honda, Shoi Shi, and Hiroki R. Ueda. SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery. Nov. 12, 2019. arXiv:1911.04738 [cs, stat]. url: http://arxiv.org/abs/1911.04738. Pre-published

  56. [94]

    Regression Transformer Enables Concurrent Sequence Regression and Generation for Molecular Language Modelling

    Jannis Born and Matteo Manica. “Regression Transformer Enables Concurrent Sequence Regression and Generation for Molecular Language Modelling”. In: Nature Machine Intelligence 5.4 (Apr. 2023), pp. 432–444

  57. [95]

    Chemformer: A Pre-Trained Transformer for Computational Chemistry

    Ross Irwin et al. “Chemformer: A Pre-Trained Transformer for Computational Chemistry”. In: Mach. Learn.: Sci. Technol. 3.1 (Mar. 1, 2022), p. 015022

  58. [96]

    X-MOL: Large-Scale Pre-Training for Molecular Understanding and Diverse Molecular Analysis

    Dongyu Xue et al. “X-MOL: Large-Scale Pre-Training for Molecular Understanding and Diverse Molecular Analysis”. In: Science Bulletin 67.9 (May 2022), pp. 899–902.

    Supplementary Information for Foundation Models for Discovery and Exploration in Chemical Space. Alexius Wadell∗1, Anoushka Bhutani∗1, Victor Azumah2, Austin R. Ellis-Mohr3, Celia Kelly1, Han...

  59. [97]

    Generate a single conformer using RDKit’s [77] ETKDGv3 [180]

  60. [98]

    Embed molecules using OpenBabel [181] and the UFF (Universal force field) [182] to generate a single starting conformer

  61. [99]

    Generate 200 conformers using RDKit’s [77] ETKDGv3 [180] and select the lowest-energy conformer after relaxation with the UFF [182] or MMFF (Merck molecular force field) [183, 184]. We evaluated each method by computing all QM9 reported properties for up to 100 randomly selected molecules from the QM9 dataset [29]. Parity plots of our calculations versus th...

  62. [100]

    Remove any molecule that was rejected by rdkit’s MolFromSmiles

  63. [101]

    De-duplicate the dataset using rdkit’s computed InChI Key

  64. [102]

    Use iterative proportional fitting to randomly sample a balanced dataset

  65. [103]

    Use scikit-learn’s StratifiedShuffleSplit to split the dataset into train/validation/test (80/10/10) while preserving the relative frequency of passing molecules.

                    Initial    Resampled
    H-Donor         99.2%      84.9%
    H-Acceptor      98.9%      84.2%
    MWT             97.1%      81.6%
    Log P           96.2%      73.6%
    Dataset Size    279,066    10,000

    Table S7: Frequency of molecules passing each of Lipinski’s RO5 criteria, ...

  66. [104]

    Large-Scale Chemical Language Representations Capture Molecular Structure and Properties

    Jerret Ross et al. “Large-Scale Chemical Language Representations Capture Molecular Structure and Properties”. In: Nat Mach Intell 4.12 (Dec. 2022), pp. 1256–1264

  67. [105]

    Differentiable Modeling and Optimization of Non-Aqueous Li-based Battery Electrolyte Solutions Using Geometric Deep Learning

    Shang Zhu et al. “Differentiable Modeling and Optimization of Non-Aqueous Li-based Battery Electrolyte Solutions Using Geometric Deep Learning”. In: Nat Commun 15.1 (Oct. 5, 2024), p. 8649

  68. [106]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. May 24, 2019. arXiv:1810.04805 [cs]. Pre-published

  69. [107]

    Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. Oct. 23, 2020. arXiv:2010.09885 [physics, q-bio]. url: http://arxiv.org/abs/2010.09885. Pre-published

  70. [108]

    Enamine Ltd. REAL Space. 2024

  71. [109]

    Alexius Wadell, Anoushka Bhutani, and Venkatasubramanian Viswanathan. Tokenization for Molecular Foundation Models. July 8, 2025. arXiv:2409.15370 [cs]. Pre-published

  72. [110]

    Jordan Hoffmann et al. Training Compute-Optimal Large Language Models. Mar. 29, 2022. arXiv:2203.15556 [cs]. Pre-published

  73. [111]

    Jared Kaplan et al. Scaling Laws for Neural Language Models. Jan. 22, 2020. arXiv:2001.08361 [cs, stat]. Pre-published

  74. [112]

    Xiao Bi et al. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. Jan. 5, 2024. arXiv:2401.02954 [cs]. url: http://arxiv.org/abs/2401.02954. Pre-published

  75. [113]

    Ashish Vaswani et al. Attention Is All You Need. Dec. 5, 2017. arXiv:1706.03762. Pre-published

  76. [114]

    Yinhan Liu et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Version 1. 2019. Pre-published

  77. [115]

    Generalized Subset Designs in Analytical Chemistry

    Izabella Surowiec et al. “Generalized Subset Designs in Analytical Chemistry”. In: Anal. Chem. 89.12 (June 20, 2017), pp. 6491–6497

  78. [116]

    Nonaqueous Liquid Electrolytes for Lithium-Based Rechargeable Batteries

    Kang Xu. “Nonaqueous Liquid Electrolytes for Lithium-Based Rechargeable Batteries”. In: Chem. Rev. 104.10 (Oct. 1, 2004), pp. 4303–4418

  79. [117]

    Electrolytes and Interphases in Li-Ion Batteries and Beyond

    Kang Xu. “Electrolytes and Interphases in Li-Ion Batteries and Beyond”. In: Chem. Rev. 114.23 (Dec. 10, 2014), pp. 11503–11618

  80. [118]

    Molecular Generation by Fast Assembly of (Deep)SMILES Fragments

    Francois Berenger and Koji Tsuda. “Molecular Generation by Fast Assembly of (Deep)SMILES Fragments”. In: J Cheminform 13.1 (Dec. 2021), p. 88

Showing first 80 references.
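
Entries 62 to 65 above are fragments of the supplementary dataset-preparation procedure rather than references. As a rough illustration of its last two steps (reweighting so that per-criterion pass rates hit chosen targets, then an 80/10/10 stratified split), the following is a minimal sketch in plain Python. It is not the authors' code. The function names, the toy pass/fail flags, and the target pass rates are all made up for illustration; the stratified split is a small standard-library stand-in for scikit-learn's StratifiedShuffleSplit; sampling with replacement and stratifying on "passes every criterion" are simplifying assumptions; and the RDKit parsing and InChI Key de-duplication steps are omitted.

```python
import random
from collections import defaultdict


def rake_weights(rows, targets, iters=200, tol=1e-10):
    """Iterative proportional fitting on binary pass/fail flags.

    rows    -- list of tuples of bools, one flag per criterion
    targets -- desired pass fraction per criterion, same order as the flags
    Returns one weight per row; the weights sum to 1.
    """
    n = len(rows)
    w = [1.0 / n] * n
    for _ in range(iters):
        worst = 0.0
        for j, target in enumerate(targets):
            passed = sum(wi for wi, r in zip(w, rows) if r[j])
            failed = sum(wi for wi, r in zip(w, rows) if not r[j])
            if passed == 0.0 or failed == 0.0:
                raise ValueError(f"criterion {j} has no passing or no failing rows")
            # Deviation of this marginal just before it is rescaled.
            worst = max(worst, abs(passed / (passed + failed) - target))
            up, down = target / passed, (1.0 - target) / failed
            w = [wi * (up if r[j] else down) for wi, r in zip(w, rows)]
        if worst < tol:
            break
    return w


def weighted_sample(weights, size, seed=0):
    """Draw row indices in proportion to their weights (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=size)


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle within each label group, then cut each group train/val/test."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, val, test = [], [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]
    return train, val, test


# Toy data only: four hypothetical pass/fail criteria per molecule.
toy_rng = random.Random(0)
toy_pass_rates = (0.9, 0.9, 0.8, 0.8)  # made-up input pass rates
toy_targets = (0.7, 0.7, 0.6, 0.6)     # made-up target pass rates
rows = [tuple(toy_rng.random() < p for p in toy_pass_rates) for _ in range(2000)]

weights = rake_weights(rows, toy_targets)
picked = weighted_sample(weights, size=500)
labels = [all(rows[i]) for i in picked]  # assumed stratification label
train, val, test = stratified_split(labels)
```

Each sweep rescales the weights of passing and failing rows for one criterion at a time, so that criterion's weighted pass rate matches its target exactly, and repeated sweeps pull all marginals toward their targets together. Splitting within each label group keeps the share of passing molecules roughly equal across the three subsets.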