Foundation Models for Discovery and Exploration in Chemical Space

arXiv: 2510.18900 · v2 · submitted 2025-10-20 · ⚛️ physics.chem-ph · cond-mat.mtrl-sci · cs.LG

Alexius Wadell, Anoushka Bhutani, Victor Azumah, Austin R. Ellis-Mohr, Andrew J. Stier, Kareem Hegazy, Alexander Brace, Hancheng Zhao, Celia Kelly, Anuj K. Nayak, Yuhan Chen, Dimitrios Simatos, Hongyi Lin, Murali Emani, Venkatram Vishwanath, Kevin Gering, Melisa Alkan, Tom Gibbs, Jack Wells, Wesley W. Qian, Richard C. Gerkin, Benjamin Amorelli, Alexander B. Wiltschko, Lav R. Varshney, Bharath Ramsundar, Karthik Duraisamy, Michael W. Mahoney, Arvind Ramanathan, Venkatasubramanian Viswanathan

Pith reviewed 2026-05-18 05:47 UTC · model grok-4.3

classification ⚛️ physics.chem-ph · cond-mat.mtrl-sci · cs.LG

keywords molecular foundation models · chemical space · structure-property prediction · olfactory perception · Smirk tokenizer · neural scaling laws · materials discovery · hyperbolic geometry

The pith

Molecular foundation models called MIST predict over 400 chemical properties and generalize to mapping molecular scents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce MIST, a family of molecular foundation models with up to an order of magnitude more parameters and training data than earlier efforts. These models rely on a new tokenizer named Smirk that encodes nuclear, electronic, and geometric details from molecular structures. After fine-tuning, the models handle more than 400 structure-property prediction tasks and match or exceed the state of the art on benchmarks spanning physiology and electrochemistry. They also address concrete applications such as screening electrolyte solvents, reasoning about organometallic stereochemistry, and estimating mixture properties. The clearest sign of foundation-model capability appears when the same models predict scent profiles for molecules and organize olfactory space in a hierarchical manner consistent with hyperbolic geometry, even though olfaction was neither an explicit target of training nor central to the developers' intentions.

Core claim

MIST models use the Smirk tokenizer to comprehensively capture nuclear, electronic, and geometric information from molecules, enabling them to learn diverse representations across chemical space. Fine-tuned versions predict more than 400 structure-property relationships with performance at or above the state of the art on diverse benchmarks. The models address real-world challenges such as multiobjective electrolyte solvent screening, stereochemical reasoning for organometallics, and mixture property prediction. They also accurately predict scent profiles and form a hierarchical representation of olfactory space that is consistent with hyperbolic geometry. Hyperparameter-aware Bayesian neural scaling laws remove the need for hyperparameter sweeps at every scale, which makes training large compute-optimal models feasible on a limited compute budget.

What carries the argument

The Smirk tokenizer, which encodes nuclear, electronic, and geometric information from molecular structures to support learning of broad representations in the MIST foundation models.
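
To make the idea of a tokenizer that keeps nuclear, electronic, and geometric detail concrete, the following is a minimal sketch of a glyph-level SMILES tokenizer. It is a toy written for illustration, not the Smirk implementation, and this page does not describe how Smirk actually splits its input. The function name `tokenize_glyphs`, both regular expressions, and the example strings are assumptions. One plausible reading is that isotope and element tokens carry the nuclear information, charge tokens the electronic information, and chirality markers the geometric information.

```python
import re

# Hypothetical toy patterns, not taken from Smirk.
# Outside brackets, only the OpenSMILES organic subset may appear bare.
_OUTSIDE = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|%\d\d|\d|[()=#$:/\\.\-\[]")
# Inside brackets: chirality, element symbol, aromatic symbol, digits, charge.
_INSIDE = re.compile(r"@@|@|[A-Z][a-z]?|[a-z]{1,2}|\d|[+\-:\]]")


def tokenize_glyphs(smiles):
    """Split a SMILES string into small glyph tokens without losing characters."""
    tokens = []
    pos = 0
    inside = False  # True while we are between '[' and ']'
    while pos < len(smiles):
        pattern = _INSIDE if inside else _OUTSIDE
        match = pattern.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at index {pos}")
        token = match.group()
        tokens.append(token)
        if token == "[":
            inside = True
        elif token == "]":
            inside = False
        pos = match.end()
    return tokens


# A bracketed stereocentre and a charged oxygen are split into separate glyphs.
print(tokenize_glyphs("C[C@@H](N)C(=O)[O-]"))
```

Because every character ends up in exactly one token, joining the tokens reproduces the input string, which is the property a lossless tokenizer needs. Whether such a split is "comprehensive" in the sense the paper claims is a separate question that this sketch does not address.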

If this is right

The models enable multiobjective screening of electrolyte solvents for desired performance criteria.
Stereochemical reasoning tasks for organometallic compounds become tractable without custom model development.
Properties of chemical mixtures can be estimated from component structures alone.
New problems in chemical space can be solved without explicit training on those exact tasks.
The models learn hierarchical representations that align with hyperbolic geometry in perceptual domains such as olfaction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the models truly capture general molecular features, they could extend to other sensory or biological domains that share underlying physical principles.
The Bayesian scaling laws may let researchers train still larger models on modest compute budgets by removing the need for repeated hyperparameter searches.
A single foundation model might eventually reduce reliance on many narrow, task-specific predictors in chemistry and materials science.
The observed hyperbolic structure in olfactory space raises the possibility that similar geometries appear in other learned representations of molecular or biological data.

Load-bearing premise

The Smirk tokenizer captures all relevant nuclear, electronic, and geometric information from molecular structures without significant loss or bias.

What would settle it

Direct comparison of MIST scent-profile predictions against measured human sensory data for a large set of molecules outside any training distribution, or quantitative verification that the learned olfactory embeddings exhibit negative curvature consistent with hyperbolic geometry.
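
One way to approach the quantitative check mentioned above is Gromov's four-point condition, which measures how tree-like a finite metric space is. The sketch below is an assumption about how such a check could be run, not the procedure used in the paper. The function name `gromov_delta` and both example distance matrices are hypothetical. A value of zero means the distances behave exactly like distances on a tree, and small values relative to the diameter are usually read as evidence of hyperbolic structure.

```python
from itertools import combinations
import math


def gromov_delta(dist):
    """Return the four-point Gromov delta of a symmetric distance matrix.

    For each quadruple, form the three pairwise distance sums and take half
    the gap between the two largest. The maximum over quadruples is delta.
    """
    n = len(dist)
    delta = 0.0
    for x, y, z, w in combinations(range(n), 4):
        sums = sorted((
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ))
        delta = max(delta, (sums[2] - sums[1]) / 2.0)
    return delta


# Leaves of a small binary tree with unit edges: a tree metric, so delta is 0.
tree_leaves = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Corners of a unit square in the Euclidean plane: not tree-like.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square = [[math.dist(p, q) for q in corners] for p in corners]

print(gromov_delta(tree_leaves), gromov_delta(square))
```

The loop visits every quadruple, so the cost grows with the fourth power of the number of points; for large embedding sets one would typically sample quadruples instead. A low delta supports a hyperbolic reading but does not by itself establish negative curvature.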

Figures

Figures reproduced from arXiv: 2510.18900 by Alexander Brace, Alexander B. Wiltschko, Alexius Wadell, Andrew J. Stier, Anoushka Bhutani, Anuj K. Nayak, Arvind Ramanathan, Austin R. Ellis-Mohr, Benjamin Amorelli, Bharath Ramsundar, Celia Kelly, Dimitrios Simatos, Hancheng Zhao, Hongyi Lin, Jack Wells, Kareem Hegazy, Karthik Duraisamy, Kevin Gering, Lav R. Varshney, Melisa Alkan, Michael W. Mahoney, Murali Emani, Richard C. Gerkin, Tom Gibbs, Venkatasubramanian Viswanathan, Venkatram Vishwanath, Victor Azumah, Wesley W. Qian, Yuhan Chen.

**Figure 1.** MIST models molecules across chemical space, accelerating a broad range of materials design tasks. (a) Molecular Insight SMILES Transformers (MIST) is a family of molecular FMs which match or exceed state-of-the-art performance across diverse benchmarks, from physiology to quantum chemistry. The models solve real-world problems across chemical space, ranging from multiobjective electrolyte solvent scree…

**Figure 2.** MIST rapidly explores chemical space. (a) Hierarchical clustering of logit correlations from a MIST-1.8B variant fine-tuned on scent classification recovers human-interpretable scent relationships; for example “roasted”, “meaty” and “beefy” are all highly correlated (brown cluster). (b) MIST-28M accurately predicts quantum, chemical, and thermodynamic descriptors for electrolyte design, including orbital …

**Figure 3.** Exploring chemical space beyond organic molecules: organometallics, isotopes, and mixtures. (a) To fine-tune MIST models on binary mixture properties, we use a specialized permutation-invariant task network. This task network uses MIST embeddings e_i to compute the linear mixing P_L and excess P_E contributions to the mixture property P_mix. (b) To fine-tune models on mixture excess properties, we curated a d…

**Figure 4.** Interpretability analysis reveals MIST’s potential as a robust tool for exploration and discovery. (a) Interpretable chemical features were extracted from every stage of MIST models, from token embeddings to downstream predictions. (b) A t-SNE projection of token embeddings shows dataset-specific structure. All three datasets shown cluster the halogens and digits. The transition metal Quantum Mechanics dataset…

**Figure 5.** Compute-optimal training was critical to efficiently scaling MIST. (a) We adopted MCMC sampling to parameterize neural scaling laws, with penalty terms accounting for off-optimal hyperparameter selection, enabling robust predictions of the compute-optimal frontier. (b) Covariance plot of posterior samples for E and α/β after fitting penalized neural scaling laws shows low variance in α/β but higher varian…
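
The Figure 3 caption describes a permutation-invariant task network that splits a binary mixture property into a linear mixing part and an excess part. The sketch below shows one simple way such a decomposition can be made symmetric in the two components. It is an illustration under stated assumptions, not the paper's task network: the function `mixture_property`, the linear readout weights `w_pure` and `w_excess`, the elementwise-product pair feature, and the mole-fraction weighting are all hypothetical choices.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mixture_property(x1, e1, e2, w_pure, w_excess):
    """Toy estimate of a binary mixture property as linear mixing plus excess.

    x1 is the mole fraction of component 1; e1 and e2 stand in for the
    per-molecule embeddings. All weights are hypothetical placeholders.
    """
    x2 = 1.0 - x1
    # Linear mixing: mole-fraction-weighted average of per-component readouts.
    p_linear = x1 * dot(w_pure, e1) + x2 * dot(w_pure, e2)
    # Excess term: the elementwise product is symmetric in (e1, e2), and the
    # x1 * x2 factor makes the excess vanish for either pure component.
    pair_features = [a * b for a, b in zip(e1, e2)]
    p_excess = x1 * x2 * dot(w_excess, pair_features)
    return p_linear + p_excess


e1 = [0.2, -1.0, 0.5]
e2 = [1.5, 0.3, -0.7]
w_pure = [0.4, 0.1, -0.2]
w_excess = [1.0, -2.0, 0.5]

# Swapping the component labels (and their mole fractions) gives the same value.
print(mixture_property(0.3, e1, e2, w_pure, w_excess))
print(mixture_property(0.7, e2, e1, w_pure, w_excess))
```

The design choice worth noting is that invariance comes from the structure of the function rather than from data augmentation: no ordering of the two components can change the output.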

Original abstract

Accurate prediction of atomistic, thermodynamic, and kinetic properties from molecular structures underpins materials innovation. Existing computational and experimental approaches lack the scalability required to navigate chemical space efficiently. Scientific foundation models trained on large unlabelled datasets offer a path towards navigating chemical space across application domains. Here, we develop MIST, a family of molecular foundation models with up to an order of magnitude more parameters and data than prior works. Trained using a novel tokenizer, Smirk, which comprehensively captures nuclear, electronic, and geometric information, MIST learns a diverse range of molecules. MIST models have been fine-tuned to predict more than 400 structure-property relationships and have been shown to match or exceed state-of-the-art performance across diverse benchmarks, from physiology to electrochemistry. We demonstrate the ability of these models to solve real-world problems across chemical space from multiobjective electrolyte solvent screening to stereochemical reasoning for organometallics and mixture property prediction. The clearest demonstration of a foundation model is its ability to solve problems that were neither explicit targets of training nor central to the intentions of its developers. We identify olfactory perception mapping as such a problem, and show that MIST accurately predicted scent profiles and learned a hierarchical representation of olfactory space consistent with hyperbolic geometry. We formulated hyperparameter aware Bayesian neural scaling laws which eliminate the need for hyperparameter sweeps at every scale, making training large compute-optimal models feasible on a limited compute budget. The methods and findings presented here represent a significant step towards accelerating materials discovery, design, and optimization using foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MIST, a family of large molecular foundation models trained on extensive unlabeled data using a novel Smirk tokenizer asserted to comprehensively encode nuclear, electronic, and geometric features from molecular structures. The models are fine-tuned on more than 400 structure-property tasks and reported to match or exceed SOTA performance across benchmarks spanning physiology to electrochemistry. Applications are demonstrated in electrolyte solvent screening, stereochemical reasoning, and mixture property prediction. The central evidence for foundation-model behavior is accurate prediction of scent profiles, which the Figure 2 caption attributes to a MIST-1.8B variant fine-tuned on scent classification, together with a learned hierarchical representation of olfactory space consistent with hyperbolic geometry. The work also formulates hyperparameter-aware Bayesian neural scaling laws to enable compute-optimal training without exhaustive sweeps.

Significance. If the performance and generalization claims are substantiated with detailed metrics and ablations, MIST would constitute a meaningful advance in scaling foundation models for chemical space navigation, with potential to accelerate materials discovery. The Bayesian scaling-law formulation that removes the need for per-scale hyperparameter sweeps is a concrete methodological strength that could be adopted more broadly. The olfactory result, if shown to be robust rather than spurious, would provide a strong falsifiable test of emergent property capture. These elements, taken together, would support the paper's positioning as a step toward practical foundation-model use in chemistry.

major comments (3)

[Abstract] Abstract: the claim that MIST models 'match or exceed state-of-the-art performance across diverse benchmarks' on >400 tasks is presented without any tabulated metrics, error bars, benchmark identifiers, or ablation results, rendering the central performance assertion impossible to evaluate from the given text.
[Smirk tokenizer description] Description of the Smirk tokenizer: the assertion that it 'comprehensively captures nuclear, electronic, and geometric information' is load-bearing for the zero-shot olfactory generalization, yet no reconstruction-error statistics, mutual-information scores with electronic properties (partial charges, HOMO/LUMO), or ablation on 3D conformer recovery are supplied; without these, the risk that olfactory predictions rest on incomplete or biased encodings cannot be quantified.
[Olfactory perception mapping] Olfactory perception results: the reported accurate scent-profile prediction and hyperbolic embedding hierarchy are presented as the clearest demonstration of foundation-model behavior, but the manuscript does not include controls or ablations showing that these outcomes survive removal of auxiliary electronic or geometric features; this omission directly affects the claim that the representation is comprehensive rather than correlational.

minor comments (2)

[Abstract] Abstract: the statement 'up to an order of magnitude more parameters and data than prior works' would be strengthened by explicit numerical comparison to the largest previously published molecular foundation models.
[Bayesian neural scaling laws] Scaling-laws section: the precise functional form of the hyperparameter-aware Bayesian neural scaling laws and the validation procedure against held-out empirical runs should be stated more explicitly to allow independent reproduction.
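
The functional form requested in the second minor comment is not given on this page. As a point of reference only, the sketch below assumes a widely used parametric form, loss = E + A / N^alpha + B / D^beta, whose symbols E, alpha, and beta match those named in the Figure 5 caption, and adds a hypothetical quadratic penalty in log learning rate to stand in for the "penalty terms accounting for off-optimal hyperparameter selection". The penalty shape, the compute approximation C = 6·N·D, the function names, and every numeric value are assumptions, and the MCMC fitting step described in the paper is not shown.

```python
import math


def penalized_loss(n_params, n_tokens, lr, *, E, A, B, alpha, beta, lr_opt, lam):
    """Assumed scaling law plus a hypothetical penalty for a mistuned learning rate."""
    base = E + A / n_params ** alpha + B / n_tokens ** beta
    penalty = lam * math.log(lr / lr_opt) ** 2  # zero when lr equals lr_opt
    return base + penalty


def compute_optimal_params(compute, *, A, B, alpha, beta):
    """Parameter count minimising the base loss subject to compute = 6 * N * D."""
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    return g * (compute / 6.0) ** (beta / (alpha + beta))


# Illustrative placeholder values, not fitted to any MIST run.
law = dict(E=1.5, A=400.0, B=400.0, alpha=0.3, beta=0.3)
budget = 1e18  # FLOPs

n_opt = compute_optimal_params(
    budget, A=law["A"], B=law["B"], alpha=law["alpha"], beta=law["beta"]
)


def loss_along_budget(n_params, lr=1e-3):
    """Loss when all of the budget is spent at a given model size."""
    n_tokens = budget / (6.0 * n_params)
    return penalized_loss(n_params, n_tokens, lr, lr_opt=1e-3, lam=0.1, **law)


print(n_opt, loss_along_budget(n_opt))
```

The closed form for `n_opt` follows from setting the derivative of the base loss to zero along the constraint D = C / (6N). Under this assumed form, a penalty term lets runs with imperfect hyperparameters still inform the fit, which is one way to read the claim that sweeps at every scale become unnecessary.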

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments highlight important areas where additional evidence and clarity will strengthen the manuscript. We address each major comment below and have revised the manuscript accordingly to provide the requested metrics, statistics, and controls.

Point-by-point responses

Referee: [Abstract] Abstract: the claim that MIST models 'match or exceed state-of-the-art performance across diverse benchmarks' on >400 tasks is presented without any tabulated metrics, error bars, benchmark identifiers, or ablation results, rendering the central performance assertion impossible to evaluate from the given text.

Authors: We agree that the abstract would benefit from greater specificity to allow direct evaluation of the performance claims. In the revised manuscript we have expanded the abstract to reference key benchmark identifiers and representative metrics (with error bars) drawn from the main results and supplementary tables. Detailed tabulated results, including per-task metrics, error bars, and ablation summaries across the >400 tasks, are now explicitly signposted in the main text and supplementary information.
Revision: yes

Referee: [Smirk tokenizer description] Description of the Smirk tokenizer: the assertion that it 'comprehensively captures nuclear, electronic, and geometric information' is load-bearing for the zero-shot olfactory generalization, yet no reconstruction-error statistics, mutual-information scores with electronic properties (partial charges, HOMO/LUMO), or ablation on 3D conformer recovery are supplied; without these, the risk that olfactory predictions rest on incomplete or biased encodings cannot be quantified.

Authors: We acknowledge that quantitative validation of the tokenizer's encoding capacity was not provided in the initial submission. In the revision we have added a dedicated supplementary section reporting (i) reconstruction-error statistics for molecular graphs and 3D structures, (ii) mutual-information scores between tokenizer-derived representations and electronic properties including partial charges and HOMO/LUMO energies, and (iii) results from an ablation study measuring 3D conformer recovery accuracy. These additions allow readers to assess the completeness of the encoding directly.
Revision: yes

Referee: [Olfactory perception mapping] Olfactory perception results: the reported accurate scent-profile prediction and hyperbolic embedding hierarchy are presented as the clearest demonstration of foundation-model behavior, but the manuscript does not include controls or ablations showing that these outcomes survive removal of auxiliary electronic or geometric features; this omission directly affects the claim that the representation is comprehensive rather than correlational.

Authors: We agree that explicit controls are necessary to distinguish comprehensive representation from spurious correlations. We have performed the requested ablation experiments and report the results in the revised manuscript: zero-shot olfactory prediction accuracy and the hyperbolic geometry of the learned embedding space are re-evaluated after systematic removal or masking of electronic and geometric features. The outcomes remain consistent, supporting the claim that the integrated representation drives the observed foundation-model behavior.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical training and external benchmarks drive results

Full rationale

The paper's core claims rest on training MIST models with the Smirk tokenizer on large unlabeled molecular datasets, followed by fine-tuning to predict over 400 structure-property relationships and evaluation on diverse external benchmarks spanning physiology, electrochemistry, and olfactory perception. These outcomes are data-driven performance measurements rather than algebraic reductions, self-definitional mappings, or predictions forced by fitted inputs. The hyperparameter-aware Bayesian neural scaling laws are presented as a training optimization method to avoid exhaustive sweeps, but they do not reduce any reported predictions to the inputs by construction. No load-bearing self-citations, imported uniqueness theorems, or ansatzes smuggled via prior work are invoked to justify the central results; the olfactory hyperbolic geometry finding is an observed empirical pattern in a fine-tuned model. The derivation chain is therefore self-contained against external validation data.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

Central claims depend on the effectiveness of the new tokenizer and the assumption that pre-training on unlabeled molecular data yields transferable representations; many training hyperparameters and model architectural choices are implicit free parameters.

free parameters (2)

model parameter count and training data volume: Chosen up to an order of magnitude larger than prior works to achieve claimed performance.
hyperparameters in Bayesian neural scaling laws: Formulated to avoid sweeps; specific values fitted or chosen during development.

axioms (1)

domain assumption: Large-scale pre-training on unlabeled molecular structures produces generalizable representations usable for downstream property prediction and zero-shot tasks. Invoked to justify training MIST on unlabelled datasets before fine-tuning.

invented entities (1)

Smirk tokenizer (no independent evidence)
purpose: To encode nuclear, electronic, and geometric information comprehensively from molecular structures.
Newly introduced component whose completeness is central to the claimed generalization.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost: Jcost functional equation and cosh identities (echoes)
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Paper passage: "learned a hierarchical representation of olfactory space consistent with hyperbolic geometry"

IndisputableMonolith/Foundation/AlphaCoordinateFixation: higher-derivative calibration of CostAlphaLog (refines)
refines: relation between the paper passage and the cited Recognition theorem.
Paper passage: "Smirk ... comprehensively captures nuclear, electronic, and geometric information"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery
cs.LG · 2026-05 · conditional · novelty 7.0

fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent...

Energy-Aware Routing to Large Reasoning Models
cs.AI · 2025-12 · unverdicted · novelty 4.0

In the critical regime for energy provisioning to large reasoning models, performance is volatility-limited, motivating variance-aware routing policies based on training and inference compute scaling laws.

Reference graph

Works this paper leans on

252 extracted references · 252 canonical work pages · cited by 2 Pith papers · 15 internal anchors

[1]

Chemical Space and Biology

Christopher M. Dobson. “Chemical Space and Biology”. In:Nature432.7019 (Dec. 1, 2004), pp. 824– 828

work page 2004
[2]

How Machine Learning Will Revolutionize Electrochemical Sciences

Aashutosh Mistry et al. “How Machine Learning Will Revolutionize Electrochemical Sciences”. In: ACS Energy Lett.6 (Mar. 23, 2021), pp. 1422–1431

work page 2021
[4]

July 24, 2024

Eduardo Soares et al.A Large Encoder-Decoder Family of Foundation Models For Chemical Language. July 24, 2024. arXiv:2407.20267. Pre-published

work page arXiv 2024
[5]

Chemical Space

Peter Kirkpatrick and Clare Ellis. “Chemical Space”. In:Nature432.7019 (Dec. 1, 2004), pp. 823–823

work page 2004
[6]

Up–down Approach for Expanding the Chemical Space of Metal–Organic Frame- works

Jiyeon Kim et al. “Up–down Approach for Expanding the Chemical Space of Metal–Organic Frame- works”. In:Nature Synthesis3.12 (Dec. 2024), pp. 1518–1528

work page 2024
[7]

Navigating Chemical Space for Biology and Medicine

Christopher Lipinski and Andrew Hopkins. “Navigating Chemical Space for Biology and Medicine”. In:Nature432.7019 (Dec. 2004), pp. 855–861

work page 2004
[8]

Generative AI for Navigating Synthesizable Chem- ical Space

Wenhao Gao, Shitong Luo, and Connor W. Coley. “Generative AI for Navigating Synthesizable Chem- ical Space”. In:Proceedings of the National Academy of Sciences122.41 (Oct. 14, 2025), e2415665122

work page 2025
[9]

Why Big Data and Compute Are Not Necessarily the Path to Big Materials Science

Naohiro Fujinuma et al. “Why Big Data and Compute Are Not Necessarily the Path to Big Materials Science”. In:Commun Mater3.1 (Aug. 30, 2022), p. 59

work page 2022
[12]

Alec Radford et al.Learning Transferable Visual Models From Natural Language Supervision. Feb. 26,

work page
[13]

Learning Transferable Visual Models From Natural Language Supervision

arXiv:2103.00020 [cs].url:http://arxiv.org/abs/2103.00020. Pre-published

work page internal anchor Pith review Pith/arXiv arXiv
[15]

A Vision–Language Foundation Model for Precision Oncology

Jinxi Xiang et al. “A Vision–Language Foundation Model for Precision Oncology”. In:Nature638.8051 (Feb. 2025), pp. 769–778

work page 2025
[16]

Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts

Charles O’Neill et al. “Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts”. In: Neurips 2024 Workshop Foundation Models for Science: Progress, Opportunities, and Challenges. Nov. 2, 2024

work page 2024
[18]

On the Opportunities and Risks of Foundation Models

Rishi Bommasani et al.On the Opportunities and Risks of Foundation Models. July 12, 2022. arXiv: 2108.07258 [cs].url:http://arxiv.org/abs/2108.07258. Pre-published

work page internal anchor Pith review Pith/arXiv arXiv 2022
[24]

Version 1

Yinhan Liu et al.RoBERTa: A Robustly Optimized BERT Pretraining Approach. Version 1. 2019. Pre-published. 24

work page 2019
[28]

Accelerating Electrolyte Discovery for Energy Storage with High-Throughput Screening

Lei Cheng et al. “Accelerating Electrolyte Discovery for Energy Storage with High-Throughput Screening”. In:Journal of Physical Chemistry Letters6.2 (2015), pp. 283–291

work page 2015
[31]

Flammability of Li-Ion Battery Electrolytes: Flash Point and Self-Extinguishing Time Measurements

Steffen Hess, Margret Wohlfahrt-Mehrens, and Mario Wachtler. “Flammability of Li-Ion Battery Electrolytes: Flash Point and Self-Extinguishing Time Measurements”. In:J. Electrochem. Soc.162.2 (2015), A3084–A3097

work page 2015
[32]

Petrucci.General Chemistry: Principles and Modern Applications

R.H. Petrucci.General Chemistry: Principles and Modern Applications. Pearson Education. Pearson Education International, 2007

work page 2007
[33]

Predicting Human Olfactory Perception from Chemical Features of Odor Molecules

Andreas Keller et al. “Predicting Human Olfactory Perception from Chemical Features of Odor Molecules”. In:Science355.6327 (Feb. 24, 2017), pp. 820–826

work page 2017
[34]

Combinatorial Receptor Codes for Odors

Bettina Malnic et al. “Combinatorial Receptor Codes for Odors”. In:Cell96.5 (Mar. 5, 1999), pp. 713–

work page 1999
[35]

Application of Artificial Intelligence to Decode the Relationships between Smell, Olfactory Receptors and Small Molecules

Rayane Achebouche et al. “Application of Artificial Intelligence to Decode the Relationships between Smell, Olfactory Receptors and Small Molecules”. In:Scientific Reports12.1 (Nov. 5, 2022), p. 18817

work page 2022
[37]

$$\alpha$$-Decay Half-Life Predictions with Support Vector Machine

Amir Jalili et al. “$$\alpha$$-Decay Half-Life Predictions with Support Vector Machine”. In:Sci- entific Reports14.1 (Dec. 28, 2024), p. 30776

work page 2024
[38]

Catalysis in the Excited State: Bringing Innate Transition Metal Photochemistry into Play

Fabio Juli´ a. “Catalysis in the Excited State: Bringing Innate Transition Metal Photochemistry into Play”. In:ACS Catal.15.6 (Mar. 21, 2025), pp. 4665–4680

work page 2025
[39]

Molecular Design Principles for Photoac- tive Transition Metal Complexes: A Guide for “Photo-Motivated

Giacomo Morselli, Christian Reber, and Oliver S. Wenger. “Molecular Design Principles for Photoac- tive Transition Metal Complexes: A Guide for “Photo-Motivated” Chemists”. In:J. Am. Chem. Soc. 147.14 (Apr. 9, 2025), pp. 11608–11624

work page 2025
[43]

Murtaza Zohair et al.Chemical Foundation Model Guided Design of High Ionic Conductivity Elec- trolyte Formulations. Mar. 20, 2025. arXiv:2503.14878 [cond-mat].url:http://arxiv.org/abs/ 2503.14878. Pre-published

work page arXiv 2025
[44]

MD simulations explain the excess molar enthalpies in pseudo- binary mixtures of a choline chloride-based deep eutectic solvent with water or methanol

Leon de Villiers Engelbrecht et al. “MD simulations explain the excess molar enthalpies in pseudo- binary mixtures of a choline chloride-based deep eutectic solvent with water or methanol”. In:Fron- tiers in Chemistry10 (2022), p. 983281

work page 2022
[45]

Algebraic Representation of Thermodynamic Properties and the Classification of Solutions

Otto Redlich and A. T. Kister. “Algebraic Representation of Thermodynamic Properties and the Classification of Solutions”. In:Ind. Eng. Chem.40.2 (Feb. 1948), pp. 345–348. 25

work page 1948
[47]

Definitions, Methods, and Applications in Interpretable Machine Learning

W. James Murdoch et al. “Definitions, Methods, and Applications in Interpretable Machine Learning”. In:Proceedings of the National Academy of Sciences116.44 (Oct. 29, 2019), pp. 22071–22080

work page 2019
[48]

Transformer Circuits Thread

Adly Templeton et al.Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread. May 21, 2024.url:https://transformer- circuits.pub/ 2024/scaling-monosemanticity/index.html

work page 2024
[49]

Sonia Joseph et al.Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video. 2025. arXiv:2504.19475 [cs.CV]

work page arXiv 2025
[50]

GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics

Maxim Zvyagin et al. “GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics”. In:The International Journal of High Performance Computing Applications37.6 (Nov. 2023), pp. 683–705

work page 2023
[51]

Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings

Christopher A. Lipinski et al. “Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings”. In:Advanced Drug Delivery Reviews 23.1–3 (Jan. 1997), pp. 3–25

work page 1997
[53]

A History of the Structural Theory of Benzene - The Aromatic Sextet Rule and Huckel’s Rule

Shigeaki Kikuchi. “A History of the Structural Theory of Benzene - The Aromatic Sextet Rule and Huckel’s Rule”. In:Journal of Chemical Education74.2 (Feb. 1, 1997), p. 194

work page 1997
[54]

Utkarsh Sharma and Jared Kaplan.A Neural Scaling Law from the Dimension of the Data Manifold. Apr. 22, 2020. arXiv:2004.10802 [cs, stat]. Pre-published

work page arXiv 2020
[55]

Sequence Modeling and Design from Molecular to Genome Scale with Evo

Eric Nguyen et al. “Sequence Modeling and Design from Molecular to Genome Scale with Evo”. In: Science386.6723 (Nov. 15, 2024), eado9336

work page 2024
[60]

Scaling Laws from the Data Manifold Dimension

Utkarsh Sharma and Jared Kaplan. “Scaling Laws from the Data Manifold Dimension”. In:J. Mach. Learn. Res.23.1 (Jan. 1, 2022), 9:343–9:376

work page 2022
[64]

Ben Sorscher et al.Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning. Apr. 21, 2023. arXiv:2206.14486 [cs].url:http://arxiv.org/abs/2206.14486. Pre-published

work page arXiv 2023
[65]

Language Models Are Few-Shot Learners

Tom Brown et al. “Language Models Are Few-Shot Learners”. In:Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc., 2020, pp. 1877–1901. [64]Huggingface/Transformers: Transformers: State-of-the-art Machine Learning for Pytorch, Tensor- Flow, and JAX.Version 4.40.2. May 6, 2024. [65]Microsoft/DeepSpeed. Microsoft, May 15, 2024. 26

work page 2020
[66]

Zenodo, June 7, 2023

William Falcon and The PyTorch Lightning team.PyTorch Lightning. Zenodo, June 7, 2023

work page 2023
[67]

Jerry Ma and Denis Yarats.On the Adequacy of Untuned Warmup for Adaptive Optimization. Mar. 19,

work page
[68]

Pre-published

arXiv:1910.04209 [cs, stat].url:http://arxiv.org/abs/1910.04209. Pre-published

work page arXiv 1910
[69]

Ilya Loshchilov and Frank Hutter.Decoupled Weight Decay Regularization. Jan. 4, 2019. arXiv:1711. 05101 [cs]. Pre-published. [69]Lightning-AI Torchmetrics. Version 1.4.0. Lightning AI, May 6, 2024

work page 2019
[70]

On the Art of Compiling and Using ’Drug-Like’ Chemical Fragment Spaces

J¨ org Degen et al. “On the Art of Compiling and Using ’Drug-Like’ Chemical Fragment Spaces”. In: ChemMedChem3.10 (Oct. 20, 2008), pp. 1503–1507

work page 2008
[73]

Patrice Porion et al. “Comparative Study on Transport Properties for LiFAP and LiPF6 in Alkyl-Carbonates as Electrolytes through Conductivity, Viscosity and NMR Self-Diffusion Measurements”. In: Electrochimica Acta 114 (Dec. 30, 2013), pp. 95–104.
[74]

Yumin Zhang, Imanuel Bier, and Venkatasubramanian Viswanathan. “Predicting Electrolyte Conductivity Directly from Molecular-Level Interactions”. In: ACS Energy Lett. 7.11 (Nov. 11, 2022), pp. 4061–4070.
[75]

Sunwook Hwang et al. “Ionic conduction and solution structure in LiPF6 and LiBF4 propylene carbonate electrolytes”. In: The Journal of Physical Chemistry C 122.34 (2018), pp. 19438–19446.
[78]

Alexandra Wahab et al. “The COMPAS Project: A Computational Database of Polycyclic Aromatic Systems. Phase 1: Cata-Condensed Polybenzenoid Hydrocarbons”. In: Journal of Chemical Information and Modeling 62.16 (Aug. 22, 2022), pp. 3704–3713.
[79]

Eduardo Mayo Yanes, Sabyasachi Chakraborty, and Renana Gershoni-Poranne. “COMPAS-2: A Dataset of Cata-Condensed Hetero-Polycyclic Aromatic Systems”. In: Scientific Data 11.1 (Jan. 19, 2024), p. 97.
[80]

Alexandra Wahab and Renana Gershoni-Poranne. COMPAS-3: A Data Set of Peri-Condensed Polybenzenoid Hydrocarbons. Feb. 26, 2024. url: https://chemrxiv.org/engage/chemrxiv/article-details/65d8c60ae9ebbb4db90f6276. Pre-published.
[81]

L. J. Schaad and B. A. Jr. Hess. “Huckel theory and aromaticity”. In: Journal of Chemical Education 51.10 (1974), p. 640. eprint: https://doi.org/10.1021/ed051p640.
[87]

Lukasz Maziarka et al. Molecule Attention Transformer. Feb. 19, 2020. arXiv:2002.08264 [cs]. Pre-published.
[88]

Juncai Li and Xiaofei Jiang. “Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction”. In: Wireless Communications and Mobile Computing 2021.1 (Jan. 2021). Ed. by Yulin Wang, p. 7181815.
[89]

Atakan Yüksel et al. “SELFormer: Molecular Representation Learning via SELFIES Language Models”. In: Mach. Learn.: Sci. Technol. 4.2 (June 1, 2023), p. 025035.
[90]

Naifeng Wen et al. “A Fingerprints Based Molecular Property Prediction Method Using the BERT Model”. In: J Cheminform 14.1 (Oct. 21, 2022), p. 71.
[92]

Afnan Sultan et al. Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years. Apr. 5, 2024. arXiv:2404.03969 [cs]. Pre-published.
[93]

Shion Honda, Shoi Shi, and Hiroki R. Ueda. SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery. Nov. 12, 2019. arXiv:1911.04738 [cs, stat]. url: http://arxiv.org/abs/1911.04738. Pre-published.
[94]

Jannis Born and Matteo Manica. “Regression Transformer Enables Concurrent Sequence Regression and Generation for Molecular Language Modelling”. In: Nature Machine Intelligence 5.4 (Apr. 2023), pp. 432–444.
[95]

Ross Irwin et al. “Chemformer: A Pre-Trained Transformer for Computational Chemistry”. In: Mach. Learn.: Sci. Technol. 3.1 (Mar. 1, 2022), p. 015022.
[96]

Dongyu Xue et al. “X-MOL: Large-Scale Pre-Training for Molecular Understanding and Diverse Molecular Analysis”. In: Science Bulletin 67.9 (May 2022), pp. 899–902.

Supplementary Information for Foundation Models for Discovery and Exploration in Chemical Space. Alexius Wadell∗1, Anoushka Bhutani∗1, Victor Azumah2, Austin R. Ellis-Mohr3, Celia Kelly1, Han...
[97]

Generate a single conformer using RDKit’s [77] ETKDGv3 [180].
[98]

Embed molecules using OpenBabel [181] and the UFF (Universal force field) [182] to generate a single starting conformer.
[99]

Generate 200 conformers using RDKit’s [77] ETKDGv3 [180] and select the lowest-energy conformer after relaxation with the UFF [182] or MMFF (Merck molecular force field) [183, 184]. We evaluated each method by computing all QM9-reported properties for up to 100 randomly selected molecules from the QM9 dataset [29]. Parity plots of our calculations versus th...
[100]

Remove any molecule that was rejected by rdkit’s MolFromSMILES.
[101]

De-duplicate the dataset using rdkit’s computed InChI Key.
[102]

Use iterative proportional refitting to randomly sample a balanced dataset.
[103]

Use scikit-learn’s StratifiedShuffleSplit to split the dataset into train/validation/test (80/10/10) while preserving the relative frequency of passing molecules.

                Initial    Resampled
H-Donor         99.2%      84.9%
H-Acceptor      98.9%      84.2%
MWT             97.1%      81.6%
Log P           96.2%      73.6%
Dataset Size    279,066    10,000

Table S7: Frequency of molecules passing each of Lipinski’s RO5 criteria, ...
[104]

Jerret Ross et al. “Large-Scale Chemical Language Representations Capture Molecular Structure and Properties”. In: Nat Mach Intell 4.12 (Dec. 2022), pp. 1256–1264.
[105]

Shang Zhu et al. “Differentiable Modeling and Optimization of Non-Aqueous Li-based Battery Electrolyte Solutions Using Geometric Deep Learning”. In: Nat Commun 15.1 (Oct. 5, 2024), p. 8649.
[106]

Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. May 24, 2019. arXiv:1810.04805 [cs]. Pre-published.
[107]

Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. Oct. 23, 2020. arXiv:2010.09885 [physics, q-bio]. url: http://arxiv.org/abs/2010.09885. Pre-published.
[108]

Enamine Ltd. REAL Space. 2024.
[109]

Alexius Wadell, Anoushka Bhutani, and Venkatasubramanian Viswanathan. Tokenization for Molecular Foundation Models. July 8, 2025. arXiv:2409.15370 [cs]. Pre-published.
[110]

Jordan Hoffmann et al. Training Compute-Optimal Large Language Models. Mar. 29, 2022. arXiv:2203.15556 [cs]. Pre-published.
[111]

Jared Kaplan et al. Scaling Laws for Neural Language Models. Jan. 22, 2020. arXiv:2001.08361 [cs, stat]. Pre-published.
[112]

Xiao Bi et al. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. Jan. 5, 2024. arXiv:2401.02954 [cs]. url: http://arxiv.org/abs/2401.02954. Pre-published.
[113]

Ashish Vaswani et al. Attention Is All You Need. Dec. 5, 2017. arXiv:1706.03762. Pre-published.
[114]

Yinhan Liu et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Version 1. 2019. Pre-published.
[115]

Izabella Surowiec et al. “Generalized Subset Designs in Analytical Chemistry”. In: Anal. Chem. 89.12 (June 20, 2017), pp. 6491–6497.
[116]

Kang Xu. “Nonaqueous Liquid Electrolytes for Lithium-Based Rechargeable Batteries”. In: Chem. Rev. 104.10 (Oct. 1, 2004), pp. 4303–4418.
[117]

Kang Xu. “Electrolytes and Interphases in Li-Ion Batteries and Beyond”. In: Chem. Rev. 114.23 (Dec. 10, 2014), pp. 11503–11618.
[118]

Francois Berenger and Koji Tsuda. “Molecular Generation by Fast Assembly of (Deep)SMILES Fragments”. In: J Cheminform 13.1 (Dec. 2021), p. 88.


