pith. sign in

Foundation Models for Discovery and Exploration in Chemical Space

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it
abstract

Accurate prediction of atomistic, thermodynamic, and kinetic properties from molecular structures underpins materials innovation. Existing computational and experimental approaches lack the scalability required to navigate chemical space efficiently. Scientific foundation models trained on large unlabelled datasets offer a path towards navigating chemical space across application domains. Here, we develop MIST, a family of molecular foundation models with up to an order of magnitude more parameters and data than prior works. Trained using a novel tokenizer, Smirk, which comprehensively captures nuclear, electronic, and geometric information, MIST learns a diverse range of molecules. MIST models have been fine-tuned to predict more than 400 structure-property relationships and have been shown to match or exceed state-of-the-art performance across diverse benchmarks, from physiology to electrochemistry. We demonstrate the ability of these models to solve real-world problems across chemical space from multiobjective electrolyte solvent screening to stereochemical reasoning for organometallics and mixture property prediction. The clearest demonstration of a foundation model is its ability to solve problems that were neither explicit targets of training nor central to the intentions of its developers. We identify olfactory perception mapping as such a problem, and show that MIST accurately predicted scent profiles and learned a hierarchical representation of olfactory space consistent with hyperbolic geometry. We formulated hyperparameter aware Bayesian neural scaling laws which eliminate the need for hyperparameter sweeps at every scale, making training large compute-optimal models feasible on a limited compute budget. The methods and findings presented here represent a significant step towards accelerating materials discovery, design, and optimization using foundation models.

citation-role summary

background 1

citation-polarity summary

fields

cs.AI 1 cs.LG 1

years

2026 1 2025 1

roles

background 1

polarities

background 1

representative citing papers

fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery

cs.LG · 2026-05-10 · conditional · novelty 7.0

fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent latents than standard crosscoders on GPT2-Small, Pythia, and Gemma2 models.

Energy-Aware Routing to Large Reasoning Models

cs.AI · 2025-12-23 · unverdicted · novelty 4.0

In the critical regime for energy provisioning to large reasoning models, performance is volatility-limited, motivating variance-aware routing policies based on training and inference compute scaling laws.

citing papers explorer

Showing 2 of 2 citing papers.

  • fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery cs.LG · 2026-05-10 · conditional · none · ref 28 · internal anchor

    fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent latents than standard crosscoders on GPT2-Small, Pythia, and Gemma2 models.

  • Energy-Aware Routing to Large Reasoning Models cs.AI · 2025-12-23 · unverdicted · none · ref 22 · internal anchor

    In the critical regime for energy provisioning to large reasoning models, performance is volatility-limited, motivating variance-aware routing policies based on training and inference compute scaling laws.