UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
Pith reviewed 2026-05-15 22:45 UTC · model grok-4.3
The pith
UBio-MolFM reaches ab initio-level accuracy on out-of-distribution biomolecular systems up to 1500 atoms using a bio-specific dataset, linear-scaling equivariant transformer, and staged training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UBio-MolFM is a universal molecular foundation model that combines the UBio-Mol26 dataset built by a two-pronged multi-fidelity strategy, the E2Former-V2 linear-scaling equivariant transformer with Equivariant Axis-Aligned Sparsification and Long-Short Range modeling, and a three-stage curriculum that moves from energy initialization through energy-force consistency to force-focused refinement. When tested on liquid water structure, ionic solvation, peptide folding, and other observables, the resulting model delivers ab initio-level fidelity on large out-of-distribution biomolecular systems up to approximately 1500 atoms while supporting realistic molecular-dynamics trajectories.
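The staged schedule described above can be pictured as a loss whose energy and force weights change from stage to stage. The following is a minimal sketch under stated assumptions, not the paper's implementation: the stage keys, the numeric weights, and the use of mean absolute error are all hypothetical and chosen only to show the ordering (energy only, then balanced, then force-dominated).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageWeights:
    energy: float
    force: float


# Hypothetical weights: illustrative values only, not taken from the paper.
CURRICULUM = {
    "energy_init": StageWeights(energy=1.0, force=0.0),
    "energy_force_consistency": StageWeights(energy=1.0, force=1.0),
    "force_refinement": StageWeights(energy=0.1, force=10.0),
}


def mean_abs(values):
    """Mean absolute value of a non-empty sequence of numbers."""
    return sum(abs(v) for v in values) / len(values)


def curriculum_loss(stage, energy_errors, force_errors):
    """Weighted sum of mean absolute energy and force errors for one stage.

    energy_errors: per-structure (predicted - reference) energies.
    force_errors: flat list of per-component (predicted - reference) forces.
    """
    w = CURRICULUM[stage]
    return w.energy * mean_abs(energy_errors) + w.force * mean_abs(force_errors)
```

In this sketch, the first stage ignores force errors entirely, while the last stage lets them dominate, which is one way a force-focused refinement could reduce sensitivity to constant energy offsets.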
What carries the argument
E2Former-V2 linear-scaling equivariant transformer with Equivariant Axis-Aligned Sparsification and Long-Short Range modeling, trained via the three-stage curriculum on the UBio-Mol26 dataset.
If this is right
- Enables molecular-dynamics runs of protein-scale systems that match quantum forces without the usual scale-accuracy tradeoff.
- Reproduces structural observables such as water radial distribution functions and ion solvation shells at realistic temperatures.
- Supports folding trajectories for peptides that align with experimental timescales and endpoints.
- Delivers up to fourfold higher inference speed on large systems through the sparsification and long-short range design.
- Supplies a ready-to-deploy model for computing dynamics in out-of-distribution biomolecular environments.
Where Pith is reading between the lines
- The same three-stage protocol could be applied to other molecular domains such as materials or catalysis once an analogous multi-fidelity dataset is assembled.
- Longer-timescale simulations of conformational changes in proteins become feasible if the model maintains stability over millions of steps.
- Integration into existing molecular-dynamics packages would allow routine replacement of classical force fields with this higher-accuracy option for systems up to 1500 atoms.
Load-bearing premise
The multi-fidelity dataset and staged training produce a model whose quantum accuracy holds when the input shifts to real native protein environments and previously unseen large systems.
What would settle it
A direct comparison of UBio-MolFM forces and energies against new high-level ab initio reference calculations, on a biomolecule of 1200-1500 atoms drawn from an environment outside the UBio-Mol26 sampling, would settle it: systematic deviation would mean the generalization claim fails.
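A comparison of this kind usually comes down to a few error metrics over predicted and reference values. The sketch below assumes forces are given as per-atom lists of three components and that predictions and references share consistent units. The function names are hypothetical and not part of any UBio-MolFM tooling.

```python
import math


def force_mae(pred, ref):
    """Mean absolute error over all Cartesian force components."""
    diffs = [abs(p - r) for fp, fr in zip(pred, ref) for p, r in zip(fp, fr)]
    return sum(diffs) / len(diffs)


def force_rmse(pred, ref):
    """Root-mean-square error over all Cartesian force components."""
    sq = [(p - r) ** 2 for fp, fr in zip(pred, ref) for p, r in zip(fp, fr)]
    return math.sqrt(sum(sq) / len(sq))


def energy_error_per_atom(e_pred, e_ref, n_atoms):
    """Absolute total-energy error normalized by the number of atoms."""
    return abs(e_pred - e_ref) / n_atoms
```

Reporting both MAE and RMSE is a common choice because RMSE weights large outliers more heavily, so a gap between the two hints at a few badly predicted atoms rather than a uniform offset.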
Original abstract
All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal foundation model framework specifically engineered to bridge this gap. UBio-MolFM introduces three synergistic innovations: (1) UBio-Mol26, a large bio-specific dataset constructed via a multi-fidelity "Two-Pronged Strategy" that combines systematic bottom-up enumeration with top-down sampling of native protein environments (up to 1,200 atoms); (2) E2Former-V2, a linear-scaling equivariant transformer that integrates Equivariant Axis-Aligned Sparsification (EAAS) and Long-Short Range (LSR) modeling to capture non-local physics with up to ~4x higher inference throughput in our large-system benchmarks; and (3) a Three-Stage Curriculum Learning protocol that transitions from energy initialization to energy-force consistency, with force-focused supervision to mitigate energy offsets. Rigorous benchmarking across microscopic forces and macroscopic observables (including liquid water structure, ionic solvation, and peptide folding) demonstrates that UBio-MolFM achieves ab initio-level fidelity on large, out-of-distribution biomolecular systems (up to ~1,500 atoms) and realistic MD observables. By reconciling scalability with quantum precision, UBio-MolFM provides a robust, ready-to-use tool for the next generation of computational biology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents UBio-MolFM, a universal foundation model for biomolecular systems. It introduces the UBio-Mol26 dataset (constructed via bottom-up enumeration and top-down sampling of native protein environments up to 1,200 atoms), the E2Former-V2 linear-scaling equivariant transformer incorporating Equivariant Axis-Aligned Sparsification (EAAS) and Long-Short Range (LSR) modeling, and a three-stage curriculum learning protocol (energy initialization to energy-force consistency with force-focused supervision). Benchmarking on microscopic forces and macroscopic observables (liquid water structure, ionic solvation, peptide folding) is claimed to demonstrate ab initio-level fidelity on large out-of-distribution systems up to ~1,500 atoms.
Significance. If the central claims hold, the work would be significant for enabling scalable, quantum-accurate all-atom simulations of biological systems beyond the reach of direct QM methods, potentially providing a practical tool for studying protein dynamics and environments at realistic scales.
major comments (1)
- [Abstract] The claim of achieving 'ab initio-level fidelity on large, out-of-distribution biomolecular systems (up to ~1,500 atoms)' rests on macroscopic observables (liquid water structure, ionic solvation, peptide folding). These observables are insensitive to small force errors and can be reproduced by classical or semi-empirical potentials, so they do not confirm the asserted microscopic QM force accuracy for OOD cases where direct DFT/wavefunction validation is intractable.
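For context on the observable at issue, a radial distribution function g(r) is a normalized histogram of pair distances. The sketch below is illustrative only and assumes a cubic periodic box, the minimum-image convention, and r_max no larger than half the box length. The function name and signature are hypothetical.

```python
import math


def radial_distribution(positions, box_length, r_max, n_bins):
    """g(r) for points in a cubic periodic box (minimum-image convention).

    Assumes r_max <= box_length / 2 so each pair has one valid image.
    """
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                delta = a - b
                delta -= box_length * round(delta / box_length)  # nearest image
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                k = min(int(r / dr), n_bins - 1)  # guard against float edge cases
                counts[k] += 2  # the pair contributes to both atoms
    density = n / box_length ** 3
    g = []
    for k, c in enumerate(counts):
        # Volume of the spherical shell covered by bin k.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(c / (n * density * shell))
    return g
```

Because g(r) bins distances and, in practice, is averaged over many frames, it summarizes structure rather than individual forces, which is the distinction the comment above draws.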
minor comments (1)
- [Abstract] The phrase 'ab initio-level fidelity' requires a precise quantitative definition (e.g., specific error thresholds relative to a reference QM method such as DFT or CCSD(T)) to allow readers to assess the strength of the benchmarking results.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the major comment below and indicate the revisions we will incorporate.
Point-by-point responses
- Referee: [Abstract] The claim of achieving 'ab initio-level fidelity on large, out-of-distribution biomolecular systems (up to ~1,500 atoms)' rests on macroscopic observables (liquid water structure, ionic solvation, peptide folding). These observables are insensitive to small force errors and can be reproduced by classical or semi-empirical potentials, so they do not confirm the asserted microscopic QM force accuracy for OOD cases where direct DFT/wavefunction validation is intractable.
  Authors: We agree that direct microscopic QM force validation is intractable for the largest OOD systems (~1,500 atoms) and that macroscopic observables serve as an indirect proxy. The manuscript already reports microscopic force benchmarks on smaller OOD systems (where QM is feasible) and employs multi-fidelity training plus force-focused curriculum learning to target microscopic accuracy. To address the concern, we will revise the abstract to explicitly distinguish direct microscopic validation (where computationally tractable) from macroscopic validation for large OOD cases, and we will add a dedicated limitations paragraph discussing the sensitivity of macroscopic observables along with additional quantitative force-error metrics from our validation sets.
  Revision: partial
Circularity Check
No significant circularity; claims rest on empirical benchmarking of trained model
Full rationale
The paper describes construction of UBio-Mol26 via multi-fidelity sampling, the E2Former-V2 architecture with EAAS and LSR, and a three-stage curriculum from energy initialization to force supervision. It then reports benchmarking on microscopic forces plus macroscopic observables (water structure, solvation, peptide folding) for OOD systems up to ~1500 atoms. No quoted equations, self-citations, or uniqueness theorems reduce any performance claim to a fitted input or self-definition by construction. The validation observables are distinct from the training targets and are presented as external checks, making the derivation self-contained under the given criteria.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: E2Former-V2, a linear-scaling equivariant transformer that integrates Equivariant Axis-Aligned Sparsification (EAAS) and Long–Short Range (LSR) modeling... Three-Stage Curriculum Learning protocol that transitions from energy initialization to energy–force consistency
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: UBio-Mol26... up to 1,200 atoms... test set... 1,300–1,500 atoms... ωB97M-D3/def2-TZVPD
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.