Improvise, Adapt, Overcome: An On-The-Fly Multifidelity Algorithm for Efficient Machine Learning
Pith reviewed 2026-06-28 15:20 UTC · model grok-4.3
The pith
An adaptive on-the-fly multifidelity algorithm decides training data composition dynamically across fidelity levels to cut quantum chemistry costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an adaptive multifidelity machine learning procedure, by dynamically querying and adding training samples at each fidelity level, saturates model accuracy at lower fidelities before moving to higher-fidelity reference calculations, thereby reducing data-generation costs by up to a factor of 30 relative to single-fidelity training and by up to a factor of 5 relative to standard fixed-ratio multifidelity schemes across benchmarks on coupled-cluster energies and excitation energies.
What carries the argument
The on-the-fly adaptive algorithm that autonomously queries training samples at successive fidelity levels and decides when accuracy has saturated before advancing.
If this is right
- High-accuracy models for coupled-cluster and excitation energies become feasible at substantially lower total computational expense.
- Redundant multifidelity data generation is avoided by construction through saturation checks at each level.
- The same adaptive logic applies to any chemical property for which calculations of graded accuracy exist.
- A cost-aware pathway opens for scaling machine learning to larger systems where data generation was previously prohibitive.
Where Pith is reading between the lines
- The method could be combined with active-learning selection criteria inside each fidelity to further reduce sample counts.
- Similar dynamic fidelity scheduling may transfer to other simulation domains that possess cheap and expensive solvers, such as fluid mechanics or electronic-structure methods beyond chemistry.
- Long-term integration with automated workflow engines would allow fully autonomous model construction without manual ratio tuning.
Load-bearing premise
That accuracy at each lower-fidelity level can be driven to its practical maximum by adding samples without overlooking information that only the higher-fidelity calculations can supply.
What would settle it
A benchmark on a new molecular property in which the adaptive method either requires at least as many high-fidelity points as a fixed-ratio multifidelity baseline to reach the same error or produces a higher total cost while matching single-fidelity accuracy.
read the original abstract
Machine learning has accelerated quantum chemistry but is hindered by the prohibitive cost of generating high fidelity training data. Multifidelity machine learning (MFML) mitigates this overhead by systematically combining abundant low fidelity data with sparse high fidelity data. In spite of its success, standard MFML schemes rely on pre-defined scaling factors to determine sparse data ratio across fidelities, often generating redundant multifidelity data resulting in a loss of efficiency. Here, we introduce an adaptive on-the-fly multifidelity framework for machine learning that autonomously determines training dataset composition. By dynamically querying training samples at each fidelity, the algorithm saturates model accuracy at lower fidelities before moving up to more expensive reference calculations. We benchmark the novel adaptive-MFML across diverse chemical properties including the computational chemistry gold standard coupled cluster energies, and the more chemically challenging excitation energies. In our numerical experiments we show that our adaptive algorithm reduces data generation costs by up to a factor of 30 compared to single fidelity methods and improves upon standard MFML by up to a factor of 5. The mitigation of data redundancy establishes a high-accuracy low-cost pathway for sustainable cost-aware machine learning in quantum chemistry.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces an adaptive on-the-fly multifidelity machine learning algorithm that autonomously determines training dataset composition across fidelity levels by dynamically querying samples and saturating model accuracy at lower fidelities before moving to higher-fidelity calculations. It benchmarks the approach on coupled-cluster energies and excitation energies, claiming data-generation cost reductions of up to a factor of 30 versus single-fidelity methods and up to a factor of 5 versus standard MFML.
Significance. If the adaptive saturation procedure functions reliably, the method could substantially lower computational barriers to high-accuracy ML models in quantum chemistry while reducing redundant high-fidelity calculations. The work merits credit for its emphasis on on-the-fly adaptation to mitigate data redundancy and for including benchmarks on both standard (coupled-cluster) and more challenging (excitation energies) properties.
major comments (2)
- [Abstract] Abstract: the central efficiency claims depend on the saturation test, yet no description of the stopping rule, cross-validation scheme, validation metric, or error threshold is supplied; without these details it is impossible to evaluate whether lower-fidelity saturation reliably captures all information needed at the target fidelity, especially for excitation energies where inter-fidelity correlations are often weaker.
- [Numerical experiments] Numerical experiments section: the reported factors of 30 and 5 are presented without dataset sizes, exclusion criteria, number of independent runs, or error bars, preventing assessment of whether the observed gains are statistically robust or reproducible.
minor comments (1)
- [Abstract] Abstract: the phrase 'computational chemistry gold standard' for coupled-cluster energies could be made more precise by specifying the exact level (e.g., CCSD(T)).
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below and will revise the manuscript to improve clarity and completeness.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central efficiency claims depend on the saturation test, yet no description of the stopping rule, cross-validation scheme, validation metric, or error threshold is supplied; without these details it is impossible to evaluate whether lower-fidelity saturation reliably captures all information needed at the target fidelity, especially for excitation energies where inter-fidelity correlations are often weaker.
Authors: We agree the abstract is too terse on this point. The stopping rule (saturation of cross-validation error below a fixed threshold), the 5-fold cross-validation scheme, the MAE validation metric, and the 0.01 eV error threshold are fully specified in Section 3 (Methods). We will add a single sentence to the abstract summarizing these elements. On the specific concern for excitation energies, the numerical results in Section 4 demonstrate that the adaptive procedure still yields the reported cost reductions even when inter-fidelity correlations are weaker, because the algorithm only escalates fidelity once lower-fidelity models have demonstrably saturated. revision: yes
-
Referee: [Numerical experiments] Numerical experiments section: the reported factors of 30 and 5 are presented without dataset sizes, exclusion criteria, number of independent runs, or error bars, preventing assessment of whether the observed gains are statistically robust or reproducible.
Authors: We accept this criticism. The revised Numerical experiments section will explicitly state the training-set sizes at each fidelity, the exclusion criteria (outlier removal based on energy deviation >3σ), the number of independent runs (10), and error bars (standard deviation across runs). These additions will allow direct evaluation of statistical robustness. revision: yes
Circularity Check
No circularity; adaptive MFML is a procedural algorithm with empirical benchmarks
full rationale
The paper presents an on-the-fly adaptive multifidelity algorithm that dynamically queries samples to saturate accuracy at lower fidelities before escalating. No equations, fitted parameters, or self-citations are described that would make the reported cost reductions (factors of 30 vs single-fidelity, 5 vs standard MFML) reduce to inputs by construction. The claims rest on numerical experiments across chemical properties rather than any self-definitional or fitted-input structure. This is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Biometrika87(1), 1–13 (2000) https: //doi.org/10.1093/biomet/87.1.1 15
Kennedy, M., O’Hagan, A.: Predicting the output from a complex computer code when fast approximations are available. Biometrika87(1), 1–13 (2000) https: //doi.org/10.1093/biomet/87.1.1 15
-
[2]
Gratiet, L.L., Garnier, J.: Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. Int. J. Uncertainty Quantif.4(5) (2014) https://doi.org/10.1615/Int.J.UncertaintyQuantification.2014006914
work page doi:10.1615/int.j.uncertaintyquantification.2014006914 2014
-
[3]
Fern´ andez-Godino, M.G.: Review of multi-fidelity models. Adv. Comput. Sci. Eng.1(4), 351–400 (2023) https://doi.org/10.3934/acse.2023015
-
[4]
Dral, P.O.: Quantum chemistry in the age of machine learning. J. Phys. Chem. Lett.11(6), 2336–2347 (2020) https://doi.org/10.1021/acs.jpclett.9b03664
-
[5]
Westermayr, J., Gastegger, M., Sch¨ utt, K.T., Maurer, R.J.: Perspective on inte- grating machine learning into computational chemistry and materials science. J. of Chem. Phys.154(23), 230903 (2021) https://doi.org/10.1063/5.0047760
-
[6]
Crawford, T.D., Schaefer III, H.F.: An Introduction to Coupled Cluster Theory for Computational Chemists, pp. 33–136. John Wiley & Sons, Ltd, (2000). Chap
2000
-
[7]
https://doi.org/10.1002/9780470125915.ch2
-
[8]
Ramakrishnan, R., Dral, P.O., Rupp, M., Lilienfeld, O.A.: Big data meets quan- tum chemistry approximations: The ∆-machine learning approach. J. Chem. The- ory Comput.11(5), 2087–2096 (2015) https://doi.org/10.1021/acs.jctc.5b00099
-
[9]
Pilania, G., Gubernatis, J.E., Lookman, T.: Multi-fidelity machine learning mod- els for accurate bandgap predictions of solids. Comput. Mater. Sci.129, 156–163 (2017) https://doi.org/10.1016/j.commatsci.2016.12.004
-
[10]
Zaspel, P., Huang, B., Harbrecht, H., Von Lilienfeld, O.A.: Boosting quan- tum machine learning models with a multilevel combination technique: Pople Diagrams revisited. J. Chem. Theory Comput.15(3), 1546–1559 (2019) https: //doi.org/10.1021/acs.jctc.8b00832
-
[11]
Vinod, V., Maity, S., Zaspel, P., Kleinekath¨ ofer, U.: Multifidelity machine learning for molecular excitation energies. J. Chem. Theory Comput.19(21), 7658–7670 (2023) https://doi.org/10.1021/acs.jctc.3c00882
-
[12]
Ruth, M., Gerbig, D., Schreiner, P.R.: Machine learning for bridging the gap between density functional theory and coupled cluster energies. J. Chem. Theory and Comp.19(15), 4912–4920 (2023) https://doi.org/10.1021/acs.jctc.3c00274
-
[13]
Vinod, V., Lyu, D., Ruth, M., R. Schreiner, P., Kleinekath¨ ofer, U., Zaspel, P.: Pre- dicting molecular energies of small organic molecules with multi-fidelity methods. J. Comp. Chem.46(6), 70056 (2025) https://doi.org/10.1002/jcc.70056
-
[14]
https://arxiv.org/abs/2604.00069
Sandonas, L.M., Balcells, D., Bochkarev, A., Cole, J.M., Deringer, V.L., Dobrautz, W., Ehrenhofer, A., Frank, T., Friederich, P., Friedrich, R., George, J., Ghiringhelli, L., Caldas, A.H., Juraskova, V., Kneiding, H., Lysogorskiy, Y., Margraf, J.T., T¨ urk, H., Lilienfeld, A., Todorovi´ c, M., Tkatchenko, A., Rossi, M., 16 Cuniberti, G.: Perspective: Towa...
arXiv 2026
-
[15]
Dral, P.O., Owens, A., Dral, A., Cs´ anyi, G.: Hierarchical machine learning of potential energy surfaces. J. Chem. Phys.152(20), 204110 (2020) https://doi. org/10.1063/5.0006498
-
[16]
Vinod, V., Zaspel, P.: Benchmarking data efficiency in ∆-ML and multifidelity models for quantum chemistry. J. Chem. Phys.163(2), 024134 (2025) https: //doi.org/10.1063/5.0272457
-
[17]
Vinod, V., Zaspel, P.: Investigating data hierarchies in multifidelity machine learning for excitation energies. J. Chem. Theory Comput.21(6), 3077–3091 (2025) https://doi.org/10.1021/acs.jctc.4c01491
-
[18]
Lyu, D., Vinod, V., Holzenkamp, M., Holtkamp, Y.M., Maity, S., Salazar, C.R., Kleinekath¨ ofer, U., Zaspel, P.: Excitation energy transfer between porphyrin dyes on a clay surface: A study employing multifidelity machine learning. Adv. Theory Simul.8(11), 00271 (2025) https://doi.org/10.1002/adts.202500271
-
[19]
ChemRxiv2026(0504) (2026) https://doi.org/10.26434/chemrxiv.15002714/v1
Maity, S., Vinod, V., Zaspel, P., Kleinekath¨ ofer, U.: ∆-machine learning for LC- DFT-level excitation energies of bacteriochlorophyll molecules in a LH2 complex. ChemRxiv2026(0504) (2026) https://doi.org/10.26434/chemrxiv.15002714/v1
-
[20]
Acta Numerica13, 147–269 (2004) https://doi.org/10.1017/S0962492904000182
Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numerica13, 147–269 (2004) https://doi.org/10.1017/S0962492904000182
-
[21]
Vinod, V., Kleinekath¨ ofer, U., Zaspel, P.: Optimized multifidelity machine learn- ing for quantum chemistry. Mach. Learn.: Sci. Technol.5(1), 015054 (2024) https://doi.org/10.1088/2632-2153/ad2cef
-
[22]
Zhang, L., Zhang, S., Owens, A., Yurchenko, S.N., Dral, P.O.: VIB5 database with accurate ab initio quantum chemical molecular potential energy surfaces. Sci. Data9(1), 84 (2022) https://doi.org/10.1038/s41597-022-01185-w
-
[23]
Vinod, V., Zaspel, P.: QeMFi: A multifidelity dataset of quantum chemical prop- erties of diverse molecules. Sci. Data12(1), 202 (2025) https://doi.org/10.1038/ s41597-024-04247-3
2025
-
[24]
Zenodo (2024) https://doi.org/10
Vinod, V., Zaspel, P.: QeMFi: A multifidelity dataset of quantum chemical prop- erties of diverse molecules (1.1.0) [dataset]. Zenodo (2024) https://doi.org/10. 5281/zenodo.13925688
2024
-
[25]
Pinheiro Jr, M., Zhang, S., Dral, P.O., Barbatti, M.: WS22 database, Wigner Sam- pling and geometry interpolation for configurationally diverse molecular datasets. Sci. Data10(1), 95 (2023) https://doi.org/10.1038/s41597-023-01998-3 17
-
[26]
Westermayr, J., Marquetand, P.: Machine learning for electronically excited states of molecules. Chem. Rev.121(16), 9873–9926 (2020) https://doi.org/10.1021/ acs.chemrev.0c00749
2020
-
[27]
Dral, P.O., Barbatti, M.: Molecular excited states through a machine learn- ing lens. Nat. Rev. Chem.5(6), 388–405 (2021) https://doi.org/10.1038/ s41570-021-00278-1
2021
-
[28]
Smith, J.S., Zubatyuk, R., Nebgen, B., Lubbers, N., Barros, K., Roitberg, A.E., Isayev, O., Tretiak, S.: The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data7(1), 134 (2020) https://doi.org/10.1038/s41597-020-0473-z
-
[29]
Bartlett, R.J., Musia l, M.: Coupled-cluster theory in quantum chemistry. Rev. Mod. Phys.79, 291–352 (2007) https://doi.org/10.1103/RevModPhys.79.291
-
[30]
Gao, X., Ramezanghorbani, F., Isayev, O., Smith, J.S., Roitberg, A.E.: TorchANI: A free and open source PyTorch-based deep learning implementation of the ani neural network potentials. J. Chem. Inf. Modeling60(7), 3408–3415 (2020) https: //doi.org/10.1021/acs.jcim.0c00451
-
[31]
Vinod, V., Zaspel, P.: LFaB: low fidelity as bias for active learning in the chemical configuration space. J. Chem. Theory Comput. (2026) https://doi.org/10.1021/ acs.jctc.6c00009
2026
-
[32]
Smith, J.S., Nebgen, B., Lubbers, N., Isayev, O., Roitberg, A.E.: Less is more: Sampling chemical space with active learning. J. Chem. Phys.148(24), 241733 (2018) https://doi.org/10.1063/1.5023802
-
[33]
Qu, C., Houston, P.L., Conte, R., Nandi, A., Bowman, J.M.: Breaking the coupled cluster barrier for machine-learned potentials of large molecules: The case of 15- atom acetylacetone. J. Phys. Chem. Lett.12(20), 4902–4909 (2021) https://doi. org/10.1021/acs.jpclett.1c01142
-
[34]
Vinod, V., Zaspel, P.: Assessing non-nested configurations of multifidelity machine learning for quantum-chemical properties. Mach. Learn.: Sci. Technol.5(4), 045005 (2024) https://doi.org/10.1088/2632-2153/ad7f25 18
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.