Derivative Informed Learning of Exchange-Correlation Functionals
Pith reviewed 2026-06-28 10:29 UTC · model grok-4.3
The pith
Supervising first and second energy derivatives on density matrices makes machine-learned XC functionals match hybrid targets more closely.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a hybrid-distillation setting, Derivative Informed XC-Loss (DI-Loss) supervises first and second derivatives of the energy on the Grassmannian of admissible density matrices. Relative to energy-plus-density supervision alone, the resulting functionals have a total-energy MAE that is 66 percent lower on average across architectures, and the density-sensitive E_ρ metric improves from 1.2 to 0.8 mEh. Their densities reduce hybrid SCF iterations by up to 50 percent, and Hessian supervision lowers TDDFT excitation-energy MAE by 19-35 percent.
What carries the argument
Derivative Informed XC-Loss (DI-Loss), a training objective that adds first- and second-order derivative supervision on the Grassmannian to align the learned functional's local response with the target hybrid functional.
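As a rough illustration of what a derivative-informed objective looks like, the sketch below compares a toy model energy with a toy target energy through their values and their first and second directional derivatives at a density matrix P. Everything in it is an assumption made for illustration: the names `directional_derivatives` and `di_loss`, the quadratic toy energies, the weights, and the use of central finite differences along random symmetric directions. No tangent-space projection is applied, and a practical implementation would more likely use automatic differentiation. It is not the paper's implementation.

```python
import numpy as np

def directional_derivatives(energy, P, V, h=1e-3):
    """Energy and its first and second directional derivatives along V (central differences)."""
    e0, ep, em = energy(P), energy(P + h * V), energy(P - h * V)
    return np.array([e0, (ep - em) / (2 * h), (ep - 2 * e0 + em) / h**2])

def di_loss(model, target, P, directions, weights=(1.0, 1.0, 1.0)):
    """Weighted squared mismatch in energy, slope, and curvature, averaged over directions."""
    w = np.asarray(weights)
    total = 0.0
    for V in directions:
        diff = directional_derivatives(model, P, V) - directional_derivatives(target, P, V)
        total += float(np.sum(w * diff**2))
    return total / len(directions)

# Hypothetical toy setup: a symmetric matrix H and a simple quadratic term.
rng = np.random.default_rng(0)
n = 4
H = rng.standard_normal((n, n))
H = 0.5 * (H + H.T)
P = np.diag([1.0, 1.0, 0.0, 0.0])  # idempotent toy density matrix

def target(P):
    return np.trace(H @ P) + 0.5 * np.trace(P @ P)

def make_model(c):
    # c = 1.0 reproduces the target; other values mis-scale the quadratic term.
    return lambda P: np.trace(H @ P) + 0.5 * c * np.trace(P @ P)

directions = []
for _ in range(3):
    A = rng.standard_normal((n, n))
    directions.append(0.5 * (A + A.T))  # random symmetric perturbations

loss_exact = di_loss(make_model(1.0), target, P, directions)
loss_off = di_loss(make_model(0.5), target, P, directions)
print(loss_exact, loss_off)
```

In this toy case the loss vanishes when the model reproduces the target and grows when the quadratic term is mis-scaled, because the energy, slope, and curvature terms all pick up the mismatch.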
If this is right
- Total-energy mean absolute error falls by 66 percent when averaged uniformly across architectures.
- The density-sensitive mean-field energy metric E_ρ improves from 1.2 to 0.8 mEh on average.
- Densities obtained from the distilled functionals reduce the number of hybrid-functional SCF iterations by up to 50 percent.
- Hessian supervision in the loss lowers mean excitation-energy MAE in downstream TDDFT calculations by 19-35 percent.
Where Pith is reading between the lines
- The same derivative-supervision idea could be applied to other reference functionals or to properties beyond energies, such as forces, to improve geometry predictions.
- If the response alignment persists across molecular sizes, the distilled models might serve as cheap initial guesses that accelerate large-scale hybrid calculations.
- Testing whether the learned functionals remain stable when used in geometry optimizations or molecular dynamics would reveal whether local-response matching transfers to dynamical properties.
Load-bearing premise
That supervising first and second derivatives on the Grassmannian will produce a functional whose local response properties match the target without introducing SCF instabilities or inconsistencies in downstream observables.
What would settle it
Training the same architectures with DI-Loss and then observing that the resulting functionals require more SCF iterations than the energy-plus-density baselines on a held-out set of molecules would falsify the central claim.
Original abstract
Machine-learned (ML) exchange-correlation (XC) functionals aim to replace human-designed density functional approximations by learning directly from reference data, but they still do not consistently outperform traditional $\mathcal{O}(N^4)$-scaling hybrid functionals. We study a hybrid-distillation setting in which $\mathcal{O}(N^3)$-scaling ML-XC functionals are trained to reproduce B3LYP/def2-SVP targets. We introduce Derivative Informed XC-Loss (DI-Loss), a loss that incorporates additional information from the reference hybrid functional by supervising first and second derivatives of the energy on the Grassmannian of admissible density matrices. Rather than only matching the self-consistent fixed point, DI-Loss aligns the local first- and second-order response of the learned functional with that of the target functional. Across four evaluated architectures, DI-Loss consistently improves the main energy metrics. Averaged uniformly across architectures, the total-energy MAE decreases by 66% relative to energy and density supervision alone. The density-sensitive mean-field energy metric $E_\rho$ improves from $1.2$ to $0.8$ mEh on average, while dipole and $\mathcal{L}_2$ density errors do not improve uniformly. We further show that densities from the distilled functionals reduce hybrid-functional SCF iterations by up to 50%. In downstream TDDFT calculations, Hessian supervision improves excited-state predictions, with XCdiff reducing the mean excitation-energy MAE by 19-35%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Derivative Informed XC-Loss (DI-Loss) in a hybrid-distillation setting where O(N^3) ML exchange-correlation functionals are trained to reproduce B3LYP/def2-SVP targets. The loss augments standard energy and density matching by supervising first- and second-order energy derivatives on the Grassmannian of admissible density matrices, with the goal of aligning local response properties. Across four architectures the approach yields an average 66% reduction in total-energy MAE relative to energy/density supervision, improves the density-sensitive metric E_ρ from 1.2 to 0.8 mEh, reduces hybrid SCF iterations by up to 50%, and lowers TDDFT excitation-energy MAE by 19-35%.
Significance. If the numerical gains are robust, the work shows that explicit derivative supervision on the Grassmannian can materially improve the fidelity of learned XC functionals to a hybrid reference while preserving O(N^3) inference cost. The direct linkage between the added loss terms and the reported gains in energy, E_ρ, SCF convergence, and downstream TDDFT accuracy is a clear strength of the experimental design.
major comments (2)
- [§3.2] §3.2 (DI-Loss definition): the central claim that Grassmannian supervision of first and second derivatives produces response properties aligned with B3LYP without introducing SCF instabilities or inconsistencies rests on the untested assumption that the chosen manifold and loss weights are sufficient; the manuscript provides no explicit verification (e.g., eigenvalue spectra of the Hessian or convergence-failure rates) that this holds for all tested molecules.
- [Table 2, §4.1] Table 2 and §4.1: the 66% MAE reduction and E_ρ improvement are reported as uniform averages across architectures, yet no per-architecture standard deviations, number of test systems, or statistical significance tests are supplied, leaving open whether the gains are uniformly load-bearing or driven by a subset of cases.
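The distinction raised in the second comment can be made concrete. The sketch below uses made-up placeholder MAE values (not results from the paper) and hypothetical variable names to show that a uniform average of per-architecture relative reductions generally differs from the reduction of the pooled mean MAE, and that the per-architecture spread is easy to report alongside either number.

```python
from statistics import mean, stdev

# Placeholder MAE values per architecture (illustrative only, not from the paper).
mae_baseline = [4.0, 2.0, 1.0, 1.0]  # energy + density supervision
mae_di_loss = [1.0, 1.0, 0.5, 0.2]   # with derivative supervision added

# Relative reduction for each architecture.
per_arch = [1.0 - new / old for new, old in zip(mae_di_loss, mae_baseline)]

# Uniform average: every architecture counts equally.
uniform_average = mean(per_arch)
# Pooled reduction: dominated by architectures with a large baseline MAE.
pooled_reduction = 1.0 - mean(mae_di_loss) / mean(mae_baseline)
# Sample standard deviation of the per-architecture reductions.
spread = stdev(per_arch)

print(f"per-architecture reductions: {per_arch}")
print(f"uniform: {uniform_average:.4f}  pooled: {pooled_reduction:.4f}  std: {spread:.4f}")
```

With these placeholder inputs the two summaries disagree, which is why stating which one a headline percentage refers to, together with the spread, matters for interpreting it.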
minor comments (2)
- [§3] The notation for the Grassmannian projection operator and the precise definition of the second-derivative term (Hessian supervision) would benefit from an expanded equation or pseudocode block to aid reproducibility.
- [§4.3] Figure captions for the TDDFT results should explicitly state whether the reported MAE reductions are computed on molecules seen during training or on a held-out test set.
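For readers unfamiliar with the projection mentioned in the first minor comment, the sketch below shows one standard way to project a symmetric matrix onto the tangent space of the Grassmannian at an idempotent density matrix P, namely P G (I - P) + (I - P) G P. It assumes an orthonormal basis (so that P is an orthogonal projector with P @ P = P) and uses hypothetical function names; the manuscript's actual operator and normalization conventions may differ.

```python
import numpy as np

def density_from_orbitals(C):
    """Projector onto the span of the orthonormal columns of C."""
    return C @ C.T

def project_to_tangent(P, G):
    """Orthogonal projection of (the symmetric part of) G onto the tangent space at P."""
    G = 0.5 * (G + G.T)
    Q = np.eye(P.shape[0]) - P
    # Only the occupied-virtual blocks survive; occupied-occupied and
    # virtual-virtual blocks do not change the subspace to first order.
    return P @ G @ Q + Q @ G @ P

rng = np.random.default_rng(0)
C, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # 6 basis functions, 2 occupied orbitals
P = density_from_orbitals(C)
G = rng.standard_normal((6, 6))                   # stand-in for an energy gradient
X = project_to_tangent(P, G)

print(np.allclose(P @ X + X @ P, X))              # tangent-space condition
print(np.allclose(project_to_tangent(P, X), X))   # projecting twice changes nothing
```

Supervising a first derivative on the manifold would then compare projected gradients of the model and the target rather than raw ones, so that components that leave the set of admissible density matrices are ignored.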
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation of minor revision. We respond to each major comment below.
- Referee: [§3.2] §3.2 (DI-Loss definition): the central claim that Grassmannian supervision of first and second derivatives produces response properties aligned with B3LYP without introducing SCF instabilities or inconsistencies rests on the untested assumption that the chosen manifold and loss weights are sufficient; the manuscript provides no explicit verification (e.g., eigenvalue spectra of the Hessian or convergence-failure rates) that this holds for all tested molecules.
Authors: We acknowledge that the manuscript does not contain explicit Hessian eigenvalue spectra or tabulated convergence-failure rates. The observed reductions in SCF iteration counts (up to 50%) and the fact that all reported calculations completed successfully provide indirect support for stability, but these do not constitute the direct verification requested. In the revised manuscript we will add a supplementary table listing SCF convergence success rates over the full test set for each architecture and, for a representative subset of molecules, report the lowest eigenvalues of the response Hessian under the learned functionals. revision: yes
- Referee: [Table 2, §4.1] Table 2 and §4.1: the 66% MAE reduction and E_ρ improvement are reported as uniform averages across architectures, yet no per-architecture standard deviations, number of test systems, or statistical significance tests are supplied, leaving open whether the gains are uniformly load-bearing or driven by a subset of cases.
Authors: The 66% figure is the uniform average across the four architectures, as stated in the text. We agree that per-architecture variability, test-set size, and statistical tests are missing. The revised version will expand Table 2 to include per-architecture means and standard deviations, state the exact number of test molecules, and add p-values (or bootstrap confidence intervals) for the reported energy and E_ρ improvements. revision: yes
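A bootstrap confidence interval of the kind promised in the second response could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `bootstrap_mae_reduction` is hypothetical, the per-molecule errors are synthetic random numbers rather than data from the paper, and molecules are resampled in pairs so that both training regimes are always evaluated on the same resampled test set.

```python
import numpy as np

def bootstrap_mae_reduction(err_base, err_new, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile interval for the relative MAE reduction."""
    abs_base = np.abs(np.asarray(err_base, dtype=float))
    abs_new = np.abs(np.asarray(err_new, dtype=float))
    rng = np.random.default_rng(seed)
    n = abs_base.size
    # Resample molecule indices with replacement; the same indices are used
    # for both methods, which keeps the comparison paired.
    idx = rng.integers(0, n, size=(n_boot, n))
    reductions = 1.0 - abs_new[idx].mean(axis=1) / abs_base[idx].mean(axis=1)
    lo, hi = np.quantile(reductions, [alpha / 2, 1 - alpha / 2])
    point = 1.0 - abs_new.mean() / abs_base.mean()
    return point, lo, hi

# Synthetic per-molecule energy errors (illustrative only).
rng = np.random.default_rng(1)
err_base = 3.0 * rng.standard_normal(200)  # baseline: energy + density supervision
err_new = 1.0 * rng.standard_normal(200)   # with derivative supervision

point, lo, hi = bootstrap_mae_reduction(err_base, err_new)
print(f"MAE reduction: {point:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

An interval that excludes zero would indicate that the reduction is unlikely to be a resampling artifact of the test set, although it says nothing about variation across training seeds or architectures.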
Circularity Check
No significant circularity identified
The paper describes an empirical ML distillation procedure that trains models to match B3LYP targets using an augmented loss (DI-Loss) supervising energy, density, and derivatives on the Grassmannian. Reported gains (e.g., 66% MAE reduction, E_ρ improvement) are measured by comparing two supervised training regimes against the same external B3LYP reference on held-out data; this is standard supervised-learning evaluation and does not reduce any claimed result to its inputs by construction. No self-definitional equations, fitted parameters renamed as predictions, load-bearing self-citations, or uniqueness theorems appear in the abstract or described claims. The work is self-contained as a practical fitting improvement against an independent benchmark functional.
Axiom & Free-Parameter Ledger
free parameters (2)
- loss component weights
- neural network hyperparameters
axioms (1)
- domain assumption: The Grassmannian of admissible density matrices provides a valid manifold on which to compute and supervise energy derivatives for SCF-consistent densities.