End-to-End Differentiable Learning of a Single Functional for DFT and Linear-Response TDDFT
Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3
The pith
A single deep-learned functional can be optimized end-to-end using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present an end-to-end differentiable workflow to optimize a single deep-learned energy functional using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT. The learned functional supplies the self-consistent potential and the linear-response kernel through automatic differentiation, permitting gradient-based optimization through the SCF fixed-point equations and the Casida eigenvalue problem.
What carries the argument
JAX-based two-component quantum chemistry framework that routes automatic differentiation through the SCF fixed-point solver and the Casida eigenvalue problem so that a single learned functional yields both the potential and the kernel.
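To make the "one functional, two derivatives" idea concrete, here is a minimal JAX sketch. It is not the IQC implementation: the three-point grid, its weights, and the LDA-exchange-style `toy_energy` are stand-ins chosen only because their derivatives are known in closed form. A learned network would take the place of `toy_energy`.

```python
import jax
import jax.numpy as jnp

# Hypothetical quadrature grid: weights w_i and densities n_i (illustrative values).
weights = jnp.array([0.5, 0.5, 0.5])
density = jnp.array([0.2, 0.6, 1.0])

# LDA exchange prefactor, used only so the toy has a familiar closed form.
C_X = -0.75 * (3.0 / jnp.pi) ** (1.0 / 3.0)


def toy_energy(n):
    """E[n] = sum_i w_i * C_X * n_i^(4/3); a stand-in for a learned functional."""
    return jnp.sum(weights * C_X * n ** (4.0 / 3.0))


# First derivative with respect to the grid densities, divided by the
# quadrature weights, gives the potential at each grid point.
potential = jax.grad(toy_energy)(density) / weights

# Second derivative (grid Hessian) is the discretized kernel; it is
# diagonal here because the toy functional is purely local.
kernel_hessian = jax.hessian(toy_energy)(density)
```

Both quantities come from the same scalar energy function, so any change to its parameters changes the potential and the kernel consistently.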
Load-bearing premise
A functional trained on excitation energies of small molecules plus self-interaction penalties will transfer to unseen molecules without overfitting or causing divergence in the SCF or Casida solvers.
What would settle it
Run the trained functional on a held-out set of larger molecules and measure whether mean absolute errors in excitation energies stay within the training-set range and whether SCF and Casida iterations converge for every case.
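The check described above reduces to two summary statistics per held-out set. The sketch below shows one way to compute them; the function name `summarize_held_out` and all numbers are placeholders invented for illustration and are not results from the paper.

```python
import jax.numpy as jnp


def summarize_held_out(predicted_ev, reference_ev, scf_converged, casida_converged):
    """Return (mean absolute error in eV, True if every case converged)."""
    mae = jnp.mean(jnp.abs(predicted_ev - reference_ev))
    all_converged = bool(jnp.all(scf_converged & casida_converged))
    return float(mae), all_converged


# Placeholder values only; they do not come from the paper.
predicted = jnp.array([4.0, 5.5, 7.0])
reference = jnp.array([4.5, 5.0, 7.0])
scf_ok = jnp.array([True, True, True])
casida_ok = jnp.array([True, True, False])

mae, all_converged = summarize_held_out(predicted, reference, scf_ok, casida_ok)
```

Comparing `mae` against the largest error seen on the training set, and requiring `all_converged` to be true, would operationalize the criterion stated above.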
Original abstract
Density functional theory (DFT) and linear-response time-dependent density functional theory (LR-TDDFT) rely on an exchange-correlation (xc) approximation that provides not only energy but also its functional derivatives that enter the self-consistent potential and the response kernel. Here, we present an end-to-end differentiable workflow to optimize a single deep-learned energy functional using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT. To enable this training in a computationally efficient and differentiable manner, we developed a JAX-based two-component quantum chemistry framework (IQC), in which the learned functional provides a self-consistent potential and linear-response kernel via automatic differentiation. This construction permits gradient-based optimization through both the self-consistent-field (SCF) fixed-point equations and the Casida eigenvalue problem. We learn an exchange-correlation functional on excitation energies of small molecules while incorporating one-electron self-interaction cancelation as penalty terms, and we assess its possible transfer to molecular test cases.
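The abstract does not give the form of the training objective. One plausible shape, shown purely as an assumption, combines a squared error on excitation energies with a penalty on the one-electron self-interaction residual. For a one-electron density the exact functional satisfies E_H + E_xc = 0, so the residual `hartree_1e + xc_1e` should vanish. The function name, arguments, and `penalty_weight` are hypothetical.

```python
import jax.numpy as jnp


def combined_loss(pred_exc, ref_exc, hartree_1e, xc_1e, penalty_weight=1.0):
    """Excitation-energy error plus a one-electron self-interaction penalty.

    Assumed form, not taken from the paper:
      mean((pred - ref)^2) + penalty_weight * mean((E_H + E_xc)^2)
    where the second term is evaluated on one-electron densities only.
    """
    excitation_term = jnp.mean((pred_exc - ref_exc) ** 2)
    si_residual = hartree_1e + xc_1e  # zero for the exact functional
    penalty_term = jnp.mean(si_residual ** 2)
    return excitation_term + penalty_weight * penalty_term


# Placeholder inputs for illustration only.
loss = combined_loss(
    pred_exc=jnp.array([1.0, 2.0]),
    ref_exc=jnp.array([1.0, 3.0]),
    hartree_1e=jnp.array([0.3]),
    xc_1e=jnp.array([-0.1]),
    penalty_weight=2.0,
)
```

Because every term is a differentiable function of the functional's outputs, a loss of this kind can be passed to `jax.grad` once the SCF and response steps are themselves differentiable.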
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an end-to-end differentiable workflow, implemented in a new JAX-based two-component quantum chemistry package (IQC), that optimizes a single deep neural-network exchange-correlation functional for simultaneous use in Kohn-Sham DFT (via the self-consistent potential) and adiabatic linear-response TDDFT (via the Casida kernel). Training targets are excitation energies of small molecules together with one-electron self-interaction cancellation penalties; the authors assess transferability to unseen molecular test cases.
Significance. If the numerical stability and transferability claims hold, the approach would constitute a genuine advance by allowing a single functional to be variationally consistent across ground- and excited-state properties without separate parametrizations. The use of automatic differentiation through both the SCF fixed-point and the Casida eigensolve is technically novel and, if robust, could be adopted by other groups working on learned functionals.
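Differentiating through the eigensolve can be illustrated with a toy symmetric matrix. In the Casida formulation the eigenvalues are squared excitation energies, so the sketch takes the square root of the lowest eigenvalue and differentiates it with respect to a coupling parameter. The 2×2 matrix and its entries are hypothetical, and the example assumes the two eigenvalues stay well separated.

```python
import jax
import jax.numpy as jnp


def toy_response_matrix(coupling):
    """Symmetric 2x2 stand-in for a Casida matrix (hypothetical entries)."""
    return jnp.array([[4.0, coupling], [coupling, 9.0]])


def lowest_excitation(coupling):
    # eigvalsh returns eigenvalues in ascending order; the Casida
    # eigenvalues are squared excitation energies, hence the sqrt.
    eigenvalues = jnp.linalg.eigvalsh(toy_response_matrix(coupling))
    return jnp.sqrt(eigenvalues[0])


# Value and derivative of the lowest excitation energy at coupling = 1.0.
omega, d_omega = jax.value_and_grad(lowest_excitation)(1.0)
```

If a learned kernel made two eigenvalues nearly degenerate, gradients that involve the eigenvectors would become ill-conditioned, which is the concern raised about non-degeneracy during training.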
major comments (2)
- [Section 3 (IQC framework and automatic differentiation)] The central claim of reliable end-to-end differentiability rests on the assumption that the SCF solver always converges to a unique, differentiable fixed point and that the Casida eigenvalues remain non-degenerate throughout training. No explicit regularization, damping schedule, or failure-recovery mechanism is described that would guarantee this behavior for an arbitrary neural xc functional on out-of-distribution geometries.
- [Section 4 (training protocol and results)] The abstract states that the functional is trained on excitation energies while incorporating self-interaction penalties, yet the manuscript provides no quantitative error metrics, learning curves, or ablation studies on the training set. Without these data it is impossible to judge whether the learned functional actually improves upon existing approximations or merely reproduces the training targets.
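The first major comment turns on what it means to differentiate a converged fixed point. A minimal sketch, assuming a scalar toy map in place of an SCF step: at a solution x* = f(x*, θ), the implicit function theorem gives dx*/dθ = (∂f/∂θ) / (1 − ∂f/∂x), which is defined only when ∂f/∂x ≠ 1. The map `update` and the iteration count are hypothetical.

```python
import jax
import jax.numpy as jnp


def update(x, theta):
    """Toy contraction standing in for one SCF step (hypothetical)."""
    return theta * jnp.cos(x)


def solve_fixed_point(theta, n_iter=200):
    """Plain iteration x <- update(x, theta) from x = 0."""
    x = jnp.zeros(())
    for _ in range(n_iter):
        x = update(x, theta)
    return x


def implicit_grad(theta):
    """dx*/dtheta from the implicit function theorem at the converged x*."""
    x_star = solve_fixed_point(theta)
    df_dx = jax.grad(update, argnums=0)(x_star, theta)
    df_dtheta = jax.grad(update, argnums=1)(x_star, theta)
    return df_dtheta / (1.0 - df_dx)


theta = 0.5
g_implicit = implicit_grad(theta)
# Reference: differentiate through all unrolled iterations.
g_unrolled = jax.grad(solve_fixed_point)(theta)
```

The two gradients agree when the iteration has converged to an isolated fixed point. If the solver stalls, or the denominator approaches zero, neither route yields a meaningful gradient, which is why the comment asks for explicit safeguards.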
minor comments (2)
- [Section 2] Notation for the neural-network functional and its derivatives should be introduced once in a dedicated subsection rather than scattered across the text.
- [Figure 2] Figure captions should explicitly state the molecules used for training versus testing and the number of data points in each set.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's potential significance and for the constructive comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of the IQC framework and training results.
Point-by-point responses
- Referee: [Section 3 (IQC framework and automatic differentiation)] The central claim of reliable end-to-end differentiability rests on the assumption that the SCF solver always converges to a unique, differentiable fixed point and that the Casida eigenvalues remain non-degenerate throughout training. No explicit regularization, damping schedule, or failure-recovery mechanism is described that would guarantee this behavior for an arbitrary neural xc functional on out-of-distribution geometries.
  Authors: We agree that explicit safeguards for SCF convergence and eigenvalue non-degeneracy are essential to substantiate the end-to-end differentiability claim. Although the IQC implementation employs standard adaptive damping and convergence monitoring within the JAX SCF solver, these were not described in sufficient detail. In the revised manuscript we have added a new paragraph to Section 3 that specifies the damping schedule, convergence thresholds, degeneracy checks during the Casida solve, and a simple restart protocol for rare non-convergent cases encountered on out-of-distribution geometries.
  Revision: yes
- Referee: [Section 4 (training protocol and results)] The abstract states that the functional is trained on excitation energies while incorporating self-interaction penalties, yet the manuscript provides no quantitative error metrics, learning curves, or ablation studies on the training set. Without these data it is impossible to judge whether the learned functional actually improves upon existing approximations or merely reproduces the training targets.
  Authors: We acknowledge that the original manuscript did not present quantitative training-set metrics, learning curves, or ablation studies in a readily accessible form. We have revised Section 4 to include a new table reporting mean absolute errors on the training excitations versus PBE and B3LYP, an explicit learning curve figure, and a short ablation study isolating the contribution of the self-interaction penalty terms. These additions allow direct assessment of improvement over baseline functionals.
  Revision: yes
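The first response mentions damping and convergence thresholds without giving details. As a generic illustration only, and not the schedule used in IQC, the sketch below mixes each new iterate with the previous one and reports whether the residual fell below a tolerance. The function `damped_fixed_point`, the fixed `damping` factor, the tolerance, and the toy map are all assumptions.

```python
import jax
import jax.numpy as jnp


def damped_fixed_point(update, x0, damping=0.5, tol=1e-5, max_iter=500):
    """Iterate x <- (1 - damping) * x + damping * update(x) until the step
    size drops to tol or max_iter is reached."""

    def cond(state):
        _, residual, it = state
        return jnp.logical_and(residual > tol, it < max_iter)

    def body(state):
        x, _, it = state
        x_new = (1.0 - damping) * x + damping * update(x)
        return x_new, jnp.max(jnp.abs(x_new - x)), it + 1

    init = (x0, jnp.array(jnp.inf), jnp.array(0))
    x, residual, n_iter = jax.lax.while_loop(cond, body, init)
    return x, residual <= tol, n_iter


def toy_map(x):
    # Hypothetical map whose slope at the fixed point exceeds one in
    # magnitude, so undamped iteration does not settle there.
    return 2.5 * jnp.cos(x)


x_star, converged, n_iter = damped_fixed_point(toy_map, jnp.array(0.0))
```

A caller can inspect `converged` and, when it is false, retry with a smaller `damping` value, which is one simple form a restart protocol could take.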
Circularity Check
No significant circularity detected
full rationale
The paper presents a methodological workflow for training a neural-network exchange-correlation functional via end-to-end differentiation through SCF fixed-point and Casida equations, with the loss defined on external targets (excitation energies of small molecules plus one-electron self-interaction penalties). The functional itself is parameterized by the network architecture rather than being defined in terms of its own outputs or fitted parameters; gradients are obtained by automatic differentiation in the JAX framework without any self-referential closure or renaming of fitted quantities as predictions. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is invoked in the provided text. The derivation chain is therefore self-contained against independent data targets and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights
axioms (2)
- domain assumption: adiabatic approximation for the exchange-correlation kernel in LR-TDDFT
- domain assumption: stable convergence of the differentiable SCF fixed-point iteration
invented entities (1)
- deep-learned xc functional (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "we present an end-to-end differentiable workflow to optimize a single deep-learned energy functional using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT... learned functional provides a self-consistent potential and linear-response kernel via automatic differentiation"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The model takes as input the density matrix... block feature vector... neural architecture is strictly additive... E_IXC(D;θ) = sum ε_a"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.