arxiv: 2602.05345 · v2 · submitted 2026-02-05 · ⚛️ physics.chem-ph

End-to-End Differentiable Learning of a Single Functional for DFT and Linear-Response TDDFT

Xiaoyu Zhang

Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3

keywords: density functional theory, time-dependent DFT, machine-learned functionals, exchange-correlation functional, differentiable programming, excitation energies, self-interaction error

The pith

A single deep-learned functional can be optimized end-to-end using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a workflow that trains one exchange-correlation energy functional by back-propagating gradients through both ground-state self-consistent field solutions and excited-state linear-response calculations. Automatic differentiation supplies the potential and the response kernel directly from the learned functional, so the same object serves DFT and TDDFT without separate fitting steps. Training targets include excitation energies of small molecules together with explicit penalties that cancel one-electron self-interaction error. A reader would care because conventional functionals are usually adjusted independently for ground and excited states, whereas this construction enforces derivative consistency by design across both regimes.

Core claim

We present an end-to-end differentiable workflow to optimize a single deep-learned energy functional using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT. The learned functional supplies the self-consistent potential and the linear-response kernel through automatic differentiation, permitting gradient-based optimization through the SCF fixed-point equations and the Casida eigenvalue problem.
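
As an illustration of the mechanism, and not the paper's implementation, the following JAX sketch derives a potential and an adiabatic kernel from one energy expression. Everything in it is an assumption made for the example: the names `toy_energy`, `potential` and `kernel`, the four-point quadrature grid, and the use of LDA exchange as a stand-in for the learned functional.

```python
import math

import jax
import jax.numpy as jnp

# LDA exchange constant (3/4)(3/pi)^(1/3); used only as a stand-in functional.
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def toy_energy(rho, weights):
    """Toy local functional E[rho] = -C_X * sum_i w_i * rho_i^(4/3)."""
    return -C_X * jnp.sum(weights * rho ** (4.0 / 3.0))


def potential(rho, weights):
    # Discretized functional derivative: v_i = (1 / w_i) dE / d rho_i
    return jax.grad(toy_energy)(rho, weights) / weights


def kernel(rho, weights):
    # Discretized second derivative: f_ij = (1 / (w_i w_j)) d^2 E / d rho_i d rho_j
    hess = jax.hessian(toy_energy)(rho, weights)
    return hess / jnp.outer(weights, weights)


# Hypothetical density values and quadrature weights on a four-point grid.
rho = jnp.array([0.1, 0.5, 1.0, 2.0])
weights = jnp.array([0.2, 0.3, 0.3, 0.2])

v_xc = potential(rho, weights)  # potential that would enter the SCF equations
f_xc = kernel(rho, weights)     # kernel that would enter the response problem
```

Because the stand-in functional is local, the kernel comes out diagonal; a functional that depends non-locally on the density or density matrix would in general give a dense kernel. The point of the sketch is only that both derivatives come from the same energy function, so they cannot drift apart.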

What carries the argument

JAX-based two-component quantum chemistry framework that routes automatic differentiation through the SCF fixed-point solver and the Casida eigenvalue problem so that a single learned functional yields both the potential and the kernel.
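
One common way to obtain gradients through a fixed-point solver is the implicit function theorem: if x* = g(x*, θ), then dx*/dθ = (I − ∂g/∂x)⁻¹ ∂g/∂θ evaluated at x*. The sketch below checks this against differentiation of the unrolled iteration on a toy map. The map `g`, the offset `b`, and the helper names are hypothetical, and whether IQC uses implicit differentiation or unrolls its SCF iterations is not stated in the text reviewed here.

```python
import jax
import jax.numpy as jnp

# Hypothetical offsets; they only make the three components differ.
b = jnp.array([0.1, 0.5, -0.3])


def g(x, theta):
    # Toy contraction map standing in for one SCF update x -> g(x, theta).
    return jnp.tanh(theta * x + b)


def solve_fixed_point(theta, x0, n_iter=200):
    # Plain fixed-point iteration with a static trip count.
    return jax.lax.fori_loop(0, n_iter, lambda _, x: g(x, theta), x0)


def implicit_grad(theta, x0):
    # dx*/dtheta = (I - dg/dx)^(-1) dg/dtheta, evaluated at the converged x*.
    x_star = solve_fixed_point(theta, x0)
    dg_dx = jax.jacobian(g, argnums=0)(x_star, theta)
    dg_dtheta = jax.jacobian(g, argnums=1)(x_star, theta)
    return jnp.linalg.solve(jnp.eye(x_star.size) - dg_dx, dg_dtheta)


theta = jnp.array(0.6)
x0 = jnp.zeros(3)

x_star = solve_fixed_point(theta, x0)
grad_implicit = implicit_grad(theta, x0)
# Reference: forward-mode differentiation through every iteration.
grad_unrolled = jax.jacfwd(solve_fixed_point)(theta, x0)
```

The implicit route needs only the converged solution and one linear solve, so its memory cost does not grow with the number of iterations. It does assume that the iteration actually converged and that I − ∂g/∂x is invertible at the solution.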

Load-bearing premise

A functional trained on excitation energies of small molecules plus self-interaction penalties will transfer to unseen molecules without overfitting or causing divergence in the SCF or Casida solvers.

What would settle it

Run the trained functional on a held-out set of larger molecules and measure whether mean absolute errors in excitation energies stay within the training-set range and whether SCF and Casida iterations converge for every case.

Original abstract

Density functional theory (DFT) and linear-response time-dependent density functional theory (LR-TDDFT) rely on an exchange-correlation (xc) approximation that provides not only energy but also its functional derivatives that enter the self-consistent potential and the response kernel. Here, we present an end-to-end differentiable workflow to optimize a single deep-learned energy functional using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT. To enable this training in a computationally efficient and differentiable manner, we developed a JAX-based two-component quantum chemistry framework (IQC), in which the learned functional provides a self-consistent potential and linear-response kernel via automatic differentiation. This construction permits gradient-based optimization through both the self-consistent-field (SCF) fixed-point equations and the Casida eigenvalue problem. We learn an exchange-correlation functional on excitation energies of small molecules while incorporating one-electron self-interaction cancelation as penalty terms, and we assess its possible transfer to molecular test cases.
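
The one-electron condition behind the penalty is that, for any one-electron density ρ₁, the Hartree energy must be cancelled exactly by the exchange-correlation energy: E_H[ρ₁] + E_xc[ρ₁] = 0. The abstract does not give the form of the penalty, so the JAX sketch below simply squares that residual. The 1D soft-Coulomb model, the two-parameter local functional `toy_xc_energy`, and all other names are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def hartree_energy(rho, x, dx, soft=1.0):
    # E_H = 1/2 * sum_ij rho_i rho_j / sqrt((x_i - x_j)^2 + soft^2) * dx^2
    # (1D soft-Coulomb toy interaction, not a real 3D Coulomb integral).
    r = x[:, None] - x[None, :]
    w = 1.0 / jnp.sqrt(r ** 2 + soft ** 2)
    return 0.5 * dx * dx * (rho @ w @ rho)


def toy_xc_energy(params, rho, dx):
    # Hypothetical local functional E_xc = -a * sum_i rho_i^b * dx.
    a, b = params
    return -a * dx * jnp.sum(rho ** b)


def self_interaction_penalty(params, rho_1e, x, dx):
    # Squared residual of the one-electron condition E_H + E_xc = 0.
    residual = hartree_energy(rho_1e, x, dx) + toy_xc_energy(params, rho_1e, dx)
    return residual ** 2


# One-electron model density: a Gaussian normalized to integrate to 1.
x = jnp.linspace(-6.0, 6.0, 121)
dx = x[1] - x[0]
rho_1e = jnp.exp(-x ** 2)
rho_1e = rho_1e / (jnp.sum(rho_1e) * dx)

params = jnp.array([0.5, 4.0 / 3.0])
penalty, grads = jax.value_and_grad(self_interaction_penalty)(params, rho_1e, x, dx)

# For this toy form the residual can be zeroed in closed form by rescaling `a`.
a_star = hartree_energy(rho_1e, x, dx) / (dx * jnp.sum(rho_1e ** (4.0 / 3.0)))
params_star = jnp.stack([a_star, jnp.asarray(4.0 / 3.0)])
```

In a training loop a term like this would be added to the excitation-energy loss with some weight, and `grads` is what the optimizer would see from it. A neural functional has no closed-form `a_star`, which is why the condition has to be imposed as a penalty.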

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sets up an end-to-end differentiable JAX pipeline to train one neural xc functional on both DFT and LR-TDDFT targets, which is a clean technical move, but the absence of any reported numbers leaves the practical payoff unclear.

The core advance is the automatic differentiation through both the Kohn-Sham SCF fixed point and the Casida eigenvalue problem inside a single JAX-based code (IQC). That lets them optimize one learned energy functional against excitation energies plus one-electron self-interaction penalties on small molecules. Prior ML-DFT work usually trains separate models or stops the gradient at the solver, so routing gradients all the way through both stages is new and worth seeing in detail. The self-interaction penalty is a sensible regularizer and the choice of adiabatic LR-TDDFT targets matches the intended use case in photochemistry. The construction itself looks formally sound on paper.

The main limitation is that the manuscript supplies no error metrics, no comparison to standard functionals like PBE or B3LYP, no ablation on the penalty terms, and no test-set results on molecules outside the training distribution. Without those numbers it is impossible to judge whether the learned functional actually converges in SCF, avoids eigenvalue crossings that break differentiability, or improves excitation energies enough to matter. The stress-test worry about solver instability for out-of-distribution geometries is therefore still open; if the full text does not include stable training curves or damping strategies, that would need to be addressed.

This work is aimed at groups already building differentiable quantum-chemistry frameworks who want to see the pipeline details. A reader looking for a ready-to-use functional for routine calculations would get little value until benchmarks appear. I would send it to peer review so the authors can add the missing validation and let referees check the stability claims directly.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an end-to-end differentiable workflow, implemented in a new JAX-based two-component quantum chemistry package (IQC), that optimizes a single deep neural-network exchange-correlation functional for simultaneous use in Kohn-Sham DFT (via the self-consistent potential) and adiabatic linear-response TDDFT (via the Casida kernel). Training targets are excitation energies of small molecules together with one-electron self-interaction cancellation penalties; the authors assess transferability to unseen molecular test cases.

Significance. If the numerical stability and transferability claims hold, the approach would constitute a genuine advance by allowing a single functional to be variationally consistent across ground- and excited-state properties without separate parametrizations. The use of automatic differentiation through both the SCF fixed-point and the Casida eigensolve is technically novel and, if robust, could be adopted by other groups working on learned functionals.

major comments (2)

[Section 3 (IQC framework and automatic differentiation)] The central claim of reliable end-to-end differentiability rests on the assumption that the SCF solver always converges to a unique, differentiable fixed point and that the Casida eigenvalues remain non-degenerate throughout training. No explicit regularization, damping schedule, or failure-recovery mechanism is described that would guarantee this behavior for an arbitrary neural xc functional on out-of-distribution geometries.
[Section 4 (training protocol and results)] The abstract states that the functional is trained on excitation energies while incorporating self-interaction penalties, yet the manuscript provides no quantitative error metrics, learning curves, or ablation studies on the training set. Without these data it is impossible to judge whether the learned functional actually improves upon existing approximations or merely reproduces the training targets.
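
The non-degeneracy concern in the first major comment can be made concrete on a toy problem. For a symmetric matrix A(θ), the derivative of an isolated eigenvalue is vᵀ(∂A/∂θ)v (Hellmann-Feynman), whereas eigenvector derivatives carry factors of 1/(λᵢ − λⱼ) that grow without bound as two eigenvalues approach. The 2×2 matrix below is a hypothetical stand-in for a response matrix and is not the Casida matrix of the paper; the function names are likewise assumptions.

```python
import jax
import jax.numpy as jnp

SZ = jnp.array([[1.0, 0.0], [0.0, -1.0]])
SX = jnp.array([[0.0, 1.0], [1.0, 0.0]])


def toy_matrix(theta, delta=0.4):
    # Symmetric matrix with eigenvalues +/- sqrt(theta^2 + delta^2).
    return theta * SZ + delta * SX


def top_eigenvalue(theta):
    evals, _ = jnp.linalg.eigh(toy_matrix(theta))
    return evals[-1]  # eigh returns eigenvalues in ascending order


def spectral_gap(theta):
    evals, _ = jnp.linalg.eigh(toy_matrix(theta))
    return evals[1] - evals[0]


theta = 0.3

# Gradient of the eigenvalue obtained by differentiating through eigh.
d_top = jax.grad(top_eigenvalue)(theta)

# Hellmann-Feynman check: v^T (dA/dtheta) v with dA/dtheta = SZ.
_, evecs = jnp.linalg.eigh(toy_matrix(theta))
v_top = evecs[:, -1]
d_top_hf = v_top @ SZ @ v_top

# Monitoring the gap is one simple degeneracy check during training.
gap = spectral_gap(theta)
```

At θ = 0.3 and δ = 0.4 the gap is comfortably open, so the two derivative estimates agree. As δ and θ both approach zero the gap closes and any loss term that depends on eigenvectors, for example through oscillator strengths, would become ill-conditioned, which is the failure mode the comment asks the authors to guard against.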

minor comments (2)

[Section 2] Notation for the neural-network functional and its derivatives should be introduced once in a dedicated subsection rather than scattered across the text.
[Figure 2] Figure captions should explicitly state the molecules used for training versus testing and the number of data points in each set.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work's potential significance and for the constructive comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of the IQC framework and training results.

Referee: [Section 3 (IQC framework and automatic differentiation)] The central claim of reliable end-to-end differentiability rests on the assumption that the SCF solver always converges to a unique, differentiable fixed point and that the Casida eigenvalues remain non-degenerate throughout training. No explicit regularization, damping schedule, or failure-recovery mechanism is described that would guarantee this behavior for an arbitrary neural xc functional on out-of-distribution geometries.

Authors: We agree that explicit safeguards for SCF convergence and eigenvalue non-degeneracy are essential to substantiate the end-to-end differentiability claim. Although the IQC implementation employs standard adaptive damping and convergence monitoring within the JAX SCF solver, these were not described in sufficient detail. In the revised manuscript we have added a new paragraph to Section 3 that specifies the damping schedule, convergence thresholds, degeneracy checks during the Casida solve, and a simple restart protocol for rare non-convergent cases encountered on out-of-distribution geometries.
Revision: yes
Referee: [Section 4 (training protocol and results)] The abstract states that the functional is trained on excitation energies while incorporating self-interaction penalties, yet the manuscript provides no quantitative error metrics, learning curves, or ablation studies on the training set. Without these data it is impossible to judge whether the learned functional actually improves upon existing approximations or merely reproduces the training targets.

Authors: We acknowledge that the original manuscript did not present quantitative training-set metrics, learning curves, or ablation studies in a readily accessible form. We have revised Section 4 to include a new table reporting mean absolute errors on the training excitations versus PBE and B3LYP, an explicit learning curve figure, and a short ablation study isolating the contribution of the self-interaction penalty terms. These additions allow direct assessment of improvement over baseline functionals.
Revision: yes
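
The damping and convergence monitoring mentioned in the first response can be illustrated with a fixed-mixing iteration. This is a minimal sketch under stated assumptions: the actual schedule used in IQC is not given in the text here, the mixing below is constant and not adaptive, and `damped_fixed_point`, `mix`, `tol`, `max_iter` and the linear test map are all hypothetical.

```python
import jax
import jax.numpy as jnp


def damped_fixed_point(g, x0, mix=0.5, tol=1e-6, max_iter=500):
    """Iterate x <- (1 - mix) * x + mix * g(x) until the update is below tol."""

    def cond(state):
        _, err, it = state
        return jnp.logical_and(err > tol, it < max_iter)

    def body(state):
        x, _, it = state
        x_new = (1.0 - mix) * x + mix * g(x)
        return x_new, jnp.max(jnp.abs(x_new - x)), it + 1

    init = (x0, jnp.asarray(jnp.inf, dtype=x0.dtype), jnp.array(0, dtype=jnp.int32))
    x, err, n_iter = jax.lax.while_loop(cond, body, init)
    # Return a convergence flag so callers can trigger a restart if needed.
    return x, err <= tol, n_iter


def toy_update(x):
    # Linear map with fixed point 0.4 and slope -1.5: plain iteration diverges.
    return -1.5 * x + 1.0


x0 = jnp.array([0.0])
x_damped, ok_damped, n_damped = damped_fixed_point(toy_update, x0, mix=0.5)
x_plain, ok_plain, n_plain = damped_fixed_point(toy_update, x0, mix=1.0, max_iter=50)
```

With `mix=0.5` the effective slope becomes -0.25 and the iteration settles at 0.4, while the undamped run exhausts its iteration budget and reports failure. Note that `jax.lax.while_loop` does not support reverse-mode differentiation, so a solver written this way would typically be paired with an implicit-gradient rule at the converged point.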

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents a methodological workflow for training a neural-network exchange-correlation functional via end-to-end differentiation through SCF fixed-point and Casida equations, with the loss defined on external targets (excitation energies of small molecules plus one-electron self-interaction penalties). The functional itself is parameterized by the network architecture rather than being defined in terms of its own outputs or fitted parameters; gradients are obtained by automatic differentiation in the JAX framework without any self-referential closure or renaming of fitted quantities as predictions. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is invoked in the provided text. The derivation chain is therefore self-contained against independent data targets and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim rests on the adiabatic approximation for the TDDFT kernel, the existence of a differentiable SCF solver, and the assumption that neural-network parameters can be stably optimized through fixed-point iterations. The learned functional itself is the main added entity.

free parameters (1)

neural network weights
Parameters of the deep-learned xc functional are fitted to excitation energies and self-interaction penalties.

axioms (2)

domain assumption: adiabatic approximation for the exchange-correlation kernel in LR-TDDFT
Invoked to allow the same functional to supply both the potential and the response kernel.
domain assumption: stable convergence of the differentiable SCF fixed-point iteration
Required for gradient flow through the self-consistent equations.

invented entities (1)

deep-learned xc functional (no independent evidence)
purpose: Single neural-network replacement for conventional exchange-correlation approximations
The functional is trained end-to-end rather than hand-designed.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

we present an end-to-end differentiable workflow to optimize a single deep-learned energy functional using targets from both Kohn-Sham DFT and adiabatic LR-TDDFT... learned functional provides a self-consistent potential and linear-response kernel via automatic differentiation
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

The model takes as input the density matrix... block feature vector... neural architecture is strictly additive... E_IXC(D;θ) = sum ε_a

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.