Representability-Aware Neural Networks for Reduced Density Matrices: Application to Fractional Chern Insulators

Awwab A. Azam; Haining Pan; Jiabin Yu; Justin B. Hart; Thomas Li; Ye Bi; Yunxuan Li

arXiv: 2605.20326 · v1 · pith:CFN2OXJ5new · submitted 2026-05-19 · cond-mat.str-el · cs.AI

Justin B. Hart, Awwab A. Azam, Thomas Li, Yunxuan Li, Ye Bi, Haining Pan, Jiabin Yu

Pith reviewed 2026-05-21 01:07 UTC · model grok-4.3

classification: cond-mat.str-el · cs.AI

keywords: reduced density matrices, neural networks, representability conditions, fractional Chern insulators, twisted bilayer MoTe2, variational optimization, exact diagonalization, moire materials

The pith

A neural network embedding representability conditions variationally optimizes 2-RDMs for fractional Chern insulators and yields energies closer to exact results than semidefinite programming while using far fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a neural network that respects key physical constraints on two-particle reduced density matrices and tests it on the fractional Chern insulator realized in twisted bilayer MoTe2 at 2/3 hole filling. The network is first trained on exact-diagonalization data from small momentum meshes and can then be used either to interpolate 2-RDMs onto larger meshes or to variationally minimize the energy on any chosen mesh. When the residual multilayer perceptron is optimized variationally on a 6x6 mesh it reaches an energy only 0.104 meV below the exact ground-state energy while keeping 98.9 percent fidelity to the exact 2-RDM. This performance is obtained with less than one-twentieth the number of parameters required by boundary-point semidefinite programming, which itself lies 5.56 meV below the exact energy. The approach therefore supplies a scalable route to approximate many-body states in moire materials without diagonalizing the Hamiltonian in the full Hilbert space.

Core claim

The residual multilayer perceptron neural network, after variational energy minimization on the 6x6 momentum mesh for the one-band projected model of twisted bilayer MoTe2 at 3.89 degrees and 2/3 filling, produces a 2-RDM whose energy lies 0.104 meV below the exact-diagonalization ground state while reproducing the exact 2-RDM to 98.94-98.96 percent accuracy, outperforming boundary-point semidefinite programming in energy accuracy with less than 1/20 as many parameters.

What carries the argument

Representability-aware neural network whose architecture and loss function embed a subset of 2-RDM representability conditions, augmented by interpolated representability conditions evaluated across multiple momentum meshes.

If this is right

The same network can interpolate exact 2-RDMs from 12- or 18-point meshes to produce predictions on larger meshes without retraining.
Variational optimization remains stable when the mesh is enlarged to 48 points, yielding estimates of both the many-body energy and the many-body quantum metric.
Accuracy to the exact 2-RDM stays above 98.9 percent even after variational relaxation, showing that the embedded constraints continue to enforce physicality.
The parameter count remains more than twenty times smaller than that of boundary-point SDP while delivering a lower variational energy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The interpolation mechanism could be applied to other moire systems where exact diagonalization is limited to small clusters, allowing continuum-limit estimates without exponential cost growth.
If the subset of enforced conditions proves robust, the method may reduce reliance on post-hoc projections or purification steps common in other variational 2-RDM approaches.
Extending the same architecture to three-particle reduced density matrices would test whether the representability-aware design generalizes beyond two-body quantities.

Load-bearing premise

Embedding only a subset of representability conditions in the architecture and loss function, together with cross-mesh interpolation, is sufficient to keep the network outputs physically valid during variational energy minimization.

What would settle it

Evaluate the full set of N-representability conditions on the variationally optimized 6x6 2-RDM and check whether any violation exceeds the threshold that would render the matrix unphysical, or compare the NN-predicted energy on the added 48-point mesh against an independent larger-scale calculation.
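The energy side of this test can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's procedure: it measures energies relative to the ED ground state, uses the two undershoots quoted above, and assumes a hypothetical numerical tolerance `TOL_MEV`, since no tolerance is stated in the material reviewed here.

```python
def bound_violation_mev(e_variational, e_exact):
    """Return how far (in meV) a 2-RDM energy falls below the exact energy.

    A positive value means the energy undershoots the exact ground state,
    which cannot happen for an N-representable 2-RDM on the same finite mesh.
    """
    return max(0.0, e_exact - e_variational)


def within_tolerance(e_variational, e_exact, tol_mev):
    """True if any undershoot is no larger than the assumed tolerance."""
    return bound_violation_mev(e_variational, e_exact) <= tol_mev


# Energies relative to the ED ground state (E_ED = 0), in meV.
E_ED = 0.0
candidates = {
    "variational NN": -0.104,       # reported: 0.104 meV below ED
    "boundary-point SDP": -5.560,   # reported: 5.560 meV below ED
}
TOL_MEV = 1e-3  # hypothetical tolerance, not taken from the paper

report = {
    name: (bound_violation_mev(e, E_ED), within_tolerance(e, E_ED, TOL_MEV))
    for name, e in candidates.items()
}
for name, (gap, ok) in report.items():
    print(f"{name}: undershoot = {gap:.3f} meV, within tolerance = {ok}")
```

Under this assumed tolerance both results count as genuine undershoots; whether 0.104 meV is actually resolvable depends on the precision of the energy evaluation, which is the question raised in the referee's minor comments.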

Figures

Figures reproduced from arXiv: 2605.20326 by Awwab A. Azam, Haining Pan, Jiabin Yu, Justin B. Hart, Thomas Li, Ye Bi, Yunxuan Li.

**Figure 3.** [PITH_FULL_IMAGE:figures/full_fig_p006_3.png]

**Figure 4.** [PITH_FULL_IMAGE:figures/full_fig_p007_4.png]

**Figure 5.** Visualization of the sampling of the 1BZ in the conventional meshes (a-d) and the tilted mesh (e) with [PITH_FULL_IMAGE:figures/full_fig_p016_5.png]

**Figure 6.** Eigenvalue of the pair-pair correlation function vs. index for the training data on the conventional [PITH_FULL_IMAGE:figures/full_fig_p026_6.png]

**Figure 7.** Visualization of the sampling of the 1BZ for the tilted mesh training of the pair-pair correlation function. The convention [PITH_FULL_IMAGE:figures/full_fig_p028_7.png]

**Figure 8.** True pair-pair correlation function (left) and predicted pair-pair correlation function (right) by a KAN for the Richard [PITH_FULL_IMAGE:figures/full_fig_p029_8.png]

**Figure 9.** R-value (a) and normalized overlap (b) versus number of parameters when training on the [PITH_FULL_IMAGE:figures/full_fig_p030_9.png]

**Figure 10.** True pair-pair correlation function (left) and predicted pair-pair correlation function (right) by a KAN for the [PITH_FULL_IMAGE:figures/full_fig_p031_10.png]

**Figure 11.** R-value (a) and normalized overlap (b) versus number of parameters for the training on 4 tilted [PITH_FULL_IMAGE:figures/full_fig_p032_11.png]

**Figure 12.** The ED spectrum for twisted bilayer MoTe [PITH_FULL_IMAGE:figures/full_fig_p035_12.png]

**Figure 13.** , there are no clear principal components, unlike the case for the Richardson model. Therefore, in this case, we do not use eigendecomposition for the NNs. [PITH_FULL_IMAGE:figures/full_fig_p037_13.png]

**Figure 14.** R-value (a) and normalized overlap (b) versus number of parameters for the best performance of each NN architecture [PITH_FULL_IMAGE:figures/full_fig_p041_14.png]

Original abstract

We develop a representability-aware and interpolable neural network (NN) framework for predicting two-particle reduced density matrices (2-RDMs). The NN incorporates a subset of representability conditions through its architecture and loss function, and can operate on different momentum meshes, enabling evaluating the representability conditions across multiple meshes, which we call interpolated representability condition. The framework can be used either to predict 2-RDMs on large momentum meshes by interpolating exact results from small meshes, or as a variational 2-RDM ansatz optimized by energy minimization on arbitrary meshes. We apply this approach to the fractional Chern insulator in the one-band projected model of twisted bilayer MoTe$_2$ at twist angle $3.89^\circ$ and hole filling $2/3$. Trained on exact-diagonalization (ED) 2-RDMs from meshes with $12$ or $18$ momentum points using six different NN architectures, the best NN is the residual multilayer perceptron, which predicts the $6\times6$ 2-RDM with $97.07\%-98.18\%$ accuracy relative to the ED 2-RDM but predicts an energy $77.353$ meV above ED ground-state energy. We then variationally optimize the NN on several meshes including $6\times6$, predicting a $6\times 6$ energy of just $0.104$ meV below ED while maintaining $98.94\%-98.96\%$ accuracy. Compared with the conventional boundary-point semidefinite programming, which gives an energy $5.560$ meV below ED with $96.40\%-98.94\%$ accuracy, the NN achieves a more accurate energy and similar accuracy while using only less than 1/20 as many parameters. Eventually, we add a symmetric mesh of $48$ momentum points to the variational optimization of the NN, and provide a prediction of the many-body ground-state energy and the many-body quantum metric on that mesh.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The NN framework for 2-RDMs in fractional Chern insulators delivers practical accuracy gains over SDP but the variational run on the 6x6 mesh produces an energy 0.104 meV below exact diagonalization.

The paper introduces a residual multilayer perceptron that folds a subset of representability conditions into its architecture and loss, then adds interpolation across momentum meshes so the same network can predict 2-RDMs on larger grids or serve as a variational ansatz. They apply it to the one-band projected twisted bilayer MoTe2 model at 3.89 degrees and 2/3 filling, training on ED data from 12- and 18-point meshes before optimizing directly on 6x6 and 48-point meshes. The best network reaches 98.94-98.96 percent fidelity to the ED 2-RDM while using far fewer parameters than boundary-point SDP. That combination of built-in constraints, mesh interpolation, and direct variational use is the concrete advance.

The numbers show the NN beats SDP on energy accuracy for the same system size, which is useful for anyone who needs 2-RDM access on meshes where full ED or heavy SDP becomes expensive. The extension to a 48-point mesh for both energy and quantum metric predictions is a straightforward next step that follows from the interpolation design.

The soft spot sits in the variational result itself. On a finite 6x6 mesh, exact diagonalization already gives the true ground-state energy. Any 2-RDM that yields a lower energy must lie outside the N-representable set. The reported 0.104 meV undershoot is small, yet it is still below the exact value, and the SDP undershoots by a larger margin. This indicates that the subset of conditions plus interpolation does not fully close the physical domain during minimization. The high fidelity to the ED 2-RDM is reassuring but does not remove the inconsistency with the energy.

Readers working on numerical methods for strongly correlated 2D electrons, especially those comparing variational 2-RDM approaches to SDP or ED, will find the benchmarks and scaling demonstration worth their time. The work is coherent on its own terms and engages the literature directly, so it merits a serious referee who can press on the exact representability enforcement and suggest tighter constraints or post-processing checks. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a representability-aware neural network framework for predicting and variationally optimizing two-particle reduced density matrices (2-RDMs). The NN embeds a subset of N-representability conditions in its architecture and loss function, supports interpolation of these conditions across momentum meshes, and is applied to the one-band projected model of twisted bilayer MoTe2 at 3.89° twist angle and 2/3 hole filling. Trained on exact-diagonalization (ED) 2-RDMs from small meshes (12 or 18 points), the residual multilayer perceptron predicts 6x6 2-RDMs with 97-98% accuracy; when variationally optimized on the 6x6 mesh it yields an energy 0.104 meV below the ED ground state while retaining ~98.95% fidelity to the ED 2-RDM, using <1/20 the parameters of boundary-point semidefinite programming (SDP), which itself lies 5.56 meV below ED. The method is further used to predict energies and quantum metrics on a 48-point mesh.

Significance. If the variational outputs can be shown to remain strictly inside the N-representable set, the approach would provide a computationally lightweight, mesh-flexible alternative to SDP for 2-RDM variational calculations in strongly correlated lattice models. The combination of architecture-enforced constraints, loss-based penalties, and cross-mesh interpolation is a novel way to inject physical priors into machine-learned density-matrix ansatzes, and the reported parameter efficiency relative to SDP is a concrete practical advantage.

major comments (2)

[Abstract] Abstract: The reported variational energy of 0.104 meV below the ED ground-state energy on the finite 6x6 mesh directly contradicts the claim that the optimized 2-RDM remains physically valid. Because ED furnishes the exact ground-state energy for this system size, any lower variational energy obtained from the NN 2-RDM implies that the matrix lies outside the N-representable set, indicating that the subset of conditions plus interpolated constraints are insufficient to keep the ansatz inside the physical domain during energy minimization.
[Abstract] Abstract and variational-optimization section: The NN achieves only a 0.104 meV violation while SDP reaches 5.56 meV, yet both are benchmarked against the same ED reference; this quantitative difference does not demonstrate that the NN constraints are adequate, only that they are tighter than the SDP relaxation used. A concrete test (e.g., explicit evaluation of the full set of N-representability conditions on the optimized NN 2-RDM or a comparison against a tighter SDP formulation) is needed to substantiate the representability claim.

minor comments (2)

The manuscript should clarify whether the 0.104 meV difference is within the numerical tolerance of the energy evaluation or constitutes a genuine physical violation.
Provide explicit pseudocode or a supplementary table listing which specific representability conditions (e.g., P, Q, G, T1, T2) are enforced by architecture versus loss versus interpolation.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We address each of the major comments in detail below, providing clarifications and indicating where revisions will be made to the manuscript.

Referee: [Abstract] Abstract: The reported variational energy of 0.104 meV below the ED ground-state energy on the finite 6x6 mesh directly contradicts the claim that the optimized 2-RDM remains physically valid. Because ED furnishes the exact ground-state energy for this system size, any lower variational energy obtained from the NN 2-RDM implies that the matrix lies outside the N-representable set, indicating that the subset of conditions plus interpolated constraints are insufficient to keep the ansatz inside the physical domain during energy minimization.

Authors: We acknowledge the validity of this observation. The fact that the variationally optimized energy is 0.104 meV below the exact ED ground-state energy indicates that the resulting 2-RDM does not strictly satisfy all N-representability conditions, despite the incorporation of a subset of these conditions in the NN architecture and loss function. This small violation suggests that the enforced constraints, including the interpolated ones, are not sufficient to fully confine the ansatz to the N-representable set during minimization. We will revise the abstract to remove any implication of strict physical validity and instead report the energy as being in close agreement with ED, while noting the small deviation. Additionally, we will expand the discussion in the variational optimization section to address this point explicitly.

revision: yes

Referee: [Abstract] Abstract and variational-optimization section: The NN achieves only a 0.104 meV violation while SDP reaches 5.56 meV, yet both are benchmarked against the same ED reference; this quantitative difference does not demonstrate that the NN constraints are adequate, only that they are tighter than the SDP relaxation used. A concrete test (e.g., explicit evaluation of the full set of N-representability conditions on the optimized NN 2-RDM or a comparison against a tighter SDP formulation) is needed to substantiate the representability claim.

Authors: We agree that the smaller energy violation relative to the SDP result demonstrates that our NN constraints are more effective than the boundary-point SDP relaxation used in the comparison, but it does not conclusively prove that the optimized 2-RDM is fully N-representable. A direct evaluation of the complete set of N-representability conditions on the NN-optimized 2-RDM would indeed be a valuable addition to substantiate the claims. However, computing the full set of conditions is highly computationally intensive, which is the reason we focused on a practical subset in our framework. We will include a discussion of this limitation in the revised manuscript and, if possible, perform and report checks on additional representability conditions for the optimized 2-RDMs.

revision: partial

standing simulated objections not resolved

A complete evaluation of the full set of N-representability conditions on the optimized NN 2-RDMs remains outstanding; the simulated authors cite the prohibitive computational cost for the system sizes under consideration.

Circularity Check

0 steps flagged

No significant circularity in the representability-aware NN derivation

The paper's chain starts from independent ED 2-RDM data on small meshes (12 or 18 points) as training input. The NN then either interpolates via the architecture/loss or performs variational energy minimization on target meshes (including 6x6). Reported energies and accuracies are benchmarked directly against external ED ground-state energies and boundary-point SDP results, with no reduction of a claimed prediction to a fitted quantity by construction. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is present in the provided derivation steps. The slight energy violation relative to ED is a correctness concern, not a circularity in the methodological chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach depends on exact-diagonalization training data and the assumption that partial representability enforcement suffices; no new physical entities are introduced.

free parameters (1)

Neural network parameters
Weights and biases across the six architectures are fitted to ED 2-RDMs and further optimized variationally.

axioms (1)

Domain assumption: A subset of representability conditions incorporated in architecture and loss is adequate to produce physically valid 2-RDMs.
Stated in the framework description for both prediction and variational modes.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: variational optimization... minimize the energy and the interpolated LPSDQ/G

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

The MLP is the standard NN architecture, owing to its ability to universally approximate any vector valued function with arbitrary precision [185]

MLP Our first architecture is the MLP. The MLP is the standard NN architecture, owing to its ability to universally approximate any vector valued function with arbitrary precision [185]. Based on this property, given an arbitrary input $x\in\mathbb{R}^m$, the MLP is able to predict an output $f(x)\in\mathbb{R}^n$. The function described in Eq. (C1) which represents the NN is thus an...

[2]

Residual MLP A Residual MLP modifies the MLP by replacing each hidden layer with a residual block. Each block now computes $h^{(l+1)} = \mathrm{LayerNorm}\left(h^{(l)} + F(h^{(l)}, W^{(l)}_1, W^{(l)}_2, f^{(l)}_1, f^{(l)}_2)\right)$ (C5).

Hyperparameter: Value
Depth: [1,2,3,4,5,6]
Hidden Dimension: [1,2,4,8,16,32,64,128,256]
TABLE I. List of hyperparameters for training an MLP. The depth, d, refer...
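The residual update in Eq. (C5) can be illustrated with a small standard-library sketch. The excerpt does not spell out the inner map F, so the form F(h) = f2(W2 · f1(W1 · h)) used here is an assumption, as are the omission of biases, the absence of learnable LayerNorm scale and shift, and the toy weights.

```python
import math


def matvec(W, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and (nearly) unit variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def residual_block(h, W1, W2, f1=math.tanh, f2=lambda z: z):
    """h_next = LayerNorm(h + F(h)), with the ASSUMED form F(h) = f2(W2 f1(W1 h))."""
    hidden = [f1(z) for z in matvec(W1, h)]
    update = [f2(z) for z in matvec(W2, hidden)]
    return layer_norm([a + b for a, b in zip(h, update)])


# Toy weights (hypothetical): W1 maps 4 -> 2, W2 maps 2 -> 4.
h = [1.0, 2.0, 3.0, 4.0]
W1 = [[0.1, 0.0, -0.1, 0.2], [0.0, 0.3, 0.1, -0.2]]
W2 = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4], [0.2, 0.2]]
h_next = residual_block(h, W1, W2)
print(h_next)
```

Because the skip connection adds `h` back before normalization, a block with all-zero weights reduces to a plain LayerNorm of its input, which is the usual motivation for residual blocks in deeper networks.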

[3]

For this representation, a KAN replaces the fixed activation functions of an MLP with a learnable activation function defined on each edge of the network

KAN While an MLP is based on the universal approximation theorem, KANs are based on the Kolmogorov-Arnold representation theorem, which presents a way to represent multivariate continuous functions as a sum of functions of a single variable [188]. For this representation, a KAN replaces the fixed activation functions of an MLP with a learnable activation ...

[4]

This method has been shown to be effective in representing continuous signals and complex function mappings in the context of implicit neural representations [192]

SIREN A SIREN replaces the standard activations of the MLP with a sinusoidal representation such that each layer has the component-wise activation function $\sigma(x) = \sin(\omega x)$ (C12), where $\omega$ is either $\omega_0$ or $\omega_1$ defined below. This method has been shown to be effective in representing continuous signals and complex function mappings in the context of implicit neural...

[5]

FINER A FINER is a modification of the SIREN in order to better represent a broad spectrum of frequencies. While a SIREN has a fixed scaling value, a FINER uses the activation function $\sigma(x) = \sin(\omega_i(|x|+1)x)$ (C14).

Hyperparameter: Value
Depth: [1,2,3,4,5]
Hidden Dimension: [1,2,4,16,32,64,128,256]
First $\omega_0$: [3,6,12]
Hidden $\omega_1$: [5,10]
Linear Layer: [True]
TAB...
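The difference between the two sinusoidal activations is easiest to see side by side. The sketch below evaluates Eq. (C12) and Eq. (C14) as written; the sample points and the choice of 6.0 from the listed first-layer values are illustrative only.

```python
import math


def siren_activation(x, omega):
    """SIREN activation sin(omega * x): a fixed frequency scale, Eq. (C12)."""
    return math.sin(omega * x)


def finer_activation(x, omega):
    """FINER activation sin(omega * (|x| + 1) * x), Eq. (C14).

    The factor (|x| + 1) makes the local frequency grow with |x|.
    """
    return math.sin(omega * (abs(x) + 1.0) * x)


omega_0 = 6.0  # one of the listed first-layer values [3, 6, 12]
samples = [-1.0, -0.5, 0.0, 0.5, 1.0]
for x in samples:
    print(x, siren_activation(x, omega_0), finer_activation(x, omega_0))
```

Both functions are odd and agree to first order near x = 0; at |x| = 1 the FINER argument is already twice the SIREN argument, which is how a single layer covers a broader band of frequencies.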

[6]

Our implementation is largely based on the linear attention variant from Ref. [193]

TBN The TBN is a Transformer-Based Neural Network that uses a global self-attention mechanism [2] to map input coordinates to n-RDM values. Our implementation is largely based on the linear attention variant from Ref. [193]. Unlike the MLP and residual MLP architectures, the TBN processes all input coordinates simultaneously, allowing each point to attend ...

[7]

For each loss function, we will specify the form of NN it is used for

Loss Functions for Training and Optimization In this part, we will discuss all the loss functions. For each loss function, we will specify the form of NN it is used for. Even if we stick to one way of using NN, we may not adopt all loss functions in all scenarios and the specific ways of using the same loss function may vary, which will be specified in Ap...

[8]

We use (i) R-value accuracy and (ii) normalized overlap, as scores defining the NN accuracy at predicting the large-size n-RDM

Testing Criteria To evaluate the effectiveness of the architectures, two different quantities were used to benchmark the accuracy prediction. We use (i) R-value accuracy and (ii) normalized overlap, as scores defining the NN accuracy at predicting the large-size n-RDM. For the NNs that predict the complete 2-RDMs, the PSD of predicted $^2Q$ and $^2G$ in Eq. (D7) an...
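A positive-semidefiniteness (PSD) test of the kind mentioned here can be sketched without external libraries. This is a simplified illustration, not the paper's check: it assumes a real symmetric matrix (physical 2-RDM blocks are complex Hermitian in general), and the tolerance `tol` is a hypothetical choice.

```python
import math


def is_psd(matrix, tol=1e-10):
    """Check positive semidefiniteness of a REAL symmetric matrix.

    Attempts a Cholesky factorization of (matrix + tol * I); the
    factorization succeeds when that shifted matrix is positive definite.
    """
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] + tol - s
                if pivot <= 0.0:
                    return False  # a non-positive pivot signals a negative direction
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return True


print(is_psd([[2.0, -1.0], [-1.0, 2.0]]))  # eigenvalues 1 and 3
print(is_psd([[1.0, 2.0], [2.0, 1.0]]))    # eigenvalues 3 and -1
```

For matrices of realistic size one would normally compute the smallest eigenvalue with a linear-algebra library instead; the factorization route is used here only to keep the example self-contained.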

[9]

We benchmark the parameter efficiency only for the interpolation tasks, evaluating both the R-value and normalized overlap as a function of trainable parameters

Parameter Efficiency To benchmark the NN, we use the number of parameters needed to achieve a certain accuracy as a benchmark of the efficiency, leading to the parameter-efficiency plots. We benchmark the parameter efficiency only for the interpolation tasks, evaluating both the R-value and normalized overlap as a function of trainable parameters. To gene...

[10]

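We particularly focus on the pair-pair correlation function $C_{\mathrm{true},kk'} \equiv \langle A^\dagger_k A_{k'}\rangle - \langle A^\dagger_k\rangle\langle A_{k'}\rangle$ (E2), where ⟨

Model Setup The Richardson model of superconductivity [196–198] has the Hamiltonian $H = \sum_{k,s}\varepsilon_k c^\dagger_{k,s}c_{k,s} + \frac{u}{L^2}\left(\sum_k A^\dagger_k\right)\left(\sum_{k'}A_{k'}\right)$ (E1), where $c^\dagger_{k,s}$ creates an electron at Bloch momentum $k$ and spin $s$, $A^\dagger_k = c^\dagger_{k,\uparrow}c^\dagger_{-k,\downarrow}$ is the Cooper-pair operator, $u<0$ denotes attractive interaction, $\varepsilon_k = t(\cos k_x + \cos k_y)$, and $k$ takes values from an $L_1\times L_2$ mesh. We particu...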
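The single-particle ingredients of this setup are simple enough to write down directly. In the sketch below, the mesh convention k = 2π(n1/L1, n2/L2) is an assumption (the excerpt only says that k is taken from an L1 × L2 mesh), and `connected_correlation` evaluates Eq. (E2) for one (k, k') entry from expectation values supplied by the caller.

```python
import math


def conventional_mesh(L1, L2):
    """Momenta k = 2*pi*(n1/L1, n2/L2); an ASSUMED convention for the mesh."""
    return [
        (2.0 * math.pi * n1 / L1, 2.0 * math.pi * n2 / L2)
        for n1 in range(L1)
        for n2 in range(L2)
    ]


def dispersion(k, t=1.0):
    """epsilon_k = t * (cos kx + cos ky), as given below Eq. (E1)."""
    kx, ky = k
    return t * (math.cos(kx) + math.cos(ky))


def connected_correlation(pair_pair, mean_dag, mean):
    """C_{kk'} = <A_k^dag A_k'> - <A_k^dag><A_k'>, Eq. (E2), for one entry."""
    return pair_pair - mean_dag * mean


mesh = conventional_mesh(6, 6)
energies = [dispersion(k) for k in mesh]
print(len(mesh), max(energies), min(energies))
```

On this mesh the dispersion sums to zero over the Brillouin zone, a quick sanity check that the sampling is uniform.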

[11]

We then predict on a conventional 18×18 mesh

Training Data and Preprocessing For the Richardson Model, we train the NN on both a conventional 6×6 mesh and 4 tilted meshes each of which has 12 k points. We then predict on a conventional 18×18 mesh. We will provide a general discussion on both ways, and then describe the differences. Both methods train on the dominant component. Specifically, in Fig. 6 we see...

[12]

The prediction is then evaluated on the 18×18 mesh

ML Results In this section, we evaluate the performance of our NN architectures for training on both a 6×6 k point mesh and on 4 tilted 12-k meshes of the Richardson model’s pair-pair correlation function. The prediction is then evaluated on the 18×18 mesh. a. Trained on 6×6 Mesh When trained on the 6×6 mesh, each architecture is able to reach a similar level of per...

[13]

Model Formulation We first start with a brief review of the K-valley model for tMoTe$_2$ [128] that we use in this work. The non-interacting part of the model has a general form $H^{h}_{K,0} = -\sum_{M_x,M_y\in\mathbb{N}}\sum_{l,l'=t,b}\int d^2r\, i^{M_x+M_y}\,\partial_x^{M_x}\partial_y^{M_y}\tilde{c}^\dagger_{K,l,r}\, t^{M_xM_y}_{ll'}(r)\,\tilde{c}_{K,l',r}$ (F1), where $\tilde{c}^\dagger_{K,l,r}$ is the creation operator for a hole in the K valley in the $l$th layer at position $r$, and $\mathbb{N}$ denot...

[14]

(F6) We always consider the 2-RDMs that are generated by one ground state with a definite many-body momentum

Parametrization of RDMs In the case of the projected tMoTe$_2$, the physical 2-RDMs read $^2D^{\mathrm{true}}_{k_1k_2k_3k_4} = \langle c^\dagger_{k_1}c^\dagger_{k_2}c_{k_3}c_{k_4}\rangle$, $^2Q^{\mathrm{true}}_{k_1k_2k_3k_4} = \langle c_{k_1}c_{k_2}c^\dagger_{k_3}c^\dagger_{k_4}\rangle$, $^2G^{\mathrm{true}}_{k_1k_2k_3k_4} = \langle c^\dagger_{k_1}c_{k_2}c^\dagger_{k_3}c_{k_4}\rangle$ (F6). We always consider the 2-RDMs that are generated by one ground state with a definite many-body momentum. Therefore, the physical and the p...
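A standard consistency check on a fermionic 2-RDM with the index ordering of Eq. (F6) is its trace normalization, which follows from the canonical anticommutation relations for a state with fixed particle number $N$. The worked number below assumes that the 6×6 mesh holds $N = 24$ holes at 2/3 filling (36 × 2/3); that count is inferred here and is not quoted from the paper.

```latex
\sum_{k_1,k_2} {}^2D_{k_1 k_2 k_2 k_1}
  = \sum_{k_1,k_2} \langle c^\dagger_{k_1} c^\dagger_{k_2} c_{k_2} c_{k_1} \rangle
  = \sum_{k_1,k_2} \left( \langle n_{k_1} n_{k_2} \rangle - \delta_{k_1 k_2} \langle n_{k_1} \rangle \right)
  = N(N-1),
\qquad N = 24 \;\Rightarrow\; N(N-1) = 552 .
```

Any predicted or variationally optimized $^2D$ that misses this value has violated particle-number conservation before the positivity conditions are even examined.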

[15]

Many-body quantum metric from 2-RDM The many-body quantum metric is defined by tracking the evolution of the many-body state under twist boundary condition [199–205]. Specifically, given $|\Psi(q)\rangle$ with twist-boundary condition $q$, we define $|\Phi_q\rangle = e^{-iq\cdot X}|\Psi(q)\rangle$ (F17), where $X$ is the many-body position operator $X = \int d^2r \sum_l r\,\tilde{c}^\dagger_{K,l,r}\tilde{c}_{K,l,r}$ (F18). Then, the many...

[16]

We obtain the training and testing data via ED on the projected Hamiltonian Eq

Training Data and Testing Data In this section, we first discuss how we obtain our training and testing data, and explain the preprocessing necessary for the use of the training data in our NN. We obtain the training and testing data via ED on the projected Hamiltonian Eq. (F4) for 2×6, 3×4, 3×5, 3×6, 4×6, 5×6, 6×6 conventional meshes and a 6×2 tilted mesh, all wi...

[17]

An architecture plot of the framework can be seen in Fig

Framework of the NN In this section, we discuss the framework for the NN, which is shared by both the interpolation and variational optimization. An architecture plot of the framework can be seen in Fig. 1, and we expand and generalize that description. Generally, the framework is given a momentum mesh. It then uses a NN to predict the value of an object ...

[18]

This method is characterized by exact 2-RDM on small k meshes, which we use to train the NN and predict a large mesh which

Interpolation Training We first describe the interpolation training. This method is characterized by exact 2-RDM on small k meshes, which we use to train the NN and predict a large mesh which. For this study, we take two choices of sampling: a single-mesh interpolation with conventional 3×6 training mesh, or a three-mesh interpolation with three training ...

[19]

learning rate from $\eta$ to $\eta/100$. We set $N_w = 800$ warmup epochs training the MSE loss alone (i.e. $\alpha_{\mathrm{PSDQ/G}} = \alpha_{\mathrm{T1}} = \alpha_{\mathrm{Edist}} = 0$) so that the model does not get stuck in a non-optimal local minimum based on the other losses. Following that, we spend the next $N_r = r\cdot(N_{\mathrm{epochs}} - N_w)$ epochs gradually increasing the penalty weights to their full reported values, where r = 0....
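The warmup-then-ramp schedule for the penalty weights can be sketched as a single function. Two things here are assumptions rather than facts from the source: the value of r is truncated in the excerpt, so `ramp_fraction=0.5` is only a placeholder, and the linear ramp shape is a guess at what "gradually increasing" means. The total epoch count in the usage lines is likewise hypothetical.

```python
def penalty_scale(epoch, n_epochs, n_warmup=800, ramp_fraction=0.5):
    """Fraction of the full penalty weight applied at a given epoch.

    n_warmup corresponds to N_w = 800 from the text. ramp_fraction plays the
    role of r, whose value is truncated in the source (0.5 is a placeholder).
    The linear ramp is an assumed shape.
    """
    n_ramp = ramp_fraction * (n_epochs - n_warmup)  # N_r = r * (N_epochs - N_w)
    if epoch < n_warmup:
        return 0.0  # warmup: MSE loss only, all penalty weights zero
    if n_ramp <= 0 or epoch >= n_warmup + n_ramp:
        return 1.0  # full reported penalty weights
    return (epoch - n_warmup) / n_ramp


N_EPOCHS = 2800  # hypothetical total, chosen so that N_r = 1000
for epoch in (0, 799, 800, 1300, 1800, 2799):
    print(epoch, penalty_scale(epoch, N_EPOCHS))
```

Each penalty weight would then be its full value multiplied by this scale, so the fit to the ED data is established first and the representability penalties are switched on smoothly afterwards.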

[20]

Specifically, we note that the energies predicted by the interpolation-trained NNs (App

Variational Optimization In this section, we develop a method to variationally optimize our NN. Specifically, we note that the energies predicted by the interpolation-trained NNs (App. [F6]) are far higher than the ED ground state on 6×6, and hence wish to improve the accuracy of their prediction. To do so, we first initialize our NN pipeline with weights ...

[21]

We first briefly review the BPSDP algorithm described in Ref. [95]

Boundary-Point Semidefinite Programming To compare with the neural network results, we also perform the BPSDP. We first briefly review the BPSDP algorithm described in Ref. [95]. The general algorithm is a first-order primal–dual method for solving SDP problems of the form $E_{\mathrm{SDP}} = \min_x c^T x$, subject to $Ax=b$ and $M_\ell(x)\succeq 0\ \forall \ell$ (F41), where $x$ denotes the independent va...
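The structure of the problem in Eq. (F41) can be made concrete with a toy instance. The sketch below does not implement the boundary-point algorithm; it only evaluates the objective and the constraints for candidate points of a hypothetical three-variable problem with one 2×2 block $M(x) = [[x_0, x_1], [x_1, x_2]]$.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def block_is_psd(x, tol=1e-12):
    """PSD test for the 2x2 symmetric block M(x) = [[x0, x1], [x1, x2]]."""
    x0, x1, x2 = x
    return x0 >= -tol and x2 >= -tol and x0 * x2 - x1 * x1 >= -tol


def evaluate_candidate(x, c, A, b, tol=1e-9):
    """Return (c^T x, feasible) for min c^T x subject to A x = b and M(x) PSD."""
    linear_ok = all(abs(dot(row, x) - bi) <= tol for row, bi in zip(A, b))
    return dot(c, x), linear_ok and block_is_psd(x)


# Toy problem (hypothetical): minimize the off-diagonal entry x1
# subject to the trace constraint x0 + x2 = 1.
c = [0.0, 1.0, 0.0]
A = [[1.0, 0.0, 1.0]]
b = [1.0]
on_boundary = evaluate_candidate([0.5, -0.5, 0.5], c, A, b)
outside = evaluate_candidate([0.5, -0.6, 0.5], c, A, b)
print(on_boundary)  # feasible; the minimizer of this toy problem
print(outside)      # lower objective, but M(x) is no longer PSD
```

The second candidate shows in miniature why an energy below the exact value signals a missing constraint: dropping or loosening a positivity condition enlarges the feasible set and lets the objective fall below the true minimum.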

work page arXiv 2065

[1] [1]

The MLP is the standard NN architecture, owing to its ability to universally approximate any vector valued function with arbitrary precision [185]

MLP Our first architecture is the MLP. The MLP is the standard NN architecture, owing to its ability to universally approximate any vector valued function with arbitrary precision [185]. Based on this property, given an arbitrary inputx∈R m , the MLP is able to predict an outputf(x)∈Rn. The function described in Eq. (C1) which represents the NN is thus an...

work page

[2] [2]

Residual MLP A Residual MLP modifies the MLP by replacing each hidden layer with a residual block. Each block now computes h(l+1) =LayerNorm h(l) +F(h (l), W(l) 1 , W(l) 2 ,f (l) 1 ,f (l) 2 ) (C5) 18 Hyperparameter Value Depth [1,2,3,4,5,6] Hidden Dimension [1,2,4,8,16,32,64,128,256] TABLE I. List of hyperparameters for training an MLP. The depth,d, refer...

work page

[3] [3]

For this representation, a KAN replaces the fixed activation functions of an MLP with a learnable activation function defined on each edge of the network

KAN While an MLP is based on the universal approximation theorem, KANs are based on the Kolmogorov-Arnold representation theorem, which presents a way to represent multivariate continuous functions as a sum of functions of a single variable [188]. For this representation, a KAN replaces the fixed activation functions of an MLP with a learnable activation ...

work page

[4] [4]

This method has been shown to be effective in representing continuous signals and complex function mappings in the context of implicit neural representations [192]

SIREN A SIREN replaces the standard activations of the MLP with a sinusoidal representation such that each layer has the component-wise activation function σ(x) = sin(ωx)(C12) whereωis eitherω 0 orω 1 defined below. This method has been shown to be effective in representing continuous signals and complex function mappings in the context of implicit neural...

[5]

FINER A FINER is a modification of the SIREN in order to better represent a broad spectrum of frequencies. While a SIREN has a fixed scaling value, a FINER uses the activation function

$$\sigma(x) = \sin\left(\omega_i(|x|+1)\,x\right). \tag{C14}$$

Hyperparameter      Value
Depth               [1,2,3,4,5]
Hidden Dimension    [1,2,4,16,32,64,128,256]
First $\omega_0$    [3,6,12]
Hidden $\omega_1$   [5,10]
Linear Layer        [True]

TAB...
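To make the difference between Eq. (C12) and Eq. (C14) concrete, the following sketch evaluates both activations on the same inputs. The function names are hypothetical and $\omega = 1$ is chosen only for readability.

```python
import numpy as np

def siren_activation(z, omega):
    # Eq. (C12): fixed frequency scaling.
    return np.sin(omega * z)

def finer_activation(z, omega):
    # Eq. (C14): the factor (|z| + 1) raises the local frequency as |z| grows.
    return np.sin(omega * (np.abs(z) + 1.0) * z)

z = np.array([0.0, 0.01, 1.0])
siren_out = siren_activation(z, 1.0)
finer_out = finer_activation(z, 1.0)
```

Near $z = 0$ the two activations nearly coincide, since $(|z|+1) \approx 1$ there; they separate as $|z|$ increases, which is how the FINER activation covers a wider range of frequencies with a single $\omega$.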

[6]

Our implementation is largely based on the linear attention variant from Ref. [193]

TBN The TBN is a Transformer-Based Neural Network that uses a global self-attention mechanism [2] to map input coordinates to $n$-RDM values. Our implementation is largely based on the linear attention variant from Ref. [193]. Unlike the MLP and residual MLP architectures, the TBN processes all input coordinates simultaneously, allowing each point to attend ...

[7]

For each loss function, we will specify the form of NN it is used for

Loss Functions for Training and Optimization In this part, we will discuss all the loss functions. For each loss function, we will specify the form of NN it is used for. Even for a single way of using the NN, we may not adopt every loss function in every scenario, and the specific way a given loss function is used may vary, which will be specified in Ap...

[8]

We use (i) R-value accuracy and (ii) normalized overlap as scores defining the NN accuracy at predicting the large-size $n$-RDM

Testing Criteria To evaluate the effectiveness of the architectures, two different quantities were used to benchmark the prediction accuracy. We use (i) R-value accuracy and (ii) normalized overlap as scores defining the NN accuracy at predicting the large-size $n$-RDM. For the NNs that predict the complete 2-RDMs, the PSD of the predicted $^2Q$ and $^2G$ in Eq. (D7) an...

[9]

We benchmark the parameter efficiency only for the interpolation tasks, evaluating both the R-value and normalized overlap as a function of trainable parameters

Parameter Efficiency To benchmark the NN, we use the number of parameters needed to achieve a certain accuracy as a measure of efficiency, leading to the parameter-efficiency plots. We benchmark the parameter efficiency only for the interpolation tasks, evaluating both the R-value and normalized overlap as a function of trainable parameters. To gene...

[10]

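We particularly focus on the pair-pair correlation function $C_{\mathrm{true},kk'} \equiv \langle A^\dagger_k A_{k'}\rangle - \langle A^\dagger_k\rangle\langle A_{k'}\rangle$ (E2) where ⟨

Model Setup The Richardson model of superconductivity [196–198] has the Hamiltonian

$$H = \sum_{k,s} \varepsilon_k c^\dagger_{k,s} c_{k,s} + \frac{u}{L^2}\left(\sum_k A^\dagger_k\right)\left(\sum_{k'} A_{k'}\right), \tag{E1}$$

where $c^\dagger_{k,s}$ creates an electron at Bloch momentum $k$ and spin $s$, $A^\dagger_k = c^\dagger_{k,\uparrow} c^\dagger_{-k,\downarrow}$ is the Cooper-pair operator, $u < 0$ denotes attractive interaction, $\varepsilon_k = t(\cos k_x + \cos k_y)$, and $k$ takes values from an $L_1 \times L_2$ mesh. We particu...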

[11]

We then predict on a conventional 18×18 mesh

Training Data and Preprocessing For the Richardson model, we train the NN on both a conventional 6×6 mesh and 4 tilted meshes, each of which has 12 $k$ points. We then predict on a conventional 18×18 mesh. We will provide a general discussion of both approaches, and then describe the differences. Both methods train on the dominant component. Specifically, in Fig. 6 we see...

[12]

The prediction is then evaluated on the 18×18 mesh

ML Results In this section, we evaluate the performance of our NN architectures for training on both a 6×6 $k$-point mesh and on 4 tilted 12-$k$ meshes of the Richardson model’s pair-pair correlation function. The prediction is then evaluated on the 18×18 mesh. a. Trained on 6×6 Mesh When trained on the 6×6 mesh, each architecture is able to reach a similar level of per...

[13]

Model Formulation We first start with a brief review of the K-valley model for tMoTe$_2$ [128] that we use in this work. The non-interacting part of the model has a general form

$$H^h_{K,0} = -\sum_{M_x,M_y\in\mathbb{N}} \sum_{l,l'=t,b} \int d^2r\; i^{M_x+M_y}\, \partial_x^{M_x} \partial_y^{M_y}\, \tilde{c}^\dagger_{K,l,r}\; t^{M_x M_y}_{ll'}(r)\; \tilde{c}_{K,l',r} \tag{F1}$$

where $\tilde{c}^\dagger_{K,l,r}$ is the creation operator for a hole in the K valley in the $l$th layer at position $r$, and $\mathbb{N}$ denot...

[14]

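We always consider the 2-RDMs that are generated by one ground state with a definite many-body momentum

Parametrization of RDMs In the case of the projected tMoTe$_2$, the physical 2-RDMs read

$$\begin{aligned}
{}^2D^{\mathrm{true}}_{k_1k_2k_3k_4} &= \left\langle c^\dagger_{k_1} c^\dagger_{k_2} c_{k_3} c_{k_4} \right\rangle \\
{}^2Q^{\mathrm{true}}_{k_1k_2k_3k_4} &= \left\langle c_{k_1} c_{k_2} c^\dagger_{k_3} c^\dagger_{k_4} \right\rangle \\
{}^2G^{\mathrm{true}}_{k_1k_2k_3k_4} &= \left\langle c^\dagger_{k_1} c_{k_2} c^\dagger_{k_3} c_{k_4} \right\rangle .
\end{aligned} \tag{F6}$$

We always consider the 2-RDMs that are generated by one ground state with a definite many-body momentum. Therefore, the physical and the p...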

[15]

Many-body quantum metric from 2-RDM The many-body quantum metric is defined by tracking the evolution of the many-body state under a twist boundary condition [199–205]. Specifically, given $|\Psi(q)\rangle$ with twist-boundary condition $q$, we define

$$|\Phi_q\rangle = e^{-iq\cdot X}\,|\Psi(q)\rangle, \tag{F17}$$

where $X$ is the many-body position operator

$$X = \int d^2r \sum_l r\, \tilde{c}^\dagger_{K,l,r}\, \tilde{c}_{K,l,r}. \tag{F18}$$

Then, the many...

[16]

We obtain the training and testing data via ED on the projected Hamiltonian Eq

Training Data and Testing Data In this section, we first discuss how we obtain our training and testing data, and explain the preprocessing necessary for the use of the training data in our NN. We obtain the training and testing data via ED on the projected Hamiltonian Eq. (F4) for 2×6, 3×4, 3×5, 3×6, 4×6, 5×6, 6×6 conventional meshes and a 6×2 tilted mesh, all wi...

[17]

An architecture plot of the framework can be seen in Fig

Framework of the NN In this section, we discuss the framework for the NN, which is shared by both the interpolation and the variational optimization. An architecture plot of the framework can be seen in Fig. 1, and we expand and generalize that description. Generally, the framework is given a momentum mesh. It then uses an NN to predict the value of an object ...

[18]

This method uses exact 2-RDMs on small $k$ meshes to train the NN, which then predicts the 2-RDM on a large mesh

Interpolation Training We first describe the interpolation training. This method uses exact 2-RDMs on small $k$ meshes to train the NN, which then predicts the 2-RDM on a large mesh. For this study, we take two choices of sampling: a single-mesh interpolation with a conventional 3×6 training mesh, or a three-mesh interpolation with three training ...

[19]

learning rate from $\eta$ to $\eta/100$. We set $N_w = 800$ warmup epochs training the MSE loss alone (i.e. $\alpha_{\mathrm{PSD}_{Q/G}} = \alpha_{T1} = \alpha_{\mathrm{Edist}} = 0$) so that the model does not get stuck in a non-optimal local minimum based on the other losses. Following that, we spend the next $N_r = r\cdot(N_{\mathrm{epochs}} - N_w)$ epochs gradually increasing the penalty weights to their full reported values, where $r$ = 0....
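The warmup-then-ramp schedule described above can be sketched as a single scaling factor applied to every penalty weight. The excerpt only says the weights increase "gradually", so the linear ramp is an assumption; the value of $r$ is cut off in the excerpt, so `r=0.5` is a placeholder, as are the function name `penalty_scale` and the total epoch count used in the example.

```python
def penalty_scale(epoch, n_epochs, n_warmup=800, r=0.5):
    """Fraction of the full penalty weight applied at a given epoch."""
    # N_r = r * (N_epochs - N_w): number of ramp epochs after warmup.
    n_ramp = int(r * (n_epochs - n_warmup))
    if epoch < n_warmup:
        return 0.0                       # MSE loss alone during warmup
    if n_ramp == 0 or epoch >= n_warmup + n_ramp:
        return 1.0                       # full reported penalty weights
    return (epoch - n_warmup) / n_ramp   # assumed linear ramp

# Example with a hypothetical total of 2000 epochs.
scales = [penalty_scale(e, 2000) for e in (0, 800, 1100, 1400, 1999)]
```

In a training loop, each penalty term would be multiplied by its full weight times this factor, so that the MSE term alone shapes the early optimization.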

[20]

Specifically, we note that the energies predicted by the interpolation-trained NNs (App

Variational Optimization In this section, we develop a method to variationally optimize our NN. Specifically, we note that the energies predicted by the interpolation-trained NNs (App. [F6]) are far higher than the ED ground-state energy on the 6×6 mesh, and hence we wish to improve the accuracy of their predictions. To do so, we first initialize our NN pipeline with weights ...

[21]

We first briefly review the BPSDP algorithm described in Ref. [95]

Boundary-Point Semidefinite Programming To compare with the neural network results, we also perform the BPSDP. We first briefly review the BPSDP algorithm described in Ref. [95]. The general algorithm is a first-order primal–dual method for solving SDP problems of the form

$$E_{\mathrm{SDP}} = \min_x c^T x, \quad \text{subject to } Ax = b \text{ and } M_\ell(x) \succeq 0 \;\; \forall \ell, \tag{F41}$$

where $x$ denotes the independent va...
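The constraint $M_\ell(x) \succeq 0$ in Eq. (F41) is what separates an SDP from a linear program. First-order SDP solvers commonly handle such constraints by projecting a matrix onto the cone of positive semidefinite matrices through an eigendecomposition. The sketch below shows only that projection step as an illustration; it is not taken from Ref. [95], and the name `project_psd` is hypothetical.

```python
import numpy as np

def project_psd(M):
    # Hermitize first so that eigh applies, then drop negative eigenvalues.
    H = 0.5 * (M + M.conj().T)
    w, V = np.linalg.eigh(H)
    w_clipped = np.clip(w, 0.0, None)
    # Rebuild V diag(w_clipped) V^dagger.
    return (V * w_clipped) @ V.conj().T

M = np.array([[1.0, 0.0],
              [0.0, -2.0]])
P = project_psd(M)
```

The result is the closest positive semidefinite matrix to the Hermitian part of `M` in the Frobenius norm, which is why this step appears as a building block in projection-based SDP methods.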
