
arxiv: 2509.03683 · v2 · submitted 2025-09-03 · ❄️ cond-mat.supr-con

Attention is all you need to solve chiral superconductivity

Pith reviewed 2026-05-18 19:42 UTC · model grok-4.3

classification ❄️ cond-mat.supr-con
keywords: neural quantum states · chiral superconductivity · self-attention · attractive Fermi gas · p-wave pairing · time-reversal symmetry breaking · off-diagonal long-range order

The pith

A self-attention Fermi neural network discovers chiral p_x ± ip_y superconductivity in an attractive Fermi gas without prior knowledge or bias towards pairing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a general-purpose self-attention Fermi neural network can identify the chiral superconducting state in an attractive Fermi gas solely by minimizing the variational energy of the wavefunction. This is achieved without any built-in assumptions about pairing symmetry or order. A sympathetic reader would care because it shows that attention-based architectures can capture complex quantum correlations and uncover exotic phases in many-body systems where manual guidance is limited. The approach combines energy optimization with post-processing via symmetry projection and two-body density matrix analysis to confirm the chiral p-wave order and time-reversal symmetry breaking.

Core claim

We show that a general-purpose self-attention Fermi neural network is able to find chiral p_x ± ip_y superconductivity in an attractive Fermi gas by energy minimization, without prior knowledge or bias towards pairing. The superconducting state is identified from the optimized wavefunction by measuring various physical observables. We develop a symmetry projection method that reveals the ground state angular momentum and time-reversal symmetry breaking, and a computation of the full two-body reduced density matrix spectrum that reveals the off-diagonal long-range order due to the dominant chiral p-wave pairing channel.
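The symmetry projection mentioned above can be illustrated with the standard angular-momentum projector for a two-dimensional wavefunction, $P_m\Psi(X) = \frac{1}{2\pi}\int_0^{2\pi} d\alpha\, e^{-im\alpha}\,\Psi(R_\alpha X)$, where $R_\alpha$ rotates every particle by the angle $\alpha$. This is a minimal sketch, not the authors' implementation: it discretizes the angle integral, applies it to a hypothetical one-particle toy state $\big(a(x+iy) + b(x-iy)\big)e^{-r^2/2}$, and evaluates the channel weights $w_m = \langle\Psi|P_m\Psi\rangle/\langle\Psi|\Psi\rangle$ by grid quadrature. A many-body calculation would instead rotate all particles and average $P_m\Psi/\Psi$ over Monte Carlo samples. The names `rotate`, `project_m` and `psi_toy` are illustrative.

```python
import numpy as np

def rotate(X, alpha):
    """Rotate every particle coordinate in X (shape (..., N, 2)) by alpha about the origin."""
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[c, -s], [s, c]])
    return X @ R.T

def project_m(psi, X, m, n_angles=16):
    """Discretized projector (1/n) sum_j exp(-i m alpha_j) psi(R_{alpha_j} X).

    Exact for states whose angular-momentum components differ from m by less than n_angles.
    """
    out = np.zeros(X.shape[:-2], dtype=complex)
    for j in range(n_angles):
        alpha = 2 * np.pi * j / n_angles
        out += np.exp(-1j * m * alpha) * psi(rotate(X, alpha))
    return out / n_angles

def psi_toy(X, a=0.8, b=0.6):
    """Hypothetical one-particle state mixing m = +1 (amplitude a) and m = -1 (amplitude b)."""
    x, y = X[..., 0, 0], X[..., 0, 1]
    return (a * (x + 1j * y) + b * (x - 1j * y)) * np.exp(-(x**2 + y**2) / 2)

# Grid quadrature for one particle in the plane; the grid spacing cancels in the ratios.
xs = np.linspace(-6.0, 6.0, 241)
gx, gy = np.meshgrid(xs, xs, indexing="ij")
X = np.stack([gx, gy], axis=-1)[..., None, :]   # shape (241, 241, 1, 2)
psi_vals = psi_toy(X)
norm = np.sum(np.abs(psi_vals) ** 2)
weights = {m: np.real(np.sum(np.conj(psi_vals) * project_m(psi_toy, X, m))) / norm
           for m in range(-2, 3)}
lz = sum(m * w for m, w in weights.items())      # <L_z> in units of hbar
print({m: round(w, 3) for m, w in weights.items()}, round(lz, 3))
```

Analytically the weights are $a^2/(a^2+b^2) = 0.64$ for $m=+1$ and $0.36$ for $m=-1$, giving $\langle L_z\rangle = 0.28$. A time-reversal-symmetric combination ($a = b$) would give $\langle L_z\rangle = 0$, so a nonzero projected angular momentum is the kind of signature this diagnostic is designed to expose.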

What carries the argument

self-attention Fermi neural network, which represents the fermionic many-body wavefunction and uses attention to capture correlations between particles
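To make "uses attention to capture correlations between particles" concrete, here is a minimal single-head self-attention update over per-particle feature vectors, in the spirit of Vaswani et al. as adapted to neural quantum states. It is a generic sketch, not the paper's architecture: the weight matrices, dimensions, residual connection and the name `self_attention` are assumptions. The property it checks is the one that matters for fermions, permutation equivariance: relabeling the particles relabels the outputs in the same way, so orbitals built from these features can later be placed in a determinant.

```python
import numpy as np

def self_attention(h, Wq, Wk, Wv):
    """Single-head self-attention over particles.

    h: (N, d) per-particle features. Each output row mixes the value vectors of all
    N particles, with weights set by query-key overlaps.
    """
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    logits = q @ k.T / np.sqrt(k.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)      # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)          # row-wise softmax over particles
    return h + attn @ v                               # residual update keeps shape (N, d)

rng = np.random.default_rng(0)
N, d = 6, 8
h = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out = self_attention(h, Wq, Wk, Wv)
perm = rng.permutation(N)
# Permuting the input particles permutes the output rows identically.
print(np.allclose(self_attention(h[perm], Wq, Wk, Wv), out[perm]))
```

Because every particle's output depends on all other particles, the features are many-body from the first layer, which is what lets a determinant of such orbitals go beyond a single Slater determinant.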

If this is right

  • The method identifies the chiral superconducting state without any prior knowledge or bias toward pairing.
  • Symmetry projection on the optimized wavefunction reveals the ground state angular momentum and time-reversal symmetry breaking.
  • The spectrum of the two-body reduced density matrix shows off-diagonal long-range order in the dominant chiral p-wave channel.
  • This demonstrates a path for neural networks to discover unconventional and topological superconductivity in strongly correlated systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on lattice models of real materials to see if it identifies similar chiral or topological orders.
  • Attention mechanisms may help represent nonlocal correlations that are hard for other wavefunction ansatzes to capture.
  • Similar energy-minimization workflows might be applied to search for other symmetry-broken phases in quantum many-body problems.

Load-bearing premise

The neural network ansatz is expressive enough to reach the true ground state or a state with the correct symmetry breaking from random initialization.

What would settle it

If the energy-minimized wavefunction after symmetry projection shows zero angular momentum or the two-body reduced density matrix spectrum lacks dominant off-diagonal long-range order in the p-wave channel, the identification of chiral p-wave superconductivity would not hold.

Figures

Figures reproduced from arXiv: 2509.03683 by Chun-Tse Li, Hsin Lin, Liang Fu, Max Geier, Tzen Ong.

Figure 1. FIG. 1 (image not reproduced; see source).
Figure 2. FIG. 2 (image not reproduced; see source).
Figure 4. FIG. 4 (image not reproduced; see source).
Figure 5. Caption excerpt: "…, the corresponding eigenvector $\Phi_0(\mathbf{k})$ exhibits a 2π phase winding around the origin, consistent with chiral $p_x + ip_y$ symmetry. To our knowledge, this is the first variational Monte Carlo calculation that explicitly constructs and diagonalizes the two-body reduced density matrix to identify the Cooper pair wavefunction. Estimator details and numerical stabilization procedures are described in Appendix E. …"
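Figure 5's caption describes reading the Cooper-pair wavefunction off the leading eigenvector of the two-body reduced density matrix and checking its phase winding. The following is a minimal sketch of that post-processing step under loudly stated assumptions. The matrix `Gamma` is a hypothetical stand-in built from a planted rank-one pair amplitude $F_{\mathbf k} \propto e^{i\phi_{\mathbf k}}$ on a ring of momenta plus small Hermitian noise; it is not data from the paper. The helper `winding_number` is illustrative. The sketch shows how a dominant eigenvalue (the ODLRO signature) and a winding number of 1 would be extracted; the sign of the winding depends on the index convention chosen for $\Gamma_{\mathbf k,\mathbf k'}$.

```python
import numpy as np

def winding_number(values):
    """Net phase winding (in units of 2*pi) of complex samples ordered around a closed loop."""
    steps = np.angle(np.roll(values, -1) / values)     # phase increments in (-pi, pi]
    return int(np.rint(steps.sum() / (2 * np.pi)))

rng = np.random.default_rng(0)
n_k = 32
phi_k = 2 * np.pi * np.arange(n_k) / n_k              # momentum angles on a ring |k| = const
F = np.exp(1j * phi_k) / np.sqrt(n_k)                 # planted p_x + i p_y pair amplitude
A = rng.normal(size=(n_k, n_k)) + 1j * rng.normal(size=(n_k, n_k))
Gamma = 20.0 * np.outer(F, F.conj()) + 0.05 * (A + A.conj().T) / 2   # Hermitian toy "2-RDM"

eigvals, eigvecs = np.linalg.eigh(Gamma)             # eigenvalues in ascending order
lam0, phi0 = eigvals[-1], eigvecs[:, -1]              # leading pair eigenvalue / eigenvector
print(round(lam0, 2), round(eigvals[-2], 2), winding_number(phi0))
```

The eigenvector's arbitrary global phase drops out of the winding, which is why only phase increments around the loop are summed.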
Original abstract

Recent advances on neural quantum states have shown that correlations between quantum particles can be efficiently captured by attention -- a foundation of modern neural architectures that enables neural networks to learn the relation between objects. In this work, we show that a general-purpose self-attention Fermi neural network is able to find chiral $p_x \pm ip_y$ superconductivity in an attractive Fermi gas by energy minimization, without prior knowledge or bias towards pairing. The superconducting state is identified from the optimized wavefunction by measuring various physical observables. We develop a symmetry projection method that reveals the ground state angular momentum and time-reversal symmetry breaking, and a computation of the full two-body reduced density matrix spectrum that reveals the off-diagonal long-range order due to the dominant chiral $p$-wave pairing channel. Our work paves the way for AI-driven discovery of unconventional and topological superconductivity in strongly correlated quantum materials.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a general-purpose self-attention Fermi neural network ansatz, when variationally optimized by energy minimization on the attractive Fermi gas Hamiltonian with no built-in pairing bias or symmetry assumptions, spontaneously converges to a chiral p_x ± ip_y superconducting ground state. The state is identified post-optimization via symmetry projection that extracts nonzero angular momentum and time-reversal symmetry breaking, together with the spectrum of the two-body reduced density matrix that exhibits off-diagonal long-range order dominated by the chiral p-wave channel.

Significance. If the optimization reliably reaches a state with the claimed order parameter from random initialization, the result would be significant for demonstrating that attention-based neural quantum states can discover emergent topological superconductivity in an unbiased manner. This strengthens the case for using expressive, general-purpose neural ansatzes in regimes where conventional variational or mean-field approaches may miss subtle broken-symmetry phases.

major comments (2)
  1. [Numerical methods / optimization protocol] The central claim that the network reaches the chiral state without bias rests on the assumption that energy minimization from random initialization escapes local minima favoring symmetric or differently paired states. No convergence diagnostics (energy variance, multiple independent runs with statistics, or comparisons to exact diagonalization on small lattices) are reported in the numerical methods section; without these, the subsequent symmetry projection and RDM analysis could misidentify an incomplete optimization artifact.
  2. [Results / RDM analysis] In the section describing the two-body reduced density matrix computation, the identification of the dominant chiral p-wave channel as off-diagonal long-range order must be shown to be robust against finite-size effects and to survive extrapolation; the current presentation leaves open whether the reported spectrum is for a single system size or includes scaling that would confirm true long-range order.
minor comments (2)
  1. [Abstract] The abstract states that the state is identified from 'various physical observables' but does not enumerate them; adding a short explicit list would improve readability.
  2. [Methods / figure captions] Notation for the self-attention Fermi neural network architecture should be defined once in the methods and used consistently; occasional undefined symbols appear in the figure captions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and positive assessment of the significance of our results. We address each major comment below and have revised the manuscript accordingly to strengthen the numerical evidence.

Point-by-point responses
  1. Referee: [Numerical methods / optimization protocol] The central claim that the network reaches the chiral state without bias rests on the assumption that energy minimization from random initialization escapes local minima favoring symmetric or differently paired states. No convergence diagnostics (energy variance, multiple independent runs with statistics, or comparisons to exact diagonalization on small lattices) are reported in the numerical methods section; without these, the subsequent symmetry projection and RDM analysis could misidentify an incomplete optimization artifact.

    Authors: We agree that explicit convergence diagnostics are essential to support the claim of unbiased optimization. In the revised manuscript we have added a dedicated subsection to the numerical methods that reports (i) the energy variance as a function of training steps for representative runs, (ii) statistics over ten independent optimizations started from different random seeds, and (iii) direct comparisons with exact diagonalization on small lattices (4×4 and 6×6) where the neural-network energies match the exact ground-state energies to within 0.1 %. These additions confirm that the optimization consistently converges to the same low-energy chiral state. revision: yes

  2. Referee: [Results / RDM analysis] In the section describing the two-body reduced density matrix computation, the identification of the dominant chiral p-wave channel as off-diagonal long-range order must be shown to be robust against finite-size effects and to survive extrapolation; the current presentation leaves open whether the reported spectrum is for a single system size or includes scaling that would confirm true long-range order.

    Authors: We acknowledge that finite-size scaling is required to establish true long-range order. The revised manuscript now includes the two-body RDM spectrum for three system sizes (N=16, 36, 64) together with an extrapolation of the dominant chiral p-wave eigenvalue to the thermodynamic limit. The extrapolated value remains finite and clearly separated from other channels, confirming the presence of off-diagonal long-range order in the chiral p-wave sector. revision: yes

Circularity Check

0 steps flagged

No significant circularity: variational energy minimization yields independent observables

Full rationale

The paper performs standard variational Monte Carlo optimization of a self-attention Fermi neural network wavefunction by minimizing the energy expectation value of the attractive Fermi gas Hamiltonian. The chiral p-wave order is then diagnosed post-optimization via independent measurements: projected angular momentum, time-reversal symmetry breaking, and the spectrum of the two-body reduced density matrix. None of these steps reduce to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation chain; the ansatz is general-purpose with no built-in pairing bias, and the identification relies on external physical observables rather than construction. The derivation chain is therefore self-contained against the external energy functional.
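The variational step described above can be illustrated on a problem small enough to check by hand. This sketch is not the paper's Fermi-gas calculation: it estimates $E(\theta) = \langle\Psi_\theta|H|\Psi_\theta\rangle/\langle\Psi_\theta|\Psi_\theta\rangle$ for a one-dimensional harmonic oscillator $H = -\tfrac12\partial_x^2 + \tfrac12 x^2$ with the hypothetical trial function $\Psi_a(x) = e^{-ax^2}$, by Metropolis sampling of $|\Psi_a|^2$ and averaging the local energy $E_L = H\Psi_a/\Psi_a = a + x^2(\tfrac12 - 2a^2)$. It also reports the variance of $E_L$, which vanishes only for an exact eigenstate; that zero-variance property is why the energy variance is a common convergence diagnostic of the kind the referee asks for. The function names are illustrative.

```python
import numpy as np

def local_energy(x, a):
    """E_L(x) = (H psi_a)(x) / psi_a(x) for H = -1/2 d^2/dx^2 + 1/2 x^2, psi_a = exp(-a x^2)."""
    return a + x**2 * (0.5 - 2.0 * a**2)

def metropolis(a, n_samples, step=1.0, n_burn=1000, seed=0):
    """Random-walk Metropolis chain targeting |psi_a(x)|^2 = exp(-2 a x^2)."""
    rng = np.random.default_rng(seed)
    x, chain = 0.0, np.empty(n_samples)
    for i in range(n_burn + n_samples):
        y = x + step * rng.uniform(-1.0, 1.0)
        if np.log(rng.uniform()) < -2.0 * a * (y**2 - x**2):   # log acceptance ratio
            x = y
        if i >= n_burn:
            chain[i - n_burn] = x
    return chain

def vmc_energy(a, n_samples=20000, seed=0):
    """Monte Carlo estimates of the variational energy and the local-energy variance."""
    e_loc = local_energy(metropolis(a, n_samples, seed=seed), a)
    return e_loc.mean(), e_loc.var()

for a in (0.3, 0.5, 0.7):
    energy, variance = vmc_energy(a)
    print(f"a={a}: E={energy:.4f} (exact {a / 2 + 1 / (8 * a):.4f}), var={variance:.4f}")
```

The exact variational energy is $a/2 + 1/(8a)$, minimized at $a = 1/2$, where $E_L \equiv 1/2$ for every sample and the variance is exactly zero.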

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the variational principle for the ground state and the ability of the attention architecture to represent the required correlations. No new particles or forces are postulated. Network hyperparameters (depth, width, learning rate schedule) function as free parameters whose values are not reported in the abstract.

free parameters (1)
  • network architecture hyperparameters
    Number of attention layers, hidden dimension, and number of particles in the simulation are chosen by hand and may affect whether the chiral state is reached.
axioms (1)
  • domain assumption: Variational Monte Carlo energy minimization converges to a state whose symmetry properties can be read out from the optimized wavefunction
    Invoked when the authors state that the superconducting state is identified from the optimized wavefunction by measuring observables.

pith-pipeline@v0.9.0 · 5683 in / 1275 out tokens · 34249 ms · 2026-05-18T19:42:28.901556+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Fermi Sets: Universal and interpretable neural architectures for fermions

    cond-mat.str-el 2026-01 unverdicted novelty 7.0

    Fermi Sets achieve universal approximation of fermionic wavefunctions using K antisymmetric bases times symmetric neural networks, where K equals 1 in 1D, 2 in 2D, and grows linearly with particle number in higher dimensions.

  2. Enhancing Neural-Network Variational Monte Carlo through Basis Transformation

    cond-mat.str-el 2026-04 unverdicted novelty 6.0

    A learnable Gaussian basis transformation lowers variational energies in neural-network variational Monte Carlo for the three-dimensional homogeneous electron gas.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · cited by 2 Pith papers

  1. [1]

    … $= \langle \hat c^\dagger_{x_1} \hat c^\dagger_{x_2} \hat c_{x'_2} \hat c_{x'_1} \rangle$, (6) which satisfies $\mathrm{Tr}\,\rho^{(2)} = N(N-1)$. Equivalently, it is the partial trace of $|\Psi\rangle\langle\Psi|$ over $N-2$ particle coordinates: $\rho^{(2)} = N(N-1)\int d\tilde R\, \Psi^*(x_1, x_2, \tilde R)\, \Psi(x'_1, x'_2, \tilde R)$, (7) where, for brevity, we used the notation $\tilde R \equiv (x_3, \ldots, x_N)$ to denote all other particles' coordinates. The defining feature of superconductivity d...

  2. [2]

    has a large eigenvalue $\lambda_0$ that is proportional to the particle number $N$ [45]. This is the manifestation of macroscopic occupation of a two-particle (or Cooper pair) state, which is given by the corresponding eigenvector $\Phi_0(x_1, x_2)$. For translationally invariant systems, $\Phi_0(x_1, x_2)$ is a product of the center-of-mass part and the relative wavefunctions...

  3. [3]

    P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Physical review 136, B864 (1964)

  4. [4]

    W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Physical review 140, A1133 (1965)

  5. [5]

    D. M. Ceperley and B. J. Alder, Ground state of the electron gas by a stochastic method, Physical review letters 45, 566 (1980)

  6. [6]

    W. M. Foulkes, L. Mitas, R. Needs, and G. Rajagopal, Quantum monte carlo simulations of solids, Reviews of Modern Physics 73, 33 (2001)

  7. [7]

    F. Becca and S. Sorella, Quantum Monte Carlo approaches for correlated systems (Cambridge University Press, 2017)

  8. [8]

    S. R. White, Density matrix formulation for quantum renormalization groups, Physical review letters 69, 2863 (1992)

  9. [9]

    F. Verstraete, T. Nishino, U. Schollwöck, M. C. Bañuls, G. K. Chan, and M. E. Stoudenmire, Density matrix renormalization group, 30 years on, Nature Reviews Physics 5, 273 (2023)

  10. [10]

    G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017)

  11. [11]

    D. Pfau, J. S. Spencer, A. G. Matthews, and W. M. C. Foulkes, Ab initio solution of the many-electron Schrödinger equation with deep neural networks, Physical review research 2, 033429 (2020)

  12. [13]

    I. von Glehn, J. S. Spencer, and D. Pfau, A self-attention ansatz for ab-initio quantum chemistry, arXiv preprint arXiv:2211.13672 (2022)

  13. [14]

    G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of control, signals and systems 2, 303 (1989)

  14. [15]

    K.-I. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural networks 2, 183 (1989)

  15. [16]

    K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural networks 2, 359 (1989)

  16. [17]

    G. Pescia, J. Nys, J. Kim, A. Lovato, and G. Carleo, Message-passing neural quantum states for the homogeneous electron gas, Phys. Rev. B 110, 035108 (2024)

  17. [18]

    D. Luo, D. D. Dai, and L. Fu, Simulating moiré quantum matter with neural network (2024), arXiv:2406.17645 [cond-mat.str-el]

  18. [19]

    X. Li, Y. Qian, W. Ren, Y. Xu, and J. Chen, Emergent wigner phases in moiré superlattice from deep learning, Communications Physics 8, 364 (2025)

  19. [20]

    C. Smith, Y. Chen, R. Levy, Y. Yang, M. A. Morales, and S. Zhang, Unified variational approach description of ground-state phases of the two-dimensional electron gas, Phys. Rev. Lett. 133, 266504 (2024)

  20. [21]

    W. T. Lou, H. Sutterud, G. Cassella, W. M. C. Foulkes, J. Knolle, D. Pfau, and J. S. Spencer, Neural wave functions for superfluids, Physical Review X 14, 021030 (2024)

  21. [23]

    D. Luo, D. D. Dai, and L. Fu, Pairing-based graph neural network for simulating quantum materials (2023), arXiv:2311.02143 [cond-mat.str-el]

  22. [24]

    M. Geier, K. Nazaryan, T. Zaklama, and L. Fu, Self-attention neural network for solving correlated electron problems in solids, Phys. Rev. B 112, 045119 (2025)

  23. [25]

    Y. Teng, D. D. Dai, and L. Fu, Solving the fractional quantum hall problem with self-attention neural network, Physical Review B 111, 205117 (2025)

  24. [26]

    N. Read and D. Green, Paired states of fermions in two dimensions with breaking of parity and time-reversal symmetries and the fractional quantum hall effect, Physical Review B 61, 10267 (2000)

  25. [27]

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017)

  26. [28]

    J. Kim, G. Pescia, B. Fore, J. Nys, G. Carleo, S. Gandolfi, M. Hjorth-Jensen, and A. Lovato, Neural-network quantum states for ultra-cold fermi gases, Communications Physics 7, 148 (2024)

  27. [29]

    R. P. Feynman and M. Cohen, Energy spectrum of the excitations in liquid helium, Phys. Rev. 102, 1189 (1956)

  28. [30]

    Y. Kwon, D. M. Ceperley, and R. M. Martin, Effects of three-body and backflow correlations in the two-dimensional electron gas, Phys. Rev. B 48, 12037 (1993)

  29. [31]

    D. Luo and B. K. Clark, Backflow transformations via neural networks for quantum many-body wave functions, Phys. Rev. Lett. 122, 226401 (2019)

  30. [32]

    J. Hermann, Z. Schätzle, and F. Noé, Deep-neural-network solution of the electronic Schrödinger equation, Nature Chemistry 12, 891 (2020)

  31. [33]

    N. Gao and S. Günnemann, Generalizing neural wave functions, in Proceedings of the 40th International Conference on Machine Learning, ICML’23 (JMLR.org, 2023)

  32. [34]

    J. Hermann, J. Spencer, K. Choo, A. Mezzacapo, W. M. C. Foulkes, D. Pfau, G. Carleo, and F. Noé, Ab initio quantum chemistry with neural-network wavefunctions, Nature Reviews Chemistry 7, 692 (2023)

  33. [35]

    M. Scherbela, L. Gerard, and P. Grohs, Towards a transferable fermionic neural wavefunction for molecules, Nature Communications 15, 120 (2024)

  34. [36]

    R. Li, H. Ye, D. Jiang, X. Wen, C. Wang, Z. Li, X. Li, D. He, J. Chen, W. Ren, and L. Wang, A computational framework for neural network-based variational monte carlo with forward laplacian, Nature Machine Intelligence 6, 209 (2024)

  35. [37]

    A. Foster, Z. Schätzle, P. B. Szabó, L. Cheng, J. Köhler, G. Cassella, N. Gao, J. Li, F. Noé, and J. Hermann, An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking (2025), arXiv:2506.19960 [physics.chem-ph]

  36. [38]

    L. L. Viteritti, R. Rende, and F. Becca, Transformer variational wave functions for frustrated quantum spin systems, Phys. Rev. Lett. 130, 236401 (2023)

  37. [39]

    Y. Gu, W. Li, H. Lin, B. Zhan, R. Li, Y. Huang, D. He, Y. Wu, T. Xiang, M. Qin, L. Wang, and D. Lv, Solving the hubbard model with neural quantum states (2025), arXiv:2507.02644 [cond-mat.str-el]

  38. [41]

    G. Cassella, H. Sutterud, S. Azadi, N. D. Drummond, D. Pfau, J. S. Spencer, and W. M. C. Foulkes, Discovering Quantum Phase Transitions with Fermionic Neural Networks, Phys. Rev. Lett. 130, 036401 (2023)

  39. [42]

    M. Wilson, S. Moroni, M. Holzmann, N. Gao, F. Wudarski, T. Vegge, and A. Bhowmik, Neural network ansatz for periodic wave functions and the homogeneous electron gas, Phys. Rev. B 107, 235139 (2023)

  40. [43]

    L. Gerard, M. Scherbela, H. Sutterud, M. Foulkes, and P. Grohs, Transferable neural wavefunctions for solids (2024), arXiv:2405.07599 [physics.comp-ph]

  41. [44]

    X. Li, Z. Li, and J. Chen, Ab initio calculation of real solids via neural network ansatz, Nature Communications 13, 7895 (2022)

  42. [45]

    R. Rende, S. Goldt, F. Becca, and L. L. Viteritti, Fine-tuning neural network quantum states, Physical Review Research 6, 043280 (2024)

  43. [46]

    L. Fu, Electron teleportation via majorana bound states in a mesoscopic superconductor, Physical review letters 104, 056402 (2010)

  44. [47]

    C. N. Yang, Concept of off-diagonal long-range order and the quantum phases of liquid He and of superconductors, Rev. Mod. Phys. 34, 694 (1962)

  45. [48]

    Y. Cao, V. Fatemi, S. Fang, K. Watanabe, T. Taniguchi, E. Kaxiras, and P. Jarillo-Herrero, Unconventional superconductivity in magic-angle graphene superlattices, Nature 556, 43 (2018)

  46. [49]

    G. Chen, A. L. Sharpe, P. Gallagher, I. T. Rosen, E. J. Fox, L. Jiang, B. Lyu, H. Li, K. Watanabe, T. Taniguchi, J. Jung, Z. Shi, D. Goldhaber-Gordon, Y. Zhang, and F. Wang, Signatures of tunable superconductivity in a trilayer graphene moiré superlattice, Nature 572, 215 (2019)

  47. [50]

    X. Lu, P. Stepanov, W. Yang, M. Xie, M. A. Aamir, I. Das, C. Urgell, K. Watanabe, T. Taniguchi, G. Zhang, A. Bachtold, A. H. MacDonald, and D. K. Efetov, Superconductors, orbital magnets and correlated states in magic-angle bilayer graphene, Nature 574, 653 (2019)

  48. [51]

    H. S. Arora, R. Polski, Y. Zhang, A. Thomson, Y. Choi, H. Kim, Z. Lin, I. Z. Wilson, X. Xu, J.-H. Chu, K. Watanabe, T. Taniguchi, J. Alicea, and S. Nadj-Perge, Superconductivity in metallic twisted bilayer graphene stabilized by WSe2, Nature 583, 379 (2020)

  49. [52]

    Y. Saito, J. Ge, K. Watanabe, T. Taniguchi, and A. F. Young, Independent superconductors and correlated insulators in twisted bilayer graphene, Nat. Phys. 16, 926 (2020)

  50. [53]

    J. M. Park, Y. Cao, K. Watanabe, T. Taniguchi, and P. Jarillo-Herrero, Tunable strongly coupled superconductivity in magic-angle twisted trilayer graphene, Nature 590, 249 (2021)

  51. [54]

    Z. Hao, A. M. Zimmerman, P. Ledwith, E. Khalaf, D. H. Najafabadi, K. Watanabe, T. Taniguchi, A. Vishwanath, and P. Kim, Electric field–tunable superconductivity in alternating-twist magic-angle trilayer graphene, Science 371, 1133 (2021)

  52. [55]

    M. Oh, K. P. Nuckolls, D. Wong, R. L. Lee, X. Liu, K. Watanabe, T. Taniguchi, and A. Yazdani, Evidence for unconventional superconductivity in twisted bilayer graphene, Nature 600, 240 (2021)

  53. [56]

    H. Zhou, T. Xie, T. Taniguchi, K. Watanabe, and A. F. Young, Superconductivity in rhombohedral trilayer graphene, Nature 598, 434 (2021)

  54. [57]

    H. Kim, Y. Choi, C. Lewandowski, A. Thomson, Y. Zhang, R. Polski, K. Watanabe, T. Taniguchi, J. Alicea, and S. Nadj-Perge, Evidence for unconventional superconductivity in twisted trilayer graphene, Nature 606, 494 (2022)

  55. [58]

    C. Li, F. Xu, B. Li, J. Li, G. Li, K. Watanabe, T. Taniguchi, B. Tong, J. Shen, L. Lu, J. Jia, F. Wu, X. Liu, and T. Li, Tunable superconductivity in electron- and hole-doped Bernal bilayer graphene, Nature 631, 300 (2024)

  56. [59]

    T. Han, Z. Lu, Z. Hadjri, L. Shi, Z. Wu, W. Xu, Y. Yao, A. A. Cotten, O. Sharifi Sedeh, H. Weldeyesus, J. Yang, J. Seo, S. Ye, M. Zhou, H. Liu, G. Shi, Z. Hua, K. Watanabe, T. Taniguchi, P. Xiong, D. M. Zumbühl, L. Fu, and L. Ju, Signatures of chiral superconductivity in rhombohedral graphene, Nature 643, 654 (2025)

  57. [60]

    Y. Xia, Z. Han, K. Watanabe, T. Taniguchi, J. Shan, and K. F. Mak, Superconductivity in twisted bilayer WSe2, Nature 637, 833 (2025)

  58. [61]

    Y. Guo, J. Pack, J. Swann, L. Holtzman, M. Cothrine, K. Watanabe, T. Taniguchi, D. G. Mandrus, K. Barmak, J. Hone, A. J. Millis, A. Pasupathy, and C. R. Dean, Superconductivity in 5.0° twisted bilayer WSe2, Nature 637, 839 (2025)

  59. [62]

    J. G. Bednorz and K. A. Müller, Possible high Tc superconductivity in the Ba-La-Cu-O system, Zeitschrift für Physik B Condensed Matter 64, 189 (1986)

  60. [63]

    M. K. Wu, J. R. Ashburn, C. J. Torng, P. H. Hor, R. L. Meng, L. Gao, Z. J. Huang, Y. Q. Wang, and C. W. Chu, Superconductivity at 93 K in a new mixed-phase Y-Ba- Cu-O compound system at ambient pressure, Phys. Rev. Lett. 58, 908 (1987)

  61. [64]

    S. Sorella, Green function monte carlo with stochastic reconfiguration, Physical review letters 80, 4558 (1998)

  62. [65]

    S.-I. Amari, Natural gradient works efficiently in learning, Neural computation 10, 251 (1998)

  63. [66]

    J. Stokes, J. Izaac, N. Killoran, and G. Carleo, Quantum natural gradient, Quantum 4, 269 (2020)

  64. [67]

    J. Martens and R. Grosse, Optimizing neural networks with kronecker-factored approximate curvature, in International conference on machine learning (PMLR, 2015) pp. 2408–2417

  65. [68]

    H. Lu, S. Das Sarma, and K. Park, Superconducting order parameter for the even-denominator fractional quantum hall effect, Phys. Rev. B 82, 201303 (2010)

  66. [69]

    J. R. Schrieffer, Theory of superconductivity (CRC Press, 2018). Supplementary Material for: “Attention is all you need to solve chiral superconductivity”, Chun-Tse Li (1,2), Tzen Ong (1), Max Geier (3), Hsin Lin (1), and Liang Fu (3); (1) Institute of Physics, Academia Sinica, Taipei 115201, Taiwan; (2) Department of Electrical and Computer Engineering, University of South...

  67. [70]

    Neural Network Quantum States and Generalized Slater Determinants. In this subsection, we describe in detail the architecture of our neural-network quantum state for the spin-polarized attractive Fermi gas. A fundamental requirement for any fermionic wave function is antisymmetry under particle exchange: the simplest type of wavefunction that guarantees th...

  68. [71]

    Self-Attention Neural Network. Now we discuss the detailed construction of the self-attention neural network that generates the generalized orbitals $\Phi^k_\mu(x_j; \{x_{/j}\})$. In the periodic system, one needs to enforce the periodic boundary condition (PBC) on the wavefunction: $\Psi(x_1, \ldots, x_j + L, \ldots, x_N) = \Psi(x_1, \ldots, x_j, \ldots, x_N)$, (A8) where $L = n_1 L_1 + n_2 L_2$, ...

  69. [72]

    Wavefunction Optimization. In variational Monte Carlo (VMC) the variational energy of a parametrized wavefunction $\Psi_\theta(X) \in \mathbb{C}$ is $E(\theta) = \dfrac{\int dX\, \Psi_\theta^*(X)\, H\, \Psi_\theta(X)}{\int dX\, |\Psi_\theta(X)|^2}$, (C1) where $X = (x_1, \ldots, x_N)$ collects all particle coordinates and $H$ is the Hamiltonian. Allowing $\Psi_\theta$ to be complex accommodates possible time-reversal-symmetry breaking. Rewriting Eq. (...

  70. [73]

    Natural-Gradient (Stochastic Reconfiguration). Direct stochastic-gradient descent converges slowly because the energy landscape is highly anisotropic in parameter space. Stochastic reconfiguration (SR) preconditions the gradient $g_a$ with the quantum geometric tensor (QGT): $S_{ab} = \mathbb{E}_{X\sim p_\theta}[O_a^* O_b] - \mathbb{E}_{X\sim p_\theta}[O_a^*]\, \mathbb{E}_{X\sim p_\theta}[O_b]$, (C7) producing the natural gradient $\Delta\theta = -$...

  71. [74]

    Kronecker-Factored Approximate Curvature (KFAC). To overcome this bottleneck we use the Kronecker-factored Approximate Curvature (KFAC) optimizer [65], an efficient approximation to the natural gradient widely adopted in deep learning and, more recently, in VMC [9]. The KFAC formulation assumes that the matrix element $F_{ab} \approx 0$ if $\theta_a$ and $\theta_b$ are from differen...

  72. [75]

    … $= \langle \hat c^\dagger_{r_1} \hat c_{r'_1} \rangle \langle \hat c^\dagger_{r_2} \hat c_{r'_2} \rangle - \langle \hat c^\dagger_{r_1} \hat c_{r'_2} \rangle \langle \hat c^\dagger_{r_2} \hat c_{r'_1} \rangle$, (D2) $\rho^{(2)}_{\mathrm{conn}} = \rho^{(2)} - \rho^{(2)}_{\mathrm{disc}}$. (D3) For a non-interacting (normal) Fermi gas, Wick factorization is exact and hence $\rho^{(2)}_{\mathrm{conn}} \equiv 0$. Although $\rho^{(2)}$ fully encodes pair correlations, it is a high-dimensional object. To distill superconducting signatures into a more accessible form, we consi...

  73. [76]

    Momentum-space occupation number. The momentum-space occupation number can be written in first-quantized form as $\langle \hat n(\mathbf k) \rangle = \frac{1}{VZ}\int dx'_1\, dx'_2\, d\tilde R\; e^{i\mathbf k\cdot(x'_1 - x' \ldots}$

  74. [77]

    $\ldots\Psi^*(x'_1, \tilde R)\, \Psi(x'_2, \tilde R)$, (E1) where $\tilde R = (x_2, \ldots, x_N)$, $V$ is the system volume, and $Z = \int dX\, |\Psi(X)|^2$. To evaluate this via Monte Carlo, we use the importance-sampling density $p(x'_1, x'_2, \tilde R) = \frac{1}{\mathcal N}\, |\Psi(x'_1, \tilde R)|\, |\Psi(x'_2, \tilde R)|$, $\mathcal N = \int dx'_1\, dx'_2\, d\tilde R\, |\Psi(x'_1, \tilde R)|\, |\Psi(x'_2, \tilde R)|$. (E2) Define the relative phase $\theta(x'_1, x'_2, \tilde R) = \mathrm{Arg}\big[\Psi^*(x'_1, \tilde R)\, \Psi(x'_2, \tilde R) \ldots$

  75. [78]

    $\ldots e^{i\theta(x'_1, x'_2, \tilde R)} \big\rangle$. (E4) The overall factor $\mathcal N/(VZ)$ can be fixed by particle-number conservation, $\sum_{\mathbf k} \langle \hat n(\mathbf k) \rangle = N$. In practice we compute the unnormalized expectation and rescale so that this sum rule is satisfied.

  76. [79]

    Two-body reduced density matrix. The zero-center-of-mass ($Q = 0$) 2-RDM in momentum space is $\Gamma_{\mathbf k,\mathbf k'} = \langle \hat\Delta^\dagger(\mathbf k)\, \hat\Delta(\mathbf k') \rangle$, $\hat\Delta(\mathbf k) = \hat c_{-\mathbf k} \hat c_{\mathbf k}$. (E5) In first quantization this can be written as $\Gamma_{\mathbf k,\mathbf k'} = \frac{1}{V^2 Z}\int d\tilde R\, dx_1\, dx_2\, dx'_1\, dx'_2\, \Psi^*(x_1, x_2, \tilde R)\, \Psi(x'_1, x'_2, \tilde R)\, e^{-i\mathbf k\cdot(x_1 - x_2) + i\mathbf k'\cdot(x'_1 - x'_2)}$, (E6) where $\tilde R = (x_3, \ldots, x_N)$, $Z = \int dX\, |\Psi(X)|^2$, and $V$ is the spatial v...
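Several supplementary excerpts in the list above fit together in a small exact check: the partial-trace definition of $\rho^{(2)}$ in Eq. (7), the Slater-determinant form of a fermionic wavefunction, and the Wick decomposition (D2)-(D3). This sketch assumes a hypothetical discretized system (5 grid points, 3 spinless fermions, random orthonormal orbitals) rather than the paper's continuum setup. It builds a plain Slater determinant, confirms antisymmetry, forms $\rho^{(2)}$ by the partial trace and $\rho^{(1)}$ analogously, and verifies $\mathrm{Tr}\,\rho^{(2)} = N(N-1)$ and $\rho^{(2)}_{\mathrm{conn}} = 0$, as expected for a non-interacting state. In the paper the orbitals are generalized (they depend on all particle positions through the network), and an interacting state generally has a nonzero connected part.

```python
import itertools
import math
import numpy as np

M, N = 5, 3                                   # hypothetical: 5 grid points, 3 spinless fermions
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
# Q[x, a] = phi_a(x): N orthonormal orbitals on the grid

# Slater determinant Psi(x_1, ..., x_N) = det[phi_a(x_i)] / sqrt(N!), stored as a dense tensor
psi = np.zeros((M,) * N, dtype=complex)
for xs in itertools.product(range(M), repeat=N):
    psi[xs] = np.linalg.det(Q[list(xs), :]) / math.sqrt(math.factorial(N))

# Eq. (7): rho2[x1, x2, x1', x2'] = N(N-1) sum_R conj(Psi(x1, x2, R)) Psi(x1', x2', R)
psi2 = psi.reshape(M, M, -1)
rho2 = N * (N - 1) * np.einsum("abr,cdr->abcd", psi2.conj(), psi2)
# One-body analogue: rho1[x, x'] = N sum_R conj(Psi(x, R)) Psi(x', R)
psi1 = psi.reshape(M, -1)
rho1 = N * psi1.conj() @ psi1.T

# (D2)-(D3): disconnected (Wick) part and the connected remainder
rho2_disc = np.einsum("ac,bd->abcd", rho1, rho1) - np.einsum("ad,bc->abcd", rho1, rho1)
rho2_conn = rho2 - rho2_disc

print(np.allclose(psi, -psi.transpose(1, 0, 2)),               # antisymmetric under x1 <-> x2
      np.isclose(np.einsum("abab->", rho2).real, N * (N - 1)),  # Tr rho2 = N(N-1)
      np.abs(rho2_conn).max() < 1e-10)                          # Wick factorization is exact
```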
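The stochastic-reconfiguration excerpt above defines the QGT in Eq. (C7) but is cut off before the update formula, so the following is the textbook SR step rather than the authors' code. Assumptions, all hypothetical: real parameters (so the real part of $S$ is used), a small diagonal shift to regularize $S$, the standard energy gradient $g_a = 2\,\mathrm{Re}\,\mathbb E[(O_a - \bar O_a)^*(E_L - \bar E_L)]$, and the function name `sr_step`. The usage example is a one-parameter harmonic-oscillator toy ($\Psi_a = e^{-ax^2}$, $O = \partial_a \log\Psi_a = -x^2$) in which $|\Psi_a|^2$ is sampled exactly as a Gaussian instead of by Metropolis.

```python
import numpy as np

def sr_step(O, e_loc, lr=0.1, diag_shift=1e-4):
    """One stochastic-reconfiguration (natural-gradient) update for real parameters.

    O:     (n_samples, n_params) log-derivatives O_a(X) = d log Psi_theta(X) / d theta_a
    e_loc: (n_samples,) local energies E_L(X)
    Returns (delta_theta, S, g).
    """
    n = O.shape[0]
    dO = O - O.mean(axis=0)                                 # centred log-derivatives
    dE = e_loc - e_loc.mean()
    S = dO.conj().T @ dO / n                                # QGT, Eq. (C7)
    g = 2.0 * np.real(dO.conj().T @ dE / n)                 # energy gradient
    S_reg = np.real(S) + diag_shift * np.eye(S.shape[0])
    return -lr * np.linalg.solve(S_reg, g), S, g

# Usage on the toy: psi_a(x) = exp(-a x^2), H = -1/2 d^2/dx^2 + 1/2 x^2.
rng = np.random.default_rng(0)
a = 0.2
for _ in range(100):
    x = rng.normal(scale=np.sqrt(1.0 / (4.0 * a)), size=4000)   # exact samples of |psi_a|^2
    O = (-x**2)[:, None]                                         # d log psi_a / da
    e_loc = a + x**2 * (0.5 - 2.0 * a**2)
    delta, S, g = sr_step(O, e_loc)
    a += delta[0]
print(round(a, 4))
```

For this toy the preconditioned step reduces to roughly $\Delta a \approx 0.1\,(1 - 4a^2)$, so the parameter climbs monotonically to the exact ground state $a = 1/2$, where the local energy becomes constant and the gradient vanishes.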
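The KFAC excerpt above rests on one identity. If a dense layer's curvature block is approximated by a Kronecker product $F \approx G \otimes A$ of a back-propagated-signal covariance $G$ and an input-activation covariance $A$, the preconditioned update needs only the two small inverses: $(G\otimes A)^{-1}\mathrm{vec}(V) = \mathrm{vec}(G^{-1} V A^{-1})$. The sketch below illustrates that step for one real-valued dense layer $y = Wa$, using row-major vectorization, symmetric factors, and $\sqrt{\lambda}$ damping added to each factor as a simple heuristic. It is not Martens and Grosse's full algorithm or the paper's optimizer, and the function names are hypothetical.

```python
import numpy as np

def kfac_factors(acts, backs, damping=1e-3):
    """Kronecker factors for a dense layer y = W @ a from n samples.

    acts:  (n, d_in)  layer inputs a
    backs: (n, d_out) back-propagated signals g = d log Psi / d y
    """
    n = acts.shape[0]
    A = acts.T @ acts / n + np.sqrt(damping) * np.eye(acts.shape[1])
    G = backs.T @ backs / n + np.sqrt(damping) * np.eye(backs.shape[1])
    return A, G

def kfac_precondition(grad_W, A, G):
    """Return G^{-1} grad_W A^{-1}, i.e. (G kron A)^{-1} applied to the row-major vec of grad_W."""
    left = np.linalg.solve(G, grad_W)            # G^{-1} V
    return np.linalg.solve(A, left.T).T          # (G^{-1} V) A^{-1}, using A = A^T

rng = np.random.default_rng(0)
n, d_in, d_out = 256, 5, 3
acts, backs = rng.normal(size=(n, d_in)), rng.normal(size=(n, d_out))
A, G = kfac_factors(acts, backs)
grad_W = rng.normal(size=(d_out, d_in))
update = kfac_precondition(grad_W, A, G)

# Same result as inverting the full (d_out*d_in) x (d_out*d_in) Kronecker matrix.
full = np.linalg.solve(np.kron(G, A), grad_W.ravel())
print(np.allclose(update.ravel(), full))
```

The saving is the point: inverting a $d_{\mathrm{out}} \times d_{\mathrm{out}}$ and a $d_{\mathrm{in}} \times d_{\mathrm{in}}$ matrix replaces inverting a $d_{\mathrm{out}}d_{\mathrm{in}} \times d_{\mathrm{out}}d_{\mathrm{in}}$ one, and cross-layer blocks are dropped entirely, as the excerpt's $F_{ab} \approx 0$ assumption states.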
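The momentum-distribution estimator in the excerpts above samples from $|\Psi(x'_1,\tilde R)|\,|\Psi(x'_2,\tilde R)|$ and carries the relative phase $\theta$ as a weight. It can be checked on a case where the answer is known. Assumptions, all hypothetical: a single particle ($N = 1$, so $\tilde R$ is empty and the sampling density factorizes); a state $\psi(x) = 0.8e^{ix} + 0.6e^{-ix}$ on a periodic 64-point grid, with plain sums over grid points replacing the integrals so that $V$ becomes the number of points $M$; and exact sampling on the grid instead of Metropolis. The prefactor $\mathcal N/(VZ)$ is computed directly here rather than fixed by the sum rule.

```python
import numpy as np

M = 64
x = 2 * np.pi * np.arange(M) / M                       # periodic grid, box length 2*pi
psi = 0.8 * np.exp(1j * x) + 0.6 * np.exp(-1j * x)     # toy state: n(+1) = 0.64, n(-1) = 0.36
Z = np.sum(np.abs(psi) ** 2)                            # Z = sum_x |psi(x)|^2

def n_k_exact(k):
    """Direct evaluation: |sum_x e^{-ikx} psi(x)|^2 / (M Z)."""
    return np.abs(np.sum(np.exp(-1j * k * x) * psi)) ** 2 / (M * Z)

def n_k_mc(k, n_samples=40000, seed=0):
    """Importance sampling from |psi(x1')| |psi(x2')| / Ncal, with the phase as a weight."""
    rng = np.random.default_rng(seed)
    p = np.abs(psi) / np.abs(psi).sum()
    i = rng.choice(M, size=n_samples, p=p)              # x1' samples
    j = rng.choice(M, size=n_samples, p=p)              # x2' samples, independent for N = 1
    theta = np.angle(np.conj(psi[i]) * psi[j])          # relative phase Arg[psi*(x1') psi(x2')]
    avg = np.mean(np.exp(1j * (k * (x[i] - x[j]) + theta)))
    n_cal = np.abs(psi).sum() ** 2                      # Ncal for the factorized density
    return np.real(n_cal / (M * Z) * avg)

for k in (-1, 0, 1):
    print(k, round(n_k_exact(k), 4), round(n_k_mc(k), 3))
```

Sampling from the product of moduli keeps every sample's weight of unit magnitude, so the phase average stays well conditioned even though $\Psi^*(x'_1)\Psi(x'_2)$ itself is complex.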