Attention-based optimizer for symmetry finding
Pith reviewed 2026-06-29 06:33 UTC · model grok-4.3
The pith
A Set-Transformer uses self-attention on Pauli strings and commutation optimization to locate symmetries of Hamiltonians near-deterministically for physical models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Built on a Set-Transformer architecture, the framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli strings. The relations are then decoded as a candidate, which is further optimized with a custom commutation-based objective and mapped to a symmetry of the input Hamiltonian. For physical Hamiltonians including the periodic one- and two-dimensional transverse-field Ising model and the Toric code, the framework succeeds with near-deterministic probability while providing substantial advantage compared to state-of-the-art strategies.
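The commutation condition behind this claim can be made concrete in the binary symplectic picture of Pauli strings. The block below is a sketch under stated assumptions: the symbols $x$, $z$, $\omega$ and $\mathcal{L}$ are illustrative notation, and the counting objective is one natural choice, not necessarily the custom objective used in the paper.

```latex
% Illustrative notation (an assumption, not taken from the paper).
% An n-qubit Pauli string P is identified, up to a phase, with (x, z) in F_2^{2n}:
%   P \propto \prod_j X_j^{x_j} Z_j^{z_j}
\[
  \omega(P, P') = \left( x \cdot z' + z \cdot x' \right) \bmod 2,
  \qquad
  P P' = (-1)^{\omega(P, P')} \, P' P .
\]
% For H = \sum_k c_k P_k with distinct P_k and c_k \neq 0, a Pauli string S satisfies
\[
  [S, H] = 0 \iff \omega(S, P_k) = 0 \quad \text{for all } k ,
\]
% so one possible objective simply counts the violated terms:
\[
  \mathcal{L}(S) = \sum_k \omega(S, P_k) \ge 0,
  \qquad
  \mathcal{L}(S) = 0 \iff S \text{ commutes with every term of } H .
\]
```

The identity string trivially gives $\mathcal{L} = 0$, so any search of this kind also has to exclude it (and, if several symmetries are wanted, products of symmetries already found).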
What carries the argument
Set-Transformer architecture that applies self-attention to Pauli strings, followed by commutation-based optimization to produce valid symmetries.
If this is right
- Physical Hamiltonians such as the transverse-field Ising model and Toric code are handled with near-deterministic success probability.
- The method supplies a substantial advantage compared to state-of-the-art symmetry-finding strategies.
- For random Pauli Hamiltonians the number of parallel starts and the number of GPUs needed for high success probability can be estimated under fixed design specifications.
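A back-of-the-envelope version of such a resource estimate can be sketched as follows. It assumes that parallel starts are independent and share the same single-start success probability, which the paper may or may not assume; the names `p_single`, `p_target` and `starts_per_gpu` are hypothetical and not taken from the paper.

```python
import math


def parallel_starts_needed(p_single: float, p_target: float) -> int:
    """Smallest n with 1 - (1 - p_single)**n >= p_target.

    Assumes independent starts with identical success probability p_single.
    """
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must lie in (0, 1]")
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must lie in (0, 1)")
    if p_single == 1.0:
        return 1
    # log1p(-p) = log(1 - p), evaluated accurately for small p.
    n = math.log1p(-p_target) / math.log1p(-p_single)
    return max(1, math.ceil(n))


def gpus_needed(n_starts: int, starts_per_gpu: int) -> int:
    """GPUs required if each device runs starts_per_gpu starts at once (hypothetical)."""
    return math.ceil(n_starts / starts_per_gpu)


if __name__ == "__main__":
    starts = parallel_starts_needed(p_single=0.1, p_target=0.99)
    print(starts, gpus_needed(starts, starts_per_gpu=16))
```

Under these assumptions the required number of starts grows only logarithmically in the tolerated failure probability, but roughly as the inverse of the single-start success probability when that probability is small.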
Where Pith is reading between the lines
- The same attention-plus-commutation loop could be tested on other many-body Hamiltonians to check whether near-deterministic performance holds beyond the models examined.
- Resource estimates for random cases imply that larger system sizes will require scaling the number of parallel GPU starts accordingly.
- The output symmetries could be fed directly into existing quantum simulation codes to reduce effective Hilbert-space dimension before numerical work begins.
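To illustrate the last point, the sketch below block-diagonalizes a small periodic transverse-field Ising chain using the global spin-flip parity, a textbook Pauli symmetry of that model. It is a dense-matrix toy example, not the paper's pipeline: the helper names are hypothetical, the parity operator is supplied by hand rather than found by the optimizer, and the couplings `J` and `h` are arbitrary.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def kron_all(ops):
    """Tensor product of a list of single-qubit matrices."""
    return reduce(np.kron, ops)


def tfim_matrix(n, J=1.0, h=0.5):
    """Dense periodic chain H = -J sum Z_i Z_{i+1} - h sum X_i (use n >= 3)."""
    H = np.zeros((2**n, 2**n))
    for i in range(n):
        zz = [I2] * n
        zz[i], zz[(i + 1) % n] = Z, Z
        x = [I2] * n
        x[i] = X
        H -= J * kron_all(zz) + h * kron_all(x)
    return H


def symmetry_blocks(H, S):
    """Split H into the +1 and -1 eigenspaces of a commuting Pauli string S."""
    if not np.allclose(H @ S, S @ H):
        raise ValueError("S does not commute with H")
    vals, vecs = np.linalg.eigh(S)
    blocks = {}
    for sign in (+1, -1):
        V = vecs[:, np.isclose(vals, sign)]  # orthonormal basis of one sector
        blocks[sign] = V.T @ H @ V
    return blocks


n = 4
H = tfim_matrix(n)
parity = kron_all([X] * n)  # global spin flip, supplied by hand
blocks = symmetry_blocks(H, parity)
print({sign: block.shape for sign, block in blocks.items()})
```

Each independent Pauli symmetry of this kind halves the dimension of the blocks that have to be diagonalized, which is where the saving in numerical work would come from.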
Load-bearing premise
Self-attention on Pauli strings followed by commutation optimization will reliably map to a valid symmetry of the input Hamiltonian for the tested physical models.
What would settle it
Repeated independent runs on the Toric code that return a candidate which does not commute with every term of the Hamiltonian.
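A check of this kind is cheap to script. The sketch below builds the star and plaquette terms of the field-free Toric code on a small periodic lattice and lists the terms that a candidate Pauli string fails to commute with. The edge-indexing convention and all function names are assumptions made for illustration; the paper's lattice conventions, and its variant with a field, may differ.

```python
def edge(x, y, lx, ly, vertical):
    """Qubit index of the edge leaving vertex (x, y) to the right or upward."""
    return 2 * ((y % ly) * lx + (x % lx)) + int(vertical)


def toric_code_terms(lx, ly):
    """Star (X-type) and plaquette (Z-type) terms on an lx-by-ly torus (lx, ly >= 2)."""
    terms = []
    for y in range(ly):
        for x in range(lx):
            star = {
                edge(x, y, lx, ly, False): "X",
                edge(x - 1, y, lx, ly, False): "X",
                edge(x, y, lx, ly, True): "X",
                edge(x, y - 1, lx, ly, True): "X",
            }
            plaquette = {
                edge(x, y, lx, ly, False): "Z",
                edge(x, y + 1, lx, ly, False): "Z",
                edge(x, y, lx, ly, True): "Z",
                edge(x + 1, y, lx, ly, True): "Z",
            }
            terms += [star, plaquette]
    return terms


def commutes(p, q):
    """Pauli strings commute iff they differ on an even number of shared qubits."""
    clashes = sum(1 for k, a in p.items() if k in q and q[k] != a)
    return clashes % 2 == 0


def anticommuting_terms(candidate, terms):
    """Indices of Hamiltonian terms that the candidate fails to commute with."""
    return [i for i, t in enumerate(terms) if not commutes(candidate, t)]


if __name__ == "__main__":
    lx, ly = 3, 3
    terms = toric_code_terms(lx, ly)
    # A Z-string around a non-contractible horizontal loop: a known symmetry.
    loop = {edge(x, 0, lx, ly, False): "Z" for x in range(lx)}
    # A single Z on one edge: not a symmetry, it clashes with two star terms.
    single = {edge(0, 0, lx, ly, False): "Z"}
    print(anticommuting_terms(loop, terms), anticommuting_terms(single, terms))
```

A returned candidate counts as a failure in the sense above whenever the list of anticommuting terms is non-empty.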
Original abstract
Finding symmetries is crucial for understanding physical models. In this work, we present an optimization framework that searches for Pauli symmetries of Hamiltonians, merging the fields of machine learning and automated symmetry finding. Built on a Set-Transformer architecture, our framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli strings. The relations are then decoded as a candidate, which is further optimized with a custom commutation-based objective and mapped to a symmetry of the input Hamiltonian. We apply our method to random Pauli Hamiltonians, the periodic one- and two-dimensional transverse-field Ising models, and the Toric code. We show that for physical Hamiltonians (Ising and Toric), our framework succeeds with near-deterministic probability while providing a substantial advantage compared to state-of-the-art strategies. For random Pauli Hamiltonians, we estimate the required computational resources, specifically the number of parallel starts and the number of GPUs, to find a symmetry with high success probability under fixed design specifications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an optimization framework using a Set-Transformer architecture to search for Pauli symmetries of Hamiltonians. Self-attention encodes pairwise and higher-order correlations among Pauli strings; candidates are decoded and refined via a custom commutation-based objective to map to valid symmetries of the input Hamiltonian. The method is applied to random Pauli Hamiltonians, 1D/2D transverse-field Ising models, and the Toric code, with claims of near-deterministic success probability on the physical models, substantial advantage over SOTA strategies, and resource estimates (parallel starts and GPUs) for the random case.
Significance. If the reported success rates hold under the tested conditions, the work demonstrates a viable integration of attention-based ML with domain-specific commutation optimization for symmetry discovery in quantum Hamiltonians. The near-deterministic performance on standard physical models (Ising, Toric) and the provision of concrete resource estimates constitute practical strengths that could aid automated analysis of many-body systems.
Minor comments (3)
- The abstract asserts near-deterministic success and substantial SOTA advantage without quantitative success rates, baselines, or error bars; these metrics (present in the full manuscript) should be summarized in the abstract for immediate clarity.
- The description of the commutation-based objective and its mapping to a valid symmetry would benefit from an explicit equation or pseudocode block to make the optimization step fully reproducible from the text.
- Experimental details on the number of trials, definition of success, and exact baselines used for the Ising/Toric comparisons should be consolidated in one dedicated subsection for easier evaluation of the claimed advantage.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work and for recommending minor revision. The referee's description accurately reflects the Set-Transformer approach, commutation objective, and results on Ising, Toric code, and random Pauli Hamiltonians. No major comments were provided in the report.
Circularity Check
No circularity; empirical ML pipeline on tested Hamiltonians
The manuscript describes a Set-Transformer architecture with self-attention on Pauli strings, followed by a commutation-based optimization objective, applied empirically to random Pauli Hamiltonians, Ising models, and the Toric code. Reported success probabilities and resource estimates are direct experimental outcomes on those instances. No equations, parameters, or claims reduce by construction to fitted inputs, self-definitions, or self-citation chains; the central results are falsifiable measurements on standard models with no load-bearing internal reduction.
Reference graph
Works this paper leans on
- [1] as a way to benchmark the CPU (light blue) and GPU (blue) implementations (for device specifications, see Appendix C) of our attention-based optimizer, and compare the results against the deterministic algorithm of Ref. [39] (grey variations). For a periodic chain of $n_q$ spins the input for our framework is [68] $H_{\mathrm{Ising}} = -J \sum_{i=1}^{n} Z_i Z_{i+1} - h_x \sum_{i=1}^{n} X_i$. Th...
- [2] are in black (Ising ladder), dark-gray (Toric with $\vec{B}$), and light-gray (Toric without $\vec{B}$). The Hamiltonian for the 2-D Ising ladder with $n_y = 2$ legs and $n_x$ rungs ($\implies n_q = n_y n_x = 2 n_x$) is given by $H_{\mathrm{IL}} = J \sum_{\langle i,j \rangle} Z_i Z_j + h \sum_{i=1}^{n_q} X_i$, where $\langle i, j \rangle$ indicates connected qubits $i$ and $j$. On the other hand, the Toric code considers a rectangular lattice with periodic ...
- [3] E. Noether, Nachrichten von der Königlichen Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 235 (1918), reprinted/translated in Transport Theory and Statistical Physics 1(3), 183–207 (1971)
- [4] D. Gross, Proceedings of the National Academy of Sciences of the United States of America 93, 14256 (1996)
- [5] E. P. Wigner, Proceedings of the National Academy of Sciences 51, 956 (1964), https://www.pnas.org/doi/pdf/10.1073/pnas.51.5.956
- [6] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, The Journal of Chemical Physics 21, 1087 (1953)
- [7] R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
- [8] D. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, 5th ed. (Cambridge University Press, 2021)
- [9] H. Q. Lin, Phys. Rev. B 42, 6561 (1990)
- [10] J. Schnack, “Exact diagonalization techniques for quantum spin systems,” in Computational Modelling of Molecular Nanomagnets, edited by G. Rajaraman (Springer International Publishing, Cham, 2023) pp. 155–177
- [11] M. Troyer and U.-J. Wiese, Phys. Rev. Lett. 94, 170201 (2005)
- [12] E. Manousakis, Rev. Mod. Phys. 63, 1 (1991)
- [13] S. R. White, Phys. Rev. Lett. 69, 2863 (1992)
- [14] J. Banks, J. Garza-Vargas, A. Kulkarni, and N. Srivastava, Foundations of Computational Mathematics, 1 (2022)
- [15] S. Östlund and S. Rommer, Phys. Rev. Lett. 75, 3537 (1995)
- [16] F. Verstraete and J. I. Cirac, Phys. Rev. B 73, 094423 (2006)
- [17] J. I. Cirac, D. Pérez-García, N. Schuch, and F. Verstraete, Rev. Mod. Phys. 93, 045003 (2021)
- [18] F. Verstraete, M. M. Wolf, D. Perez-Garcia, and J. I. Cirac, Phys. Rev. Lett. 96, 220601 (2006)
- [19] F. Verstraete and J. I. Cirac, “Renormalization algorithms for quantum-many body systems in two and higher dimensions,” (2004), arXiv:cond-mat/0407066 [cond-mat.str-el]
- [20] Y.-Y. Shi, L.-M. Duan, and G. Vidal, Phys. Rev. A 74, 022320 (2006)
- [21] L. Tagliacozzo, G. Evenbly, and G. Vidal, Phys. Rev. B 80, 235127 (2009)
- [22] S. Cheng, L. Wang, T. Xiang, and P. Zhang, Phys. Rev. B 99, 155131 (2019)
- [23] R. Orús, Nature Reviews Physics 1, 538 (2019)
- [24] B. Lanthier, J. Côté, and S. Kourtis, Frontiers in Physics 12 (2024), 10.3389/fphy.2024.1431810
- [25] D. Haussler and M. Warmuth, The Mathematics of Generalization, 17 (2018)
- [26] P. Horn, V. Saz Ulibarrena, B. Koren, and S. Portegies Zwart, Journal of Computational Physics 521, 113536 (2025)
- [27] S. Greydanus, M. Dzamba, and J. Yosinski, Advances in Neural Information Processing Systems 32 (2019)
- [28]
- [29] A. Mandal, Y. Tiwari, P. K. Panigrahi, and M. Pal, Chaos, Solitons & Fractals 164, 112670 (2022)
- [30] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia, CoRR abs/2002.09405 (2020), 2002.09405
- [31] G. Corso, H. Stark, S. Jegelka, T. Jaakkola, and R. Barzilay, Nature Reviews Methods Primers 4, 17 (2024)
- [32] A. C. Cenxin, K. Onggadinata, D. Kaszlikowski, and V. Scarani, PRX Quantum 4, 020352 (2023)
- [33] Y. R. Sanders, D. W. Berry, P. C. Costa, L. W. Tessler, N. Wiebe, C. Gidney, H. Neven, and R. Babbush, PRX Quantum 1, 020312 (2020)
- [34] J. Gray and S. Kourtis, Quantum 5, 410 (2021)
- [35] A. Kundu, P. Bedełek, M. Ostaszewski, O. Danaci, Y. J. Patel, V. Dunjko, and J. A. Miszczak, New Journal of Physics 26, 013034 (2024)
- [36] A. Kundu, Machine Learning: Science and Technology 6, 025066 (2025)
- [37] J. Eisert, M. Cramer, and M. B. Plenio, Rev. Mod. Phys. 82, 277 (2010)
- [38]
- [39] V. Bettaque and B. Swingle, Quantum 8, 1362 (2024)
- [40] S. Bravyi, J. M. Gambetta, A. Mezzacapo, and K. Temme, “Tapering off qubits to simulate fermionic Hamiltonians,” (2017), arXiv:1701.08213 [quant-ph]
- [41] L. G. Gunderman, A. Jena, and L. Dellantonio, Phys. Rev. A 109, 022618 (2024)
- [42] E. van den Berg and K. Temme, Quantum 4, 322 (2020)
- [43] D. Gottesman, Stabilizer Codes and Quantum Error Correction, Ph.D. thesis, California Institute of Technology (1997), arXiv:quant-ph/9705052
- [44] D. Gottesman, “The Heisenberg Representation of Quantum Computers,” in Proceedings of the XXII International Colloquium on Group Theoretical Methods in Physics (1999) pp. 32–43, arXiv:quant-ph/9807006
- [45] S. Aaronson and D. Gottesman, Phys. Rev. A 70, 052328 (2004)
- [46] S. Krippendorf and M. Syvaeri, Machine Learning: Science and Technology 2, 015010 (2020)
- [47] Z. Liu and M. Tegmark, Phys. Rev. Lett. 128, 180201 (2022)
- [48] P. Calvo-Barlés, S. G. Rodrigo, E. Sánchez-Burillo, and L. Martín-Moreno, Phys. Rev. E 110, 045304 (2024)
- [49] R. Bondesan and A. Lamacraft, “Learning symmetries of classical integrable systems,” (2019), arXiv:1906.04645 [physics.comp-ph]
- [50] R. T. Forestano, K. T. Matchev, K. Matcheva, A. Roman, E. B. Unlu, and S. Verner, Machine Learning: Science and Technology 4, 025027 (2023)
- [51] C. Nation, R. P. A. Simon, S. Banerjee, F. Martini, A. Ricottone, F. Cerisola, and L. Dellantonio, “Clifford symmetries in quantum many-body systems,” (2026), arXiv:2605.18966 [quant-ph]
- [52] C. Nation, R. P. A. Simon, F. Martini, A. Ricottone, S. Banerjee, F. Cerisola, and L. Dellantonio, “Graph automorphism approach to obtain Clifford symmetries in open and closed qudit models,” (2026)
- [53] J. Lee, Y. Lee, J. Kim, A. R. Kosiorek, S. Choi, and Y. W. Teh, in ICML (2018)
- [54] M. Connor, G. Canal, and C. Rozell, in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 130, edited by A. Banerjee and K. Fukumizu (PMLR, 2021) pp. 2359–2367
- [55] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, in Advances in Neural Information Processing Systems, Vol. 30 (2017)
- [56] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, edited by K. Knight, A. Nenkova, and O. Rambow (Association for Computational Linguistics, San Diego, California, 2016) pp. 1480–1489
- [57] J. L. Ba, J. R. Kiros, and G. E. Hinton, arXiv preprint arXiv:1607.06450 (2016)
- [58] C. J. Maddison, A. Mnih, and Y. W. Teh, CoRR abs/1611.00712 (2016), 1611.00712
- [59] C. Louizos, M. Welling, and D. P. Kingma, “Learning sparse neural networks through $L_0$ regularization,” (2018), arXiv:1712.01312 [stat.ML]
- [60] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Advances in Neural Information Processing Systems 30 (2017)
- [61] J. Dehaene and B. De Moor, Phys. Rev. A 68, 042318 (2003)
- [62] B. M. Terhal, Rev. Mod. Phys. 87, 307 (2015)
- [63] J. Haah, Revista Colombiana de Matemáticas 50, 299 (2017)
- [64] Y. Leviathan, M. Kalman, and Y. Matias, in Proceedings of the International Conference on Learning Representations (ICLR) (2025) arXiv:2410.02703 [cs.CL]
- [65] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed. (2026) online manuscript released January 6, 2026
- [66] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” arXiv preprint arXiv:1409.0473 (2014)
- [67] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv preprint arXiv:1301.3781 (2013)
- [68] X. S. Huang, F. Perez, J. Ba, and M. Volkovs, in Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 119, edited by H. D. III and A. Singh (PMLR, 2020) pp. 4475–4483
- [69] T. Dean and M. Boddy, in Proceedings of the Seventh AAAI National Conference on Artificial Intelligence, AAAI’88 (AAAI Press, 1988) p. 49–54
- [70] P. Pfeuty, Annals of Physics 57, 79 (1970)
- [71] C. R. Laumann, R. Moessner, A. Scardicchio, and S. L. Sondhi, Phys. Rev. Lett. 109, 030502 (2012)
- [72] T. A. Bespalova and O. Kyriienko, “Quantum simulation and ground state preparation for the honeycomb Kitaev model,” (2021), arXiv:2109.13883 [quant-ph]
- [73] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, Journal of Mathematical Physics 43, 4452 (2002)
- [74] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 1st ed. (Chapman and Hall/CRC, 1995)
- [75] J. K. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, 2nd ed. (Academic Press, 2014)
- [76] R. P. A. Simon, Z. Shi, C. Nation, A. Jena, and L. Dellantonio, “Enhanced measurements on quantum computers via the simultaneous probing of non-commuting Pauli operators,” (2025), arXiv:2509.01482 [quant-ph]
- [77] R. P. A. Simon, M. Meth, F. Martini, P. Tirler, A. Jena, M. Ringbauer, and L. Dellantonio, “An error-aware and adaptive method for the estimation of quantum observables on qudit-based quantum computers,” (2026), arXiv:2605.00682 [quant-ph]
- [78] A. Shlosberg, A. J. Jena, P. Mukhopadhyay, J. F. Haase, F. Leditzky, and L. Dellantonio, Quantum 7, 906 (2023)
- [79] C. Nation, R. P. A. Simon, F. Martini, A. Ricottone, S. Banerjee, F. Cerisola, and L. Dellantonio, “Sympleq,” (2026)
- [80] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, and C. Ré, in Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22 (Curran Associates Inc., Red Hook, NY, USA, 2022)