{"paper":{"title":"Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Symmetries in next-token targets transfer exactly to circulant logit matrices and equiangular structures in LLM weights and embeddings.","cross_cats":["cs.AI","stat.ML"],"primary_cat":"math.OC","authors_text":"Hangfeng He, Weijie Su, Zhehang Du","submitted_at":"2026-05-12T21:10:34Z","abstract_excerpt":"Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in the learned model weights and context embeddings. We approach this problem by analyzing a constrained layer-peeled optimization program, which serves as a mathematically tractable surrogate for LLMs by treating the output projection matrix and last-layer context embeddings as optimization variables. Our analysis of this nonconvex optimization program demonstrates that symmetries in the target nex"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"we prove that when the target tokens exhibit a cyclic-shift symmetry (such as the seven days of the week or the twelve months of the year), the optimal logit matrix is exactly circulant, and the Gram matrices of both the output projections and the context embeddings form circulant geometries as well.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The constrained layer-peeled optimization program serves as a mathematically tractable surrogate for LLMs by treating the output projection matrix and last-layer context embeddings as optimization variables.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Symmetries in next-token targets transfer exactly to circulant logit matrices and equiangular structures in LLM weights and embeddings.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"facd33427a18ea1575b8c9f009e3b1ac121e4415b2f4efa0d6bf9505624253e6"},"source":{"id":"2605.12756","kind":"arxiv","version":1},"verdict":{"id":"d672c147-c740-46c6-b093-fb5e5b851857","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:07:32.146525Z","strongest_claim":"we prove that when the target tokens exhibit a cyclic-shift symmetry (such as the seven days of the week or the twelve months of the year), the optimal logit matrix is exactly circulant, and the Gram matrices of both the output projections and the context embeddings form circulant geometries as well.","one_line_summary":"Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The constrained layer-peeled optimization program serves as a mathematically tractable surrogate for LLMs by treating the output projection matrix and last-layer context embeddings as optimization variables.","pith_extraction_headline":"Symmetries in next-token targets transfer exactly to circulant logit matrices and equiangular structures in LLM weights and embeddings."},"references":{"count":98,"sample":[{"doi":"","year":2021,"title":"Expositiones Mathematicae , volume =","work_id":"4b7edc29-e13a-4a2f-a033-91b56e1266b9","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2012,"title":"and Johnson, Charles R","work_id":"5035bbd0-c065-480a-a959-32a4356b81d7","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1007/978-1-4684-9458-7","year":1977,"title":"42; Springer: New York","work_id":"c64a6595-0803-4cc3-8efd-093d4df7517a","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. Attention is All you Need , booktitle =. 2017 , publisher =","work_id":"0ebb2d3e-1d06-4ced-a9c6-d8d1efe73930","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Papyan, Vardan and Han, X. Y. and Donoho, David L. , title =. Proceedings of the National Academy of Sciences , volume =. 2020 , doi =","work_id":"b8a58faa-8ff4-4caa-aec9-7f7975b434e2","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":98,"snapshot_sha256":"8e5aff20a41eeff37443de8fb2ef60e9a790cbf0d9a94d812ba0bc178941592a","internal_anchors":5},"formal_canon":{"evidence_count":2,"snapshot_sha256":"8063848be7dbd6b7db200ceead2c506dd4d9092b07669491598fd01a87cc86c8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}