{"paper":{"title":"How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Transformer weights emerge in closed form as compositions of three basis functions from corpus statistics.","cross_cats":["cs.LG"],"primary_cat":"cs.CL","authors_text":"Changdae Oh, Sharon Li, Shawn Im, Zhen Fang","submitted_at":"2026-01-27T05:22:34Z","abstract_excerpt":"Semantic associations such as the link between \"bird\" and \"flew\" are foundational for language modeling as they enable models to go beyond memorization and instead generalize and generate coherent text. Understanding how these associations are learned and represented in language models is essential for connecting deep learning with linguistic theory and developing a mechanistic foundation for large language models. In this work, we analyze how these associations emerge from natural language data in attention-based language models through the lens of training dynamics. By leveraging a leading-t"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"each set of weights of the transformer has closed-form expressions as simple compositions of three basis functions (bigram, token-interchangeability, and context mappings), reflecting the statistics of the text corpus","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The leading-term approximation of the gradients remains accurate enough in the earliest training phase to determine the functional form of the learned weights and that semantic associations are primarily shaped by these early-stage closed-form expressions rather than later training dynamics.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Transformer weights at early training stages are closed-form compositions of bigram, token-interchangeability, and context mappings that directly reflect text-corpus statistics and explain the emergence of semantic associations.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Transformer weights emerge in closed form as compositions of three basis functions from corpus statistics.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"be8999c46c3f2ac220ffe7704db0df837ae6fe7ca479799de682cc5c8e39867a"},"source":{"id":"2601.19208","kind":"arxiv","version":2},"verdict":{"id":"ba505410-c5f8-4298-9f29-1c41cb74424e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T11:17:41.220828Z","strongest_claim":"each set of weights of the transformer has closed-form expressions as simple compositions of three basis functions (bigram, token-interchangeability, and context mappings), reflecting the statistics of the text corpus","one_line_summary":"Transformer weights at early training stages are closed-form compositions of bigram, token-interchangeability, and context mappings that directly reflect text-corpus statistics and explain the emergence of semantic associations.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The leading-term approximation of the gradients remains accurate enough in the earliest training phase to determine the functional form of the learned weights and that semantic associations are primarily shaped by these early-stage closed-form expressions rather than later training dynamics.","pith_extraction_headline":"Transformer weights emerge in closed form as compositions of three basis functions from corpus statistics."},"references":{"count":37,"sample":[{"doi":"","year":null,"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","ref_index":1,"cited_arxiv_id":"2303.08774","is_internal_anchor":true},{"doi":"","year":2026,"title":"Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901,","work_id":"14563da7-3040-4c2f-bd6e-31ce3e21dcbe","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Sparse Autoencoders Find Highly Interpretable Features in Language Models","work_id":"51960d72-c69f-4db8-8efd-e90e8b4d9524","ref_index":3,"cited_arxiv_id":"2309.08600","is_internal_anchor":true},{"doi":"","year":null,"title":"Computational-statistical gaps in gaussian single-index models","work_id":"216db4ca-4839-4aa1-bb34-55721add3445","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"How two-layer neural networks learn, one (giant) step at a time.arXiv preprint arXiv:2305.18270,","work_id":"f7b71d5d-aead-488e-b0cb-38f05965355d","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":37,"snapshot_sha256":"51c2dc45c5b90063028218d1e298305df2b78002c24d75b5851f57fd119331e7","internal_anchors":9},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}