{"paper":{"title":"LIMO: Less is More for Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Sophisticated mathematical reasoning emerges in large language models from only a few strategically designed examples.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Ethan Chern, Pengfei Liu, Shijie Xia, Yang Xiao, Yixin Ye, Zhen Huang","submitted_at":"2025-02-05T17:23:45Z","abstract_excerpt":"We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge with only a few examples. Specifically, through simple supervised fine-tuning, our model, LIMO, achieves 63.3\\% accuracy on AIME24 and 95.6\\% on MATH500, surpassing previous fine-tuned models (6.5\\% on AIME24, 59.2\\% on MATH500) while using only 1\\% of the training data required by prior approaches. Furthermore, LIMO exhibits strong out-of-distribution generalization, achieving a 45.8\\% absolute imp"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"sophisticated mathematical reasoning can emerge with only a few examples... through minimal but strategically designed demonstrations of cognitive processes.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The foundation model has already comprehensively encoded the relevant domain knowledge during pre-training, and the small set of post-training examples functions as effective cognitive templates.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Sophisticated mathematical reasoning emerges in large language models from only a few strategically designed examples.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"16afdebc6eff494daf1db31257ec9dfe30625c4e5b5eb445ffcaf4cdab9472e7"},"source":{"id":"2502.03387","kind":"arxiv","version":3},"verdict":{"id":"5e2233a9-6c9d-4a39-967b-ea9e5e9f5b15","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T02:06:10.598649Z","strongest_claim":"sophisticated mathematical reasoning can emerge with only a few examples... through minimal but strategically designed demonstrations of cognitive processes.","one_line_summary":"LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The foundation model has already comprehensively encoded the relevant domain knowledge during pre-training, and the small set of post-training examples functions as effective cognitive templates.","pith_extraction_headline":"Sophisticated mathematical reasoning emerges in large language models from only a few strategically designed examples."},"references":{"count":299,"sample":[{"doi":"","year":null,"title":"Attention is All you Need , url =","work_id":"fa4d7178-75f8-4139-9b39-41030a2917db","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"OpenAI o1 System Card , author=. 2024 , eprint=","work_id":"f4926faa-1e08-4b8f-8e4e-7065c75fd9fe","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training , author=. 2025 , eprint=","work_id":"e4119e9f-6591-40a0-b465-9eb7781d4d57","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Benchmarking Benchmark Leakage in Large Language Models , author=. 2024 , eprint=","work_id":"de8590fb-4f72-4557-b70e-22316fbe659b","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"MathPile: A Billion-Token-Scale Pretraining Corpus for Math , author=. 2024 , eprint=","work_id":"9d11acdc-ea9d-4f90-9f36-42a7ab665dac","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":299,"snapshot_sha256":"70fcbfc1f59e5910283da5046d7d3424edeb0767511024fd422a90bdf915597f","internal_anchors":38},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1f15360cc5861bd2ab3938d5626396d8d049005b3998cd46bd6f7f566eab11e3"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}