{"paper":{"title":"Automatic Chain of Thought Prompting in Large Language Models","license":"http://creativecommons.org/licenses/by-sa/4.0/","headline":"Auto-CoT lets large language models build their own chain-of-thought demonstrations by sampling diverse questions.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Alex Smola, Aston Zhang, Mu Li, Zhuosheng Zhang","submitted_at":"2022-10-07T12:28:21Z","abstract_excerpt":"Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like \"Let's think step by step\" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstrations one by one, each composed of a question and a reasoning chain that leads to an answer. The superior performance of the second paradigm hinges on the hand-crafting of task-specific demonstration"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That sampling questions for diversity sufficiently mitigates the impact of occasional errors in the automatically generated reasoning chains, so that the constructed demonstrations remain effective overall.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with 
GPT-3.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Auto-CoT lets large language models build their own chain-of-thought demonstrations by sampling diverse questions.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"cea6e5c124e2dcbc535f9263c4e4a453347065e437d52df520d603ce4583d9c6"},"source":{"id":"2210.03493","kind":"arxiv","version":1},"verdict":{"id":"e1a81ef3-4178-4b0a-84b8-815962e157cb","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T10:36:28.283141Z","strongest_claim":"On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations.","one_line_summary":"Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That sampling questions for diversity sufficiently mitigates the impact of occasional errors in the automatically generated reasoning chains, so that the constructed demonstrations remain effective overall.","pith_extraction_headline":"Auto-CoT lets large language models build their own chain-of-thought demonstrations by sampling diverse questions."},"references":{"count":32,"sample":[{"doi":"","year":2020,"title":"Tom B. 
Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretch","work_id":"c8c76fe6-9f18-49cf-b7c5-eb092bf80d60","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html. Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, A","work_id":"546635ca-ac36-4775-b6e0-1a7e0066ad49","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"LaMDA: Language Models for Dialog Applications","work_id":"1b66d0a5-f6ae-4332-8025-c662dc64b238","ref_index":3,"cited_arxiv_id":"2201.08239","is_internal_anchor":true},{"doi":"","year":2022,"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","ref_index":4,"cited_arxiv_id":"2204.02311","is_internal_anchor":true},{"doi":"","year":2022,"title":"Large Language Models are Zero-Shot Reasoners","work_id":"d9b7eb1a-7165-46ff-9f06-d2f0b9d6f95d","ref_index":5,"cited_arxiv_id":"2205.11916","is_internal_anchor":true}],"resolved_work":32,"snapshot_sha256":"323120ad02d5c17afc6ed77ac6b4421920cf17d34a81eee88a9da8fecae6eeaa","internal_anchors":9},"formal_canon":{"evidence_count":2,"snapshot_sha256":"ec72543635480f50d3fd7491dbea7a0e7fd313f0a45881f501daffe1ce5fda2c"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}