{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2022:JIDEKWROCPKMO3IDKLDMVYQ5HZ","short_pith_number":"pith:JIDEKWRO","schema_version":"1.0","canonical_sha256":"4a06455a2e13d4c76d0352c6cae21d3e4ca1b18a921a1e557ad74cef254acadb","source":{"kind":"arxiv","id":"2211.10435","version":2},"attestation_state":"computed","paper":{"title":"PAL: Program-aided Language Models","license":"http://creativecommons.org/publicdomain/zero/1.0/","headline":"LLMs generate programs as reasoning steps and let a Python interpreter execute them to solve math and symbolic problems more accurately than much larger models using chain-of-thought.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Aman Madaan, Graham Neubig, Jamie Callan, Luyu Gao, Pengfei Liu, Shuyan Zhou, Uri Alon, Yiming Yang","submitted_at":"2022-11-18T18:56:13Z","abstract_excerpt":"Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks, when provided with a few examples at test time (\"few-shot prompting\"). Much of this success can be attributed to prompting methods such as \"chain-of-thought'', which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, LLMs often make logical and arithmetic mistakes in the solution part, even when the problem is dec"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2211.10435","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/publicdomain/zero/1.0/","primary_cat":"cs.CL","submitted_at":"2022-11-18T18:56:13Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"e6a3e4ab69f3f371b2c5bde4a1bf6463ca13fe3fbea0168f2641acc77ba1c924","abstract_canon_sha256":"abc4d6b463b55fea9aaebb576a3312e727cdfd12b19ac275fdcb5e59210a2932"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:53.445277Z","signature_b64":"zUQQs2nlFxWuKSaOjrhWAUDlknzTLeYyRbj470L3Hj3vzu4YuwHmSkR3XUUXuYXqtHDoNDtQxAsq5vtmKFSxDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"4a06455a2e13d4c76d0352c6cae21d3e4ca1b18a921a1e557ad74cef254acadb","last_reissued_at":"2026-05-17T23:38:53.444657Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:53.444657Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"PAL: Program-aided Language Models","license":"http://creativecommons.org/publicdomain/zero/1.0/","headline":"LLMs generate programs as reasoning steps and let a Python interpreter execute them to solve math and symbolic problems more accurately than much larger models using chain-of-thought.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Aman Madaan, Graham Neubig, Jamie Callan, Luyu Gao, Pengfei Liu, Shuyan Zhou, Uri Alon, Yiming Yang","submitted_at":"2022-11-18T18:56:13Z","abstract_excerpt":"Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks, when provided with a few examples at test time (\"few-shot prompting\"). Much of this success can be attributed to prompting methods such as \"chain-of-thought'', which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, LLMs often make logical and arithmetic mistakes in the solution part, even when the problem is dec"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B which uses chain-of-thought by absolute 15% top-1.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the LLM will reliably generate correct, executable programs whose logic matches the intended reasoning without introducing its own coding or planning errors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LLMs generate programs as reasoning steps and let a Python interpreter execute them to solve math and symbolic problems more accurately than much larger models using chain-of-thought.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c40e8ad7960f510d2b8f7cbb245a628c3a1fb13bbd99eccde6c19a821ec691f3"},"source":{"id":"2211.10435","kind":"arxiv","version":2},"verdict":{"id":"0a44b9dd-a9d7-4379-8e62-79055d9e279b","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T04:58:42.378977Z","strongest_claim":"PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B which uses chain-of-thought by absolute 15% top-1.","one_line_summary":"PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the LLM will reliably generate correct, executable programs whose logic matches the intended reasoning without introducing its own coding or planning errors.","pith_extraction_headline":"LLMs generate programs as reasoning steps and let a Python interpreter execute them to solve math and symbolic problems more accurately than much larger models using chain-of-thought."},"references":{"count":44,"sample":[{"doi":"","year":2022,"title":"Do As I Can, Not As I Say: Grounding Language in Robotic Affordances","work_id":"037320f1-b0a9-4cbe-a639-bfb25409ce71","ref_index":1,"cited_arxiv_id":"2204.01691","is_internal_anchor":true},{"doi":"","year":2019,"title":"https://aclanthology.org/N19-1245 M ath QA : Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms","work_id":"95ff4a33-2a6e-4326-9caa-ac6d568e3241","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1909,"title":"Giving bert a calculator: Finding operations and arguments with reading comprehension","work_id":"ab124ee4-c511-4a61-8482-bad3e558fe10","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert - Voss, A., Krueger, G., Henighan, T., Child, R., Ram","work_id":"96eceee9-e1b2-4c6f-9f77-b5dc792fb8eb","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","ref_index":6,"cited_arxiv_id":"2107.03374","is_internal_anchor":true}],"resolved_work":44,"snapshot_sha256":"827a1b1a9cdfb4ec9e24c12d0324d53ae30956e876a8ac611fe71a5ff83db1be","internal_anchors":18},"formal_canon":{"evidence_count":2,"snapshot_sha256":"66d2c61de709373853cae67ddcdc89d3e34752de848492c973d8eabef145e072"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2211.10435","created_at":"2026-05-17T23:38:53.444757+00:00"},{"alias_kind":"arxiv_version","alias_value":"2211.10435v2","created_at":"2026-05-17T23:38:53.444757+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2211.10435","created_at":"2026-05-17T23:38:53.444757+00:00"},{"alias_kind":"pith_short_12","alias_value":"JIDEKWROCPKM","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"JIDEKWROCPKMO3ID","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"JIDEKWRO","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":35,"internal_anchor_count":13,"sample":[{"citing_arxiv_id":"2410.13181","citing_title":"AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17958","citing_title":"Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2403.17134","citing_title":"RepairAgent: An Autonomous, LLM-Based Agent for Program Repair","ref_index":74,"is_internal_anchor":true},{"citing_arxiv_id":"2309.17452","citing_title":"ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2310.10631","citing_title":"Llemma: An Open Language Model For Mathematics","ref_index":142,"is_internal_anchor":true},{"citing_arxiv_id":"2502.10248","citing_title":"Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model","ref_index":259,"is_internal_anchor":true},{"citing_arxiv_id":"2509.20823","citing_title":"CaTS-Bench: Can Language Models Describe Time Series?","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2303.08128","citing_title":"ViperGPT: Visual Inference via Python Execution for Reasoning","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2511.21104","citing_title":"BRIDGE: Building Representations In Domain Guided Program Synthesis","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2303.09014","citing_title":"ART: Automatic multi-step reasoning and tool-use for large language models","ref_index":143,"is_internal_anchor":true},{"citing_arxiv_id":"2306.13549","citing_title":"A Survey on Multimodal Large Language Models","ref_index":194,"is_internal_anchor":true},{"citing_arxiv_id":"2305.18323","citing_title":"ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14126","citing_title":"Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14186","citing_title":"LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling","ref_index":7,"is_internal_anchor":false},{"citing_arxiv_id":"2304.11477","citing_title":"LLM+P: Empowering Large Language Models with Optimal Planning Proficiency","ref_index":64,"is_internal_anchor":false},{"citing_arxiv_id":"2303.11381","citing_title":"MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action","ref_index":11,"is_internal_anchor":false},{"citing_arxiv_id":"2303.17580","citing_title":"HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face","ref_index":17,"is_internal_anchor":false},{"citing_arxiv_id":"2604.00149","citing_title":"Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations","ref_index":22,"is_internal_anchor":false},{"citing_arxiv_id":"2504.11536","citing_title":"ReTool: Reinforcement Learning for Strategic Tool Use in LLMs","ref_index":5,"is_internal_anchor":false},{"citing_arxiv_id":"2605.11202","citing_title":"Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing","ref_index":13,"is_internal_anchor":false},{"citing_arxiv_id":"2211.12588","citing_title":"Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks","ref_index":11,"is_internal_anchor":false},{"citing_arxiv_id":"2605.03862","citing_title":"Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards","ref_index":7,"is_internal_anchor":false},{"citing_arxiv_id":"2605.08212","citing_title":"LLMs with in-context learning for Algorithmic Theoretical Physics","ref_index":13,"is_internal_anchor":false},{"citing_arxiv_id":"2304.05128","citing_title":"Teaching Large Language Models to Self-Debug","ref_index":92,"is_internal_anchor":false},{"citing_arxiv_id":"2605.07237","citing_title":"Teaching Language Models to Think in Code","ref_index":5,"is_internal_anchor":false}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ","json":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ.json","graph_json":"https://pith.science/api/pith-number/JIDEKWROCPKMO3IDKLDMVYQ5HZ/graph.json","events_json":"https://pith.science/api/pith-number/JIDEKWROCPKMO3IDKLDMVYQ5HZ/events.json","paper":"https://pith.science/paper/JIDEKWRO"},"agent_actions":{"view_html":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ","download_json":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ.json","view_paper":"https://pith.science/paper/JIDEKWRO","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2211.10435&json=true","fetch_graph":"https://pith.science/api/pith-number/JIDEKWROCPKMO3IDKLDMVYQ5HZ/graph.json","fetch_events":"https://pith.science/api/pith-number/JIDEKWROCPKMO3IDKLDMVYQ5HZ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ/action/storage_attestation","attest_author":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ/action/author_attestation","sign_citation":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ/action/citation_signature","submit_replication":"https://pith.science/pith/JIDEKWROCPKMO3IDKLDMVYQ5HZ/action/replication_record"}},"created_at":"2026-05-17T23:38:53.444757+00:00","updated_at":"2026-05-17T23:38:53.444757+00:00"}