{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:FSM4SJL7JGJDTZTHH5WRBQZ3YN","short_pith_number":"pith:FSM4SJL7","schema_version":"1.0","canonical_sha256":"2c99c9257f499239e6673f6d10c33bc37c8b61b0eb06d05048a5b7445fe822e0","source":{"kind":"arxiv","id":"2305.14992","version":2},"attestation_state":"computed","paper":{"title":"Reasoning with Language Model is Planning with World Model","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Language models can reason better by using themselves as world models and planning with tree search.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Daisy Zhe Wang, Haodi Ma, Joshua Jiahua Hong, Shibo Hao, Yi Gu, Zhen Wang, Zhiting Hu","submitted_at":"2023-05-24T10:28:28Z","abstract_excerpt":"Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. 
The deficiency stems from the key fact that LLMs lack an internal $\\textit{world model}$ to predict the world $\\textit{state}$ (e.g., environment status, intermediate variable values) and simulate long-term outcomes"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2305.14992","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2023-05-24T10:28:28Z","cross_cats_sorted":["cs.AI","cs.LG"],"title_canon_sha256":"0f69912f03a536037278f3398eb9a7514173bb9f5528928c09371ee69d22f828","abstract_canon_sha256":"bcb89edfc5c0269c1257a96a4849fdbac48a9efca4fe11252d16014928c28beb"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:45.932631Z","signature_b64":"JENwz5xbcQmAbrAYpk5AUmimyxtvbPesOYB5rcbZzvS2B4O0NWdIR33f7xxUhy0G0VbNVwfhKWMbj61eIau9AQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"2c99c9257f499239e6673f6d10c33bc37c8b61b0eb06d05048a5b7445fe822e0","last_reissued_at":"2026-05-17T23:38:45.931891Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:45.931891Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Reasoning with Language Model is Planning with World Model","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Language models can reason better by using themselves as world models and planning with tree search.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Daisy Zhe Wang, Haodi Ma, Joshua 
Jiahua Hong, Shibo Hao, Yi Gu, Zhen Wang, Zhiting Hu","submitted_at":"2023-05-24T10:28:28Z","abstract_excerpt":"Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. The deficiency stems from the key fact that LLMs lack an internal $\\textit{world model}$ to predict the world $\\textit{state}$ (e.g., environment status, intermediate variable values) and simulate long-term outcomes"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the LLM, when prompted to act as world model, produces sufficiently accurate state predictions and transition simulations to guide search without compounding errors that invalidate the planning process.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Language models can reason better by using themselves as world models and planning with tree 
search.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"60587927e399c3a0400a8a9f486237cbcb4f99a44f4a472378d82856ff0484f2"},"source":{"id":"2305.14992","kind":"arxiv","version":2},"verdict":{"id":"c1417558-31a0-4e87-9ca9-aad60b332789","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T01:45:01.211709Z","strongest_claim":"RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.","one_line_summary":"RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the LLM, when prompted to act as world model, produces sufficiently accurate state predictions and transition simulations to guide search without compounding errors that invalidate the planning process.","pith_extraction_headline":"Language models can reason better by using themselves as world models and planning with tree search."},"references":{"count":134,"sample":[{"doi":"","year":1992,"title":"Alan Baddeley. 1992. Working memory. Science, 255(5044):556--559","work_id":"f36b7e9d-fb69-46ab-9df5-b0ea2ea4b066","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2011,"title":"Robert Eamon Briscoe. 2011. Mental imagery and the varieties of amodal perception. Pacific Philosophical Quarterly, 92(2):153--173","work_id":"6f8e00d0-b211-4677-8661-131bbfc2b45e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. 
Language models are few-shot lear","work_id":"50684699-ce18-4086-8bac-7cecd178fad0","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1994,"title":"Tom Bylander. 1994. The computational complexity of propositional strips planning. Artificial Intelligence, 69(1-2):165--204","work_id":"3be5a615-51c1-49c4-ab86-dff31f715a8e","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2013,"title":"Eduardo F Camacho and Carlos Bordons Alba. 2013. Model predictive control. Springer science & business media","work_id":"775ab905-fd2a-4ddd-821e-0fc8324fe7ed","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":134,"snapshot_sha256":"03cd7f9af3bd37b779e1d8c431df627824a113282ccd5043b090d5cdbc1ce01f","internal_anchors":31},"formal_canon":{"evidence_count":2,"snapshot_sha256":"65afd9f30e6abf4560f14233598d202f80622ee4c95e7e11b7b50b9f26131303"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2305.14992","created_at":"2026-05-17T23:38:45.931967+00:00"},{"alias_kind":"arxiv_version","alias_value":"2305.14992v2","created_at":"2026-05-17T23:38:45.931967+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2305.14992","created_at":"2026-05-17T23:38:45.931967+00:00"},{"alias_kind":"pith_short_12","alias_value":"FSM4SJL7JGJD","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"FSM4SJL7JGJDTZTH","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"FSM4SJL7","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":30,"internal_anchor_count":30,"sample":[{"citing_arxiv_id":"2502.02871","citing_title":"Position: Multimodal Large Language Models Can Significantly Advance Scientific 
Reasoning","ref_index":57,"is_internal_anchor":true},{"citing_arxiv_id":"2504.09775","citing_title":"MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2507.21035","citing_title":"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2510.05746","citing_title":"ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2503.11926","citing_title":"Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2307.06435","citing_title":"A Comprehensive Overview of Large Language Models","ref_index":233,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14892","citing_title":"Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems","ref_index":137,"is_internal_anchor":true},{"citing_arxiv_id":"2506.00886","citing_title":"Position: Agent Should Invoke External Tools ONLY When Epistemically Necessary","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2507.00432","citing_title":"Does Math Reasoning Improve General LLM Capabilities? 
Understanding Transferability of LLM Reasoning","ref_index":111,"is_internal_anchor":true},{"citing_arxiv_id":"2501.00309","citing_title":"Retrieval-Augmented Generation with Graphs (GraphRAG)","ref_index":142,"is_internal_anchor":true},{"citing_arxiv_id":"2402.13116","citing_title":"A Survey on Knowledge Distillation of Large Language Models","ref_index":252,"is_internal_anchor":true},{"citing_arxiv_id":"2601.12538","citing_title":"Agentic Reasoning for Large Language Models","ref_index":112,"is_internal_anchor":true},{"citing_arxiv_id":"2504.15965","citing_title":"From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs","ref_index":94,"is_internal_anchor":true},{"citing_arxiv_id":"2309.02427","citing_title":"Cognitive Architectures for Language Agents","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00827","citing_title":"Separating Intelligence from Execution: A Workflow Engine for the Model Context Protocol","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04944","citing_title":"Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2308.11432","citing_title":"A Survey on Large Language Model based Autonomous Agents","ref_index":57,"is_internal_anchor":true},{"citing_arxiv_id":"2312.13010","citing_title":"AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14892","citing_title":"Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems","ref_index":136,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13165","citing_title":"STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2406.06592","citing_title":"Improve Mathematical Reasoning in Language 
Models by Automated Process Supervision","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2406.00515","citing_title":"A Survey on Large Language Models for Code Generation","ref_index":93,"is_internal_anchor":true},{"citing_arxiv_id":"2402.02716","citing_title":"Understanding the planning of LLM agents: A survey","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08221","citing_title":"NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2305.10601","citing_title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models","ref_index":9,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN","json":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN.json","graph_json":"https://pith.science/api/pith-number/FSM4SJL7JGJDTZTHH5WRBQZ3YN/graph.json","events_json":"https://pith.science/api/pith-number/FSM4SJL7JGJDTZTHH5WRBQZ3YN/events.json","paper":"https://pith.science/paper/FSM4SJL7"},"agent_actions":{"view_html":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN","download_json":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN.json","view_paper":"https://pith.science/paper/FSM4SJL7","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2305.14992&json=true","fetch_graph":"https://pith.science/api/pith-number/FSM4SJL7JGJDTZTHH5WRBQZ3YN/graph.json","fetch_events":"https://pith.science/api/pith-number/FSM4SJL7JGJDTZTHH5WRBQZ3YN/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN/action/timestamp_anchor","attest_storage":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN/action/storage_attestation","attest_author":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN/action/author_attestation","sign_citation":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN/action/citation_signature","submit_replication":"https://pith.science/pith/FSM4SJL7JGJDTZTHH5WRBQZ3YN/action/replication_record"}},"created_at":"2026-05-17T23:38:45.931967+00:00","updated_at":"2026-05-17T23:38:45.931967+00:00"}