pith. machine review for the scientific record.

arxiv: 2305.14992 · v2 · submitted 2023-05-24 · 💻 cs.CL · cs.AI · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Reasoning with Language Model is Planning with World Model

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 01:45 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords large language models · reasoning · planning · world model · monte carlo tree search · chain of thought · action planning

The pith

Language models can reason better by using themselves as world models and planning with tree search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that language models struggle with planning because they lack an internal world model to predict how states evolve after actions or to simulate long-term outcomes. To address this, the authors introduce a framework that prompts the same model to act as a world simulator while also serving as a reasoning agent that explores paths via Monte Carlo Tree Search. The search balances exploration of alternatives against exploitation of promising steps, guided by task rewards and simulated states. This produces stronger results than chain-of-thought prompting on plan generation, math problems, and logical inference. One reported outcome is that RAP on the 33-billion-parameter LLaMA surpasses chain-of-thought on GPT-4 with a 33 percent relative improvement in a plan-generation setting.

Core claim

Reasoning with a language model is equivalent to planning with a world model; by repurposing the model to predict next states and rewards and embedding it inside a Monte Carlo Tree Search procedure, the system can systematically explore and refine reasoning sequences to reach higher-reward solutions for complex tasks.

What carries the argument

RAP (Reasoning via Planning), which has the language model simulate state transitions as a world model and build a reasoning tree as an agent under the direction of Monte Carlo Tree Search and task-specific rewards.
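
The loop this describes can be sketched compactly. Below is a minimal, generic MCTS in which `propose_actions` stands in for the LLM-as-agent and `predict_next_state` for the LLM-as-world-model; both are toy stand-ins on a counting task so the code runs, and none of the function names, prompts, or parameters are taken from the paper.

```python
import math
import random

def propose_actions(state):
    # Agent role: candidate next reasoning steps (an LLM call in RAP).
    return [1, 2, 3]

def predict_next_state(state, action):
    # World-model role: simulated next state (an LLM call in RAP).
    return state + action

def reward(state, goal=10):
    # Task-specific reward: closer to the goal is better.
    return -abs(goal - state)

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0

def uct_select(node, c=1.4):
    # Exploitation (mean value) plus an exploration bonus for
    # rarely visited children.
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def mcts(root_state, n_sims=200, horizon=4):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # Selection: descend while every action at the node is expanded.
        while node.children and all(
            a in node.children for a in propose_actions(node.state)
        ):
            node = uct_select(node)
        # Expansion: add one untried action via the simulated transition.
        untried = [a for a in propose_actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(predict_next_state(node.state, a), parent=node)
            node = node.children[a]
        # Rollout: a short simulated continuation, scored by the reward.
        state = node.state
        for _ in range(horizon):
            state = predict_next_state(state, random.choice(propose_actions(state)))
        r = reward(state)
        # Backpropagation: credit the whole path.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # The most-visited first step is the chosen reasoning action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

random.seed(0)  # reproducible toy run
best_first_step = mcts(0)
```

Swapping the two stand-in functions for prompted LLM calls, and the reward for a task-specific signal, recovers the overall shape of the framework.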

If this is right

  • RAP produces higher-quality action plans and solutions than chain-of-thought or least-to-most prompting with self-consistency on plan generation, math reasoning, and logical inference.
  • The model can explore alternative reasoning paths and anticipate future states instead of committing to a single linear chain.
  • Task-specific rewards combined with simulated outcomes allow efficient search that balances exploration and exploitation.
  • The same model size can achieve better performance than larger models when the planning mechanism is added.
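
The exploration-exploitation balance in the third bullet is, in standard MCTS, the UCT selection rule. A generic statement, using the usual symbols rather than the paper's notation:

```latex
a^{*} \;=\; \arg\max_{a \in A(s)} \left[ Q(s, a) \;+\; w \sqrt{\frac{\ln N(s)}{N(s, a)}} \right]
```

Here \(Q(s,a)\) is the mean reward observed after taking action \(a\) in state \(s\), \(N(s)\) and \(N(s,a)\) are visit counts, and the weight \(w\) trades off exploiting high-value steps against exploring rarely tried ones.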

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could extend to interactive settings such as game playing or robotic control where accurate internal simulation reduces reliance on external feedback.
  • Combining the method with external tools or fine-tuning for better state prediction might further limit compounding errors over long horizons.
  • Similar tree-search structures might improve other generative tasks that benefit from lookahead, such as code synthesis or multi-turn dialogue planning.

Load-bearing premise

The language model's predictions of future states and action outcomes must remain accurate enough that simulation errors do not accumulate and invalidate the planning search.
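
This premise has a simple quantitative face: if each simulated transition is independently correct with probability p, a rollout of T steps is fully correct with probability p to the power T, so even high per-step accuracy decays quickly with depth. A minimal illustration, with hypothetical probabilities rather than anything measured in the paper:

```python
def rollout_fidelity(p, steps):
    """Probability that a rollout of `steps` simulated transitions is
    fully correct, assuming each step is independently correct with
    probability p (an illustrative independence assumption)."""
    return p ** steps

# Hypothetical per-step accuracies, not measurements from the paper.
for p in (0.99, 0.95, 0.90):
    fidelities = [rollout_fidelity(p, t) for t in (1, 5, 10)]
    print(p, [round(f, 3) for f in fidelities])
```

Under this toy model, 90 percent per-step accuracy leaves roughly a third of ten-step rollouts intact, which is why the search's reward signal has to tolerate, or correct for, simulation noise.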

What would settle it

A controlled test on a multi-step math or planning task in which the model's state predictions diverge from ground truth after only a few steps, causing the search to select a low-quality or invalid reasoning path that standard prompting would have avoided.

read the original abstract

Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. The deficiency stems from the key fact that LLMs lack an internal $\textit{world model}$ to predict the world $\textit{state}$ (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome the limitations, we propose a new LLM reasoning framework, $\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning $\textbf{(RAP)}$. RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm (based on Monte Carlo Tree Search) for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and task-specific rewards, and obtains a high-reward reasoning path efficiently with a proper balance between exploration $\textit{vs.}$ exploitation. We apply RAP to a variety of challenging reasoning problems including plan generation, math reasoning, and logical inference. Empirical results on these tasks demonstrate the superiority of RAP over various strong baselines, including CoT and least-to-most prompting with self-consistency. RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Reasoning via Planning (RAP), a framework that repurposes an LLM as both a reasoning agent and a world model, then integrates it with Monte Carlo Tree Search (MCTS) to explore reasoning paths guided by simulated state transitions and task-specific rewards. It evaluates the approach on plan generation, mathematical reasoning, and logical inference tasks, reporting that RAP instantiated with LLaMA-33B outperforms Chain-of-Thought prompting with GPT-4 by a 33% relative margin on plan generation.

Significance. If the LLM-as-world-model component produces sufficiently accurate long-horizon state predictions, the work would demonstrate a concrete way to augment LLM reasoning with explicit planning, potentially improving performance on tasks requiring anticipation of future states. The use of a standard, off-the-shelf planning algorithm (MCTS) with separable reward signals is a methodological strength that keeps the contribution focused on the LLM simulation interface rather than algorithmic novelty.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (RAP framework): the central claim that RAP enables 'deliberate planning' and yields the reported gains rests on the untested assumption that the LLM, when prompted as world model, produces next-state predictions accurate enough to guide MCTS without compounding errors. No quantitative measurement of world-model fidelity (e.g., next-state prediction accuracy or rollout error against ground-truth transitions on the evaluation tasks) is provided, which is load-bearing for interpreting the 33% relative improvement as evidence of principled planning rather than noisy search.
  2. [Experimental results] Experimental results section (plan-generation setting): the headline comparison (RAP on LLaMA-33B vs. CoT on GPT-4) reports no error bars, confidence intervals, or details on experimental controls such as prompt formatting, decoding parameters, or number of MCTS simulations. Without these, it is impossible to determine whether the observed difference is robust or sensitive to implementation choices.
minor comments (1)
  1. [§3.2] Notation for the world-model prompt template is introduced without a clear example or pseudocode, making it difficult to reproduce the exact simulation interface used in the MCTS rollouts.
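
For concreteness, an interface of the kind this comment asks for could look like the sketch below. The template wording and function names are invented for illustration; they are not the paper's actual prompts.

```python
# Hypothetical world-model prompt template; the wording is invented
# for illustration and is not taken from the paper.
WORLD_MODEL_TEMPLATE = (
    "You are simulating the environment for a reasoning task.\n"
    "Current state: {state}\n"
    "Action: {action}\n"
    "Next state:"
)

def world_model_prompt(state, action):
    """Render the prompt an MCTS rollout would send to the LLM in its
    world-model role to obtain the simulated next state."""
    return WORLD_MODEL_TEMPLATE.format(state=state, action=action)

prompt = world_model_prompt("block A on block B", "unstack A from B")
```

Pinning down even this much in the manuscript (the exact fields, their serialization, and how the completion is parsed back into a state) would make the simulation interface reproducible.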

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments on the RAP framework and experimental reporting. We address each major point below with proposed revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (RAP framework): the central claim that RAP enables 'deliberate planning' and yields the reported gains rests on the untested assumption that the LLM, when prompted as world model, produces next-state predictions accurate enough to guide MCTS without compounding errors. No quantitative measurement of world-model fidelity (e.g., next-state prediction accuracy or rollout error against ground-truth transitions on the evaluation tasks) is provided, which is load-bearing for interpreting the 33% relative improvement as evidence of principled planning rather than noisy search.

    Authors: We agree that explicit quantification of world-model accuracy would aid interpretation. In the original manuscript, we prioritized end-task performance as the primary evidence, since ground-truth state transitions are not explicitly annotated in the plan-generation and logical-inference benchmarks. The consistent gains over strong baselines (including GPT-4 CoT) and the use of task-specific rewards provide indirect support that the simulated transitions are useful. In revision we will add a new subsection in §3 discussing potential error accumulation, include qualitative rollout examples in the appendix, and report a simple next-state prediction accuracy metric on the math-reasoning tasks where intermediate variables offer clearer ground truth. revision: partial

  2. Referee: [Experimental results] Experimental results section (plan-generation setting): the headline comparison (RAP on LLaMA-33B vs. CoT on GPT-4) reports no error bars, confidence intervals, or details on experimental controls such as prompt formatting, decoding parameters, or number of MCTS simulations. Without these, it is impossible to determine whether the observed difference is robust or sensitive to implementation choices.

    Authors: We accept this criticism. The revised manuscript will report standard deviations across three random seeds for the plan-generation results, include 95% confidence intervals, and add a dedicated “Implementation Details” paragraph specifying the number of MCTS simulations (100), prompt templates, decoding parameters (temperature 0.7, top-p 0.9), and stopping criteria. These additions will appear in the experimental setup and results sections. revision: yes
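
The reporting promised here can be sketched directly: mean, sample standard deviation, and a two-sided 95% t-interval over per-seed scores. The scores below are placeholders, not results from the paper.

```python
import math
import statistics

def t_interval_95(scores):
    """Mean, sample standard deviation, and two-sided 95% t-interval
    for a small set of per-seed scores (t critical values hard-coded
    for 1 to 4 degrees of freedom)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (ddof=1)
    t_crit = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}[n - 1]
    half = t_crit * sd / math.sqrt(n)
    return mean, sd, (mean - half, mean + half)

# Placeholder per-seed success rates, not numbers from the paper.
mean, sd, (lo, hi) = t_interval_95([0.61, 0.64, 0.59])
```

With only three seeds the t critical value (4.303) is large, so the interval is wide; that width is itself informative when comparing against a single-run baseline number.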

Circularity Check

0 steps flagged

No circularity: RAP framework is a procedural combination of standard MCTS with LLM prompting, independent of its inputs

full rationale

The paper introduces RAP as an algorithmic framework that repurposes an LLM for both agent and world-model roles inside a Monte Carlo Tree Search loop, with task-specific rewards. No equations or derivations reduce a claimed prediction back to a fitted parameter or self-citation by construction. The planning procedure, tree expansion, and selection steps are described as standard MCTS operations applied to LLM-generated text; they do not presuppose the final performance numbers. Empirical results on plan generation, math, and logic tasks are presented as external measurements rather than tautological outputs. The design choice to use the same LLM for simulation is separable from the algorithmic contribution and does not create a self-definitional loop. This is the common case of an honest non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that an LLM can be prompted to produce usable state predictions and transition functions; no free parameters or invented entities are declared in the abstract.

axioms (1)
  • domain assumption An LLM prompted as world model yields sufficiently accurate state predictions and action outcomes for planning guidance
    Invoked as the core justification for repurposing the LLM; appears in the problem statement and method description.

pith-pipeline@v0.9.0 · 5643 in / 1182 out tokens · 64493 ms · 2026-05-17T01:45:01.211709+00:00 · methodology

discussion (0)


Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Tree of Thoughts: Deliberate Problem Solving with Large Language Models

    cs.CL 2023-05 accept novelty 8.0

    Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

  2. Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

    cs.CL 2026-04 unverdicted novelty 7.0

    CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.

  3. Training Large Language Models to Reason in a Continuous Latent Space

    cs.CL 2024-12 unverdicted novelty 7.0

    Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency t...

  4. AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

    cs.CL 2023-12 accept novelty 7.0

    A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.

  5. STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

    cs.CL 2026-05 unverdicted novelty 6.0

    STOP uses structured on-policy analysis to prune long reasoning traces to their earliest correct node, cutting token usage 19-42% with little accuracy loss on math benchmarks.

  6. Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    Verbal Process Supervision uses structured critiques from stronger models in an iterative loop to improve LLM reasoning, reaching 94.9% on GPQA Diamond and large gains on AIME 2025.

  7. Improve Mathematical Reasoning in Language Models by Automated Process Supervision

    cs.CL 2024-06 conditional novelty 6.0

    OmegaPRM automates collection of 1.5 million process supervision labels via binary-search MCTS, raising Gemini Pro math accuracy from 51% to 69.4% on MATH500 and Gemma2 27B from 42.3% to 58.2%.

  8. Cognitive Architectures for Language Agents

    cs.AI 2023-09 accept novelty 6.0

    CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic de...

  9. A Survey on Large Language Model based Autonomous Agents

    cs.AI 2023-08 accept novelty 6.0

    A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future di...

  10. Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

    cs.AI 2026-05 conditional novelty 5.0

    The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.

  11. NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

    cs.LG 2026-05 unverdicted novelty 5.0

    Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.

  12. Transferable Expertise for Autonomous Agents via Real-World Case-Based Learning

    cs.AI 2026-04 unverdicted novelty 5.0

    A case-based learning framework extracts reusable knowledge from past tasks to improve LLM agents' structured performance on complex real-world tasks, outperforming standard prompting baselines especially as task comp...

  13. Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

    cs.CL 2026-03 unverdicted novelty 5.0

    Inclusion-of-Thoughts purifies multiple-choice questions by keeping only plausible options, stabilizing LLM preferences and improving chain-of-thought results on reasoning benchmarks.

  14. Separating Intelligence from Execution: A Workflow Engine for the Model Context Protocol

    cs.DC 2026-03 unverdicted novelty 5.0

    An MCP-native workflow engine decouples agent reasoning from execution by using declarative blueprints, reducing token cost by over 99% on a 67-step Kubernetes synchronization task.

  15. Understanding the planning of LLM agents: A survey

    cs.AI 2024-02 accept novelty 4.0

    A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.

  16. The Rise and Potential of Large Language Model Based Agents: A Survey

    cs.AI 2023-09 accept novelty 4.0

    The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

  17. A Survey on Large Language Models for Code Generation

    cs.CL 2024-06 unverdicted novelty 3.0

    A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...

Reference graph

Works this paper leans on

134 extracted references · 134 canonical work pages · cited by 17 Pith papers · 31 internal anchors

  1. [1]

    Alan Baddeley. 1992. Working memory. Science, 255(5044):556--559

  2. [2]

    Robert Eamon Briscoe. 2011. Mental imagery and the varieties of amodal perception. Pacific Philosophical Quarterly, 92(2):153--173

  3. [3]

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems, 33:1877--1901

  4. [5]

    Tom Bylander. 1994. The computational complexity of propositional strips planning. Artificial Intelligence, 69(1-2):165--204

  5. [6]

    Eduardo F Camacho and Carlos Bordons Alba. 2013. Model predictive control. Springer science & business media

  6. [9]

    Rémi Coulom. 2007. Efficient selectivity and backup operators in monte-carlo tree search. In Computers and Games: 5th International Conference, CG 2006, Turin, Italy, May 29-31, 2006. Revised Papers 5, pages 72--83. Springer

  7. [11]

    Wojciech W Gasparski and Tufan Orel. 2014. Designology: Studies on Planning for Action, volume 1. Transaction Publishers

  8. [12]

    Dedre Gentner and Albert L Stevens. 2014. Mental models. Psychology Press

  9. [13]

    David Ha and Jürgen Schmidhuber. 2018a. Recurrent world models facilitate policy evolution. Advances in neural information processing systems, 31

  10. [17]

    Shibo Hao, Tianyang Liu, Zhen Wang, and Zhiting Hu. 2023a. Toolkengpt: Augmenting frozen language models with massive tools via tool embeddings. Advances in neural information processing systems, 36

  11. [18]

    Shibo Hao, Bowen Tan, Kaiwen Tang, Bin Ni, Xiyan Shao, Hengzhe Zhang, Eric Xing, and Zhiting Hu. 2023b. Bertnet: Harvesting knowledge graphs with arbitrary relations from pretrained language models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5000--5015

  12. [19]

    Mark K Ho, David Abel, Carlos G Correa, Michael L Littman, Jonathan D Cohen, and Thomas L Griffiths. 2021. Control of mental representations in human planning. arXiv e-prints, pages arXiv--2105

  13. [22]

    Quentin JM Huys, Neir Eshel, Elizabeth O'Nions, Luke Sheridan, Peter Dayan, and Jonathan P Roiser. 2012. Bonsai trees in your head: how the pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS computational biology, 8(3):e1002410

  14. [23]

    Yu-qian Jiang, Shi-qi Zhang, Piyush Khandelwal, and Peter Stone. 2019. Task planning in robotics: an empirical comparison of pddl-and asp-based systems. Frontiers of Information Technology & Electronic Engineering, 20:363--373

  15. [24]

    Philip N Johnson-Laird. 2010. Mental models and human reasoning. Proceedings of the National Academy of Sciences, 107(43):18243--18250

  16. [25]

    Philip Nicholas Johnson-Laird. 1983. Mental models: Towards a cognitive science of language, inference, and consciousness. 6. Harvard University Press

  17. [27]

    Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. In Machine Learning: ECML 2006: 17th European Conference on Machine Learning Berlin, Germany, September 18-22, 2006 Proceedings 17, pages 282--293. Springer

  18. [29]

    Yann LeCun. 2022. A path towards autonomous machine intelligence, version 0.9.2, 2022-06-27. Open Review, 62

  19. [33]

    Yutaka Matsuo, Yann LeCun, Maneesh Sahani, Doina Precup, David Silver, Masashi Sugiyama, Eiji Uchibe, and Jun Morimoto. 2022. Deep learning, reinforcement learning, and world models. Neural Networks

  20. [34]

    John McCarthy. 1963. Situations, actions, and causal laws. Technical report, STANFORD UNIV CA DEPT OF COMPUTER SCIENCE

  21. [36]

    OpenAI. 2023. GPT-4 technical report. http://arxiv.org/abs/2303.08774

  22. [38]

    Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, and Antonio Torralba. 2018. VirtualHome: Simulating household activities via programs. http://arxiv.org/abs/1806.07011

  23. [41]

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. 2020. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604--609

  24. [42]

    Jay Schulkin. 2012. Action, perception and the brain: Adaptation and cephalic expression. Springer

  25. [43]

    Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. 2020. Planning to explore via self-supervised world models. In International Conference on Machine Learning, pages 8583--8592. PMLR

  26. [44]

    Noah Shinn, Beck Labash, and Ashwin Gopinath. 2023. Reflexion: an autonomous agent with dynamic memory and self-reflection. ArXiv, abs/2303.11366

  27. [47]

    Edward C Tolman. 1948. Cognitive maps in rats and men. Psychological review, 55(4):189

  28. [55]

    Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. 2023. Daydreamer: World models for physical robot learning. In Conference on Robot Learning, pages 2226--2240. PMLR

  29. [56]

    Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, and Zhiting Hu. 2023. Language models meet world models: Embodied experiences enhance language models. Advances in neural information processing systems, 36

  30. [60]

    Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

  31. [61]

    A path towards autonomous machine intelligence, version 0.9.2, 2022-06-27. Open Review

  32. [62]

    A diverse corpus for evaluating and developing English math word problem solvers. arXiv preprint arXiv:2106.15772, 2021

  33. [63]

    Measuring mathematical problem solving with the MATH dataset. arXiv preprint arXiv:2103.03874, 2021

  34. [64]

    ProofWriter: Generating implications, proofs, and abductive statements over natural language. arXiv preprint arXiv:2012.13048, 2020

  35. [65]

    FOLIO: Natural language reasoning with first-order logic. arXiv preprint arXiv:2209.00840, 2022

  36. [66]

    Language models are greedy reasoners: A systematic formal analysis of chain-of-thought. arXiv preprint arXiv:2210.01240, 2022

  37. [67]

    GPT-4 technical report, 2023

  38. [68]

    LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  39. [69]

    Language models are few-shot learners. Advances in neural information processing systems

  40. [70]

    PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022

  41. [71]

    Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712, 2023

  42. [72]

    Attention is all you need. Advances in neural information processing systems

  43. [73]

    Large language models still can't plan (a benchmark for LLMs on planning and reasoning about change). arXiv preprint arXiv:2206.10498, 2022

  44. [74]

    World models. arXiv preprint arXiv:1803.10122, 2018

  45. [75]

    Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903, 2022

  46. [76]

    Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625, 2022

  47. [77]

    Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022

  48. [78]

    Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

  49. [79]

    Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 2021

  50. [80]

    Symbolic knowledge distillation: from general language models to commonsense models. arXiv preprint arXiv:2110.07178, 2021

  51. [81]

    G-Eval: NLG evaluation using GPT-4 with better human alignment. arXiv preprint arXiv:2303.16634, 2023

  52. [82]

    Model-based planning with discrete and continuous actions. arXiv preprint arXiv:1705.07177, 2017

  53. [83]

    A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 2012

  54. [84]

    On the planning abilities of large language models (a critical investigation with a proposed benchmark). arXiv preprint arXiv:2302.06706, 2023

  55. [85]

    Thinking, fast and slow, 2011

  56. [86]

    The nature of explanation, 1967

  57. [87]

    Applied optimal control: optimization, estimation and control, 1975

  58. [88]

    Machine translation decoding beyond beam search. arXiv preprint arXiv:2104.05336, 2021

  59. [89]

    Language modeling with latent situations. arXiv preprint arXiv:2212.10012, 2022

  60. [90]

    Maieutic prompting: Logically consistent reasoning with recursive explanations. arXiv preprint arXiv:2205.11822, 2022

  61. [91]

    AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems

  62. [92]

    SeqZero: Few-shot compositional semantic parsing with sequential prompts and zero-shot models. arXiv preprint arXiv:2205.07381, 2022

  63. [93]

    Iteratively prompt pre-trained language models for chain of thought. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

  64. [94]

    Enhancing chain-of-thoughts prompting with iterative bootstrapping in large language models. arXiv preprint arXiv:2304.11657, 2023

  65. [95]

    REFINER: Reasoning feedback on intermediate representations. arXiv preprint arXiv:2304.01904, 2023

  66. [96]

    Generating sequences by learning to self-correct. arXiv preprint arXiv:2211.00053, 2022

  67. [97]

    Decomposition enhances reasoning via self-evaluation guided decoding. arXiv preprint arXiv:2305.00633, 2023

  68. [98]

    Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023

  69. [99]

    Learning a world model and planning with a self-organizing, dynamic neural system. Advances in neural information processing systems

  70. [100]

    Solving math word problem via cooperative reasoning induced language models. arXiv preprint arXiv:2210.16257, 2022

  71. [101]

    Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022

  72. [102]

    Plan4MC: Skill reinforcement learning and planning for open-world Minecraft tasks. arXiv preprint arXiv:2303.16563, 2023

  73. [103]

    Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560, 2023

  74. [104]

    Faithful chain-of-thought reasoning. arXiv preprint arXiv:2301.13379, 2023

  75. [105]

    LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023

  76. [106]

    Selection-inference: Exploiting large language models for interpretable logical reasoning. arXiv preprint arXiv:2205.09712, 2022

  77. [107]

    HyperTree proof search for neural theorem proving. Advances in Neural Information Processing Systems

  78. [108]

    The Winograd schema challenge. Thirteenth international conference on the principles of knowledge representation and reasoning

  79. [109]

    Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM, 2015

  80. [110]

    Programs with common sense, 1959

Showing first 80 references.