Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang · 2023 · cs.AI · arXiv 2310.04406

Mixed citation behavior: the most common role is background (67% of classified citations).

48 Pith papers cite it.

abstract

While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. By leveraging the in-context learning ability of LMs, we integrate Monte Carlo Tree Search into LATS to enable LMs as agents, along with LM-powered value functions and self-reflections for proficient exploration and enhanced decision-making. A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that surpasses the constraints of existing techniques. Our experimental evaluation across diverse domains, including programming, interactive question-answering (QA), web navigation, and math, validates the effectiveness and generality of LATS in decision-making while maintaining competitive or improved reasoning performance. Notably, LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT-3.5. Code can be found at https://github.com/lapisrocks/LanguageAgentTreeSearch
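
To make the search loop described in the abstract concrete, here is a minimal Python sketch of Monte Carlo Tree Search with UCT selection. It is not the authors' implementation. The `propose`, `evaluate`, and `is_terminal` callables are hypothetical stand-ins for the LM action sampler, the LM-powered value function with environment feedback, and the environment's termination check. Self-reflection is omitted, and a toy string-building task replaces the real environment.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float = 1.0) -> float:
        # Unvisited nodes are tried first; otherwise mean value plus exploration bonus.
        if self.visits == 0:
            return math.inf
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def tree_search(
    root_state: str,
    propose: Callable[[str], List[str]],   # assumption: stands in for LM action sampling
    evaluate: Callable[[str], float],      # assumption: stands in for the LM value function
    is_terminal: Callable[[str], bool],    # assumption: stands in for the environment
    iterations: int = 30,
) -> str:
    root = Node(root_state)
    best_state, best_reward = root_state, -math.inf
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a node with no children.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.uct())
        # Expansion: add successor states, then step into the first one.
        if not is_terminal(node.state):
            node.children = [Node(s, parent=node) for s in propose(node.state)]
            if node.children:
                node = node.children[0]
        # Evaluation: score the reached state and remember the best one seen.
        reward = evaluate(node.state)
        if reward > best_reward:
            best_state, best_reward = node.state, reward
        # Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_state


# Toy task (hypothetical): build the string "abb" one character at a time.
TARGET = "abb"


def propose(state: str) -> List[str]:
    return [state + ch for ch in "ab"]


def is_terminal(state: str) -> bool:
    return len(state) >= len(TARGET)


def evaluate(state: str) -> float:
    # Fraction of target positions that the partial string already matches.
    return sum(s == t for s, t in zip(state, TARGET)) / len(TARGET)


result = tree_search("", propose, evaluate, is_terminal, iterations=30)
print(result)
```

In an LM-agent setting, each state would be a trajectory of thoughts, actions, and observations rather than a string, and `evaluate` would combine an LM-generated score with feedback from the environment.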

citation-role summary

background 8 · method 1

citation-polarity summary

background 6 · support 2 · use method 1

representative citing papers

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

cs.AI · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

GILP trains a parameterized backbone for valid actions and state predictions, then uses a consistency gate with LLM drafts to reduce hallucinated-state rate from 0.176 to 0.035 on GPT-4o-mini while raising success from 0.668 to 0.838.

Look-Before-Move: Narrative-Grounded World Visual Attention in Dynamic 3D Story Worlds

cs.AI · 2026-06-25 · unverdicted · novelty 7.0 · 2 refs

Look-Before-Move is a framework that converts narrative intent into Semantic Observation Contracts, uses Monte Carlo Viewpoint Search for feasible viewpoints, and applies Semantic Trajectory Grounding for coherent camera motion in dynamic 3D story worlds.

Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing

cs.CL · 2026-06-24 · unverdicted · novelty 7.0

LBR performs token-level test-time scaling via local branch routing on hidden states, enabling end-to-end RL training and improving Pass@1 and Pass@32 on math benchmarks over CoT and RLVR baselines.

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

Adding explicit parent pointers to represent search tree structure in LLM reasoning traces (LinTree) improves task performance and search efficiency on Blocks World, grid Navigation, and Sokoban relative to implicit traces and LLM-heuristic search.

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Reflexive agents confabulate incorrect task interpretations in memory, detected via Reflection Repetition Rate metric, with a programmatic mitigation raising correct object mentions from 0% to 86% in frozen ALFWorld cases.

GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

GraphFlow uses a unified wGraph to dynamically instantiate workflows and manage KV caches for LLM agents, reporting 4.95 pp average gains and 4x memory reduction on five benchmarks.

State-Centric Decision Process

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

cs.LG · 2026-05-10 · unverdicted · novelty 7.0 · 3 refs

RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.

Inference-Time Budget Control for LLM Search Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.

POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

cs.SE · 2026-05-05 · unverdicted · novelty 7.0

POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.

Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA and fact-checking datasets.

AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search

cs.SE · 2026-04-12 · unverdicted · novelty 7.0

AdverMCTS frames code generation as a minimax game where an attacker evolves tests to expose flaws in solver-generated code, yielding more robust outputs than static-test baselines.

Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

CACM improves language-based drug discovery agents by 36.4% via protocol auditing, a grounded diagnostician, and compressed static/dynamic/corrective memory channels that localize failures and bias corrections.

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

cs.SE · 2025-12-20 · unverdicted · novelty 7.0

SWE-EVO shows GPT-5.4 with OpenHands reaching only 25% success on complex multi-file evolution tasks versus 72.8% on SWE-Bench Verified, and introduces Fix Rate as a partial-progress metric.

ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

cs.AI · 2025-10-16 · unverdicted · novelty 7.0

ToolPRM provides fine-grained intra-call process supervision via a new dataset and reward model, outperforming outcome and coarse-grained alternatives on function-calling benchmarks.

Stop Hand-Holding Your Coding Agent: Engineering the Loops that Replace Step-by-Step Prompting

cs.SE · 2026-06-28 · unverdicted · novelty 6.0

Introduces loop engineering as a distinct practice layer for coding agents, supplies a taxonomy and verification ladder, and analyzes a hand-coded corpus of fifty real loops.

Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

cs.PL · 2026-06-11 · unverdicted · novelty 6.0

FCGraft synthesizes code policies for embodied agents by grafting KV caches from a library of validated functions, claiming 18.31% higher success rate and 2.3x faster synthesis than prompt-level caching.

APPO: Agentic Procedural Policy Optimization

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

APPO refines branching and credit assignment in agentic RL via a Branching Score and procedure-level scaling, improving baselines by nearly 4 points on 13 benchmarks.

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

TRACE is a rollout budget allocation framework that models ReAct turns as tree nodes and uses a predictor to allocate samples to informative prefixes, yielding a 2.8-point accuracy gain on Multi-Hop QA at equal cost.

CATPO: Critique-Augmented Tree Policy Optimization

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

CATPO introduces an informativeness score F(T) and critique-guided healing for failed trees to improve efficiency and performance in tree-based RLVR, reaching 37.5% macro accuracy on math benchmarks.

Efficient Skill Grounding via Code Refactoring with Small Language Models

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.

RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing

cs.CR · 2026-06-04 · unverdicted · novelty 6.0

RedEdit finds that fewer than two photo edits on average let 76.2% of unsafe images evade detectors while retaining 93.0% of malicious semantics.

REPOT: Recoverable Program-of-Thought via Checkpoint Repair

cs.SE · 2026-05-28 · unverdicted · novelty 6.0

RePoT recovers from PoT failures via deterministic verified replay and checkpoint repair, yielding +3 to +11pp gains on planning benchmarks and showing checkpoint state as the key recovery signal over error-only feedback.
