super hub Canonical reference

Voyager: An Open-Ended Embodied Agent with Large Language Models

Ajay Mandlekar, Chaowei Xiao, Guanzhi Wang, Yuke Zhu, Yunfan Jiang, Yuqi Xie · 2023 · cs.AI · arXiv 2305.16291

Canonical reference. 94% of citing Pith papers cite this work as background.

357 Pith papers citing it

Background 94% of classified citations

open full Pith review browse 357 citing papers more from Ajay Mandlekar arXiv PDF

abstract

We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 59 method 2 dataset 1 other 1

citation-polarity summary

background 59 support 1 unclear 1 use dataset 1 use method 1

claims ledger

abstract We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox querie

authors

Ajay Mandlekar Chaowei Xiao Guanzhi Wang Yuke Zhu Yunfan Jiang Yuqi Xie

co-cited works

representative citing papers

Continual Harness: Online Adaptation for Self-Improving Foundation Agents

cs.LG · 2026-05-11 · conditional · novelty 8.0

Continual Harness automates online self-improvement for foundation-model embodied agents by refining prompts, sub-agents, skills, and memory within one run, cutting button-press costs on Pokemon Red and Emerald and closing much of the gap to expert harnesses.

Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models

cs.CV · 2026-05-09 · unverdicted · novelty 8.0

Flame3D enables zero-shot compositional 3D scene reasoning by representing scenes as editable visual-textual memories exposed to agentic MLLMs through composable and synthesizable spatial tools.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

The Khipu Problem: Institutional Legibility Under Distributed Cognition

cs.CY · 2026-05-06 · unverdicted · novelty 8.0

The khipu problem frames a governance failure in distributed AI where interpretive continuity is lost even when traces remain, requiring infrastructure to preserve reading practices rather than only data retention.

SEVerA: Verified Synthesis of Self-Evolving Agents

cs.LG · 2026-03-26 · unverdicted · novelty 8.0

SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

cs.CR · 2025-07-14 · unverdicted · novelty 8.0

ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

Introduces bounded-memory testbed for LLM agents in Slay the Spire 2 where typed retrieval replaces accumulating context, with released trajectories showing skill layer raises wins from 3/10 to 6/10.

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

cs.DB · 2026-07-01 · unverdicted · novelty 7.0

SOLAR is a learning-augmented policy for semantic cache replacement that achieves constant competitive ratio 3 and 5-75% gains over FIFO on retrieval workloads.

Generative Skill Composition for LLM Agents

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

SkillComposer performs task-conditioned skill sequence prediction with a constrained autoregressive decoder to jointly output skill subset, count, and order, raising pass rates by 23.1 and 18.2 percentage points on two production coding agents over no-skill baselines.

AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution

cs.AI · 2026-06-28 · reject · novelty 7.0

Multi-agent LLM system Agora under Sealed Joint Search conditions produces +1.87 holdout Sharpe on CSI 1000 over a 91-day sealed period, exceeding the best baseline at +1.334 under favorable seed.

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

cs.AI · 2026-06-27 · unverdicted · novelty 7.0

LLM agents often fail to abstain at the right time in uncertain multi-turn tasks, and the CONVOLVE context engineering method raises timely abstention rates on WebShop from 26.7 to 57.4 without parameter updates.

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

cs.AI · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

GILP trains a parameterized backbone for valid actions and state predictions, then uses a consistency gate with LLM drafts to reduce hallucinated-state rate from 0.176 to 0.035 on GPT-4o-mini while raising success from 0.668 to 0.838.

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

PreAct compiles successful agent executions into verifiable state-machine programs for 8.5-13x faster replay on repeated tasks, with an independent evaluator check before storing each program.

daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

cs.LG · 2026-06-15 · unverdicted · novelty 7.0

daVinci-kernel is a multi-agent RL system that co-evolves skill selection, policy generation, and summarization via shared LLM and REINFORCE to optimize GPU kernels, reporting higher KernelBench scores than prior RL models.

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

SkeMex distills agent trajectories into value-aware skills organized in general/task/action branches and evolves them via a closed-loop Read-Write-Assess-Govern process, outperforming prior memory agents on clinical tasks.

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.

PhysAgent: Automating Physics-Based 4D Synthesis via Trajectory-Grounded Multi-Agent Feedback

cs.RO · 2026-06-07 · unverdicted · novelty 7.0

PhysAgent is a simulator-in-the-loop multi-agent system that automates physically grounded 4D synthesis from multimodal prompts by using trajectory feedback from vision models and LLM reasoning to optimize force fields.

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

cs.AI · 2026-06-06 · unverdicted · novelty 7.0

PACE is a training-free anytime-valid commit gate using testing-by-betting e-processes that controls per-candidate false-commit probability for self-evolving agents and reduces spurious edits compared to greedy acceptance.

Rosetta Memory: Adaptive Memory for Cross-LLM Agents

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

Rosetta Memory trains two profile-conditioned operators with a minimum-gain sampling curriculum and performance-gap reward to enable memory transfer between LLMs, showing gains on multi-hop QA benchmarks and robustness to unseen models.

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

LatentSkill uses a hypernetwork to generate LoRA adapters from textual skills, enabling weight-space storage that cuts prefill tokens and boosts agent success rates on ALFWorld and Search-QA.

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

cs.RO · 2026-06-03 · unverdicted · novelty 7.0

VASO is a verification-guided self-evolution framework for LLM robot skill contracts that reaches 97.2% formal-specification compliance on Jackal and quadcopter tasks using under 100 samples.

Agent Planning Benchmark: A Diagnostic Framework for Planning Capabilities in LLM Agents

cs.CL · 2026-06-03 · conditional · novelty 7.0

Introduces APB benchmark with 4209 cases across 22 domains to diagnose planning in 12 MLLMs and shows it improves downstream execution when used for refinement.

AIP: A Graph Representation for Learning and Governing Agent Skills

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

AIP models skills as graphs of discrete steps connected by typed I/O edges under a validated schema, raising agent mean reward from 0.60 to 0.71 and pass rate from 53% to 67% on 27 SkillsBench tasks while enabling node-level fixes.

PersonaTree: Structured Lifecycle Memory for Person Understanding in LLM Agents

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

PersonaTree is a new hierarchical memory framework for persistent LLM agents that structures evidence into persona claims via support paths and outperforms baselines on six person-understanding benchmarks.

citing papers explorer

Showing 7 of 7 citing papers after filters.

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models cs.AI · 2023-10-06 · unverdicted · none · ref 9 · internal anchor
LATS integrates Monte Carlo Tree Search with language models using in-context learning, value functions, and self-reflection to achieve 92.7% pass@1 on HumanEval and competitive web navigation performance.
Mastering Diverse Domains through World Models cs.AI · 2023-01-10 · unverdicted · none · ref 54 · internal anchor
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations cs.AI · 2023-12-14 · conditional · none · ref 83 · internal anchor
Math-Shepherd is an automatically trained process reward model that scores solution steps to verify and reinforce LLMs, lifting Mistral-7B from 77.9% to 89.1% on GSM8K and 28.6% to 43.5% on MATH.
SGLang: Efficient Execution of Structured Language Model Programs cs.AI · 2023-12-12 · conditional · none · ref 52 · internal anchor
SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.
Cognitive Architectures for Language Agents cs.AI · 2023-09-05 · accept · none · ref 81 · internal anchor
CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.
A Survey on Large Language Model based Autonomous Agents cs.AI · 2023-08-22 · accept · none · ref 38 · internal anchor
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
The Rise and Potential of Large Language Model Based Agents: A Survey cs.AI · 2023-09-14 · accept · none · ref 191 · internal anchor
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

Voyager: An Open-Ended Embodied Agent with Large Language Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer