Voyager: An Open-Ended Embodied Agent with Large Language Models

Ajay Mandlekar, Chaowei Xiao, Guanzhi Wang, Yuke Zhu, Yunfan Jiang, Yuqi Xie · 2023 · cs.AI · arXiv 2305.16291

Canonical reference. 94% of citing Pith papers cite this work as background.

377 Pith papers citing it

Background 94% of classified citations

abstract

We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
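The abstract's second and third components (a skill library of executable code, and iterative prompting driven by environment feedback, execution errors, and self-verification) can be pictured with a toy sketch. Everything below is hypothetical illustration rather than the Voyager codebase: `SkillLibrary`, `refine`, and the stub functions are invented names, the GPT-4 query is replaced by a stub, and retrieval uses simple word overlap as a stand-in for whatever retrieval the authors actually use.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a learned text embedding."""
    return set(text.lower().split())


@dataclass
class SkillLibrary:
    """Hypothetical store mapping a skill description to its program text."""
    skills: dict[str, str] = field(default_factory=dict)

    def add(self, description: str, code: str) -> None:
        self.skills[description] = code

    def retrieve(self, task: str, k: int = 2) -> list[str]:
        """Return code of up to k skills whose descriptions overlap the task."""
        query = tokens(task)

        def score(description: str) -> float:
            union = query | tokens(description)
            return len(query & tokens(description)) / len(union) if union else 0.0

        ranked = sorted(self.skills, key=score, reverse=True)
        return [self.skills[d] for d in ranked[:k] if score(d) > 0]


def refine(
    task: str,
    library: SkillLibrary,
    generate: Callable[[str, list[str], str], str],
    execute: Callable[[str], tuple[bool, str]],
    verify: Callable[[str, str], bool],
    max_rounds: int = 4,
) -> Optional[str]:
    """Regenerate a program until it runs and passes verification, then store it."""
    context = library.retrieve(task)
    feedback = ""
    for _ in range(max_rounds):
        code = generate(task, context, feedback)   # stands in for a black-box LLM query
        ran_ok, feedback = execute(code)           # environment feedback or error text
        if ran_ok and verify(task, feedback):      # self-verification step
            library.add(task, code)                # the new skill becomes reusable
            return code
    return None


# --- Stubs for demonstration only (assumptions, not real Minecraft calls) ---
def stub_generate(task: str, context: list[str], feedback: str) -> str:
    return "craft('wooden_pickaxe')" if feedback else "craft('pickaxe')"


def stub_execute(code: str) -> tuple[bool, str]:
    if code == "craft('wooden_pickaxe')":
        return True, "inventory: wooden_pickaxe x1"
    return False, "error: unknown item 'pickaxe'"


def stub_verify(task: str, feedback: str) -> bool:
    return "wooden_pickaxe" in feedback


library = SkillLibrary()
library.add("craft wooden planks", "craft('oak_planks')")
result = refine("craft wooden pickaxe", library, stub_generate, stub_execute, stub_verify)
```

In this sketch the first attempt fails with an execution error, the error text is fed back into the next generation call, and the program that passes verification is written into the library so that later tasks can retrieve it.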

citation-role summary

background 60 · method 2 · dataset 1 · other 1
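As a quick consistency check, the 94% background share quoted on this page appears to follow from these role counts. The snippet below is a minimal sketch under that assumption; the page does not state how the percentage is computed, and the counts cover 64 classified citations rather than all 377 citing papers.

```python
# Role counts copied from the citation-role summary above.
role_counts = {"background": 60, "method": 2, "dataset": 1, "other": 1}

total_classified = sum(role_counts.values())            # 64
background_share = role_counts["background"] / total_classified

print(f"{background_share:.2%}")  # 93.75%
print(f"{background_share:.0%}")  # 94%
```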

citation-polarity summary

background 60 · support 1 · unclear 1 · use dataset 1 · use method 1

representative citing papers

Continual Harness: Online Adaptation for Self-Improving Foundation Agents

cs.LG · 2026-05-11 · conditional · novelty 8.0

Continual Harness automates online self-improvement for foundation-model embodied agents by refining prompts, sub-agents, skills, and memory within one run, cutting button-press costs on Pokemon Red and Emerald and closing much of the gap to expert harnesses.

Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models

cs.CV · 2026-05-09 · unverdicted · novelty 8.0

Flame3D enables zero-shot compositional 3D scene reasoning by representing scenes as editable visual-textual memories exposed to agentic MLLMs through composable and synthesizable spatial tools.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

The Khipu Problem: Institutional Legibility Under Distributed Cognition

cs.CY · 2026-05-06 · unverdicted · novelty 8.0

The khipu problem frames a governance failure in distributed AI where interpretive continuity is lost even when traces remain, requiring infrastructure to preserve reading practices rather than only data retention.

SEVerA: Verified Synthesis of Self-Evolving Agents

cs.LG · 2026-03-26 · unverdicted · novelty 8.0

SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

cs.CR · 2025-07-14 · unverdicted · novelty 8.0

ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

Introduces a bounded-memory testbed for LLM agents in Slay the Spire 2 in which typed retrieval replaces accumulating context, with released trajectories showing that the skill layer raises wins from 3/10 to 6/10.

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

cs.DB · 2026-07-01 · unverdicted · novelty 7.0

SOLAR is a learning-augmented policy for semantic cache replacement that achieves a constant competitive ratio of 3 and gains of 5-75% over FIFO on retrieval workloads.

Generative Skill Composition for LLM Agents

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

SkillComposer performs task-conditioned skill sequence prediction with a constrained autoregressive decoder to jointly output skill subset, count, and order, raising pass rates by 23.1 and 18.2 percentage points on two production coding agents over no-skill baselines.

AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution

cs.AI · 2026-06-28 · reject · novelty 7.0

Under Sealed Joint Search conditions, the multi-agent LLM system Agora produces a holdout Sharpe ratio of +1.87 on CSI 1000 over a 91-day sealed period, exceeding the best baseline (+1.334) under a favorable seed.

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

cs.AI · 2026-06-27 · unverdicted · novelty 7.0

LLM agents often fail to abstain at the right time in uncertain multi-turn tasks, and the CONVOLVE context engineering method raises timely abstention rates on WebShop from 26.7 to 57.4 without parameter updates.

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

cs.AI · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

GILP trains a parameterized backbone for valid actions and state predictions, then uses a consistency gate with LLM drafts to reduce hallucinated-state rate from 0.176 to 0.035 on GPT-4o-mini while raising success from 0.668 to 0.838.

Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making

cs.CL · 2026-06-24 · unverdicted · novelty 7.0

AAWM builds training targets for world models by retrieving and synthesizing transition evidence based on the policy's self-identified decision needs at each state.

A Stackelberg Framework for Resource-Aware LLM Agents: Learning, Repair, and Conditional Guarantees

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

A Stackelberg game framework for LLM resource governance is introduced with learned response models, policy repair via real-API calibration, conditional theoretical guarantees, and an experiment showing 17.4% token cost reduction.

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

ENPIRE supplies four modules (Environment, Policy Improvement, Rollout, Evolution) that turn real-world robot training into an autonomous optimization loop driven by coding agents.

Toward Temporal Realism in City-Scale Crisis Response Simulation using LLM Agents

cs.SI · 2026-06-18 · unverdicted · novelty 7.0

A hybrid simulator combining LLM decision-making with an explicit self-excitation model reproduces bursty temporal patterns in city-scale volunteering data, unlike pure LLM agents.

SIGMA: Skill-Incidence Graphs for Compositional Multi-Agent Design

cs.MA · 2026-06-18 · unverdicted · novelty 7.0

SIGMA introduces skill-incidence graphs to compose agents from reusable skills, yielding higher average performance and robustness than topology-only baselines on reasoning and coding benchmarks.

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

PreAct compiles successful agent executions into verifiable state-machine programs for 8.5-13x faster replay on repeated tasks, with an independent evaluator check before storing each program.

daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

cs.LG · 2026-06-15 · unverdicted · novelty 7.0

daVinci-kernel is a multi-agent RL system that co-evolves skill selection, policy generation, and summarization via shared LLM and REINFORCE to optimize GPU kernels, reporting higher KernelBench scores than prior RL models.

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

SkeMex distills agent trajectories into value-aware skills organized in general/task/action branches and evolves them via a closed-loop Read-Write-Assess-Govern process, outperforming prior memory agents on clinical tasks.

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

The framework estimates the context-dependent marginal utility of candidate skills from reward gaps between matched base and skill-augmented rollouts, then uses those estimates to filter skills and to co-train the policy as a generator.

PhysAgent: Automating Physics-Based 4D Synthesis via Trajectory-Grounded Multi-Agent Feedback

cs.RO · 2026-06-07 · unverdicted · novelty 7.0

PhysAgent is a simulator-in-the-loop multi-agent system that automates physically grounded 4D synthesis from multimodal prompts by using trajectory feedback from vision models and LLM reasoning to optimize force fields.

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

cs.AI · 2026-06-06 · unverdicted · novelty 7.0

PACE is a training-free anytime-valid commit gate using testing-by-betting e-processes that controls per-candidate false-commit probability for self-evolving agents and reduces spurious edits compared to greedy acceptance.

Rosetta Memory: Adaptive Memory for Cross-LLM Agents

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

Rosetta Memory trains two profile-conditioned operators with a minimum-gain sampling curriculum and performance-gap reward to enable memory transfer between LLMs, showing gains on multi-hop QA benchmarks and robustness to unseen models.

citing papers explorer

Showing 18 of 18 citing papers after filters.

PhysAgent: Automating Physics-Based 4D Synthesis via Trajectory-Grounded Multi-Agent Feedback

cs.RO · 2026-06-07 · unverdicted · none · ref 48 · internal anchor

PhysAgent is a simulator-in-the-loop multi-agent system that automates physically grounded 4D synthesis from multimodal prompts by using trajectory feedback from vision models and LLM reasoning to optimize force fields.

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

cs.RO · 2026-06-03 · unverdicted · none · ref 6 · internal anchor

VASO is a verification-guided self-evolution framework for LLM robot skill contracts that reaches 97.2% formal-specification compliance on Jackal and quadcopter tasks using under 100 samples.

eMEM: A Hybrid Spatio-Temporal Memory System For Embodied Agents

cs.RO · 2026-06-02 · unverdicted · none · ref 5 · internal anchor

eMEM is a multi-index memory architecture with tiered consolidation and ten recall tools for embodied agents, scoring 80.8 weighted mean on eMEM-Bench covering eight cognitive psychology paradigms and outperforming a flat RAG baseline on context and lure rejection tasks.

MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents

cs.RO · 2026-05-08 · unverdicted · none · ref 37 · 2 links · internal anchor

MemCompiler reframes memory use as state-conditioned compilation, delivering relevant guidance via text and latent channels to improve embodied agent performance up to 129% and cut latency 60% versus static injection.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · none · ref 11 · 2 links · internal anchor

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

cs.RO · 2026-02-09 · unverdicted · none · ref 112 · internal anchor

ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

Sequential Planning via Anchored Robotic Keypoints

cs.RO · 2026-06-29 · unverdicted · none · ref 41 · internal anchor

SPARK reaches 43.7% success on six LIBERO-PRO cells by LLM-generated typed behavior trees plus multi-prompt perception and recovery, more than doubling CaP-Agent0 and VLA baselines.

Automating the Design of Embodied Agent Architectures

cs.RO · 2026-06-29 · unverdicted · none · ref 24 · internal anchor

Automated architecture search for embodied agents produces directional success-rate gains on vision-language and manipulation tasks while exposing limits from simulation noise and incomplete credit assignment.

DrivingAgent: Design and Scheduling Agents for Autonomous Driving Systems

cs.RO · 2026-06-10 · unverdicted · none · ref 27 · internal anchor

DrivingAgent automates autonomous driving module design via LLM code generation and super-network validation, and employs an RL-trained LLM with structured memory for dynamic real-time scheduling.

CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations

cs.RO · 2026-05-08 · unverdicted · none · ref 5 · internal anchor

CSR with ASR enables infinite-horizon real-time LLM policies via stable KV-cache properties and background eviction, delivering 26x lower latency and SOTA recall on embodied benchmarks.

Long-Term Memory for VLA-based Agents in Open-World Task Execution

cs.RO · 2026-04-17 · unverdicted · none · ref 10 · internal anchor

ChemBot adds dual-layer memory and future-state asynchronous inference to VLA models, enabling better long-horizon success in chemical lab automation on collaborative robots.

HoloAgent-0: A Unified Embodied Agent Framework with 3D Spatial Memory

cs.RO · 2026-06-22 · unverdicted · none · ref 5 · internal anchor

HoloAgent-0 is a unified embodied agent framework with Embodied AgentOS, 3D spatial memory, and embodied skills, deployed and evaluated on real robot hardware for navigation and manipulation tasks.

MagicSim: A Unified Infrastructure for Executable Embodied Interaction

cs.RO · 2026-06-16 · unverdicted · none · ref 122 · internal anchor

MagicSim is a unified embodied interaction infrastructure built on a deterministic batched runtime and shared MDP that supports diverse world construction, execution, task evaluation, automatic rollout generation, and interactive agent interfaces.

Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

cs.RO · 2026-05-18 · unverdicted · none · ref 31 · internal anchor

Robo-Cortex proposes a self-evolving embodied navigation agent using dual-grain cognitive memory and autonomous knowledge induction from trajectories, reporting SPL gains on IGNav, AR, AEQA and preliminary real-robot tests.

SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing

cs.RO · 2026-04-15 · unverdicted · none · ref 31 · internal anchor

SpaceMind is a self-evolving modular VLM agent framework that achieves 90-100% navigation success in nominal conditions and recovers from failures via experience distillation, with zero-code transfer to physical robots for on-orbit tasks.

EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

cs.RO · 2026-04-15 · unverdicted · none · ref 35 · internal anchor

EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

SR-Platform: An Agentic Pipeline for Natural Language-Driven Robot Simulation Environment Synthesis

cs.RO · 2026-05-14 · unverdicted · none · ref 6 · internal anchor

SR-Platform is a production-deployed nine-service Docker system that synthesizes physically valid MuJoCo environments from natural language using LLM orchestration, CadQuery asset forging, constraint-aware layout, and MJCF assembly, with reported median latency of ~50 s for five-object scenes.

Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution

cs.RO · 2026-04-09 · unreviewed · ref 20 · 2 links · internal anchor
