VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
arXiv preprint arXiv:2303.12153 , year=
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2representative citing papers
LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.
InfoAtlas is a pretrained neural model for zero-shot mutual information estimation that matches state-of-the-art accuracy with 100x speedup and handles varying dimensions via a single model.
Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.
DEPS combines LLM-based interactive planning with a trainable goal selector to create a zero-shot multi-task agent that completes 70+ Minecraft tasks and nearly doubles prior performance.
LLaMA-Adapter V2 achieves open-ended visual instruction following in LLMs by unlocking more parameters, early fusion of visual tokens, and joint training on disjoint parameter groups with only 14M added parameters.
citing papers explorer
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
-
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.
-
InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate
InfoAtlas is a pretrained neural model for zero-shot mutual information estimation that matches state-of-the-art accuracy with 100x speedup and handles varying dimensions via a single model.
-
Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems
Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
-
Reasoning with Language Model is Planning with World Model
RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.
-
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
DEPS combines LLM-based interactive planning with a trainable goal selector to create a zero-shot multi-task agent that completes 70+ Minecraft tasks and nearly doubles prior performance.
-
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
LLaMA-Adapter V2 achieves open-ended visual instruction following in LLMs by unlocking more parameters, early fusion of visual tokens, and joint training on disjoint parameter groups with only 14M added parameters.