VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Presents the NATURAL INSTRUCTIONS meta-dataset and shows generative pre-trained language models achieve 19% better generalization to unseen tasks when using task instructions.
LaMI augments LLMs with visual commonsense via late fusion of predictions from multiple text-generated images, outperforming prior augmented LLMs on visual tasks while matching VLMs and preserving or improving NLP performance.
On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.
citing papers explorer
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
-
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Presents the NATURAL INSTRUCTIONS meta-dataset and shows generative pre-trained language models achieve 19% better generalization to unseen tasks when using task instructions.
-
LaMI: Augmenting Large Language Models via Late Multi-Image Fusion
LaMI augments LLMs with visual commonsense via late fusion of predictions from multiple text-generated images, outperforming prior augmented LLMs on visual tasks while matching VLMs and preserving or improving NLP performance.
-
Solving math word problems with process- and outcome-based feedback
On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.