Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model

Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li · 2023 · arXiv 2305.11176

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 3

citation-polarity summary: background 3

representative citing papers

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

cs.RO · 2025-12-19 · conditional · novelty 6.0

ImagineNav++ achieves SOTA mapless visual navigation by prompting VLMs to select imagined future views generated from a human-preference-distilled module and maintained via selective foveation memory.

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

cs.RO · 2025-08-07 · unverdicted · novelty 6.0

Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.

A Survey on Vision-Language-Action Models for Embodied AI

cs.RO · 2024-05-23 · unverdicted · novelty 6.0

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

ConfusionPrompt: Practical Private Inference for Online Large Language Models

cs.CR · 2023-12-30 · unverdicted · novelty 6.0

ConfusionPrompt enables private black-box LLM inference via prompt decomposition and pseudo-prompt mixing, claiming better privacy-utility trade-off than perturbation methods and lower memory use than open-source local models.

Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization

cs.RO · 2026-04-15 · unverdicted · novelty 5.0

EEAgent with LSTRO sets new state-of-the-art results on six VIMA-Bench robotic manipulation tasks by dynamically refining prompts through reflection on successes and failures.

MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence

cs.RO · 2025-11-03 · unverdicted · novelty 5.0

MARS introduces a four-agent MLLM system for risk-aware planning and personalized assistance in home robotics, claiming superior performance over state-of-the-art multimodal models on multiple datasets.

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

cs.RO · 2025-08-18 · unverdicted · novelty 5.0

This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

citing papers explorer

Showing 8 of 8 citing papers.

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination · cs.RO · 2025-12-19 · conditional · none · ref 50
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation · cs.RO · 2025-08-07 · unverdicted · none · ref 16
A Survey on Vision-Language-Action Models for Embodied AI · cs.RO · 2024-05-23 · unverdicted · none · ref 107
ConfusionPrompt: Practical Private Inference for Online Large Language Models · cs.CR · 2023-12-30 · unverdicted · none · ref 3
Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization · cs.RO · 2026-04-15 · unverdicted · none · ref 19
MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence · cs.RO · 2025-11-03 · unverdicted · none · ref 2
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey · cs.RO · 2025-08-18 · unverdicted · none · ref 159
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective · cs.RO · 2025-07-02 · unverdicted · none · ref 129
