WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.
A survey of embodied ai: From simulators to research tasks
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5roles
background 3polarities
background 3representative citing papers
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
DailyArt recovers full joint parameters of articulated objects from a single static image by synthesizing an opened state and comparing discrepancies, supporting downstream part-level novel state synthesis.
This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.
The paper summarizes expert panel consensus that long-term success in embodied AI depends equally on safe, trustworthy deployment and lifecycle governance as on AI capability advances.
citing papers explorer
-
WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.
-
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
-
DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics
DailyArt recovers full joint parameters of articulated objects from a single static image by synthesizing an opened state and comparing discrepancies, supporting downstream part-level novel state synthesis.
-
Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models
This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.
-
Embodied AI in Action: Insights from SAE World Congress 2026 on Safety, Trust, Robotics, and Real-World Deployment
The paper summarizes expert panel consensus that long-term success in embodied AI depends equally on safe, trustworthy deployment and lifecycle governance as on AI capability advances.