arXiv preprint arXiv:2410.00371
12 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
  DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.
- Health-Conditioned Vision-Language-Action Models for Malfunction-Aware Robot Control
  The paper introduces health-conditioned VLA models that incorporate a health vector via a new projector module and train on 128 malfunction episodes in the LIBERO simulator to complete tasks despite degraded joints.
- From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation
  AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.
- A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring
  A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.
- Clutter-Robust Vision-Language-Action Models through Object-Centric and Geometry Grounding
  OBEYED-VLA improves VLA robustness in cluttered real-world manipulation by disentangling perception into VLM-based object-centric grounding and geometry-aware stages, then fine-tuning the policy only on single-object demonstrations.
- RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
  A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.
- Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models
  AFIL trains dual action generators on success and failure rollouts from a pretrained VLA to steer diffusion policies away from failure modes during inference.
- Environmental Understanding Vision-Language Model for Embodied Agent
  EUEA fine-tunes VLMs on object perception, task planning, action understanding, and goal recognition, with recovery and GRPO, to raise ALFRED success rates by 11.89% over behavior cloning.
- Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models
  A hierarchical framework pairs in-context VLMs for high-level plan synthesis with RL-trained low-level skills and failure recovery to reach 92% success on long-horizon DLO routing across varied scenes and language inputs.
- VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
  The VLBiMan framework enables generalizable bimanual manipulation from single human demonstrations via vision-language anchored task decomposition and adaptation without retraining.
- ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
  ThinkAct introduces reinforced visual latent planning in a dual VLA system to enable better long-horizon reasoning and adaptation for embodied tasks.
- Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery