I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models
5 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (5)
Verdicts: unverdicted (5)
citing papers explorer
- IntentVLM: Open-Vocabulary Intention Recognition through Forward-Inverse Modeling with Video-Language Models
  IntentVLM uses forward-inverse modeling in a two-stage video-language setup to reach up to 80% accuracy on open-vocabulary intention recognition benchmarks, beating baselines by 30% and matching human performance.
- Foresight: Failure Detection for Long-Horizon Robotic Manipulation with Action-Conditioned World Model Latents
  Foresight detects failures in long-horizon robotic manipulation using latents from action-conditioned world models trained only on task-level labels and calibrated via functional conformal prediction.
- A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring
  A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.
- FAR: Failure-Aware Retry for Test-Time Recovery and Continual Policy Improvement
  FAR combines failure-contrastive preference adaptation with action perturbations for test-time recovery and continual policy improvement, reporting 17.6% and 11.7% success gains over diffusion policies in simulation and real-world manipulation tasks.
- Fail-RAG: A Retrieval Augmented Generation Informed Framework for Robot Failure Identification
  Fail-RAG is a retrieval-augmented generation framework that detects and describes robot failures in warehouse tasks by querying an embedded failure database and applying VLMs, showing 25 percentage point higher accuracy than off-the-shelf VLMs.