IntentVLM uses forward-inverse modeling in a two-stage video-language setup to reach up to 80% accuracy on open-vocabulary intention recognition benchmarks, beating baselines by 30% and matching human performance.
I-failsense: Towards general robotic failure detection with vision-language models
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.
citing papers explorer
-
IntentVLM: Open-Vocabulary Intention Recognition through Forward-Inverse Modeling with Video-Language Models
IntentVLM uses forward-inverse modeling in a two-stage video-language setup to reach up to 80% accuracy on open-vocabulary intention recognition benchmarks, beating baselines by 30% and matching human performance.
-
A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring
A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.