Sketch, ground, and refine: Top-down dense video captioning
1 Pith paper cites this work. Polarity classification is still indexing.
citation-role summary
method 1
citation-polarity summary
fields: cs.RO 1
years: 2026 1
verdicts: UNVERDICTED 1
roles: method 1
polarities: use method 1
representative citing papers
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.