Egocentric Scene Graphs convert long videos into short structured text so MLLMs can answer questions about entire sequences, achieving SOTA on HD-EPIC VQA.
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
ESC uses emotional cues triggered by an external verifier to enable training-free self-correction in VLMs, improving reliability on safety, hallucination, and reasoning benchmarks.
citing papers explorer
- Graph it first! Enabling Reasoning on Long-form Egocentric Videos through Scene Graphs
Egocentric Scene Graphs convert long videos into short structured text so MLLMs can answer questions about entire sequences, achieving SOTA on HD-EPIC VQA.
- ESC: Emotional Self-Correction for Reliable Vision-Language Models
ESC uses emotional cues triggered by an external verifier to enable training-free self-correction in VLMs, improving reliability on safety, hallucination, and reasoning benchmarks.