Egocentric Scene Graphs convert long videos into short structured text so MLLMs can answer questions about entire sequences, achieving SOTA on HD-EPIC VQA.
arXiv preprint arXiv:2502.16427 (2025)
3 Pith papers cite this work. Polarity classification is still indexing.
Citing-paper facets: fields cs.CV (3); years 2026 (3); verdicts UNVERDICTED (3); roles background (1); polarities background (1).
citing papers explorer
- Graph it first! Enabling Reasoning on Long-form Egocentric Videos through Scene Graphs
  Egocentric Scene Graphs convert long videos into short structured text so MLLMs can answer questions about entire sequences, achieving SOTA on HD-EPIC VQA.
- Learning to Evolve Scenes: Reasoning about Human Activities with Scene Graphs
  SG-Ego dataset and GLEN model enable structured reasoning over spatio-temporal scene graphs for ego-centric activity understanding, introducing the A-GEF forecasting task.
- Watch, Remember, Reason: Human-View Video Understanding with MLLMs
  This is a survey that frames video MLLM research via a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.