arXiv preprint arXiv:2502.16427 (2025)

Chu, S · 2025 · arXiv 2502.16427

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Graph it first! Enabling Reasoning on Long-form Egocentric Videos through Scene Graphs

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

Egocentric Scene Graphs convert long videos into short structured text so MLLMs can answer questions about entire sequences, achieving SOTA on HD-EPIC VQA.

Learning to Evolve Scenes: Reasoning about Human Activities with Scene Graphs

cs.CV · 2026-07-02 · unverdicted · novelty 5.0

SG-Ego dataset and GLEN model enable structured reasoning over spatio-temporal scene graphs for ego-centric activity understanding, introducing the A-GEF forecasting task.

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · novelty 4.0

This is a survey that frames video MLLM research via a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.

citing papers explorer

Graph it first! Enabling Reasoning on Long-form Egocentric Videos through Scene Graphs

cs.CV · 2026-06-24 · unverdicted · none · ref 8

Learning to Evolve Scenes: Reasoning about Human Activities with Scene Graphs

cs.CV · 2026-07-02 · unverdicted · none · ref 40

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · none · ref 91
