Egocentric Scene Graphs convert long videos into short structured text so that multimodal large language models (MLLMs) can answer questions about entire sequences, achieving state-of-the-art results on HD-EPIC VQA.
arXiv preprint arXiv:2504.04550 (2025)
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Introduces V-RAGBench benchmark and CARVE method that selects per-chunk retrieval configurations via parallel retrievers and adaptive reranking, outperforming eight VideoRAG baselines.
Fine-tuned VLMs guided by eye gaze and ego motion achieve 14.5% accuracy improvement over a transformer baseline for egocentric pedestrian intent decoding.
citing papers explorer
- Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
Introduces V-RAGBench benchmark and CARVE method that selects per-chunk retrieval configurations via parallel retrievers and adaptive reranking, outperforming eight VideoRAG baselines.