Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Two Pith papers cite this work.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding
  EgoMemReason is a new benchmark showing that even the best multimodal models tested achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence spread across multiple days of egocentric video.
- BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation
  BalanceRAG applies sequential graphical testing over a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded retrieval-augmented generation (RAG) while increasing coverage.