arXiv preprint arXiv:2510.12422

VideoLucy: Deep Memory Backtracking for Long Video Understanding · 2025 · arXiv 2510.12422

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

M³Eval is a new cognitively-grounded benchmark that evaluates memory dimensions in multi-modal video models and reports consistent model weaknesses in disentanglement, interference, spatial-temporal grounding, and symbolic recall.

MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

MemoryCard organizes long videos into self-contained, topic-aware Memory Cards that improve long-video QA accuracy by up to 21.8% (relative) under fixed visual-token budgets.

GOPAgen: Motion-Aware and Efficient Agentic Long-Video Understanding with Structural Memory and Hierarchical Reasoning

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

GOPAgen proposes integrating video-codec groups of pictures (GOPs) with a motion agent, GOP tree reasoning, structural memory, and a motion vector database to improve efficiency and motion detail in agentic long-video VQA, reporting gains on MotionBench and EgoSchema.

HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration

cs.AI · 2026-04-23 · unverdicted · novelty 6.0

HiCrew improves long-form video question answering on EgoSchema and NExT-QA via a hybrid tree for temporal topology, question-aware captioning, and adaptive multi-agent planning, with gains in temporal and causal reasoning.

citing papers explorer

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding · cs.CV · 2026-05-11 · unverdicted · none · ref 66
M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks · cs.CV · 2026-06-03 · unverdicted · none · ref 81
MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering · cs.CV · 2026-06-04 · unverdicted · none · ref 68
GOPAgen: Motion-Aware and Efficient Agentic Long-Video Understanding with Structural Memory and Hierarchical Reasoning · cs.CV · 2026-06-03 · unverdicted · none · ref 22
