arXiv preprint arXiv:2512.04540

Hongbo Jin, Qingyuan Wang, Wenhao Zhang, Yang Liu, Sijie Cheng · 2025 · arXiv 2512.04540

4 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

LMM-Track4D formulates a trajectory-grounded dialogue task, releases Track4D-Bench with 526 samples, and proposes RTGE encoding, TRK state token, and OSK-RA decoder to elicit better 4D spatiotemporal reasoning in LMMs.

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

VideoStir introduces a spatio-temporal graph-based structure and intent-aware retrieval for long-video RAG, achieving competitive performance with SOTA methods via a new IR-600K dataset.

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

cs.AI · 2026-03-01 · unverdicted · novelty 6.0

HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.

citing papers explorer


EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding
cs.CV · 2026-05-11 · unverdicted · none · ref 74

LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue
cs.CV · 2026-05-19 · unverdicted · none · ref 20

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
cs.CV · 2026-04-07 · unverdicted · none · ref 36

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
cs.AI · 2026-03-01 · unverdicted · none · ref 22
