Framethinker: Learning to think with long videos via multi-turn frame spotlighting
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 4
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
citing papers explorer
- VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
  VTAgent uses a question-guided agent to anchor keyframes for evidence-aware Video TextVQA, delivering up to +12 accuracy and new SOTA results via training-free operation plus SFT and RL.
- VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority
  Decoupling planning from answer authority in long-video agents reduces evidence misalignment and raises accuracy to 55.1% on LVBench and 62.0% on LongVideoBench.
- Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
  PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
- Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
  Skyra is an MLLM that detects AI-generated videos by identifying and reasoning over grounded visual artifacts, supported by a new annotated dataset and benchmark.