Framemind: Frame-interleaved video reasoning via reinforcement learning

Haonan Ge, Yiwei Wang, Kai-Wei Chang, Hang Wu, Yujun Cai · 2025 · arXiv 2509.24008

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1 baseline 1 other 1

citation-polarity summary

background 1 baseline 1 unclear 1

representative citing papers

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

ReTool-Video uses a 134-tool meta-augmented library and recursive grounding to translate abstract video intents into fine-grained multimodal operations, outperforming baselines on MVBench, MLVU, and Video-MME.

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

SAVEMem improves streaming video understanding scores by adding semantic awareness to memory compression and query-adaptive retrieval without any model training.

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning

cs.CV · 2026-01-30 · unverdicted · novelty 7.0

CamReasoner uses structured O-T-A reasoning and RL on 56k samples to lift camera movement classification from 73.8% to 78.4% and VQA from 60.9% to 74.5% on Qwen2.5-VL-7B.

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

VideoStir introduces a spatio-temporal graph-based structure and intent-aware retrieval for long-video RAG, achieving competitive performance with SOTA methods via a new IR-600K dataset.

AdaTooler-V: Adaptive Tool-Use for Images and Videos

cs.CV · 2025-12-18 · conditional · novelty 6.0

AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.

citing papers explorer

Showing 5 of 5 citing papers.

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding cs.CV · 2026-05-13 · unverdicted · none · ref 9
ReTool-Video uses a 134-tool meta-augmented library and recursive grounding to translate abstract video intents into fine-grained multimodal operations, outperforming baselines on MVBench, MLVU, and Video-MME.
Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding cs.CV · 2026-05-08 · unverdicted · none · ref 9
SAVEMem improves streaming video understanding scores by adding semantic awareness to memory compression and query-adaptive retrieval without any model training.
CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning cs.CV · 2026-01-30 · unverdicted · none · ref 18
CamReasoner uses structured O-T-A reasoning and RL on 56k samples to lift camera movement classification from 73.8% to 78.4% and VQA from 60.9% to 74.5% on Qwen2.5-VL-7B.
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG cs.CV · 2026-04-07 · unverdicted · none · ref 46
VideoStir introduces a spatio-temporal graph-based structure and intent-aware retrieval for long-video RAG, achieving competitive performance with SOTA methods via a new IR-600K dataset.
AdaTooler-V: Adaptive Tool-Use for Images and Videos cs.CV · 2025-12-18 · conditional · none · ref 15
AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.

Framemind: Frame-interleaved video reasoning via reinforcement learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer