Open- mmreasoner: Pushing the frontiers for multimodal rea- soning with an open and general recipe.arXiv preprint arXiv:2511.16334, 2025

Kaichen Zhang, Keming Wu, Zuhao Yang, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing · 2025 · arXiv 2511.16334

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

dataset 2 background 1

citation-polarity summary

use dataset 2 background 1

representative citing papers

SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

SVAgent improves long video question answering by constructing storylines via multi-agent collaboration and aligning cross-modal predictions for more robust, human-like reasoning.

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

cs.CV · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

ParaVT introduces the first multi-agent RL framework for parallel video tool calling in LMMs, using PARA-GRPO to resolve the Tool Prior Paradox and achieve +7.9% average improvement over Qwen3-VL baseline across six benchmarks.

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

cs.CV · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

cs.CV · 2026-01-20 · conditional · novelty 6.0

ChartVerse uses Rollout Posterior Entropy and truth-anchored inverse QA synthesis to produce 640K high-quality chart reasoning samples, training an 8B model that surpasses its 30B teacher.

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

LongVT adds native video-cropping tool calling to LMMs for interleaved multimodal chain-of-tool-thought reasoning on long videos and releases VideoSIAH data for training and evaluation.

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

HDPO reframes tool efficiency as a conditional objective within accurate trajectories, enabling Metis to reduce tool invocations by orders of magnitude while raising reasoning accuracy.

citing papers explorer

Showing 7 of 7 citing papers.

SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration cs.CV · 2026-04-06 · unverdicted · none · ref 54
SVAgent improves long video question answering by constructing storylines via multi-agent collaboration and aligning cross-modal predictions for more robust, human-like reasoning.
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning cs.CV · 2026-05-19 · unverdicted · none · ref 48 · 2 links
ParaVT introduces the first multi-agent RL framework for parallel video tool calling in LMMs, using PARA-GRPO to resolve the Tool Prior Paradox and achieve +7.9% average improvement over Qwen3-VL baseline across six benchmarks.
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs cs.CV · 2026-05-01 · unverdicted · none · ref 94 · 2 links
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch cs.CV · 2026-01-20 · conditional · none · ref 46
ChartVerse uses Rollout Posterior Entropy and truth-anchored inverse QA synthesis to produce 640K high-quality chart reasoning samples, training an 8B model that surpasses its 30B teacher.
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling cs.CV · 2025-11-25 · unverdicted · none · ref 61
LongVT adds native video-cropping tool calling to LMMs for interleaved multimodal chain-of-tool-thought reasoning on long videos and releases VideoSIAH data for training and evaluation.
OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models cs.CL · 2026-05-12 · unverdicted · none · ref 30
OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models cs.CV · 2026-04-09 · unverdicted · none · ref 49
HDPO reframes tool efficiency as a conditional objective within accurate trajectories, enabling Metis to reduce tool invocations by orders of magnitude while raising reasoning accuracy.

Open- mmreasoner: Pushing the frontiers for multimodal rea- soning with an open and general recipe.arXiv preprint arXiv:2511.16334, 2025

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer