Mixed citations

Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning

SenseNova-MARS: Empowering Multimodal Agentic Reasoning, Search via Reinforcement Learning , author= · 2025 · arXiv 2512.24330

Mixed citation behavior. Most common role is background (60%).

10 Pith papers citing it

Background 60% of classified citations

read on arXiv browse 10 citing papers

citation-role summary

background 3 baseline 1 dataset 1

citation-polarity summary

background 3 baseline 1 use dataset 1

representative citing papers

GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolution remote sensing benchmarks.

ProMSA:Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

ProMSA is a progressive multimodal search agent for KB-VQA that iteratively selects search tools under budgets, trained via rejection-sampling SFT then TN-GSPO RL, reporting gains on E-VQA and InfoSeek over RAG baselines.

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

cs.AI · 2026-06-11 · unverdicted · novelty 6.0

IterCAD is a multimodal agent framework using progressive SFT and geometry-aware RL for CAD tasks, with a new data pipeline, IterCAD-Bench, and CD-TR metric showing outperformance in executability and precision.

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.

InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

InterSketch improves long-horizon visual-textual chain-of-thought in VLMs by dynamically generating and interleaving self-correcting visual sketches with text, using a synthesized dataset plus reflection in cold-start followed by stepwise-reward RL, and reports outperforming Gemini-3-Pro on benchmar

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Vision-OPD transfers an MLLM's privileged regional perception to its full-image policy through on-policy token-level self-distillation, yielding competitive results on fine-grained visual benchmarks.

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · novelty 6.0 · 2 refs

MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

HDPO reframes tool efficiency as a conditional objective within accurate trajectories, enabling Metis to reduce tool invocations by orders of magnitude while raising reasoning accuracy.

SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search

cs.CV · 2026-06-30 · unverdicted · novelty 4.0

SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.

citing papers explorer

Showing 10 of 10 citing papers.

GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding cs.CV · 2026-05-14 · unverdicted · none · ref 22
GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolution remote sensing benchmarks.
ProMSA:Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering cs.CV · 2026-06-26 · unverdicted · none · ref 8
ProMSA is a progressive multimodal search agent for KB-VQA that iteratively selects search tools under budgets, trained via rejection-sampling SFT then TN-GSPO RL, reporting gains on E-VQA and InfoSeek over RAG baselines.
IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing cs.AI · 2026-06-11 · unverdicted · none · ref 31
IterCAD is a multimodal agent framework using progressive SFT and geometry-aware RL for CAD tasks, with a new data pipeline, IterCAD-Bench, and CD-TR metric showing outperformance in executability and precision.
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning cs.CL · 2026-05-27 · unverdicted · none · ref 47
AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.
InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward cs.CV · 2026-05-26 · unverdicted · none · ref 15
InterSketch improves long-horizon visual-textual chain-of-thought in VLMs by dynamically generating and interleaving self-correcting visual sketches with text, using a synthesized dataset plus reflection in cold-start followed by stepwise-reward RL, and reports outperforming Gemini-3-Pro on benchmar
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation cs.CV · 2026-05-18 · unverdicted · none · ref 3 · 2 links
Vision-OPD transfers an MLLM's privileged regional perception to its full-image policy through on-policy token-level self-distillation, yielding competitive results on fine-grained visual benchmarks.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 7
POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs? cs.LG · 2026-02-20 · conditional · none · ref 15 · 2 links
MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models cs.CV · 2026-04-09 · unverdicted · none · ref 5
HDPO reframes tool efficiency as a conditional objective within accurate trajectories, enabling Metis to reduce tool invocations by orders of magnitude while raising reasoning accuracy.
SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search cs.CV · 2026-06-30 · unverdicted · none · ref 80
SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.

Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer