Mixed citations

Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning

Mixed citation behavior. Most common role is background (60%).

10 Pith papers citing it
Background 60% of classified citations

citation-role summary

background 3 · baseline 1 · dataset 1

citation-polarity summary

years

2026 10

representative citing papers

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

AXPO addresses the Thinking-Acting Gap in agentic RL training through targeted resampling of tool calls in all-wrong subgroups. It delivers +1.8pp gains over GRPO on nine multimodal benchmarks, and its 8B model beats a 32B baseline on Pass@4.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Agent Explorative Policy Optimization for Multimodal Agentic Reasoning cs.CL · 2026-05-27 · unverdicted · none · ref 47