Spark: Multi-vision sensor perception and reasoning benchmark for large-scale vision-language models. arXiv preprint arXiv:2408.12114, 2024

Youngjoon Yu, Sangyun Chung, Byung-Kwan Lee, Yong Man Ro · 2024 · arXiv 2408.12114

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning

cs.CV · 2026-05-25 · unverdicted · novelty 5.0

MARS introduces mono-anchored advantage normalization to quantify information gain from multi-source integration in RLVR, yielding 3.2% and 4.9% gains on GRPO and DAPO.

citing papers explorer

Showing 2 of 2 citing papers.

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

cs.CL · 2026-05-27 · unverdicted · none · ref 85

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning

cs.CV · 2026-05-25 · unverdicted · none · ref 25
