Boosting MLLM Reasoning with Text-Debiased Hint-GRPO. arXiv preprint arXiv:2503.23905, 2025.

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

cs.AI · 2026-05-02 · unverdicted · novelty 7.0

DiagramNet supplies a new multimodal dataset and progressive training pipeline with decoupled multi-agent workflow, allowing a 3B model to outperform GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x on system-level diagram tasks while generalizing to other benchmarks.

MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models

cs.CV · 2026-05-05 · unverdicted · novelty 6.0

MHPR is a multidimensional benchmark for human-centric perception and reasoning in LVLMs, with C-RD, SFT-D, RL-D, and T-D data tiers and an ACVG pipeline; training on it brings Qwen2.5-VL-7B to near-parity with larger models.

No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning

cs.AI · 2026-01-11 · unverdicted · novelty 6.0

ECHO jointly optimizes policy and critic via co-evolution, cascaded rollouts, and saturation-aware shaping to deliver non-stale feedback and higher success in open-world LLM agent RL.

citing papers explorer

Showing 3 of 3 citing papers.

DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams · cs.AI · 2026-05-02 · unverdicted · none · ref 8
MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models · cs.CV · 2026-05-05 · unverdicted · none · ref 4
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning · cs.AI · 2026-01-11 · unverdicted · none · ref 7
