Boosting MLLM Reasoning with Text-Debiased Hint-GRPO. arXiv preprint arXiv:2503.23905
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers explorer
- DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams
  DiagramNet supplies a new multimodal dataset and a progressive training pipeline with a decoupled multi-agent workflow, allowing a 3B model to outperform GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x on system-level diagram tasks while generalizing to other benchmarks.
- MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models
  MHPR is a multidimensional benchmark for LVLM human-centric perception-reasoning with C-RD, SFT-D, RL-D, and T-D data tiers and an ACVG pipeline, with training gains that bring Qwen2.5-VL-7B to near-parity with larger models.
- No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
  ECHO jointly optimizes policy and critic via co-evolution, cascaded rollouts, and saturation-aware shaping to deliver non-stale feedback and higher success in open-world LLM agent RL.