MHPR is a multidimensional benchmark for LVLM human-centric perception-reasoning with C-RD, SFT-D, RL-D, T-D data tiers and ACVG pipeline, showing training gains on Qwen2.5-VL-7B to near-parity with larger models.
Countercurate: Enhancing physical and semantic visio-linguistic compositional reasoning via counterfactual examples.arXiv preprint arXiv:2402.13254, 2024a
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Argos is an agentic verifier that adaptively picks scoring functions to evaluate accuracy, localization, and reasoning quality, enabling stronger multimodal RL training for AI agents.
DeFacto trains multimodal models with counterfactual image variants and GRPO reinforcement learning to enforce that correct answers are supported by correct visual evidence.
citing papers explorer
-
MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Languate Models
MHPR is a multidimensional benchmark for LVLM human-centric perception-reasoning with C-RD, SFT-D, RL-D, T-D data tiers and ACVG pipeline, showing training gains on Qwen2.5-VL-7B to near-parity with larger models.
-
Multimodal Reinforcement Learning with Adaptive Verifier for AI Agents
Argos is an agentic verifier that adaptively picks scoring functions to evaluate accuracy, localization, and reasoning quality, enabling stronger multimodal RL training for AI agents.
-
DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning
DeFacto trains multimodal models with counterfactual image variants and GRPO reinforcement learning to enforce that correct answers are supported by correct visual evidence.