MMBench: Is your multi-modal model an all-around player? In European Conference on Computer Vision, pages 216–233

Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, et al. · 2024

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing

cs.CV · 2026-02-04 · unverdicted · novelty 8.0

VLRS-Bench is the first benchmark dedicated to complex vision-language reasoning in remote sensing, with 2000 QA pairs across 14 tasks in cognition, decision, and prediction dimensions.

WildFireVQA: A Large-Scale Radiometric Thermal VQA Benchmark for Aerial Wildfire Monitoring

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

WildFireVQA is a new large-scale visual question answering benchmark that pairs RGB imagery with radiometric thermal measurements for aerial wildfire monitoring across six task categories.

CaptionQA: Is Your Caption as Useful as the Image Itself?

cs.CV · 2025-11-26 · conditional · novelty 7.0

CaptionQA is a new benchmark with 33,027 questions across natural, document, e-commerce, and embodied AI domains that measures how much utility model-generated captions retain compared to original images when used by LLMs for downstream tasks.

Multimodal Reinforcement Learning with Adaptive Verifier for AI Agents

cs.AI · 2025-12-03 · unverdicted · novelty 6.0

Argos is an agentic verifier that adaptively picks scoring functions to evaluate accuracy, localization, and reasoning quality, enabling stronger multimodal RL training for AI agents.

citing papers explorer

Showing 4 of 4 citing papers.

VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing
cs.CV · 2026-02-04 · unverdicted · none · ref 28
WildFireVQA: A Large-Scale Radiometric Thermal VQA Benchmark for Aerial Wildfire Monitoring
cs.CV · 2026-04-22 · unverdicted · none · ref 19
CaptionQA: Is Your Caption as Useful as the Image Itself?
cs.CV · 2025-11-26 · conditional · none · ref 25
Multimodal Reinforcement Learning with Adaptive Verifier for AI Agents
cs.AI · 2025-12-03 · unverdicted · none · ref 38
