JudgeBench: A Benchmark for Evaluating

Sijun Tan, Siyuan Zhuang, Kyle Montgomery, William Yuan Tang, Alejandro Cuadron, Chenguang Wang · 2025

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

VLMs used as judges exhibit informativeness bias: they favor detailed but image-inconsistent answers. BIRCH mitigates this by first correcting answers against the image, reducing bias by up to 17% and improving performance by up to 9.8%.

citing papers explorer


When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

cs.AI · 2026-04-20 · unverdicted · none · ref 6

VLMs used as judges exhibit informativeness bias: they favor detailed but image-inconsistent answers. BIRCH mitigates this by first correcting answers against the image, reducing bias by up to 17% and improving performance by up to 9.8%.
