VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.
Uncertainty-guided inference-time depth adaptation for transformer-based visual tracking
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.