VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.
Uncertainty-aware LiDAR -camera autonomy via conformal prediction and principled abstention
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.