arXiv preprint arXiv:2502.06884 (2025)
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
Representative citing papers
citing papers explorer
- VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
VLM judges exhibit task-dependent uncertainty in their scores, with conformal prediction revealing wide intervals for complex tasks and a decoupling between good ranking performance and poor absolute scoring reliability.
- From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express
Declared losses recover epistemic distinctions collapsed by scalar neutrosophic T/I/F values in LLM evaluations.
- Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case validation.
- LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy