A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions

URL https://aclanthology · 2025 · DOI 10.1145/3744238

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Benchmark for Hallucination Detection in VLMs for Gastrointestinal Endoscopy

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

The white-box method ReXTrust achieves the highest AUC (peak 93.0) on Gut-VLM across five VLMs, outperforming alternatives by statistically significant margins, while black-box and some gray-box methods collapse on certain models.

Leveraging Visual Signals for Robust Token-Level Uncertainty in Vision-Language Generation

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

VIG-TUQ improves token-level uncertainty estimation in LVLMs by weighting language uncertainty with visual grounding scores, based on the observation that confident predictions rely more on visual content.

Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

cs.CL · 2026-04-02 · unverdicted · novelty 6.0

SWAY quantifies sycophancy in LLMs via shifts under linguistic pressure, and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.

What if AI systems weren't chatbots?

cs.CY · 2026-05-08 · unverdicted · novelty 3.0

Chatbot AI systems often fail to meet complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specific designs are needed.

citing papers explorer

A Benchmark for Hallucination Detection in VLMs for Gastrointestinal Endoscopy
cs.CV · 2026-06-23 · unverdicted · none · ref 23
Leveraging Visual Signals for Robust Token-Level Uncertainty in Vision-Language Generation
cs.CV · 2026-05-26 · unverdicted · none · ref 2
Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis
cs.CL · 2026-05-11 · unverdicted · none · ref 29 · 2 links
SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy
cs.CL · 2026-04-02 · unverdicted · none · ref 23
What if AI systems weren't chatbots?
cs.CY · 2026-05-08 · unverdicted · none · ref 166
