A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions

2024 · arXiv 2412.05563

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

cs.AI · 2026-05-04 · conditional · novelty 6.0

Average token log-probability provides a zero-shot confidence signal for small LLMs that matches supervised baselines in-distribution and outperforms them out-of-distribution, with a new retrieval-conditional variant improving further at lower latency.

How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

LLM OOD detectors are length-confounded; a two-pathway embedding-plus-trajectory framework detects covert OOD inputs at 0.721 average AUROC and 0.850 on jailbreaks.

Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification

cs.AI · 2026-04-18 · unverdicted · novelty 6.0

Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.

Measuring and mitigating overreliance to build human-compatible AI

cs.CY · 2025-09-08 · conditional · novelty 5.0

The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.

LLM-Powered AI Agent Systems and Their Applications in Industry

cs.AI · 2025-05-22 · unverdicted · novelty 2.0

A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

cs.CL · 2026-05-06

citing papers explorer

Showing 6 of 6 citing papers.

Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

cs.AI · 2026-05-04 · conditional · none · ref 25

How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework

cs.CL · 2026-04-30 · unverdicted · none · ref 21

Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification

cs.AI · 2026-04-18 · unverdicted · none · ref 39

Measuring and mitigating overreliance to build human-compatible AI

cs.CY · 2025-09-08 · conditional · none · ref 108

LLM-Powered AI Agent Systems and Their Applications in Industry

cs.AI · 2025-05-22 · unverdicted · none · ref 99

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

cs.CL · 2026-05-06 · unreviewed · ref 8
