Average token log-probability provides a zero-shot confidence signal for small LLMs that matches supervised baselines in-distribution and outperforms them out-of-distribution, with a new retrieval-conditional variant improving further at lower latency.
A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
LLM OOD detectors are length-confounded; a two-pathway embedding-plus-trajectory framework detects covert OOD inputs at 0.721 average AUROC and 0.850 on jailbreaks.
Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.
The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
citing papers explorer
-
Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training
Average token log-probability provides a zero-shot confidence signal for small LLMs that matches supervised baselines in-distribution and outperforms them out-of-distribution, with a new retrieval-conditional variant improving further at lower latency.
-
How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework
LLM OOD detectors are length-confounded; a two-pathway embedding-plus-trajectory framework detects covert OOD inputs at 0.721 average AUROC and 0.850 on jailbreaks.
-
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.
-
Measuring and mitigating overreliance to build human-compatible AI
The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.
-
LLM-Powered AI Agent Systems and Their Applications in Industry
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
- Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models