BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.
arXiv preprint arXiv:2410.20199 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.
VizCopilot integrates topic modeling with document visualization to support user oversight of retrieved context in enterprise chatbots, enabling detection of misalignments and adaptation of prompting strategies.
VIDEE introduces a human-in-the-loop system using Monte-Carlo Tree Search for task decomposition, executable pipeline generation, and LLM-based evaluation with visualizations to support non-expert text analytics.
citing papers explorer
-
BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence
BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.
-
Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.
-
VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization
VizCopilot integrates topic modeling with document visualization to support user oversight of retrieved context in enterprise chatbots, enabling detection of misalignments and adaptation of prompting strategies.
-
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
VIDEE introduces a human-in-the-loop system using Monte-Carlo Tree Search for task decomposition, executable pipeline generation, and LLM-based evaluation with visualizations to support non-expert text analytics.