Uncertainty-based abstention in LLMs improves safety and reduces hallucinations

Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, Mark Ibrahim · 2024 · arXiv 2404.10960

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

NWCAD uses a two-stream setup with a two-stage gate to prevent accuracy drops on baseline-correct items under non-informative contexts while retaining gains from helpful contexts.

Causal Evidence that Language Models use Confidence to Drive Behavior

cs.LG · 2026-03-23 · unverdicted · novelty 6.0

Language models deploy multidimensional internal confidence representations and threshold-based policies to control abstention behavior, with causal support from activation steering experiments.

Steering the Verifiability of Multimodal AI Hallucinations

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.

citing papers explorer

Showing 4 of 4 citing papers.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
cs.AI · 2026-05-10 · unverdicted · none · ref 65

No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation
cs.CL · 2026-04-17 · unverdicted · none · ref 22

Causal Evidence that Language Models use Confidence to Drive Behavior
cs.LG · 2026-03-23 · unverdicted · none · ref 20

Steering the Verifiability of Multimodal AI Hallucinations
cs.AI · 2026-04-08 · unverdicted · none · ref 33
