Uncertainty-based abstention in LLMs improves safety and reduces hallucinations. arXiv preprint arXiv:2404.10960

Uncertainty-Based Abstention in LLMs Improves Safety, Reduces Hallucinations · 2024 · arXiv 2404.10960

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding

cs.CV · 2026-06-26 · unverdicted · novelty 7.0 · 3 refs

Reflect-R1 introduces the first evidence-driven self-correction framework for long video understanding using a three-stage pipeline, stage-decoupled RL via SD-GRPO, and a 120K dataset to achieve SOTA on VideoMME and LongVideoBench.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

NWCAD uses a two-stream setup with a two-stage gate to prevent accuracy drops on baseline-correct items under non-informative contexts while retaining gains from helpful contexts.

Causal Evidence that Language Models use Confidence to Drive Behavior

cs.LG · 2026-03-23 · unverdicted · novelty 6.0

Language models deploy multidimensional internal confidence representations and threshold-based policies to control abstention behavior, with causal support from activation steering experiments.

Steering the Verifiability of Multimodal AI Hallucinations

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.

citing papers explorer

Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding

cs.CV · 2026-06-26 · unverdicted · none · ref 30 · 3 links

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · none · ref 65

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · none · ref 196

No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation

cs.CL · 2026-04-17 · unverdicted · none · ref 22

Causal Evidence that Language Models use Confidence to Drive Behavior

cs.LG · 2026-03-23 · unverdicted · none · ref 20

Steering the Verifiability of Multimodal AI Hallucinations

cs.AI · 2026-04-08 · unverdicted · none · ref 33
