Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak · 2024 · DOI 10.18653/v1/2024.findings-acl.558

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

RLMF uses the quality of a model's self-judgments to refine RL rankings and select training data, achieving state-of-the-art faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.

AURORA: Asymmetry and Update-Induced Rotation for Robust Hallucination Detection in Large Language Models

cs.CL · 2026-06-28 · unverdicted · novelty 6.0

AURORA detects hallucinations via skewness of cosine similarities between weights and gradients plus a rotation ratio from SVD on update-induced changes to singular vectors.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

ECUAS$_n$: A family of metrics for principled evaluation of uncertainty-augmented systems

cs.AI · 2026-05-19 · unverdicted · novelty 6.0 · 3 refs

ECUAS$_n$ is a parameterized family of proper scoring rules for jointly assessing prediction accuracy and uncertainty quality in automated decision systems.

Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.

CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

CAT uses intrinsic confidence signals in preference optimization to adapt reasoning length in LRMs, outperforming uniform compression baselines on accuracy across benchmarks.
