RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
6 Pith papers cite this work. Polarity classification is still indexing.
years
2026 6verdicts
UNVERDICTED 6representative citing papers
AURORA detects hallucinations via skewness of cosine similarities between weights and gradients plus a rotation ratio from SVD on update-induced changes to singular vectors.
RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
ECUAS_n is a parameterized family of proper scoring rules for jointly assessing prediction accuracy and uncertainty quality in automated decision systems.
DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.
CAT uses intrinsic confidence signals in preference optimization to adapt reasoning length in LRMs, outperforming uniform compression baselines on accuracy across benchmarks.
citing papers explorer
No citing papers match the current filters.