ALIEN: Aligned Entropy Head for Improving Uncertainty Estimation of LLMs

Alexey Zaytsev; Artem Zabolotnyi; Mile Mitrovic; Oleg Travkin; Polina Proskura; Roman Alferov; Roman Makarov

arxiv: 2505.15443 · v2 · submitted 2025-05-21 · 💻 cs.CL · stat.ML

Artem Zabolotnyi, Roman Makarov, Mile Mitrovic, Polina Proskura, Oleg Travkin, Roman Alferov, Alexey Zaytsev

Pith reviewed 2026-05-22 13:57 UTC · model grok-4.3

keywords: uncertainty estimation · predictive entropy · language models · calibration · error detection · lightweight fine-tuning · selective prediction

The pith

ALIEN refines a language model's predictive entropy with a small trained head to better detect incorrect outputs and lower calibration error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that predictive entropy alone misses important signals of unreliability such as class overlap or ambiguous inputs. ALIEN adds a lightweight uncertainty head that starts by reproducing the original entropy and is then adjusted through regularization so its scores better match whether the model's prediction is actually correct. Experiments on seven classification tasks and two named-entity benchmarks, using five different base models, find that this alignment improves error detection over strong baselines while achieving the lowest calibration error. The head adds negligible parameters and inference time, leaving the original model unchanged. A sympathetic reader would care because reliable uncertainty estimates let downstream systems know when to trust or reject an LLM prediction without retraining the whole model.

Core claim

ALIEN trains a small uncertainty head that is initialized to output the base model's original predictive entropy and is then fine-tuned with two regularization mechanisms; the resulting aligned entropy scores improve detection of incorrect predictions and reduce calibration error on classification and NER tasks across RoBERTa, ELECTRA, LLaMA-2, Qwen2.5 and Qwen3 while adding only 0.002 percent parameters for decoder models and 0.5 percent for encoder models.

What carries the argument

The Aligned Entropy head: a small network initialized to reproduce the base model's predictive entropy and fine-tuned with regularization to align its uncertainty scores with actual prediction correctness.
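
For orientation, the quantity the head is said to reproduce at initialization is the Shannon entropy of the model's softmax output. The following is a minimal sketch in plain Python; the function names `softmax` and `predictive_entropy` are illustrative and are not taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over classes."""
    probs = softmax(logits)
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

Uniform logits give the maximum value, log of the number of classes, and a sharply peaked distribution gives a value near zero. That is the sense in which entropy is read as an uncertainty score.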

If this is right

Uncertainty scores from ALIEN can be used directly for selective prediction or rejection sampling without changing the original model weights.
The method works on both encoder-only and decoder-only architectures with only milliseconds of added inference time per batch.
Calibration error drops while error-detection performance rises across seven text-classification datasets and two NER benchmarks.
No storage of intermediate activations is required, making the approach suitable for large-scale deployment.
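
The first consequence above, selective prediction, can be sketched as follows. This assumes only that a higher score means less trust; `selective_accuracy` and its `coverage` parameter are hypothetical names, and the scores could come from ALIEN or from plain entropy.

```python
def selective_accuracy(uncertainty, correct, coverage):
    """Accuracy on the `coverage` fraction of examples with the lowest uncertainty.

    uncertainty: list of floats, higher means less trusted
    correct:     list of 0/1 flags, 1 if the base model's prediction was right
    coverage:    fraction of examples to keep, in (0, 1]
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if len(uncertainty) != len(correct) or not uncertainty:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sort example indices from most to least trusted.
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    # Keep at least one example; the count is rounded down.
    keep = max(1, int(coverage * len(order)))
    kept = order[:keep]
    return sum(correct[i] for i in kept) / keep
```

Sweeping `coverage` from 1.0 downward traces an accuracy-coverage curve. A better uncertainty score should raise accuracy faster as the least trusted predictions are rejected.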

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same lightweight-head idea could be tested on other uncertainty measures such as mutual information or temperature-scaled logits to see whether alignment helps beyond entropy.
If the regularization mechanisms prove dataset-agnostic, the head might transfer across domains without retraining from scratch.
In production pipelines, ALIEN-style heads could be swapped or updated independently of the backbone, allowing ongoing calibration without full model retraining.
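
As a reference point for the first extension above, this is what entropy over temperature-scaled logits looks like. It is a sketch only: the temperature would normally be fit on validation data, which is not shown, and the function name and parameters are illustrative rather than taken from the paper.

```python
import math


def temperature_scaled_entropy(logits, temperature=1.0):
    """Entropy (in nats) of softmax(logits / temperature)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A single temperature rescales all logits by the same factor, so it does not change the predicted class. A trained head can in principle adjust the score per input, which is the contrast this extension would test.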

Load-bearing premise

Fine-tuning the small head with the two regularization steps will produce uncertainty scores that generalize to new inputs and correctly track prediction reliability without dataset-specific biases or harm to the base model.

What would settle it

On a held-out classification dataset or new language model, measure AUROC for detecting incorrect predictions and expected calibration error; if ALIEN does not exceed the strongest baseline on both metrics, the alignment benefit does not hold.
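
The error-detection half of this test can be sketched with a pairwise AUROC, where the positive class is "the base model's prediction was incorrect". This is a minimal stdlib version for small evaluation sets; the function name is illustrative, and the quadratic pairwise loop is a simplification, not the paper's evaluation code.

```python
def error_detection_auroc(uncertainty, incorrect):
    """Probability that a random incorrect prediction gets a higher
    uncertainty score than a random correct one (ties count as 0.5)."""
    pos = [u for u, e in zip(uncertainty, incorrect) if e]
    neg = [u for u, e in zip(uncertainty, incorrect) if not e]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect prediction")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 means the score carries no information about errors. The comparison proposed above would compute this once with ALIEN scores and once with each baseline's scores on the same held-out predictions.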

Figures

Figures reproduced from arXiv: 2505.15443 by Alexey Zaytsev, Artem Zabolotnyi, Mile Mitrovic, Oleg Travkin, Polina Proskura, Roman Alferov, Roman Makarov.

**Figure 1.** The ALIEN head training scheme. We initialize the new uncertainty head with the original weights θ_init, use its entropy output as an initial uncertainty signal, and then fine-tune the head with the three-term loss that includes binary cross-entropy, output consistency regularization, and L2-SP anchoring. The rest of the model (backbone and adapter) remains frozen.
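
The three-term loss named in the caption can be sketched for a single example as below. Only the names of the three terms come from the source; everything else is an assumption made for illustration. In particular, reading entropy divided by log(number of classes) as the probability of an error, measuring consistency as a KL divergence to the frozen head's output, and the weights `lam_consistency` and `lam_l2sp` are all hypothetical choices, not the paper's formulation.

```python
import math

EPS = 1e-7  # clipping constant that keeps the logarithms finite


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q is clipped away from zero."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0.0)


def aligned_entropy_loss(head_logits, frozen_logits, incorrect,
                         head_weights, init_weights,
                         lam_consistency=1.0, lam_l2sp=0.01):
    """Illustrative three-term loss for one example.

    incorrect: 1 if the frozen model's prediction was wrong, else 0.
    """
    head_probs = softmax(head_logits)
    frozen_probs = softmax(frozen_logits)

    # ASSUMPTION: normalized entropy in [0, 1] is read as P(prediction is incorrect).
    score = entropy(head_probs) / math.log(len(head_probs))
    score = min(max(score, EPS), 1.0 - EPS)
    bce = -(incorrect * math.log(score) + (1 - incorrect) * math.log(1.0 - score))

    # ASSUMPTION: output consistency is measured as KL(frozen || head).
    consistency = kl_divergence(frozen_probs, head_probs)

    # L2-SP anchoring: squared distance of the head weights from their starting point.
    l2sp = sum((w - w0) ** 2 for w, w0 in zip(head_weights, init_weights))

    return bce + lam_consistency * consistency + lam_l2sp * l2sp
```

At initialization the head equals the frozen one, so the last two terms vanish and only the cross-entropy term drives the first updates. The two regularizers then limit how far the head can drift from the original entropy while it fits the correctness labels.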

**Figure 2.** Spearman correlation between uncertainty estimates and ensemble-based uncertainty components across datasets. Each row corresponds to one uncertainty component (H_alea, H_epi, and H_total), and each column corresponds to a dataset. Bars compare ALIEN against the base entropy across models. Higher values indicate stronger monotonic alignment with the corresponding uncertainty component. Best viewed when zoo…
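
The three components in this caption are commonly obtained from an ensemble with the standard information-theoretic decomposition: total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the members, and the epistemic part is their difference. The sketch below assumes that decomposition; the page does not confirm that the paper uses exactly this definition, and the function name is illustrative.

```python
import math


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for one input.

    member_probs: list of probability vectors, one per ensemble member.
    Returns (h_total, h_alea, h_epi) in nats.
    """
    m = len(member_probs)
    k = len(member_probs[0])
    mean_probs = [sum(p[j] for p in member_probs) / m for j in range(k)]
    h_total = entropy(mean_probs)                       # entropy of the mean prediction
    h_alea = sum(entropy(p) for p in member_probs) / m  # mean entropy of the members
    h_epi = h_total - h_alea                            # disagreement between members
    return h_total, h_alea, h_epi
```

When all members agree, the epistemic term is zero and the total equals the aleatoric part. When confident members disagree, the aleatoric part is near zero and the total is almost entirely epistemic.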

Original abstract

Uncertainty estimation remains a key challenge when adapting pre-trained language models to downstream classification tasks, with overconfidence often observed for difficult inputs. While predictive entropy provides a strong baseline for uncertainty estimation, it considers mainly aleatoric uncertainty and has limited capacity to capture effects such as class overlap or ambiguous linguistic cues. We introduce Aligned Entropy (ALIEN), a lightweight method that refines entropy-based uncertainty by aligning it with prediction reliability. ALIEN trains a small uncertainty head that is initialized to produce the model's original entropy and subsequently fine-tuned with two regularization mechanisms. Experiments across seven classification datasets and two NER benchmarks, evaluated on five language models (RoBERTa, ELECTRA, LLaMA-2, Qwen2.5, and Qwen3), show that ALIEN consistently outperforms strong baselines across all considered scenarios in detecting incorrect predictions, while achieving the lowest calibration error. The proposed method introduces only a small inference overhead (on the order of milliseconds per batch on CPU) and increases the model's parameter count by just 0.002% for decoder models and 0.5% for encoder models, without requiring storage of intermediate states. It improves uncertainty estimation while preserving the original model architecture, making the approach practical for large-scale deployment with modern language models. Our results demonstrate that entropy can be effectively refined through lightweight supervised alignment, producing more reliable uncertainty estimates without modifying the backbone model. The code is available online (footnote 4 in the paper).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces ALIEN, a lightweight uncertainty head for language models that is initialized to reproduce the base model's predictive entropy and then fine-tuned with two regularization mechanisms to align uncertainty estimates with observed prediction reliability. Experiments across seven classification datasets, two NER benchmarks, and five models (RoBERTa, ELECTRA, LLaMA-2, Qwen2.5, Qwen3) report consistent gains over baselines in error detection and calibration error, with negligible parameter and inference overhead.

Significance. If the reported gains hold after clarifying the supervision details, the approach would offer a practical, low-overhead refinement of entropy-based uncertainty that preserves the original model architecture. The consistent outperformance across encoder and decoder models on both classification and NER tasks, together with the public code release, would strengthen the case for lightweight post-hoc uncertainty improvements in deployed LLMs.

major comments (3)

[Method] The central claim that ALIEN refines entropy-based uncertainty via alignment with prediction reliability requires clarification on whether ground-truth labels are used during head training. If the head is supervised on correctness (as implied by 'aligning it with prediction reliability'), the gains may reflect supervised error detection rather than an improved unsupervised uncertainty measure; this distinction is load-bearing for the comparison to the pure-entropy baseline.
[Abstract] Abstract and §3 (or equivalent): the two regularization mechanisms, the precise loss function, training hyperparameters, and any statistical significance tests are not specified. Without these details, it is difficult to verify that the reported improvements in error detection and calibration error are robust rather than artifacts of particular hyperparameter choices or dataset splits.
[Experiments] Experiments section: the weakest assumption—that fine-tuning the head on held-out labeled data produces uncertainty scores that generalize to new inputs without introducing dataset-specific biases—needs explicit testing, for example via cross-dataset evaluation or ablation removing the supervision signal.

minor comments (2)

[Abstract] The abstract states the method increases parameter count by 0.002% for decoder models and 0.5% for encoder models; confirm these figures are consistent with the head architecture described in the method section.
[Experiments] Clarify whether the reported calibration error is ECE or another metric, and ensure all baseline comparisons use identical evaluation protocols.
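
For reference on the second minor comment, expected calibration error (ECE) is usually computed by binning confidences and comparing average confidence with accuracy in each bin. The sketch below uses equal-width bins; how ALIEN's uncertainty score is mapped to a confidence in [0, 1] is not stated on this page, so the `confidence` input is an assumption, as are the function name and the default of 10 bins.

```python
def expected_calibration_error(confidence, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1].

    confidence: list of floats in [0, 1]
    correct:    list of 0/1 flags, 1 if the prediction was right
    """
    n = len(confidence)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidence, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        # Each bin's gap is weighted by its share of the examples.
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece
```

Because the result depends on the number of bins and the binning scheme, reporting both alongside the metric name would address the referee's request for identical evaluation protocols.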

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our work. We address each of the major comments below, providing clarifications and indicating where revisions will be made to the manuscript.

Point-by-point responses

Referee: [Method] The central claim that ALIEN refines entropy-based uncertainty via alignment with prediction reliability requires clarification on whether ground-truth labels are used during head training. If the head is supervised on correctness (as implied by 'aligning it with prediction reliability'), the gains may reflect supervised error detection rather than an improved unsupervised uncertainty measure; this distinction is load-bearing for the comparison to the pure-entropy baseline.

Authors: We thank the referee for highlighting this important distinction. The ALIEN head is trained in a supervised manner using ground-truth labels to align the initial entropy estimates with observed prediction reliability, as indicated by our use of 'supervised alignment' in the abstract. This training occurs on held-out data and is performed once. At inference, the head produces uncertainty scores without access to labels, similar to how the base entropy is computed. We argue that this results in an improved uncertainty measure rather than a direct error detector, as the output remains a scalar uncertainty value aligned with entropy. However, we agree that this point requires clearer exposition in the method section to distinguish it from purely unsupervised approaches and from supervised classification of errors. We will revise the manuscript accordingly.
revision: yes

Referee: [Abstract] Abstract and §3 (or equivalent): the two regularization mechanisms, the precise loss function, training hyperparameters, and any statistical significance tests are not specified. Without these details, it is difficult to verify that the reported improvements in error detection and calibration error are robust rather than artifacts of particular hyperparameter choices or dataset splits.

Authors: We acknowledge that the current manuscript lacks sufficient detail on these aspects. In the revised version, we will expand §3 to fully describe the two regularization mechanisms, provide the exact loss function formulation, list the training hyperparameters used, and include results of statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the reported improvements.
revision: yes

Referee: [Experiments] Experiments section: the weakest assumption—that fine-tuning the head on held-out labeled data produces uncertainty scores that generalize to new inputs without introducing dataset-specific biases—needs explicit testing, for example via cross-dataset evaluation or ablation removing the supervision signal.

Authors: This is a valid concern regarding generalization. Our current experiments demonstrate consistent improvements across seven classification datasets, two NER benchmarks, and five different models, which provides some evidence of robustness. However, we did not include explicit cross-dataset transfer experiments or an ablation that removes the supervision signal entirely. We will add such an ablation study in the revised manuscript to directly address this point and further validate that the gains stem from the alignment process rather than dataset-specific fitting.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper describes a practical method for refining predictive entropy via a lightweight supervised head trained on held-out labeled data to align with observed prediction correctness, using two regularization terms. This process is explicitly presented as supervised alignment rather than an unsupervised or first-principles derivation. Evaluation relies on separate benchmarks across multiple models and datasets, with no equations or claims that reduce by construction to the inputs (e.g., no fitted parameter renamed as an independent prediction, no self-citation chains invoked as uniqueness theorems, and no ansatz smuggled through prior work). The central claims rest on empirical results rather than tautological reasoning, making the derivation self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The approach rests on standard supervised learning assumptions that labeled data for the downstream task is available and that correctness labels can serve as a reliable training signal for uncertainty. No new physical or mathematical axioms are introduced.

free parameters (1)

regularization coefficients
The two regularization mechanisms almost certainly involve tunable hyperparameters whose values are chosen to optimize alignment on validation data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "ALIEN trains a small uncertainty head initialized to produce the model's original entropy and subsequently fine-tuned with two regularization mechanisms... binary cross-entropy term predicting whether the model’s prediction is incorrect"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "We introduce Aligned Entropy - ALIEN, a lightweight method that refines entropy-based uncertainty by aligning it with prediction reliability."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Yelp Dataset Challenge: Review Rating Prediction

Asghar, N.: Yelp dataset challenge: Review rating prediction. arXiv preprint arXiv:1605.05362 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[2] [2]

In: EMNLP (2020)

Clark, K., Luong, M.T., Le, Q.V., Manning, C.D.: Pre-training transformers as energy-based cloze models. In: EMNLP (2020)

work page 2020

[3] [3]

In: EMNLP (2023)

Colombo, P., Darrin, M., Panitainada, P.: Rainproof: An umbrella to shield text generators from out-of-distribution data. In: EMNLP (2023)

work page 2023

[4] [4]

Demszky, D., Movshovitz-Attias, D., Ko, J., et al.: GoEmotions: A dataset of fine- grained emotions. In: ACL. pp. 4040–4054 (2020)

work page 2020

[5] [5]

In: Workshop on Noisy User-generated Text

Derczynski, L., Nichols, E., Van Erp, M., Limsopatham, N.: Results of the WNUT2017 shared task on novel and emerging entity recognition. In: Workshop on Noisy User-generated Text. pp. 140–147 (2017)

work page 2017

[6] [6]

Duan, J., Cheng, H., Wang, S., et al.: Shifting attention to relevance: Towards the predictive uncertainty quantification of free-form large language models. In: ACL. pp. 5050–5063 (2024)

work page 2024

[7] [7]

In: EMNLP: Industry Track

Fadeev, E., Mollaev, D., Shestov, A., et al.: Latte: Learning aligned transactions and textual embeddings for bank clients. In: EMNLP: Industry Track. pp. 2635– 2647 (2025)

work page 2025

[8] [8]

In: Findings of ACL

Fadeeva, E., Rubashevskii, A., Shelmanov, A., et al.: Fact-checking the output of large language models via token-level uncertainty quantification. In: Findings of ACL. pp. 9367–9385. Association for Computational Linguistics, Bangkok, Thai- land (2024)

work page 2024

[9] [9]

In: EMNLP: System Demonstrations

Fadeeva, E., Vashurin, R., Tsvigun, A., et al.: LM-polygraph: Uncertainty esti- mation for language models. In: EMNLP: System Demonstrations. pp. 446–461 (2023) Title Suppressed Due to Excessive Length 15

work page 2023

[10] [10]

Transactions of the Association for Computational Linguistics8, 539–555 (2020)

Fomicheva, M., Sun, S., Yankovskaya, L., et al.: Unsupervised quality estimation for neural machine translation. Transactions of the Association for Computational Linguistics8, 539–555 (2020)

work page 2020

[11] [11]

NeurIPS30(2017)

Geifman, Y., El-Yaniv, R.: Selective classification for deep neural networks. NeurIPS30(2017)

work page 2017

[12] [12]

In: ICML

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: ICML. pp. 1321–1330. PMLR (2017)

work page 2017

[13] [13]

Hartvigsen, T., Gabriel, S., Palangi, H., et al.: ToxiGen: A large-scale machine- generated dataset for adversarial and implicit hate speech detection. In: ACL. pp. 3309–3326. Association for Computational Linguistics (2022)

work page 2022

[14] [14]

In: ICML

Houlsby,N.,Giurgiu,A.,Jastrzebski,S.,etal.:Parameter-efficienttransferlearning for NLP. In: ICML. pp. 2790–2799. PMLR (2019)

work page 2019

[15] [15]

Bayesian Active Learning for Classification and Preference Learning

Houlsby, N., Huszár, F., Ghahramani, Z., Lengyel, M.: Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745 (2011)

work page internal anchor Pith review Pith/arXiv arXiv 2011

[16] [16]

ICLR1(2), 3 (2022)

Hu, E.J., Shen, Y., Wallis, P., et al.: LoRA: Low-rank adaptation of large language models. ICLR1(2), 3 (2022)

work page 2022

[17] [17]

Machine Learning110(3), 457–506 (2021)

Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning110(3), 457–506 (2021)

work page 2021

[18] [18]

Language Models (Mostly) Know What They Know

Kadavath, S., Conerly, T., Askell, A., et al.: Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[19] [19]

In: WACV

Korchagin, S., Zaychenkova, E., Khalin, A., et al.: Improving uncertainty estima- tion with confidence-aware training data. In: WACV. pp. 7991–8001 (2025)

work page 2025

[20] [20]

In: ICLR (2023)

Kuhn, L., Gal, Y., Farquhar, S.: Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In: ICLR (2023)

work page 2023

[21] [21]

NeurIPS30(2017)

Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. NeurIPS30(2017)

work page 2017

[22] [22]

In: Machine Learning Proceed- ings 1995, pp

Lang, K.: Newsweeder: Learning to filter netnews. In: Machine Learning Proceed- ings 1995, pp. 331–339. Morgan Kaufmann, San Francisco (CA) (1995)

work page 1995

[23] [23]

In: NAACL-HLT

Larson, S., Mahendran, A., Lee, A., et al.: Outlier detection for improved data quality and diversity in dialog systems. In: NAACL-HLT. pp. 517–527 (2019)

work page 2019

[24] [24]

In: NeurIPS

Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out- of-distribution samples and adversarial attacks. In: NeurIPS. vol. 31 (2018)

work page 2018

[25] [25]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Liu, Y., Ott, M., Goyal, N., et al.: RoBERTa: A robustly optimized BERT pre- training approach. CoRRabs/1907.11692(2019)

work page internal anchor Pith review Pith/arXiv arXiv 1907

[26] Maas, A., Daly, R.E., Pham, P.T., et al.: Learning word vectors for sentiment analysis. In: ACL-HLT. pp. 142–150 (2011)

[27] Malinin, A., Gales, M.: Uncertainty estimation in autoregressive structured prediction. In: ICLR (2021)

[28] Nguyen, D.Q., Vu, T., Nguyen, A.T.: BERTweet: A pre-trained language model for English tweets. In: EMNLP: System Demonstrations. pp. 9–14 (2020)

[29] van der Poel, L., Cotterell, R., Meister, C.: Mutual information alleviates hallucinations in abstractive summarization. In: EMNLP. pp. 5956–5965. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (2022)

[30] Ren, J., Fort, S., Liu, J., et al.: A simple fix to Mahalanobis distance for improving near-OOD detection. ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning (2021)

[31] Ren, J., Luo, J., Zhao, Y., et al.: Out-of-distribution detection and selective generation for conditional language models. In: ICLR (2023)

[32] Sang, E.T.K., De Meulder, F.: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In: CoNLL at HLT-NAACL. pp. 142–147 (2003)

[33] Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)

[34] Shelmanov, A., Liventsev, V., Kireev, D., et al.: Active learning with deep pre-trained models for sequence tagging of clinical and biomedical texts. In: IEEE BIBM. pp. 482–489. IEEE (2019)

[35] Shool, S., Adimi, S., Saboori Amleshi, R., et al.: A systematic review of large language model (LLM) evaluations in clinical medicine. BMC Medical Informatics and Decision Making 25(1), 117 (2025)

[36] Sky, C.W., Van Durme, B., Eisner, J., Kedzie, C.: Do androids know they’re only dreaming of electric sheep? In: Findings of ACL. pp. 4401–4420 (2024)

[37] Socher, R., Perelygin, A., Wu, J., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP. pp. 1631–1642. Association for Computational Linguistics, Seattle, Washington, USA (2013)

[38] Takayama, J., Arase, Y.: Relevant and informative response generation using pointwise mutual information. In: Workshop on NLP for Conversational AI. pp. 133–138 (2019)

[39] Tian, K., Mitchell, E., Zhou, A., et al.: Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback. In: EMNLP. pp. 5433–5442. Association for Computational Linguistics, Singapore (2023)

[40] Touvron, H., Martin, L., Stone, K., et al.: Llama 2: Open foundation and fine-tuned chat models (2023)

[41] Van Amersfoort, J., Smith, L., Teh, Y.W., Gal, Y.: Uncertainty estimation using a single deep deterministic neural network. In: ICML. pp. 9690–9700. PMLR (2020)

[42] Vazhentsev, A., Kuzmin, G., Tsvigun, A., et al.: Hybrid uncertainty quantification for selective text classification in ambiguous tasks. In: ACL. pp. 11659–11681 (2023)

[43] Wang, Z., Duan, J., Cheng, L., et al.: ConU: Conformal uncertainty in large language models with correctness coverage guarantees (2024)

[44] Warstadt, A., Singh, A., Bowman, S.R.: Neural network acceptability judgments. Transactions of the Association for Computational Linguistics 7, 625–641 (2019)

[45] Xuhong, L., Grandvalet, Y., Davoine, F.: Explicit inductive bias for transfer learning with convolutional networks. In: ICML. pp. 2825–2834. PMLR (2018)

[46] Yang, A., Li, A., Yang, B., et al.: Qwen3 technical report (2025)

[47] Yang, A., Yang, B., Hui, B., et al.: Qwen2 technical report. arXiv preprint arXiv:2407.10671 (2024)

[48] Yoo, K., Kim, J., Jang, J., Kwak, N.: Detection of adversarial examples in text classification: Benchmark and baseline via robust density estimation. In: ACL Findings. pp. 3656–3672 (2022)