hub

Uncertainty estimation in autoregressive structured prediction

Andrey Malinin, Mark Gales · 2021 · arXiv 2002.07650

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

cs.CL · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

BICR trains a lightweight probe on contrastive hidden states from real versus blind images to detect visual grounding in LVLM predictions, outperforming baselines on calibration and discrimination with fewer parameters.

BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.

Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

VL-LCM measures vision-language logical consistency without annotations and shows that recent MLLMs have high accuracy but low logical consistency on benchmarks like MMMU and NaturalBench.

Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.

Ensemble-Based Uncertainty Estimation for Code Correctness Estimation

cs.SE · 2026-03-28 · unverdicted · novelty 6.0

Ensemble Semantic Entropy improves correlation with code correctness over single-model methods and powers a cascading scaling system that cuts FLOPs by 64.9% while preserving performance on LiveCodeBench.

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

cs.AI · 2026-01-08 · unverdicted · novelty 6.0

GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.

Entropy After </Think> for reasoning model early exiting

cs.LG · 2025-09-30 · unverdicted · novelty 6.0

Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.

Artificial Phantasia: Emergent Mental Imagery in Large Language Models

cs.AI · 2025-09-27 · unverdicted · novelty 6.0

LLMs achieve higher accuracy than humans on compositional imagery tasks previously argued to require pictorial representations, supporting emergent propositional mental imagery in AI.

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

cs.CL · 2023-02-19 · unverdicted · novelty 6.0

Semantic entropy improves uncertainty estimation in natural language generation by incorporating semantic equivalences, outperforming standard entropy baselines on predicting model accuracy for question answering.

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.

Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs

cs.LG · 2026-05-03 · unverdicted · novelty 5.0

Feature rivalry in SAE representations strengthens with model uncertainty on high-entropy questions, enables output steering, and predicts answer correctness with AUROC 0.689 in Gemma-2-2B.

Testing the Assumptions of Active Learning for Translation Tasks with Few Samples

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.

Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.

Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

cs.AI · 2026-03-12 · unverdicted · novelty 5.0

Introduces Explicit Logic Channel (ELC) with LLM, VFM and probabilistic inference for validating, selecting and enhancing MLLMs on zero-shot tasks using Consistency Rate and cross-channel integration.

citing papers explorer

Showing 15 of 15 citing papers.

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking cs.CL · 2026-05-11 · unverdicted · none · ref 50 · 2 links
BICR trains a lightweight probe on contrastive hidden states from real versus blind images to detect visual grounding in LVLM predictions, outperforming baselines on calibration and discrimination with fewer parameters.
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation cs.CL · 2026-05-19 · unverdicted · none · ref 55
BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.
Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric cs.AI · 2026-05-07 · unverdicted · none · ref 14
VL-LCM measures vision-language logical consistency without annotations and shows that recent MLLMs have high accuracy but low logical consistency on benchmarks like MMMU and NaturalBench.
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation cs.CL · 2026-05-07 · unverdicted · none · ref 70
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation cs.LG · 2026-04-21 · unverdicted · none · ref 129
Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.
Ensemble-Based Uncertainty Estimation for Code Correctness Estimation cs.SE · 2026-03-28 · unverdicted · none · ref 29
Ensemble Semantic Entropy improves correlation with code correctness over single-model methods and powers a cascading scaling system that cuts FLOPs by 64.9% while preserving performance on LiveCodeBench.
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts cs.AI · 2026-01-08 · unverdicted · none · ref 14
GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.
Entropy After </Think> for reasoning model early exiting cs.LG · 2025-09-30 · unverdicted · none · ref 11
Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.
Artificial Phantasia: Emergent Mental Imagery in Large Language Models cs.AI · 2025-09-27 · unverdicted · none · ref 47
LLMs achieve higher accuracy than humans on compositional imagery tasks previously argued to require pictorial representations, supporting emergent propositional mental imagery in AI.
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation cs.CL · 2023-02-19 · unverdicted · none · ref 13
Semantic entropy improves uncertainty estimation in natural language generation by incorporating semantic equivalences, outperforming standard entropy baselines on predicting model accuracy for question answering.
Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering cs.CL · 2026-05-19 · unverdicted · none · ref 140
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.
Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs cs.LG · 2026-05-03 · unverdicted · none · ref 7
Feature rivalry in SAE representations strengthens with model uncertainty on high-entropy questions, enables output steering, and predicts answer correctness with AUROC 0.689 in Gemma-2-2B.
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples cs.CL · 2026-04-10 · unverdicted · none · ref 16
Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning cs.CL · 2026-04-10 · unverdicted · none · ref 22
Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks cs.AI · 2026-03-12 · unverdicted · none · ref 46
Introduces Explicit Logic Channel (ELC) with LLM, VFM and probabilistic inference for validating, selecting and enhancing MLLMs on zero-shot tasks using Consistency Rate and cross-channel integration.

Uncertainty estimation in autoregressive structured prediction

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer