Advances in Neural Information Processing Systems , volume=

Benchmarking llms via uncertainty quantification , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Reading Calibrated Uncertainty from Language Model Trajectories

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Geometric features from per-layer MLP update trajectories fed to a sparse linear probe outperform maximum softmax probability for uncertainty quantification under selective abstention, with gains up to 21 AURC points.

Uncertainty Quantification for LLM-based Code Generation

cs.SE · 2026-05-12 · unverdicted · novelty 6.0

RisCoSet applies multiple hypothesis testing to construct risk-controlling partial-program prediction sets for LLM code generation, achieving up to 24.5% less code removal than prior methods at equivalent risk levels.

Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

Social identity markers in medical questions degrade LLM accuracy and uncertainty calibration, producing a calibration crisis that is non-additive for intersectional cases.

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.

citing papers explorer

Showing 4 of 4 citing papers.

Reading Calibrated Uncertainty from Language Model Trajectories cs.LG · 2026-05-19 · unverdicted · none · ref 6
Geometric features from per-layer MLP update trajectories fed to a sparse linear probe outperform maximum softmax probability for uncertainty quantification under selective abstention, with gains up to 21 AURC points.
Uncertainty Quantification for LLM-based Code Generation cs.SE · 2026-05-12 · unverdicted · none · ref 60
RisCoSet applies multiple hypothesis testing to construct risk-controlling partial-program prediction sets for LLM code generation, achieving up to 24.5% less code removal than prior methods at equivalent risk levels.
Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA cs.CL · 2026-04-19 · unverdicted · none · ref 24
Social identity markers in medical questions degrade LLM accuracy and uncertainty calibration, producing a calibration crisis that is non-additive for intersectional cases.
Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering cs.CL · 2026-05-19 · unverdicted · none · ref 76
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.

Advances in Neural Information Processing Systems , volume=

fields

years

verdicts

representative citing papers

citing papers explorer