Towards Generation-Efficient Uncertainty Estimation in Large Language Models
Pith reviewed 2026-05-08 13:57 UTC · model grok-4.3
The pith
Uncertainty estimates for large language model outputs can often be obtained accurately from partial generations or even from the input prompt alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a unified framework that formulates uncertainty estimation as an early estimation problem over the autoregressive generation process of LLMs. This framework organises existing and proposed estimators by the information they observe, ranging from multi-generation to input-only prediction. Building on this view, we study two largely underexplored low-cost settings: estimating uncertainty with part of the generation, and predicting uncertainty from the input prompt. We propose Logit Magnitude, which uses top-M logit evidence to estimate uncertainty from an early-stopped generation prefix, and MetaUE, which distils generation-based uncertainty into a lightweight input-only estimator.
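As a concrete reading of the partial-generation idea, the "top-M logit evidence" statistic can be sketched as an aggregate over the logits observed in an early-stopped prefix. The function below is a hypothetical illustration, not the paper's implementation: the name `logit_magnitude` and the aggregation choice (mean of the top-M logits per step, averaged over the prefix, then negated so that higher means more uncertain) are assumptions.

```python
def logit_magnitude(prefix_logits, m=5):
    """Hypothetical sketch of a Logit Magnitude-style score.

    prefix_logits: per-token logit vectors from an early-stopped
    generation prefix (one vector per decoded step).
    Returns an uncertainty score: higher = less model commitment.
    """
    step_evidence = []
    for logits in prefix_logits:
        top_m = sorted(logits, reverse=True)[:m]
        # Mean magnitude of the top-M logits as per-step evidence.
        step_evidence.append(sum(top_m) / len(top_m))
    # Low average evidence over the prefix -> high uncertainty.
    return -sum(step_evidence) / len(step_evidence)

# A peaked (confident) prefix should score as less uncertain
# than a diffuse one.
confident = [[9.0, 1.0, 0.5, 0.2], [8.5, 0.9, 0.3, 0.1]]
diffuse = [[2.0, 1.9, 1.8, 1.7], [2.1, 2.0, 1.9, 1.8]]
assert logit_magnitude(confident, m=2) < logit_magnitude(diffuse, m=2)
```

The key property the sketch preserves is that the score needs only the prefix already decoded, so it can be computed the moment generation is stopped early.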
What carries the argument
The early-estimation framework that classifies uncertainty methods by the generation information they observe, from full multi-sample outputs to input prompts alone.
If this is right
- Partial generations of LLMs are often sufficient for effective uncertainty estimation.
- Logit Magnitude achieves strong performance on both general and domain-specific benchmarks.
- MetaUE supplies a competitive input-only approximation in several settings.
- Unreliable responses can be identified earlier in the generation process.
- Inference cost for uncertainty assessment drops substantially compared with full-generation methods.
Where Pith is reading between the lines
- Interactive systems could compute an uncertainty score while the first tokens are still being generated and decide whether to continue or warn the user.
- A cascaded pipeline that begins with the input-only estimator and escalates to a short prefix only when needed would further optimize the accuracy-cost trade-off.
- The same early-estimation logic could be tested on other autoregressive generators such as those used for images or audio.
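The cascaded pipeline suggested above can be sketched as a two-stage router. Everything here is illustrative, not from the paper: the estimator interfaces, the threshold values, and the toy scorers are all assumptions.

```python
def cascaded_uncertainty(prompt, input_only_score, prefix_score,
                         low=0.2, high=0.8):
    """Hypothetical cascade: trust the cheap input-only estimate when
    it is decisive, and escalate to a short-prefix estimate only in
    the ambiguous middle band."""
    u = input_only_score(prompt)
    if u <= low or u >= high:
        # Decisive: no tokens need to be generated at all.
        return u, "input-only"
    # Ambiguous: pay for a short generation prefix and re-score.
    return prefix_score(prompt), "prefix"

# Toy scorers standing in for MetaUE-style and prefix-based estimators.
meta_ue = {"easy": 0.05, "hard": 0.95, "borderline": 0.5}
prefix_ue = {"borderline": 0.7}
score, route = cascaded_uncertainty("easy", meta_ue.get, prefix_ue.get)
assert route == "input-only" and score == 0.05
score, route = cascaded_uncertainty("borderline", meta_ue.get, prefix_ue.get)
assert route == "prefix" and score == 0.7
```

The design choice is the usual one for cascades: the cheap stage handles the clear-cut mass of inputs, so the expensive stage's cost is amortized over only the hard cases.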
Load-bearing premise
Signals of uncertainty that appear in complete generations or multiple samples remain detectable in short generation prefixes or input prompts, without missing hallucinations that only emerge late.
What would settle it
A benchmark of LLM responses in which hallucinations reliably appear only after a fixed token position, paired with a measurement showing that Logit Magnitude scores from prefixes before that position fail to flag the errors while full-generation scores succeed.
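Operationally, such a falsification test reduces to comparing the discrimination (e.g. AUROC) of prefix-based and full-generation scores on items whose errors emerge only late. A minimal sketch, with a rank-based AUROC and toy scores standing in for real estimator outputs (the data here are fabricated for illustration only):

```python
def auroc(scores, labels):
    """Rank-based AUROC: probability that a positive (hallucinated)
    item outscores a negative one, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical late-hallucination items: 1 = hallucinated, 0 = correct.
labels = [1, 1, 0, 0]
prefix_scores = [0.5, 0.5, 0.5, 0.5]   # prefix sees nothing unusual
full_scores = [0.9, 0.8, 0.2, 0.1]     # full generation separates them
assert auroc(prefix_scores, labels) == 0.5   # chance level
assert auroc(full_scores, labels) == 1.0     # perfect separation
```

If a real benchmark produced this pattern, it would mark the boundary of the early-estimation claim; if prefix scores stayed competitive, the claim would survive the test.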
Original abstract
Uncertainty estimation is important for deploying LLMs in high-stakes applications such as healthcare and finance, where hallucinations can appear fluent and plausible while being factually incorrect, making it difficult for users to judge whether an output should be trusted. Existing methods require one or more full autoregressive generations to estimate uncertainty, which introduces substantial inference cost and often delays uncertainty assessment. In this paper, we investigate whether effective uncertainty estimation can be achieved with partial generation or even input-only information. Specifically, we first develop a unified framework that formulates uncertainty estimation as an early estimation problem over the autoregressive generation process of LLMs. This framework organises existing and proposed estimators by the information they observe, ranging from multi-generation to input-only prediction, and clarifies the performance-cost trade-off underlying different uncertainty estimation methods. Building on this view, we study two largely underexplored low-cost settings: estimating uncertainty with part of the generation, and predicting uncertainty from the input prompt. We propose Logit Magnitude, which uses top-M logit evidence to estimate uncertainty from an early-stopped generation prefix, and MetaUE, which distils generation-based uncertainty into a lightweight input-only estimator trained with uncertainty scores. Extensive experiments on general and domain-specific benchmarks show that Logit Magnitude achieves strong performance, and partial generations of LLMs are often sufficient for effective uncertainty estimation. MetaUE further provides a competitive input-only approximation in several settings. These findings suggest that effective uncertainty estimation requires less generation than commonly assumed, enabling unreliable responses to be identified earlier.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that uncertainty estimation for LLMs can be reframed as an early-estimation problem over the autoregressive generation process. It introduces a unified framework that organizes estimators by the amount of information observed (from multi-generation to input-only), proposes Logit Magnitude (top-M logit evidence from generation prefixes) for partial-generation settings, and MetaUE (distillation of generation-based scores into a lightweight input-only model). Experiments on general and domain-specific benchmarks are reported to show that Logit Magnitude performs strongly and that partial generations are often sufficient, with MetaUE providing competitive input-only approximations.
Significance. If the empirical claims hold, the work is significant for reducing inference cost and latency in uncertainty-aware LLM deployment, particularly in high-stakes domains. The framework usefully clarifies performance-cost trade-offs, and the demonstration that early prefixes or input prompts can suffice challenges the default reliance on full or multiple generations.
Minor comments (2)
- [Abstract] The claim that 'partial generations of LLMs are often sufficient' would be strengthened by naming the specific benchmarks and reporting the quantitative margins versus full-generation baselines.
- The manuscript should include a brief discussion of cases where late-emerging hallucinations might evade early-prefix detection, even if the tested benchmarks do not exhibit them.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work, as well as the recommendation for minor revision. We are pleased that the significance of reframing uncertainty estimation as an early-estimation problem, along with the potential reductions in inference cost and latency, has been recognized. No major comments were raised in the report.
Circularity Check
No significant circularity detected
Full rationale
The paper defines a unified framework that organizes existing and new uncertainty estimators by the amount of generation information observed (multi-generation down to input-only), then proposes Logit Magnitude (top-M logit evidence on early prefixes) and MetaUE (distillation of generation-based scores as training targets for an input-only model). Neither proposal reduces to its inputs by construction: the framework is organizational rather than deductive, Logit Magnitude applies a simple statistic to partial sequences, and MetaUE performs standard supervised distillation where the teacher scores are computed externally and used as labels. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core claims, and benchmark experiments supply independent empirical support. The derivation chain therefore remains self-contained.
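The distillation step described here, teacher uncertainty scores used as regression labels for an input-only student, can be sketched with a toy linear student. Everything below is illustrative, not the paper's MetaUE architecture: the linear form, the feature vectors, and the hyperparameters are all assumptions.

```python
def distill_input_only(features, teacher_scores, lr=0.1, epochs=300):
    """Minimal sketch of MetaUE-style distillation: fit a linear
    input-only student to uncertainty scores produced externally by a
    generation-based teacher, via per-sample squared-error SGD."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, teacher_scores):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y  # gradient of 0.5 * (pred - y)^2
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    # The student needs only input features at inference time.
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy prompt features paired with teacher uncertainty labels.
student = distill_input_only([[0.0], [0.5], [1.0]], [0.1, 0.5, 0.9])
assert abs(student([0.5]) - 0.5) < 0.02  # recovers the teacher's score
```

This also makes the non-circularity point concrete: the teacher labels are fixed inputs to a standard supervised fit, not quantities derived from the student itself.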
Reference graph
Works this paper leans on
- [1] Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, and Tagyoung Chung. Real sampling: Boosting factuality and diversity of open-ended generation by extrapolating the entropy of an infinitely large LM. Transactions of the Association for Computational Linguistics, 13:760–783, 2025.
- [2] Hanzhu Chen, Xu Shen, Jie Wang, Zehao Wang, Qitan Lv, Junjie He, Rong Wu, Feng Wu, and Jieping Ye. Knowledge graph finetuning enhances knowledge manipulation in large language models. In The Thirteenth International Conference on Learning Representations, 2025.
- [3] Mingcheng Zhu, Zhiyao Luo, Yu Liu, and Tingting Zhu. MedTPE: Compressing long EHR sequence for LLM-based clinical prediction with token-pair encoding. In Proceedings of the 11th Mining and Learning from Time Series Workshop @ KDD, 2025.
- [4] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2):1–55, 2025.
- [5] Pragatheeswaran Vipulanandan, Kamal Premaratne, and Dilip Sarkar. Semantic uncertainty quantification of hallucinations in LLMs: A quantum tensor network based method. In The Fourteenth International Conference on Learning Representations, 2026.
- [6] Elham Asgari, Nina Montaña-Brown, Magda Dubois, Saleh Khalil, Jasmine Balloch, Joshua Au Yeung, and Dominic Pimenta. A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation. npj Digital Medicine, 8(1):274, 2025.
- [7] Soumik Mandal, Batia M Wiesenfeld, Adam C Szerencsy, William R Small, Vincent Major, Safiya Richardson, Antoinette Schoenthaler, Devin Mann, and Oded Nov. Utilization of generative AI-drafted responses for managing patient-provider communication. npj Digital Medicine, 8(1):591, 2025.
- [8] Ambrose Agweyu, Paul Mwaniki, Wilkister Musau, Robert Korom, Lynda Isaaka, Conrad Wanyama, Sarah Kiptinness, Najib Adan, Mira Emmanuel-Fabula, and Bilal A Mateen. Safety of a large language model-based clinical decision support system in African primary healthcare. Nature Health, pages 1–12, 2026.
- [9] Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017):625–630, 2024.
- [10] Potsawee Manakul, Adian Liusie, and Mark Gales. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9004–9017, 2023.
- [11] Ruihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Sen Yang, Nigel Collier, Dong Yu, and Deqing Yang. LogU: Long-form generation with uncertainty expressions. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18947–18968, 2025.
- [12] Aman Sharma and Paras Chopra. Think just enough: Sequence-level entropy as a confidence signal for LLM reasoning. In First Workshop on Foundations of Reasoning in Language Models, 2025.
- [13] Huan Ma, Jingdong Chen, Joey Tianyi Zhou, Guangyu Wang, and Changqing Zhang. Estimating LLM uncertainty with evidence. arXiv preprint arXiv:2502.00290, 2025.
- [14] Yixin Bu, Guanyun Zou, Renzhi Wang, Runze Xia, Cunjun Wang, Hongliang Dai, Xiaoqing Ma, and Piji Li. Sampling-free uncertainty quantification via hidden state dynamics in language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 30104–30111, 2026.
- [15] Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, and Stylianos Venieris. Progressive mixed-precision decoding for efficient LLM inference. In The Thirteenth International Conference on Learning Representations, 2025.
- [16] Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, and Jesse C. Cresswell. Textual Bayes: Quantifying prompt uncertainty in LLM-based systems. In The Fourteenth International Conference on Learning Representations, 2026.
- [17] Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, and Chris Callison-Burch. Calibrating large language models with sample consistency. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 19260–19268, 2025.
- [18] Debargha Ganguly, Vikash Singh, Sreehari Sankar, Biyao Zhang, Xuecen Zhang, Srinivasan Iyengar, Xiaotian Han, Amit Sharma, Shivkumar Kalyanaraman, and Vipin Chaudhary. Grammars of formal uncertainty: When to trust LLMs in automated reasoning tasks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [19] Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi, and Sepp Hochreiter. Improving uncertainty estimation through semantically diverse language generation. In The Thirteenth International Conference on Learning Representations, 2025.
- [20] Roman Vashurin, Maiya Goloburda, Albina Ilina, Aleksandr Rubashevskii, Preslav Nakov, Artem Shelmanov, and Maxim Panov. CoCoA: A minimum Bayes risk framework bridging confidence and consistency for uncertainty quantification in LLMs. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [21] Tunyu Zhang, Haizhou Shi, Yibin Wang, Hengyi Wang, Xiaoxiao He, Zhuowei Li, Haoxian Chen, Ligong Han, Kai Xu, Huan Zhang, et al. TokUR: Token-level uncertainty estimation for large language model reasoning. In The Fourteenth International Conference on Learning Representations, 2026.
- [22] Aakriti Agrawal, Rohith Aralikatti, Anirudh Satheesh, Souradip Chakraborty, Amrit Singh Bedi, and Furong Huang. Uncertainty-aware answer selection for improved reasoning in multi-LLM systems. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25090–25098, 2025.
- [23] Lukas Aichberger, Kajetan Schweighofer, and Sepp Hochreiter. Rethinking uncertainty estimation in LLMs: A principled single-sequence measure. In The Fourteenth International Conference on Learning Representations, 2026.
- [24] Yavuz Faruk Bakman, Sungmin Kang, Zhiqi Huang, Duygu Nur Yaldiz, Catarina G Belém, Chenyang Zhu, Anoop Kumar, Alfy Samuel, Daben Liu, Salman Avestimehr, and Sai Praneeth Karimireddy. Uncertainty as feature gaps: Epistemic uncertainty quantification of LLMs in contextual question-answering. In The Fourteenth International Conference on Learning Representations, 2026.
- [25] Jinhao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Renjing Xu, Bhavya Kailkhura, and Kaidi Xu. Shifting attention to relevance: Towards the predictive uncertainty quantification of free-form large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5050–5063, 2024.
- [26] Li Ju, Max Andersson, Stina Fredriksson, Edward Glöckner, Andreas Hellander, Ekta Vats, and Prashant Singh. Exploiting the asymmetric uncertainty structure of pre-trained VLMs on the unit hypersphere. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [27] Tianyi Zhou, Johanne Medina, and Sanjay Chawla. Can LLMs detect their confabulations? Estimating reliability in uncertainty-aware language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 38164–38172, 2026.
- [28] Artem Vazhentsev, Ekaterina Fadeeva, Rui Xing, Gleb Kuzmin, Ivan Lazichny, Alexander Panchenko, Preslav Nakov, Timothy Baldwin, Maxim Panov, and Artem Shelmanov. Unconditional truthfulness: Learning unconditional uncertainty of large language models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35661–…, 2025.
- [29] Ziang Zhou, Tianyuan Jin, Jieming Shi, and Li Qing. SteerConf: Steering LLMs for confidence elicitation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [30] Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, and Jacob Andreas. Beyond binary rewards: Training LMs to reason about their uncertainty. In The Fourteenth International Conference on Learning Representations, 2026.
- [31] Xin Liu and Lu Wang. Answer convergence as a signal for early stopping in reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17907–17918, 2025.
- [32] David Williams. Probability with Martingales. Cambridge University Press, 1991.
- [33] Cheng Zhen, Ervine Zheng, Jilong Kuang, and Geoffrey Jay Tso. Enhancing LLM-as-a-judge through active-sampling-based prompt optimization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 960–970, 2025.
- [34] Siva Reddy, Danqi Chen, and Christopher D Manning. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249–266, 2019.
- [35] Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. NewsQA: A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 191–200, 2017.
- [36] Anusri Pampari, Preethi Raghavan, Jennifer Liang, and Jian Peng. emrQA: A large corpus for question answering on electronic medical records. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2357–2368, 2018.
- [37] Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In The Eleventh International Conference on Learning Representations, 2023.
- [38] Qwen Team. Qwen3.5-Omni technical report. arXiv preprint arXiv:2604.15804, 2026.
- [39] Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118, 2024.
- [40] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
- [41] Zhewei Kang, Xuandong Zhao, and Dawn Song. Scalable best-of-N selection for large language models via self-certainty. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [42] Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-Embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 2318–2335, 2024.
- [43] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 Embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176, 2025.
- [44] Mingxin Li, Yanzhao Zhang, Dingkun Long, Keqin Chen, Sibo Song, Shuai Bai, Zhibo Yang, Pengjun Xie, An Yang, Dayiheng Liu, et al. Qwen3-VL-Embedding and Qwen3-VL-Reranker: A unified framework for state-of-the-art multimodal retrieval and ranking. arXiv preprint arXiv:2601.04720, 2026.
- [45] Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, and Jiayi Huang. A survey on mixture of experts in large language models. IEEE Transactions on Knowledge and Data Engineering, 2025.
- [46] Marcus D Ruopp, Neil J Perkins, Brian W Whitcomb, and Enrique F Schisterman. Youden index and optimal cut-point estimated from observations affected by a lower limit of detection. Biometrical Journal, 50(3):419–430, 2008.
- [47] Raphaël Bentegeac, Bastien Le Guellec, Grégory Kuchcinski, Philippe Amouyel, and Aghiles Hamroun. Token probabilities to mitigate large language models overconfidence in answering medical questions: quantitative study. Journal of Medical Internet Research, 27:e64348, 2025.