pith. machine review for the scientific record.

arxiv: 2605.06053 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: unknown

Towards Generation-Efficient Uncertainty Estimation in Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 13:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords uncertainty estimation · large language models · partial generation · input-only prediction · logit magnitude · hallucination detection · early stopping · meta learning

The pith

Uncertainty estimates for large language model outputs can be obtained accurately from partial generations or input prompts alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether reliable uncertainty estimates for LLM responses require waiting for one or more complete generations. It organizes existing methods into a single framework that classifies them by how much of the autoregressive process they observe, from full multi-sample outputs down to the input prompt. Within this view the authors introduce Logit Magnitude, which reads uncertainty from the top-M logit values in an early-stopped prefix, and MetaUE, which trains a lightweight model to predict the same signal directly from the prompt. Experiments on general and domain-specific benchmarks show these lighter approaches perform competitively with full-generation baselines. The result is that unreliable answers can be flagged earlier and at lower inference cost than is usually assumed.

Core claim

We develop a unified framework that formulates uncertainty estimation as an early estimation problem over the autoregressive generation process of LLMs. This framework organises existing and proposed estimators by the information they observe, ranging from multi-generation to input-only prediction. Building on this view, we study two largely underexplored low-cost settings: estimating uncertainty with part of the generation, and predicting uncertainty from the input prompt. We propose Logit Magnitude, which uses top-M logit evidence to estimate uncertainty from an early-stopped generation prefix, and MetaUE, which distils generation-based uncertainty into a lightweight input-only estimator.
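A minimal sketch of the partial-generation idea: score a short, early-stopped prefix from the top-M logits at each generated position. The aggregation used here (mean of per-token top-M means, negated so that higher means more uncertain) is an illustrative assumption, not the paper's exact scoring rule.

```python
import numpy as np

def logit_magnitude_score(logits: np.ndarray, M: int = 5) -> float:
    """Uncertainty proxy from a generation prefix.

    logits: shape (prefix_len, vocab_size), raw decoder logits at each
    generated token position. Large top-M logit mass is read as model
    confidence, so its negation serves as an uncertainty score. This
    aggregation is an assumption, not the paper's exact formula.
    """
    top_m = np.sort(logits, axis=-1)[:, -M:]      # M largest logits per position
    per_token_evidence = top_m.mean(axis=-1)      # (prefix_len,)
    return float(-per_token_evidence.mean())      # higher = more uncertain

# A confident prefix (one dominant logit per step) should score as less
# uncertain than a flat, indecisive one.
rng = np.random.default_rng(0)
confident = rng.normal(0.0, 1.0, (8, 100))
confident[:, 0] += 12.0                           # one dominant token per step
flat = rng.normal(0.0, 1.0, (8, 100))
assert logit_magnitude_score(confident) < logit_magnitude_score(flat)
```

Because the score needs only a few positions of a single generation, it can be computed long before decoding finishes, which is the cost argument the framework makes precise.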

What carries the argument

The early-estimation framework that classifies uncertainty methods by the generation information they observe, from full multi-sample outputs to input prompts alone.

If this is right

  • Partial generations of LLMs are often sufficient for effective uncertainty estimation.
  • Logit Magnitude achieves strong performance on both general and domain-specific benchmarks.
  • MetaUE supplies a competitive input-only approximation in several settings.
  • Unreliable responses can be identified earlier in the generation process.
  • Inference cost for uncertainty assessment drops substantially compared with full-generation methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interactive systems could compute an uncertainty score while the first tokens are still being generated and decide whether to continue or warn the user.
  • A cascaded pipeline that begins with the input-only estimator and escalates to a short prefix only when needed would further optimize the accuracy-cost trade-off.
  • The same early-estimation logic could be tested on other autoregressive generators such as those used for images or audio.
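The cascade suggested above is simple to state in code. The two estimators and the escalation thresholds below are placeholders, not components or tuned values from the paper:

```python
from typing import Callable

def cascaded_uncertainty(
    prompt: str,
    input_only_score: Callable[[str], float],   # e.g. a MetaUE-style estimator
    prefix_score: Callable[[str], float],       # e.g. Logit Magnitude on a prefix
    low: float = 0.2,
    high: float = 0.8,
) -> tuple[float, str]:
    """Escalate to a short generation only when the cheap estimate is inconclusive.

    Scores are assumed normalised to [0, 1]; `low`/`high` are illustrative
    thresholds, not values from the paper.
    """
    cheap = input_only_score(prompt)
    if cheap <= low or cheap >= high:
        return cheap, "input-only"            # decisive either way: stop here
    return prefix_score(prompt), "prefix"     # ambiguous: pay for a prefix

score, stage = cascaded_uncertainty(
    "What year was penicillin discovered?",
    input_only_score=lambda p: 0.1,           # stub estimators for illustration
    prefix_score=lambda p: 0.5,
)
assert stage == "input-only" and score == 0.1
```

The design choice is the standard cascade trade-off: the cheap stage handles clear-cut prompts at zero generation cost, and the prefix stage is reserved for the uncertain middle band.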

Load-bearing premise

Signals of uncertainty that appear in complete generations or multiple samples remain detectable in short generation prefixes or input prompts, without missing hallucinations that only emerge late.

What would settle it

A benchmark of LLM responses in which hallucinations reliably appear only after a fixed token position, paired with a measurement showing that Logit Magnitude scores from prefixes before that position fail to flag the errors while full-generation scores succeed.

Figures

Figures reproduced from arXiv: 2605.06053 by Mingcheng Zhu, Tingting Zhu, Yu Liu.

Figure 1: Taxonomy of LLM uncertainty estimation methods, ordered by estimation cost. view at source ↗
Figure 2: Overview of the proposed generation-efficient uncertainty estimation framework. view at source ↗
Figure 3: Partial-generation dynamics of Logit Magnitude uncertainty estimation. AUROC is reported … view at source ↗
Figure 4: MetaUE design analysis on Qwen3.5-4B. Panels (a-c) compare different training signals … view at source ↗
Figure 5: Panels (a)-(c) show heatmaps of AUROC and panels (d)-(f) show mean token consumption … view at source ↗
Figure 6: Token-level Logit Magnitude (mean ± std) at different relative token positions on emrQA. A larger separation between the two curves indicates a stronger discriminative signal. view at source ↗
read the original abstract

Uncertainty estimation is important for deploying LLMs in high-stakes applications such as healthcare and finance, where hallucinations can appear fluent and plausible while being factually incorrect, making it difficult for users to judge whether an output should be trusted. Existing methods require one or more full autoregressive generations to estimate uncertainty, which introduces substantial inference cost and often delays uncertainty assessment. In this paper, we investigate whether effective uncertainty estimation can be achieved with partial generation or even input-only information. Specifically, we first develop a unified framework that formulates uncertainty estimation as an early estimation problem over the autoregressive generation process of LLMs. This framework organises existing and proposed estimators by the information they observe, ranging from multi-generation to input-only prediction, and clarifies the performance-cost trade-off underlying different uncertainty estimation methods. Building on this view, we study two largely underexplored low-cost settings: estimating uncertainty with part of the generation, and predicting uncertainty from the input prompt. We propose Logit Magnitude, which uses top-M logit evidence to estimate uncertainty from an early-stopped generation prefix, and MetaUE, which distils generation-based uncertainty into a lightweight input-only estimator trained with uncertainty scores. Extensive experiments on general and domain-specific benchmarks show that Logit Magnitude achieves strong performance, and partial generations of LLMs are often sufficient for effective uncertainty estimation. MetaUE further provides a competitive input-only approximation in several settings. These findings suggest that effective uncertainty estimation requires less generation than commonly assumed, enabling unreliable responses to be identified earlier.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims that uncertainty estimation for LLMs can be reframed as an early-estimation problem over the autoregressive generation process. It introduces a unified framework that organizes estimators by the amount of information observed (from multi-generation to input-only), proposes Logit Magnitude (top-M logit evidence from generation prefixes) for partial-generation settings, and MetaUE (distillation of generation-based scores into a lightweight input-only model). Experiments on general and domain-specific benchmarks are reported to show that Logit Magnitude performs strongly and that partial generations are often sufficient, with MetaUE providing competitive input-only approximations.

Significance. If the empirical claims hold, the work is significant for reducing inference cost and latency in uncertainty-aware LLM deployment, particularly in high-stakes domains. The framework usefully clarifies performance-cost trade-offs, and the demonstration that early prefixes or input prompts can suffice challenges the default reliance on full or multiple generations.

minor comments (2)
  1. [Abstract] The claim that 'partial generations of LLMs are often sufficient' would be strengthened by naming the specific benchmarks and reporting the quantitative margins versus full-generation baselines.
  2. The manuscript should include a brief discussion of cases where late-emerging hallucinations might evade early-prefix detection, even if the tested benchmarks do not exhibit them.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work, as well as the recommendation for minor revision. We are pleased that the significance of reframing uncertainty estimation as an early-estimation problem, along with the potential reductions in inference cost and latency, has been recognized. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines a unified framework that organizes existing and new uncertainty estimators by the amount of generation information observed (multi-generation down to input-only), then proposes Logit Magnitude (top-M logit evidence on early prefixes) and MetaUE (distillation of generation-based scores as training targets for an input-only model). Neither proposal reduces to its inputs by construction: the framework is organizational rather than deductive, Logit Magnitude applies a simple statistic to partial sequences, and MetaUE performs standard supervised distillation where the teacher scores are computed externally and used as labels. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core claims, and benchmark experiments supply independent empirical support. The derivation chain therefore remains self-contained.
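The distillation step described above is ordinary supervised regression: teacher scores from an expensive generation-based estimator become training targets for a cheap input-only student. A sketch with a closed-form ridge student on generic prompt features (the paper's student architecture and feature extractor are not specified here, so both are assumptions):

```python
import numpy as np

def train_metaue_student(prompt_features: np.ndarray,
                         teacher_scores: np.ndarray,
                         l2: float = 1e-3) -> np.ndarray:
    """Ridge regression of teacher uncertainty scores onto prompt features.

    prompt_features: (n_prompts, d) input-only representations.
    teacher_scores: (n_prompts,) uncertainty values computed once by an
    expensive generation-based estimator, used purely as labels.
    """
    X, y = prompt_features, teacher_scores
    d = X.shape[1]
    # Closed-form ridge solution: w = (X^T X + l2*I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ y)

# At deployment the student predicts uncertainty from the prompt alone,
# with no generation. Synthetic data stands in for real prompts here.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=200)   # synthetic "teacher" scores
w = train_metaue_student(X, y)
assert np.allclose(w, true_w, atol=0.05)
```

The circularity point holds visibly in this form: the teacher scores enter only as the regression target `y`, computed externally, so the student cannot define its own labels.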

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are introduced in the abstract; the contribution is empirical method development rather than theoretical derivation.

pith-pipeline@v0.9.0 · 5570 in / 1073 out tokens · 24540 ms · 2026-05-08T13:57:43.981780+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

47 extracted references · 6 canonical work pages · 5 internal anchors
