Confidence improves self-consistency in llms

Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, Gal Yona · 2025 · arXiv 2502.06233

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

cs.AI · 2026-05-07 · conditional · novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.

Hallucinations Undermine Trust; Metacognition is a Way Forward

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

LLMs need metacognition to align expressed uncertainty with their actual knowledge boundaries, moving beyond knowledge expansion to reduce confident errors.

GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models

cs.CL · 2025-09-11 · unverdicted · novelty 6.0

GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

cs.CL · 2025-03-20 · accept · novelty 5.0

A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.

Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration

cs.AI · 2026-04-07 · unverdicted · novelty 4.0

A deep research agent incorporates progressive confidence estimation and calibration to produce trustworthy reports with transparent confidence scores on claims.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

citing papers explorer

Showing 8 of 8 citing papers.

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization cs.AI · 2026-05-07 · unverdicted · none · ref 37
Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost cs.AI · 2026-05-07 · conditional · none · ref 180
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence cs.CL · 2026-04-03 · unverdicted · none · ref 45
BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.
Hallucinations Undermine Trust; Metacognition is a Way Forward cs.CL · 2026-05-02 · unverdicted · none · ref 43
LLMs need metacognition to align expressed uncertainty with their actual knowledge boundaries, moving beyond knowledge expansion to reduce confident errors.
GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models cs.CL · 2025-09-11 · unverdicted · none · ref 39
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models cs.CL · 2025-03-20 · accept · none · ref 170
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration cs.AI · 2026-04-07 · unverdicted · none · ref 20
A deep research agent incorporates progressive confidence estimation and calibration to produce trustworthy reports with transparent confidence scores on claims.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence cs.AI · 2025-07-28 · accept · none · ref 178
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

Confidence improves self-consistency in llms

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer