Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.
Confidence improves self-consistency in llms
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.
LLMs need metacognition to align expressed uncertainty with their actual knowledge boundaries, moving beyond knowledge expansion to reduce confident errors.
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
A deep research agent incorporates progressive confidence estimation and calibration to produce trustworthy reports with transparent confidence scores on claims.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
citing papers explorer
-
Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization
Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.
-
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
-
BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence
BAS aggregates utility from an answer-or-abstain model across risk thresholds and is uniquely maximized by truthful confidence estimates.
-
Hallucinations Undermine Trust; Metacognition is a Way Forward
LLMs need metacognition to align expressed uncertainty with their actual knowledge boundaries, moving beyond knowledge expansion to reduce confident errors.
-
GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
-
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration
A deep research agent incorporates progressive confidence estimation and calibration to produce trustworthy reports with transparent confidence scores on claims.
-
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.