
arxiv: 2511.12309 · v2 · pith:NUKUABYJnew · submitted 2025-11-15 · 💻 cs.LG · cs.AI · stat.ML

Optimal Self-Consistency for Efficient Reasoning with Large Language Models

classification 💻 cs.LG cs.AI stat.ML
keywords self-consistency, efficiency, samples, sample, scaling, applied, approach, behavior

Self-consistency (SC) is a widely used test-time inference technique for improving performance in chain-of-thought reasoning. It consists of generating multiple responses, or "samples", from a large language model (LLM) and selecting the most frequent answer. This procedure can naturally be viewed as a majority vote or empirical mode estimation. Despite its effectiveness, self-consistency is prohibitively expensive at scale when naively applied to datasets, and it lacks a unified theoretical understanding of sample efficiency and scaling behavior. In this paper, we provide the first comprehensive analysis of SC's scaling behavior and its variants, drawing on mode estimation and voting theory. We derive and empirically validate power law scaling for self-consistency across datasets, and analyze the sample efficiency for fixed-allocation and dynamic-allocation sampling schemes. From these insights, we introduce Blend-ASC, a novel variant of self-consistency that dynamically allocates samples to questions during inference, achieving state-of-the-art sample efficiency. Our approach uses 4.8 times fewer samples than vanilla SC on average, outperforming both fixed- and dynamic-allocation SC baselines, thereby demonstrating the superiority of our approach in terms of efficiency. In contrast to existing variants, we note that Blend-ASC is hyperparameter-free, supports batching, and can fit any budget of samples, ensuring it can be easily applied to any self-consistency application.
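
The majority-vote view of vanilla self-consistency described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's Blend-ASC method, only the fixed-budget baseline it is compared against. The function name `self_consistency` and the `sample_answer` callable are hypothetical stand-ins for a real LLM call that returns one final answer per invocation.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def self_consistency(
    sample_answer: Callable[[], Hashable], n_samples: int
) -> Tuple[Hashable, float]:
    """Draw n_samples answers and return the empirical mode and its vote share.

    sample_answer is a hypothetical stand-in for one LLM generation that has
    already been reduced to a final answer (for example, a parsed number).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer() for _ in range(n_samples)]
    # Majority vote: the most frequent answer is the empirical mode.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


if __name__ == "__main__":
    # Deterministic stub in place of a model, for illustration only.
    canned = iter(["42", "41", "42", "42", "7"])
    print(self_consistency(lambda: next(canned), n_samples=5))
```

Here every question receives the same fixed number of samples. The dynamic-allocation schemes mentioned in the abstract instead vary the number of samples per question, which this sketch does not attempt.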



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Uncertainty-Aware Budget Allocation for Adaptive Test-Time Reasoning

    cs.CL 2026-05 unverdicted novelty 6.0

    UAB uses ANLL from a single generation as a difficulty signal and a marginal-greedy concave optimization to allocate remaining sampling budget, yielding up to 3% higher average accuracy on reasoning benchmarks.