Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: baseline 1
citation-polarity summary
polarities: baseline 1
representative citing papers
DART is a training-free router that accepts direct answers on draft agreement and allocates thinking budgets via draft entropy on disagreement, reporting accuracy gains and token reductions on math and code benchmarks across model scales.
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
Two calls per example identify the first two moments of latent correctness probability, enabling exact bounds on the vote-accuracy curve for any majority-vote budget under conditional i.i.d. assumptions.
LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
ZAS-SQL distills rules from zero-shot Text-to-SQL failures to reach 87.2-88.6% execution accuracy on Spider, a new zero-shot SOTA that surpasses some GPT-4 few-shot and fine-tuned baselines.
DyLAN automatically selects and dynamically organizes LLM agents for collaboration, outperforming fixed-agent baselines on code generation, reasoning, and decision tasks with up to 25% accuracy gains on some MMLU subjects.
VFR-LLM combines small LLMs with symbolic verification and solving to reach 0.983 and 0.933 accuracy on precedence and logical deduction tasks using a single model call, compared with lower accuracy from self-consistency baselines.
citing papers explorer
- Fork-Think with Confidence
  Fork-think with confidence identifies forking points via model confidence in a single path before sampling continuations, cutting token usage by up to 30% and runtime by up to 57% on reasoning benchmarks while matching or exceeding parallel thinking performance.
- DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models
  DART is a training-free router that accepts direct answers on draft agreement and allocates thinking budgets via draft entropy on disagreement, reporting accuracy gains and token reductions on math and code benchmarks across model scales.
- Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
  CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
- Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference
  Two calls per example identify the first two moments of latent correctness probability, enabling exact bounds on the vote-accuracy curve for any majority-vote budget under conditional i.i.d. assumptions.
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
  LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
- ZAS-SQL: Distilling Rules from Failures for Zero-Shot Text-to-SQL
  ZAS-SQL distills rules from zero-shot Text-to-SQL failures to reach 87.2-88.6% execution accuracy on Spider, a new zero-shot SOTA that surpasses some GPT-4 few-shot and fine-tuned baselines.
- Resource-Aware Neuro-Symbolic Reasoning for Local Small Language Models
  VFR-LLM combines small LLMs with symbolic verification and solving to reach 0.983 and 0.933 accuracy on precedence and logical deduction tasks using a single model call, compared with lower accuracy from self-consistency baselines.
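
As a rough illustration of the idea summarized in the "Two Calls, Two Moments" entry above: if each example has a latent correctness probability p and two calls are conditionally i.i.d. given p, then the mean of a single call's 0/1 correctness estimates E[p], and the mean of the product of the two calls estimates E[p^2]. The sketch below shows only this moment estimation; it is not the cited paper's code, it does not derive the paper's bounds, and the function name `estimate_moments` and the input layout are assumptions made for illustration.

```python
from statistics import fmean


def estimate_moments(pairs):
    """Estimate E[p] and E[p^2] from two 0/1 correctness indicators per example.

    `pairs` is a hypothetical input layout: one (x1, x2) tuple per example,
    where x1 and x2 are 1 if the corresponding call was correct, else 0.
    Assumes the two calls are independent given the example's latent p.
    """
    if not pairs:
        raise ValueError("need at least one example")
    # E[x] = E[p]; averaging both calls uses all available samples.
    first_moment = fmean((x1 + x2) / 2 for x1, x2 in pairs)
    # E[x1 * x2] = E[p^2] under conditional independence.
    second_moment = fmean(x1 * x2 for x1, x2 in pairs)
    # Plug-in estimate of Var(p); may be slightly negative on small samples.
    variance = second_moment - first_moment ** 2
    return first_moment, second_moment, variance


if __name__ == "__main__":
    demo_pairs = [(1, 1), (1, 0), (0, 0), (1, 1)]
    print(estimate_moments(demo_pairs))
```

With only four examples the estimates are noisy; the point is just that no third call is needed to recover the second moment.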