CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
Let ' s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLM s
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
baseline 1polarities
baseline 1representative citing papers
Two calls per example identify the first two moments of latent correctness probability, enabling exact bounds on the vote-accuracy curve for any majority-vote budget under conditional i.i.d. assumptions.
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
DyLAN automatically selects and dynamically organizes LLM agents for collaboration, outperforming fixed-agent baselines on code generation, reasoning, and decision tasks with up to 25% accuracy gains on some MMLU subjects.
citing papers explorer
-
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
-
Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference
Two calls per example identify the first two moments of latent correctness probability, enabling exact bounds on the vote-accuracy curve for any majority-vote budget under conditional i.i.d. assumptions.
-
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
-
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
-
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
DyLAN automatically selects and dynamically organizes LLM agents for collaboration, outperforming fixed-agent baselines on code generation, reasoning, and decision tasks with up to 25% accuracy gains on some MMLU subjects.