The Majority Is Not Always Right: RL Training for Solution Aggregation
3 papers cite this work. Polarity classification is still indexing.
[Citation summary charts: fields — cs.CL (3); years — 2026 (3); roles — background (1); polarities — background (1)]
Citing papers
-
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
-
MoCo: A One-Stop Shop for Model Collaboration Research
MoCo supplies a unified library of 26 collaboration strategies and benchmarks demonstrating average outperformance over single models in 61 percent of (model, data) pairs.
-
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.