AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
arXiv preprint arXiv:2401.10480 , year=
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
RL-trained lightweight controller using answer statistics improves trade-offs among correctness, latency, and total samples in adaptive sampling for LLM test-time scaling.
A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.
SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.
ES-CoT shortens LLM chain-of-thought generation by tracking runs of identical step answers after linguistic markers, cutting tokens 16% on average while keeping accuracy comparable to full CoT across six datasets and three models.
LRMs underperform on simple system 1 questions in both accuracy and efficiency, with problem difficulty implicitly encoded in early hidden states.
citing papers explorer
-
Early Stopping Chain-of-thoughts in Large Language Models
ES-CoT shortens LLM chain-of-thought generation by tracking runs of identical step answers after linguistic markers, cutting tokens 16% on average while keeping accuracy comparable to full CoT across six datasets and three models.
-
Exploring the System 1 Thinking Capability of Large Reasoning Models
LRMs underperform on simple system 1 questions in both accuracy and efficiency, with problem difficulty implicitly encoded in early hidden states.