AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
arXiv preprint arXiv:2401.10480
8 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (3)
Citation polarities: background (3)
Representative citing papers
The PROBE pipeline, with deterministic PCAP normalization, verdict-aware evidence ensembles, and composite reliability scoring, raises weighted evidence F1 to 0.957 on 87 Wi-Fi captures while avoiding LLM self-confidence and evaluation bias issues.
An RL-trained lightweight controller using answer statistics improves trade-offs among correctness, latency, and total samples in adaptive sampling for LLM test-time scaling.
A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.
SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.
ES-CoT shortens LLM chain-of-thought generation by tracking runs of identical step answers after linguistic markers, cutting tokens 16% on average while keeping accuracy comparable to full CoT across six datasets and three models.
LRMs underperform on simple System 1 questions in both accuracy and efficiency, with problem difficulty implicitly encoded in early hidden states.
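The ES-CoT summary above describes tracking runs of identical step answers in order to cut chain-of-thought generation short. A minimal sketch of that run-tracking idea follows. The function name `early_stop_index`, the `min_run` parameter, and the fixed-threshold stopping rule are all assumptions made for illustration; the summary does not state the stopping criterion ES-CoT actually uses, and the extraction of step answers after linguistic markers is assumed to have happened already.

```python
from typing import Optional, Sequence


def early_stop_index(step_answers: Sequence[str], min_run: int = 3) -> Optional[int]:
    """Return the index of the first step that completes a run of `min_run`
    identical consecutive answers, or None if no such run occurs.

    Hypothetical helper: a fixed run-length threshold is an assumption,
    not the stopping rule reported for ES-CoT.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")

    run_length = 0
    previous: Optional[str] = None
    for index, answer in enumerate(step_answers):
        # Extend the current run if the answer repeats, otherwise start a new run.
        if index > 0 and answer == previous:
            run_length += 1
        else:
            run_length = 1
        previous = answer

        if run_length >= min_run:
            return index
    return None


if __name__ == "__main__":
    # Step answers as they might be extracted from successive reasoning steps.
    answers = ["12", "15", "15", "15", "15"]
    print(early_stop_index(answers, min_run=3))
```

In this sketch, generation would stop at the returned step index, and the remaining steps would never be produced; a larger `min_run` trades fewer saved tokens for more confidence that the answer has stabilized.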