An RL-trained lightweight controller that uses answer statistics improves the trade-offs among correctness, latency, and total samples in adaptive sampling for LLM test-time scaling.
Learning to reason via mixture-of-thought for logical reasoning
4 Pith papers cite this work.
Representative citing papers
A parameter-free sampling strategy called CUTS combined with Mixed-CUTS training prevents mode collapse in RL for saturated LLM reasoning tasks and raises AIME25 Pass@1 accuracy by up to 15.1% over standard GRPO.
VeriTrans achieves 94.46% SAT/UNSAT correctness on SatBench via LLM translation gated by round-trip similarity and deterministic neuro-symbolic execution.