SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
Interpretable contrastive monte carlo tree search reasoning.arXiv preprint arXiv:2410.01707
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
A disagreement-guided routing framework dynamically selects among resolution, voting, and rewriting strategies for test-time scaling, delivering 3-7% accuracy gains with lower sampling cost on mathematical benchmarks.
M2-PALE extracts process models from multi-agent MCTS-Minimax execution traces using Alpha Miner, iDHM and Inductive Miner, then uses LLMs to generate causal explanations, shown in a small checkers setting.
Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.
NeuroSymActive combines soft-unification symbolic modules, a neural path evaluator, and Monte-Carlo-style active exploration to reach strong answer accuracy on KGQA benchmarks while cutting graph lookups and model calls versus standard retrieval baselines.
Curtailing diversity in candidate pools for test-time scaling increases unsafe LLM outputs, as demonstrated by a reference-guided reduction protocol that evades standard safety classifiers across open and closed models.
PODS applies max-variance down-sampling to GRPO rollouts in LLM RLVR, delivering at least 1.7x faster training to peak test accuracy on reasoning benchmarks.
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
citing papers explorer
-
Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
-
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
A disagreement-guided routing framework dynamically selects among resolution, voting, and rewriting strategies for test-time scaling, delivering 3-7% accuracy gains with lower sampling cost on mathematical benchmarks.
-
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.