Interpretable contrastive monte carlo tree search reasoning

10 Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen · 2024 · arXiv 2410.01707

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.

Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.

When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling

cs.AI · 2026-04-29 · unverdicted · novelty 6.0

A disagreement-guided routing framework dynamically selects among resolution, voting, and rewriting strategies for test-time scaling, delivering 3-7% accuracy gains with lower sampling cost on mathematical benchmarks.

M2-PALE: A Framework for Explaining Multi-Agent MCTS--Minimax Hybrids via Process Mining and LLMs

cs.AI · 2026-04-16 · unverdicted · novelty 6.0

M2-PALE extracts process models from multi-agent MCTS-Minimax execution traces using Alpha Miner, iDHM and Inductive Miner, then uses LLMs to generate causal explanations, shown in a small checkers setting.

Procedural Knowledge at Scale Improves Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.

NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question Answering

cs.CL · 2026-02-17 · unverdicted · novelty 6.0

NeuroSymActive combines soft-unification symbolic modules, a neural path evaluator, and Monte-Carlo-style active exploration to reach strong answer accuracy on KGQA benchmarks while cutting graph lookups and model calls versus standard retrieval baselines.

Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

cs.CL · 2025-10-04 · unverdicted · novelty 6.0

Curtailing diversity in candidate pools for test-time scaling increases unsafe LLM outputs, as demonstrated by a reference-guided reduction protocol that evades standard safety classifiers across open and closed models.

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

cs.LG · 2025-04-18 · unverdicted · novelty 5.0

PODS applies max-variance down-sampling to GRPO rollouts in LLM RLVR, delivering at least 1.7x faster training to peak test accuracy on reasoning benchmarks.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

citing papers explorer

Showing 9 of 9 citing papers.

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unverdicted · none · ref 58
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration cs.LG · 2026-05-11 · unverdicted · none · ref 19 · 2 links
SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling cs.AI · 2026-04-29 · unverdicted · none · ref 12
A disagreement-guided routing framework dynamically selects among resolution, voting, and rewriting strategies for test-time scaling, delivering 3-7% accuracy gains with lower sampling cost on mathematical benchmarks.
M2-PALE: A Framework for Explaining Multi-Agent MCTS--Minimax Hybrids via Process Mining and LLMs cs.AI · 2026-04-16 · unverdicted · none · ref 14
M2-PALE extracts process models from multi-agent MCTS-Minimax execution traces using Alpha Miner, iDHM and Inductive Miner, then uses LLMs to generate causal explanations, shown in a small checkers setting.
Procedural Knowledge at Scale Improves Reasoning cs.CL · 2026-04-01 · unverdicted · none · ref 6
Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.
NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question Answering cs.CL · 2026-02-17 · unverdicted · none · ref 41
NeuroSymActive combines soft-unification symbolic modules, a neural path evaluator, and Monte-Carlo-style active exploration to reach strong answer accuracy on KGQA benchmarks while cutting graph lookups and model calls versus standard retrieval baselines.
Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models cs.CL · 2025-10-04 · unverdicted · none · ref 9
Curtailing diversity in candidate pools for test-time scaling increases unsafe LLM outputs, as demonstrated by a reference-guided reduction protocol that evades standard safety classifiers across open and closed models.
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning cs.LG · 2025-04-18 · unverdicted · none · ref 7
PODS applies max-variance down-sampling to GRPO rollouts in LLM RLVR, delivering at least 1.7x faster training to peak test accuracy on reasoning benchmarks.
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models cs.AI · 2025-03-12 · unverdicted · none · ref 204
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

Interpretable contrastive monte carlo tree search reasoning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer