hub Mixed citations

Alphazero-like tree-search can guide large lan- guage model decoding and training

· 2023 · arXiv 2309.17179

Mixed citation behavior. Most common role is background (57%).

17 Pith papers citing it

Background 57% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 method 2

citation-polarity summary

background 4 use method 2 unclear 1

representative citing papers

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

V-ABS is an action-observer beam search method with entropy-based adaptive weighting and an 80k-sample SFT dataset that delivers 19.7% average gains on visual reasoning tasks for MLLMs.

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.

Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning

stat.ML · 2026-05-06 · unverdicted · novelty 7.0

InfoTree casts intermediate state selection in tree search as monotone submodular maximization under fixed rollout budgets, yielding closed-form UUCB terms and lifting mixed-outcome ratios while outperforming flat GRPO and prior tree variants on nine benchmarks.

SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

SAT trains multi-LLM teams with sequential block updates to deliver monotonic gains and plug-and-play model swaps that provably improve performance bounds.

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

cs.CL · 2026-01-07 · unverdicted · novelty 7.0

DiffCoT applies diffusion-style iterative denoising to chain-of-thought steps with a causal noise schedule, outperforming standard CoT optimization methods on multi-step reasoning benchmarks.

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

cs.CV · 2025-03-09 · unverdicted · novelty 7.0

Seg-Zero uses cognitive reinforcement learning on a decoupled reasoning-plus-segmentation architecture to produce explicit reasoning chains and reach 57.5 zero-shot accuracy on ReasonSeg, beating prior supervised LISA-7B by 18%.

Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

CoRD uses collaborative multi-teacher step-wise decoding with perplexity-guided beam search to generate higher-quality Long-CoT data that lets smaller models reach near-teacher performance with less supervision.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.

Reinforce to Learn, Elect to Reason: A Dual Paradigm for Video Reasoning

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

RLER trains video-reasoning models with three task-driven RL rewards for evidence production and elects the best answer from a few candidates via evidence consistency scoring, yielding 6.3% average gains on eight benchmarks.

Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models

cs.CL · 2026-01-16 · unverdicted · novelty 6.0

NCoTS treats chain-of-thought reasoning as a search problem and uses a dual-factor heuristic to find paths that are over 3.5% more accurate and 22% shorter on benchmarks.

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

cs.CL · 2024-06-05 · conditional · novelty 6.0

OmegaPRM automates collection of 1.5 million process supervision labels via binary-search MCTS, raising Gemini Pro math accuracy from 51% to 69.4% on MATH500 and Gemma2 27B from 42.3% to 58.2%.

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

cs.AI · 2023-12-14 · conditional · novelty 6.0

Math-Shepherd is an automatically trained process reward model that scores solution steps to verify and reinforce LLMs, lifting Mistral-7B from 77.9% to 89.1% on GSM8K and 28.6% to 43.5% on MATH.

Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

cs.AI · 2025-06-24 · unverdicted · novelty 5.0

KnowRL integrates a knowledge-verification factuality reward into RL training to enforce fact-based reasoning steps and lower hallucination rates in LLMs.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model

cs.CL · 2025-04-13 · unverdicted · novelty 4.0

QM-ToT applies Tree of Thoughts decomposition and evaluator layers to quantized LLMs, reporting accuracy gains from 34% to 50% on MedQAUSMLE for LLaMA2-70b and from 58.77% to 69.49% for LLaMA-3.1-8b, plus an 86.27% improvement in data distillation using only 3.9% of the data.

citing papers explorer

Showing 17 of 17 citing papers.

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning cs.CV · 2026-05-11 · unverdicted · none · ref 4
V-ABS is an action-observer beam search method with entropy-based adaptive weighting and an 80k-sample SFT dataset that delivers 19.7% average gains on visual reasoning tasks for MLLMs.
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation cs.CL · 2026-05-08 · unverdicted · none · ref 27
CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.
Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning stat.ML · 2026-05-06 · unverdicted · none · ref 5
InfoTree casts intermediate state selection in tree search as monotone submodular maximization under fixed rollout budgets, yielding closed-form UUCB terms and lifting mixed-outcome ratios while outperforming flat GRPO and prior tree variants on nine benchmarks.
SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees cs.LG · 2026-04-17 · unverdicted · none · ref 5
SAT trains multi-LLM teams with sequential block updates to deliver monotonic gains and plug-and-play model swaps that provably improve performance bounds.
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs cs.CL · 2026-01-07 · unverdicted · none · ref 2
DiffCoT applies diffusion-style iterative denoising to chain-of-thought steps with a causal noise schedule, outperforming standard CoT optimization methods on multi-step reasoning benchmarks.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement cs.CV · 2025-03-09 · unverdicted · none · ref 10
Seg-Zero uses cognitive reinforcement learning on a decoupled reasoning-plus-segmentation architecture to produce explicit reasoning chains and reach 57.5 zero-shot accuracy on ReasonSeg, beating prior supervised LISA-7B by 18%.
Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration cs.LG · 2026-05-11 · unverdicted · none · ref 16 · 2 links
SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding cs.AI · 2026-05-04 · unverdicted · none · ref 48
CoRD uses collaborative multi-teacher step-wise decoding with perplexity-guided beam search to generate higher-quality Long-CoT data that lets smaller models reach near-teacher performance with less supervision.
Evaluation-driven Scaling for Scientific Discovery cs.LG · 2026-04-21 · unverdicted · none · ref 32
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
Reinforce to Learn, Elect to Reason: A Dual Paradigm for Video Reasoning cs.CV · 2026-04-06 · unverdicted · none · ref 10
RLER trains video-reasoning models with three task-driven RL rewards for evidence production and elects the best answer from a few candidates via evidence consistency scoring, yielding 6.3% average gains on eight benchmarks.
Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models cs.CL · 2026-01-16 · unverdicted · none · ref 5
NCoTS treats chain-of-thought reasoning as a search problem and uses a dual-factor heuristic to find paths that are over 3.5% more accurate and 22% shorter on benchmarks.
Improve Mathematical Reasoning in Language Models by Automated Process Supervision cs.CL · 2024-06-05 · conditional · none · ref 4
OmegaPRM automates collection of 1.5 million process supervision labels via binary-search MCTS, raising Gemini Pro math accuracy from 51% to 69.4% on MATH500 and Gemma2 27B from 42.3% to 58.2%.
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations cs.AI · 2023-12-14 · conditional · none · ref 44
Math-Shepherd is an automatically trained process reward model that scores solution steps to verify and reinforce LLMs, lifting Mistral-7B from 77.9% to 89.1% on GSM8K and 28.6% to 43.5% on MATH.
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models cs.CL · 2026-04-07 · unverdicted · none · ref 6
Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality cs.AI · 2025-06-24 · unverdicted · none · ref 2
KnowRL integrates a knowledge-verification factuality reward into RL training to enforce fact-based reasoning steps and lower hallucination rates in LLMs.
Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 120
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.
QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model cs.CL · 2025-04-13 · unverdicted · none · ref 28
QM-ToT applies Tree of Thoughts decomposition and evaluator layers to quantized LLMs, reporting accuracy gains from 34% to 50% on MedQAUSMLE for LLaMA2-70b and from 58.77% to 69.49% for LLaMA-3.1-8b, plus an 86.27% improvement in data distillation using only 3.9% of the data.

Alphazero-like tree-search can guide large lan- guage model decoding and training

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer