pith.

Mixed citations

AlphaZero-like tree-search can guide large language model decoding and training

Mixed citation behavior. Most common role is background (57%).

24 Pith papers citing it
Background 57% of classified citations

citation-role summary

background: 5 · method: 2

representative citing papers

DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

DICE formalizes multi-agent LLM coordination as discounted incomplete-information Markov games and introduces Heterogeneous Quantal Response Equilibrium (HQRE) to achieve unique stable equilibria with bounded regret, demonstrated via prompt-control and fine-tuning algorithms on eleven benchmarks.

Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

Introduces OPT* tasks and two training regimes (solver-guided online policy optimization with rank-based reward shaping, and search-based offline RL), plus a theoretical link between search success and the information extracted per unit of search budget, and reports empirical gains in optimization-like reasoning.

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

cs.CL · 2026-01-07 · unverdicted · novelty 7.0

DiffCoT applies diffusion-style iterative denoising to chain-of-thought steps with a causal noise schedule, outperforming standard CoT optimization methods on multi-step reasoning benchmarks.

APPO: Agentic Procedural Policy Optimization

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

APPO refines branching and credit assignment in agentic RL via a Branching Score and procedure-level scaling, improving on baselines by nearly 4 points across 13 benchmarks.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions to 21 scientific problems across six domains, outperforming frontier models and optimization pipelines; examples include a 2× faster LASSO and new Erdős constructions.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model

cs.CL · 2025-04-13 · unverdicted · novelty 4.0

QM-ToT applies Tree of Thoughts decomposition and evaluator layers to quantized LLMs, reporting accuracy gains on MedQA-USMLE from 34% to 50% for LLaMA2-70b and from 58.77% to 69.49% for LLaMA-3.1-8b, plus an 86.27% improvement in data distillation using only 3.9% of the data.

citing papers explorer

Showing 24 of 24 citing papers.