Evolving deeper llm thinking

Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, Dave Marwood, Shumeet Baluja, Dale Schuurmans, Xinyun Chen · 2025 · arXiv 2501.09891

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

representative citing papers

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

cs.CL · 2025-11-24 · unverdicted · novelty 7.0

CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.

SeLaR: Selective Latent Reasoning in Large Language Models

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.

Procedural Knowledge at Scale Improves Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.

AlphaMemo: Structured Search-Process Memory for Self-Evolving Alpha Mining Agents

cs.AI · 2026-05-26 · unverdicted · novelty 5.0

AlphaMemo equips LLM alpha-mining agents with AST-diff motif memory, residual learning, and asymmetric veto control to improve out-of-sample factor discovery on CSI 500 and S&P 500.

ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

ExComm adds cross-agent conflict detection and soft belief correction plus trajectory diversification to agentic test-time scaling, yielding 5-6% gains over baselines on AIME and GAIA benchmarks.

GEAR: Genetic AutoResearch for Agentic Code Evolution

cs.NE · 2026-05-08 · unverdicted · novelty 5.0

GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.

The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes

cs.CL · 2026-06-09 · unverdicted · novelty 4.0

A literature survey that introduces a taxonomy for LLM reasoning paradigms, analyzes methodological trends, and synthesizes failure modes from over 300 papers.

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

cs.CL · 2026-04-19

citing papers explorer

Showing 1 of 1 citing paper after filters.

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unreviewed · ref 53

Evolving deeper llm thinking

fields

years

verdicts

representative citing papers

citing papers explorer