Evolving deeper llm thinking

Evolving deeper llm thinking , author= · 2025 · arXiv 2501.09891

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

representative citing papers

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

cs.CL · 2025-11-24 · unverdicted · novelty 7.0

CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.

SeLaR: Selective Latent Reasoning in Large Language Models

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.

Procedural Knowledge at Scale Improves Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.

AlphaMemo: Structured Search-Process Memory for Self-Evolving Alpha Mining Agents

cs.AI · 2026-05-26 · unverdicted · novelty 5.0

AlphaMemo equips LLM alpha-mining agents with AST-diff motif memory, residual learning, and asymmetric veto control to improve out-of-sample factor discovery on CSI 500 and S&P 500.

ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

ExComm adds cross-agent conflict detection and soft belief correction plus trajectory diversification to agentic test-time scaling, yielding 5-6% gains over baselines on AIME and GAIA benchmarks.

GEAR: Genetic AutoResearch for Agentic Code Evolution

cs.NE · 2026-05-08 · unverdicted · novelty 5.0

GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

cs.CL · 2026-04-19

citing papers explorer

Showing 7 of 7 citing papers after filters.

Joint Consistency: A Unified Test-Time Aggregation Framework via Energy Minimization cs.AI · 2026-05-07 · unverdicted · none · ref 24
Joint Consistency casts test-time aggregation as Ising-type energy minimization with pairwise LLM-judge interactions, subsuming voting methods and outperforming baselines across reasoning tasks.
SeLaR: Selective Latent Reasoning in Large Language Models cs.CL · 2026-04-09 · unverdicted · none · ref 19
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
Procedural Knowledge at Scale Improves Reasoning cs.CL · 2026-04-01 · unverdicted · none · ref 14
Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.
AlphaMemo: Structured Search-Process Memory for Self-Evolving Alpha Mining Agents cs.AI · 2026-05-26 · unverdicted · none · ref 61
AlphaMemo equips LLM alpha-mining agents with AST-diff motif memory, residual learning, and asymmetric veto control to improve out-of-sample factor discovery on CSI 500 and S&P 500.
ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling cs.AI · 2026-05-21 · unverdicted · none · ref 3
ExComm adds cross-agent conflict detection and soft belief correction plus trajectory diversification to agentic test-time scaling, yielding 5-6% gains over baselines on AIME and GAIA benchmarks.
GEAR: Genetic AutoResearch for Agentic Code Evolution cs.NE · 2026-05-08 · unverdicted · none · ref 12
GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unreviewed · ref 53

Evolving deeper llm thinking

fields

years

verdicts

representative citing papers

citing papers explorer