arXiv: 2509.23946 · v3 · pith:QRLCN2EBnew · submitted 2025-09-28 · 💻 cs.LG · cs.AI · cs.CL · stat.ML

Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

classification 💻 cs.LG · cs.AI · cs.CL · stat.ML
keywords execution, chain, exploration, only, phase, planning, tokens

Many LLMs plan before they act, yet planning and execution are often still entangled in one long generation trace, enforced only through prompts, or split across separate components. We argue that these two stages call for different computation: planning benefits from diversity and breadth, whereas execution demands precision and faithful adherence to a chosen strategy. Treating them as a single undifferentiated chain wastes tokens on routine derivation and makes it costly to explore alternative strategies at test time. We present the Explore-Execute Chain (E²C), which keeps both stages in one model but separates them structurally: a stochastic Exploration phase drafts a concise high-level plan, and a deterministic Execution phase carries it out. Causal SFT and RL train this split so that exploration stays informative and execution remains plan-faithful. Once plans are short yet decisive, extra inference compute can be directed to exploration rather than to repeatedly decoding full solutions. On AIME'2024 at K=32, E²C-ReAct Loop reaches 53.3% accuracy with only 12.4k tokens, outperforming Tree-of-Thoughts (N=32: 50.0%, 71.3k). The same structure also supports lightweight domain adaptation: Exploration-Focused SFT (EF-SFT) updates only the planning phase, uses 3.5% of the tokens required by standard SFT, and improves medical benchmark accuracy by up to 14.5%.
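The explore-then-execute split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `explore`, `select_plan`, `explore_then_execute`, `toy_sample_plan`, and `toy_execute_plan` are hypothetical, the two toy functions stand in for a real model, and choosing the most frequently sampled plan is an assumption (the abstract does not say how a plan is selected). The sketch only shows the cost structure: K short plans are sampled stochastically, and a single full solution is decoded deterministically.

```python
import random
from collections import Counter

# Hypothetical candidate strategies a planner might propose for "sum 1..100".
PLANS = ["closed form n(n+1)/2", "add the terms one by one"]


def toy_sample_plan(question, rng):
    """Stand-in for stochastic plan sampling (assumption: not a real model)."""
    return rng.choices(PLANS, weights=[3, 1])[0]


def toy_execute_plan(question, plan):
    """Stand-in for deterministic execution that follows the chosen plan."""
    n = 100
    if plan.startswith("closed form"):
        return n * (n + 1) // 2
    return sum(range(1, n + 1))


def explore(sample_plan, question, k, rng):
    """Exploration phase: draw k short candidate plans."""
    return [sample_plan(question, rng) for _ in range(k)]


def select_plan(plans):
    """Assumed selection rule: keep the most frequently proposed plan."""
    return Counter(plans).most_common(1)[0][0]


def explore_then_execute(question, sample_plan, execute_plan, k=32, seed=0):
    """Sample k plans, pick one, and run execution exactly once."""
    rng = random.Random(seed)
    plans = explore(sample_plan, question, k, rng)
    chosen = select_plan(plans)
    return chosen, execute_plan(question, chosen)


if __name__ == "__main__":
    plan, answer = explore_then_execute(
        "What is 1 + 2 + ... + 100?", toy_sample_plan, toy_execute_plan
    )
    print(plan, answer)
```

In this sketch, raising `k` adds only short plan samples, while the expensive execution step still runs once, which is the trade-off the abstract points to when it directs extra inference compute to exploration.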
