Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

Alex Lamb; Dianbo Liu; Kaicheng Yang; Kaisen Yang; Qinwei Ma; Rushi Shah; Tinghe Zhang


Many LLMs plan before they act, yet planning and execution are often still entangled in one long generation trace, enforced only through prompts, or split across separate components. We argue that the two stages call for different kinds of computation: planning benefits from diversity and breadth, whereas execution demands precision and faithful adherence to a chosen strategy. Treating them as a single undifferentiated chain wastes tokens on routine derivation and makes it costly to explore alternative strategies at test time. We present the Explore-Execute Chain (E²C), which keeps both stages in one model but separates them structurally: a stochastic Exploration phase drafts a concise high-level plan, and a deterministic Execution phase carries it out. Causal SFT and RL train this split so that exploration stays informative and execution remains faithful to the plan. Because plans are short yet decisive, extra inference compute can be directed to exploration rather than to repeatedly decoding full solutions. On AIME 2024 at K=32, E²C-ReAct Loop reaches 53.3% accuracy with only 12.4k tokens, outperforming Tree-of-Thoughts (N=32: 50.0% with 71.3k tokens) while using less than one-fifth of its token budget. The same structure also supports lightweight domain adaptation: Exploration-Focused SFT (EF-SFT) updates only the planning phase, uses 3.5% of the tokens required by standard SFT, and improves medical benchmark accuracy by up to 14.5%.
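To make the compute argument concrete, here is a minimal sketch of the explore-then-execute pattern, assuming a toy setting rather than the authors' method. The names `explore_then_execute`, `toy_explore`, `toy_execute`, and `STRATEGIES` are hypothetical. "Plans" are just strategy names for summing a list. Exploration samples K short plans stochastically; a single plan is then executed deterministically, so the extra budget goes into cheap drafts instead of K full solutions. Picking the most frequent plan is an assumed selection rule. The E²C-ReAct Loop may select or refine plans differently.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interfaces for the two phases (not the paper's API):
#   explore(problem, rng)  -> a short high-level plan, sampled stochastically
#   execute(problem, plan) -> an answer, computed deterministically from the plan
Explore = Callable[[List[int], random.Random], str]
Execute = Callable[[List[int], str], int]


def explore_then_execute(problem: List[int], explore: Explore, execute: Execute,
                         k: int, seed: int = 0) -> Tuple[int, str, List[str]]:
    """Sample k cheap plans, keep the most frequent one, execute it once.

    Choosing the most frequent plan is an assumed selection rule for this sketch.
    """
    rng = random.Random(seed)
    plans = [explore(problem, rng) for _ in range(k)]  # breadth: k short drafts
    chosen, _ = Counter(plans).most_common(1)[0]      # commit to one strategy
    return execute(problem, chosen), chosen, plans    # precision: one full decode


# Toy stand-ins: plans are strategy names, and execution applies them faithfully.
STRATEGIES = {
    "left-to-right": lambda xs: sum(xs),
    "right-to-left": lambda xs: sum(reversed(xs)),
    "largest-first": lambda xs: sum(sorted(xs, reverse=True)),
}


def toy_explore(problem: List[int], rng: random.Random) -> str:
    # Stochastic exploration: draw one strategy name at random.
    return rng.choice(sorted(STRATEGIES))


def toy_execute(problem: List[int], plan: str) -> int:
    # Deterministic execution: the same plan always yields the same answer.
    return STRATEGIES[plan](problem)


answer, plan, drafts = explore_then_execute([3, 4, 5], toy_explore, toy_execute, k=32)
print(answer, plan, len(drafts))
```

Every toy strategy is correct, so the printed answer is 12 whichever plan wins. In a real model, plans would differ in quality, and the selection step is where the extra exploration compute pays off.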
