Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs
Pith reviewed 2026-05-10 19:58 UTC · model grok-4.3
The pith
Converting linear chain-of-thought into graphs and pruning weak or repeated reflections cuts average reasoning tokens by 42 percent while keeping or raising accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that turning a linear chain-of-thought into a directed acyclic graph with explicit dependency edges, then applying branch-level pruning to remove weakly contributing reflection branches and depth-level pruning to remove late re-verifications, produces concise yet correct reasoning traces. These traces are distilled into the model via supervised fine-tuning on the pruned examples, direct preference optimization for shorter correct trajectories, and group relative policy optimization with a length penalty, yielding an average 42 percent reduction in reasoning tokens while accuracy stays the same or improves.
What carries the argument
The directed acyclic graph conversion of a linear CoT trace together with the dual pruning rules that remove low-impact reflection branches and late-stage re-verifications.
If this is right
- Reasoning models trained this way produce shorter traces on average while solving the same problems correctly.
- The three-stage training pipeline (SFT on pruned traces, DPO for brevity preference, GRPO with length penalty) jointly optimizes accuracy and token efficiency.
- Indiscriminate and repetitive reflection patterns can be systematically identified and removed using dependency structure in the graph.
- The same pruning logic applies across different reasoning domains where overthinking appears as extra reflection steps.
Where Pith is reading between the lines
- The same graph construction and pruning steps could be applied at inference time without retraining to shorten outputs on the fly.
- Explicit dependency graphs may help detect other forms of redundancy besides reflection, such as unnecessary intermediate calculations.
- If the pruning rules prove stable across domains, they could be used to create smaller specialized reasoning models that retain performance at lower compute cost.
Load-bearing premise
The graph conversion and pruning rules can always tell the difference between reflections that do not affect the final answer and reflections that are actually required for correctness on new problems.
What would settle it
Measuring error rates on a held-out set of reasoning problems after pruning and finding that the pruned model makes more mistakes than the original model on the same problems.
Figures
read the original abstract
Extending CoT through RL has been widely used to enhance the reasoning capabilities of LLMs. However, due to the sparsity of reward signals, it can also induce undesirable thinking patterns such as overthinking, i.e., generating redundant intermediate reasoning content. In this work, we argue that a major source of such redundancy is inefficient reflection, which often manifests in two problematic patterns: Indiscriminate Reflection, where the model performs broad, low-impact checks throughout reasoning, and Repetitive Reflection, where it repeatedly re-verifies an already established conclusion. To address this, we introduce a graph-based CoT optimization framework. Specifically, we convert each linear CoT into a directed acyclic graph (DAG) with explicit dependency edges, and design a dual pruning strategy: branch-level pruning removes weakly contributing reflection branches, while depth-level pruning eliminates late-stage re-verification. We distill this behavior via a three-stage pipeline: (1) SFT to initialize the policy on pruned concise traces, (2) DPO to prefer correct but less redundant trajectories, and (3) GRPO with length penalty to jointly optimize answer correctness and efficiency. Experiments show that our approach reduces the average reasoning tokens by 42\% while maintaining or improving accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that redundant reflections in LLM Chain-of-Thought reasoning—specifically indiscriminate and repetitive patterns—can be mitigated by converting linear CoT traces into DAGs with explicit dependency edges, applying a dual pruning strategy (branch-level removal of weakly contributing reflections and depth-level elimination of late re-verifications), and distilling the resulting concise traces via a three-stage pipeline of SFT, DPO, and GRPO with length penalty. Experiments are reported to achieve a 42% reduction in average reasoning tokens while maintaining or improving accuracy.
Significance. If the empirical results hold under rigorous validation, the work offers a structured, graph-based approach to improving reasoning efficiency in LLMs without sacrificing correctness, addressing overthinking induced by RL. The three-stage distillation pipeline and explicit separation of reflection patterns represent a clear methodological contribution. Credit is due for the reproducible pipeline design and the attempt to operationalize redundancy via DAG dependencies, though the absence of detailed experimental protocols limits immediate applicability.
major comments (3)
- [Abstract and §4] Abstract and §4: The central claim of a 42% token reduction with maintained or improved accuracy provides no information on the specific baselines used, the datasets and problem types evaluated, statistical significance tests, or whether pruning thresholds were tuned post-hoc on test data; this directly weakens support for the reported efficiency gains.
- [§3.2] §3.2: The dual pruning strategy defines branch-level pruning as removing 'weakly contributing reflection branches' and depth-level pruning as eliminating 'late-stage re-verification,' but supplies neither a formal definition of contribution/dependency nor explicit construction rules for the DAG edges; this is load-bearing for the claim that pruning removes only redundancy rather than necessary conditional verification steps.
- [§4.3] §4.3: No ablation studies are presented that apply the pruning rules to problems known to require iterative self-correction and then measure accuracy preservation on held-out instances; such results are required to substantiate that the subsequent SFT+DPO+GRPO stages do not optimize on incomplete traces.
minor comments (2)
- [Figure 1] Figure 1 (DAG conversion example) would benefit from explicit annotation of the dependency edges and pruning decisions to illustrate the branch-level and depth-level rules.
- [§2] The manuscript should include references to prior work on reflection detection and CoT compression (e.g., recent papers on overthinking in o1-style models) to better situate the dual pruning contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve clarity, formalization, and validation of the claims.
read point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4: The central claim of a 42% token reduction with maintained or improved accuracy provides no information on the specific baselines used, the datasets and problem types evaluated, statistical significance tests, or whether pruning thresholds were tuned post-hoc on test data; this directly weakens support for the reported efficiency gains.
Authors: We agree that the abstract and the summary presentation in §4 should be more self-contained to better support the central claim. In the revised manuscript, we will expand the abstract to explicitly list the baselines, the datasets and problem types used in evaluation, and the statistical significance tests applied. We will also add a clarification in §4 that pruning thresholds were selected via cross-validation on a development set, not tuned post-hoc on test data. These revisions will strengthen the empirical presentation without altering the reported results. revision: yes
-
Referee: [§3.2] §3.2: The dual pruning strategy defines branch-level pruning as removing 'weakly contributing reflection branches' and depth-level pruning as eliminating 'late-stage re-verification,' but supplies neither a formal definition of contribution/dependency nor explicit construction rules for the DAG edges; this is load-bearing for the claim that pruning removes only redundancy rather than necessary conditional verification steps.
Authors: This comment correctly identifies a need for greater formal rigor in §3.2. We will revise this section to include a formal definition of contribution (as the marginal effect on reasoning path validity when a reflection node is removed) and dependency (as logical or semantic precedence between CoT steps). We will also provide explicit, step-by-step rules for constructing the DAG edges from the linear trace. These additions will make clear that the pruning targets only redundant patterns and does not eliminate necessary conditional verifications. revision: yes
-
Referee: [§4.3] §4.3: No ablation studies are presented that apply the pruning rules to problems known to require iterative self-correction and then measure accuracy preservation on held-out instances; such results are required to substantiate that the subsequent SFT+DPO+GRPO stages do not optimize on incomplete traces.
Authors: We acknowledge that targeted ablations on self-correction-intensive problems would provide stronger evidence. Our existing evaluation covers a range of reasoning tasks that include iterative elements, but to directly address the concern we will add a dedicated ablation subsection in the revised §4.3. This will apply the pruning rules to a subset of problems known to require self-correction, report accuracy on held-out instances, and confirm that the three-stage distillation preserves correctness on the pruned traces. revision: yes
Circularity Check
No circularity: empirical pipeline validated by experiments
full rationale
The paper presents an applied method that converts linear CoT traces to DAGs, applies explicit branch-level and depth-level pruning rules, and then uses a three-stage distillation process (SFT on pruned traces, DPO for preference, GRPO with length penalty). The central result—a 42% average token reduction with maintained or improved accuracy—is reported as an experimental outcome on held-out data rather than a mathematical prediction or fitted quantity derived from the same inputs. No equations, self-definitional loops, or load-bearing self-citations appear in the derivation chain; the pruning criteria and training stages are described as independent design choices whose effectiveness is tested directly. This is a standard empirical engineering contribution whose claims remain falsifiable outside any internal fitting.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Reflection branches can be scored for contribution via dependency structure in the DAG
invented entities (1)
-
Dual pruning strategy on CoT DAG
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
convert each linear CoT into a directed acyclic graph (DAG) with explicit dependency edges, and design a dual pruning strategy: branch-level pruning removes weakly contributing reflection branches, while depth-level pruning eliminates late-stage re-verification
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
B(v) = |Desc(v)| ... d(v)/dmax > m ... m=0.9 and k=2
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
three-stage pipeline: (1) SFT ... (2) DPO ... (3) GRPO with length penalty
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Deepseek-r1: Incentivizing reasoning capa- bility in llms via reinforcement learning.Preprint, arXiv:2501.12948. Yunzhen Feng, Julia Kempe, Cheng Zhang, Parag Jain, and Anthony Hartshorn. 2025. What characterizes effective reasoning? revisiting length, review, and structure of cot.Preprint, arXiv:2509.19284. Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[2]
Wait, we don’t need to "wait"! removing think- ing tokens improves reasoning efficiency.Preprint, arXiv:2506.08343. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-thought prompting elic- its reasoning in large language models.Preprint, arXiv:2201.11903. Liang Wen, Yunke Cai, F...
-
[3]
•current_step: the current reasoning text segment (continuous content from the CoT)
Inputs •graph: an existing partial reasoning graph (Mermaid code). •current_step: the current reasoning text segment (continuous content from the CoT)
-
[4]
– Abstraction: donotrecord low-level arithmetic/symbolic manipulation
Node Definition (Reasoning Unit) • A node represents anabstract reasoning unitwith: – Semantic completeness: a clear intent and its reasoning product (e.g., introducing a constraint, deriving a conclusion, setting a sub-goal, establishing a framework). – Abstraction: donotrecord low-level arithmetic/symbolic manipulation. – Dependability: its product can ...
-
[5]
•Product generation: yields a key conclusion/property/constraint/equivalence used later
Independence CriteriaTreatcurrent_stepas a new reasoning unit (preferInsert) if it satisfies any of: •Goal introduction: introduces a new intermediate goal/subproblem. •Product generation: yields a key conclusion/property/constraint/equivalence used later. •Method switch: changes the reasoning strategy or framework. •Structural advancement: adds structure...
-
[6]
Update Operations • Merge: Allowed only if current_step can be integrated into exactly one existing node while keeping itabstractand single-purpose.Hard constraints: Review contentcannotbe merged into aprogressnode; Do not merge if it causes one node to mix “advance” and “reflect” as major functions. Must specifytarget_nodeandupdated_node_description. • I...
-
[7]
Edge Construction Rules •Dependency principle: if node B uses products from node A, add edgeA→Bwith a clear dependency label. • Branch origin principle: new attempts must branch fromstill-valid ancestor products, not from negated/dead-end nodes. •Ordering constraints: –Node IDs increase lexicographically: A–Z, AA–AZ, BA–BZ, . . . –Edges must go from lexic...
-
[8]
Output Format (Strict JSON) { "decision": "Insert or Merge", "target_node": "(If Merge, the node ID to merge into; if Insert, leave empty)", "new_node": { "id": "(If Insert, a unique ID; if Merge, leave empty)", "description": "(If Insert, a concise description of the new node; if Merge, leave empty)", "type": "progress or review" }, "edges": [ { "from": ...
-
[9]
= 2.4 − t/60. …… Let’s subtract equation (2) from equation (1): 9/s − 9/(s + 2) = 1.6, which gives s(s + 2) = 11.25. …… Solve the quadratic equation and discard the negative solution to get s = 2.5 km/h. Plug s = 2.5 back into equation (1) to find t = 24 minutes. Let me check this with equation (2) to ensure consistency: 9 / 4.5 = 2 hours, and 2.4 − 0.4 =...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.