CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.
What makes a good reasoning chain? uncovering structural patterns in long chain-of- thought reasoning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.AI 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
LLM planning in four-in-a-row is myopic: move choices match a shallow model that ignores deep nodes expanded in reasoning traces.
citing papers explorer
-
CLORE: Content-Level Optimization for Reasoning Efficiency
CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.
-
Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning
LLM planning in four-in-a-row is myopic: move choices match a shallow model that ignores deep nodes expanded in reasoning traces.