Efficient inference for large reasoning models: A survey
3 Pith papers cite this work.
years: 2025 (3)
roles: background (1)
polarities: background (1)
Citing papers
- DeepPrune: Parallel Scaling without Inter-trace Redundancy
  DeepPrune prunes redundant parallel CoT traces via a judge model for equivalence prediction from partial traces plus online greedy clustering, delivering 65-88% token savings with accuracy within 3 points on AIME and GPQA benchmarks.
- Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness
  Thinking LLMs achieve ~10 percentage points higher accuracy than non-thinking ones on RewardBench with under 2x compute overhead, outperforming augmentation strategies that cost over 8x more while also showing better bias robustness.
- Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
  LCPO reduces average LRM output length by over 50% across benchmarks via targeted preference optimization while preserving reasoning performance.