Distilling and retrieving reusable reasoning skills lets LLMs solve coding and math problems with fewer tokens and higher accuracy.
arXiv preprint arXiv:2506.08343
7 Pith papers cite this work.
citing papers explorer
- Thinking with Reasoning Skills: Fewer Tokens, More Accuracy
  Distilling and retrieving reusable reasoning skills lets LLMs solve coding and math problems with fewer tokens and higher accuracy.
- LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals
  LLM mathematical reasoning forms ordered, step-specific trajectories in representation space that already exist in base models, diverge for correct vs. incorrect solutions at late stages, and support both correctness prediction (ROC-AUC 0.87) and trajectory-based steering.
- Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs
  Graph-based pruning of redundant reflections in LLM chain-of-thought reduces average reasoning tokens by 42% while preserving or improving accuracy.
- CRISP: Compressed Reasoning via Iterative Self-Policy Distillation
  CRISP achieves 57-59% token reduction on MATH-500 with 9-16 point accuracy gains on Qwen3 models via iterative self-distillation of concise reasoning behavior.
- Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
  Extra-CoT trains a semantic compressor on math CoT data, applies mixed-ratio SFT, and uses CHRPO reinforcement learning to achieve over 73% token reduction on MATH-500 with 0.6% accuracy gain on Qwen3-1.7B.
- Input-Time Scaling: Adding Noise and Irrelevance into Less-Is-More Drastically Improves Reasoning Performance and Efficiency
  Adding controlled noise and irrelevant persona contexts across training and testing stages for strong LLMs yields better reasoning and efficiency than high-quality data alone, reaching 76.7% on AIME24/25 with Qwen2.5-32B.
- When Is Thinking Enough? Early Exit via Sufficiency Assessment for Efficient Reasoning
  DTSR enables large reasoning models to dynamically assess chain-of-thought sufficiency via reflection signals and a sufficiency check, reducing reasoning length by 28.9-34.9% with minimal performance loss on Qwen3 models.