Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
Thinkless: A training-free inference- efficient method for reducing reasoning redundancy
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
CRISP compresses chain-of-thought by 50-60% using intrinsic attention saliency from the termination token to prune redundancy while preserving accuracy on math tasks.
Using properties of positional embeddings, reasoning LLMs can be made to think, listen, and generate outputs asynchronously without any additional training, cutting time to first token to under 5 seconds.
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
citing papers explorer
-
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
-
CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning
CRISP compresses chain-of-thought by 50-60% using intrinsic attention saliency from the termination token to prune redundancy while preserving accuracy on math tasks.
-
Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
Using properties of positional embeddings, reasoning LLMs can be made to think, listen, and generate outputs asynchronously without any additional training, cutting time to first token to under 5 seconds.
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.