Thinkless: A training-free inference-efficient method for reducing reasoning redundancy

doi: 10 · 2025 · arXiv 2505.15684

4 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

cs.AI · 2026-05-07 · conditional · novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

CRISP compresses chain-of-thought by 50-60% using intrinsic attention saliency from the termination token to prune redundancy while preserving accuracy on math tasks.

Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

cs.LG · 2025-12-11 · unverdicted · novelty 6.0

Using properties of positional embeddings, reasoning LLMs can be made to think, listen, and generate outputs asynchronously without any additional training, cutting time to first token to under 5 seconds.

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

cs.CL · 2025-03-20 · accept · novelty 5.0

A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.

citing papers explorer


Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
cs.AI · 2026-05-07 · conditional · none · ref 243
CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning
cs.CL · 2026-04-19 · unverdicted · none · ref 1
Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
cs.LG · 2025-12-11 · unverdicted · none · ref 8
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
cs.CL · 2025-03-20 · accept · none · ref 87
