Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
arXiv preprint arXiv:2502.11569
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
  Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
- BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration
  BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.
- Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models
  Evaluations of 53 LLMs on 14 basic math tasks show reasoning models use ~18x more tokens with sometimes lower accuracy, non-monotonic gains from extended budgets, and sharp performance drops under token constraints.
- When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
  AloLab, an iterative meta-agent prompt optimizer, raises structured output accuracy for 7-9B models from 0% to 84-87% on GSM8K while preserving near-native inference speed.
- Quantum-Inspired Trace-Augmented Evidence Selection for Reasoning over Structured Hypothesis Spaces
  EP-HUBO treats CoT evidence selection as higher-order unconstrained binary optimization over per-hypothesis pools with quality weights to improve aggregation on legal benchmarks.
- DeepPrune: Parallel Scaling without Inter-trace Redundancy
  DeepPrune prunes redundant parallel CoT traces via a judge model for equivalence prediction from partial traces plus online greedy clustering, delivering 65-88% token savings with accuracy within 3 points on AIME and GPQA benchmarks.
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
  A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
- SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance
  SEF-CLGC with SLMs trained on natural and symbolic languages achieves 27.80% content score while lowering content bias on SemEval-2026 Task 11 Subtask 1.