Towards reasoning ability of small language models · 2025 · arXiv preprint arXiv:2502.11569

8 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 1

citation-polarity summary

background: 1

representative citing papers

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

cs.AI · 2026-05-07 · conditional · novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that combines hybrid candidate selection with constraint-aware calibration, matching or outperforming prior methods on the WordNet, DBLP, and SemEval-Sci benchmarks.

Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models

cs.CL · 2025-07-05 · conditional · novelty 7.0

Evaluations of 53 LLMs on 14 basic math tasks show that reasoning models use ~18x more tokens, sometimes with lower accuracy; that gains from extended token budgets are non-monotonic; and that performance drops sharply under token constraints.

When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models

cs.CL · 2026-05-04 · conditional · novelty 6.0

AloLab, an iterative meta-agent prompt optimizer, raises the structured-output accuracy of 7-9B models on GSM8K from 0% to 84-87% while preserving near-native inference speed.

Quantum-Inspired Trace-Augmented Evidence Selection for Reasoning over Structured Hypothesis Spaces

cs.AI · 2026-06-05 · unverdicted · novelty 5.0

EP-HUBO frames chain-of-thought (CoT) evidence selection as a higher-order unconstrained binary optimization (HUBO) problem over quality-weighted per-hypothesis evidence pools to improve answer aggregation on legal benchmarks.

DeepPrune: Parallel Scaling without Inter-trace Redundancy

cs.CL · 2025-10-09 · conditional · novelty 5.0

DeepPrune prunes redundant parallel CoT traces by combining a judge model that predicts trace equivalence from partial traces with online greedy clustering, delivering 65-88% token savings while keeping accuracy within 3 points on the AIME and GPQA benchmarks.

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

cs.CL · 2025-03-20 · accept · novelty 5.0

A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.

SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance

cs.CL · 2026-06-08 · unverdicted · novelty 2.0

SEF-CLGC, using small language models (SLMs) trained on natural and symbolic languages, achieves a 27.80% content score while reducing content bias on SemEval-2026 Task 11 Subtask 1.

citing papers explorer

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost cs.AI · 2026-05-07 · conditional · none · ref 94
BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration cs.CL · 2026-04-03 · unverdicted · none · ref 30
Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models cs.CL · 2025-07-05 · conditional · none · ref 22
When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models cs.CL · 2026-05-04 · conditional · none · ref 19
Quantum-Inspired Trace-Augmented Evidence Selection for Reasoning over Structured Hypothesis Spaces cs.AI · 2026-06-05 · unverdicted · none · ref 15
DeepPrune: Parallel Scaling without Inter-trace Redundancy cs.CL · 2025-10-09 · conditional · none · ref 9
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models cs.CL · 2025-03-20 · accept · none · ref 162
SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance cs.CL · 2026-06-08 · unverdicted · none · ref 17
