Bag of tricks for inference-time computation of LLM reasoning
5 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
citing papers explorer
- Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
  Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
- Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models
  A black-box LLM approach for fault localization in system-level test code that estimates execution traces from failure logs to rank potential faults with reduced inference cost.
- Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness
  Thinking LLMs achieve ~10 percentage points higher accuracy than non-thinking ones on RewardBench with under 2x compute overhead, outperforming augmentation strategies that cost over 8x more while also showing better bias robustness.
- LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
  LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
  A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.