pith. machine review for the scientific record.

arxiv: 2504.10368 · v4 · submitted 2025-04-14 · 💻 cs.CL · cs.AI

Recognition: unknown

Exploring the System 1 Thinking Capability of Large Reasoning Models

classification 💻 cs.CL cs.AI
keywords: reasoning · system · LRMs · capability · difficulty · models · thinking · ability
abstract

This paper explores the system 1 thinking capability of Large Reasoning Models (LRMs): the intuitive ability to respond efficiently with minimal token usage. While existing LRMs rely on long-chain reasoning and excel at complex tasks, their system 1 thinking ability remains largely underexplored. This capability matters because it reflects a model's difficulty awareness and reasoning efficiency, both critical for real-world applications. We propose S1-Bench, a multi-domain, multilingual benchmark comprising system 1 questions that are simple for models. Our investigation of 28 LRMs reveals both reduced accuracy and inefficiency on system 1 problems. We find that existing efficient reasoning methods either generalize poorly to simple questions or sacrifice performance for efficiency. Further exploration uncovers LRMs' early difficulty awareness, accompanied by lower confidence, and shows that problem difficulty is implicitly encoded in their hidden states.
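The claim that difficulty is "implicitly encoded in hidden states" is typically tested with a linear probe: a simple classifier trained to predict a label (here, easy vs. hard) from hidden-state vectors. The sketch below is a hypothetical illustration, not the paper's method — it uses synthetic stand-in vectors rather than real LRM activations, and the separation direction, dimensions, and hyperparameters are all assumptions for demonstration.

```python
import numpy as np

# Hypothetical linear probe: can "difficulty" be decoded linearly from
# hidden-state vectors? Real vectors would come from an LRM's layers;
# here they are synthetic stand-ins with an assumed difficulty direction.
rng = np.random.default_rng(0)
dim, n = 64, 400

# Assumption: easy vs. hard questions shift the hidden state along one direction.
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
easy = rng.normal(size=(n, dim)) + 2.5 * direction
hard = rng.normal(size=(n, dim)) - 2.5 * direction

X = np.vstack([easy, hard])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = easy, 0 = hard

# Shuffle, then split into train/test.
idx = rng.permutation(2 * n)
X, y = X[idx], y[idx]
X_tr, X_te, y_tr, y_te = X[:600], X[600:], y[:600], y[600:]

# Fit a logistic-regression probe with plain gradient descent.
w = np.zeros(dim)
b = 0.0
for _ in range(500):
    z = np.clip(X_tr @ w + b, -30, 30)  # clip to avoid exp overflow
    p = 1 / (1 + np.exp(-z))
    w -= 0.5 * (X_tr.T @ (p - y_tr)) / len(y_tr)
    b -= 0.5 * np.mean(p - y_tr)

# High test accuracy indicates the label is linearly decodable.
acc = np.mean(((X_te @ w + b) > 0) == (y_te == 1))
print(f"probe accuracy: {acc:.2f}")
```

On synthetic data with a genuine linear difficulty direction the probe recovers the label well; on real activations, probe accuracy across layers is what would support (or undermine) the encoding claim.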

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

    cs.AI 2026-05 conditional novelty 7.0

    Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

  2. Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

    cs.AI 2026-05 unverdicted novelty 6.0

    CoRD uses collaborative multi-teacher step-wise decoding with perplexity-guided beam search to generate higher-quality Long-CoT data that lets smaller models reach near-teacher performance with less supervision.

  3. Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

    cs.CL 2025-03 accept novelty 5.0

    A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.