DeepSeek-R1 Thoughtology: Let's think about LLM reasoning

URL: https://arxiv · 2025 · arXiv 2504.07128

9 Pith papers cite this work.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

cs.AI · 2026-04-27 · unverdicted · novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning

cs.AI · 2025-11-04 · unverdicted · novelty 7.0

DecompSR is a large, symbolically verified benchmark dataset and generation framework that independently varies productivity, substitutivity, overgeneralisation, and systematicity to probe compositional multihop spatial reasoning in LLMs.

Reasoning Can Be Restored by Correcting a Few Decision Tokens

cs.AI · 2026-05-16 · conditional · novelty 6.0

Reasoning gaps between base LLMs and LRMs concentrate on ~8% of early planning tokens; intervening with the reasoning model only at high-disagreement positions recovers performance.

ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments

cs.CL · 2025-08-06 · unverdicted · novelty 6.0

ReasoningGuard is an inference-time method that uses attention mechanisms to inject safety aha moments and applies scaling sampling to defend large reasoning models against jailbreak attacks.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

cs.AI · 2025-06-07 · unverdicted · novelty 6.0

Large reasoning models (LRMs) suffer a complete accuracy collapse beyond certain puzzle complexities; their reasoning effort first rises and then declines as complexity grows, and they outperform standard LLMs only on medium-complexity tasks.

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Non-reasoning LLMs fail the equivalence class problem, while reasoning LLMs perform better but remain incomplete; difficulty peaks at the phase transition for the former and at maximum diameter for the latter.

From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.

Language Specific Knowledge: Do Models Know Better in X than in English?

cs.CL · 2025-05-21 · unverdicted · novelty 5.0

The paper introduces Language Specific Knowledge (LSK) and shows that selecting an optimal non-English language for a query can improve LLM performance on cultural and social norm datasets.

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

cs.CL · 2026-05-01

citing papers explorer


XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation cs.AI · 2026-04-27 · unverdicted · none · ref 20
DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning cs.AI · 2025-11-04 · unverdicted · none · ref 12
Reasoning Can Be Restored by Correcting a Few Decision Tokens cs.AI · 2026-05-16 · conditional · none · ref 19
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments cs.CL · 2025-08-06 · unverdicted · none · ref 11
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity cs.AI · 2025-06-07 · unverdicted · none · ref 32
How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem cs.AI · 2026-05-07 · unverdicted · none · ref 43
From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs cs.CV · 2026-05-04 · unverdicted · none · ref 45
Language Specific Knowledge: Do Models Know Better in X than in English? cs.CL · 2025-05-21 · unverdicted · none · ref 13
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models cs.CL · 2026-05-01 · unreviewed · ref 27
