Reasoning Structure of Large Language Models

Fabian Farestam; Frédéric Berdoz; Luca A. Lanzendörfer; Roger Wattenhofer

arxiv: 2606.03883 · v1 · pith:CZPCDEZNnew · submitted 2026-06-02 · 💻 cs.AI · cs.LG

keywords reasoning, models, accuracy, count, large, metrics, token, address

Large reasoning models (LRMs) are often evaluated using metrics such as final-answer accuracy or token count. However, identical scores on these metrics can hide fundamentally different reasoning structures. To address this limitation, we introduce a scalable LRM benchmark of logic puzzles and a pipeline that converts unstructured traces into verifiable reasoning graphs of claims and dependencies. This turns reasoning into a structured, measurable object whose topology can be quantitatively analyzed. Building on this, we define a reasoning efficiency metric that quantifies how concentrated the model's logical flow is. Our analysis on open-source reasoning models shows that structural measurements separate behaviors that token count and accuracy conflate, providing a practical tool for diagnosing failure modes and comparing how reasoning scales with puzzle difficulty.
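To make the idea of a reasoning graph of claims and dependencies concrete, here is a minimal sketch. It is not the paper's pipeline or its efficiency metric, neither of which is specified in the abstract. The claim names, the `deps` mapping, and the `used_fraction` score (the share of claims that the final answer transitively depends on) are all assumptions chosen only for illustration.

```python
def ancestors(deps, target):
    """Return the target plus every claim it transitively depends on."""
    seen = {target}
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in deps.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def used_fraction(deps, final_claim):
    """Hypothetical score: fraction of all claims that feed the final claim.

    This is an illustrative stand-in, NOT the metric defined in the paper.
    """
    all_claims = set(deps) | {final_claim}
    for parents in deps.values():
        all_claims.update(parents)
    return len(ancestors(deps, final_claim)) / len(all_claims)


# Hypothetical trace: each claim maps to the claims it depends on.
deps = {
    "c1": [],
    "c2": [],
    "c3": ["c1", "c2"],
    "c4": ["c1"],        # derived but never used by the answer
    "answer": ["c3"],
}

score = used_fraction(deps, "answer")
```

Under this toy definition, two traces with the same final answer and similar length can still score differently if one of them spends many claims on branches that never reach the answer, which is the kind of distinction the abstract says accuracy and token count conflate.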
