ThinkProbe builds non-generative Thought Graphs from 4200 LLM traces across 7 models and 200 questions to extract 5D cognitive profiles, finding model-level stability in reasoning structure that exceeds domain effects in four dimensions.
arXiv preprint arXiv:2506.02532 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
DATG framework diagnoses that non-English reasoning in Qwen3 models shows reduced mathematical anchor coverage and dependency fidelity, with Loop-Retry and Formula-Retry improving target-language accuracy.
Unsupervised clustering on sentence-initial 3-token pivots extracts 7 universal reasoning operators from 44k traces across 12 LLMs that enable model fingerprinting and answer-correctness prediction.
citing papers explorer
-
ThinkProbe: Beyond Accuracy -- Structural Profiling of Open-Ended LLM Reasoning Traces via Non-Generative Thought Graphs
ThinkProbe builds non-generative Thought Graphs from 4200 LLM traces across 7 models and 200 questions to extract 5D cognitive profiles, finding model-level stability in reasoning structure that exceeds domain effects in four dimensions.
-
Beyond Input Understanding: Diagnosing Multilingual Mathematical Reasoning with Directed Acyclic Trace Graphs
DATG framework diagnoses that non-English reasoning in Qwen3 models shows reduced mathematical anchor coverage and dependency fidelity, with Loop-Retry and Formula-Retry improving target-language accuracy.
-
ReasonOps: Operator Segmentation for LLM Reasoning Traces
Unsupervised clustering on sentence-initial 3-token pivots extracts 7 universal reasoning operators from 44k traces across 12 LLMs that enable model fingerprinting and answer-correctness prediction.