Toward a theory of generalizability in LLM mechanistic interpretability research. arXiv preprint arXiv:2509.22831

Sean Trott · arXiv 2509.22831

2 Pith papers cite this work.

representative citing papers

Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

Larger LLMs acquire basic situation modeling before mentalizing on false-belief tasks, with performance depending on size, training volume, and post-training, yet remaining sensitive to non-factive verbs and agent knowledge states.

Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning

cs.AI · 2026-06-11 · unverdicted · novelty 6.0

Humans and LLMs exhibit similar error patterns in common-sense reasoning, consistent with shared pattern-matching mechanisms rather than abstract world models.

citing papers explorer

Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models
cs.CL · 2026-06-26 · unverdicted · none · ref 41

Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning
cs.AI · 2026-06-11 · unverdicted · none · ref 24
