Slot Machines: How LLMs Keep Track of Multiple Entities
Pith reviewed 2026-05-09 23:44 UTC · model grok-4.3
The pith
Language models encode current-entity and prior-entity information in separate orthogonal slots within single token activations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Information about the currently described entity and the immediately preceding one is encoded in separate and largely orthogonal current-entity and prior-entity slots. The current-entity slot is used for explicit factual retrieval, whereas the prior-entity slot supports relational inferences such as entity-level induction and conflict detection between adjacent entities. Only the current-entity slot is consulted for factual questions even when answers are linearly decodable from the prior-entity slot as well. Open-weight models perform near chance on syntax that requires two subject-verb-object bindings on a single token, while recent frontier models succeed at the same task.
What carries the argument
Multi-slot probing that disentangles a single token's residual stream activation into current-entity and prior-entity slots.
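The review does not reproduce the paper's probing setup, but the core idea is easy to illustrate. Below is a minimal Python sketch of two-slot linear probing; the synthetic activations, dimensions, and probe choice are placeholders, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for residual stream activations at one token position:
# n examples of dimension d, each labeled with the identity of the
# currently described entity and of the immediately preceding one.
n, d, n_entities = 2000, 256, 8
acts = rng.normal(size=(n, d))
current_labels = rng.integers(0, n_entities, size=n)
prior_labels = rng.integers(0, n_entities, size=n)

# Fit one linear probe per slot on the same activations.
current_probe = LogisticRegression(max_iter=1000).fit(acts, current_labels)
prior_probe = LogisticRegression(max_iter=1000).fit(acts, prior_labels)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# If the two kinds of information occupy largely orthogonal subspaces,
# the two probes' weight vectors should have near-zero cosine similarity.
sims = [cos(wc, wp) for wc in current_probe.coef_ for wp in prior_probe.coef_]
print(f"mean |cos| between slot directions: {np.mean(np.abs(sims)):.3f}")
```

On activations extracted from a real model, near-zero cosine similarity between the two probes' weight matrices is the signature the 'slots' interpretation predicts.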
If this is right
- The prior-entity slot enables relational tasks such as answering who came after a given character in a story.
- Factual questions continue to ignore information available in the prior-entity slot.
- Syntax that forces two full entity bindings onto one token exceeds the capacity of most current models (a concrete construction is sketched after this list).
- The slot structure offers a substrate for behaviors that require holding two perspectives simultaneously.
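To make the double-binding construction in the third bullet concrete, here is a hedged sketch of how such a probe set could be generated and scored. The name, verb, and object inventories and the `ask_model` stub are invented for illustration:

```python
import random

# Invented inventories in the spirit of the paper's example
# ("Alice prepares and Bob consumes food."): the tokens at the end of
# the sentence must carry two subject-verb-object bindings at once.
FRAMES = [
    (("prepares", "consumes"), "food"),
    (("writes", "edits"), "the report"),
    (("packs", "ships"), "the box"),
]
NAMES = ["Alice", "Bob", "Carol", "Dave"]

def make_item(rng):
    a, b = rng.sample(NAMES, 2)
    (v1, v2), obj = rng.choice(FRAMES)
    sentence = f"{a} {v1} and {b} {v2} {obj}."
    question = f"Who {v2} {obj}?"
    return sentence, question, b  # b is the second-binding answer

def ask_model(prompt, rng):
    # Stub so the script runs end to end; swap in a real model call
    # (local generation or an API) to measure actual accuracy.
    return rng.choice(NAMES)

rng = random.Random(0)
items = [make_item(rng) for _ in range(200)]
correct = sum(
    gold.lower() in ask_model(f"{s}\n{q}", rng).lower()
    for s, q, gold in items
)
print(f"double-binding accuracy: {correct / len(items):.2f} (stub chance is 0.25)")
```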
Where Pith is reading between the lines
- Architectures that allow flexible access to both slots at once might improve performance on multi-entity reasoning tasks.
- The same separation could be probed in other contexts where models must maintain dual views, such as consistency checking across a dialogue.
- Frontier models' success on the double-binding syntax suggests they may have begun to develop additional binding mechanisms beyond the two-slot pattern.
Load-bearing premise
The probing method isolates information the model actually uses rather than directions that merely happen to align with entity distinctions in the chosen datasets.
What would settle it
An experiment in which intervening on the prior-entity slot changes accuracy on explicit factual retrieval questions, or in which open-weight models succeed at double subject-verb-object syntax while frontier models fail.
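One shape the slot intervention could take, an assumption on our part rather than the paper's procedure, is to project a probe-derived prior-entity direction out of the residual stream with a forward hook and re-measure factual-retrieval accuracy. A minimal PyTorch sketch against a GPT-2-style Hugging Face model; the layer index and the direction `v_prior` are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # placeholder choice of decoder-only model
LAYER = 6        # hypothetical layer where the slots were probed

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

v_prior = torch.randn(model.config.hidden_size)  # stand-in: use the
v_prior = v_prior / v_prior.norm()               # probe-derived direction

def project_out(module, inputs, output):
    # Remove the component along v_prior from the block's hidden states,
    # leaving the (nearly orthogonal) current-entity component intact.
    h = output[0] if isinstance(output, tuple) else output
    h = h - (h @ v_prior).unsqueeze(-1) * v_prior
    return (h,) + output[1:] if isinstance(output, tuple) else h

handle = model.transformer.h[LAYER].register_forward_hook(project_out)

prompt = ("Alice is tall. Bob is short. "
          "Question: What is the tall entity's name? Answer:")
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][ids["input_ids"].shape[1]:]))
handle.remove()
```

If the paper's claim holds, ablating `v_prior` this way should leave explicit factual retrieval unchanged while degrading relational queries such as entity-level induction.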
Original abstract
Language models must bind entities to the attributes they possess and maintain several such binding relationships within a context. We study how multiple entities are represented across token positions and whether single tokens can carry bindings for more than one entity. We introduce a multi-slot probing approach that disentangles a single token's residual stream activation to recover information about both the currently described entity and the immediately preceding one. These two kinds of information are encoded in separate and largely orthogonal "current-entity" and "prior-entity" slots. We analyze the functional roles of these slots and find that they serve different purposes. In tandem with the current-entity slot, the prior-entity slot supports relational inferences, such as entity-level induction ("who came after Alice in the story?") and conflict detection between adjacent entities. However, only the current-entity slot is used for explicit factual retrieval questions ("Is anyone in the story tall?" "What is the tall entity's name?") despite these answers being linearly decodable from the prior-entity slot too. Consistent with this limitation, open-weight models perform near chance accuracy at processing syntax that forces two subject-verb-object bindings on a single token (e.g., "Alice prepares and Bob consumes food."). Interestingly, recent frontier models can parse this properly, suggesting they may have developed more sophisticated binding strategies. Overall, our results expose a gap between information that is available in activations and information the model actually uses, and suggest that the current/prior-entity slot structure is a natural substrate for behaviors that require holding two perspectives at once, such as sycophancy and deception.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a multi-slot probing method to disentangle residual stream activations at individual tokens into two largely orthogonal directions: a 'current-entity' slot carrying information about the entity being described at that position and a 'prior-entity' slot for the immediately preceding entity. Through probing and behavioral experiments on held-out inputs, it claims these slots serve distinct functional roles—current-entity for explicit factual retrieval questions, and both slots together for relational inferences such as entity induction and conflict detection—while also explaining why most models fail at double-binding syntax (e.g., 'Alice prepares and Bob consumes food') but frontier models succeed. The work highlights a gap between linearly decodable information and information the model functionally uses.
Significance. If the central claims hold, the results offer a concrete mechanistic account of entity tracking and binding in transformers, with direct relevance to understanding limitations in multi-entity reasoning, relational inference, and phenomena such as sycophancy. The distinction between availability and functional use of information is a valuable framing, and the observation that recent frontier models handle double-binding syntax better suggests an evolving capacity that could be tracked over model generations. The probing approach itself may generalize to other binding problems.
Major comments (3)
- [§5] §5 (functional roles experiments): The claim that 'only the current-entity slot is used for explicit factual retrieval questions' despite linear decodability from the prior slot rests on higher probe accuracy for the current direction and near-chance model performance on double-binding syntax. This is correlational; without an intervention that selectively perturbs or ablates the prior-entity direction (while preserving the current direction) and demonstrates no change in factual retrieval accuracy, the functional non-use conclusion remains unestablished.
- [§3] §3 (multi-slot probing method): The orthogonality and separation of current- and prior-entity directions are demonstrated via linear probes, but the manuscript does not report controls for whether these directions generalize beyond the specific entity-attribute datasets used or whether they capture functional routing rather than dataset-specific correlations. Additional cross-dataset probe transfer results or synthetic controls would be needed to support the 'slots' interpretation.
- [§4.3] §4.3 (double-binding syntax tests): The near-chance performance on constructions forcing two SVO bindings on one token is presented as consistent with the slot limitation, but the paper does not quantify how much of the failure is attributable to the prior slot being inaccessible versus other factors such as attention patterns or training data distribution. A breakdown by model scale and error type would clarify the link to the slot hypothesis.
Minor comments (3)
- [Figure 3] Figure 3 and associated text: The visualization of slot orthogonality would benefit from reporting the full distribution of cosine similarities across layers and positions rather than selected examples, to allow readers to assess robustness (a possible tabulation is sketched after these comments).
- [Methods] Methods section: Data exclusion criteria and the exact number of examples per condition are not fully specified; including these details (or a link to the dataset) would improve reproducibility.
- Notation: The terms 'current-entity slot' and 'prior-entity slot' are used interchangeably with 'directions' in some places; consistent terminology would reduce ambiguity when discussing functional roles versus representational geometry.
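The tabulation the Figure 3 comment asks for could be as simple as percentiles of cosine similarity per layer; a sketch in which random placeholder values stand in for the measured similarities:

```python
import numpy as np

# Hypothetical: cos_sims[layer] holds cosine similarities between the
# current- and prior-entity probe directions across probed positions
# for that layer (random placeholder values here).
rng = np.random.default_rng(1)
cos_sims = {layer: rng.uniform(-0.2, 0.2, size=200) for layer in range(12)}

print(f"{'layer':>5} {'p5':>7} {'median':>7} {'p95':>7}")
for layer, sims in cos_sims.items():
    p5, med, p95 = np.percentile(sims, [5, 50, 95])
    print(f"{layer:>5} {p5:>7.3f} {med:>7.3f} {p95:>7.3f}")
```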
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments identify valuable opportunities to strengthen the evidence for our claims about the functional roles of the entity slots. We address each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: §5 (functional roles experiments): The claim that 'only the current-entity slot is used for explicit factual retrieval questions' despite linear decodability from the prior slot rests on higher probe accuracy for the current direction and near-chance model performance on double-binding syntax. This is correlational; without an intervention that selectively perturbs or ablates the prior-entity direction (while preserving the current direction) and demonstrates no change in factual retrieval accuracy, the functional non-use conclusion remains unestablished.
Authors: We agree that the evidence presented is correlational and that selective interventions would provide stronger causal support for the conclusion that the prior-entity slot is not functionally used for explicit factual retrieval. Our current argument combines higher probe accuracy on the current direction with near-chance behavioral performance on double-binding syntax. In the revised manuscript we will add an explicit limitations subsection in the discussion that acknowledges this gap and outlines feasible future interventions (e.g., activation steering or direction-specific ablation). We will also report more granular per-direction probe accuracies to better quantify the observed disparity. (Revision: partial)
- Referee: §3 (multi-slot probing method): The orthogonality and separation of current- and prior-entity directions are demonstrated via linear probes, but the manuscript does not report controls for whether these directions generalize beyond the specific entity-attribute datasets used or whether they capture functional routing rather than dataset-specific correlations. Additional cross-dataset probe transfer results or synthetic controls would be needed to support the 'slots' interpretation.
Authors: We appreciate the call for stronger controls on generalization. In the revision we will add cross-dataset probe transfer results, including experiments on a new synthetic dataset with procedurally generated entities and attributes. These results will be reported alongside the original findings to demonstrate that the orthogonal directions are not artifacts of the particular entity-attribute corpus and instead reflect a more general routing mechanism. (Revision: yes)
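A transfer check of the kind promised here amounts to fitting the probe on one corpus and scoring it, without refitting, on another. A minimal sketch with placeholder data; real use would substitute activations extracted from the two corpora:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fake_activations(n, d=128, n_entities=6, seed=0):
    # Stand-in for residual stream activations labeled by entity;
    # replace with activations from the original and synthetic corpora.
    r = np.random.default_rng(seed)
    return r.normal(size=(n, d)), r.integers(0, n_entities, size=n)

X_a, y_a = fake_activations(1000, seed=0)   # original dataset
X_b, y_b = fake_activations(1000, seed=1)   # held-out synthetic dataset

probe = LogisticRegression(max_iter=1000).fit(X_a, y_a)
print(f"in-domain accuracy: {probe.score(X_a, y_a):.2f}")
print(f"transfer accuracy:  {probe.score(X_b, y_b):.2f}")
# High transfer accuracy supports a general slot direction; a large
# in-domain/transfer gap points to dataset-specific probe artifacts.
```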
- Referee: §4.3 (double-binding syntax tests): The near-chance performance on constructions forcing two SVO bindings on one token is presented as consistent with the slot limitation, but the paper does not quantify how much of the failure is attributable to the prior slot being inaccessible versus other factors such as attention patterns or training data distribution. A breakdown by model scale and error type would clarify the link to the slot hypothesis.
Authors: We concur that a finer-grained error analysis would help isolate the contribution of the slot limitation. The revised §4.3 will include a breakdown of accuracy by model scale and by error category (e.g., failure to bind the second subject versus attribute misassignment). We will also add a qualitative comparison of attention patterns across successful and failing cases to assess whether attention dynamics provide an independent explanation, while noting that the consistent pattern across scales remains most parsimoniously explained by the two-slot capacity. (Revision: yes)
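The promised breakdown is a straightforward tabulation over per-example evaluation records; a sketch in which the record fields and error labels are invented for illustration:

```python
from collections import Counter

# Hypothetical per-example records: model scale plus an outcome label
# such as "correct", "second_subject", or "attr_misassigned".
records = [
    {"scale": "7B", "outcome": "second_subject"},
    {"scale": "7B", "outcome": "correct"},
    {"scale": "70B", "outcome": "attr_misassigned"},
    {"scale": "70B", "outcome": "correct"},
]

by_scale = {}
for r in records:
    by_scale.setdefault(r["scale"], Counter())[r["outcome"]] += 1

for scale, counts in by_scale.items():
    total = sum(counts.values())
    errors = {k: v for k, v in counts.items() if k != "correct"}
    print(f"{scale}: accuracy={counts['correct'] / total:.2f}, errors={errors}")
```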
Circularity Check
No significant circularity in empirical probing and behavioral analysis
Full rationale
The paper's claims rest on direct multi-slot linear probing of residual stream activations and accuracy measurements on held-out behavioral tasks (factual retrieval, relational inference, double-binding syntax). These are experimental observations of decodability and performance differentials, not derivations that reduce by construction to fitted parameters renamed as predictions, self-definitional equations, or load-bearing self-citations. No mathematical chain equates outputs to inputs; the gap between linear decodability and functional use is evidenced by task-specific results rather than assumed.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Linear directions in residual stream activations can be isolated via probing to recover distinct entity representations.