
arXiv: 2510.06182 · v2 · pith:IVM7WGIJ · submitted 2025-10-07 · 💻 cs.CL

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

classification 💻 cs.CL
keywords: mechanism, entities, bound, in-context, mechanisms, model, models, positional
Abstract

A key component of in-context reasoning is the ability of language models (LMs) to bind entities for later retrieval. For example, an LM might represent "Ann loves pie" by binding "Ann" to "pie", allowing it to later retrieve "Ann" when asked "Who loves pie?" Prior research on short lists of bound entities found strong evidence that LMs implement such retrieval via a positional mechanism, where "Ann" is retrieved based on its position in context. In this work, we find that this mechanism generalizes poorly to more complex settings; as the number of bound entities in context increases, the positional mechanism becomes noisy and unreliable in middle positions. To compensate for this, we find that LMs supplement the positional mechanism with a lexical mechanism (retrieving "Ann" using its bound counterpart "pie") and a reflexive mechanism (retrieving "Ann" through a direct pointer). Through extensive experiments on nine models and ten binding tasks, we uncover a consistent pattern in how LMs mix these mechanisms to drive model behavior. We leverage these insights to develop a causal model combining all three mechanisms that estimates next token distributions with 95% agreement. Finally, we show that our model generalizes to substantially longer inputs of open-ended text interleaved with entity groups, further demonstrating the robustness of our findings in more natural settings. Overall, our study establishes a more complete picture of how LMs bind and retrieve entities in-context.



Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Slot Machines: How LLMs Keep Track of Multiple Entities

    cs.CL 2026-04 unverdicted novelty 8.0

    LLM activations encode current and prior entities in orthogonal slots, but models only use the current slot for explicit factual retrieval despite prior-slot information being linearly decodable.

  2. Data-driven Circuit Discovery for Interpretability of Language Models

    cs.AI 2026-05 unverdicted novelty 7.0

    Standard circuit discovery methods produce dataset-specific circuits rather than task-general ones, and a new clustering-based method discovers multiple more faithful circuits per dataset.

  3. Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction

    cs.AI 2026-05 unverdicted novelty 7.0

    A four-step recipe partitions the input space using interchange intervention behavior to diagnose where causal abstractions hold and to guide improvements, demonstrated by recovering a full hypothesis from scratch in ...

  4. Cell-Based Representation of Relational Binding in Language Models

    cs.CL 2026-04 unverdicted novelty 7.0

    Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the...

  5. Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation

    cs.MA 2026-05 unverdicted novelty 6.0

    LLM agent dyads fail to reach Pareto-optimal resource allocations in an iterated negotiation game due to dynamic grounding failures including anchoring, perfunctory fairness, and lost commitments, despite individual c...

  6. Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation

    cs.MA 2026-05 unverdicted novelty 6.0

    LLM agent pairs in a resource allocation negotiation game fail to reach Pareto-optimal outcomes due to dynamic grounding failures such as loss of interaction history, anchoring, and referential errors.