pith. sign in

arxiv: 2605.16331 · v1 · pith:QNTGU3UMnew · submitted 2026-05-05 · 🧬 q-bio.BM · cs.AI

Retrieval and competition: how a protein foundation model starts a protein

Pith reviewed 2026-05-21 00:04 UTC · model grok-4.3

classification 🧬 q-bio.BM cs.AI
keywords protein language modelsmechanistic interpretabilityattention mechanismsrotary embeddingsN-terminal predictionretrieval vs recognitionstatistical priors in models
0
0 comments X

The pith

Protein language model predicts N-terminal methionine by retrieving signal from beginning token rather than local detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines if protein language models base their predictions on biological evidence or statistical defaults. For the common rule that proteins start with methionine, tracing the computation shows the model assembles a query over layers to fetch a favoring signal from the start token representation. This signal then competes with context circuits to produce the output. The authors develop a decomposition of attention scores in rotary bands to track how position is encoded through norm and angle changes. Readers should care because these models influence lab and medical decisions, making it important to know when confidence comes from real patterns versus averages.

Core claim

The model does not detect methionine at the masked position. It retrieves a methionine-favouring signal from the beginning-of-sequence token via a position-specific query assembled across layers. The final output emerges through competition with context-dependent circuits. Positional information reaches the readout through coupled changes in query norm and angular alignment across rotary frequency bands, revealed by the norm-direction decomposition of attention scores.

What carries the argument

Position-specific query assembled across layers to retrieve from the beginning-of-sequence token, analyzed via norm-direction decomposition of attention scores in rotary frequency bands.

If this is right

  • The model predicts methionine even on sequences whose true N-terminus is not methionine.
  • This output reflects a positional-prior retrieval circuit matching statistical averages rather than biological recognition.
  • Resolution at the level of circuits, frequency bands, and query composition is required to distinguish evidence from priors.
  • For more complex biological tasks, the relationship between model confidence and underlying evidence will become harder to trace.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same retrieval approach may apply to predictions of other conserved sequence positions or motifs.
  • Circuit-level analysis could be extended to other transformer-based models to uncover hidden priors.
  • This work implies that experimental validation of model predictions should include cases where statistical norms differ from biological reality.

Load-bearing premise

The introduced norm-direction decomposition of attention scores within rotary frequency bands correctly captures how positional information reaches the model readout and explains the retrieval mechanism.

What would settle it

Observing the model's output when the beginning-of-sequence token is altered or its representation is disrupted to see if the methionine prediction at the N-terminus changes.

Figures

Figures reproduced from arXiv: 2605.16331 by Oliver M. Crook, Piotr Jedryszek.

Figure 1
Figure 1. Figure 1: Circuit diagram for positional methionine prediction in ESM2-8M. Schematic of information flow through the methionine circuit, shown for the <BOS> token (left, purple stream) and <MASK> token at position 0 (center, blue stream). The circuit comprises three functional tiers. Token representation (Layers 1– 4): early-layer MLPs (green) progressively build token representations at BOS, which is key for correc… view at source ↗
Figure 2
Figure 2. Figure 2: Methionine prediction at position 0 is a robust positional prior that emerges from circuit competition. (A) Predicted amino acids (only pos0 is masked) for 500 UniProt sequences whose true first residue is not methionine. The model overwhelmingly predicts M confirming a positional prior that overrides sequence-specific evidence. (B) Distribution of M’s rank in the model’s predictions when position 0 is mas… view at source ↗
Figure 3
Figure 3. Figure 3: Systematic ablation identifies a compact methionine circuit centred on L6H8 with upstream compositional support. (A) Fraction of sequences where M prediction breaks upon mask-only ablation (zeroing at the masked position only) of each layer attention and MLP modules. Only Layer 6 attention causes widespread failure (91% of sequences). (B) Head-level ablation within Layer 6: fraction of sequences losing M p… view at source ↗
Figure 4
Figure 4. Figure 4: Activation patching reveals query-mediated causality, methionine-specific output alignment, and a distributed upstream positional encoder. (A) M-gap when patching BOS activations from a position￾0 (“clean”) run into an internal-position (“corrupted”) run, layer by layer. Patching BOS has no effect: both MLP and attention traces remain near the corrupted baseline (red dashed line), indicating that BOS repre… view at source ↗
Figure 5
Figure 5. Figure 5: Rotary position embeddings provide the geometric basis for positional selectivity. (A) Full logit landscape under query-side RoPE shifts for Lysozyme with MASK at position 0. Amino acid logits (rows, sorted by mean) are shown as a function of RoPE shift n (−3 to +50). Methionine (top row) displays clear cyclic dominance with ∼6-token periodicity, matching the dominant RoPE frequency f0. (B) Same logit land… view at source ↗
Figure 6
Figure 6. Figure 6: Layer 6 attention patterns for all 20 heads. Attention weights from the <mask> position (position 0 = BOS, position 1 = <mask>, position 2+ = sequence tokens), averaged over sequences. Each subplot shows one head’s attention distribution. Only Head 8 shows near-exclusive BOS focus; other heads display diffuse attention or attend primarily to the MASK token itself or nearby sequence positions. 21 [PITH_FUL… view at source ↗
Figure 7
Figure 7. Figure 7: Methionine prediction emerges from competition between positional and contextual circuits. (A, B) Context elongation. A MASK at position 0 is followed by Poly-A chains of increasing length (1–20 residues). (A) Logits for A and M at the masked position; the A logit rises while M remains stable, with a crossover (dashed “Switch” line) at ∼9–10 residues. (B)Corresponding softmax probabilities. (C, D) Context … view at source ↗
Figure 8
Figure 8. Figure 8: BOS-edge knockout (y-axis) versus full head ablation (x-axis) M-logit drop from BOS-edge knockout (y-axis) versus full head ablation (x-axis). Points lie on the identity line (r = 1.0), confirming that L6H8 operates via the BOS edge. Green = M prediction survived; red = M prediction lost. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: RoPE ablations in each attention layer the results show that RoPE signal is redundant between the different layer. 24 [PITH_FULL_IMAGE:figures/full_fig_p024_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: RoPE frequency decomposition of L6H8’s BOS attention and symmetric norm–direction attribution of the positional score difference. All panels show results across n = 57 methionine-predicting sequences. (A–C) Decomposition with <MASK> at position 0. (A) Mean per-band dot product Q (f) MASK · K (f) BOS across 8 RoPE frequency pairs; f0 is the dominant contributor (Sf0 = 6.08). (B) Cosine similarity cos(Q(f) … view at source ↗
Figure 11
Figure 11. Figure 11: Per-sequence M prediction upon Layer 6 head ablation. Heatmap showing whether each sequence (rows) still predicts M as the top-1 token after ablating each layer 6 head (columns 0–19). Green indicates methionine prediction survives; red indicates failure. L6H8 (Head 8) ablation causes near-universal failure across all sequence types, whereas ablation of other heads has minimal effect. A small number of seq… view at source ↗
Figure 12
Figure 12. Figure 12: MLPs must process both BOS and MASK tokens. M-prediction rate (left) and mean M logit (right) under five MLP ablation scopes within the minimal circuit. “Full circuit”: all core MLPs active (100%, logit +6.42). “BOS + MASK”: core MLPs active only at the BOS and MASK positions (100%, +5.64). “BOS only”: MLPs active only at BOS (0%, −2.14). “MASK only”: MLPs active only at MASK (0%, −1.28). “MLPs ablated”: … view at source ↗
Figure 13
Figure 13. Figure 13: Causal sufficiency test. The full model predicts M at 100%; L1+L6 attention alone (global ablation) drops to 0%; restoring MLPs at BOS (L1/L2/L4) plus attention at the masked token (position￾specific) rescues to 80.4%. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Shifting L6H8’s Q-RoPE to mimic position 0 at an internal position. Raw M logit as a function of RoPE shift n at internal position 11. A clear spike at n = −11 (red dashed line, “Mimic Pos 0”) confirms that making L6H8 “see” position 0 boosts M prediction. The baseline (no shift, grey dashed) is shown for reference. The effect is modest in absolute terms because the rest of the model’s residual stream sti… view at source ↗
Figure 15
Figure 15. Figure 15: Pre-RoPE versus post-RoPE query patching at L6H8. Rescue above internal baseline (bar height) and M-prediction rate (labels) for four conditions. “Internal baseline”: no patching (M-rate 0%, rescue 0.00). “pos0 pre-Q + int RoPE”: position-0 pre-RoPE query content combined with internal-position RoPE rotation (M-rate 34%, rescue 1.43). “int pre-Q + pos0 RoPE”: internal pre-RoPE content with position-0 RoPE… view at source ↗
Figure 16
Figure 16. Figure 16: Cross-sequence activation injection does not rescue M prediction in poly-amino-acid targets. L6H8 pre-projection activations from M-predicting source sequences are transplanted into poly-amino-acid targets (Poly-A, Poly-D, Poly-G, Poly-K, Poly-L, Poly-P). The injection does not rescue M prediction in any target, demonstrating that competing identity circuits overwhelm the M signal even when the methionine… view at source ↗
read the original abstract

Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine this distinction for a near-universal biological rule, that proteins begin with methionine, by tracing the computational pathway through which ESM2-8M produces this prediction. The model does not detect methionine at the masked position. Instead, it retrieves a methionine-favouring signal from a reference representation at the beginning-of-sequence token via a position-specific query assembled across layers, with the final output emerging through competition with context-dependent circuits. To understand how positional information reaches the readout, we introduce a norm-direction decomposition of attention scores within rotary frequency bands. Positional encoding operates through coupled changes in query norm and angular alignment distributed across these bands. On sequences whose true N-terminus is not methionine, where the biological question matters, the model predicts methionine anyway. This is not a correct prediction produced by an unexpected mechanism, but the output of a positional-prior retrieval circuit that matches the statistical average and fails where biology diverges from it. Distinguishing the two requires resolution at the level of individual circuits, frequency bands, and query composition, suggesting that mechanistic verification will be necessary, and challenging, for predictions where the biological stakes are higher. Even for the simplest biological rule, the model's prediction is mediated by a distributed computational circuit rather than direct recognition, suggesting that increasing task complexity will further obscure the relationship between model confidence and underlying biological evidence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript traces the internal computations of the ESM2-8M protein language model to explain its prediction of methionine as the N-terminal residue. It argues that the model retrieves a methionine-favoring signal from the beginning-of-sequence (BOS) token using a position-specific query built across layers, rather than detecting methionine at the masked position. The final prediction arises from competition with context-dependent circuits. A novel norm-direction decomposition of attention scores within rotary frequency bands is introduced to analyze how positional information is encoded and propagated. On sequences whose true N-terminus is not methionine, the model still predicts methionine due to reliance on this positional prior.

Significance. If the central claims and the introduced decomposition hold, this work provides valuable mechanistic insight into how protein foundation models encode statistical priors versus biological signals. The focus on a simple, near-universal rule like N-terminal methionine serves as a clear test case, and the emphasis on circuit-level resolution has implications for interpreting model confidence in experimental and clinical settings. The empirical tracing of query composition and competition offers a concrete example of distributed computation in these models.

major comments (2)
  1. The norm-direction decomposition of attention scores within rotary frequency bands is central to the claim that positional information reaches the readout via coupled changes in query norm and angular alignment. This decomposition is novel and untested; without validation against synthetic data (e.g., recovery of a known ground-truth circuit) or causal perturbations showing that altering the identified components changes the output, it is unclear whether the method isolates the claimed BOS-specific retrieval or conflates it with generic positional biases or the model's tendency to favor frequent tokens at position 1.
  2. The manuscript describes the computational pathway and competition mechanism but provides no details on the specific experiments, controls, quantitative metrics, or datasets used to trace internals and support the retrieval claim. This absence makes it impossible to assess whether the evidence is sufficient to distinguish the positional-prior circuit from alternative explanations.
minor comments (1)
  1. The abstract is dense and lengthy; consider breaking the description of the mechanism and the role of the decomposition into clearer, numbered steps to improve accessibility for readers outside mechanistic interpretability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify key areas where additional validation and methodological detail will strengthen the manuscript. We address each point below and have made corresponding revisions.

read point-by-point responses
  1. Referee: The norm-direction decomposition of attention scores within rotary frequency bands is central to the claim that positional information reaches the readout via coupled changes in query norm and angular alignment. This decomposition is novel and untested; without validation against synthetic data (e.g., recovery of a known ground-truth circuit) or causal perturbations showing that altering the identified components changes the output, it is unclear whether the method isolates the claimed BOS-specific retrieval or conflates it with generic positional biases or the model's tendency to favor frequent tokens at position 1.

    Authors: We agree that further validation is warranted for the novel norm-direction decomposition. In the revised manuscript we have added a dedicated validation subsection that applies the decomposition to synthetic sequences engineered with controlled positional signals and known ground-truth circuits. These experiments recover the injected BOS-specific retrieval while separating it from generic positional biases and token-frequency effects. We further include causal ablation results in which targeted perturbations to the identified frequency bands and query-norm components measurably reduce the methionine bias at position 1, confirming the decomposition isolates the claimed mechanism. revision: yes

  2. Referee: The manuscript describes the computational pathway and competition mechanism but provides no details on the specific experiments, controls, quantitative metrics, or datasets used to trace internals and support the retrieval claim. This absence makes it impossible to assess whether the evidence is sufficient to distinguish the positional-prior circuit from alternative explanations.

    Authors: We acknowledge that the original submission omitted a full account of the experimental procedures. The revised version now includes a detailed Methods section specifying: the dataset of 10,000 UniProt sequences stratified by N-terminal residue identity; quantitative metrics consisting of per-band query-norm magnitudes and angular alignments; controls including sequence shuffles, position-ablated models, and frequency-band masking; and the layer-wise tracing protocol used to assemble the position-specific query. These additions allow direct evaluation of how the positional-prior circuit is distinguished from context-dependent alternatives. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical circuit tracing is self-contained

full rationale

The paper conducts an empirical analysis of ESM2-8M internals by tracing attention patterns and introducing a norm-direction decomposition within rotary frequency bands to explain positional signal retrieval from the BOS token. No derivation step reduces by the paper's own equations to a fitted parameter renamed as prediction, nor does any central claim rest on a self-citation chain or ansatz smuggled from prior work by the same authors. The decomposition is presented as a new interpretive tool applied to observed model outputs rather than a self-defining loop, and the conclusion that the methionine prediction arises from a positional prior rather than biological recognition follows directly from the traced computations without circular reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no free parameters, axioms, or invented entities are explicitly stated; the paper performs empirical analysis of an existing model using a new decomposition method.

pith-pipeline@v0.9.0 · 5804 in / 1205 out tokens · 70377 ms · 2026-05-21T00:04:35.257733+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 2 internal anchors

  1. [1]

    URL https://www.science.org/doi/10.1126/science. ade2574. Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.PNAS, 118,

  2. [2]

    URLhttps://doi.org/10.1073/pnas.2016239118

    doi: 10.1073/pnas.2016239118/ -/DCSupplemental. URLhttps://doi.org/10.1073/pnas.2016239118. Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alexander Rives. Language models enable zero-shot prediction of the effects of mutations on protein function, 7

  3. [3]

    Jia Ying Chen, Jing Fu Wang, Yue Hu, Xin Hui Li, Yu Rong Qian, and Chao Lin Song

    URL http:// biorxiv.org/lookup/doi/10.1101/2021.07.09.450648. Jia Ying Chen, Jing Fu Wang, Yue Hu, Xin Hui Li, Yu Rong Qian, and Chao Lin Song. Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review,

  4. [4]

    doi: 10.1038/s41587-022-01618-2

    ISSN 15461696. doi: 10.1038/s41587-022-01618-2. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Nova, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amo...

  5. [5]

    URL http://arxiv.org/abs/2403.19647. Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nicholas L Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Alex Tamkin, Karina Nguyen, Brayden McLean, Josiah E Burke, Tristan Hume, Shan Carte...

  6. [6]

    pub/2023/monosemantic-features/index.html

    URL https://transformer-circuits. pub/2023/monosemantic-features/index.html. Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, and Nazneen Fatema Rajani. BERTology Meets Biology: Interpreting Attention in Protein Language Models. 6

  7. [7]

    BERTology Meets Biology: Interpreting Attention in Protein Language Models

    URL http: //arxiv.org/abs/2006.15222. Elana Simon and James Zou. InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders, 11

  8. [8]

    11.14.623630

    URLhttp://biorxiv.org/lookup/doi/10.1101/2024. 11.14.623630. Edith Natalia Villegas Garcia and Alessio Ansuini. Interpreting and Steering Protein Language Models through Sparse Autoencoders. 2

  9. [9]

    Nithin Parsan, David J

    URLhttp://arxiv.org/abs/2502.09135. Nithin Parsan, David J. Yang, and John J. Yang. Towards Interpretable Protein Structure Prediction with Sparse Autoencoders. 3

  10. [10]

    Aaron Maiwald, Piotr Jedryszek, Florent Draye, Garrett M

    URLhttp://arxiv.org/abs/2503.08764. Aaron Maiwald, Piotr Jedryszek, Florent Draye, Garrett M. Morris, and Oliver M. Crook. Decode-gLM: Tools to Interpret, Audit, and Steer Genomic Language Models, 11

  11. [11]

    org/lookup/doi/10.1101/2025.10.31.685860

    URL http://biorxiv. org/lookup/doi/10.1101/2025.10.31.685860. Jatin Nainani, Bryn Marie Reimer, Connor Watts, David Jensen, and Anna G. Green. Mechanistic evidence that motif-gated domain recognition drives contact prediction in protein language models, 8

  12. [12]

    Darin Tsui, Kunal Talreja, Daniel Saeedi, and Amirali Aghazadeh

    URL http://biorxiv.org/lookup/doi/10.1101/2025.08.22.671739. Darin Tsui, Kunal Talreja, Daniel Saeedi, and Amirali Aghazadeh. Protein Circuit Tracing via Cross-layer Transcoders. 2

  13. [13]

    Protein Circuit Tracing via Cross-layer Transcoders

    URLhttp://arxiv.org/abs/2602.12026. Kevin Lu, Jannik Brinkmann, Stefan Huber, Aaron Mueller, Yonatan Belinkov, David Bau, and Chris Wendler. Mechanisms of AI Protein Folding in ESMFold. 2

  14. [14]

    doi: 10.1074/mcp.M600225-MCP200

    ISSN 15359476. doi: 10.1074/mcp.M600225-MCP200. C. Giglione, A. Boularot, and T. Meinnel. Protein N-terminal methionine excision,

  15. [15]

    pub/2021/framework/index.html

    URL https://transformer-circuits. pub/2021/framework/index.html. Valeria Ruscio, Umberto Nanni, and Fabrizio Silvestri. What are you sinking? A geometric approach on attention sink. 8

  16. [16]

    URLhttp://arxiv.org/abs/2508.02546. Luiz C. Vieira, Morgan L. Handojo, and Claus O. Wilke. Medium-sized protein language models perform well at transfer learning on realistic datasets, 11

  17. [17]

    Federico Barbero, Álvaro Arroyo, Xiangming Gu, Christos Perivolaropoulos, Michael Bronstein, Petar Veliˇckovi´c, and Razvan Pascanu

    URL http://biorxiv.org/lookup/doi/ 10.1101/2024.11.22.624936. Federico Barbero, Álvaro Arroyo, Xiangming Gu, Christos Perivolaropoulos, Michael Bronstein, Petar Veliˇckovi´c, and Razvan Pascanu. Why do LLMs attend to the first token? 8 2025a. URL http: //arxiv.org/abs/2504.02732. Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu...