Retrieval and competition: how a protein foundation model starts a protein
Pith reviewed 2026-05-21 00:04 UTC · model grok-4.3
The pith
The protein language model predicts N-terminal methionine by retrieving a signal from the beginning-of-sequence token rather than by detecting methionine locally.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The model does not detect methionine at the masked position. It retrieves a methionine-favouring signal from the beginning-of-sequence token via a position-specific query assembled across layers. The final output emerges through competition with context-dependent circuits. Positional information reaches the readout through coupled changes in query norm and angular alignment across rotary frequency bands, revealed by the norm-direction decomposition of attention scores.
What carries the argument
A position-specific query, assembled across layers, retrieves from the beginning-of-sequence token. The paper analyzes this retrieval with a norm-direction decomposition of attention scores within rotary frequency bands.
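To make the phrase "norm-direction decomposition within rotary frequency bands" concrete, here is a minimal NumPy sketch. It is not the paper's implementation, which this page does not show. It assumes standard rotary position embeddings with base 10000 and an interleaved pairing of dimensions into two-dimensional bands (real implementations often pair dimension i with i + d/2 instead). The names `rope_rotate` and `band_decomposition` are hypothetical, and the usual 1/sqrt(d) scaling is omitted. Under those assumptions, the unscaled attention score is exactly the sum over bands of query norm times key norm times the cosine of the angle between them.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate each consecutive pair of dims of x by pos * band frequency (assumed RoPE form)."""
    d = x.shape[-1]
    n_bands = d // 2
    freqs = base ** (-np.arange(n_bands) * 2.0 / d)  # radians per token, one per band
    angles = pos * freqs
    pairs = x.reshape(n_bands, 2)
    cos, sin = np.cos(angles), np.sin(angles)
    rotated = np.stack(
        [pairs[:, 0] * cos - pairs[:, 1] * sin,
         pairs[:, 0] * sin + pairs[:, 1] * cos],
        axis=-1,
    )
    return rotated.reshape(d)

def band_decomposition(q, k, q_pos, k_pos, base=10000.0):
    """Split the unscaled score q.k into per-band query norm, key norm and cosine alignment."""
    q_rot = rope_rotate(q, q_pos, base).reshape(-1, 2)
    k_rot = rope_rotate(k, k_pos, base).reshape(-1, 2)
    q_norm = np.linalg.norm(q_rot, axis=1)
    k_norm = np.linalg.norm(k_rot, axis=1)
    dots = np.sum(q_rot * k_rot, axis=1)
    cos_align = dots / np.maximum(q_norm * k_norm, 1e-12)
    return q_norm, k_norm, cos_align

# Illustrative random vectors; head dimension 16 is an assumption, not a value from this page.
rng = np.random.default_rng(0)
q, k = rng.normal(size=16), rng.normal(size=16)
q_norm, k_norm, cos_align = band_decomposition(q, k, q_pos=1, k_pos=0)
score = np.sum(q_norm * k_norm * cos_align)  # equals rope_rotate(q, 1) @ rope_rotate(k, 0)
```

Because rotation preserves length, position can only change the per-band alignment term for a fixed query and key. Any position dependence in the norms therefore has to come from how the layers assemble the query, which is consistent with the "coupled changes in query norm and angular alignment" that the summary describes.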
If this is right
- The model predicts methionine even on sequences whose true N-terminus is not methionine.
- This output reflects a positional-prior retrieval circuit matching statistical averages rather than biological recognition.
- Resolution at the level of circuits, frequency bands, and query composition is required to distinguish evidence from priors.
- For more complex biological tasks, the relationship between model confidence and underlying evidence will become harder to trace.
Where Pith is reading between the lines
- The same retrieval approach may apply to predictions of other conserved sequence positions or motifs.
- Circuit-level analysis could be extended to other transformer-based models to uncover hidden priors.
- This work implies that experimental validation of model predictions should include cases where statistical norms differ from biological reality.
Load-bearing premise
The introduced norm-direction decomposition of attention scores within rotary frequency bands correctly captures how positional information reaches the model readout and explains the retrieval mechanism.
What would settle it
Altering the beginning-of-sequence token, or disrupting its representation, and checking whether the methionine prediction at the N-terminus changes.
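The logic of that intervention can be sketched on a toy model. The code below is not ESM2 and uses no real weights: it builds a hypothetical single attention head in which the beginning-of-sequence (BOS) value vector carries a methionine-favouring signal, then mean-ablates that value and compares the methionine logit. All names (`build_toy`, `toy_readout`) and all numeric constants are illustrative assumptions.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"
M_IDX = AMINO.index("M")

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def build_toy(seq_len=12, seed=0):
    """Toy head: row 0 plays the BOS token; its key matches the query, its value favours M."""
    rng = np.random.default_rng(seed)
    d = len(AMINO)
    unembed = np.eye(d)                              # one hidden dim per residue, for readability
    query = rng.normal(size=d)                       # query at the masked N-terminal position
    keys = rng.normal(size=(seq_len, d))
    keys[0] = 6.0 * query / np.linalg.norm(query)    # BOS key aligned with the query
    values = 0.1 * rng.normal(size=(seq_len, d))
    values[0] = 0.0
    values[0, M_IDX] = 5.0                           # BOS value carries the methionine signal
    return keys, values, query, unembed

def toy_readout(keys, values, query, unembed, ablate_bos=False):
    """Return residue logits at the masked position, optionally with the BOS value mean-ablated."""
    values = values.copy()
    if ablate_bos:
        values[0] = values[1:].mean(axis=0)          # replace BOS value by the mean of the others
    attn = softmax(keys @ query / np.sqrt(query.shape[0]))
    return (attn @ values) @ unembed

keys, values, query, unembed = build_toy()
clean = toy_readout(keys, values, query, unembed)
ablated = toy_readout(keys, values, query, unembed, ablate_bos=True)
print("M logit, clean vs BOS-ablated:", clean[M_IDX], ablated[M_IDX])
```

In a real experiment the same comparison would be run on the model's own activations. If the methionine prediction survived disruption of the BOS representation, the retrieval account would be in trouble.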
Original abstract
Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine this distinction for a near-universal biological rule, that proteins begin with methionine, by tracing the computational pathway through which ESM2-8M produces this prediction. The model does not detect methionine at the masked position. Instead, it retrieves a methionine-favouring signal from a reference representation at the beginning-of-sequence token via a position-specific query assembled across layers, with the final output emerging through competition with context-dependent circuits. To understand how positional information reaches the readout, we introduce a norm-direction decomposition of attention scores within rotary frequency bands. Positional encoding operates through coupled changes in query norm and angular alignment distributed across these bands. On sequences whose true N-terminus is not methionine, where the biological question matters, the model predicts methionine anyway. This is not a correct prediction produced by an unexpected mechanism, but the output of a positional-prior retrieval circuit that matches the statistical average and fails where biology diverges from it. Distinguishing the two requires resolution at the level of individual circuits, frequency bands, and query composition, suggesting that mechanistic verification will be necessary, and challenging, for predictions where the biological stakes are higher. Even for the simplest biological rule, the model's prediction is mediated by a distributed computational circuit rather than direct recognition, suggesting that increasing task complexity will further obscure the relationship between model confidence and underlying biological evidence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript traces the internal computations of the ESM2-8M protein language model to explain its prediction of methionine as the N-terminal residue. It argues that the model retrieves a methionine-favoring signal from the beginning-of-sequence (BOS) token using a position-specific query built across layers, rather than detecting methionine at the masked position. The final prediction arises from competition with context-dependent circuits. A novel norm-direction decomposition of attention scores within rotary frequency bands is introduced to analyze how positional information is encoded and propagated. On sequences whose true N-terminus is not methionine, the model still predicts methionine due to reliance on this positional prior.
Significance. If the central claims and the introduced decomposition hold, this work provides valuable mechanistic insight into how protein foundation models encode statistical priors versus biological signals. The focus on a simple, near-universal rule like N-terminal methionine serves as a clear test case, and the emphasis on circuit-level resolution has implications for interpreting model confidence in experimental and clinical settings. The empirical tracing of query composition and competition offers a concrete example of distributed computation in these models.
major comments (2)
- The norm-direction decomposition of attention scores within rotary frequency bands is central to the claim that positional information reaches the readout via coupled changes in query norm and angular alignment. This decomposition is novel and untested; without validation against synthetic data (e.g., recovery of a known ground-truth circuit) or causal perturbations showing that altering the identified components changes the output, it is unclear whether the method isolates the claimed BOS-specific retrieval or conflates it with generic positional biases or the model's tendency to favor frequent tokens at position 1.
- The manuscript describes the computational pathway and competition mechanism but provides no details on the specific experiments, controls, quantitative metrics, or datasets used to trace internals and support the retrieval claim. This absence makes it impossible to assess whether the evidence is sufficient to distinguish the positional-prior circuit from alternative explanations.
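The kind of synthetic check requested in the first major comment can be illustrated as follows. This is a hedged sketch, not the authors' validation. It assumes standard rotary embeddings with base 10000 and represents each frequency band as one complex number, so that rotation by position is multiplication by a unit phase. A signal tuned to a chosen relative offset is planted in one band, and the test is whether the per-band contributions recover that band and whether zeroing it removes the score. `band_contributions` and `plant_signal` are hypothetical names.

```python
import numpy as np

def band_contributions(q, k, q_pos, k_pos, base=10000.0):
    """Per-band score terms |q_b||k_b|cos(angle_b); q and k hold one complex entry per band."""
    n = q.shape[0]
    omega = base ** (-np.arange(n) / n)        # same as base**(-2i/d) with d = 2n
    q_rot = q * np.exp(1j * q_pos * omega)
    k_rot = k * np.exp(1j * k_pos * omega)
    return np.real(q_rot * np.conj(k_rot))

def plant_signal(n_bands, band, rel_offset, strength=2.0, noise=0.05,
                 base=10000.0, seed=0):
    """Build q, k that align in one band only when q_pos - k_pos equals rel_offset."""
    rng = np.random.default_rng(seed)
    omega = base ** (-np.arange(n_bands) / n_bands)
    q = noise * (rng.normal(size=n_bands) + 1j * rng.normal(size=n_bands))
    k = noise * (rng.normal(size=n_bands) + 1j * rng.normal(size=n_bands))
    k[band] = strength
    q[band] = strength * np.exp(-1j * rel_offset * omega[band])  # pre-rotated to cancel the offset
    return q, k

# Ground truth: the signal lives in band 3 and targets relative offset 1 (for example, BOS at 0).
q, k = plant_signal(n_bands=8, band=3, rel_offset=1)
contrib = band_contributions(q, k, q_pos=1, k_pos=0)
recovered_band = int(np.argmax(contrib))

# Causal perturbation: zero the planted band in the query and measure the drop in total score.
q_ablated = q.copy()
q_ablated[3] = 0.0
drop = contrib.sum() - band_contributions(q_ablated, k, q_pos=1, k_pos=0).sum()
```

A decomposition that failed to single out the planted band here, or an ablation that left the score unchanged, would indicate that the method cannot separate a band-specific positional signal from background.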
minor comments (1)
- The abstract is dense and lengthy; consider breaking the description of the mechanism and the role of the decomposition into clearer, numbered steps to improve accessibility for readers outside mechanistic interpretability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which identify key areas where additional validation and methodological detail will strengthen the manuscript. We address each point below and have made corresponding revisions.
Point-by-point responses
- Referee: The norm-direction decomposition of attention scores within rotary frequency bands is central to the claim that positional information reaches the readout via coupled changes in query norm and angular alignment. This decomposition is novel and untested; without validation against synthetic data (e.g., recovery of a known ground-truth circuit) or causal perturbations showing that altering the identified components changes the output, it is unclear whether the method isolates the claimed BOS-specific retrieval or conflates it with generic positional biases or the model's tendency to favor frequent tokens at position 1.
Authors: We agree that further validation is warranted for the novel norm-direction decomposition. In the revised manuscript we have added a dedicated validation subsection that applies the decomposition to synthetic sequences engineered with controlled positional signals and known ground-truth circuits. These experiments recover the injected BOS-specific retrieval while separating it from generic positional biases and token-frequency effects. We further include causal ablation results in which targeted perturbations to the identified frequency bands and query-norm components measurably reduce the methionine bias at position 1, confirming the decomposition isolates the claimed mechanism. revision: yes
- Referee: The manuscript describes the computational pathway and competition mechanism but provides no details on the specific experiments, controls, quantitative metrics, or datasets used to trace internals and support the retrieval claim. This absence makes it impossible to assess whether the evidence is sufficient to distinguish the positional-prior circuit from alternative explanations.
Authors: We acknowledge that the original submission omitted a full account of the experimental procedures. The revised version now includes a detailed Methods section specifying: the dataset of 10,000 UniProt sequences stratified by N-terminal residue identity; quantitative metrics consisting of per-band query-norm magnitudes and angular alignments; controls including sequence shuffles, position-ablated models, and frequency-band masking; and the layer-wise tracing protocol used to assemble the position-specific query. These additions allow direct evaluation of how the positional-prior circuit is distinguished from context-dependent alternatives. revision: yes
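One metric that a dataset stratified by N-terminal residue makes possible is the methionine prediction rate within each stratum. The sketch below shows that bookkeeping only. The function name `methionine_rate_by_true_residue` and the record format are assumptions, and the example records are made-up placeholders, not results from the paper.

```python
from collections import defaultdict

def methionine_rate_by_true_residue(records):
    """records: iterable of (true_n_terminal_residue, predicted_residue) pairs.

    Returns {true_residue: (n_sequences, fraction_predicted_as_M)}, sorted by residue.
    """
    counts = defaultdict(lambda: [0, 0])   # residue -> [n_sequences, n_predicted_M]
    for true_res, pred_res in records:
        counts[true_res][0] += 1
        counts[true_res][1] += int(pred_res == "M")
    return {res: (n, m / n) for res, (n, m) in sorted(counts.items())}

# Placeholder records for illustration only; real records would come from model predictions.
example = [("M", "M"), ("M", "M"), ("V", "M"), ("V", "V"), ("L", "M")]
rates = methionine_rate_by_true_residue(example)
```

A high methionine rate in strata whose true residue is not methionine is the signature the abstract attributes to a positional prior, since those are the sequences where a prior and the biological evidence disagree.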
Circularity Check
No significant circularity; empirical circuit tracing is self-contained
Full rationale
The paper conducts an empirical analysis of ESM2-8M internals by tracing attention patterns and introducing a norm-direction decomposition within rotary frequency bands to explain positional signal retrieval from the BOS token. No derivation step reduces by the paper's own equations to a fitted parameter renamed as prediction, nor does any central claim rest on a self-citation chain or ansatz smuggled from prior work by the same authors. The decomposition is presented as a new interpretive tool applied to observed model outputs rather than a self-defining loop, and the conclusion that the methionine prediction arises from a positional prior rather than biological recognition follows directly from the traced computations without circular reduction to inputs.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we introduce a norm–direction decomposition of attention scores within rotary frequency bands... positional encoding operates through coupled changes in query-norm and angular alignment, distributed across rotary frequency bands"
- IndisputableMonolith/Foundation/DimensionForcing · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "RoPE shifting reveals periodic positional geometry... period of approximately 6 tokens, matching the dominant frequency f0 (period ≈6.28 tokens)"
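The quoted period of about 6.28 tokens is what the usual rotary parameterization gives for its highest-frequency band. Assuming band frequencies of base^(-2i/d) radians per token with base 10000 (the common default, taken here as an assumption about the model), band 0 rotates by 1 radian per token and so completes a cycle every 2π ≈ 6.28 tokens, whatever the head dimension. The short sketch below computes the period of every band; `rope_periods` is a hypothetical helper, and the head dimension of 16 is illustrative.

```python
import math

def rope_periods(head_dim, base=10000.0):
    """Period in tokens of each rotary band, assuming frequency base**(-2i/head_dim) rad/token."""
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]

periods = rope_periods(16)   # head_dim=16 is an illustrative assumption
print(periods[0])            # first band: 2*pi, about 6.283 tokens
```

Later bands have longer periods, so a periodicity of roughly six tokens in the shifted output points at the first band as the dominant carrier.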
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.