Protein Circuit Tracing via Cross-layer Transcoders

Amirali Aghazadeh; Daniel Saeedi; Darin Tsui; Kunal Talreja

arxiv: 2602.12026 · v2 · pith:65PALSYVnew · submitted 2026-02-12 · 💻 cs.LG · q-bio.QM

Darin Tsui, Kunal Talreja, Daniel Saeedi, Amirali Aghazadeh

Pith reviewed 2026-05-16 01:49 UTC · model grok-4.3

keywords protein · circuits · protomech · computational · cross-layer · model · capture · circuit

The pith

ProtoMech applies cross-layer transcoders to protein language models, recovering 82-89% of model performance, identifying sparse circuits that correspond to structural and functional motifs, and steering along those circuits to produce protein designs that beat baseline methods in over 70% of cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Protein language models are AI systems trained on sequences of amino acids to predict how proteins fold and function. ProtoMech trains transcoders that look across multiple layers of the model at the same time instead of one layer at a time. These transcoders learn a sparse set of features that the model uses for its predictions. The full transcoder recovers 82-89% of the original performance on tasks like telling protein families apart or predicting function, and compressed circuits that use less than one percent of the latent features still retain up to 79% of model accuracy. Researchers can then steer the model along these features to create new protein sequences that score higher on fitness measures than baseline design methods in more than 70% of cases.

Core claim

ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy... Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases.

Load-bearing premise

That the sparse latent representations learned jointly across layers faithfully approximate the model's full computational circuitry and that the identified circuits correspond to genuine structural and functional motifs rather than artifacts of the transcoder training.

Original abstract

Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ProtoMech gives a workable sparse circuit method for protein LMs that supports steering, but the link to the model's actual cross-layer computations needs more causal checks.

ProtoMech shows that you can train transcoders jointly across the layers of a protein language model and extract very small circuits that still let you steer the model toward better protein designs. That is the practical advance here.

The new element is the cross-layer setup. Earlier work decomposed each layer independently, which misses how computation builds across the stack. By learning sparse latents jointly, the authors get circuits that use less than 1 percent of the latent space but retain up to 79 percent of model accuracy on the tasks they tested. The steering results are the strongest part: steering beats baselines in over 70 percent of cases for high-fitness designs, and the circuits line up with motifs such as binding and stability.

The soft spot is the causal link. The performance numbers are good, but they do not yet show that these latents do the same work as the original model layers. If the joint sparsity objective merely finds useful features without matching the real information flow, then the motif matches and the design gains could be less reliable than they look. The abstract leaves out the training details and any controls for how the circuits were selected, so it is hard to tell how much post-hoc fitting went into the results.

This paper is for groups working on interpretability for biological models or on controllable protein generation. A reader who wants a starting point for circuit-based editing would find the framework worth trying. I would send it to peer review. The results are concrete enough that a referee can check the methods and ask for the missing causal tests.

Referee Report

3 major / 2 minor

Summary. The paper introduces ProtoMech, a framework for discovering computational circuits in protein language models (pLMs) such as ESM2 via cross-layer transcoders that learn sparse latent representations jointly across layers. It reports three results: recovery of 82-89% of original model performance on protein family classification and function prediction tasks; compressed circuits that use <1% of the latent space, retain up to 79% of model accuracy, and correspond to motifs such as binding and stability; and steering along these circuits that yields high-fitness protein designs surpassing baselines in >70% of cases.

Significance. If the central claims hold under causal validation, the work would advance mechanistic interpretability for pLMs by moving beyond layer-independent decompositions to approximate full cross-layer computation, with direct implications for understanding predictions and guiding protein engineering. The reported performance recovery and design improvements indicate practical utility even if the circuits are useful approximations rather than exact recoveries.

major comments (3)

[Abstract] Abstract: The central claim that cross-layer transcoders recover the pLM's true computational circuitry (rather than a useful sparse approximation) rests only on aggregate task performance recovery (82-89%) and compressed-circuit accuracy (79%); no interventions that ablate or swap specific cross-layer paths and measure exact behavioral match to ESM2 are described, leaving open the possibility that the latents are artifacts of the joint sparsity objective.
[Abstract] Abstract and Results: Circuit identification and motif correspondence may be circular, as compressed circuits are selected and evaluated using the same performance metrics (task accuracy, design fitness) against which recovery is measured; without held-out controls or independence from post-hoc selection, the <1% latent-space claim and motif alignment cannot be assessed as load-bearing evidence.
[Methods] Methods (training and evaluation): No details are provided on training procedures for the cross-layer transcoders, choice of baselines, statistical significance testing, or controls for post-hoc circuit selection, which directly undermines evaluation of the reported performance numbers and design superiority.
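The held-out and post-hoc selection controls requested in the major comments can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the importance scores, and the `evaluate` callable are all hypothetical, and the toy evaluator stands in for a real held-out measurement. The idea is to rank latents on one data split, measure the selected circuit on a different split, and compare it with size-matched random circuits.

```python
import random

def select_circuit(importance, fraction):
    """Return indices of the top `fraction` of latents by importance score."""
    n_keep = max(1, int(len(importance) * fraction))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:n_keep])

def random_circuit(n_latents, size, rng):
    """Size-matched random control circuit (sampled without replacement)."""
    return sorted(rng.sample(range(n_latents), size))

def retained_accuracy(circuit_acc, full_acc):
    """Fraction of the full model's accuracy that a circuit retains."""
    return circuit_acc / full_acc

def held_out_report(circuit, controls, evaluate, full_acc):
    """Compare a selected circuit with random controls on held-out data.

    `evaluate` is a hypothetical callable mapping a list of latent indices
    to accuracy measured on a split NOT used for selection.
    """
    circuit_ret = retained_accuracy(evaluate(circuit), full_acc)
    control_ret = [retained_accuracy(evaluate(c), full_acc) for c in controls]
    return circuit_ret, sum(control_ret) / len(control_ret)

# Toy usage: every number below is made up for illustration only.
rng = random.Random(0)
importance_on_selection_split = [0.1, 0.9, 0.3, 0.7]  # hypothetical scores
circuit = select_circuit(importance_on_selection_split, fraction=0.5)
controls = [random_circuit(4, len(circuit), rng) for _ in range(20)]

def toy_evaluate(indices):
    # Stand-in for a real held-out evaluation: latents 1 and 3 are "useful".
    return 0.2 + 0.3 * len(set(indices) & {1, 3})

circuit_ret, mean_control_ret = held_out_report(
    circuit, controls, toy_evaluate, full_acc=0.8
)
```

A selected circuit that retains clearly more accuracy than the random controls on the held-out split would be evidence against pure post-hoc fitting; a small gap would support the circularity concern.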

minor comments (2)

[Abstract] Abstract: The abstract does not specify the exact protein tasks, dataset sizes, or number of runs, making the 82-89% and 79% figures difficult to contextualize.
[Methods] The manuscript should clarify whether the cross-layer joint training objective includes any regularization terms that could bias toward non-causal features.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the assumption that sparse cross-layer latents can capture full model computation and that performance recovery plus motif correspondence validate the circuits. The abstract names no explicit free parameters; the only new entity it introduces is the cross-layer transcoder itself.

axioms (1)

domain assumption: Cross-layer transcoders can jointly learn sparse representations that approximate the model's full computational circuitry
This is the core premise enabling the method to move beyond single-layer analysis.

invented entities (1)

cross-layer transcoders (no independent evidence)
purpose: To discover compressed circuits across layers in pLMs
New component introduced by the framework; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5493 in / 1259 out tokens · 210095 ms · 2026-05-16T01:49:55.793406+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

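CLTs employ decoder matrices that map latent representations from preceding layers to layer $\ell$ according to: $\hat{y}^{\ell} = \sum_{\ell'} W_{\text{dec}}^{\ell' \to \ell} a^{\ell'} + b_{\text{pre}}^{\ell}$ (Eq. 2); training minimizes $\mathcal{L}_{\text{MSE}} + \alpha \mathcal{L}_{\text{aux}}$ with TopK (Eq. 3)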
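As a rough illustration of the decoder sum and TopK sparsity in the passage above, here is a minimal pure-Python sketch. It rests on several assumptions that the passage does not confirm: the sum is taken over source layers up to and including the target layer, TopK keeps the k largest entries of each latent vector and zeroes the rest, and only the MSE term of the loss is shown because the auxiliary term is not specified. All names, shapes, and values are hypothetical.

```python
def topk(values, k):
    """Keep the k largest entries of `values` and zero the rest (TopK sparsity)."""
    if k >= len(values):
        return list(values)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def clt_reconstruct(latents, W_dec, b_pre, target_layer):
    """Sketch of Eq. 2: y_hat^l = sum over l' of W_dec^{l'->l} a^{l'} + b_pre^l.

    Assumption: l' ranges over source layers 0..target_layer inclusive.
    `W_dec` is a dict keyed by (source_layer, target_layer).
    """
    y_hat = list(b_pre[target_layer])
    for src in range(target_layer + 1):
        contrib = matvec(W_dec[(src, target_layer)], latents[src])
        y_hat = [y + c for y, c in zip(y_hat, contrib)]
    return y_hat

def mse(y, y_hat):
    """Mean squared error, the reconstruction term of the training loss."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Toy example: 2 layers, 2 latents per layer, 2-dimensional layer outputs.
latents = [topk([1.0, 0.2], 1), topk([0.3, 2.0], 1)]  # one active latent per layer
W_dec = {
    (0, 0): [[1.0, 0.0], [0.0, 1.0]],
    (0, 1): [[1.0, 1.0], [0.0, 0.0]],  # layer-0 latents also write to layer 1
    (1, 1): [[0.0, 1.0], [1.0, 0.0]],
}
b_pre = [[0.0, 0.0], [0.5, -0.5]]

y_hat = clt_reconstruct(latents, W_dec, b_pre, target_layer=1)
loss = mse([3.0, 0.0], y_hat)  # compare with a made-up layer-1 output
```

The point of the dictionary keyed by (source, target) is that a latent at an early layer contributes directly to the reconstruction of every later layer, which is what distinguishes a cross-layer transcoder from a per-layer one.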

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Retrieval and competition: how a protein foundation model starts a protein
q-bio.BM · 2026-05 · unverdicted · novelty 5.0

ESM2-8M predicts N-terminal methionine via retrieval from a positional prior at the beginning-of-sequence token through distributed attention circuits rather than direct biological detection.