Protein Circuit Tracing via Cross-layer Transcoders
Pith reviewed 2026-05-16 01:49 UTC · model grok-4.3
The pith
ProtoMech applies cross-layer transcoders to protein language models to recover 82-89% of model performance using sparse circuits that match biological motifs and improve protein design in over 70% of cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy... Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases.
Load-bearing premise
That the sparse latent representations learned jointly across layers faithfully approximate the model's full computational circuitry and that the identified circuits correspond to genuine structural and functional motifs rather than artifacts of the transcoder training.
Original abstract
Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ProtoMech, a framework for discovering computational circuits in protein language models (pLMs) such as ESM2 via cross-layer transcoders that learn sparse latent representations jointly across layers. It reports recovering 82-89% of original model performance on protein family classification and function prediction tasks, identifying compressed circuits using <1% of the latent space while retaining up to 79% accuracy that correspond to motifs like binding and stability, and showing that steering along these circuits enables high-fitness protein design surpassing baselines in >70% of cases.
Significance. If the central claims hold under causal validation, the work would advance mechanistic interpretability for pLMs by moving beyond layer-independent decompositions to approximate full cross-layer computation, with direct implications for understanding predictions and guiding protein engineering. The reported performance recovery and design improvements indicate practical utility even if the circuits are useful approximations rather than exact recoveries.
major comments (3)
- [Abstract] Abstract: The central claim that cross-layer transcoders recover the pLM's true computational circuitry (rather than a useful sparse approximation) rests only on aggregate task performance recovery (82-89%) and compressed-circuit accuracy (79%); no interventions that ablate or swap specific cross-layer paths and measure exact behavioral match to ESM2 are described, leaving open the possibility that the latents are artifacts of the joint sparsity objective.
- [Abstract] Abstract and Results: Circuit identification and motif correspondence may be circular, as compressed circuits are selected and evaluated using the same performance metrics (task accuracy, design fitness) against which recovery is measured; without held-out controls or independence from post-hoc selection, the <1% latent-space claim and motif alignment cannot be assessed as load-bearing evidence.
- [Methods] Methods (training and evaluation): No details are provided on training procedures for the cross-layer transcoders, choice of baselines, statistical significance testing, or controls for post-hoc circuit selection, which directly undermines evaluation of the reported performance numbers and design superiority.
minor comments (2)
- [Abstract] Abstract: The abstract does not specify the exact protein tasks, dataset sizes, or number of runs, making the 82-89% and 79% figures difficult to contextualize.
- [Methods] The manuscript should clarify whether the cross-layer joint training objective includes any regularization terms that could bias toward non-causal features.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Cross-layer transcoders can jointly learn sparse representations that approximate the model's full computational circuitry.
invented entities (1)
- cross-layer transcoders: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: CLTs employ decoder matrices that map latent representations from preceding layers to layer $\ell$ according to $\hat{y}^{\ell} = \sum_{\ell'} W_{\mathrm{dec}}^{\ell' \to \ell} a^{\ell'} + b_{\mathrm{pre}}^{\ell}$ (Eq. 2); training minimizes $\mathcal{L}_{\mathrm{MSE}} + \alpha \mathcal{L}_{\mathrm{aux}}$ with TopK (Eq. 3).
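The decoder sum (Eq. 2) and the loss (Eq. 3) in the quoted passage can be illustrated with a small sketch. This is not the paper's implementation. The function names, the toy shapes and values, the choice to sum over source layers ℓ' ≤ ℓ, and treating the auxiliary loss as an externally supplied number are all assumptions made for illustration, since the passage does not define them.

```python
# Minimal sketch of a cross-layer transcoder (CLT) reconstruction with TopK sparsity.
# All names, shapes, and values are hypothetical; nothing here is the paper's code.

def top_k(values, k):
    """Keep the k largest entries of a vector and zero out the rest."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def clt_reconstruct(latents, decoders, bias, target_layer):
    """Eq. 2 sketch: add up decoder outputs from each source layer, plus a bias.

    Assumption: source layers run over l' <= target_layer.
    decoders[(src, dst)] is the decoder matrix from layer src to layer dst.
    """
    out = list(bias)
    for src in range(target_layer + 1):
        contribution = matvec(decoders[(src, target_layer)], latents[src])
        out = [o + c for o, c in zip(out, contribution)]
    return out

def mse(prediction, target):
    """Mean squared error between two vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

def clt_loss(prediction, target, aux_loss, alpha):
    """Eq. 3 sketch: L_MSE + alpha * L_aux, with L_aux passed in (undefined in the passage)."""
    return mse(prediction, target) + alpha * aux_loss

# Toy example: two layers, four latents per layer, model width 2, k = 2.
pre_activations = [[0.5, 2.0, -1.0, 1.5], [3.0, 0.1, 0.2, 1.0]]
latents = [top_k(p, 2) for p in pre_activations]

decoders = {
    (0, 1): [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    (1, 1): [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]],
}
bias = [0.5, -0.5]

y_hat = clt_reconstruct(latents, decoders, bias, target_layer=1)
print(y_hat, mse(y_hat, [1.5, 4.0]))
```

The point of the sketch is the structure of the sum: the reconstruction at one layer draws on sparse latents from more than one layer, which is what distinguishes a cross-layer transcoder from a per-layer decomposition.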
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Retrieval and competition: how a protein foundation model starts a protein
  ESM2-8M predicts N-terminal methionine via retrieval from a positional prior at the beginning-of-sequence token through distributed attention circuits rather than direct biological detection.