arxiv: 2602.12026 · v2 · pith:65PALSYVnew · submitted 2026-02-12 · 💻 cs.LG · q-bio.QM

Protein Circuit Tracing via Cross-layer Transcoders

Pith reviewed 2026-05-16 01:49 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords protein circuits protomech computational cross-layer model capture circuit

The pith

ProtoMech applies cross-layer transcoders to protein language models. The transcoders recover 82-89% of model performance, and the sparse circuits extracted from them match biological motifs and steer protein design past baseline methods in over 70% of cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Protein language models are AI systems trained on amino-acid sequences to predict how proteins fold and function. ProtoMech trains cross-layer transcoders, which read from multiple layers of the model jointly instead of one layer at a time and learn a sparse set of features that the model relies on for its predictions. The transcoders recover 82-89% of the original performance on protein family classification and function prediction. Compressed circuits that use less than one percent of the latent space still retain up to 79% of model accuracy. Steering the model along these circuits produces new protein sequences that are reported to beat baseline design methods on fitness in more than 70% of cases.
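The cross-layer idea can be made concrete with a minimal sketch. It assumes a common cross-layer transcoder formulation, in which each layer encodes its input into top-k sparse latents and the reconstruction at layer t sums decoder contributions from the latents of every layer s <= t. The paper's actual architecture, sparsity rule, and training loss are not given in this summary, so the class name, the top-k activation, and all dimensions below are illustrative assumptions, and the weights are random rather than trained.

```python
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def top_k_relu(pre_acts, k):
    """Keep the k largest pre-activations (clipped at zero); zero the rest."""
    keep = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)[:k]
    sparse = [0.0] * len(pre_acts)
    for i in keep:
        sparse[i] = max(pre_acts[i], 0.0)
    return sparse

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class CrossLayerTranscoder:
    """Illustrative only: random weights, assumed top-k sparsity."""

    def __init__(self, n_layers, d_model, n_latents, k, seed=0):
        rng = random.Random(seed)
        self.n_layers, self.d_model, self.k = n_layers, d_model, k
        # One encoder per layer.
        self.encoders = [rand_matrix(n_latents, d_model, rng) for _ in range(n_layers)]
        # One decoder per (source layer s, target layer t) pair with s <= t.
        self.decoders = {(s, t): rand_matrix(d_model, n_latents, rng)
                         for s in range(n_layers) for t in range(s, n_layers)}

    def forward(self, layer_inputs):
        latents = [top_k_relu(matvec(self.encoders[l], x), self.k)
                   for l, x in enumerate(layer_inputs)]
        outputs = []
        for t in range(self.n_layers):
            y = [0.0] * self.d_model
            # Layer t's output is written by latents from layer t and all earlier layers.
            for s in range(t + 1):
                contribution = matvec(self.decoders[(s, t)], latents[s])
                y = [a + b for a, b in zip(y, contribution)]
            outputs.append(y)
        return latents, outputs

rng = random.Random(1)
clt = CrossLayerTranscoder(n_layers=3, d_model=4, n_latents=16, k=2)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
latents, outputs = clt.forward(inputs)
active_per_layer = [sum(a != 0.0 for a in layer) for layer in latents]
```

The `(s, t)` decoder table is what distinguishes this from a per-layer transcoder, which would only have the diagonal entries `(t, t)`.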

Core claim

ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy... Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases.
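How the paper steers along a circuit is not specified in this summary. A minimal sketch of one common approach, activation addition, is shown here under that assumption: each latent in the circuit contributes its decoder direction, scaled by a strength `alpha`, to a hidden state. The latent ids, directions, and `alpha` are hypothetical values chosen for illustration.

```python
def steer(hidden, decoder_dirs, circuit, alpha):
    """Return a copy of `hidden` shifted by alpha times each circuit latent's direction."""
    steered = list(hidden)
    for latent_id in circuit:
        direction = decoder_dirs[latent_id]
        steered = [h + alpha * d for h, d in zip(steered, direction)]
    return steered

# Hypothetical 3-dimensional hidden state and two circuit latents.
hidden = [0.5, -1.0, 2.0]
decoder_dirs = {7: [1.0, 0.0, 0.0], 42: [0.0, 0.5, 0.5]}
steered = steer(hidden, decoder_dirs, circuit=[7, 42], alpha=2.0)
print(steered)  # [2.5, 0.0, 3.0]
```

In a real pipeline the steered hidden state would be passed through the remaining layers of the model and the resulting sequence scored for fitness; neither step is modeled here.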

Load-bearing premise

That the sparse latent representations learned jointly across layers faithfully approximate the model's full computational circuitry and that the identified circuits correspond to genuine structural and functional motifs rather than artifacts of the transcoder training.
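One way to probe this premise is to select a circuit on one data split, score it on a disjoint split, and compare it against random circuits of the same size. The sketch below assumes a hypothetical `evaluate(circuit)` callable that returns a held-out metric for a set of latent indices. The toy scores, the latent count, and the budget are invented for illustration and are not taken from the paper.

```python
import random

def select_circuit(selection_scores, budget):
    """Pick the `budget` highest-scoring latents using selection-split scores only."""
    ranked = sorted(selection_scores, key=selection_scores.get, reverse=True)
    return set(ranked[:budget])

def random_circuits(latent_ids, budget, n_draws, seed=0):
    """Draw size-matched random circuits as a null baseline."""
    rng = random.Random(seed)
    return [set(rng.sample(latent_ids, budget)) for _ in range(n_draws)]

def empirical_p_value(selected_metric, random_metrics):
    """Share of random circuits doing at least as well, with add-one smoothing."""
    at_least = sum(m >= selected_metric for m in random_metrics)
    return (at_least + 1) / (len(random_metrics) + 1)

# Toy setup: 2000 latents, of which 10 (hypothetically) matter.
latent_ids = list(range(2000))
useful = set(range(10))
rng = random.Random(0)
selection_scores = {i: (1.0 if i in useful else 0.0) + rng.gauss(0.0, 0.1)
                    for i in latent_ids}

def evaluate(circuit):
    """Hypothetical held-out metric: fraction of the useful latents covered."""
    return len(circuit & useful) / len(useful)

budget = 10  # 0.5% of the latent space
circuit = select_circuit(selection_scores, budget)
selected_metric = evaluate(circuit)
baseline_metrics = [evaluate(c) for c in random_circuits(latent_ids, budget, n_draws=200)]
p_value = empirical_p_value(selected_metric, baseline_metrics)
```

If the selected circuit does not clearly beat size-matched random circuits on data it was not selected on, its retained accuracy says little about whether it reflects genuine computation.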

Original abstract

Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces ProtoMech, a framework for discovering computational circuits in protein language models (pLMs) such as ESM2 via cross-layer transcoders that learn sparse latent representations jointly across layers. It reports three results: recovery of 82-89% of original model performance on protein family classification and function prediction tasks; compressed circuits that use <1% of the latent space, retain up to 79% of model accuracy, and correspond to motifs such as binding, signaling, and stability; and steering along these circuits that yields high-fitness protein designs surpassing baselines in >70% of cases.

Significance. If the central claims hold under causal validation, the work would advance mechanistic interpretability for pLMs by moving beyond layer-independent decompositions to approximate full cross-layer computation, with direct implications for understanding predictions and guiding protein engineering. The reported performance recovery and design improvements indicate practical utility even if the circuits are useful approximations rather than exact recoveries.

major comments (3)
  1. [Abstract] Abstract: The central claim that cross-layer transcoders recover the pLM's true computational circuitry (rather than a useful sparse approximation) rests only on aggregate task performance recovery (82-89%) and compressed-circuit accuracy (up to 79%). No interventions that ablate or swap specific cross-layer paths and measure exact behavioral match to ESM2 are described, which leaves open the possibility that the latents are artifacts of the joint sparsity objective.
  2. [Abstract] Abstract and Results: Circuit identification and motif correspondence may be circular, as compressed circuits are selected and evaluated using the same performance metrics (task accuracy, design fitness) against which recovery is measured; without held-out controls or independence from post-hoc selection, the <1% latent-space claim and motif alignment cannot be assessed as load-bearing evidence.
  3. [Methods] Methods (training and evaluation): No details are provided on training procedures for the cross-layer transcoders, choice of baselines, statistical significance testing, or controls for post-hoc circuit selection, which directly undermines evaluation of the reported performance numbers and design superiority.
minor comments (2)
  1. [Abstract] Abstract: The abstract does not specify the exact protein tasks, dataset sizes, or number of runs, making the 82-89% and 79% figures difficult to contextualize.
  2. [Methods] The manuscript should clarify whether the cross-layer joint training objective includes any regularization terms that could bias toward non-causal features.
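The requests above for significance testing and for the number of cases matter because a win rate above 70% carries very different weight at different sample sizes. A minimal sketch using an exact one-sided sign test illustrates this. The case counts are hypothetical, since the abstract does not report them, and ties are assumed to have been dropped beforehand.

```python
from math import comb

def sign_test_p_value(wins, n_cases):
    """Exact one-sided p-value: P(X >= wins) when each case is a fair coin flip."""
    tail = sum(comb(n_cases, k) for k in range(wins, n_cases + 1))
    return tail / 2 ** n_cases

# Hypothetical: 7 wins out of 10 cases (70%).
p_small = sign_test_p_value(7, 10)
print(p_small)  # 0.171875

# Hypothetical: 36 wins out of 50 cases (72%).
p_large = sign_test_p_value(36, 50)
print(p_large)
```

With ten cases, a 70% win rate is compatible with chance at conventional thresholds; with fifty cases, a similar rate is not. Reporting the denominator would let readers make this judgment directly.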

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the assumption that sparse cross-layer latents can capture full model computation and that performance recovery plus motif correspondence validate the circuits; no explicit free parameters or invented entities are named in the abstract.

axioms (1)
  • [domain assumption] Cross-layer transcoders can jointly learn sparse representations that approximate the model's full computational circuitry
    This is the core premise enabling the method to move beyond single-layer analysis.
invented entities (1)
  • cross-layer transcoders [no independent evidence]
    purpose: To discover compressed circuits across layers in pLMs
    New component introduced by the framework; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5493 in / 1259 out tokens · 210095 ms · 2026-05-16T01:49:55.793406+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Retrieval and competition: how a protein foundation model starts a protein

    q-bio.BM · 2026-05 · unverdicted · novelty 5.0

    ESM2-8M predicts N-terminal methionine via retrieval from a positional prior at the beginning-of-sequence token through distributed attention circuits rather than direct biological detection.