Protein Circuit Tracing via Cross-layer Transcoders

2026 · cs.LG · arXiv 2602.12026

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.
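
The abstract does not spell out the transcoder architecture, so the following is only a minimal sketch of the general cross-layer transcoder idea as it is commonly described: each layer has its own encoder that produces sparse latents, and latents encoded at a source layer may contribute to the reconstructed output at that layer and every later layer. The class name `CrossLayerTranscoder`, the ReLU sparsity, the random untrained weights, and the toy dimensions are all assumptions made for illustration and are not taken from ProtoMech or ESM2.

```python
import random


def relu(v):
    """Elementwise ReLU; keeps latents non-negative (a stand-in for a sparsity mechanism)."""
    return [x if x > 0.0 else 0.0 for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix; a placeholder for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class CrossLayerTranscoder:
    """Hypothetical toy cross-layer transcoder (illustrative, untrained)."""

    def __init__(self, n_layers, d_model, d_latent, seed=0):
        rng = random.Random(seed)
        self.n_layers = n_layers
        self.d_model = d_model
        # One encoder per layer: reads that layer's input activation.
        self.enc = [rand_matrix(d_latent, d_model, rng) for _ in range(n_layers)]
        # One decoder per (source, target) pair with target >= source,
        # so latents from an early layer can write to all later layers.
        self.dec = {
            (s, t): rand_matrix(d_model, d_latent, rng)
            for s in range(n_layers)
            for t in range(s, n_layers)
        }

    def encode(self, layer_inputs):
        """Map each layer's input activation to non-negative latents."""
        return [relu(matvec(self.enc[l], x)) for l, x in enumerate(layer_inputs)]

    def decode(self, latents):
        """Reconstruct each target layer's output from latents of layers <= target."""
        outputs = []
        for t in range(self.n_layers):
            out = [0.0] * self.d_model
            for s in range(t + 1):
                contrib = matvec(self.dec[(s, t)], latents[s])
                out = [a + b for a, b in zip(out, contrib)]
            outputs.append(out)
        return outputs


# Toy usage: 3 layers, 4-dimensional activations, 8 latents per layer.
clt = CrossLayerTranscoder(n_layers=3, d_model=4, d_latent=8)
data_rng = random.Random(1)
layer_inputs = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
latents = clt.encode(layer_inputs)
recon = clt.decode(latents)
```

In this sketch, ablating the latents of the last layer leaves the reconstructions of earlier layers unchanged, which is the structural property that lets a circuit be traced forward from the layer where a latent is encoded. A trained version would fit the weights to the model's real per-layer outputs under a sparsity penalty, which is omitted here.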

representative citing papers

Retrieval and competition: how a protein foundation model starts a protein

q-bio.BM · 2026-05-05 · unverdicted · novelty 5.0

ESM2-8M predicts N-terminal methionine via retrieval from a positional prior at the beginning-of-sequence token through distributed attention circuits rather than direct biological detection.

citing papers explorer

Showing 1 of 1 citing paper.

Retrieval and competition: how a protein foundation model starts a protein

q-bio.BM · 2026-05-05 · unverdicted · none · ref 13 · internal anchor
