Evolutionary Profiles for Protein Fitness Prediction

Chenchen Jing; Chunhua Shen; Hao Chen; Jigang Fan; Shengdong Lin; Weian Mao; Xiaoran Jiao; Zhanming Liang

arXiv: 2510.07286 · v3 · submitted 2025-10-08 · cs.LG · cs.AI · q-bio.BM · q-bio.QM

Jigang Fan, Xiaoran Jiao, Shengdong Lin, Zhanming Liang, Weian Mao, Chenchen Jing, Hao Chen, Chunhua Shen

Pith reviewed 2026-05-18 08:53 UTC · model grok-4.3

classification: cs.LG · cs.AI · q-bio.BM · q-bio.QM

keywords: protein fitness prediction · evolutionary profiles · inverse folding · masked language modeling · mutation impact · lightweight model · ProteinGym benchmark · homolog profiles

The pith

EvoIF predicts protein mutation fitness by combining within-family homolog profiles with cross-family inverse-folding constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a compact model can match or beat large protein language models on mutation fitness prediction by drawing on two distinct evolutionary signals. It frames natural sequences as expert demonstrations in an inverse reinforcement learning setup where masked language modeling extracts fitness estimates. This matters for protein engineering because the method needs far less training data and fewer parameters than current approaches. Ablation tests indicate the two signal types reinforce each other across varied proteins and mutation patterns. A sympathetic reader would see a route to more efficient, data-light predictors for sequence function.
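The link between the inverse reinforcement learning framing and log-odds scoring can be illustrated with a short derivation. It assumes a maximum-entropy form of IRL, in which sequences are sampled with probability proportional to an exponentiated reward; the symbols R (reward), T (temperature), and Z (normalizer) are introduced here for illustration and are not taken from the paper.

```latex
% Assumption: maximum-entropy model with reward R and temperature T > 0.
p(x) = \frac{\exp\!\big(R(x)/T\big)}{Z},
\qquad
Z = \sum_{x'} \exp\!\big(R(x')/T\big)
\;\;\Longrightarrow\;\;
\log p(x^{\mathrm{mt}}) - \log p(x^{\mathrm{wt}})
= \frac{R(x^{\mathrm{mt}}) - R(x^{\mathrm{wt}})}{T}
```

Under that assumption the normalizer cancels, so the log-odds between a mutant and the wild type is a scaled difference in reward, which is why a model's log-odds can be read as a relative fitness estimate.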

Core claim

EvoIF integrates within-family profiles retrieved from homologs and cross-family structural-evolutionary constraints distilled from inverse folding logits, then fuses sequence-structure representations with these profiles through a compact transition block to produce calibrated probabilities for log-odds scoring.
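To make the idea of fusing model outputs with two profiles concrete, here is a minimal sketch for a single sequence position. It is not the paper's transition block, which is learned and is not specified on this page. The function `fuse_position`, its weights `w_within` and `w_cross`, and the log-linear combination rule are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_position(model_logits, within_profile, cross_profile,
                  w_within=1.0, w_cross=1.0, eps=1e-6):
    """Hypothetical log-linear fusion for one position.

    model_logits   : per-amino-acid logits from a sequence-structure model
    within_profile : per-amino-acid frequencies from retrieved homologs
    cross_profile  : per-amino-acid probabilities from inverse folding
    The weights are illustrative; a learned block would replace them.
    """
    fused = [
        logit + w_within * math.log(pw + eps) + w_cross * math.log(pc + eps)
        for logit, pw, pc in zip(model_logits, within_profile, cross_profile)
    ]
    return softmax(fused)
```

Setting both weights to zero recovers the plain softmax of the model logits, which gives a simple baseline for checking how much each profile shifts the distribution.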

What carries the argument

EvoIF, a lightweight fusion model that combines within-family evolutionary profiles from homologs and cross-family constraints from inverse-folding logits via a compact transition block.
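A within-family profile is, at its simplest, a table of per-position amino-acid frequencies computed from aligned homologs. The sketch below shows that basic construction with a pseudocount. The function name `build_profile`, the gap handling, and the pseudocount value are assumptions; the paper's retrieval and weighting procedure is not described on this page.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_homologs, pseudocount=1.0):
    """Return one {amino_acid: frequency} dict per alignment column.

    aligned_homologs : equal-length strings; characters outside the
                       20 standard amino acids (e.g. '-') are ignored.
    pseudocount      : added to every amino-acid count so that no
                       frequency is exactly zero.
    """
    length = len(aligned_homologs[0])
    if any(len(seq) != length for seq in aligned_homologs):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in aligned_homologs if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile
```

The pseudocount keeps every frequency strictly positive, which matters if the profile is later passed through a logarithm.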

If this is right

EvoIF and its MSA-enabled variant reach state-of-the-art or competitive results on 217 mutational assays covering more than 2.5 million mutants.
The model achieves this using only 0.15 percent of the training data required by recent large models and with fewer parameters.
The two profile types prove complementary and raise robustness across function types, MSA depths, taxa, and mutation depths.
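Results of this kind are reported as Spearman rank correlations between predicted and measured fitness, computed per assay. The sketch below computes Spearman correlation with tie-aware ranks and takes an unweighted mean over assays. The unweighted mean is a simplification assumed here; the benchmark's own aggregation may group or weight assays differently.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(predicted, measured):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


def mean_spearman(assays):
    """assays maps assay_id -> (predicted_scores, measured_fitness)."""
    per_assay = {k: spearman(p, m) for k, (p, m) in assays.items()}
    return sum(per_assay.values()) / len(per_assay), per_assay
```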

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same profile fusion might improve zero-shot performance on related tasks such as stability or binding affinity prediction.
Relying more on the cross-family component could help when deep multiple sequence alignments are unavailable for a target protein.
Extending the transition block to accept additional structural signals from structure prediction models could further tighten fitness estimates.

Load-bearing premise

The assumption that within-family homolog profiles and cross-family inverse-folding constraints are complementary and unbiased enough to improve robustness without post-hoc selection effects.

What would settle it

Performance on the ProteinGym benchmark dropping sharply when either the within-family or cross-family profile component is removed, measured across held-out assays spanning different taxa, MSA depths, and mutation depths.
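A test of this kind amounts to comparing per-assay scores of the full model against an ablated variant within each stratum (taxon, MSA depth bucket, or mutation depth). The sketch below computes the mean drop per stratum. The function name, the dictionary layout, and the stratum labels are hypothetical, and the numbers in the usage note are made up purely to show the call.

```python
from collections import defaultdict


def mean_drop_by_stratum(full_scores, ablated_scores, strata):
    """Mean (full - ablated) score difference within each stratum.

    full_scores    : assay_id -> Spearman of the full model
    ablated_scores : assay_id -> Spearman with one profile removed
    strata         : assay_id -> stratum label (e.g. an MSA depth bucket)
    """
    drops = defaultdict(list)
    for assay_id, full in full_scores.items():
        drops[strata[assay_id]].append(full - ablated_scores[assay_id])
    return {label: sum(v) / len(v) for label, v in drops.items()}
```

For example, `mean_drop_by_stratum({"a": 0.5, "c": 0.6}, {"a": 0.4, "c": 0.3}, {"a": "low", "c": "high"})` returns one mean drop for the `"low"` stratum and one for `"high"`. A consistently positive drop across strata would be in line with the complementarity claim.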

Figures

Figures reproduced from arXiv: 2510.07286 by Chenchen Jing, Chunhua Shen, Hao Chen, Jigang Fan, Shengdong Lin, Weian Mao, Xiaoran Jiao, Zhanming Liang.

**Figure 1.** Overview of the proposed EvoIF.

**Figure 2.** Accuracy (Spearman) versus (a) model parameters and (b) training data scale.

**Figure 3.** Breakdown analysis on ProteinGym, across (a) function type, (b) MSA depth, (c) taxon, and (d) mutation depth. Ablation study on (e) homology quantity and (f) training data size. (g) Overall performance on all assays and out-of-distribution assays.

**Figure 4.** Visualization of fitness prediction results for the Spike glycoprotein.

**Figure 5.** Out-of-distribution evaluation on 23 ProteinGym assays with low similarity to …

**Figure 6.** Per-assay Spearman correlation for activity assays on ProteinGym.

**Figure 7.** Per-assay Spearman correlation for organismal fitness assays on ProteinGym.

**Figure 8.** Per-assay Spearman correlation for stability assays on ProteinGym.

**Figure 9.** Per-assay Spearman correlation for expression assays on ProteinGym.

**Figure 10.** Per-assay Spearman correlation for binding assays on ProteinGym.

Original abstract

Predicting the fitness impact of mutations is central to protein engineering but constrained by limited assays relative to the size of sequence space. Protein language models (pLMs) trained with masked language modeling (MLM) exhibit strong zero-shot fitness prediction; we provide a unifying view by interpreting natural evolution as implicit reward maximization and MLM as inverse reinforcement learning (IRL), in which extant sequences act as expert demonstrations and pLM log-odds serve as fitness estimates. Building on this perspective, we introduce EvoIF, a lightweight model that integrates two complementary sources of evolutionary signal: (i) within-family profiles from retrieved homologs and (ii) cross-family structural-evolutionary constraints distilled from inverse folding logits. EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring. On ProteinGym (217 mutational assays; >2.5M mutants), EvoIF and its MSA-enabled variant achieve state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths. The codes will be made publicly available.
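Log-odds scoring, as commonly used for zero-shot fitness prediction with masked language models, sums the log-probability difference between mutant and wild-type residues over the mutated positions. The sketch below assumes per-position probabilities are already available as dictionaries and that mutations are written as tokens such as `A2G` (wild type, 1-based position, mutant). Both conventions are assumptions for illustration, not the paper's interface.

```python
import math


def parse_mutation(token):
    """Split a token such as 'A2G' into (wild_type, zero_based_index, mutant)."""
    return token[0], int(token[1:-1]) - 1, token[-1]


def log_odds_score(wild_type_seq, mutations, probs):
    """Sum of log p(mutant) - log p(wild type) over mutated positions.

    wild_type_seq : wild-type sequence as a string
    mutations     : list of tokens such as ['A2G', 'C5D']
    probs         : probs[i][aa] is the predicted probability of amino
                    acid aa at position i; values must be positive
    """
    score = 0.0
    for token in mutations:
        wt, idx, mt = parse_mutation(token)
        if wild_type_seq[idx] != wt:
            raise ValueError(f"{token} does not match the wild-type sequence")
        score += math.log(probs[idx][mt]) - math.log(probs[idx][wt])
    return score
```

A score below zero means the model assigns the mutant lower probability than the wild type. Summing over positions treats mutations as independent, which is a simplification for multi-mutants.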

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

EvoIF fuses within-family homolog profiles with cross-family inverse-folding signals in a compact model that claims competitive ProteinGym results while using 0.15% of the training data of recent large models.

The main thing to know is that this paper presents a small model, EvoIF, that combines two evolutionary signals for mutation fitness prediction and reports competitive or better numbers on ProteinGym while using far less data and fewer parameters than recent large pLMs. The framing of MLM as inverse reinforcement learning (IRL) from natural sequences is the interpretive hook they use to motivate the approach. From there they retrieve within-family profiles from homologs and distill cross-family constraints from inverse-folding logits, then fuse them through a compact transition block before producing log-odds scores. That specific combination and the lightweight architecture are the concrete technical moves.

The benchmark covers 217 assays and over 2.5 million mutants, and they include ablations meant to show the two profile sources add value across function types, MSA depths, taxa, and mutation depths. Those results are the practical payoff if they hold up.

The softer spot is the ablation story. The abstract states that the profiles are complementary and improve robustness, but without the full methods it is hard to rule out that the transition block or fusion weights were tuned after seeing benchmark outcomes. If that happened, the complementarity claim becomes less convincing and the low-data advantage is harder to attribute cleanly to the signals themselves. Error bars and exact data-split details would also help.

This is aimed at people doing protein engineering or synthetic biology who need practical, low-resource predictors rather than the biggest possible model. Readers who already work with evolutionary profiles or inverse folding will see the most immediate value. It deserves a serious referee because the empirical scope is decent and the engineering angle is clear enough to be worth checking in detail. I would send it out for review so the methods and statistical support can be examined properly.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes EvoIF, a lightweight model for protein fitness prediction that fuses within-family evolutionary profiles retrieved from homologs with cross-family structural constraints distilled from inverse-folding logits. It frames pLMs trained via masked language modeling as performing inverse reinforcement learning, with extant sequences as expert demonstrations and log-odds as fitness estimates. On the ProteinGym benchmark (217 mutational assays, >2.5M mutants), both the base EvoIF and its MSA-enabled variant report state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations are presented to demonstrate complementarity of the two profile sources and robustness across function types, MSA depths, taxa, and mutation depths.

Significance. If the performance and complementarity claims hold after clarification of experimental controls, the work would be significant for data-efficient protein engineering, offering a practical alternative to large-scale pLMs. The low-data and low-parameter regime is a clear strength, as is the promise of public code release. The IRL interpretive lens is novel but remains largely post-hoc; it does not appear to derive the benchmark numbers from first principles.

major comments (1)

[Ablations] Ablations section: the claim that within-family and cross-family profiles are complementary and improve robustness rests on the reported fusion results. The manuscript must explicitly state whether the transition block architecture, fusion weights, or any hyperparameters were tuned or selected after inspecting ProteinGym outcomes. If any optimization occurred on the 217 assays used for final reporting, the complementarity conclusion risks being circular and the low-data advantage harder to attribute solely to the evolutionary signals.

minor comments (3)

[Abstract] Abstract: the statement that codes 'will be made publicly available' should be replaced with a concrete repository URL or DOI at revision.
[Methods] Methods: provide the precise mathematical definition of the transition block and the fusion operation (e.g., how logits and profiles are combined into calibrated probabilities).
[Results] Results: include error bars or statistical significance tests for the benchmark comparisons to support the 'state-of-the-art or competitive' claim.
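One common way to attach error bars to a benchmark average is a percentile bootstrap over assays. The sketch below resamples per-assay Spearman values with replacement and reports a confidence interval for their mean. The function name, the default of 2000 resamples, and the fixed seed are illustrative assumptions, and nothing here reflects an analysis the authors ran.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-assay scores.

    Returns (mean, lower, upper), where lower and upper bound the
    central (1 - alpha) fraction of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(values) / n, lower, upper
```

When two models are compared on the same assays, resampling the per-assay score differences instead of each model's scores keeps the comparison paired.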

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and insightful comments. We address the major concern regarding potential circularity in the ablation studies below and commit to revisions that improve transparency around experimental controls.

Referee: [Ablations] Ablations section: the claim that within-family and cross-family profiles are complementary and improve robustness rests on the reported fusion results. The manuscript must explicitly state whether the transition block architecture, fusion weights, or any hyperparameters were tuned or selected after inspecting ProteinGym outcomes. If any optimization occurred on the 217 assays used for final reporting, the complementarity conclusion risks being circular and the low-data advantage harder to attribute solely to the evolutionary signals.

Authors: We appreciate the referee's emphasis on rigorous experimental controls. Upon internal review, the transition block architecture, fusion weights, and all other hyperparameters were fixed prior to the final ProteinGym evaluation. These choices were guided by a small, disjoint validation subset of mutational assays (distinct from the 217 reported) together with architectural precedents from related evolutionary profile literature. No optimization or selection occurred on the full set of 217 assays used for benchmarking. In the revised manuscript we will add an explicit paragraph in the Methods section and a dedicated note in the Ablations section documenting this procedure, the validation split used, and confirmation that all design decisions were frozen before final reporting. This clarification will eliminate any ambiguity about circularity and more clearly attribute performance gains to the complementarity of the two evolutionary signals.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained against external benchmark

The paper frames natural evolution as implicit reward maximization and MLM as IRL purely as an interpretive unifying view, not as a derivation whose equations reduce the reported log-odds scores or benchmark numbers to fitted inputs by construction. The central results are evaluated on the external ProteinGym dataset (217 assays, >2.5M mutants), with the low-data and parameter-efficiency claims tied directly to that independent test set rather than to any self-defined or self-cited quantity. No equations, ablations, or fusion steps are shown to collapse into the inputs via self-definition, post-hoc fitting renamed as prediction, or load-bearing self-citation chains. This is the normal, non-circular outcome for a paper whose performance claims rest on an external, held-out benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that natural sequences encode fitness information recoverable by both sequence profiles and inverse-folding logits, and that these two sources are additive rather than redundant.

axioms (2)

domain assumption: Natural evolution can be interpreted as implicit reward maximization.
Invoked to frame MLM training as inverse reinforcement learning with extant sequences as expert demonstrations.
domain assumption: Retrieved homologs and inverse-folding logits supply complementary evolutionary signals.
Underlies the claim that fusing the two profiles improves robustness across assay types and taxa.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 4 internal anchors

[1]

Correlated mutations and residue contacts in proteins.Proteins: Structure, Function, and Bioin- formatics, 18(4):309–317, 1994

Ulrike Göbel, Chris Sander, Reinhard Schneider, and Alfonso Valencia. Correlated mutations and residue contacts in proteins.Proteins: Structure, Function, and Bioin- formatics, 18(4):309–317, 1994. doi: https://doi.org/10.1002/prot.340180402. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.340180402

work page doi:10.1002/prot.340180402 1994
[2]

Exploring protein fitness landscapes by directed evolution.Nature reviews Molecular cell biology, 10(12):866–876, 2009

Philip A Romero and Frances H Arnold. Exploring protein fitness landscapes by directed evolution.Nature reviews Molecular cell biology, 10(12):866–876, 2009

work page 2009
[3]

Machine learning for functional protein design.Nature biotechnology, 42(2):216–228, 2024

Pascal Notin, Nathan Rollins, Yarin Gal, Chris Sander, and Debora Marks. Machine learning for functional protein design.Nature biotechnology, 42(2):216–228, 2024

work page 2024
[4]

Low-n protein engineering with data-efficient deep learning.Nature methods, 18(4):389–396, 2021

Surojit Biswas, Grigory Khimulya, Ethan C Alley, Kevin M Esvelt, and George M Church. Low-n protein engineering with data-efficient deep learning.Nature methods, 18(4):389–396, 2021

work page 2021
[5]

Mutation effects predicted from sequence co-variation.Nature biotechnology, 35(2):128–135, 2017

Thomas A Hopf, John B Ingraham, Frank J Poelwijk, Charlotta PI Schärfe, Michael Springer, Chris Sander, and Debora S Marks. Mutation effects predicted from sequence co-variation.Nature biotechnology, 35(2):128–135, 2017

work page 2017
[6]

Language models enable zero-shot prediction of the effects of mutations on protein function.Advances in neural information processing systems, 34:29287–29303, 2021

Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alex Rives. Language models enable zero-shot prediction of the effects of mutations on protein function.Advances in neural information processing systems, 34:29287–29303, 2021

work page 2021
[7]

Multi-scale representation learning for protein fitness prediction

Zuobai Zhang, Pascal Notin, Yining Huang, Aurelie Lozano, Vijil Chenthamarakshan, Debora Marks, Payel Das, and Jian Tang. Multi-scale representation learning for protein fitness prediction. InAdvances in Neural Information Processing Systems, 2024

work page 2024
[8]

Lawrence Zitnick, Jerry Ma, and Rob Fergus

Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.PNAS, 2019. doi: 10.1101/622803. URL https://www.biorxiv.org/ content/10.1101/622803v4

work page doi:10.1101/622803 2019
[9]

bioRxiv (2022)

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives. Language models of protein sequences at the scale of evolution enable accurate structure prediction.bioRxiv, 2022. doi: 10.1101/2022.07.20.500902. URLhttps://www.biorxiv. org/content/early/2022...

work page doi:10.1101/2022.07.20.500902 2022
[10]

Learning inverse folding from millions of predicted structures

Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives. Learning inverse folding from millions of predicted structures. In International conference on machine learning, pages 8946–8970. PMLR, 2022

work page 2022
[11]

Proteingym: 11 Large-scale benchmarks for protein fitness prediction and design.Advances in Neural Information Processing Systems, 36:64331–64379, 2023

Pascal Notin, Aaron Kollasch, Daniel Ritter, Lood Van Niekerk, Steffanie Paul, Han Spinner, NathanRollins, AdaShaw, RoseOrenbuch, RubenWeitzman, etal. Proteingym: 11 Large-scale benchmarks for protein fitness prediction and design.Advances in Neural Information Processing Systems, 36:64331–64379, 2023

work page 2023
[12]

Retrieval augmented protein language models for protein structure prediction

Pan Li, Xingyi Cheng, Le Song, and Eric Xing. Retrieval augmented protein language models for protein structure prediction. 2024. doi: 10.1101/2024.12.02.626519. URL https://www.biorxiv.org/content/10.1101/2024.12.02.626519v1

work page doi:10.1101/2024.12.02.626519 2024
[13]

Retrieval- enhanced mutation mastery: Augmenting zero-shot prediction of protein language model

Yang Tan, Ruilin Wang, Banghao Wu, Liang Hong, and Bingxin Zhou. Retrieval- enhanced mutation mastery: Augmenting zero-shot prediction of protein language model. arXiv preprint arXiv: 2410.21127, 2024. URLhttps://arxiv.org/abs/2410.21127

work page arXiv 2024
[14]

Multiple sequence alignment.Current Opinion in Structural Biology, 16(3):368–373, 2006

Robert C Edgar and Serafim Batzoglou. Multiple sequence alignment.Current Opinion in Structural Biology, 16(3):368–373, 2006. ISSN 0959-440X. doi: https://doi.org/ 10.1016/j.sbi.2006.04.004. URL https://www.sciencedirect.com/science/article/ pii/S0959440X06000704. Nucleic acids/Sequences and topology

work page doi:10.1016/j.sbi.2006.04.004 2006
[15]

Esm-if1: Structure-informed protein language model for inverse folding

Faez Hsiao, Tarek Tadesse, Hayley Ho, Christopher Davis, Dan Jurafsky, and Jure Leskovec. Esm-if1: Structure-informed protein language model for inverse folding. bioRxiv, 2023. doi: 10.1101/2023.05.23.542000. URL https://www.biorxiv.org/ content/10.1101/2023.05.23.542000v1

work page doi:10.1101/2023.05.23.542000 2023
[16]

Algorithms for inverse reinforcement learning

Andrew Y Ng, Stuart Russell, et al. Algorithms for inverse reinforcement learning. In Icml, volume 1, page 2, 2000

work page 2000
[17]

Ziebart, Andrew Maas, J

Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. Maximum entropy inverse reinforcement learning. InProceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, AAAI’08, page 1433–1438. AAAI Press, 2008. ISBN 9781577353683

work page 2008
[18]

Fast and accurate protein structure search with foldseek.Nature biotechnology, 42(2):243–246, 2024

Michel Van Kempen, Stephanie S Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron LM Gilchrist, Johannes Söding, and Martin Steinegger. Fast and accurate protein structure search with foldseek.Nature biotechnology, 42(2):243–246, 2024

work page 2024
[19]

Shanker, Theodora U

Varun R. Shanker, Theodora U. J. Bruun, Brian L. Hie, and Peter S. Kim. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science, 385(6704):46–53, 2024. doi: 10.1126/science.adk8946. URL https://www. science.org/doi/abs/10.1126/science.adk8946

work page doi:10.1126/science.adk8946 2024
[20]

Advancing protein evolution with inverse folding models integrating structural and evolutionary constraints.Cell, 188(17):4674–4692.e19, 2025

Hongyuan Fei, Yunjia Li, Yijing Liu, Jingjing Wei, Aojie Chen, and Caixia Gao. Advancing protein evolution with inverse folding models integrating structural and evolutionary constraints.Cell, 188(17):4674–4692.e19, 2025. ISSN 0092-8674. doi: https://doi.org/10.1016/j.cell.2025.06.014. URL https://www.sciencedirect.com/ science/article/pii/S0092867425006804

work page doi:10.1016/j.cell.2025.06.014 2025
[21]

Deep mutational scanning: a new style of protein science.Nature Methods, 2014

Douglas M Fowler and Stanley Fields. Deep mutational scanning: a new style of protein science.Nature Methods, 2014. doi: 10.1038/nmeth.3027. URL https://doi.org/10. 1038/nmeth.3027

work page doi:10.1038/nmeth.3027 2014
[22]

Semantical and geometrical protein encoding toward enhanced bioactivity and thermostability.Elife, 13:RP98033, 2025

Yang Tan, Bingxin Zhou, Lirong Zheng, Guisheng Fan, and Liang Hong. Semantical and geometrical protein encoding toward enhanced bioactivity and thermostability.Elife, 13:RP98033, 2025

work page 2025
[23]

Saprot: Protein language modeling with structure-aware vocabulary.BioRxiv, pages 2023–10, 2023

Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. Saprot: Protein language modeling with structure-aware vocabulary.BioRxiv, pages 2023–10, 2023

work page 2023
[24]

ProSST: Protein language modeling with quantized structure and disentangled attention

Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Huiqun Yu, Ziyi Zhou, Wanli Ouyang, Bingxin Zhou, Pan Tan, and Liang Hong. ProSST: Protein language modeling with quantized structure and disentangled attention. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 12

work page 2024
[25]

Ning Sun, Shuxian Zou, Tianhua Tao, Sazan Mahbub, Dian Li, Yonghao Zhuang, Hongyi Wang, Xingyi Cheng, Le Song, and Eric P. Xing. Mixture of experts enable efficient and effective protein understanding and design. InNeurIPS 2024 Workshop on AI for New Drug Modalities. bioRxiv, 2024. doi: 10.1101/2024.11.29.625425. URL https://www.biorxiv.org/content/10.110...

work page doi:10.1101/2024.11.29.625425 2024
[26]

Diffusion language models are versatile protein learners

Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. Diffusion language models are versatile protein learners. InInternational Conference on Machine Learning, 2024

work page 2024
[27]

Language models enable zero-shot prediction of the effects of mutations on protein function

Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alex Rives. Language models enable zero-shot prediction of the effects of mutations on protein function. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 29287–29303. Curran Associate...

work page 2021
[28]

Epistasis in protein evolution.Protein science, 25(7):1204–1218, 2016

Tyler N Starr and Joseph W Thornton. Epistasis in protein evolution.Protein science, 25(7):1204–1218, 2016

work page 2016
[29]

Deep Think with Confidence

Yichao Fu, Xuewei Wang, Yuandong Tian, and Jiawei Zhao. Deep think with confidence. arXiv preprint arXiv: 2508.15260, 2025. URLhttps://arxiv.org/abs/2508.15260

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Deep researcher with test-time diffusion, 2025

Rujun Han, Yanfei Chen, Zoey CuiZhu, Lesly Miculicich, Guan Sun, Yuanjun Bi, Weiming Wen, Hui Wan, Chunfeng Wen, Solène Maître, George Lee, Vishy Tirumalashetty, Emily Xue, Zizhao Zhang, Salem Haykal, Burak Gokturk, Tomas Pfister, and Chen-Yu Lee. Deep researcher with test-time diffusion, 2025. URL https://arxiv.org/abs/2507.16075

work page arXiv 2025
[31]

Para- thinker: Native parallel thinking as a new paradigm to scale llm test-time compute.arXiv preprint arXiv:2509.04475,

Hao Wen, Yifan Su, Feifei Zhang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, and Yuanchun Li. Parathinker: Native parallel thinking as a new paradigm to scale llm test-time compute.arXiv preprint arXiv: 2509.04475, 2025

work page arXiv 2025
[32]

The majority is not always right: Rl training for solution aggregation.arXiv preprint arXiv:2509.06870, 2025

Wenting Zhao, Pranjal Aggarwal, Swarnadeep Saha, Asli Celikyilmaz, Jason Weston, and Ilia Kulikov. The majority is not always right: Rl training for solution aggregation. arXiv preprint arXiv: 2509.06870, 2025

work page arXiv 2025
[33]

Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael J. L. Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons, 2021. URL https://arxiv.org/abs/2009.01411

work page arXiv 2021
[34]

Steering protein family design through profile bayesian flow

Jingjing Gong, Yu Pei, Siyu Long, Yuxuan Song, Zhe Zhang, Wenhao Huang, Ziyao Cao, Shuyi Zhang, Hao Zhou, and Wei-Ying Ma. Steering protein family design through profile bayesian flow. InThe Thirteenth International Conference on Learning Representations,

work page
[35]

URLhttps://openreview.net/forum?id=PSiijdQjNU

work page
[36]

Boltz- 2: Towards accurate and efficient binding affinity prediction.bioRxiv, 2025

Saro Passaro, Gabriele Corso, Jeremy Wohlwend, Mateo Reveiz, Stephan Thaler, Vi- gnesh Ram Somnath, Noah Getz, Tally Portnoi, Julien Roy, Hannes Stark, David Kwabi-Addo, Dominique Beaini, Tommi Jaakkola, and Regina Barzilay. Boltz- 2: Towards accurate and efficient binding affinity prediction.bioRxiv, 2025. doi: 10.1101/2025.06.14.659707

work page doi:10.1101/2025.06.14.659707 2025
[37]

Amix- 1: A pathway to test-time scalable protein foundation model.arXiv preprint arXiv: 2507.08920, 2025

Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Wanli Ouyang, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, and Hao Zhou. Amix- 1: A pathway to test-time scalable pro...

work page arXiv 2025
[38]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre- training of deep bidirectional transformers for language understanding, 2019. URL https://arxiv.org/abs/1810.04805. 13

work page internal anchor Pith review Pith/arXiv arXiv 2019
[39]

Cath: increased structural coverage of functional space.Nucleic acids research, 49(D1): D266–D273, 2021

Ian Sillitoe, Nicola Bordin, Natalie Dawson, Vaishali P Waman, Paul Ashford, Harry M Scholes, Camilla SM Pang, Laurel Woodridge, Clemens Rauer, Neeladri Sen, et al. Cath: increased structural coverage of functional space.Nucleic acids research, 49(D1): D266–D273, 2021

work page 2021
[40]

Gemme: a simple and fast global epistatic model predicting mutational effects.Molecular biology and evolution, 36 (11):2604–2619, 2019

Elodie Laine, Yasaman Karami, and Alessandra Carbone. Gemme: a simple and fast global epistatic model predicting mutational effects.Molecular biology and evolution, 36 (11):2604–2619, 2019

work page 2019
[41]

Erik Nijkamp, Jeffrey A Ruffolo, Eli N Weinstein, Nikhil Naik, and Ali Madani. Progen2: exploring the boundaries of protein language models. Cell systems, 14(11):968–978, 2023
[42]

Kevin K Yang, Nicolo Fusi, and Alex X Lu. Convolutions are competitive with transformers for protein sequence pretraining. Cell Systems, 15(3):286–294, 2024
[43]

Jonathan Frazer, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K Min, Kelly Brock, Yarin Gal, and Debora S Marks. Disease variant prediction with deep generative models of evolutionary data. Nature, 599(7883):91–95, 2021
[44]

Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, and Alexander Rives. Msa transformer. In International conference on machine learning, pages 8844–8856. PMLR, 2021
[45]

Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan N Gomez, Debora Marks, and Yarin Gal. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In International Conference on Machine Learning, pages 16990–17017. PMLR, 2022
[46]

Pascal Notin, Lood Van Niekerk, Aaron W Kollasch, Daniel Ritter, Yarin Gal, and Debora S Marks. Trancepteve: Combining family-specific and family-agnostic models of protein sequences for improved fitness prediction. bioRxiv, pages 2022–12, 2022
[47]

Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J Ragotte, Lukas F Milles, Basile IM Wicky, Alexis Courbet, Rob J de Haas, Neville Bethel, et al. Robust deep learning–based protein sequence design using proteinmpnn. Science, 378(6615):49–56, 2022
[48]

Kevin K Yang, Niccolò Zanichelli, and Hugh Yeh. Masked inverse folding with sequence transfer for protein representation learning. Protein Engineering, Design and Selection, 36:gzad015, 2023
[49]

Adam J Riesselman, John B Ingraham, and Debora S Marks. Deep generative models of genetic variation capture the effects of mutations. Nature methods, 15(10):816–822, 2018
[50]

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, 2023
[51]

Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Bohong Yin, Weiran He, Han Zhu, Yuzhi Wang, Jianzhou Wang, Mengnan Dong, Zheng Zhang, Yongsheng Kang, Hao Zhang, Xinran Xu, Yutao Zhang, Yuxin Wu, Xinyu Zhou, and Zhilin Yang. Muon is scalable for LLM training. arXiv, 2025
[52]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017
[53]

Mihaly Varadi, Damian Bertoni, Paulyna Magana, Urmila Paramval, Ivanna Pidruchna, Malarvizhi Radhakrishnan, Maxim Tsenkov, Sreenath Nair, Milot Mirdita, Jingi Yeo, Oleg Kovalevskiy, Kathryn Tunyasuvunakool, Agata Laydon, Augustin Žídek, Hamish Tomlinson, Dhavanthi Hariharan, Josh Abrahamson, Tim Green, John Jumper, Ewan Birney, Martin Steinegger, Demis Ha... Alphafold protein structure database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Research, 52(D1):D368–D375, 2024. doi: 10.1093/nar/gkad1011
[54]

for other parameters. Matrix parameters (defined as parameters with dimensionality ≥ 2D) are optimized using Muon with a learning rate of 1 × 10⁻³, momentum of 0.95, 5 Newton-Schulz steps, and weight decay of 0.1. The remaining parameters use AdamW with β₁ = 0.9, β₂ = 0.95, ϵ = 1 × 10⁻⁸, and weight decay of 0.1. Parameters are automatically routed based on di...
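The routing rule described above can be illustrated with a minimal sketch. It assumes the truncated sentence refers to routing by tensor dimensionality, which is the only criterion the visible text states; the function name `route_parameters`, the group keys, and the example parameter names are hypothetical and not taken from the paper's code. The hyperparameter values are the ones quoted in the text. No AdamW learning rate is listed because the visible passage does not give one.

```python
from typing import Dict, List, Tuple

# Hyperparameters quoted in the text; the dictionary layout is illustrative only.
MUON_CFG = {"lr": 1e-3, "momentum": 0.95, "ns_steps": 5, "weight_decay": 0.1}
ADAMW_CFG = {"betas": (0.9, 0.95), "eps": 1e-8, "weight_decay": 0.1}


def route_parameters(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, List[str]]:
    """Split parameter names into a Muon group and an AdamW group.

    Assumption: a parameter counts as a "matrix parameter" when its shape
    has two or more dimensions; everything else (biases, scales) goes to AdamW.
    """
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in shapes.items():
        key = "muon" if len(shape) >= 2 else "adamw"
        groups[key].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical parameter shapes, used only to exercise the routing rule.
    example = {
        "proj.weight": (64, 32),
        "proj.bias": (64,),
        "norm.scale": (32,),
        "conv.kernel": (8, 4, 3),
    }
    for group, names in route_parameters(example).items():
        cfg = MUON_CFG if group == "muon" else ADAMW_CFG
        print(group, names, cfg)
```

In a real training script the two name lists would be turned into two optimizer instances, one per group, each built with the corresponding configuration.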


[1]

Ulrike Göbel, Chris Sander, Reinhard Schneider, and Alfonso Valencia. Correlated mutations and residue contacts in proteins. Proteins: Structure, Function, and Bioinformatics, 18(4):309–317, 1994. doi: 10.1002/prot.340180402. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.340180402

[2]

Philip A Romero and Frances H Arnold. Exploring protein fitness landscapes by directed evolution. Nature reviews Molecular cell biology, 10(12):866–876, 2009

[3]

Pascal Notin, Nathan Rollins, Yarin Gal, Chris Sander, and Debora Marks. Machine learning for functional protein design. Nature biotechnology, 42(2):216–228, 2024

[4]

Surojit Biswas, Grigory Khimulya, Ethan C Alley, Kevin M Esvelt, and George M Church. Low-n protein engineering with data-efficient deep learning. Nature methods, 18(4):389–396, 2021

[5]

Thomas A Hopf, John B Ingraham, Frank J Poelwijk, Charlotta PI Schärfe, Michael Springer, Chris Sander, and Debora S Marks. Mutation effects predicted from sequence co-variation. Nature biotechnology, 35(2):128–135, 2017

[6]

Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alex Rives. Language models enable zero-shot prediction of the effects of mutations on protein function. Advances in neural information processing systems, 34:29287–29303, 2021

[7]

Zuobai Zhang, Pascal Notin, Yining Huang, Aurelie Lozano, Vijil Chenthamarakshan, Debora Marks, Payel Das, and Jian Tang. Multi-scale representation learning for protein fitness prediction. In Advances in Neural Information Processing Systems, 2024

[8]

Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS, 2019. doi: 10.1101/622803. URL https://www.biorxiv.org/content/10.1101/622803v4

[9]

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022. doi: 10.1101/2022.07.20.500902. URL https://www.biorxiv.org/content/early/2022...

[10]

Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives. Learning inverse folding from millions of predicted structures. In International conference on machine learning, pages 8946–8970. PMLR, 2022

[11]

Pascal Notin, Aaron Kollasch, Daniel Ritter, Lood Van Niekerk, Steffanie Paul, Han Spinner, Nathan Rollins, Ada Shaw, Rose Orenbuch, Ruben Weitzman, et al. Proteingym: Large-scale benchmarks for protein fitness prediction and design. Advances in Neural Information Processing Systems, 36:64331–64379, 2023

[12]

Pan Li, Xingyi Cheng, Le Song, and Eric Xing. Retrieval augmented protein language models for protein structure prediction. 2024. doi: 10.1101/2024.12.02.626519. URL https://www.biorxiv.org/content/10.1101/2024.12.02.626519v1

[13]

Yang Tan, Ruilin Wang, Banghao Wu, Liang Hong, and Bingxin Zhou. Retrieval-enhanced mutation mastery: Augmenting zero-shot prediction of protein language model. arXiv preprint arXiv:2410.21127, 2024. URL https://arxiv.org/abs/2410.21127

[14]

Robert C Edgar and Serafim Batzoglou. Multiple sequence alignment. Current Opinion in Structural Biology, 16(3):368–373, 2006. ISSN 0959-440X. doi: 10.1016/j.sbi.2006.04.004. URL https://www.sciencedirect.com/science/article/pii/S0959440X06000704. Nucleic acids/Sequences and topology

[15]

Faez Hsiao, Tarek Tadesse, Hayley Ho, Christopher Davis, Dan Jurafsky, and Jure Leskovec. Esm-if1: Structure-informed protein language model for inverse folding. bioRxiv, 2023. doi: 10.1101/2023.05.23.542000. URL https://www.biorxiv.org/content/10.1101/2023.05.23.542000v1

[16]

Andrew Y Ng, Stuart Russell, et al. Algorithms for inverse reinforcement learning. In Icml, volume 1, page 2, 2000

[17]

Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, AAAI'08, pages 1433–1438. AAAI Press, 2008. ISBN 9781577353683

[18]

Michel Van Kempen, Stephanie S Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron LM Gilchrist, Johannes Söding, and Martin Steinegger. Fast and accurate protein structure search with foldseek. Nature biotechnology, 42(2):243–246, 2024

[19]

Varun R. Shanker, Theodora U. J. Bruun, Brian L. Hie, and Peter S. Kim. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science, 385(6704):46–53, 2024. doi: 10.1126/science.adk8946. URL https://www.science.org/doi/abs/10.1126/science.adk8946

[20]

Hongyuan Fei, Yunjia Li, Yijing Liu, Jingjing Wei, Aojie Chen, and Caixia Gao. Advancing protein evolution with inverse folding models integrating structural and evolutionary constraints. Cell, 188(17):4674–4692.e19, 2025. ISSN 0092-8674. doi: 10.1016/j.cell.2025.06.014. URL https://www.sciencedirect.com/science/article/pii/S0092867425006804

[21]

Douglas M Fowler and Stanley Fields. Deep mutational scanning: a new style of protein science. Nature Methods, 2014. doi: 10.1038/nmeth.3027. URL https://doi.org/10.1038/nmeth.3027

[22]

Yang Tan, Bingxin Zhou, Lirong Zheng, Guisheng Fan, and Liang Hong. Semantical and geometrical protein encoding toward enhanced bioactivity and thermostability. Elife, 13:RP98033, 2025

[23]

Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. Saprot: Protein language modeling with structure-aware vocabulary. BioRxiv, pages 2023–10, 2023

[24]

Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Huiqun Yu, Ziyi Zhou, Wanli Ouyang, Bingxin Zhou, Pan Tan, and Liang Hong. ProSST: Protein language modeling with quantized structure and disentangled attention. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

[25]

Ning Sun, Shuxian Zou, Tianhua Tao, Sazan Mahbub, Dian Li, Yonghao Zhuang, Hongyi Wang, Xingyi Cheng, Le Song, and Eric P. Xing. Mixture of experts enable efficient and effective protein understanding and design. In NeurIPS 2024 Workshop on AI for New Drug Modalities. bioRxiv, 2024. doi: 10.1101/2024.11.29.625425. URL https://www.biorxiv.org/content/10.110...

[26]

Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. Diffusion language models are versatile protein learners. In International Conference on Machine Learning, 2024

[27]

Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alex Rives. Language models enable zero-shot prediction of the effects of mutations on protein function. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 29287–29303. Curran Associate...

[28]

Tyler N Starr and Joseph W Thornton. Epistasis in protein evolution. Protein science, 25(7):1204–1218, 2016

[29]

Yichao Fu, Xuewei Wang, Yuandong Tian, and Jiawei Zhao. Deep think with confidence. arXiv preprint arXiv:2508.15260, 2025. URL https://arxiv.org/abs/2508.15260

[30]

Rujun Han, Yanfei Chen, Zoey CuiZhu, Lesly Miculicich, Guan Sun, Yuanjun Bi, Weiming Wen, Hui Wan, Chunfeng Wen, Solène Maître, George Lee, Vishy Tirumalashetty, Emily Xue, Zizhao Zhang, Salem Haykal, Burak Gokturk, Tomas Pfister, and Chen-Yu Lee. Deep researcher with test-time diffusion, 2025. URL https://arxiv.org/abs/2507.16075

[31]

Hao Wen, Yifan Su, Feifei Zhang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, and Yuanchun Li. Parathinker: Native parallel thinking as a new paradigm to scale llm test-time compute. arXiv preprint arXiv:2509.04475, 2025

[32]

Wenting Zhao, Pranjal Aggarwal, Swarnadeep Saha, Asli Celikyilmaz, Jason Weston, and Ilia Kulikov. The majority is not always right: Rl training for solution aggregation. arXiv preprint arXiv:2509.06870, 2025

[33]

Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael J. L. Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons, 2021. URL https://arxiv.org/abs/2009.01411

[34]

Jingjing Gong, Yu Pei, Siyu Long, Yuxuan Song, Zhe Zhang, Wenhao Huang, Ziyao Cao, Shuyi Zhang, Hao Zhou, and Wei-Ying Ma. Steering protein family design through profile bayesian flow. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=PSiijdQjNU
