pith. machine review for the scientific record.

arxiv: 2605.10978 · v2 · submitted 2026-05-09 · 🧬 q-bio.QM

Recognition: 2 theorem links · Lean Theorem

VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:08 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords: protein design · large language models · benchmark evaluation · sequence generation · natural language interface · computational biology

The pith

No large language model yet masters the full workflow of language-based protein design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VibeProteinBench to test how well language models can handle protein design through open-ended natural language. It breaks the task into three linked stages that mirror real design work: recognizing properties from descriptions, engineering changes to existing proteins, and generating entirely new sequences. Each stage uses expert rationales and computational checks to judge whether outputs are plausible. When the authors ran many general and specialized models through the benchmark, none performed strongly across every stage. This suggests that building a single model capable of flexible, generalist protein design from language remains an open problem.

Core claim

VibeProteinBench evaluates LLMs on a complete protein design workflow via three stages—recognition, engineering, and generation—each supported by mechanistic rationales and in silico validation. Results across diverse models show that no single model achieves strong performance in all three stages at once.

What carries the argument

VibeProteinBench, a language-interfaced benchmark that structures protein design evaluation into recognition, engineering, and generation stages with expert-curated rationales and multi-faceted in silico checks.

Load-bearing premise

That performance across the three stages plus in silico validation gives a reliable picture of broad competence at open-ended protein design.

What would settle it

Discovery of any single model that scores strongly on recognition, engineering, and generation tasks when tested on the full VibeProteinBench suite.
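The settling condition above is a simple conjunction: one model clearing a "strong" bar on every stage at once. A minimal sketch of that predicate, where the stage names follow the paper but the scores and the 0.7 threshold are invented for illustration:

```python
# Hypothetical sketch: the open question reduces to whether any model clears
# a "strong" threshold on all three VibeProteinBench stages simultaneously.
# The threshold and the leaderboard numbers below are invented, not taken
# from the benchmark.

STAGES = ("recognition", "engineering", "generation")

def is_generalist(scores: dict[str, float], threshold: float = 0.7) -> bool:
    """True if a model scores at or above the threshold on every stage."""
    return all(scores.get(stage, 0.0) >= threshold for stage in STAGES)

# Illustrative per-model stage scores (fabricated for the sketch).
leaderboard = {
    "model_a": {"recognition": 0.82, "engineering": 0.41, "generation": 0.55},
    "model_b": {"recognition": 0.60, "engineering": 0.75, "generation": 0.73},
}

generalists = [m for m, s in leaderboard.items() if is_generalist(s)]
```

On the paper's account, this list is currently empty for every model tested; a single non-empty entry on the full suite would settle the claim.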

Figures

Figures reproduced from arXiv: 2605.10978 by Cheng-Hao Liu, Feyisayo Eweje, Gina El Nesr, Gyubok Lee, Gyu Rie Lee, Hongjoon Ahn, Hyejin Lee, Hyunjin Seo, Jamin Shin, Jason Yang, Jimin Park, Joseph S Brown, Junlang Liu, Leo Chen, Sangwon Jung, Sarah Gurev, Soojung Yang, Sungjun Han, Sungsoo Ahn, Zhihui Qi.

Figure 1: Example queries across the three stages of VIBEPROTEINBENCH. Each query takes the form of a natural-language task instruction paired with the stage-specific context required to express the design intent. Recognition queries follow a question-answering format with a candidate answer set. Engineering queries supply a wild-type sequence together with mechanistic rationales partitioned into defective positions… view at source ↗

Figure 2: Evaluation dataset construction pipeline of VibeProteinBench. view at source ↗

Figure 3: Cross-stage correlations between recognition, engineering, and generation performance of baseline LLMs. DS denotes DeepSeek and TxG indicates TxGemma-chat models. For detailed subtask-wise experiment results on all subtasks of recognition, engineering, and generation, please refer to Appendix B. view at source ↗

Figure 4: Correlations between different subtask performance of baseline LLMs. DS denotes DeepSeek… view at source ↗

Figure 5: Snapshot of expert review interface for solubility engineering query. view at source ↗
read the original abstract

Protein design aims to compose amino-acid sequences that fold into stable three-dimensional structures while satisfying targeted functional properties. The field is increasingly shifting toward vibe protein design, where a single model is expected to generate novel sequences, engineer existing proteins, and reason about protein characteristics through flexible natural-language constraints. Large language models (LLMs) have emerged as a leading paradigm in this space. However, existing evaluation benchmarks often limit their scope to a partial aspect of protein design, while others restrict design objectives to structured input schemas, lacking an integrated framework that evaluates the broad spectrum of protein design competence under open-ended intents. To this end, we present Vibe Protein design Benchmark (VibeProteinBench), a language-interfaced benchmark that probes generalist capabilities through three complementary stages mirroring a computational protein design workflow: recognition, engineering, and generation. Each stage is grounded in expert-curated mechanistic rationales and multi-faceted in silico validation, to computationally verify whether model outputs are biologically plausible. Evaluations across diverse general-purpose and domain-specialized LLMs reveal that no model achieves strong performance across all three stages, suggesting that generalist protein design remains a substantial open challenge for current LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces VibeProteinBench, a language-interfaced evaluation benchmark for vibe protein design that probes LLMs through three complementary stages—recognition, engineering, and generation—each supported by expert-curated mechanistic rationales and multi-faceted in silico validation (structure prediction, stability, etc.). Evaluations across general-purpose and domain-specialized LLMs show that no model achieves strong performance across all stages, leading to the conclusion that generalist protein design under open-ended natural-language intents remains a substantial open challenge.

Significance. If the benchmark's three-stage structure and validation suite prove reliable, the result would be significant for the field: it supplies an integrated framework that moves beyond partial or schema-restricted evaluations, directly demonstrating current LLM limitations on the full computational protein design workflow. The expert-curated rationales and computational grounding are strengths that could help standardize assessment of broad competence.

major comments (1)
  1. [Evaluation sections] The headline claim (no model strong across all stages, hence generalist design remains open) is load-bearing on the assumption that the three stages plus expert rationales and in silico checks form a sufficient proxy for competence. The multi-faceted in silico validation (structure prediction, stability, etc.) is known to yield false positives for de novo sequences; the manuscript does not report correlation to experimental outcomes or ablation of the validation suite itself. This leaves open the possibility that low scores reflect benchmark noise rather than fundamental limits.
minor comments (1)
  1. [Abstract] Abstract supplies no quantitative results, error bars, or key metrics; adding a sentence with headline performance numbers would make the high-level finding more informative.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for rigorous validation of the benchmark's proxy measures. We address the major comment below and will incorporate revisions to strengthen the discussion of limitations.

read point-by-point responses
  1. Referee: [Evaluation sections] The headline claim (no model strong across all stages, hence generalist design remains open) is load-bearing on the assumption that the three stages plus expert rationales and in silico checks form a sufficient proxy for competence. The multi-faceted in silico validation (structure prediction, stability, etc.) is known to yield false positives for de novo sequences; the manuscript does not report correlation to experimental outcomes or ablation of the validation suite itself. This leaves open the possibility that low scores reflect benchmark noise rather than fundamental limits.

    Authors: We agree that in silico methods such as structure prediction and stability estimation are known to produce false positives for de novo sequences and that the absence of experimental correlation or ablation studies leaves room for benchmark noise to influence scores. Our multi-faceted validation suite, grounded in expert-curated mechanistic rationales, represents a standard computational proxy used in the field to assess biological plausibility at scale, but it is not a substitute for wet-lab confirmation. In the revised manuscript we will add an explicit limitations subsection in the Discussion that (i) acknowledges the risk of false positives, (ii) states that low scores may partly reflect validation noise, and (iii) clarifies that the benchmark functions as an initial filter rather than definitive proof of model limits. We will also report an ablation of the validation components on a subset of tasks where data allow. Full experimental correlation remains outside the scope of this computational benchmark paper.
    revision: partial
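The ablation the authors promise has a simple shape: score outputs against the full check suite, then drop one check at a time and watch the pass rate move. A hedged sketch with stand-in checks (the names, thresholds, and toy outputs are invented; the real suite uses structure prediction, stability estimation, and similar tools):

```python
# Leave-one-out ablation of a validation suite, in miniature. Everything
# here is a stand-in: real checks would call structure predictors and
# stability estimators, not threshold two floats.

def pass_rate(outputs, checks):
    """Fraction of outputs that pass every check in the suite."""
    passed = sum(1 for o in outputs if all(c(o) for c in checks))
    return passed / len(outputs)

# Toy outputs: (structure-confidence-like, stability-like) pairs, invented.
outputs = [(0.9, 0.8), (0.6, 0.9), (0.95, 0.4), (0.85, 0.7)]

checks = {
    "structure": lambda o: o[0] >= 0.8,
    "stability": lambda o: o[1] >= 0.5,
}

full = pass_rate(outputs, checks.values())
ablation = {
    name: pass_rate(outputs, [c for n, c in checks.items() if n != name])
    for name in checks
}
```

A check whose removal barely shifts the pass rate contributes little signal; one that shifts it sharply is load-bearing, which is exactly what the referee wants quantified.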

Circularity Check

0 steps flagged

No circularity: benchmark proposal with independent empirical evaluation

full rationale

The paper proposes VibeProteinBench, a three-stage language-interfaced benchmark (recognition, engineering, generation) for evaluating LLMs on protein design tasks. It applies the benchmark to existing general-purpose and domain-specialized models, reports performance scores, and concludes that no model excels across all stages. There are no equations, parameter fittings, derivations, or self-citations that reduce the central claim to its own inputs by construction. The evaluation uses expert-curated rationales and in silico checks as proxies, but these are external to any model training or prior self-referential results. The work is self-contained as a new benchmark definition plus independent testing, with no load-bearing steps that collapse into tautology or fitted renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The benchmark rests on domain assumptions about the sufficiency of in silico checks and expert rationales rather than new fitted parameters or invented entities.

axioms (1)
  • domain assumption: Expert-curated mechanistic rationales combined with in silico validation can reliably assess the biological plausibility of model outputs.
    Invoked to ground each of the three evaluation stages.

pith-pipeline@v0.9.0 · 5589 in / 1111 out tokens · 23661 ms · 2026-05-14T22:08:24.671552+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

99 extracted references · 99 canonical work pages · 3 internal anchors

  1. [1]

    Computational protein design.Nature Reviews Methods Primers, 5(1):13, 2025

    Katherine I Albanese, Sophie Barbe, Shunsuke Tagami, Derek N Woolfson, and Thomas Schiex. Computational protein design.Nature Reviews Methods Primers, 5(1):13, 2025

  2. [2]

    Claude opus 4.6

    Anthropic. Claude opus 4.6. https://www-cdn.anthropic.com/ 4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf, 2025

  3. [3]

    Miniprotein design: past, present, and prospects.Accounts of Chemical Research, 50(9):2085–2092, 2017

    Emily G Baker, Gail J Bartlett, Kathryn L Porter Goff, and Derek N Woolfson. Miniprotein design: past, present, and prospects.Accounts of Chemical Research, 50(9):2085–2092, 2017

  4. [4]

    Atomically accurate de novo design of antibodies with rfdiffusion.Nature, 649(8095):183–193, 2026

    Nathaniel R Bennett, Joseph L Watson, Robert J Ragotte, Andrew J Borst, DéJenaé L See, Connor Weidle, Riti Biswas, Yutong Yu, Ellen L Shrock, Russell Ault, et al. Atomically accurate de novo design of antibodies with rfdiffusion.Nature, 649(8095):183–193, 2026

  5. [5]

    The interpro protein families and domains database: 20 years on.Nucleic Acids Research, 49(D1):D344–D354, 2021

    Matthias Blum, Hsin-Yu Chang, Sara Chuguransky, Teresa Grego, Selvam Kandasaamy, Alex Mitchell, Gaurav Nuka, Typhaine Paysan-Lafosse, Matloob Qureshi, Surabhi Raj, et al. The interpro protein families and domains database: 20 years on.Nucleic Acids Research, 49(D1):D344–D354, 2021

  6. [6]

    Burley, Charmi Bhikadiya, Chunxiao Bi, Sebastian Bittrich, Li Chen, et al

    Stephen K. Burley, Charmi Bhikadiya, Chunxiao Bi, Sebastian Bittrich, Li Chen, et al. Rcsb protein data bank: powerful new tools for exploring 3d structures of biological macromolecules. Nucleic Acids Research, 52(D1):D480–D491, 2024

  7. [7]

    Pqa: Zero-shot protein question answering for free-form scientific enquiry with large language models.arXiv preprint arXiv:2402.13653, 2024

    Eli M Carrami and Sahand Sharifzadeh. Pqa: Zero-shot protein question answering for free-form scientific enquiry with large language models.arXiv preprint arXiv:2402.13653, 2024

  8. [8]

    Pyrosetta: a script-based interface for implementing molecular modeling algorithms using rosetta.Bioinformatics, 26(5):689–691, 2010

    Sidhartha Chaudhury, Sergey Lyskov, and Jeffrey J Gray. Pyrosetta: a script-based interface for implementing molecular modeling algorithms using rosetta.Bioinformatics, 26(5):689–691, 2010

  9. [9]

    Cheng, R

    H. Cheng, R. D. Schaeffer, Y . Liao, L. N. Kinch, and N. V . Grishin. Ecod: An evolutionary classification of protein domains.PLoS Computational Biology, 10(12):e1003926, 2014

  10. [10]

    Stable de novo protein design via joint conformational landscape and sequence optimization

    Yehlin Cho, Justas Dauparas, Kotaro Tsuboyama, Gabriel J Rocklin, and Sergey Ovchinnikov. Stable de novo protein design via joint conformational landscape and sequence optimization. Nature Communications, 2025

  11. [11]

    Sparks of function by de novo protein design

    Alexander E Chu, Tianyu Lu, and Po-Ssu Huang. Sparks of function by de novo protein design. Nature biotechnology, 42(2):203–215, 2024

  12. [12]

    Rational protein design.Current Opinion in Structural Biology, 97:103224, 2026

    Joel J Chubb, Aimee L Boyle, and Katherine I Albanese. Rational protein design.Current Opinion in Structural Biology, 97:103224, 2026

  13. [13]

    Biopython: freely available python tools for computational molecular biology and bioinformatics.Bioinformatics, 25(11):1422–1423, 2009

    Peter JA Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, and Michiel JL de Hoon. Biopython: freely available python tools for computational molecular biology and bioinformatics.Bioinformatics, 25(11):1422–1423, 2009

  14. [14]

    Sophie Colette, Jaldert François, Bart De Moor, and Vera van Noort. Ogtfinder: A curated growth temperature data set and its application to predict optimal growth temperatures of bacteria and archaea.Journal of Chemical Information and Modeling, 2026. 10

  15. [15]

    Crowd- sourced protein design: Lessons from the adaptyv egfr binder competition.bioRxiv, pages 2025–04, 2025

    Tudor-Stefan Cotet, Igor Krawczuk, Martin Pacesa, Lennart Nickel, Bruno E Correia, Nikhil Haas, Ahmad Qamar, Chance A Challacombe, Patrick Kidger, Constance Ferragu, et al. Crowd- sourced protein design: Lessons from the adaptyv egfr binder competition.bioRxiv, pages 2025–04, 2025

  16. [16]

    De novo protein design: fully automated sequence selection.Science, 278(5335):82–87, 1997

    Bassil I Dahiyat and Stephen L Mayo. De novo protein design: fully automated sequence selection.Science, 278(5335):82–87, 1997

  17. [17]

    Toward de novo protein design from natural language

    Fengyuan Dai, Shiyang You, Yudian Zhu, Yuan Gao, Lihao Fu, Xibin Zhou, Jin Su, Chentong Wang, Yuliang Fan, Xiaoxiao Ma, et al. Toward de novo protein design from natural language. BioRxiv, 2024

  18. [18]

    Robust deep learning–based protein sequence design using proteinmpnn.Science, 378(6615):49–56, 2022

    Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J Ragotte, Lukas F Milles, Basile IM Wicky, Alexis Courbet, Rob J de Haas, Neville Bethel, et al. Robust deep learning–based protein sequence design using proteinmpnn.Science, 378(6615):49–56, 2022

  19. [19]

    Atomic context-conditioned protein sequence design using ligandmpnn.Nature Methods, 22(4):717–723, 2025

    Justas Dauparas, Gyu Rie Lee, Robert Pecoraro, Linna An, Ivan Anishchenko, Cameron Glasscock, and David Baker. Atomic context-conditioned protein sequence design using ligandmpnn.Nature Methods, 22(4):717–723, 2025

  20. [20]

    Gemini 3.1 pro - model card

    Google DeepMind. Gemini 3.1 pro - model card. https://deepmind.google/models/ model-cards/gemini-3-1-pro/, February 2026

  21. [21]

    Deepseek-v4: Towards highly efficient million-token context in- telligence

    Deepseek-AI. Deepseek-v4: Towards highly efficient million-token context in- telligence. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/ DeepSeek_V4.pdf, 2026

  22. [22]

    Bronstein, Martin Steinegger, Emine Kucukbenli, Arash Vahdat, and Karsten Kreis

    Kieran Didi, Zuobai Zhang, Guoqing Zhou, Danny Reidenbach, Zhonglin Cao, Sooyoung Cha, Tomas Geffner, Christian Dallago, Jian Tang, Michael M. Bronstein, Martin Steinegger, Emine Kucukbenli, Arash Vahdat, and Karsten Kreis. Scaling atomistic protein binder design with generative pretraining and test-time compute. InThe Fourteenth International Conference ...

  23. [23]

    Trans- fer learning to leverage larger datasets for improved prediction of protein stability changes

    Henry Dieckhaus, Michael Brocidiacono, Nicholas Z Randolph, and Brian Kuhlman. Trans- fer learning to leverage larger datasets for improved prediction of protein stability changes. Proceedings of the national academy of sciences, 121(6):e2314853121, 2024

  24. [24]

    Bioreason-pro: Advancing protein function prediction with multimodal biological reasoning.bioRxiv, 2026

    Adibvafa Fallahpour, Arman Seyed-Ahmadi, Parsa Idehpour, Omar Ibrahim, Purav Gupta, Jack Naimer, Kevin Zhu, Arnav Shah, Shihao Ma, Abhinav Adduri, et al. Bioreason-pro: Advancing protein function prediction with multimodal biological reasoning.bioRxiv, 2026

  25. [25]

    Mol-instructions: A large-scale biomolecular instruction dataset for large language models.arXiv preprint arXiv:2306.08018, 2023

    Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Mol-instructions: A large-scale biomolecular instruction dataset for large language models.arXiv preprint arXiv:2306.08018, 2023

  26. [26]

    Protgpt2 is a deep unsupervised language model for protein design.Nature communications, 13(1):4348, 2022

    Noelia Ferruz, Steffen Schmidt, and Birte Höcker. Protgpt2 is a deep unsupervised language model for protein design.Nature communications, 13(1):4348, 2022

  27. [27]

    Zhangyang Gao, Cheng Tan, Yijie Zhang, Xingran Chen, Lirong Wu, and Stan Z. Li. Pro- teininvbench: Benchmarking protein inverse folding on diverse tasks, models, and metrics. In Advances in Neural Information Processing Systems, 2023

  28. [28]

    Structure-based protein function prediction using graph convolutional networks.Nature communications, 12(1):3168, 2021

    Vladimir Gligorijevi´c, P Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C Taylor, Ian M Fisk, Hera Vlamakis, et al. Structure-based protein function prediction using graph convolutional networks.Nature communications, 12(1):3168, 2021

  29. [29]

    Protein engineering for industrial biocatalysis: principles, approaches, and lessons from engineered petases.Catalysts, 15(2):147, 2025

    Konstantinos Grigorakis, Christina Ferousi, and Evangelos Topakas. Protein engineering for industrial biocatalysis: principles, approaches, and lessons from engineered petases.Catalysts, 15(2):147, 2025

  30. [30]

    Protein design with guided discrete diffusion

    Nate Gruver, Samuel Stanton, Nathan Frey, Tim GJ Rudner, Isidro Hotzel, Julien Lafrance- Vanasse, Arvind Rajpal, Kyunghyun Cho, and Andrew G Wilson. Protein design with guided discrete diffusion. InAdvances in Neural Information Processing Systems, 2023

  31. [31]

    pykvfinder: an efficient and integrable python package for biomolecular cavity detection and characterization in data science.BMC bioinformatics, 22(1):607, 2021

    João Victor da Silva Guerra, Helder Veras Ribeiro-Filho, Gabriel Ernesto Jara, Leandro Oliveira Bortot, José Geraldo de Carvalho Pereira, and Paulo Sérgio Lopes-de Oliveira. pykvfinder: an efficient and integrable python package for biomolecular cavity detection and characterization in data science.BMC bioinformatics, 22(1):607, 2021

  32. [32]

    Learning sequence, structure, and function representations of proteins with language models.bioRxiv, 2023

    Tymor Hamamsy, Meet Barot, James T Morton, Martin Steinegger, Richard Bonneau, and Kyunghyun Cho. Learning sequence, structure, and function representations of proteins with language models.bioRxiv, 2023

  33. [33]

    Simulating 500 million years of evolution with a language model.Science, 387(6736):850–858, 2025

    Thomas Hayes, Roshan Rao, Halil Akin, Nicholas J Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Q Tran, Jonathan Deaton, Marius Wiggert, et al. Simulating 500 million years of evolution with a language model.Science, 387(6736):850–858, 2025

  34. [34]

    Efficient evolution of human antibodies from 11 general protein language models.Nature biotechnology, 42(2):275–283, 2024

    Brian L Hie, Varun R Shanker, Duo Xu, Theodora UJ Bruun, Payton A Weidenbacher, Shaogeng Tang, Wesley Wu, John E Pak, and Peter S Kim. Efficient evolution of human antibodies from 11 general protein language models.Nature biotechnology, 42(2):275–283, 2024

  35. [35]

    Adaptive machine learning for protein engineering.Current opinion in structural biology, 72:145–152, 2022

    Brian L Hie and Kevin K Yang. Adaptive machine learning for protein engineering.Current opinion in structural biology, 72:145–152, 2022

  36. [36]

    Pro-1.https://michaelhla.com/blog/pro1.html, March 2025

    Michael Hla. Pro-1.https://michaelhla.com/blog/pro1.html, March 2025

  37. [37]

    Elucidating the design space of multimodal protein language models.arXiv preprint arXiv:2504.11454, 2025

    Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, and Quanquan Gu. Elucidating the design space of multimodal protein language models.arXiv preprint arXiv:2504.11454, 2025

  38. [38]

    Protein2text: Resampling mechanism to translate protein sequences into human-interpretable text

    Ala Jararweh, Oladimeji Macaulay, David Arredondo, Yue Hu, Luis E Tafoya, Kushal Virupak- shappa, and Avinash Sahu. Protein2text: Resampling mechanism to translate protein sequences into human-interpretable text. InProceedings of the 2025 Conference of the Nations of the Amer- icas Chapter of the Association for Computational Linguistics: Human Language T...

  39. [39]

    A multi-modal llm for dynamic protein-ligand interactions and generative molecular design.bioRxiv, 2025

    Haoran Jing and Yutong Miao. A multi-modal llm for dynamic protein-ligand interactions and generative molecular design.bioRxiv, 2025

  40. [40]

    Dictionary of protein secondary structure: pattern recogni- tion of hydrogen-bonded and geometrical features.Biopolymers, 22(12):2577–2637, 1983

    Wolfgang Kabsch and Chris Sander. Dictionary of protein secondary structure: pattern recogni- tion of hydrogen-bonded and geometrical features.Biopolymers, 22(12):2577–2637, 1983

  41. [41]

    Pubchem 2023 update

    Sunghwan Kim, Jie Chen, Tingjun Cheng, Asta Gindulyte, Jia He, et al. Pubchem 2023 update. Nucleic Acids Research, 51(D1):D1373–D1380, 2023

  42. [42]

    Improving protein optimization with smoothed fitness landscapes.arXiv preprint arXiv:2307.00494, 2023

    Andrew Kirjner, Jason Yim, Raman Samusevich, Shahar Bracha, Tommi Jaakkola, Regina Barzilay, and Ila Fiete. Improving protein optimization with smoothed fitness landscapes.arXiv preprint arXiv:2307.00494, 2023

  43. [43]

    Sequence-structure-function relationships in the microbial protein universe.Nature communications, 14(1):2351, 2023

    Julia Koehler Leman, Pawel Szczerbiak, P Douglas Renfrew, Vladimir Gligorijevic, Daniel Berenberg, Tommi Vatanen, Bryn C Taylor, Chris Chandler, Stefan Janssen, Andras Pataki, et al. Sequence-structure-function relationships in the microbial protein universe.Nature communications, 14(1):2351, 2023

  44. [44]

    Rational and semirational protein design.Protein engineering: methods and protocols, pages 15–23, 2017

    Ivan V Korendovych. Rational and semirational protein design.Protein engineering: methods and protocols, pages 15–23, 2017

  45. [45]

    De novo protein design—from new structures to programmable functions

    Tanja Kortemme. De novo protein design—from new structures to programmable functions. Cell, 187(3):526–544, 2024

  46. [46]

    Pdfbench: A benchmark for de novo protein design from function.arXiv preprint arXiv:2505.20346, 2025

    Jiahao Kuang, Nuowei Liu, Jie Wang, Changzhi Sun, Tao Ji, and Yuanbin Wu. Pdfbench: A benchmark for de novo protein design from function.arXiv preprint arXiv:2505.20346, 2025

  47. [47]

    Conditional generative modeling for de novo protein design with hierarchical functions.Bioinformatics, 38(13):3454– 3461, 2022

    Tim Kucera, Matteo Togninalli, and Laetitia Meng-Papaxanthos. Conditional generative modeling for de novo protein design with hierarchical functions.Bioinformatics, 38(13):3454– 3461, 2022

  48. [48]

    Design of a novel globular protein fold with atomic-level accuracy.Science, 302(5649):1364–1368, 2003

    Brian Kuhlman, Gautam Dantas, Gregory C Ireton, Gabriele Varani, Barry L Stoddard, and David Baker. Design of a novel globular protein fold with atomic-level accuracy.Science, 302(5649):1364–1368, 2003

  49. [49]

    A model-centric review of deep learning for protein design.arXiv preprint arXiv:2502.19173, 2025

    Gregory W Kyro, Tianyin Qiu, and Victor S Batista. A model-centric review of deep learning for protein design.arXiv preprint arXiv:2502.19173, 2025

  50. [50]

    ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules

    Andrew Leaver-Fay, Michael Tyka, Steven M Lewis, Oliver F Lange, James Thompson, Ron Jacak, Kristian Kaufman, P Douglas Renfrew, Colin A Smith, Will Sheffler, Ian W Davis, Seth Cooper, Adrien Treuille, Daniel J Mandell, Florian Richter, Yih-En Andrew Ban, Sarel J Fleishman, Jacob E Corn, David E Kim, Sergey Lyskov, Monica Berrondo, Stuart Mentzer, Zoran P...

  51. [51]

    DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

    Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, et al. Deepseek-v3. 2: Pushing the frontier of open large language models.arXiv preprint arXiv:2512.02556, 2025

  52. [52]

    Protein design with dynamic protein vocabulary

    Nuowei Liu, Jiahao Kuang, Yanting Liu, Tao Ji, Changzhi Sun, Man Lan, and Yuanbin Wu. Protein design with dynamic protein vocabulary. InAdvances in Neural Information Processing Systems, 2026

  53. [53]

    A text-guided protein design framework

    Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, et al. A text-guided protein design framework. Nature Machine Intelligence, 7(4):580–591, 2025

  54. [54]

    Prollama: A protein large language model for multi-task protein language processing.IEEE Transactions on Artificial Intelligence, 2025

    Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, and Yonghong Tian. Prollama: A protein large language model for multi-task protein language processing.IEEE Transactions on Artificial Intelligence, 2025

  55. [55]

    Prottex: Structure-in-context reasoning and 12 editing of proteins with large language models.Journal of Chemical Information and Modeling, 65(13):6599–6612, 2025

    Zicheng Ma, Chuanliu Fan, Zhicong Wang, Zhenyu Chen, Xiaohan Lin, Yanheng Li, Shihao Feng, Ziqiang Cao, Jun Zhang, and Yi Qin Gao. Prottex: Structure-in-context reasoning and 12 editing of proteins with large language models.Journal of Chemical Information and Modeling, 65(13):6599–6612, 2025

  56. [56]

    Zymctrl: a conditional language model for the controllable generation of artificial enzymes

    Geraldene Munsamy, Sebastian Lindner, Philipp Lorenz, and Noelia Ferruz. Zymctrl: a conditional language model for the controllable generation of artificial enzymes. InNeurIPS machine learning in structural biology workshop, 2022

  57. [57]

    Progen2: exploring the boundaries of protein language models.Cell systems, 14(11):968–978, 2023

    Erik Nijkamp, Jeffrey A Ruffolo, Eli N Weinstein, Nikhil Naik, and Ali Madani. Progen2: exploring the boundaries of protein language models.Cell systems, 14(11):968–978, 2023

  58. [58]

    Machine learning for functional protein design.Nature biotechnology, 42(2):216–228, 2024

    Pascal Notin, Nathan Rollins, Yarin Gal, Chris Sander, and Debora Marks. Machine learning for functional protein design.Nature biotechnology, 42(2):216–228, 2024

  59. [59]

    Accelerating life sciences research with retro biosciences

    OpenAI. Accelerating life sciences research with retro biosciences. https://openai. com/index/accelerating-life-sciences-research-with-retro-biosciences/ , August 2025

  60. [60]

    Introducing gpt-5

    OpenAI. Introducing gpt-5. https://openai.com/index/introducing-gpt-5/, August 2025

  61. [61]

    Introducing gpt-rosalind for life sciences research

    OpenAI. Introducing gpt-rosalind for life sciences research. https://openai.com/index/ introducing-gpt-rosalind/, 2026

  62. [62]

    Design and engineering of miniproteins.ACS bio & med Chem Au, 2(4):316–327, 2022

    Katarzyna O˙zga and Łukasz Berlicki. Design and engineering of miniproteins.ACS bio & med Chem Au, 2(4):316–327, 2022

  63. [63]

    Bindcraft: one-shot design of functional protein binders.BioRxiv, pages 2024–09, 2024

    Martin Pacesa, Lennart Nickel, Christian Schellhaas, Joseph Schmidt, Ekaterina Pyatova, Lucas Kissling, Patrick Barendse, Jagrity Choudhury, Srajan Kapoor, Ana Alcaraz-Serna, et al. Bindcraft: one-shot design of functional protein binders.BioRxiv, pages 2024–09, 2024

  64. [64]

    Proteincrow: A language model agent that can design proteins

    Manvitha Ponnapati, Sam Cox, Cade W Gordon, Michael J Hammerling, Siddharth Narayanan, Jon M Laurent, James D Braza, Michaela M Hinks, Michael D Skarlinski, Samuel G Rodriques, et al. Proteincrow: A language model agent that can design proteins. InICML 2025 Generative AI and Biology (GenBio) Workshop, 2025

  65. [65]

    Qwen3.5: Towards native multimodal agents

    Qwen. Qwen3.5: Towards native multimodal agents. https://qwen.ai/blog?id=qwen3.5, February 2026

  66. [66]

    Timothy P Riley, Oleg Matusovsky, Mohammad S Parsa, Pourya Kalantari, Kooshiar Azimian, and Kathy Y Wei. A generalized protein design ml model enables generation of functional de novo proteins. In ICLR 2025 Workshop on Generative and Experimental Perspectives for Biomolecular Design, 2025

  67. [67]

    Dingyi Rong, Zijian Chen, Qi Jia, Kaiwei Zhang, Haotian Lu, Guangtao Zhai, and Ning Liu. Liveproteinbench: A contamination-free benchmark for assessing models’ specialized capabilities in protein science. arXiv preprint arXiv:2512.22257, 2025

  68. [68]

    Sankaran Sandhya, Richa Mudgal, Gayatri Kumar, Ramanathan Sowdhamini, and Narayanaswamy Srinivasan. Protein sequence design and its applications. Current Opinion in Structural Biology, 37:71–80, 2016

  69. [69]

    Isaac Sappington, Martin Toul, David S Lee, Stephanie A Robinson, Inna Goreshnik, Clara McCurdy, Tung Ching Chan, Nic Buchholz, Buwei Huang, Dionne Vafeados, et al. Improved protein binder design using β-pairing targeted rfdiffusion. Nature Communications, 2026

  70. [70]

    Ida Schomburg, Antje Chang, Christian Ebeling, Marion Gremse, Christian Heldt, Gregor Huhn, and Dietmar Schomburg. Brenda, the enzyme database: updates and major new developments. Nucleic Acids Research, 32(suppl_1):D431–D433, 2004

  71. [71]

    Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, and Yu Guang Wang. A fine-tuning dataset and benchmark for large language models for protein understanding. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

  72. [72]

    Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Liò, and Yu Guang Wang. Toursynbio: A multi-modal large model and agent framework to bridge text and protein sequences for protein engineering. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2382–2389. IEEE, 2024

  73. [73]

    Christian JA Sigrist, Edouard de Castro, Lorenzo Cerutti, Bruno A Cuche, Nicolas Hulo, Alan Bridge, Laurent Bougueleret, and Ioannis Xenarios. New and continuing developments at prosite. Nucleic Acids Research, 41(D1):D344–D347, 2013

  74. [74]

    Matthew Sinclair, Moeen Meigooni, Archit Vasan, Ozan Gokdemir, Xinran Lian, Heng Ma, Yadu Babuji, Alexander Brace, Khalid Hossain, Carlo Siebenschuh, et al. Scalable agentic reasoning for designing biologics targeting intrinsically disordered proteins. arXiv preprint arXiv:2512.15930, 2025

  75. [75]

    Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, and Lei Li. Instructpro: Natural language guided ligand-binding protein design. arXiv preprint arXiv:2506.09332, 2025

  76. [76]

    Hannes Stark, Felix Faltings, MinGyu Choi, Yuxin Xie, Eunsu Hur, Timothy O’Donnell, Anton Bushuiev, Talip Uçar, Saro Passaro, Weian Mao, et al. Boltzgen: Toward universal binder design. bioRxiv, pages 2025–11, 2025

  77. [77]

    Yang Tan, Chen Liu, Jingyuan Gao, Banghao Wu, Mingchen Li, Ruilin Wang, Lingrong Zhang, Huiqun Yu, Guisheng Fan, Liang Hong, et al. Venusfactory: A unified platform for protein engineering data retrieval and language model fine-tuning. arXiv preprint arXiv:2503.15438, 2025

  78. [78]

    Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi k2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276, 2026

  79. [79]

    Protenix Team, Yuxuan Zhang, Chengyue Gong, Hanyu Zhang, Wenzhi Ma, Zhenyu Liu, Xinshi Chen, Jiaqi Guan, Lan Wang, Yanping Yang, et al. Protenix-v1: Toward high-accuracy open-source biomolecular structure prediction. bioRxiv, 2026

  80. [80]

    Magdalena Teufl, Charlotte U Zajc, and Michael W Traxlmayr. Engineering strategies to overcome the stability–function trade-off in proteins. ACS Synthetic Biology, 11(3):1030–1039, 2022

Showing first 80 references.