pith. machine review for the scientific record.

arxiv: 2605.10978 · v2 · submitted 2026-05-09 · 🧬 q-bio.QM

Recognition: 2 theorem links · Lean Theorem

VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:08 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords: protein design · large language models · benchmark evaluation · sequence generation · natural language interface · computational biology

The pith

No large language model yet masters the full workflow of language-based protein design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VibeProteinBench to test how well language models can handle protein design through open-ended natural language. It breaks the task into three linked stages that mirror real design work: recognizing properties from descriptions, engineering changes to existing proteins, and generating entirely new sequences. Each stage uses expert rationales and computational checks to judge whether outputs are plausible. When the authors ran many general and specialized models through the benchmark, none performed strongly across every stage. This suggests that building a single model capable of flexible, generalist protein design from language remains an open problem.

Core claim

VibeProteinBench evaluates LLMs on a complete protein design workflow via three stages—recognition, engineering, and generation—each supported by mechanistic rationales and in silico validation. Results across diverse models show that no single model achieves strong performance in all three stages at once.

What carries the argument

VibeProteinBench, a language-interfaced benchmark that structures protein design evaluation into recognition, engineering, and generation stages with expert-curated rationales and multi-faceted in silico checks.

Load-bearing premise

That performance across the three stages plus in silico validation gives a reliable picture of broad competence at open-ended protein design.

What would settle it

Discovery of any single model that scores strongly on recognition, engineering, and generation tasks when tested on the full VibeProteinBench suite.
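The settling condition above is a simple conjunction: one model clearing a "strong" bar on every stage at once. A minimal sketch of that predicate, where the stage names follow the paper but the scores and the 0.7 threshold are invented for illustration:

```python
# Hypothetical sketch: the open question reduces to whether any model clears
# a "strong" threshold on all three VibeProteinBench stages simultaneously.
# The threshold and the leaderboard numbers below are invented, not taken
# from the benchmark.

STAGES = ("recognition", "engineering", "generation")

def is_generalist(scores: dict[str, float], threshold: float = 0.7) -> bool:
    """True if a model scores at or above the threshold on every stage."""
    return all(scores.get(stage, 0.0) >= threshold for stage in STAGES)

# Illustrative per-model stage scores (fabricated for the sketch).
leaderboard = {
    "model_a": {"recognition": 0.82, "engineering": 0.41, "generation": 0.55},
    "model_b": {"recognition": 0.60, "engineering": 0.75, "generation": 0.73},
}

generalists = [m for m, s in leaderboard.items() if is_generalist(s)]
```

On the paper's account, this list is currently empty for every model tested; a single non-empty entry on the full suite would settle the claim.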

Figures

Figures reproduced from arXiv: 2605.10978 by Cheng-Hao Liu, Feyisayo Eweje, Gina El Nesr, Gyubok Lee, Gyu Rie Lee, Hongjoon Ahn, Hyejin Lee, Hyunjin Seo, Jamin Shin, Jason Yang, Jimin Park, Joseph S Brown, Junlang Liu, Leo Chen, Sangwon Jung, Sarah Gurev, Soojung Yang, Sungjun Han, Sungsoo Ahn, Zhihui Qi.

Figure 1: Example queries across the three stages of VIBEPROTEINBENCH. Each query takes the form of a natural-language task instruction paired with the stage-specific context required to express the design intent. Recognition queries follow a question-answering format with a candidate answer set. Engineering queries supply a wild-type sequence together with mechanistic rationales partitioned into defective positions… view at source ↗

Figure 2: Evaluation dataset construction pipeline of VibeProteinBench. view at source ↗

Figure 3: Cross-stage correlations between recognition, engineering, and generation performance of baseline LLMs. DS denotes DeepSeek and TxG indicates TxGemma-chat models. For detailed subtask-wise experiment results on all subtasks of recognition, engineering, and generation, please refer to Appendix B. view at source ↗

Figure 4: Correlations between different subtask performance of baseline LLMs. DS denotes DeepSeek… view at source ↗

Figure 5: Snapshot of expert review interface for solubility engineering query. view at source ↗
read the original abstract

Protein design aims to compose amino-acid sequences that fold into stable three-dimensional structures while satisfying targeted functional properties. The field is increasingly shifting toward vibe protein design, where a single model is expected to generate novel sequences, engineer existing proteins, and reason about protein characteristics through flexible natural-language constraints. Large language models (LLMs) have emerged as a leading paradigm in this space. However, existing evaluation benchmarks often limit their scope to a partial aspect of protein design, while others restrict design objectives to structured input schemas, lacking an integrated framework that evaluates the broad spectrum of protein design competence under open-ended intents. To this end, we present Vibe Protein design Benchmark (VibeProteinBench), a language-interfaced benchmark that probes generalist capabilities through three complementary stages mirroring a computational protein design workflow: recognition, engineering, and generation. Each stage is grounded in expert-curated mechanistic rationales and multi-faceted in silico validation, to computationally verify whether model outputs are biologically plausible. Evaluations across diverse general-purpose and domain-specialized LLMs reveal that no model achieves strong performance across all three stages, suggesting that generalist protein design remains a substantial open challenge for current LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces VibeProteinBench, a language-interfaced evaluation benchmark for vibe protein design that probes LLMs through three complementary stages—recognition, engineering, and generation—each supported by expert-curated mechanistic rationales and multi-faceted in silico validation (structure prediction, stability, etc.). Evaluations across general-purpose and domain-specialized LLMs show that no model achieves strong performance across all stages, leading to the conclusion that generalist protein design under open-ended natural-language intents remains a substantial open challenge.

Significance. If the benchmark's three-stage structure and validation suite prove reliable, the result would be significant for the field: it supplies an integrated framework that moves beyond partial or schema-restricted evaluations, directly demonstrating current LLM limitations on the full computational protein design workflow. The expert-curated rationales and computational grounding are strengths that could help standardize assessment of broad competence.

major comments (1)
  1. [Evaluation sections] The headline claim (no model strong across all stages, hence generalist design remains open) is load-bearing on the assumption that the three stages plus expert rationales and in silico checks form a sufficient proxy for competence. The multi-faceted in silico validation (structure prediction, stability, etc.) is known to yield false positives for de novo sequences; the manuscript does not report correlation to experimental outcomes or ablation of the validation suite itself. This leaves open the possibility that low scores reflect benchmark noise rather than fundamental limits.
minor comments (1)
  1. [Abstract] Abstract supplies no quantitative results, error bars, or key metrics; adding a sentence with headline performance numbers would make the high-level finding more informative.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for rigorous validation of the benchmark's proxy measures. We address the major comment below and will incorporate revisions to strengthen the discussion of limitations.

read point-by-point responses
  1. Referee: [Evaluation sections] The headline claim (no model strong across all stages, hence generalist design remains open) is load-bearing on the assumption that the three stages plus expert rationales and in silico checks form a sufficient proxy for competence. The multi-faceted in silico validation (structure prediction, stability, etc.) is known to yield false positives for de novo sequences; the manuscript does not report correlation to experimental outcomes or ablation of the validation suite itself. This leaves open the possibility that low scores reflect benchmark noise rather than fundamental limits.

    Authors: We agree that in silico methods such as structure prediction and stability estimation are known to produce false positives for de novo sequences and that the absence of experimental correlation or ablation studies leaves room for benchmark noise to influence scores. Our multi-faceted validation suite, grounded in expert-curated mechanistic rationales, represents a standard computational proxy used in the field to assess biological plausibility at scale, but it is not a substitute for wet-lab confirmation. In the revised manuscript we will add an explicit limitations subsection in the Discussion that (i) acknowledges the risk of false positives, (ii) states that low scores may partly reflect validation noise, and (iii) clarifies that the benchmark functions as an initial filter rather than definitive proof of model limits. We will also report an ablation of the validation components on a subset of tasks where data allow. Full experimental correlation remains outside the scope of this computational benchmark paper.
    revision: partial
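The ablation the authors promise has a simple shape: score outputs against the full check suite, then drop one check at a time and watch the pass rate move. A hedged sketch with stand-in checks (the names, thresholds, and toy outputs are invented; the real suite uses structure prediction, stability estimation, and similar tools):

```python
# Leave-one-out ablation of a validation suite, in miniature. Everything
# here is a stand-in: real checks would call structure predictors and
# stability estimators, not threshold two floats.

def pass_rate(outputs, checks):
    """Fraction of outputs that pass every check in the suite."""
    passed = sum(1 for o in outputs if all(c(o) for c in checks))
    return passed / len(outputs)

# Toy outputs: (structure-confidence-like, stability-like) pairs, invented.
outputs = [(0.9, 0.8), (0.6, 0.9), (0.95, 0.4), (0.85, 0.7)]

checks = {
    "structure": lambda o: o[0] >= 0.8,
    "stability": lambda o: o[1] >= 0.5,
}

full = pass_rate(outputs, checks.values())
ablation = {
    name: pass_rate(outputs, [c for n, c in checks.items() if n != name])
    for name in checks
}
```

A check whose removal barely shifts the pass rate contributes little signal; one that shifts it sharply is load-bearing, which is exactly what the referee wants quantified.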

Circularity Check

0 steps flagged

No circularity: benchmark proposal with independent empirical evaluation

full rationale

The paper proposes VibeProteinBench, a three-stage language-interfaced benchmark (recognition, engineering, generation) for evaluating LLMs on protein design tasks. It applies the benchmark to existing general-purpose and domain-specialized models, reports performance scores, and concludes that no model excels across all stages. There are no equations, parameter fittings, derivations, or self-citations that reduce the central claim to its own inputs by construction. The evaluation uses expert-curated rationales and in silico checks as proxies, but these are external to any model training or prior self-referential results. The work is self-contained as a new benchmark definition plus independent testing, with no load-bearing steps that collapse into tautology or fitted renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The benchmark rests on domain assumptions about the sufficiency of in silico checks and expert rationales rather than new fitted parameters or invented entities.

axioms (1)
  • domain assumption: Expert-curated mechanistic rationales combined with in silico validation can reliably assess the biological plausibility of model outputs.
    Invoked to ground each of the three evaluation stages.

pith-pipeline@v0.9.0 · 5589 in / 1111 out tokens · 23661 ms · 2026-05-14T22:08:24.671552+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

99 extracted references · 99 canonical work pages · 3 internal anchors

  1. [1]

    Computational protein design.Nature Reviews Methods Primers, 5(1):13, 2025

    Katherine I Albanese, Sophie Barbe, Shunsuke Tagami, Derek N Woolfson, and Thomas Schiex. Computational protein design.Nature Reviews Methods Primers, 5(1):13, 2025

  2. [2]

    Claude opus 4.6

    Anthropic. Claude opus 4.6. https://www-cdn.anthropic.com/ 4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf, 2025

  3. [3]

    Miniprotein design: past, present, and prospects.Accounts of Chemical Research, 50(9):2085–2092, 2017

    Emily G Baker, Gail J Bartlett, Kathryn L Porter Goff, and Derek N Woolfson. Miniprotein design: past, present, and prospects.Accounts of Chemical Research, 50(9):2085–2092, 2017

  4. [4]

    Atomically accurate de novo design of antibodies with rfdiffusion.Nature, 649(8095):183–193, 2026

    Nathaniel R Bennett, Joseph L Watson, Robert J Ragotte, Andrew J Borst, DéJenaé L See, Connor Weidle, Riti Biswas, Yutong Yu, Ellen L Shrock, Russell Ault, et al. Atomically accurate de novo design of antibodies with rfdiffusion.Nature, 649(8095):183–193, 2026

  5. [5]

    The interpro protein families and domains database: 20 years on.Nucleic Acids Research, 49(D1):D344–D354, 2021

    Matthias Blum, Hsin-Yu Chang, Sara Chuguransky, Teresa Grego, Selvam Kandasaamy, Alex Mitchell, Gaurav Nuka, Typhaine Paysan-Lafosse, Matloob Qureshi, Surabhi Raj, et al. The interpro protein families and domains database: 20 years on.Nucleic Acids Research, 49(D1):D344–D354, 2021

  6. [6]

    Burley, Charmi Bhikadiya, Chunxiao Bi, Sebastian Bittrich, Li Chen, et al

    Stephen K. Burley, Charmi Bhikadiya, Chunxiao Bi, Sebastian Bittrich, Li Chen, et al. Rcsb protein data bank: powerful new tools for exploring 3d structures of biological macromolecules. Nucleic Acids Research, 52(D1):D480–D491, 2024

  7. [7]

    Pqa: Zero-shot protein question answering for free-form scientific enquiry with large language models.arXiv preprint arXiv:2402.13653, 2024

    Eli M Carrami and Sahand Sharifzadeh. Pqa: Zero-shot protein question answering for free-form scientific enquiry with large language models.arXiv preprint arXiv:2402.13653, 2024

  8. [8]

    Pyrosetta: a script-based interface for implementing molecular modeling algorithms using rosetta.Bioinformatics, 26(5):689–691, 2010

    Sidhartha Chaudhury, Sergey Lyskov, and Jeffrey J Gray. Pyrosetta: a script-based interface for implementing molecular modeling algorithms using rosetta.Bioinformatics, 26(5):689–691, 2010

  9. [9]

    Cheng, R

    H. Cheng, R. D. Schaeffer, Y . Liao, L. N. Kinch, and N. V . Grishin. Ecod: An evolutionary classification of protein domains.PLoS Computational Biology, 10(12):e1003926, 2014

  10. [10]

    Stable de novo protein design via joint conformational landscape and sequence optimization

    Yehlin Cho, Justas Dauparas, Kotaro Tsuboyama, Gabriel J Rocklin, and Sergey Ovchinnikov. Stable de novo protein design via joint conformational landscape and sequence optimization. Nature Communications, 2025

  11. [11]

    Sparks of function by de novo protein design

    Alexander E Chu, Tianyu Lu, and Po-Ssu Huang. Sparks of function by de novo protein design. Nature biotechnology, 42(2):203–215, 2024

  12. [12]

    Rational protein design.Current Opinion in Structural Biology, 97:103224, 2026

    Joel J Chubb, Aimee L Boyle, and Katherine I Albanese. Rational protein design.Current Opinion in Structural Biology, 97:103224, 2026

  13. [13]

    Biopython: freely available python tools for computational molecular biology and bioinformatics.Bioinformatics, 25(11):1422–1423, 2009

    Peter JA Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, and Michiel JL de Hoon. Biopython: freely available python tools for computational molecular biology and bioinformatics.Bioinformatics, 25(11):1422–1423, 2009

  14. [14]

    Sophie Colette, Jaldert François, Bart De Moor, and Vera van Noort. Ogtfinder: A curated growth temperature data set and its application to predict optimal growth temperatures of bacteria and archaea.Journal of Chemical Information and Modeling, 2026. 10

  15. [15]

    Crowd- sourced protein design: Lessons from the adaptyv egfr binder competition.bioRxiv, pages 2025–04, 2025

    Tudor-Stefan Cotet, Igor Krawczuk, Martin Pacesa, Lennart Nickel, Bruno E Correia, Nikhil Haas, Ahmad Qamar, Chance A Challacombe, Patrick Kidger, Constance Ferragu, et al. Crowd- sourced protein design: Lessons from the adaptyv egfr binder competition.bioRxiv, pages 2025–04, 2025

  16. [16]

    De novo protein design: fully automated sequence selection.Science, 278(5335):82–87, 1997

    Bassil I Dahiyat and Stephen L Mayo. De novo protein design: fully automated sequence selection.Science, 278(5335):82–87, 1997

  17. [17]

    Toward de novo protein design from natural language

    Fengyuan Dai, Shiyang You, Yudian Zhu, Yuan Gao, Lihao Fu, Xibin Zhou, Jin Su, Chentong Wang, Yuliang Fan, Xiaoxiao Ma, et al. Toward de novo protein design from natural language. BioRxiv, 2024

  18. [18]

    Robust deep learning–based protein sequence design using proteinmpnn.Science, 378(6615):49–56, 2022

    Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J Ragotte, Lukas F Milles, Basile IM Wicky, Alexis Courbet, Rob J de Haas, Neville Bethel, et al. Robust deep learning–based protein sequence design using proteinmpnn.Science, 378(6615):49–56, 2022

  19. [19]

    Atomic context-conditioned protein sequence design using ligandmpnn.Nature Methods, 22(4):717–723, 2025

    Justas Dauparas, Gyu Rie Lee, Robert Pecoraro, Linna An, Ivan Anishchenko, Cameron Glasscock, and David Baker. Atomic context-conditioned protein sequence design using ligandmpnn.Nature Methods, 22(4):717–723, 2025

  20. [20]

    Gemini 3.1 pro - model card

    Google DeepMind. Gemini 3.1 pro - model card. https://deepmind.google/models/ model-cards/gemini-3-1-pro/, February 2026

  21. [21]

    Deepseek-v4: Towards highly efficient million-token context in- telligence

    Deepseek-AI. Deepseek-v4: Towards highly efficient million-token context in- telligence. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/ DeepSeek_V4.pdf, 2026

  22. [22]

    Bronstein, Martin Steinegger, Emine Kucukbenli, Arash Vahdat, and Karsten Kreis

    Kieran Didi, Zuobai Zhang, Guoqing Zhou, Danny Reidenbach, Zhonglin Cao, Sooyoung Cha, Tomas Geffner, Christian Dallago, Jian Tang, Michael M. Bronstein, Martin Steinegger, Emine Kucukbenli, Arash Vahdat, and Karsten Kreis. Scaling atomistic protein binder design with generative pretraining and test-time compute. InThe Fourteenth International Conference ...

  23. [23]

    Trans- fer learning to leverage larger datasets for improved prediction of protein stability changes

    Henry Dieckhaus, Michael Brocidiacono, Nicholas Z Randolph, and Brian Kuhlman. Trans- fer learning to leverage larger datasets for improved prediction of protein stability changes. Proceedings of the national academy of sciences, 121(6):e2314853121, 2024

  24. [24]

    Bioreason-pro: Advancing protein function prediction with multimodal biological reasoning.bioRxiv, 2026

    Adibvafa Fallahpour, Arman Seyed-Ahmadi, Parsa Idehpour, Omar Ibrahim, Purav Gupta, Jack Naimer, Kevin Zhu, Arnav Shah, Shihao Ma, Abhinav Adduri, et al. Bioreason-pro: Advancing protein function prediction with multimodal biological reasoning.bioRxiv, 2026

  25. [25]

    Mol-instructions: A large-scale biomolecular instruction dataset for large language models.arXiv preprint arXiv:2306.08018, 2023

    Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Mol-instructions: A large-scale biomolecular instruction dataset for large language models.arXiv preprint arXiv:2306.08018, 2023

  26. [26]

    Protgpt2 is a deep unsupervised language model for protein design.Nature communications, 13(1):4348, 2022

    Noelia Ferruz, Steffen Schmidt, and Birte Höcker. Protgpt2 is a deep unsupervised language model for protein design.Nature communications, 13(1):4348, 2022

  27. [27]

    Zhangyang Gao, Cheng Tan, Yijie Zhang, Xingran Chen, Lirong Wu, and Stan Z. Li. Pro- teininvbench: Benchmarking protein inverse folding on diverse tasks, models, and metrics. In Advances in Neural Information Processing Systems, 2023

  28. [28]

    Structure-based protein function prediction using graph convolutional networks.Nature communications, 12(1):3168, 2021

    Vladimir Gligorijevi´c, P Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C Taylor, Ian M Fisk, Hera Vlamakis, et al. Structure-based protein function prediction using graph convolutional networks.Nature communications, 12(1):3168, 2021

  29. [29]

    Protein engineering for industrial biocatalysis: principles, approaches, and lessons from engineered petases.Catalysts, 15(2):147, 2025

    Konstantinos Grigorakis, Christina Ferousi, and Evangelos Topakas. Protein engineering for industrial biocatalysis: principles, approaches, and lessons from engineered petases.Catalysts, 15(2):147, 2025

  30. [30]

    Protein design with guided discrete diffusion

    Nate Gruver, Samuel Stanton, Nathan Frey, Tim GJ Rudner, Isidro Hotzel, Julien Lafrance- Vanasse, Arvind Rajpal, Kyunghyun Cho, and Andrew G Wilson. Protein design with guided discrete diffusion. InAdvances in Neural Information Processing Systems, 2023

  31. [31]

    pykvfinder: an efficient and integrable python package for biomolecular cavity detection and characterization in data science.BMC bioinformatics, 22(1):607, 2021

    João Victor da Silva Guerra, Helder Veras Ribeiro-Filho, Gabriel Ernesto Jara, Leandro Oliveira Bortot, José Geraldo de Carvalho Pereira, and Paulo Sérgio Lopes-de Oliveira. pykvfinder: an efficient and integrable python package for biomolecular cavity detection and characterization in data science.BMC bioinformatics, 22(1):607, 2021

  32. [32]

    Learning sequence, structure, and function representations of proteins with language models.bioRxiv, 2023

    Tymor Hamamsy, Meet Barot, James T Morton, Martin Steinegger, Richard Bonneau, and Kyunghyun Cho. Learning sequence, structure, and function representations of proteins with language models.bioRxiv, 2023

  33. [33]

    Simulating 500 million years of evolution with a language model.Science, 387(6736):850–858, 2025

    Thomas Hayes, Roshan Rao, Halil Akin, Nicholas J Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Q Tran, Jonathan Deaton, Marius Wiggert, et al. Simulating 500 million years of evolution with a language model.Science, 387(6736):850–858, 2025

  34. [34]

    Efficient evolution of human antibodies from 11 general protein language models.Nature biotechnology, 42(2):275–283, 2024

    Brian L Hie, Varun R Shanker, Duo Xu, Theodora UJ Bruun, Payton A Weidenbacher, Shaogeng Tang, Wesley Wu, John E Pak, and Peter S Kim. Efficient evolution of human antibodies from 11 general protein language models.Nature biotechnology, 42(2):275–283, 2024

  35. [35]

    Adaptive machine learning for protein engineering.Current opinion in structural biology, 72:145–152, 2022

    Brian L Hie and Kevin K Yang. Adaptive machine learning for protein engineering.Current opinion in structural biology, 72:145–152, 2022

  36. [36]

    Pro-1.https://michaelhla.com/blog/pro1.html, March 2025

    Michael Hla. Pro-1.https://michaelhla.com/blog/pro1.html, March 2025

  37. [37]

    Elucidating the design space of multimodal protein language models.arXiv preprint arXiv:2504.11454, 2025

    Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, and Quanquan Gu. Elucidating the design space of multimodal protein language models.arXiv preprint arXiv:2504.11454, 2025

  38. [38]

    Protein2text: Resampling mechanism to translate protein sequences into human-interpretable text

    Ala Jararweh, Oladimeji Macaulay, David Arredondo, Yue Hu, Luis E Tafoya, Kushal Virupak- shappa, and Avinash Sahu. Protein2text: Resampling mechanism to translate protein sequences into human-interpretable text. InProceedings of the 2025 Conference of the Nations of the Amer- icas Chapter of the Association for Computational Linguistics: Human Language T...

  39. [39]

    A multi-modal llm for dynamic protein-ligand interactions and generative molecular design.bioRxiv, 2025

    Haoran Jing and Yutong Miao. A multi-modal llm for dynamic protein-ligand interactions and generative molecular design.bioRxiv, 2025

  40. [40]

    Dictionary of protein secondary structure: pattern recogni- tion of hydrogen-bonded and geometrical features.Biopolymers, 22(12):2577–2637, 1983

    Wolfgang Kabsch and Chris Sander. Dictionary of protein secondary structure: pattern recogni- tion of hydrogen-bonded and geometrical features.Biopolymers, 22(12):2577–2637, 1983

  41. [41]

    Pubchem 2023 update

    Sunghwan Kim, Jie Chen, Tingjun Cheng, Asta Gindulyte, Jia He, et al. Pubchem 2023 update. Nucleic Acids Research, 51(D1):D1373–D1380, 2023

  42. [42]

    Improving protein optimization with smoothed fitness landscapes.arXiv preprint arXiv:2307.00494, 2023

    Andrew Kirjner, Jason Yim, Raman Samusevich, Shahar Bracha, Tommi Jaakkola, Regina Barzilay, and Ila Fiete. Improving protein optimization with smoothed fitness landscapes.arXiv preprint arXiv:2307.00494, 2023

  43. [43]

    Sequence-structure-function relationships in the microbial protein universe.Nature communications, 14(1):2351, 2023

    Julia Koehler Leman, Pawel Szczerbiak, P Douglas Renfrew, Vladimir Gligorijevic, Daniel Berenberg, Tommi Vatanen, Bryn C Taylor, Chris Chandler, Stefan Janssen, Andras Pataki, et al. Sequence-structure-function relationships in the microbial protein universe.Nature communications, 14(1):2351, 2023

  44. [44]

    Rational and semirational protein design.Protein engineering: methods and protocols, pages 15–23, 2017

    Ivan V Korendovych. Rational and semirational protein design.Protein engineering: methods and protocols, pages 15–23, 2017

  45. [45]

    De novo protein design—from new structures to programmable functions

    Tanja Kortemme. De novo protein design—from new structures to programmable functions. Cell, 187(3):526–544, 2024

  46. [46]

    Pdfbench: A benchmark for de novo protein design from function.arXiv preprint arXiv:2505.20346, 2025

    Jiahao Kuang, Nuowei Liu, Jie Wang, Changzhi Sun, Tao Ji, and Yuanbin Wu. Pdfbench: A benchmark for de novo protein design from function.arXiv preprint arXiv:2505.20346, 2025

  47. [47]

    Conditional generative modeling for de novo protein design with hierarchical functions.Bioinformatics, 38(13):3454– 3461, 2022

    Tim Kucera, Matteo Togninalli, and Laetitia Meng-Papaxanthos. Conditional generative modeling for de novo protein design with hierarchical functions.Bioinformatics, 38(13):3454– 3461, 2022

  48. [48]

    Design of a novel globular protein fold with atomic-level accuracy.Science, 302(5649):1364–1368, 2003

    Brian Kuhlman, Gautam Dantas, Gregory C Ireton, Gabriele Varani, Barry L Stoddard, and David Baker. Design of a novel globular protein fold with atomic-level accuracy.Science, 302(5649):1364–1368, 2003

  49. [49]

    A model-centric review of deep learning for protein design.arXiv preprint arXiv:2502.19173, 2025

    Gregory W Kyro, Tianyin Qiu, and Victor S Batista. A model-centric review of deep learning for protein design.arXiv preprint arXiv:2502.19173, 2025

  50. [50]

    ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules

    Andrew Leaver-Fay, Michael Tyka, Steven M Lewis, Oliver F Lange, James Thompson, Ron Jacak, Kristian Kaufman, P Douglas Renfrew, Colin A Smith, Will Sheffler, Ian W Davis, Seth Cooper, Adrien Treuille, Daniel J Mandell, Florian Richter, Yih-En Andrew Ban, Sarel J Fleishman, Jacob E Corn, David E Kim, Sergey Lyskov, Monica Berrondo, Stuart Mentzer, Zoran P...

  51. [51]

    DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

    Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, et al. Deepseek-v3. 2: Pushing the frontier of open large language models.arXiv preprint arXiv:2512.02556, 2025

  52. [52]

    Protein design with dynamic protein vocabulary

    Nuowei Liu, Jiahao Kuang, Yanting Liu, Tao Ji, Changzhi Sun, Man Lan, and Yuanbin Wu. Protein design with dynamic protein vocabulary. InAdvances in Neural Information Processing Systems, 2026

  53. [53]

    A text-guided protein design framework

    Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, et al. A text-guided protein design framework. Nature Machine Intelligence, 7(4):580–591, 2025

  54. [54]

    Prollama: A protein large language model for multi-task protein language processing.IEEE Transactions on Artificial Intelligence, 2025

    Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, and Yonghong Tian. Prollama: A protein large language model for multi-task protein language processing.IEEE Transactions on Artificial Intelligence, 2025

  55. [55]

    Prottex: Structure-in-context reasoning and 12 editing of proteins with large language models.Journal of Chemical Information and Modeling, 65(13):6599–6612, 2025

    Zicheng Ma, Chuanliu Fan, Zhicong Wang, Zhenyu Chen, Xiaohan Lin, Yanheng Li, Shihao Feng, Ziqiang Cao, Jun Zhang, and Yi Qin Gao. Prottex: Structure-in-context reasoning and 12 editing of proteins with large language models.Journal of Chemical Information and Modeling, 65(13):6599–6612, 2025

  56. [56]

    Zymctrl: a conditional language model for the controllable generation of artificial enzymes

    Geraldene Munsamy, Sebastian Lindner, Philipp Lorenz, and Noelia Ferruz. Zymctrl: a conditional language model for the controllable generation of artificial enzymes. InNeurIPS machine learning in structural biology workshop, 2022

  57. [57]

    Progen2: exploring the boundaries of protein language models.Cell systems, 14(11):968–978, 2023

    Erik Nijkamp, Jeffrey A Ruffolo, Eli N Weinstein, Nikhil Naik, and Ali Madani. Progen2: exploring the boundaries of protein language models.Cell systems, 14(11):968–978, 2023

  58. [58]

    Machine learning for functional protein design.Nature biotechnology, 42(2):216–228, 2024

    Pascal Notin, Nathan Rollins, Yarin Gal, Chris Sander, and Debora Marks. Machine learning for functional protein design.Nature biotechnology, 42(2):216–228, 2024

  59. [59]

    Accelerating life sciences research with retro biosciences

    OpenAI. Accelerating life sciences research with retro biosciences. https://openai. com/index/accelerating-life-sciences-research-with-retro-biosciences/ , August 2025

  60. [60]

    Introducing gpt-5

    OpenAI. Introducing gpt-5. https://openai.com/index/introducing-gpt-5/, August 2025

  61. [61]

    Introducing gpt-rosalind for life sciences research

    OpenAI. Introducing gpt-rosalind for life sciences research. https://openai.com/index/ introducing-gpt-rosalind/, 2026

  62. [62]

    Design and engineering of miniproteins.ACS bio & med Chem Au, 2(4):316–327, 2022

    Katarzyna O˙zga and Łukasz Berlicki. Design and engineering of miniproteins.ACS bio & med Chem Au, 2(4):316–327, 2022

  63. [63]

    Bindcraft: one-shot design of functional protein binders.BioRxiv, pages 2024–09, 2024

    Martin Pacesa, Lennart Nickel, Christian Schellhaas, Joseph Schmidt, Ekaterina Pyatova, Lucas Kissling, Patrick Barendse, Jagrity Choudhury, Srajan Kapoor, Ana Alcaraz-Serna, et al. Bindcraft: one-shot design of functional protein binders.BioRxiv, pages 2024–09, 2024

  64. [64]

    Proteincrow: A language model agent that can design proteins

    Manvitha Ponnapati, Sam Cox, Cade W Gordon, Michael J Hammerling, Siddharth Narayanan, Jon M Laurent, James D Braza, Michaela M Hinks, Michael D Skarlinski, Samuel G Rodriques, et al. Proteincrow: A language model agent that can design proteins. InICML 2025 Generative AI and Biology (GenBio) Workshop, 2025

  65. [65]

    Qwen3.5: Towards native multimodal agents

    Qwen. Qwen3.5: Towards native multimodal agents. https://qwen.ai/blog?id=qwen3.5, February 2026

  66. [66]

    Timothy P Riley, Oleg Matusovsky, Mohammad S Parsa, Pourya Kalantari, Kooshiar Azimian, and Kathy Y Wei. A generalized protein design ml model enables generation of functional de novo proteins. In ICLR 2025 Workshop on Generative and Experimental Perspectives for Biomolecular Design, 2025

  67. [67]

    Dingyi Rong, Zijian Chen, Qi Jia, Kaiwei Zhang, Haotian Lu, Guangtao Zhai, and Ning Liu. Liveproteinbench: A contamination-free benchmark for assessing models’ specialized capabilities in protein science. arXiv preprint arXiv:2512.22257, 2025

  68. [68]

    Sankaran Sandhya, Richa Mudgal, Gayatri Kumar, Ramanathan Sowdhamini, and Narayanaswamy Srinivasan. Protein sequence design and its applications. Current Opinion in Structural Biology, 37:71–80, 2016

  69. [69]

    Isaac Sappington, Martin Toul, David S Lee, Stephanie A Robinson, Inna Goreshnik, Clara McCurdy, Tung Ching Chan, Nic Buchholz, Buwei Huang, Dionne Vafeados, et al. Improved protein binder design using β-pairing targeted rfdiffusion. Nature Communications, 2026

  70. [70]

    Ida Schomburg, Antje Chang, Christian Ebeling, Marion Gremse, Christian Heldt, Gregor Huhn, and Dietmar Schomburg. Brenda, the enzyme database: updates and major new developments. Nucleic Acids Research, 32(suppl_1):D431–D433, 2004

  71. [71]

    Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, and Yu Guang Wang. A fine-tuning dataset and benchmark for large language models for protein understanding. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

  72. [72]

    Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Liò, and Yu Guang Wang. Toursynbio: A multi-modal large model and agent framework to bridge text and protein sequences for protein engineering. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2382–2389. IEEE, 2024

  73. [73]

    Christian JA Sigrist, Edouard de Castro, Lorenzo Cerutti, Bruno A Cuche, Nicolas Hulo, Alan Bridge, Laurent Bougueleret, and Ioannis Xenarios. New and continuing developments at prosite. Nucleic Acids Research, 41(D1):D344–D347, 2013

  74. [74]

    Matthew Sinclair, Moeen Meigooni, Archit Vasan, Ozan Gokdemir, Xinran Lian, Heng Ma, Yadu Babuji, Alexander Brace, Khalid Hossain, Carlo Siebenschuh, et al. Scalable agentic reasoning for designing biologics targeting intrinsically disordered proteins. arXiv preprint arXiv:2512.15930, 2025

  75. [75]

    Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, and Lei Li. Instructpro: Natural language guided ligand-binding protein design. arXiv preprint arXiv:2506.09332, 2025

  76. [76]

    Hannes Stark, Felix Faltings, MinGyu Choi, Yuxin Xie, Eunsu Hur, Timothy O’Donnell, Anton Bushuiev, Talip Uçar, Saro Passaro, Weian Mao, et al. Boltzgen: Toward universal binder design. bioRxiv, pages 2025–11, 2025

  77. [77]

    Yang Tan, Chen Liu, Jingyuan Gao, Banghao Wu, Mingchen Li, Ruilin Wang, Lingrong Zhang, Huiqun Yu, Guisheng Fan, Liang Hong, et al. Venusfactory: A unified platform for protein engineering data retrieval and language model fine-tuning. arXiv preprint arXiv:2503.15438, 2025

  78. [78]

    Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi k2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276, 2026

  79. [79]

    Protenix Team, Yuxuan Zhang, Chengyue Gong, Hanyu Zhang, Wenzhi Ma, Zhenyu Liu, Xinshi Chen, Jiaqi Guan, Lan Wang, Yanping Yang, et al. Protenix-v1: Toward high-accuracy open-source biomolecular structure prediction. bioRxiv, 2026

  80. [80]

    Magdalena Teufl, Charlotte U Zajc, and Michael W Traxlmayr. Engineering strategies to overcome the stability–function trade-off in proteins. ACS Synthetic Biology, 11(3):1030–1039, 2022

Showing first 80 references.