arxiv: 2605.02937 · v1 · submitted 2026-05-01 · 💻 cs.LG · cs.AI · cs.CE


Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

Fang Wu, Weihao Xuan, Heli Qi, Hanqun Cao, Heng-Jui Chang, Zeqi Zhou, Haokai Zhao, Ma Jian, Carl Ma, Yu-Chi Cheng, Kuan Pang, Xiangru Tang, Zehong Wang, Guanlue Li, Hanchen Wang, Kejun Ying, Pan Lu, Chiho Im, Seungju Han, Peng Xia, Tinson Xu, Yinxi Li, Deyao Zhu, Pheng-Ann Heng, Naoto Yokoya, Masashi Sugiyama, Li Erran Li, Jure Leskovec, Yejin Choi


Pith reviewed 2026-05-09 19:28 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CE

keywords de novo protein design · multimodal LLM · diffusion models · residue constraints · generative models · protein engineering · interpretability


The pith

Proteo-R1 separates reasoning about key functional residues from geometric protein generation by using an MLLM to set hard constraints for a diffusion model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Proteo-R1 as a way to make de novo protein design more deliberate by splitting it into two steps. First, a multimodal large language model examines sequences, structures, and context to decide which residues matter most for binding and specificity. Second, those fixed decisions serve as constraints for a separate diffusion model that generates the actual molecular geometry. This split aims to produce designs that are easier to understand, control, and build on than methods that mix reasoning and geometry together in one sampling process.

Core claim

Proteo-R1 adopts a dual-expert architecture in which a multimodal large language model serves as an understanding expert that identifies key functional residues governing binding and specificity; these residue-level decisions are then passed as hard constraints to a diffusion-based generation expert that performs conditional co-design while respecting the fixed interaction anchors, achieving stable, interpretable, and modular integration of LLM reasoning with geometric generative models.

What carries the argument

Dual-expert architecture that converts MLLM residue decisions into hard constraints for conditional diffusion-based generation.
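
One way to picture the handoff is a sampler that clamps the committed positions after every refinement step. The sketch below is a toy stand-in, not the paper's generation expert: it assumes the constraints arrive as a position-to-residue mapping, and `design_with_anchors`, the `anchors` format, and the uniform resampling rule are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def design_with_anchors(length, anchors, steps=10, seed=0):
    """Toy iterative sampler that keeps anchored positions fixed.

    anchors: dict mapping 0-based position -> one-letter residue
             (hypothetical format, not taken from the paper).
    """
    rng = random.Random(seed)
    for pos, aa in anchors.items():
        if not 0 <= pos < length:
            raise ValueError(f"anchor position {pos} out of range")
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {aa!r}")

    # Start from pure noise, then impose the anchors.
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for pos, aa in anchors.items():
        seq[pos] = aa

    for step in range(steps):
        # Stand-in for a denoising update: resample a shrinking
        # fraction of positions as the schedule progresses.
        resample_prob = 1.0 - (step + 1) / steps
        for i in range(length):
            if rng.random() < resample_prob:
                seq[i] = rng.choice(AMINO_ACIDS)
        # Hard constraint: overwrite the anchors after every update.
        for pos, aa in anchors.items():
            seq[pos] = aa
    return "".join(seq)


anchors = {2: "W", 5: "Y", 9: "R"}  # illustrative values only
design = design_with_anchors(12, anchors)
```

Clamping after each update means the free positions are always sampled in the presence of the anchors, which is the property the "hard constraint" framing relies on. A real diffusion model would operate on coordinates and sequence logits rather than on a string.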

If this is right

Protein designs become more interpretable because the specific residue commitments driving each design are recorded explicitly.
Biochemical knowledge can be reused systematically by updating the understanding expert without retraining the geometry generator.
Controllability improves since users can directly edit or override the residue constraints before generation begins.
The framework integrates with existing state-of-the-art diffusion models without requiring changes to their internal sampling dynamics.
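
The interpretability and controllability points both depend on the residue commitments being an explicit, editable record. A minimal sketch of such a record follows, assuming a chain/position/residue schema that the paper does not specify; `ResidueConstraint`, `override`, and the example values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ResidueConstraint:
    """One residue-level commitment (hypothetical schema)."""
    chain: str
    position: int
    residue: str
    rationale: str = ""


def override(constraints, chain, position, residue, rationale="user override"):
    """Return a new list with the constraint at (chain, position) replaced,
    or appended if no constraint exists there yet."""
    updated = []
    found = False
    for c in constraints:
        if c.chain == chain and c.position == position:
            updated.append(replace(c, residue=residue, rationale=rationale))
            found = True
        else:
            updated.append(c)
    if not found:
        updated.append(ResidueConstraint(chain, position, residue, rationale))
    return updated


# Illustrative values only; not taken from the paper.
proposed = [
    ResidueConstraint("H", 100, "Y", "hypothetical: aromatic contact"),
    ResidueConstraint("H", 103, "D", "hypothetical: salt bridge"),
]
edited = override(proposed, "H", 103, "E")

# The serialized list doubles as an audit trail for each design.
audit_log = json.dumps([asdict(c) for c in edited], indent=2)
```

Keeping the records immutable and returning a new list on each edit preserves the original proposal, so a user override can always be compared against what the understanding expert first suggested.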

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same residue-constraint mechanism could be tested on multi-domain proteins or complexes where binding specificity involves several distant sites.
If the MLLM component can be swapped for newer models, the overall design pipeline could absorb advances in language reasoning without redesigning the geometric component.
Iterative workflows become possible in which the generation expert produces candidates and the understanding expert re-evaluates them against new functional criteria.
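
The iterative workflow in the last point can be sketched as a loop over three callables. This is an illustration of the control flow only: `iterate_design` and the toy `propose`, `generate`, and `score` stand-ins are hypothetical and do not correspond to any interface described in the paper.

```python
from typing import Callable, Dict, List, Tuple

Constraints = Dict[int, str]


def iterate_design(
    propose: Callable[[List[str]], Constraints],
    generate: Callable[[Constraints], List[str]],
    score: Callable[[str], float],
    rounds: int = 3,
    threshold: float = 0.8,
) -> Tuple[Constraints, List[str]]:
    """Alternate between proposing constraints and generating candidates
    until a candidate meets the threshold or the round budget runs out."""
    candidates: List[str] = []
    constraints: Constraints = {}
    for _ in range(rounds):
        constraints = propose(candidates)    # understanding expert stand-in
        candidates = generate(constraints)   # generation expert stand-in
        if candidates and max(score(c) for c in candidates) >= threshold:
            break
    return constraints, candidates


# Toy stand-ins so the loop can run end to end.
def toy_propose(previous: List[str]) -> Constraints:
    # Commit one more anchor than the previous best candidate carried.
    k = 1 if not previous else previous[0].count("W") + 1
    return {i: "W" for i in range(k)}


def toy_generate(constraints: Constraints) -> List[str]:
    seq = ["A"] * 6
    for pos, aa in constraints.items():
        seq[pos] = aa
    return ["".join(seq)]


def toy_score(seq: str) -> float:
    return seq.count("W") / len(seq)


final_constraints, final_candidates = iterate_design(
    toy_propose, toy_generate, toy_score, rounds=5, threshold=0.5
)
```

Because the two experts only exchange constraints and candidates, either callable can be swapped without touching the loop, which is the modularity the extension assumes.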

Load-bearing premise

The multimodal LLM can accurately and consistently identify the functionally essential residues that govern binding and specificity, and these decisions can be enforced as hard constraints without loss of generation quality or diversity.

What would settle it

An experiment that measures whether designs produced with the MLLM-identified residues match or exceed the success rate of unconstrained diffusion models on the same targets, or whether the identified residues align with experimentally validated critical sites from known protein complexes.
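
Both halves of that experiment reduce to simple counts. The sketch below shows one plausible scoring scheme, assuming residues are identified by (chain, position) pairs and design outcomes are booleans; the function names and example values are hypothetical, and the paper does not prescribe these metrics.

```python
def residue_overlap(predicted, validated):
    """Precision, recall, and F1 of predicted key residues against
    experimentally validated sites, both given as (chain, position) pairs."""
    predicted, validated = set(predicted), set(validated)
    true_pos = len(predicted & validated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(validated) if validated else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def success_rate(outcomes):
    """Fraction of designs that passed whatever success criterion is used."""
    outcomes = list(outcomes)
    return sum(bool(o) for o in outcomes) / len(outcomes) if outcomes else 0.0


# Illustrative values only; not data from the paper.
predicted = [("H", 100), ("H", 103), ("L", 50), ("L", 91)]
validated = [("H", 100), ("H", 103), ("H", 105)]
overlap = residue_overlap(predicted, validated)

constrained = [True, True, False, True]
unconstrained = [True, False, False, True]
delta = success_rate(constrained) - success_rate(unconstrained)
```

Reporting precision and recall separately matters here: an understanding expert that anchors many residues can score high recall while over-constraining the generator, which is exactly the failure mode the premise leaves open.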

Figures

Figures reproduced from arXiv: 2605.02937 by Carl Ma, Chiho Im, Deyao Zhu, Fang Wu, Guanlue Li, Hanchen Wang, Hanqun Cao, Haokai Zhao, Heli Qi, Heng-Jui Chang, Jure Leskovec, Kejun Ying, Kuan Pang, Li Erran Li, Ma Jian, Masashi Sugiyama, Naoto Yokoya, Pan Lu, Peng Xia, Pheng-Ann Heng, Seungju Han, Tinson Xu, Weihao Xuan, Xiangru Tang, Yejin Choi, Yinxi Li, Yu-Chi Cheng, Zehong Wang, Zeqi Zhou.

**Figure 1.** Proteo-R1 couples a multimodal reasoning expert with a geometric diffusion expert to unify molecular understanding and generation. The reasoner integrates sequence embeddings, AF3-style structural representations, and textual prompts to analyze a masked complex and determine which CDR residues should be key interaction anchors. These decisions include both the selection of critical residues and their pref…

**Figure 2.** Three-stage training diagram of Proteo-R1. In Stage I (Multimodal Alignment), the framework uses general protein data from PDB to project sequence and structural features into the LLM’s language representation space via lightweight projection layers, while the LLM backbone remains frozen. Supervision combines structured schema completion and free-form captioning over chain-level structural attributes. Stag…

Original abstract

Deep learning in de novo protein design has achieved atomic-level fidelity. However, existing models remain largely non-deliberative: they directly synthesize molecular geometries without explicitly reasoning about which residues or interactions are functionally essential. As a result, design decisions are entangled with continuous sampling dynamics, limiting interpretability, controllability, and systematic reuse of biochemical knowledge. We introduce Proteo-R1, a reasoning-guided protein design framework that explicitly decouples molecular understanding from geometric generation. Proteo-R1 adopts a dual-expert architecture in which a multimodal large language model (MLLM) serves as an understanding expert, analyzing protein sequences, structures, and textual context to identify key functional residues that govern binding and specificity. These residue-level decisions are then passed as hard constraints to a separate diffusion-based generation expert, which performs conditional co-design while respecting the fixed interaction anchors. This factorization mirrors how human experts approach molecular engineering: first, reasoning about critical interactions, then optimizing geometry subject to those constraints. By operationalizing reasoning as explicit residue-level commitments rather than latent textual guidance, Proteo-R1 achieves stable, interpretable, and modular integration of LLM reasoning with state-of-the-art geometric generative models. Code, data, and demos are available at https://smiles724.github.io/r1/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a clean architectural proposal for splitting LLM reasoning from diffusion-based protein generation via hard residue constraints, but it offers no experiments or metrics to show the split actually helps.


The new element here is the explicit handoff: an MLLM reads sequence, structure, and text to name specific residues that must stay fixed, then a separate diffusion model designs the rest around those anchors. Most prior protein generators either stay fully end-to-end or use soft text prompts, so locking in residue-level decisions is a distinct framing that could improve modularity and reuse of known biology. The motivation section lays this out plainly and the human-expert analogy is straightforward. Mentioning public code and demos is also a practical step if the implementation matches the description.

The central weakness is the complete absence of any results. The abstract asserts stable and interpretable outcomes, yet there are no accuracy numbers on the MLLM's residue calls, no ablation on what happens when those constraints are enforced, and no comparison against latent-guidance baselines. Without those, it is impossible to tell whether the hard constraints add value or simply shrink the valid design space. The assumption that the MLLM will reliably surface the functionally critical residues therefore remains unexamined.

This kind of paper is mainly for groups already working on hybrid LLM-plus-geometric models in synthetic biology who want to think through modular alternatives. It could spark useful discussion in a reading group about constraint strategies, but it is too early for most readers to cite or build on directly. I would send it to peer review because the factorization is distinct enough that referees could give concrete advice on validation experiments and on whether the hard-constraint approach is worth pursuing further.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Proteo-R1, a dual-expert framework for de novo protein design in which a multimodal LLM serves as an understanding expert that extracts key functional residues from sequence, structure, and textual context, and these residues are imposed as hard constraints on a separate diffusion-based generation expert for conditional co-design. The central claim is that operationalizing reasoning via explicit residue-level commitments (rather than latent textual guidance) yields stable, interpretable, and modular integration of LLM reasoning with geometric generative models.

Significance. If the claims were substantiated, the explicit factorization could improve interpretability and controllability in protein design by allowing modular reuse of biochemical reasoning components separate from geometry optimization. The manuscript provides no empirical results, however, so the practical significance cannot be assessed.

major comments (2)

[Abstract] The assertion that Proteo-R1 'achieves stable, interpretable, and modular integration' is unsupported by any quantitative metrics, ablation studies on constraint enforcement, residue-identification accuracy, or comparisons against latent-text baselines.
[Abstract] The load-bearing assumption that the MLLM can accurately and consistently identify functionally essential residues and that enforcing them as hard constraints preserves generation quality and diversity is neither quantified nor tested.

minor comments (1)

The manuscript would benefit from a schematic diagram illustrating the information flow between the understanding expert and generation expert.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the abstract advances claims without supporting quantitative evidence, as the manuscript currently emphasizes the conceptual dual-expert framework rather than comprehensive empirical validation. We will revise the abstract and add relevant analyses in the next version.


Referee: [Abstract] The assertion that Proteo-R1 'achieves stable, interpretable, and modular integration' is unsupported by any quantitative metrics, ablation studies on constraint enforcement, residue-identification accuracy, or comparisons against latent-text baselines.

Authors: We agree that the current abstract overstates the achieved properties without supporting metrics. The manuscript will be revised to rephrase the claim as 'is designed to achieve' or 'enables' stable, interpretable, and modular integration through explicit residue-level constraints. We will add ablation studies on constraint enforcement, residue-identification accuracy, and direct comparisons to latent-text baselines in the experimental section.

Revision: yes

Referee: [Abstract] The load-bearing assumption that the MLLM can accurately and consistently identify functionally essential residues and that enforcing them as hard constraints preserves generation quality and diversity is neither quantified nor tested.

Authors: We acknowledge that this assumption is central yet untested quantitatively in the present manuscript. The abstract will be updated to present the residue identification and constraint preservation as hypotheses supported by the architecture and available demos. We will incorporate preliminary quantification of residue accuracy (e.g., against known functional sites) and metrics on generation quality/diversity under hard constraints in the revision.

Revision: yes
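
One candidate for the promised diversity metric is mean pairwise Hamming distance restricted to the unconstrained positions, since anchored positions are identical by construction and would otherwise deflate the score. This is a sketch under that assumption; the rebuttal does not name a metric, and `mean_pairwise_hamming` and the example sequences are hypothetical.

```python
from itertools import combinations


def mean_pairwise_hamming(designs, anchored=frozenset()):
    """Mean normalized Hamming distance over all design pairs,
    computed only on positions that are not hard-constrained."""
    if len(designs) < 2:
        return 0.0
    length = len(designs[0])
    if any(len(d) != length for d in designs):
        raise ValueError("all designs must have the same length")
    free = [i for i in range(length) if i not in anchored]
    if not free:
        return 0.0
    total = 0
    pairs = 0
    for a, b in combinations(designs, 2):
        total += sum(a[i] != b[i] for i in free)
        pairs += 1
    return total / (pairs * len(free))


# Illustrative sequences only; position 0 is the anchored residue.
designs = ["WACD", "WAEF", "WGCD"]
diversity_free = mean_pairwise_hamming(designs, anchored={0})
diversity_all = mean_pairwise_hamming(designs)
```

Comparing the free-position score for constrained and unconstrained runs on the same target would show whether fixing anchors collapses variation in the rest of the sequence.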

Circularity Check

0 steps flagged

High-level architectural proposal with no derivational chain or equations


The paper introduces Proteo-R1 as a dual-expert framework that decouples MLLM-based residue identification from diffusion-based generation, but supplies no equations, fitted parameters, uniqueness theorems, or mathematical derivations. The central claim—that explicit residue-level commitments yield stable and interpretable integration—is presented as a design choice mirroring human reasoning rather than a result derived from prior inputs or self-citations. No load-bearing steps reduce by construction to fitted values or author-overlapping citations; the contribution remains a conceptual factorization without any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the unverified capability of current multimodal LLMs to perform accurate functional residue identification for proteins.

axioms (1)

domain assumption: Multimodal LLMs can reliably extract key functional residues governing binding and specificity from sequences, structures, and textual context.
This assumption enables the understanding expert and is not derived within the paper.

pith-pipeline@v0.9.0 · 5659 in / 1085 out tokens · 30433 ms · 2026-05-09T19:28:51.256011+00:00 · methodology

