arxiv: 2606.07567 · v1 · pith:IODDUNZYnew · submitted 2026-05-25 · 🧬 q-bio.BM · cs.AI · cs.CE

SurfDesign: Effective Protein Design on Molecular Surfaces

Pith reviewed 2026-06-29 19:22 UTC · model grok-4.3

classification 🧬 q-bio.BM · cs.AI · cs.CE
keywords protein design · molecular surface · equivariant message passing · de novo binder design · enzyme design · protein language models · inverse folding · geometric manifolds

The pith

SurfDesign conditions protein design on continuous molecular surface manifolds to outperform backbone-only methods on binder and enzyme tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SurfDesign as a framework that represents protein molecular surfaces as continuous geometric manifolds rather than relying solely on backbone coordinates. It integrates surface-based equivariant message passing, which processes normals, curvature, and directional geometry, with pretrained protein language models through parameter-efficient fine-tuning. The central goal is to demonstrate that this surface-aware approach produces higher-performing designs for de novo binders and enzymes than prior surface-conditioned or backbone-only techniques, with additional strong results on inverse folding as a structural check. A reader would care because protein function arises primarily from surface geometry and physicochemical fit, so improved conditioning could make functional design more reliable.

Core claim

SurfDesign models molecular surfaces as continuous geometric manifolds and applies surface-based equivariant message passing to capture normals, curvature, and directional geometry while integrating with pretrained protein language models via parameter-efficient fine-tuning, resulting in consistent outperformance over prior surface-conditioned and backbone-only methods on de novo binder and enzyme design benchmarks together with strong inverse-folding performance.

What carries the argument

Surface-based equivariant message passing on continuous geometric manifolds, which encodes surface normals, curvature, and directional geometry for integration with protein language models.
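
The directional geometry mentioned here can be made concrete with the pairwise angles that the Figure 3 caption describes for two oriented surface points: two intersection angles between each normal and the connecting direction, and one dihedral angle between the normals about that direction. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation; the function names `unit` and `pair_angles` and the sign convention of the dihedral are illustrative choices.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length (assumes v is non-zero)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def pair_angles(x_i, n_i, x_j, n_j):
    """Angles for two oriented points (position x, normal n), in radians.

    Returns (phi_ij, phi_ji, theta):
      phi_ij: angle between n_i and the direction from i to j
      phi_ji: angle between n_j and the direction from j to i
      theta:  signed dihedral between n_i and n_j about the i->j direction
    """
    d_ij = unit(np.asarray(x_j, dtype=float) - np.asarray(x_i, dtype=float))
    n_i, n_j = unit(n_i), unit(n_j)
    phi_ij = np.arccos(np.clip(n_i @ d_ij, -1.0, 1.0))
    phi_ji = np.arccos(np.clip(n_j @ -d_ij, -1.0, 1.0))
    # Both cross products are perpendicular to d_ij; their signed angle
    # about d_ij is the dihedral.
    a = np.cross(n_i, d_ij)
    b = np.cross(n_j, d_ij)
    theta = np.arctan2(np.cross(a, b) @ d_ij, a @ b)
    return phi_ij, phi_ji, theta

# Hypothetical example: two points one unit apart with perpendicular normals.
phi_ij, phi_ji, theta = pair_angles([0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0])
```

Because all three quantities are built from dot and cross products of relative vectors, they do not change when the whole point cloud is rotated or translated, which is why angles of this kind are a natural input to rotation-aware message passing.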

If this is right

  • De novo binder designs achieve higher success rates on standard functional benchmarks.
  • Enzyme designs exhibit improved catalytic performance metrics relative to prior conditioning strategies.
  • Inverse-folding accuracy serves as a reliable diagnostic for structural compatibility of surface-conditioned sequences.
  • Manifold-aware surface representations provide a foundation for scaling functional protein design.
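
Inverse-folding accuracy is commonly summarized as sequence recovery, the fraction of positions where the designed residue matches the native one; the Figure 5 caption reports this quantity. The sketch below assumes two already-aligned, equal-length sequences. The function name `sequence_recovery` is illustrative and the paper's exact evaluation protocol is not shown on this page.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where designed and native residues agree."""
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned and of equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(designed, native))
    return matches / len(native)

# Hypothetical four-residue example: three of four positions match.
print(sequence_recovery("ACDE", "ACDF"))  # 0.75
```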

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach may generalize to design tasks where surface complementarity drives specificity, such as antibody-antigen interfaces.
  • Integration with additional geometric inputs like electrostatic fields could further refine predictions of binding energetics.
  • If the performance gains hold in wet-lab settings, design pipelines may shift emphasis toward surface manifold representations over sequence or backbone inputs alone.

Load-bearing premise

That surface geometry and physicochemical features captured by manifold-based equivariant message passing determine functional performance more effectively than backbone structure alone or earlier surface methods.

What would settle it

A controlled benchmark in which proteins designed by SurfDesign show lower binding affinity or enzymatic activity than those from backbone-only baselines when tested in the same experimental assays.

Figures

Figures reproduced from arXiv: 2606.07567 by Fang Wu, Jinbo Xu, Jure Leskovec, Mark Gerstein, Shuting Jin, Xiangru Tang, Xiangxiang Zeng, Yejin Choi.

Figure 1: Protein design setups, conditioned on backbone
Figure 2: Illustration of SurfDesign. Smooth-surface graphs are obtained using PyMOL or MSMS and subsequently denoised.
Figure 3: Angles hidden in the oriented surface point cloud, containing two intersection angles $\varphi_{\boldsymbol{n}_i ij} = \angle \boldsymbol{n}_i \boldsymbol{x}_{ij}$ and $\varphi_{\boldsymbol{n}_j ji} = \angle \boldsymbol{n}_j \boldsymbol{x}_{ji}$ as well as a dihedral angle $\theta_{\boldsymbol{n}_i ij \boldsymbol{n}_j}$. Directionality in Surface Point Clouds. The manifold characteristic of molecular surfaces introduces additional directional information when considering pairwise or ternary interactions among connected particles. To be specific, for each neighb…
Figure 4: Performance of different PLM scales. shapes, we use three evaluation metrics [61] commonly used in 3D modeling from three aspects: volume, distance, and normal vectors. They are Volumetric Intersection over Union (IoU), Chamfer distance (CD), and Normal Consistency (NC) (computational details are in App. B.1). As shown in Tab. 7, SurfDesign reconstructs molecular surfaces well, aligning with the motivati…
Figure 5: Sequence recovery w.r.t. structural contexts regarding SASA and interaction interface, on CATH 4.2 single-chain proteins. Structural Contexts. We dissect the action mechanism of SurfDesign according to different structural contexts in
Figure 6: Visualization results of a challenging sample (PDB 2KRT). We use AlphaFold3 to recover the structure from the
Figure 7: Visualization of SurfDesign, where the green and pink ones are ground truth and designed structures, respectively.
Figure 8: Comparison between original and designed surfaces, where molecular surfaces are visualized from two perspectives:
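
The text captured with the Figure 4 caption names Chamfer distance (CD) and Normal Consistency (NC) as surface-reconstruction metrics, with computational details deferred to the paper's App. B.1, which is not reproduced on this page. The sketch below therefore uses common textbook definitions as an assumption: CD as the sum of mean nearest-neighbour distances in both directions, and NC as the mean absolute dot product between unit normals at nearest neighbours. Volumetric IoU is omitted because it needs a voxelization choice that the page does not specify. All function names are illustrative.

```python
import numpy as np

def _nearest(src, dst):
    """For every point in src, index of and distance to its nearest point in dst."""
    dists = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)
    return idx, dists[np.arange(len(src)), idx]

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    _, d_pq = _nearest(p, q)
    _, d_qp = _nearest(q, p)
    return d_pq.mean() + d_qp.mean()

def normal_consistency(p, n_p, q, n_q):
    """Mean |cos| between unit normals at nearest neighbours, averaged both ways."""
    idx_pq, _ = _nearest(p, q)
    idx_qp, _ = _nearest(q, p)
    c_pq = np.abs(np.sum(n_p * n_q[idx_pq], axis=1)).mean()
    c_qp = np.abs(np.sum(n_q * n_p[idx_qp], axis=1)).mean()
    return 0.5 * (c_pq + c_qp)

# Hypothetical example: a small planar patch and a copy shifted along z.
p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
normals = np.tile([0.0, 0.0, 1.0], (3, 1))
q = p + np.array([0.0, 0.0, 0.1])
cd = chamfer_distance(p, q)
nc = normal_consistency(p, normals, q, normals)
```

The brute-force pairwise distance matrix keeps the example short; for surfaces with many sampled points a spatial index such as a k-d tree would be the usual replacement.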
Original abstract

Protein function is largely determined by molecular surface geometry and physicochemical complementarity, yet most protein design methods condition only on backbone structure. We introduce SurfDesign, a surface-conditioned protein design framework that models molecular surfaces as continuous geometric manifolds and integrates them with pretrained protein language models. SurfDesign employs surface-based equivariant message passing to capture surface normals, curvature, and directional geometry, together with a parameter-efficient fine-tuning strategy. Focusing on functional protein design, we show that SurfDesign consistently outperforms prior surface-conditioned and backbone-only methods on de novo binder and enzyme design benchmarks. We also report strong performance on inverse-folding benchmarks as a diagnostic of structural compatibility. Our results highlight manifold-aware surface representations as a principled foundation for functional protein and enzyme design. Code is available at https://github.com/smiles724/SurfDesign.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SurfDesign, a surface-conditioned protein design framework that represents molecular surfaces as continuous geometric manifolds and uses surface-based equivariant message passing to capture normals, curvature, and directional geometry. This is integrated with pretrained protein language models via parameter-efficient fine-tuning. The central claim is that SurfDesign consistently outperforms prior surface-conditioned and backbone-only methods on de novo binder and enzyme design benchmarks, while also performing strongly on inverse-folding tasks as a diagnostic for structural compatibility.
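
The page does not say which parameter-efficient fine-tuning method the paper uses. As one widely used example, low-rank adaptation (LoRA, reference [24] in the list below) freezes a pretrained weight matrix and learns a low-rank additive update. The NumPy sketch below illustrates only that idea; the class name `LoRALinear`, the rank, and the scaling value are illustrative assumptions, not the SurfDesign configuration.

```python
import numpy as np

class LoRALinear:
    """Linear layer y = x W^T with a trainable low-rank update (alpha / r) * B A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pretrained weight
        self.a = rng.normal(0.0, 0.01, size=(rank, in_dim))    # trainable, small random init
        self.b = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.a.T) @ self.b.T
        return frozen + update

    def trainable_fraction(self):
        """Share of parameters that would receive gradients."""
        trainable = self.a.size + self.b.size
        return trainable / (trainable + self.weight.size)

# Hypothetical 64x64 layer: only the two small factors are trainable.
layer = LoRALinear(np.eye(64), rank=4)
print(round(layer.trainable_fraction(), 3))  # 0.111
```

Because `b` starts at zero, the adapted layer initially reproduces the frozen layer exactly, so fine-tuning begins from the pretrained model's behaviour.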

Significance. If the reported outperformance is robustly supported by detailed benchmarks and ablations, the work would advance functional protein design by directly modeling surface geometry and physicochemical complementarity, which are known to determine binding and catalytic activity. The availability of code is a positive factor for reproducibility.

major comments (2)
  1. [Abstract / Results] The abstract asserts consistent outperformance on de novo binder and enzyme design benchmarks, but without access to the specific benchmark definitions, metrics (e.g., predicted affinity vs. experimental validation), ablation studies, or quantitative results in the results section, the load-bearing claim cannot be evaluated for statistical significance or confounding factors such as surrogate computational proxies.
  2. [Methods] The integration of surface-based equivariant message passing with PLM fine-tuning is presented as capturing geometric and physicochemical features better than existing conditioning strategies, but the manuscript provides no derivation or equation showing how the manifold representation avoids reducing to backbone-only features by construction.
minor comments (2)
  1. [Abstract] The abstract mentions a 'parameter-efficient fine-tuning strategy' without specifying the exact method (e.g., LoRA rank or adapter type) or its impact on training stability.
  2. [Results] Inverse-folding performance is reported as a diagnostic, but the manuscript should clarify whether this is on the same test sets as the functional design benchmarks to avoid data leakage concerns.
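
Major comment 1 turns on whether the reported gains are statistically significant. One common check when two methods are scored on the same matched runs or targets is a paired t-test, which the simulated rebuttal further down also mentions. The sketch below computes the test statistic by hand using only the standard library; the function name and the scores are hypothetical and are not results from the paper.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    if len(scores_a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0.0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / std_err, len(diffs) - 1

# Hypothetical success rates from five matched runs of two methods.
method_a = [0.62, 0.60, 0.65, 0.61, 0.63]
method_b = [0.55, 0.57, 0.58, 0.54, 0.56]
t_stat, dof = paired_t_statistic(method_a, method_b)
```

With five pairs there are four degrees of freedom, for which the two-sided 5% critical value of the t distribution is about 2.776; a statistic of larger magnitude would reject the hypothesis of equal means at that level. With so few runs the test has little power, which is part of what the referee's concern about surrogate metrics and confounders is getting at.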

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify our claims. We address each major comment below with references to the manuscript content and indicate where revisions will be made.

Point-by-point responses
  1. Referee: [Abstract / Results] The abstract asserts consistent outperformance on de novo binder and enzyme design benchmarks, but without access to the specific benchmark definitions, metrics (e.g., predicted affinity vs. experimental validation), ablation studies, or quantitative results in the results section, the load-bearing claim cannot be evaluated for statistical significance or confounding factors such as surrogate computational proxies.

    Authors: The manuscript's Results section (Section 3) provides the requested details. Benchmark definitions are given in 3.1 (de novo binder design on PDB-derived targets and enzyme design on catalytic site benchmarks from prior literature). Metrics include design success rate, predicted binding affinity via Rosetta and docking scores, and interface RMSD. Ablation studies in 3.3 isolate surface vs. backbone contributions, with quantitative tables and error bars from 5 independent runs, and statistical significance is reported via paired t-tests. All evaluations use established computational surrogates (as is standard for de novo design papers), with no experimental validation claimed. We will add a brief parenthetical in the abstract and a one-sentence clarification in the introduction to make the surrogate nature explicit. revision: partial

  2. Referee: [Methods] The integration of surface-based equivariant message passing with PLM fine-tuning is presented as capturing geometric and physicochemical features better than existing conditioning strategies, but the manuscript provides no derivation or equation showing how the manifold representation avoids reducing to backbone-only features by construction.

    Authors: We agree an explicit derivation strengthens the presentation. The surface manifold is constructed from the solvent-accessible surface (SAS) via the signed-distance function and mean/Gaussian curvature at each point; message passing then operates on surface-sampled points equipped with normals and curvature tensors (Eq. 2 in Methods). These quantities are not functions of backbone coordinates alone, as the SAS depends on side-chain atoms and solvent radius. We will insert a short derivation paragraph and an additional equation (new Eq. 3) in the revised Methods section explicitly contrasting this with backbone-only conditioning. revision: yes
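
The simulated response above refers to mean and Gaussian curvature obtained from a signed-distance function (SDF). A minimal sketch of one standard way to compute them, under the assumption that the field is a true SDF with unit gradient norm: at a surface point the Hessian then has eigenvalues equal to the two principal curvatures and zero, so the mean curvature is half the trace and the Gaussian curvature is the second elementary symmetric function of the eigenvalues. The sphere SDF, the finite-difference step, and all function names are illustrative, and nothing here is taken from the paper's equations.

```python
import numpy as np

def sphere_sdf(p, radius=2.0):
    """Signed distance to a sphere centred at the origin (positive outside)."""
    return np.linalg.norm(p) - radius

def hessian(f, p, h=1e-3):
    """Central-difference Hessian of a scalar field f at a 3D point p."""
    p = np.asarray(p, dtype=float)
    basis = np.eye(3) * h
    hess = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            ei, ej = basis[i], basis[j]
            hess[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                          - f(p - ei + ej) + f(p - ei - ej)) / (4.0 * h * h)
    return hess

def curvatures(f, p):
    """Return (mean, Gaussian) curvature at surface point p of a unit-gradient SDF."""
    hm = hessian(f, p)
    tr = np.trace(hm)
    mean_curv = 0.5 * tr
    # Sum of pairwise eigenvalue products; with one zero eigenvalue this is k1 * k2.
    gauss_curv = 0.5 * (tr ** 2 - np.trace(hm @ hm))
    return mean_curv, gauss_curv

# For a sphere of radius 2 the expected values are 1/2 and 1/4.
mean_curv, gauss_curv = curvatures(sphere_sdf, [2.0, 0.0, 0.0])
```

The sign of the mean curvature depends on the orientation convention of the SDF; with the positive-outside convention used here, convex regions come out positive. Whether such quantities add information beyond backbone coordinates is exactly the question the referee raises, and this sketch does not settle it.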

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The paper presents an empirical framework for surface-conditioned protein design and reports benchmark outperformance. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citation chains appear in the provided text. Claims rest on experimental results rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described; the central claim rests on the unelaborated modeling choices for surfaces and message passing.

Reference graph

Works this paper leans on

96 extracted references · 26 canonical work pages · 4 internal anchors

  1. [1]

    Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J Ballard, Joshua Bambrick, et al. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature (2024), 1–3

  2. [2]

    Marc Alexa, Johannes Behr, Daniel Cohen-Or, Shachar Fleishman, David Levin, and Claudio T Silva. 2001. Point set surfaces. In Proceedings Visualization, 2001. VIS’01. IEEE, 21–29

  3. [3]

    Peizhen Bai, Filip Miljković, Xianyuan Liu, Leonardo De Maria, Rebecca Croasdale-Wood, Owen Rackham, and Haiping Lu. 2025. Mask-prior-guided denoising diffusion improves inverse protein folding. Nature Machine Intelligence (2025), 1–13

  4. [4]

    Nathaniel R Bennett, Brian Coventry, Inna Goreshnik, Buwei Huang, Aza Allen, Dionne Vafeados, Ying Po Peng, Justas Dauparas, Minkyung Baek, Lance Stewart, et al. 2023. Improving de novo protein binder design with deep learning. Nature Communications 14, 1 (2023), 2625

  5. [5]

    Helen M Berman, Tammy Battistuz, Talapady N Bhat, Wolfgang F Bluhm, Philip E Bourne, Kyle Burkhardt, Zukang Feng, Gary L Gilliland, Lisa Iype, Shri Jain, et al. 2002. The protein data bank. Acta Crystallographica Section D: Biological Crystallography 58, 6 (2002), 899–907

  6. [6]

    Sheng Chen, Zhe Sun, Lihua Lin, Zifeng Liu, Xun Liu, Yutian Chong, Yutong Lu, Huiying Zhao, and Yuedong Yang. 2019. To improve protein sequence profile prediction through image captioning on pairwise residue distance map. Journal of chemical information and modeling 60, 1 (2019), 391–399

  7. [7]

    Peter JA Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, et al. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 11 (2009), 1422

  8. [8]

    Michael L Connolly. 1983. Analytical molecular surface calculation. Journal of applied crystallography 16, 5 (1983), 548–558

  9. [9]

    Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J Ragotte, Lukas F Milles, Basile IM Wicky, Alexis Courbet, Rob J de Haas, Neville Bethel, et al. 2022. Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 6615 (2022), 49–56

  10. [10]

    Marianne Defresne, Sophie Barbe, and Thomas Schiex. 2021. Protein design with deep learning. International Journal of Molecular Sciences 22, 21 (2021), 11741

  11. [11]

    Warren L DeLano et al. 2002. Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr 40, 1 (2002), 82–92

  12. [12]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  13. [13]

    Yasha Ektefaie, Olivia Viessmann, Siddharth Narayanan, Drew Dresser, J Mark Kim, and Armen Mkrtchyan. 2024. Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding. arXiv preprint arXiv:2410.17173 (2024)

  14. [14]

    Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, et al. 2020. ProtTrans: towards cracking the language of Life’s code through self-supervised deep learning and high performance computing. arXiv preprint arXiv:2007.06225 (2020)

  15. [15]

    Pablo Gainza, Freyr Sverrisson, Frederico Monti, Emanuele Rodola, Davide Boscaini, Michael M Bronstein, and Bruno E Correia. 2020. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods 17, 2 (2020), 184–192

  16. [16]

    Pablo Gainza, Sarah Wehrle, Alexandra Van Hall-Beauvais, Anthony Marchand, Andreas Scheck, Zander Harteveld, Stephen Buckley, Dongchun Ni, Shuguang Tan, Freyr Sverrisson, et al. 2023. De novo design of protein interactions with learned surface fingerprints. Nature 617, 7959 (2023), 176–184

  17. [17]

    Zhangyang Gao, Cheng Tan, Pablo Chacon, and Stan Z Li. 2022. PiFold: Toward effective and efficient protein inverse folding. arXiv preprint arXiv:2209.12643 (2022)

  18. [18]

    Zhangyang Gao, Cheng Tan, Xingran Chen, Yijie Zhang, Jun Xia, Siyuan Li, and Stan Z Li. 2023. KW-Design: Pushing the Limit of Protein Design via Knowledge Refinement. In The Twelfth International Conference on Learning Representations

  19. [19]

    Zhangyang Gao, Cheng Tan, and Stan Z Li. 2022. Alphadesign: A graph protein design method and benchmark on alphafolddb. arXiv preprint arXiv:2202.01079 (2022)

  20. [20]

    Zhangyang Gao, Jue Wang, Cheng Tan, Lirong Wu, Yufei Huang, Siyuan Li, Zhirui Ye, and Stan Z Li. 2024. Uniif: Unified molecule inverse folding. arXiv preprint arXiv:2405.18968 (2024)

  21. [21]

    Johannes Gasteiger, Shankari Giri, Johannes T Margraf, and Stephan Gunnemann. 2020. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint arXiv:2011.14115 (2020)

  22. [22]

    Johannes Gasteiger, Janek Gross, and Stephan Gunnemann. 2020. Directional message passing for molecular graphs. arXiv preprint arXiv:2003.03123 (2020)

  23. [23]

    Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives. 2022. Learning inverse folding from millions of predicted structures. bioRxiv (2022)

  24. [24]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

  25. [25]

    Mingyang Hu, Fajie Yuan, Kevin Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, and Qiuyang Ding. 2022. Exploring evolution-aware &-free protein language models as protein function predictors. Advances in Neural Information Processing Systems 35 (2022), 38873–38884

  26. [26]

    John Ingraham, Vikas Garg, Regina Barzilay, and Tommi Jaakkola. 2019. Generative models for graph-based protein design. Advances in neural information processing systems 32 (2019)

  27. [27]

    Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael JL Townshend, and Ron Dror. 2020. Learning from protein structure with geometric vector perceptrons. arXiv preprint arXiv:2009.01411 (2020). KDD 2026, August 9–13, 2026, Jeju Island, Republic of Korea. Fang et al

  28. [28]

    John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Zidek, Anna Potapenko, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 7873 (2021), 583–589

  29. [29]

    Wolfgang Kabsch and Christian Sander. 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Research on Biomolecules 22, 12 (1983), 2577–2637

  30. [30]

    Shoshichi Kobayashi and Katsumi Nomizu. 1996. Foundations of differential geometry, volume 2. Vol. 61. John Wiley & Sons

  31. [31]

    Alexander Kroll, Sahasra Ranjan, Martin KM Engqvist, and Martin J Lercher. 2023. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nature Communications 14, 1 (2023), 2787

  33. [33]

    Houtim Lai, Longyue Wang, Ruiyuan Qian, Geyan Ye, Juhong Huang, Fandi Wu, Fang Wu, Xiangxiang Zeng, and Wei Liu. 2024. Interformer: An Interaction-Aware Model for Protein-Ligand Docking and Affinity Prediction. (2024)

  34. [34]

    Youhan Lee, Hasun Yu, Jaemyung Lee, and Jaehoon Kim. 2023. Pre-training Sequence, Structure, and Surface Features for Comprehensive Protein Representation Learning. In The Twelfth International Conference on Learning Representations

  35. [35]

    Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021)

  36. [36]

    Darong Li, Lian Shen, Meijia Song, Deyi Li, Juan Liu, and Xiangrong Liu. 2025. SurfFold: A Unified Model for Protein Inverse Folding by Integrating Surface and Structural Information. Bioinformatics (2025), btaf666

  37. [37]

    Guanlue Li, Xufeng Zhao, Fang Wu, and Sören Laue. 2025. Joint Design of Protein Surface and Backbone Using a Diffusion Bridge Model. In The Thirty-ninth Annual Conference on Neural Information Processing Systems

  38. [38]

    Jie Li and Patrice Koehl. 2014. 3D representations of amino acids—applications to protein sequence comparison and classification. Computational and structural biotechnology journal 11, 18 (2014), 47–58

  39. [39]

    Zhixiu Li, Yuedong Yang, Eshel Faraggi, Jian Zhan, and Yaoqi Zhou. 2014. Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles. Proteins: Structure, Function, and Bioinformatics 82, 10 (2014), 2565–2573

  40. [40]

    Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, et al. 2022. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv (2022)

  41. [41]

    Sazan Mahbub, Souvik Kundu, and Eric P Xing. 2025. PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations. arXiv preprint arXiv:2510.11750 (2025)

  42. [42]

    Weian Mao, Muzhi Zhu, Hao Chen, and Chunhua Shen. 2023. Modeling protein structure using geometric vector field networks. bioRxiv (2023), 2023–05

  43. [43]

    Niloy J Mitra and An Nguyen. 2003. Estimating surface normals in noisy point cloud data. In Proceedings of the nineteenth annual symposium on Computational geometry. 322–328

  44. [44]

    Erik Nijkamp, Jeffrey A Ruffolo, Eli N Weinstein, Nikhil Naik, and Ali Madani. 2023. Progen2: exploring the boundaries of protein language models. Cell systems 14, 11 (2023), 968–978

  46. [46]

    Chigozie Nwankpa, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. 2018. Activation functions: Comparison of trends in practice and research for deep learning. arXiv preprint arXiv:1811.03378 (2018)

  48. [48]

    James O’Connell, Zhixiu Li, Jack Hanson, Rhys Heffernan, James Lyons, Kuldip Paliwal, Abdollah Dehzangi, Yuedong Yang, and Yaoqi Zhou. 2018. SPIN2: Predicting sequence profiles from protein structures using deep neural networks. Proteins: Structure, Function, and Bioinformatics 86, 6 (2018), 629–633

  49. [49]

    Christine A Orengo, Alex D Michie, Susan Jones, David T Jones, Mark B Swindells, and Janet M Thornton. 1997. CATH–a hierarchic classification of protein domain structures. Structure 5, 8 (1997), 1093–1109

  50. [50]

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 165–174

  51. [51]

    William R Pearson and Michael L Sierk. 2005. The limits of protein sequence comparison? Current opinion in structural biology 15, 3 (2005), 254–260

  52. [52]

    Yifei Qi and John ZH Zhang. 2020. DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet. Journal of chemical information and modeling 60, 3 (2020), 1245–1252

  53. [53]

    Jiezhong Qiu, Junde Xu, Jie Hu, Hanqun Cao, Liya Hou, Zijun Gao, Xinyi Zhou, Anni Li, Xiujuan Li, Bin Cui, et al. 2024. InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions. bioRxiv (2024), 2024–04

  54. [54]

    Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Peter Chen, John Canny, Pieter Abbeel, and Yun Song. 2019. Evaluating protein transfer learning with TAPE. Advances in neural information processing systems 32 (2019)

  55. [55]

    Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, et al. 2021. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences 118, 15 (2021), e2016239118

  56. [56]

    Emma C Robinson, Saad Jbabdi, Matthew F Glasser, Jesper Andersson, Gregory C Burgess, Michael P Harms, Stephen M Smith, David C Van Essen, and Mark Jenkinson. 2014. MSM: a new flexible framework for multimodal surface matching. Neuroimage 100 (2014), 414–426

  57. [57]

    Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling. 2021. E(n) equivariant graph neural networks. In International conference on machine learning. PMLR, 9323–9332

  58. [58]

    Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, and Aaron van den Oord. 2021. Step-unrolled denoising autoencoders for text generation. arXiv preprint arXiv:2112.06749 (2021)

  59. [59]

    Samuel Sledzieski, Meghana Kshirsagar, Minkyung Baek, Rahul Dodhia, Juan Lavista Ferres, and Bonnie Berger. 2024. Democratizing protein language models with parameter-efficient fine-tuning. Proceedings of the National Academy of Sciences 121, 26 (2024), e2405840121

  60. [60]

    Vignesh Ram Somnath, Charlotte Bunne, and Andreas Krause. 2021. Multi-scale representation learning on proteins. Advances in Neural Information Processing Systems 34 (2021), 25244–25255

  61. [61]

    Zhenqiao Song, Tinglin Huang, Lei Li, and Wengong Jin. 2024. SurfPro: Functional Protein Design Based on Continuous Surface. arXiv preprint arXiv:2405.06693 (2024)

  62. [62]

    Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, and Philip M Kim. 2020. Fast and flexible protein design using deep graph neural networks. Cell systems 11, 4 (2020), 402–411

  63. [63]

    Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. 2023. Saprot: Protein language modeling with structure-aware vocabulary. bioRxiv (2023), 2023–10

  65. [65]

    Daiwen Sun, He Huang, Yao Li, Xinqi Gong, and Qiwei Ye. 2024. DSR: dynamical surface representation as implicit neural networks for protein. Advances in Neural Information Processing Systems 36 (2024)

  66. [66]

    Freyr Sverrisson, Jean Feydy, Bruno E Correia, and Michael M Bronstein. 2021. Fast end-to-end learning on protein surfaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15272–15281

  67. [67]

    Cheng Tan, Zhangyang Gao, Jun Xia, Bozhen Hu, and Stan Z Li. 2023. Global-context aware generative protein design. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5

  68. [68]

    Yang Tan, Mingchen Li, Bingxin Zhou, Bozitao Zhong, Lirong Zheng, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, and Liang Hong. 2024. Simple, efficient and scalable structure-aware adapter boosts protein language models. arXiv preprint arXiv:2404.14850 (2024)

  69. [69]

    Xiangru Tang, Xinwu Ye, Fang Wu, Daniel Shao, Dong Xu, and Mark Gerstein. 2025. BC-DESIGN: A Biochemistry-Aware Framework for Highly Accurate Inverse Protein Folding. In ICML 2025 Generative AI and Biology (GenBio) Workshop

  71. [71]

    Xiaoyu Tian, Haoxi Ran, Yue Wang, and Hang Zhao. 2023. Geomae: Masked geometric target prediction for self-supervised point cloud pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13570–13580

  72. [72]

    Brian L Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, and Tommi Jaakkola. 2022. Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem. arXiv preprint arXiv:2206.04119 (2022)

  73. [73]

    Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michal Zielinski, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, et al. 2021. Highly accurate protein structure prediction for the human proteome. Nature 596, 7873 (2021), 590–596

  74. [74]

    Michel Van Kempen, Stephanie S Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron LM Gilchrist, Johannes Soding, and Martin Steinegger. 2024. Fast and accurate protein structure search with Foldseek. Nature biotechnology 42, 2 (2024), 243–246

  75. [75]

    Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, et al. 2022. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic acids research 50, D1 (2022), D439–D444

  76. [76]

    Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, and Jian Tang. 2023. PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design. arXiv preprint arXiv:2312.00080 (2023)

  77. [77]

    Jingxue Wang, Huali Cao, John ZH Zhang, and Yifei Qi. 2018. Computational protein design with deep learning neural networks. Scientific reports 8, 1 (2018), 1–9

  78. [78]

    Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. 2024. Diffusion Language Models Are Versatile Protein Learners. arXiv preprint arXiv:2402.18567 (2024)

  79. [79]

    Fang Wu. 2024. A semi-supervised molecular learning framework for activity cliff estimation. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 6080–6088.

  80. [80]

    Fang Wu. 2025. DiffAntiSeq: A Controllable Diffusion Model for Efficient Antibody Library Design. In LLM for Scientific Discovery: Reasoning, Assistance, and Collaboration

Showing first 80 references.