D-Flow: Multi-modality Flow Matching for D-peptide Design

Fang Wu; James Zou; Junlin Xu; Mark Gerstein; Shuting Jin; Xiangru Tang

arxiv: 2411.10618 · v4 · submitted 2024-11-15 · 💻 cs.CE

Fang Wu, Shuting Jin, Xiangru Tang, Junlin Xu, Mark Gerstein, James Zou

Pith reviewed 2026-05-23 17:37 UTC · model grok-4.3

keywords: D-peptide design, flow matching, protein language models, chirality, de novo design, multi-modality, receptor binding, peptide binder

The pith

D-Flow generates D-peptides that align more closely with native sequences and structures by applying flow matching to chirality-mirrored L-protein data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents D-Flow as a generative framework that designs D-peptides conditioned on receptor binding. It overcomes the shortage of D-protein examples by reversing the handedness of L-receptors and routing structural information through a lightweight adapter into protein language model embeddings. The model works with full-atom details such as backbone frames, side-chain angles, and amino acid types, then uses a two-stage training process to shift from general design to specific binders. If the method succeeds, it would produce peptides that resist breakdown in the body and can be synthesized more readily for use as stable molecular tools.

Core claim

D-Flow is a full-atom flow matching framework for de novo D-peptide design. It represents structures with backbone frames, side-chain angles, and discrete amino acid types, then conditions generation on receptor binding. A mirror-image algorithm converts the chirality of L-receptors to create usable training signals, while a structural adapter injects conformational priors from protein language models into the D-peptide space. A two-stage pipeline lets the model retain broad pre-training knowledge when moving to targeted binder tasks. On the PepMerge benchmark the generated sequences and structures match native examples more closely than prior methods.
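The chirality conversion described above can be illustrated with a plain reflection. The following is a minimal sketch, assuming the mirror image is taken by negating one coordinate axis; the paper's actual mirror-image algorithm is not shown on this page and may differ. The residue coordinates and all function names are hypothetical. The sign of a triple product around the alpha carbon is used as a simple handedness check.

```python
# Illustrative sketch only: reflect coordinates through the plane x = 0 and
# check that the handedness at the alpha carbon flips. Not the authors' code.

def mirror(coords):
    """Reflect a list of (x, y, z) points through the plane x = 0."""
    return [(-x, y, z) for (x, y, z) in coords]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def signed_volume(n, ca, c, cb):
    """Triple product of (N-CA, C-CA, CB-CA); its sign encodes handedness."""
    return dot(sub(n, ca), cross(sub(c, ca), sub(cb, ca)))

def distance(a, b):
    d = sub(a, b)
    return dot(d, d) ** 0.5

# Hypothetical coordinates (angstroms) for N, CA, C, CB of one residue.
residue = [(-0.5, 1.4, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (-0.5, -0.8, -1.2)]
mirrored = mirror(residue)

vol_before = signed_volume(*residue)
vol_after = signed_volume(*mirrored)
print(vol_before, vol_after)  # equal magnitude, opposite sign
```

A reflection preserves every interatomic distance while inverting every stereocentre, which is why a mirrored L-structure is a geometrically valid D-structure. Whether the binding statistics carry over is the separate question raised under "Load-bearing premise".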

What carries the argument

The mirror-image algorithm that converts L-receptor chirality, paired with a structural adapter that injects protein structural representations into language model embeddings, inside a multi-modality flow matching model.
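To make "injecting structural representations into language model embeddings" concrete, here is a minimal sketch of one possible adapter form: a residual linear projection, h' = h + W s. This is an assumption chosen for illustration; the page does not say which architecture the paper's adapter uses. The dimensions, weights, and names below are all hypothetical.

```python
# Illustrative sketch only: a residual linear adapter, h' = h + W s.
# Dimensions, names, and weights are hypothetical; the paper's adapter may differ.
import random

def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def adapt(plm_embedding, struct_features, weight):
    """Add a linear projection of structural features to a PLM embedding."""
    projected = matvec(weight, struct_features)
    return [h + p for h, p in zip(plm_embedding, projected)]

random.seed(0)
embed_dim, struct_dim = 4, 3
h = [random.gauss(0.0, 1.0) for _ in range(embed_dim)]   # stand-in PLM embedding
s = [random.gauss(0.0, 1.0) for _ in range(struct_dim)]  # stand-in structural features

zero_weight = [[0.0] * struct_dim for _ in range(embed_dim)]
nonzero_weight = [[0.1] * struct_dim for _ in range(embed_dim)]

unchanged = adapt(h, s, zero_weight)    # identical to the PLM embedding
shifted = adapt(h, s, nonzero_weight)   # embedding moved by the structural signal
```

With a zero-initialized `W` the adapted embedding equals the original embedding, which is one common way to begin fine-tuning without disturbing a pre-trained model. Whether D-Flow initializes its adapter this way is not stated here.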

If this is right

Generated D-peptides achieve higher sequence identity to native examples than earlier methods
The model reaches improved affinity scores on receptor-binding tasks
Two-stage training preserves general design knowledge while enabling targeted binder generation
The resulting peptides support creation of proteolysis-resistant molecular tools and diagnostics
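The first item above depends on how sequence identity is computed. A minimal sketch, assuming identity is the fraction of matching positions between two equal-length sequences; the benchmark may instead use an alignment-based definition. The two sequences are made-up examples, not data from the paper.

```python
# Illustrative sketch only: position-wise sequence identity for equal-length
# peptides. The benchmark's exact definition may differ (e.g. alignment-based).

def sequence_identity(generated, native):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(generated) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for g, n in zip(generated, native) if g == n)
    return matches / len(native)

# Hypothetical 9-residue peptides differing at two positions.
identity = sequence_identity("ACDEFGHIK", "ACDQFGHLK")
print(f"{identity:.3f}")
```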

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Chirality mirroring combined with structural adapters could be tested on other handedness-sensitive design problems such as small-molecule ligands
The same adapter pattern might reduce data requirements in generative models for additional chiral or mirror-image molecular classes
Running the generated sequences through actual proteolytic stability assays would reveal whether the in silico improvements translate to measurable in vivo gains

Load-bearing premise

The mirror-image algorithm and structural adapter successfully transfer conformational priors from L-protein data to the D-peptide space without introducing systematic bias in binding geometry or sequence statistics.

What would settle it

Laboratory binding assays or crystal structures that directly compare the affinities and geometric matches of D-peptides synthesized from D-Flow outputs against those from baseline generators.

Original abstract

Among these, D-peptides are resistant to proteolysis, exhibit greater in vivo stability, and are easier to synthesize. Despite advances in deep learning for peptide discovery, the scarcity of natural D-protein data limits the transfer of existing generative models to the D-peptide chemical space. We propose D-Flow, a full-atom flow-based framework for de novo D-peptide design. Conditioned on receptor binding, D-Flow uses structural representations incorporating backbone frames, side-chain angles, and discrete amino acid types. A mirror-image algorithm is implemented to address the lack of training data for D-proteins by converting the chirality of L-receptors. Furthermore, we enhance D-Flow's capacity by integrating protein language models (PLMs) with structural awareness through a lightweight structural adapter that injects structural representations into PLM embeddings. This enables D-Flow to learn conformational priors in the D-peptide chemical space and to accommodate the chiral selectivity of binding sites, thereby mitigating the scarcity of D-peptide data. A two-stage training pipeline and a control toolkit enable D-Flow to transition from general protein design to targeted binder design while preserving pre-training knowledge. Results on the PepMerge benchmark show D-Flow's effectiveness. D-peptides generated by D-Flow align more closely with native sequences and structures, with sequence identity improving by 10.2% over the best baseline, and the top affinity score reaching 24.31%. Overall, D-Flow shows potential for D-peptide design, facilitating the development of bioorthogonal and stable molecular tools and diagnostics. Code is available at https://github.com/smiles724/PeptideDesign.
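For readers unfamiliar with flow matching, the following minimal sketch shows the idea on plain Euclidean coordinates, assuming the common straight-line path x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0. It is not D-Flow's implementation: backbone frames, side-chain angles, and discrete amino acid types would each need their own path, which this sketch omits. All names and values are hypothetical.

```python
# Illustrative sketch only: linear-path flow matching on Euclidean coordinates.
import random

def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity field along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(predicted_velocity, x0, x1):
    """Squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, target))

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
x1 = [1.0, -2.0, 0.5]                       # hypothetical data point
x0 = [random.gauss(0.0, 1.0) for _ in x1]   # noise sample

# A perfect model for this single pair would output x1 - x0 everywhere.
oracle = lambda x, t: target_velocity(x0, x1)
sample = euler_sample(x0, oracle)
```

Training minimizes `fm_loss` over random pairs and times; sampling then integrates the learned velocity field from noise, as `euler_sample` does here with an oracle in place of a trained network.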

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

D-Flow pairs flow matching with a chirality mirror flip and a PLM structural adapter to target D-peptide design, but the reported gains sit on an unablated transfer step whose bias risk is not quantified.

The core move is straightforward: take a flow-matching backbone, flip the chirality of L-receptors to create synthetic D-training data, and route structural features through a lightweight adapter into a protein language model. That combination is new for this subfield and directly tackles the data scarcity problem the abstract flags. The two-stage training pipeline and the released code are practical additions that let the model move from general protein generation to conditioned binder design without starting from scratch each time. Those pieces are worth noting because they are concrete and reproducible on the surface.

Referee Report

3 major / 2 minor

Summary. The paper proposes D-Flow, a full-atom flow-matching generative model for de novo D-peptide design conditioned on receptor binding. It addresses data scarcity via a mirror-image chirality conversion of L-receptors and a lightweight structural adapter that injects backbone/side-chain representations into PLM embeddings. A two-stage training pipeline with control toolkit is used to adapt from general protein design to targeted binders. On the PepMerge benchmark the method reports a 10.2 % gain in sequence identity over the best baseline and a top affinity score of 24.31 %.

Significance. If the performance claims are reproducible, the work would provide a practical route to proteolytically stable D-peptide binders, an area of clear therapeutic interest. Public release of the code is a concrete strength that supports verification. However, the absence of error bars, ablation tables, and explicit affinity-score protocols in the reported results limits the immediate impact assessment.

major comments (3)

[Abstract] The headline claims (10.2 % sequence-identity lift, top affinity score 24.31 %) are presented without error bars, number of independent runs, or any description of how the affinity scores were computed or normalized; these quantities are load-bearing for the central performance assertion.
[Methods] Mirror-image algorithm and structural adapter: no ablation isolates the contribution of the chirality-flip step versus the PLM adapter, nor is any direct comparison provided between generated D-peptide frames and experimental D-protein structures; this leaves the transfer of conformational priors unquantified and vulnerable to the bias concern raised in the stress-test note.
[Results] Two-stage training: the claim that the pipeline “preserves pre-training knowledge” is not supported by a quantitative comparison (e.g., performance drop when the control toolkit or second stage is removed), making the preservation statement difficult to evaluate.

minor comments (2)

[Abstract] The PepMerge benchmark is referenced but never defined or cited; a short description or pointer to its source would improve clarity.
[Methods] Notation for side-chain angles and backbone frames is introduced without an explicit table or figure legend, complicating reproduction from the text alone.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity, add missing statistical details, and provide additional quantitative support where feasible.

Referee: [Abstract] The headline claims (10.2 % sequence-identity lift, top affinity score 24.31 %) are presented without error bars, number of independent runs, or any description of how the affinity scores were computed or normalized; these quantities are load-bearing for the central performance assertion.

Authors: We agree that these details are necessary for reproducibility and impact assessment. In the revised manuscript we will report the number of independent runs, include error bars (standard deviation) on the reported metrics, and add a concise description of the affinity-score computation and normalization protocol (with full details moved to the Methods section).

Revision: yes

Referee: [Methods] Mirror-image algorithm and structural adapter: no ablation isolates the contribution of the chirality-flip step versus the PLM adapter, nor is any direct comparison provided between generated D-peptide frames and experimental D-protein structures; this leaves the transfer of conformational priors unquantified and vulnerable to the bias concern raised in the stress-test note.

Authors: We will add ablation experiments that isolate the mirror-image chirality conversion from the structural adapter. Direct comparison to experimental D-protein structures is limited by the extreme scarcity of such data; we will include the best available mirror-image structural comparisons and explicitly discuss remaining limitations and potential bias in the revised Methods and Results sections.

Revision: partial

Referee: [Results] Two-stage training: the claim that the pipeline “preserves pre-training knowledge” is not supported by a quantitative comparison (e.g., performance drop when the control toolkit or second stage is removed), making the preservation statement difficult to evaluate.

Authors: We will include new quantitative experiments that measure performance drop when the second stage or control toolkit is ablated, thereby providing direct support for the preservation claim.

Revision: yes
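The error bars requested in the first exchange amount to a mean and sample standard deviation over independent runs. A minimal sketch of that reporting step follows; the per-run values are placeholders invented for illustration and are not results from the paper.

```python
# Illustrative sketch only: summarize a metric over independent runs.
import statistics

def summarize(values):
    """Return (mean, sample standard deviation, number of runs)."""
    if len(values) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return statistics.mean(values), statistics.stdev(values), len(values)

# Placeholder per-run metric values, NOT taken from the paper.
runs = [0.30, 0.32, 0.31, 0.29, 0.33]
mean, std, n = summarize(runs)
print(f"{mean:.3f} +/- {std:.3f} (n = {n})")
```

`statistics.stdev` uses the n - 1 denominator, which is the usual choice when the runs are a small sample of possible seeds.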

Circularity Check

0 steps flagged

No circularity: generative model evaluated on external benchmarks with no fitted quantities redefined as predictions

The paper presents a flow-matching generative model for D-peptides that incorporates a mirror-image chirality conversion and a PLM structural adapter. All reported performance numbers (10.2% sequence-identity lift, top affinity score 24.31%) are obtained from external benchmark evaluation on PepMerge rather than from any internal fit or self-referential definition. No equations, parameters, or uniqueness claims are shown to reduce to their own inputs by construction, and no self-citation chain is invoked as load-bearing justification for the central claims. The derivation therefore remains self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central performance claims rest on the unstated assumption that the mirror-image conversion preserves binding-relevant geometry and that the PepMerge benchmark is an unbiased proxy for real D-peptide affinity. No free parameters, axioms, or invented entities are enumerated in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

D-Flow uses structural representations incorporating backbone frames, side-chain angles, and discrete amino acid types. A mirror-image algorithm is implemented to address the lack of training data for D-proteins by converting the chirality of L-receptors.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

