Synergistic Benefits of Joint Molecule Generation and Property Prediction

Adam Izdebski; Ewa Szczurek; Jakub M. Tomczak; Jan Olszewski; Krzysztof Koras; Pankhil Gawade; Serra Korkmaz; Valentin Rauscher

arxiv: 2504.16559 · v3 · submitted 2025-04-23 · 💻 cs.LG · q-bio.QM

Pith reviewed 2026-05-22 17:41 UTC · model grok-4.3

keywords: molecule generation · property prediction · joint learning · transformer · conditional sampling · drug design · antimicrobial peptides

The pith

Hyformer jointly generates molecules and predicts their properties with synergistic performance gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a single transformer model, Hyformer, that learns the joint distribution of molecular structures and their properties. This setup lets one model perform both molecule generation and property prediction instead of training separate systems. The authors show that joint training produces extra benefits in generating molecules conditioned on target properties, predicting properties for molecules unlike those seen in training, and learning more useful internal representations. They test the approach on a drug design task aimed at finding new antimicrobial peptides. A sympathetic reader would care because joint models could simplify computational pipelines while improving results on both tasks at once.

Core claim

Hyformer is a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. It is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction and representation learning.

What carries the argument

The argument rests on an alternating attention mechanism combined with a joint pre-training scheme, which lets the model switch between generative and predictive modes while sharing learned representations.

If this is right

Conditional generation of molecules with specified properties becomes more accurate because the model learns the joint distribution.
Property prediction generalizes better to molecules outside the training distribution due to shared generative and predictive training.
Representation learning improves for both tasks because each task regularizes the shared features.
Drug design workflows can use one model to propose and score candidate antimicrobial peptides.
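
The last point, one model that both proposes and scores candidates, can be pictured as a generic propose-and-rank loop. The sketch below is an illustration only, not the paper's sampling procedure: `propose` and `score` are hypothetical callables standing in for the model's generative and predictive modes, and `n_candidates` and `top_k` are made-up parameters.

```python
from typing import Callable, List, Tuple


def propose_and_score(
    propose: Callable[[], str],
    score: Callable[[str], float],
    n_candidates: int,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Draw candidates, drop duplicates, and return the top_k by predicted score.

    Both callables are assumptions for illustration: `propose` stands in for
    sampling a sequence, `score` for predicting a property of that sequence.
    """
    # A set removes repeated samples before scoring.
    candidates = {propose() for _ in range(n_candidates)}
    scored = [(candidate, score(candidate)) for candidate in candidates]
    # Highest score first; ties are broken alphabetically so the order is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

With a joint model, both callables would be backed by the same set of parameters, which is the simplification this bullet points to; with separate models, they would be two independently trained systems.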

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The joint approach may extend to other paired data domains such as protein sequences and their functions where generation and prediction are both needed.
Reducing the number of separate models could lower computational overhead in screening large chemical libraries.
Future experiments could test whether adding more property labels during pre-training further amplifies the observed synergies.

Load-bearing premise

The architectural and optimization challenges of training a single joint model for both generation and prediction can be overcome by an alternating attention mechanism and joint pre-training scheme without introducing new instabilities or loss of performance on either task.

What would settle it

Training separate specialized models for generation and for prediction, then comparing their performance on generation quality, property prediction accuracy, conditional sampling success, and out-of-distribution prediction against the joint Hyformer.

Figures

Figures reproduced from arXiv: 2504.16559 by Adam Izdebski, Ewa Szczurek, Jakub M. Tomczak, Jan Olszewski, Krzysztof Koras, Pankhil Gawade, Serra Korkmaz, Valentin Rauscher.

**Figure 1.** A schematic representation of HYFORMER. Depending on the task token [TASK], HYFORMER uses either a causal or a bidirectional mask, outputting token probabilities or predicted property values.

**Figure 2.** (a) Amino-acid distributions between the pre-trained and unconditionally generated sequences. (b) Distributions of charge, aromaticity, and isoelectric point (pI) for: non-AMP, AMP and conditionally generated sequences. (c) Frequency of crossing an attention threshold (x-axis) vs. mean attention weight (y-axis) for distinct amino-acids, colored by charge and sized by hydrophobicity.

**Figure 3.** Structures of the twelve generated molecules with Hyformer when the sampling temperature …

**Figure 4.** Structures of the twelve generated molecules with Hyformer when the sampling temperature …

**Figure 5.** Structures of the twelve generated molecules with Hyformer when the sampling temperature …

**Figure 6.** Structures of molecules generated by Hyformer conditioned on QED values, visualized …

**Figure 7.** Structures of molecules generated by Hyformer conditioned on SA score, visualized using …

**Figure 8.** Structures of molecules generated by Hyformer conditioned on LogP values, visualized …

**Figure 9.** Hyformer’s molecular embeddings. The considered chemical properties are normalized to …

Original abstract

Modeling the joint distribution of data samples and their properties allows to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. We show that Hyformer is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction and representation learning. Finally, we demonstrate the benefits of joint learning in a drug design use case of discovering novel antimicrobial peptides.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Hyformer, a transformer-based joint model for molecule generation and property prediction. It employs an alternating attention mechanism and joint pre-training to address architectural and optimization challenges, claiming that the model is simultaneously optimized for both tasks and exhibits synergistic benefits in conditional sampling, out-of-distribution property prediction, representation learning, and a drug-design use case involving discovery of novel antimicrobial peptides.

Significance. If the empirical results and ablations hold, the work could meaningfully advance joint generative-predictive modeling in molecular machine learning by demonstrating that a single model can match or exceed specialized baselines while unlocking synergies not available to separate models. The alternating-attention design and joint pre-training scheme constitute a concrete architectural contribution that may generalize beyond the reported tasks.

major comments (3)

[Section 3] Section 3: The alternating attention mechanism is presented as resolving task interference, yet the manuscript provides neither per-task loss curves nor gradient-norm statistics across training epochs. Without these diagnostics it is impossible to verify that the joint optimization truly achieves simultaneous strong performance on generation and prediction rather than trading off one objective against the other.
[Results (Tables 1–3)] Results (Tables 1–3 and associated figures): Single-task baselines for molecule generation (validity, uniqueness, novelty) and property prediction (MAE or R² on held-out sets) are not reported. Consequently the central claim of “synergistic benefits” and absence of negative transfer cannot be quantitatively evaluated; the reported joint-model numbers alone do not establish that Hyformer matches or exceeds specialized models on each task individually.
[OOD experiments] OOD property-prediction experiments: The improvement attributed to joint training is shown only for the full Hyformer; an ablation that freezes the generative component or trains a prediction-only variant on the same data is missing. This ablation is load-bearing for the claim that joint pre-training confers out-of-distribution robustness.
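
The metrics named in the second comment can be computed in a few lines, which makes the requested single-task comparison cheap to report. The sketch below uses one common set of definitions (validity over all samples, uniqueness over valid samples, novelty over unique valid samples); benchmarks differ in the details, and the paper's exact definitions are not stated here. `is_valid` is a hypothetical callable; in practice it would wrap a cheminformatics parser.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def generation_metrics(
    generated: List[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Validity, uniqueness and novelty under one common convention."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

Running the same functions on the joint model and on each single-task baseline, with identical data splits, would give the side-by-side numbers the comment asks for.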

minor comments (2)

[Abstract] Abstract: the phrase “simultaneously optimized for molecule generation and property prediction” is repeated without any numeric preview; adding one or two headline metrics would improve clarity.
[Section 3] Notation: the alternating attention block is introduced with several new symbols; a consolidated table or diagram legend would reduce reader effort when cross-referencing equations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our work. We address each of the major comments below and indicate the revisions we will make to the manuscript.

Point-by-point responses

Referee: [Section 3] Section 3: The alternating attention mechanism is presented as resolving task interference, yet the manuscript provides neither per-task loss curves nor gradient-norm statistics across training epochs. Without these diagnostics it is impossible to verify that the joint optimization truly achieves simultaneous strong performance on generation and prediction rather than trading off one objective against the other.

Authors: We agree that per-task loss curves and gradient-norm statistics would provide valuable evidence that the alternating attention mechanism enables simultaneous optimization without task interference. We will add these diagnostics to Section 3 in the revised manuscript.
revision: yes

Referee: [Results (Tables 1–3)] Results (Tables 1–3 and associated figures): Single-task baselines for molecule generation (validity, uniqueness, novelty) and property prediction (MAE or R² on held-out sets) are not reported. Consequently the central claim of “synergistic benefits” and absence of negative transfer cannot be quantitatively evaluated; the reported joint-model numbers alone do not establish that Hyformer matches or exceeds specialized models on each task individually.

Authors: We recognize that reporting single-task baselines is essential to quantitatively demonstrate synergistic benefits and the absence of negative transfer. In the revised manuscript, we will include single-task baseline results in Tables 1–3 for both generation and property prediction tasks, allowing direct comparison with the joint Hyformer model.
revision: yes

Referee: [OOD experiments] OOD property-prediction experiments: The improvement attributed to joint training is shown only for the full Hyformer; an ablation that freezes the generative component or trains a prediction-only variant on the same data is missing. This ablation is load-bearing for the claim that joint pre-training confers out-of-distribution robustness.

Authors: We agree that an ablation comparing the full model to a prediction-only variant is necessary to substantiate the OOD robustness benefits from joint pre-training. We will incorporate this ablation into the OOD experiments section of the revised manuscript.
revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on experimental results rather than self-referential derivations

full rationale

The paper introduces Hyformer, a transformer architecture for joint molecule generation and property prediction via alternating attention and joint pre-training. All central claims—simultaneous optimization, synergies in conditional sampling, OOD prediction, and representation learning—are presented as outcomes of empirical evaluation on benchmarks and a drug-design case study. No equations, fitted parameters, or predictions are described that reduce by construction to the model's own inputs or to self-citations. The work contains no load-bearing uniqueness theorems, ansatzes smuggled via prior self-work, or renamings of known patterns; results are validated externally against specialized models and datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that joint distribution modeling inherently produces synergistic benefits that separate models cannot achieve, and that the proposed architecture successfully addresses the stated training challenges.

axioms (1)

domain assumption: Modeling the joint distribution of data samples and their properties allows construction of a single model with synergistic benefits beyond purely generative or predictive models.
Opening sentence of the abstract; this premise motivates the entire approach.

invented entities (1)

Hyformer · no independent evidence
purpose: Transformer-based joint model blending generative and predictive functionalities via alternating attention and joint pre-training
New model introduced to solve the joint training problem; no independent evidence outside the paper is provided.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

HYFORMER unifies a decoder with an encoder using a transformer backbone $f_\theta(x; [\mathrm{TASK}])$ conditioned on a task token $[\mathrm{TASK}] \in \{[\mathrm{LM}], [\mathrm{PRED}], [\mathrm{MLM}]\}$. ... $\mathrm{ATT\_Type} = {\rightarrow}$ if $[\mathrm{TASK}] = [\mathrm{LM}]$, ${\leftrightarrow}$ if $[\mathrm{TASK}] \in \{[\mathrm{PRED}], [\mathrm{MLM}]\}$.
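
The masking rule in the quoted passage can be made concrete with a small sketch. It reads the one-directional arrow as a causal mask and the two-directional arrow as a bidirectional mask, which is an interpretation consistent with the Figure 1 caption rather than something the passage spells out. The function name and the 0/1 matrix layout are assumptions for illustration, not the authors' code.

```python
from typing import List

# Task tokens taken from the quoted passage; the grouping reflects the assumed
# reading of the arrows (causal for [LM], bidirectional for [PRED] and [MLM]).
CAUSAL_TASKS = {"[LM]"}
BIDIRECTIONAL_TASKS = {"[PRED]", "[MLM]"}


def attention_mask(task: str, seq_len: int) -> List[List[int]]:
    """Return a seq_len x seq_len 0/1 matrix; entry [i][j] == 1 lets query i attend to key j."""
    if task in CAUSAL_TASKS:
        # Lower-triangular: each position sees itself and earlier positions only.
        return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]
    if task in BIDIRECTIONAL_TASKS:
        # Full matrix: every position sees every other position.
        return [[1] * seq_len for _ in range(seq_len)]
    raise ValueError(f"unknown task token: {task!r}")
```

Only the mask changes between tasks in this sketch; the transformer weights that consume it would be shared, which is the point of conditioning a single backbone on the task token.
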
IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

$\ell_{\mathrm{HYFORMER}} = \ell_{\mathrm{LM}} + \mu\,\ell_{\mathrm{MLM}} + \eta\,\ell_{\mathrm{PRED}}$
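
The combined objective quoted above is a weighted sum of three per-task losses. A minimal sketch, assuming each loss has already been reduced to a scalar; the function name and the default weights of 1.0 are placeholders, since the values of µ and η are not given in this passage.

```python
def hyformer_loss(
    loss_lm: float,
    loss_mlm: float,
    loss_pred: float,
    mu: float = 1.0,   # weight on the masked-language-modeling term (placeholder value)
    eta: float = 1.0,  # weight on the property-prediction term (placeholder value)
) -> float:
    """Weighted sum of the language-modeling, masked-LM and prediction losses."""
    return loss_lm + mu * loss_mlm + eta * loss_pred
```

Setting `mu` or `eta` to zero drops the corresponding term, which is one simple way to build the prediction-only or generation-only ablations the referee report asks for.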

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 2 internal anchors

[1]

Arús-Pous, S

J. Arús-Pous, S. V . Johansson, O. Prykhodko, E. J. Bjerrum, C. Tyrchan, J.-L. Reymond, H. Chen, and O. Engkvist. Randomized smiles strings improve the quality of molecular generative models. Journal of cheminformatics, 11:1–13, 2019

work page 2019
[2]

Bagal, R

V . Bagal, R. Aggarwal, P. K. Vinod, and U. D. Priyakumar. MolGPT: Molecular Generation Using a Transformer-Decoder Model. Journal of Chemical Information and Modeling, 62(9): 2064–2076, May 2022. ISSN 1549-9596. doi: 10.1021/acs.jcim.1c00600

work page doi:10.1021/acs.jcim.1c00600 2064
[3]

X. Bi, C. Wang, W. Dong, W. Zhu, and D. Shang. Antimicrobial properties and interaction of two trp-substituted cationic antimicrobial peptides with a lipid bilayer. The Journal of Antibiotics, 67(5):361–368, 2014

work page 2014
[4]

C. M. Bishop. Novelty detection and neural network validation. IEE Proceedings-Vision, Image and Signal processing, 141(4):217–222, 1994

work page 1994
[5]

E. J. Bjerrum. Smiles enumeration as data augmentation for neural network modeling of molecules, 2017

work page 2017
[6]

Born and M

J. Born and M. Manica. Regression transformer enables concurrent sequence regression and generation for molecular language modelling. Nature Machine Intelligence, 5(4):432–444, 2023

work page 2023
[7]

Brown, M

N. Brown, M. Fiscato, M. H. Segler, and A. C. Vaucher. GuacaMol: Benchmarking Models for de Novo Molecular Design. Journal of Chemical Information and Modeling, 59(3):1096–1108,

work page
[8]

doi: 10.1021/acs.jcim.8b00839

ISSN 1549-9596. doi: 10.1021/acs.jcim.8b00839

work page doi:10.1021/acs.jcim.8b00839
[9]

Cabas-Mora, A

G. Cabas-Mora, A. Daza, N. Soto-García, V . Garrido, D. Alvarez, M. Navarrete, L. Sarmiento- Varón, J. H. Sepúlveda Yañez, M. D. Davari, F. Cadet, Á. Olivera-Nappa, R. Uribe-Paredes, and D. Medina-Ortiz. Peptipedia v2.0: A peptide sequence database and user-friendly web platform. a major update. bioRxiv, 2024. doi: 10.1101/2024.07.11.603053. URL https: //...

work page doi:10.1101/2024.07.11.603053 2024
[10]

Cao and Z

S. Cao and Z. Zhang. Deep hybrid models for out-of-distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4733–4743, 2022

work page 2022
[11]

Chen and C

T. Chen and C. Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016

work page 2016
[12]

T. Chen, P. Vure, R. Pulugurta, and P. Chatterjee. AMP-diffusion: Integrating latent diffusion with protein language models for antimicrobial peptide generation. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop, 2023

work page 2023
[13]

T. U. Consortium. Uniprot: the universal protein knowledgebase in 2025. Nucleic Acids Research, 53(D1):D609–D617, 11 2024. ISSN 1362-4962. doi: 10.1093/nar/gkae1010. URL https://doi.org/10.1093/nar/gkae1010

work page doi:10.1093/nar/gkae1010 2025
[14]

P. Das, K. Wadhawan, O. Chang, T. Sercu, C. D. Santos, M. Riemer, V . Chenthamarakshan, I. Padhi, and A. Mojsilovic. Pepcvae: Semi-supervised targeted design of antimicrobial peptide sequences, 2018

work page 2018
[15]

L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y . Wang, J. Gao, M. Zhou, and H.-W. Hon. Unified language model pre-training for natural language understanding and generation. Advances in neural information processing systems, 32, 2019

work page 2019
[16]

Fabian, T

B. Fabian, T. Edlich, H. Gaspar, M. Segler, J. Meyers, M. Fiscato, and M. Ahmed. Molecular representation learning with language models and domain-relevant auxiliary tasks. arXiv preprint arXiv:2011.13230, 2020

work page arXiv 2011
[17]

X. Fang, L. Liu, J. Lei, D. He, S. Zhang, J. Zhou, F. Wang, H. Wu, and H. Wang. Geometry- enhanced molecular representation learning for property prediction. Nature Machine Intelli- gence, 4(2):127–134, 2022. 10

work page 2022
[18]

S. Feng, Y . Ni, Y . Lu, Z.-M. Ma, W.-Y . Ma, and Y . Lan. Unigem: A unified approach to generation and property prediction for molecules. arXiv preprint arXiv:2410.10516, 2024

work page arXiv 2024
[19]

Flam-Shepherd, K

D. Flam-Shepherd, K. Zhu, and A. Aspuru-Guzik. Language models can learn complex molecular distributions. Nature Communications, 13(1):3293, 2022

work page 2022
[20]

B. Gao, M. Ren, Y . Ni, Y . Huang, B. Qiang, Z.-M. Ma, W.-Y . Ma, and Y . Lan. Rethink- ing specificity in sbdd: Leveraging delta score and energy-guided diffusion. arXiv preprint arXiv:2403.12987, 2024

work page arXiv 2024
[21]

L. Gao, J. Schulman, and J. Hilton. Scaling laws for reward model overoptimization. In International Conference on Machine Learning, pages 10835–10866. PMLR, 2023

work page 2023
[22]

Z. Gao, D. Dong, C. Tan, J. Xia, B. Hu, and S. Z. Li. A graph is worth k words: Euclideanizing graph using pure transformer. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning , volume 235 of Proceedings of Machine Learning Research ...

work page 2024
[23]

Z. Geng, S. Xie, Y . Xia, L. Wu, T. Qin, J. Wang, Y . Zhang, F. Wu, and T.-Y . Liu. De novo molecular generation via connection-aware motif mining, 2023

work page 2023
[24]

Gers and E

F. Gers and E. Schmidhuber. Lstm recurrent networks learn simple context-free and context- sensitive languages. IEEE Transactions on Neural Networks, 12(6):1333–1340, 2001. doi: 10.1109/72.963769

work page doi:10.1109/72.963769 2001
[25]

Gómez-Bombarelli, J

R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik. Auto- matic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science, 4(2):268–276, Feb. 2018. ISSN 2374-7943. doi: 10.1021/acscentsci.7b00572

work page doi:10.1021/acscentsci.7b00572 2018
[26]

Grathwohl, K.-C

W. Grathwohl, K.-C. Wang, J.-H. Jacobsen, D. Duvenaud, M. Norouzi, and K. Swersky. Your classifier is secretly an energy based model and you should treat it like one, 2020

work page 2020
[27]

F. Grisoni. Chemical language models for de novo drug design: Challenges and opportunities. Current Opinion in Structural Biology, 79:102527, 2023

work page 2023
[28]

J. Guan, W. W. Qian, X. Peng, Y . Su, J. Peng, and J. Ma. 3d equivariant diffusion for target-aware molecule generation and affinity prediction. arXiv preprint arXiv:2303.03543, 2023

work page arXiv 2023
[29]

Hetzel, J

L. Hetzel, J. Sommer, B. Rieck, F. Theis, and S. Günnemann. Magnet: Motif-agnostic generation of molecules from shapes, 2023

work page 2023
[30]

Hoogeboom, V

E. Hoogeboom, V . G. Satorras, C. Vignac, and M. Welling. Equivariant diffusion for molecule generation in 3d. In International conference on machine learning, pages 8867–8887. PMLR, 2022

work page 2022
[31]

W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V . Pande, and J. Leskovec. Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265, 2019

work page arXiv 1905
[32]

Irwin, S

R. Irwin, S. Dimitriadis, J. He, and E. J. Bjerrum. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology, 3(1):015022, 2022

work page 2022
[33]

Jaakkola and D

T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers.Advances in neural information processing systems, 11, 1998

work page 1998
[34]

W. Jin, R. Barzilay, and T. Jaakkola. Junction tree variational autoencoder for molecular graph generation, 2019

work page 2019
[35]

D. P. Kingma and M. Welling. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[36]

Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation

M. Krenn, F. Häse, A. Nigam, P. Friederich, and A. Aspuru-Guzik. Self-referencing embedded strings (selfies): A 100Machine Learning: Science and Technology, 1(4):045024, Oct. 2020. ISSN 2632-2153. doi: 10.1088/2632-2153/aba947. 11

work page doi:10.1088/2632-2153/aba947 2020
[37]

P.-K. Lai, D. T. Tresnak, and B. J. Hackel. Identification and elucidation of proline-rich antimicrobial peptides with enhanced potency and delivery. Biotechnology and bioengineering, 116(10):2439–2450, 2019

work page 2019
[38]

J. A. Lasserre, C. M. Bishop, and T. P. Minka. Principled hybrids of generative and discrimi- native models. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 1, pages 87–94. IEEE, 2006

work page 2006
[39]

T. J. Lawrence, D. L. Carper, M. K. Spangler, A. A. Carrell, T. A. Rush, S. J. Minter, D. J. Weston, and J. L. Labbé. ampeppy 1.0: a portable and accurate antimicrobial peptide prediction tool. Bioinformatics, 37(14):2058–2060, 11 2020. ISSN 1367-4803. doi: 10.1093/bioinformatics/ btaa917. URL https://doi.org/10.1093/bioinformatics/btaa917

work page doi:10.1093/bioinformatics/ 2058
[40]

C. Li, D. Sutherland, S. A. Hammond, C. Yang, F. Taho, L. Bergman, S. Houston, R. L. Warren, T. Wong, L. M. N. Hoang, C. E. Cameron, C. C. Helbing, and I. Birol. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics, 23(1):77, Jan. 2022

work page 2022
[41]

P. Li, J. Wang, Y . Qiao, H. Chen, Y . Yu, X. Yao, P. Gao, G. Xie, and S. Song. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Briefings in Bioinformatics, 22(6):bbab109, 2021

work page 2021
[42]

T. Li, X. Ren, X. Luo, Z. Wang, Z. Li, X. Luo, J. Shen, Y . Li, D. Yuan, R. Nussinov, X. Zeng, J. Shi, and F. Cheng. A foundation model identifies broad-spectrum antimicrobial peptides against drug-resistant bacterial infection. Nat. Commun., 15(1):7538, Aug. 2024

work page 2024
[43]

Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y . Shmueli, A. dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, and A. Rives. Evolutionary- scale prediction of atomic-level protein structure with a language model. Science, 379(6637): 1123–1130, 2023. doi: 10.1126/science.ade2574

work page doi:10.1126/science.ade2574 2023
[44]

M. Liu, K. Yan, B. Oztekin, and S. Ji. Graphebm: Molecular graph generation with energy-based models. arXiv preprint arXiv:2102.00546, 2021

work page arXiv 2021
[45]

Q. Liu, M. Allamanis, M. Brockschmidt, and A. Gaunt. Constrained graph variational autoen- coders for molecule design. Advances in neural information processing systems, 31, 2018

work page 2018
[46]

S. Liu, M. F. Demirel, and Y . Liang. N-gram graph: Simple unsupervised representation for graphs, with applications to molecules. Advances in neural information processing systems, 32, 2019

work page 2019
[47]

S. Liu, H. Wang, W. Liu, J. Lasenby, H. Guo, and J. Tang. Pre-training molecular graph representation with 3d geometry. arXiv preprint arXiv:2110.07728, 2021

work page arXiv 2021
[48]

Y . Luo, K. Yan, and S. Ji. Graphdf: A discrete flow model for molecular graph generation. In International conference on machine learning, pages 7192–7203. PMLR, 2021

work page 2021
[49]

Maziarz, H

K. Maziarz, H. Jackson-Flux, P. Cameron, F. Sirockin, N. Schneider, N. Stiefl, M. Segler, and M. Brockschmidt. Learning to extend molecular scaffolds with structural motifs, 2022

work page 2022
[50]

Controlled decoding from language models

S. Mudgal, J. Lee, H. Ganapathy, Y . Li, T. Wang, Y . Huang, Z. Chen, H.-T. Cheng, M. Collins, T. Strohman, et al. Controlled decoding from language models.arXiv preprint arXiv:2310.17022, 2023

work page arXiv 2023
[51]

Nalisnick, A

E. Nalisnick, A. Matsukawa, Y . W. Teh, D. Gorur, and B. Lakshminarayanan. Hybrid models with deep and invertible features. In International Conference on Machine Learning, pages 4723–4732. PMLR, 2019

work page 2019
[52]

Özçelik, L

R. Özçelik, L. van Weesep, S. de Ruiter, and F. Grisoni. peptidy: A light-weight python library for peptide representation in machine learning. 2025

work page 2025
[53]

K. B. Petersen, M. S. Pedersen, et al. The matrix cookbook. Technical University of Denmark, 7(15):510, 2008. 12

work page 2008
[54]

Rafailov, A

R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023

work page 2023
[55]

Raffel, N

C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y . Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2023

work page 2023
[56]

Y . Rong, Y . Bian, T. Xu, W. Xie, Y . Wei, W. Huang, and J. Huang. Self-supervised graph transformer on large-scale molecular data. Advances in neural information processing systems, 33:12559–12571, 2020

work page 2020
[57]

C. D. Santos-Júnior, Y . Duan, H. Chong, T. S. Schmidt, A. Fullam, P. Bork, X.-M. Zhao, and L. P. Coelho. Ampsphere : the worldwide survey of prokaryotic antimicrobial peptides, May

work page
[58]

URL https://doi.org/10.5281/zenodo.6511404

work page doi:10.5281/zenodo.6511404
[59]

Schwaller, D

P. Schwaller, D. Probst, A. C. Vaucher, V . H. Nair, D. Kreutter, T. Laino, and J.-L. Reymond. Mapping the space of chemical reactions using attention-based neural networks. ChemRxiv,

work page
[60]

doi: 10.26434/chemrxiv.9897365.v4

work page doi:10.26434/chemrxiv.9897365.v4
[61]

M. H. Segler, T. Kogej, C. Tyrchan, and M. P. Waller. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS central science, 4(1):120–131, 2018

work page 2018
[62]

Y . Shi, S. Zheng, G. Ke, Y . Shen, J. You, J. He, S. Luo, C. Liu, D. He, and T.-Y . Liu. Benchmark- ing graphormer on large-scale molecular modeling datasets. arXiv preprint arXiv:2203.04810, 2022

work page arXiv 2022
[63]

S. Steshin. Lo-hi: Practical ml drug discovery benchmark. In Advances in Neural Information Processing Systems, 2023

work page 2023
[64]

J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, et al. A deep learning approach to antibiotic discovery. Cell, 180(4):688–702, 2020

work page 2020
[65]

Sultan, J

A. Sultan, J. Sieg, M. Mathea, and A. V olkamer. Transformers for molecular property prediction: Lessons learned from the past five years. arxiv preprint arxiv: 240403969. 2024

work page 2024
[66]

Szymczak, M

P. Szymczak, M. Mo ˙zejko, T. Grzegorzek, R. Jurczak, M. Bauer, D. Neubauer, K. Sikora, M. Michalski, J. Sroka, P. Setny, W. Kamysz, and E. Szczurek. Discovering highly potent antimicrobial peptides with deep generative model hydramp. bioRxiv, 2023. doi: 10.1101/2022. 01.27.478054

work page doi:10.1101/2022 2023
[67]

J. M. Tomczak. Deep generative modeling for neural compression. In Deep Generative Modeling. Springer, 2022

work page 2022
[68]

M. D. T. Torres, T. Chen, F. Wan, P. Chatterjee, and C. de la Fuente-Nunez. Generative latent diffusion language modeling yields anti-infective synthetic peptides. bioRxiv, 2025. doi: 10. 1101/2025.01.31.636003. URL https://www.biorxiv.org/content/early/2025/02/ 01/2025.01.31.636003

work page 2025
[69]

Llama 2: Open Foundation and Fine-Tuned Chat Models

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[70]

C. M. Van Oort, J. B. Ferrell, J. M. Remington, S. Wshah, and J. Li. Ampgan v2: Ma- chine learning-guided design of antimicrobial peptides. Journal of Chemical Information and Modeling, 61(5):2198–2207, 2021. doi: 10.1021/acs.jcim.0c01441. PMID: 33787250

[71]

D. van Tilborg, L. Rossen, and F. Grisoni. Molecular deep learning at the edge of chemical space. 2025

[72]

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

[73]

S. Wang, Y. Guo, Y. Wang, H. Sun, and J. Huang. Smiles-bert: large scale unsupervised pre-training for molecular property prediction. In Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics, pages 429–436, 2019

[74]

Y. Wang, J. Wang, Z. Cao, and A. Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022

[75]

D. Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, Feb. 1988. ISSN 0095-2338. doi: 10.1021/ci00057a005

[76]

Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande. Moleculenet: A benchmark for molecular machine learning, 2018

[77]

J. Xia, C. Zhao, B. Hu, Z. Gao, C. Tan, Y. Liu, S. Li, and S. Z. Li. Mole-BERT: Rethinking pre-training graph neural networks for molecules. In The Eleventh International Conference on Learning Representations, 2023

[78]

Y.-T. Xiang, G.-Y. Huang, X.-X. Shi, G.-F. Hao, and G.-F. Yang. 3d molecular generation models expand chemical space exploration in drug design. Drug Discovery Today, page 104282, 2024

[79]

Z. Xiong, D. Wang, X. Liu, F. Zhong, X. Wan, X. Li, Z. Li, X. Luo, K. Chen, H. Jiang, et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. Journal of medicinal chemistry, 63(16):8749–8760, 2019

[80]

K. Yang, K. Swanson, W. Jin, C. Coley, P. Eiden, H. Gao, A. Guzman-Perez, T. Hopper, B. Kelley, M. Mathea, et al. Analyzing learned molecular representations for property prediction. Journal of chemical information and modeling, 59(8):3370–3388, 2019

