
arxiv: 2504.16559 · v3 · submitted 2025-04-23 · 💻 cs.LG · q-bio.QM

Synergistic Benefits of Joint Molecule Generation and Property Prediction

Pith reviewed 2026-05-22 17:41 UTC · model grok-4.3

keywords: molecule generation · property prediction · joint learning · transformer · conditional sampling · drug design · antimicrobial peptides

The pith

Hyformer jointly generates molecules and predicts their properties with synergistic performance gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a single transformer model, Hyformer, that learns the joint distribution of molecular structures and their properties. This setup lets one model perform both molecule generation and property prediction instead of training separate systems. The authors show that joint training produces extra benefits in generating molecules conditioned on target properties, predicting properties for molecules unlike those seen in training, and learning more useful internal representations. They test the approach on a drug design task aimed at finding new antimicrobial peptides. A sympathetic reader would care because joint models could simplify computational pipelines while improving results on both tasks at once.

Core claim

Hyformer is a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. It is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction and representation learning.
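
The joint-modeling idea behind this claim can be written out with the standard factorization of a joint distribution. The symbols below are notation chosen here for illustration (x for a molecule, y for its property, θ for the shared parameters); the paper's exact objective may differ.

```latex
\begin{align}
p_\theta(x, y) &= p_\theta(x)\, p_\theta(y \mid x), \\
\log p_\theta(x, y) &= \log p_\theta(x) + \log p_\theta(y \mid x), \\
p_\theta(x \mid y) &= \frac{p_\theta(x)\, p_\theta(y \mid x)}{p_\theta(y)} \;\propto\; p_\theta(x)\, p_\theta(y \mid x).
\end{align}
```

The first term corresponds to the generative task and the second to the predictive task. The last line shows why a model of the joint distribution supports conditional sampling: molecules with a target property y are those that are both likely under the generator and scored as having y by the predictor.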

What carries the argument

Alternating attention mechanism combined with joint pre-training scheme, which allows the model to alternate between generative and predictive modes while sharing learned representations.
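
A minimal sketch of how a task token could switch one set of weights between the two modes, following the Figure 1 caption (causal mask for generation, bidirectional mask for prediction). The token names `[GENERATE]` and `[PREDICT]` and the strict even/odd schedule are assumptions made here for illustration, not details taken from the paper.

```python
from typing import List

# Hypothetical task-token names; the source only says a [TASK] token selects the mask.
GENERATE = "[GENERATE]"
PREDICT = "[PREDICT]"


def attention_mask(task_token: str, seq_len: int) -> List[List[int]]:
    """Return a seq_len x seq_len mask; mask[i][j] == 1 means position i may attend to j."""
    if task_token == GENERATE:
        # Causal mask: each position sees itself and earlier positions only.
        return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]
    if task_token == PREDICT:
        # Bidirectional mask: every position sees the whole sequence.
        return [[1] * seq_len for _ in range(seq_len)]
    raise ValueError(f"unknown task token: {task_token!r}")


def alternating_schedule(num_steps: int) -> List[str]:
    """Alternate tasks step by step (an assumed schedule, for illustration only)."""
    return [GENERATE if step % 2 == 0 else PREDICT for step in range(num_steps)]
```

Under this sketch the same parameters would be updated at every step; only the mask and the output head change with the task token.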

If this is right

  • Conditional generation of molecules with specified properties becomes more accurate because the model learns the joint distribution.
  • Property prediction generalizes better to molecules outside the training distribution due to shared generative and predictive training.
  • Representation learning improves for both tasks because each task regularizes the shared features.
  • Drug design workflows can use one model to propose and score candidate antimicrobial peptides.
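
The "propose and score" workflow in the last bullet can be sketched as a generic generate-then-filter loop: draw candidates from the generative side, score them with the predictive side, and keep those closest to a target value. This is not claimed to be Hyformer's sampling procedure. `toy_generate` and `toy_predict` are hypothetical stand-ins for the two heads of a joint model, and the "property" is a made-up fraction of K/R residues.

```python
import random
from typing import Callable, List, Tuple


def sample_conditioned(
    generate: Callable[[random.Random], str],
    predict: Callable[[str], float],
    target: float,
    num_candidates: int = 64,
    keep: int = 5,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Generate candidates, score them, and return the `keep` closest to `target`."""
    rng = random.Random(seed)
    # Deduplicate so the same sequence is not returned twice.
    candidates = {generate(rng) for _ in range(num_candidates)}
    scored = [(seq, predict(seq)) for seq in candidates]
    # Sort by distance to the target; ties are broken by the sequence itself.
    scored.sort(key=lambda pair: (abs(pair[1] - target), pair[0]))
    return scored[:keep]


ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_generate(rng: random.Random) -> str:
    """Hypothetical generator: a random peptide of length 5 to 12."""
    length = rng.randint(5, 12)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def toy_predict(seq: str) -> float:
    """Hypothetical predictor: fraction of lysine (K) and arginine (R) residues."""
    return sum(ch in "KR" for ch in seq) / len(seq)


top = sample_conditioned(toy_generate, toy_predict, target=0.3)
```

With a real joint model, both callables would be backed by the same network, which is the simplification the bullet points to.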

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The joint approach may extend to other paired data domains such as protein sequences and their functions where generation and prediction are both needed.
  • Reducing the number of separate models could lower computational overhead in screening large chemical libraries.
  • Future experiments could test whether adding more property labels during pre-training further amplifies the observed synergies.

Load-bearing premise

The architectural and optimization challenges of training a single joint model for both generation and prediction can be overcome by an alternating attention mechanism and joint pre-training scheme without introducing new instabilities or loss of performance on either task.

What would settle it

Training separate specialized models for generation and for prediction, then comparing their performance on generation quality, property prediction accuracy, conditional sampling success, and out-of-distribution prediction against the joint Hyformer.
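
Such a comparison needs the same metrics computed for the joint model and for each specialized model. The sketch below uses common definitions of uniqueness, novelty, mean absolute error and R²; the function names are chosen here, and molecules are compared as plain strings on the assumption that canonicalization and validity checks happen upstream.

```python
from typing import Sequence


def uniqueness(generated: Sequence[str]) -> float:
    """Fraction of generated samples that are distinct."""
    return len(set(generated)) / len(generated)


def novelty(generated: Sequence[str], training: Sequence[str]) -> float:
    """Fraction of distinct generated samples that do not appear in the training set."""
    unique = set(generated)
    return len(unique - set(training)) / len(unique)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted property values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Running these on identical held-out and out-of-distribution splits for both the joint and the single-task models would make any gain or loss from joint training directly visible.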

Figures

Figures reproduced from arXiv: 2504.16559 by Adam Izdebski, Ewa Szczurek, Jakub M. Tomczak, Jan Olszewski, Krzysztof Koras, Pankhil Gawade, Serra Korkmaz, Valentin Rauscher.

Figure 1: A schematic representation of Hyformer. Depending on the task token [TASK], Hyformer uses either a causal or a bidirectional mask, outputting token probabilities or predicted property values. We propose Hyformer, a joint transformer-based model that unifies a generative decoder with a predictive encoder in a single set of shared parameters, using an alternating training scheme.
Figure 2: (a) Amino-acid distributions between the pre-trained and unconditionally generated sequences. (b) Distributions of charge, aromaticity, and isoelectric point (pI) for: non-AMP, AMP and conditionally generated sequences. (c) Frequency of crossing an attention threshold (x-axis) vs. mean attention weight (y-axis) for distinct amino-acids, colored by charge and sized by hydrophobicity.
Figure 3: Structures of the twelve generated molecules with Hyformer when the sampling temperature …
Figure 4: Structures of the twelve generated molecules with Hyformer when the sampling temperature …
Figure 5: Structures of the twelve generated molecules with Hyformer when the sampling temperature …
Figure 6: Structures of molecules generated by Hyformer conditioned on QED values, visualized …
Figure 7: Structures of molecules generated by Hyformer conditioned on SA score, visualized using …
Figure 8: Structures of molecules generated by Hyformer conditioned on LogP values, visualized …
Figure 9: Hyformer’s molecular embeddings. The considered chemical properties are normalized to …
Original abstract

Modeling the joint distribution of data samples and their properties allows to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. We show that Hyformer is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction and representation learning. Finally, we demonstrate the benefits of joint learning in a drug design use case of discovering novel antimicrobial peptides.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Hyformer, a transformer-based joint model for molecule generation and property prediction. It employs an alternating attention mechanism and joint pre-training to address architectural and optimization challenges, claiming that the model is simultaneously optimized for both tasks and exhibits synergistic benefits in conditional sampling, out-of-distribution property prediction, representation learning, and a drug-design use case involving discovery of novel antimicrobial peptides.

Significance. If the empirical results and ablations hold, the work could meaningfully advance joint generative-predictive modeling in molecular machine learning by demonstrating that a single model can match or exceed specialized baselines while unlocking synergies not available to separate models. The alternating-attention design and joint pre-training scheme constitute a concrete architectural contribution that may generalize beyond the reported tasks.

major comments (3)
  1. [Section 3] The alternating attention mechanism is presented as resolving task interference, yet the manuscript provides neither per-task loss curves nor gradient-norm statistics across training epochs. Without these diagnostics it is impossible to verify that the joint optimization truly achieves simultaneous strong performance on generation and prediction rather than trading off one objective against the other.
  2. [Results, Tables 1–3 and associated figures] Single-task baselines for molecule generation (validity, uniqueness, novelty) and property prediction (MAE or R² on held-out sets) are not reported. Consequently the central claim of “synergistic benefits” and absence of negative transfer cannot be quantitatively evaluated; the reported joint-model numbers alone do not establish that Hyformer matches or exceeds specialized models on each task individually.
  3. [OOD property-prediction experiments] The improvement attributed to joint training is shown only for the full Hyformer; an ablation that freezes the generative component or trains a prediction-only variant on the same data is missing. This ablation is load-bearing for the claim that joint pre-training confers out-of-distribution robustness.
minor comments (2)
  1. [Abstract] Abstract: the phrase “simultaneously optimized for molecule generation and property prediction” is repeated without any numeric preview; adding one or two headline metrics would improve clarity.
  2. [Section 3] Notation: the alternating attention block is introduced with several new symbols; a consolidated table or diagram legend would reduce reader effort when cross-referencing equations.
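
The diagnostics requested in major comment 1 (per-task loss curves and gradient-norm statistics) could be collected along the following lines. This is a framework-free sketch: gradients are assumed to be available as flat lists of floats, the names `TaskDiagnostics`, `grad_norm` and `grad_cosine` are chosen here, and the cosine similarity between task gradients is an extra interference check added for illustration rather than something the report asks for.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def grad_norm(grad: Sequence[float]) -> float:
    """Euclidean (L2) norm of a flattened gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def grad_cosine(grad_a: Sequence[float], grad_b: Sequence[float]) -> float:
    """Cosine similarity of two task gradients; negative values suggest conflict."""
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    denom = grad_norm(grad_a) * grad_norm(grad_b)
    return dot / denom if denom > 0 else 0.0


@dataclass
class TaskDiagnostics:
    """Per-task history of losses and gradient norms, one entry per recorded step."""
    losses: Dict[str, List[float]] = field(default_factory=dict)
    grad_norms: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, task: str, loss: float, grad: Sequence[float]) -> None:
        self.losses.setdefault(task, []).append(loss)
        self.grad_norms.setdefault(task, []).append(grad_norm(grad))
```

Plotting `losses["generation"]` against `losses["prediction"]` over training would show whether one objective improves at the expense of the other.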

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our work. We address each of the major comments below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Section 3] The alternating attention mechanism is presented as resolving task interference, yet the manuscript provides neither per-task loss curves nor gradient-norm statistics across training epochs. Without these diagnostics it is impossible to verify that the joint optimization truly achieves simultaneous strong performance on generation and prediction rather than trading off one objective against the other.

    Authors: We agree that per-task loss curves and gradient-norm statistics would provide valuable evidence that the alternating attention mechanism enables simultaneous optimization without task interference. We will add these diagnostics to Section 3 in the revised manuscript.
    Revision: yes

  2. Referee: [Results, Tables 1–3 and associated figures] Single-task baselines for molecule generation (validity, uniqueness, novelty) and property prediction (MAE or R² on held-out sets) are not reported. Consequently the central claim of “synergistic benefits” and absence of negative transfer cannot be quantitatively evaluated; the reported joint-model numbers alone do not establish that Hyformer matches or exceeds specialized models on each task individually.

    Authors: We recognize that reporting single-task baselines is essential to quantitatively demonstrate synergistic benefits and the absence of negative transfer. In the revised manuscript, we will include single-task baseline results in Tables 1–3 for both generation and property prediction tasks, allowing direct comparison with the joint Hyformer model.
    Revision: yes

  3. Referee: [OOD property-prediction experiments] The improvement attributed to joint training is shown only for the full Hyformer; an ablation that freezes the generative component or trains a prediction-only variant on the same data is missing. This ablation is load-bearing for the claim that joint pre-training confers out-of-distribution robustness.

    Authors: We agree that an ablation comparing the full model to a prediction-only variant is necessary to substantiate the OOD robustness benefits from joint pre-training. We will incorporate this ablation into the OOD experiments section of the revised manuscript.
    Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on experimental results rather than self-referential derivations

Full rationale

The paper introduces Hyformer, a transformer architecture for joint molecule generation and property prediction via alternating attention and joint pre-training. All central claims—simultaneous optimization, synergies in conditional sampling, OOD prediction, and representation learning—are presented as outcomes of empirical evaluation on benchmarks and a drug-design case study. No equations, fitted parameters, or predictions are described that reduce by construction to the model's own inputs or to self-citations. The work contains no load-bearing uniqueness theorems, ansatzes smuggled via prior self-work, or renamings of known patterns; results are validated externally against specialized models and datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that joint distribution modeling inherently produces synergistic benefits that separate models cannot achieve, and that the proposed architecture successfully addresses the stated training challenges.

axioms (1)
  • Domain assumption: modeling the joint distribution of data samples and their properties allows construction of a single model with synergistic benefits beyond purely generative or predictive models.
    Opening sentence of the abstract; this premise motivates the entire approach.
invented entities (1)
  • Hyformer (no independent evidence)
    purpose: Transformer-based joint model blending generative and predictive functionalities via alternating attention and joint pre-training
    New model introduced to solve the joint training problem; no independent evidence outside the paper is provided.



