Pretraining a Foundation Model for Small-Molecule Natural Products
Pith reviewed 2026-05-22 23:30 UTC · model grok-4.3
The pith
Pretraining with scaffold-focused contrastive and masked graph learning produces representations that reach state-of-the-art on natural product mining and drug discovery tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors pretrain a foundation model for small-molecule natural products. Their pretraining strategy combines contrastive learning and masked graph learning objectives that emphasize evolutionary information from molecular scaffolds while capturing side-chain information. The resulting model achieves state-of-the-art performance on taxonomy classification, fine-grained evolutionary analysis at gene and microbial levels, and virtual screening for drug candidates, outperforming both synthesized-molecule baselines and standard supervised approaches.
What carries the argument
The novel pretraining strategy that combines contrastive learning and masked graph learning to emphasize evolutionary scaffold information.
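The combination of the two objectives can be pictured with a small sketch. The summary above does not give the paper's loss formulation, so everything here is an assumption for illustration: the InfoNCE form of the contrastive term, the cross-entropy form of the masked term, the `weight` parameter, and the idea that a positive is a molecule sharing the anchor's scaffold. The function names are hypothetical and this is not NaFM's implementation.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: reward similarity to the positive, penalize negatives."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    # The positive pair sits at index 0, followed by all negative pairs.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]

def masked_atom_loss(predicted_probs, true_labels):
    """Mean cross-entropy over masked atoms; each row is a probability vector."""
    losses = [-math.log(p[y]) for p, y in zip(predicted_probs, true_labels)]
    return sum(losses) / len(losses)

def pretraining_loss(contrastive, masked, weight=1.0):
    """Hypothetical weighted sum of the two objectives (weight is an assumption)."""
    return contrastive + weight * masked

# Toy embeddings: the positive is assumed to share the anchor's scaffold.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

l_con = info_nce(anchor, positive, negatives)
l_mask = masked_atom_loss([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
total = pretraining_loss(l_con, l_mask, weight=0.5)
print(l_con, l_mask, total)
```

In this toy setup the contrastive term shrinks as the anchor's embedding moves toward the scaffold-sharing positive, while the masked term depends only on how well the masked atoms are reconstructed. That separation is one plausible reading of "scaffold emphasis plus side-chain capture".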
If this is right
- Taxonomy classification comparisons show that current models, which were built for synthesized molecules, are inadequate for understanding natural synthesis.
- The model captures evolutionary information at both gene and microbial levels.
- Virtual screening experiments show the representations help identify potential drug candidates more effectively.
- The approach moves beyond one-model-for-one-task paradigms to a more generalizable foundation model.
Where Pith is reading between the lines
- The same scaffold-focused pretraining might improve prediction of natural-product interactions with human targets not examined in the reported experiments.
- Combining the learned representations with genomic sequence data could enable more accurate microbial source attribution for newly discovered metabolites.
- Evaluating the model on larger or chemically more diverse natural-product collections would test whether the reported gains persist outside the current evaluation sets.
Load-bearing premise
That a pretraining strategy focused on evolutionary scaffold information will produce more generalizable representations than standard supervised or general-molecule models for natural product tasks.
What would settle it
A competing model trained without the scaffold-emphasizing pretraining objectives that matches or exceeds the reported performance on the same taxonomy classification, evolutionary analysis, and virtual screening benchmarks would falsify the central claim.
Original abstract
Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Existing deep learning methods for natural product research primarily rely on supervised learning approaches designed for specific downstream tasks. However, this one-model-per-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well-suited for the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy that is specifically tailored to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results in various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification with synthesized molecule-focused baselines to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, in a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Finally, we apply our method to virtual screening, showing that informative natural product representations can lead to more effective identification of potential drug candidates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces NaFM, a foundation model for small-molecule natural products pretrained via a novel combination of contrastive learning (emphasizing evolutionary information from molecular scaffolds) and masked graph learning (capturing side-chain information). It claims this addresses limitations of supervised task-specific models and general molecular characterization methods, demonstrating SOTA results on taxonomy classification (vs. synthesized-molecule baselines), fine-grained gene- and microbial-level evolutionary analysis, and virtual screening for drug discovery.
Significance. If the performance gains are shown to arise specifically from the scaffold-evolutionary contrastive objective rather than domain-specific training data alone, the work could provide more generalizable representations for natural-product tasks and improve downstream applications in drug discovery.
major comments (1)
- [Abstract (downstream evaluation description) and Results (taxonomy classification comparison)] The central SOTA claim on taxonomy classification, gene-level analysis, and virtual screening requires evidence that the evolutionary contrastive term contributes beyond standard pretraining on the same natural-product structures; the manuscript provides no ablations or controls that isolate this objective from simply training a GNN on natural-product data with conventional objectives.
minor comments (1)
- [Abstract] The abstract asserts SOTA performance but supplies no quantitative metrics, baseline descriptions, dataset sizes, or statistical details, which should be added to the main text or a dedicated results table for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. The primary concern regarding the need to isolate the contribution of the scaffold-focused contrastive objective is well-taken, and we address it directly below.
Referee: [Abstract (downstream evaluation description) and Results (taxonomy classification comparison)] The central SOTA claim on taxonomy classification, gene-level analysis, and virtual screening requires evidence that the evolutionary contrastive term contributes beyond standard pretraining on the same natural-product structures; the manuscript provides no ablations or controls that isolate this objective from simply training a GNN on natural-product data with conventional objectives.
Authors: We agree that explicit ablations are required to demonstrate that performance gains arise specifically from the evolutionary contrastive term rather than from domain-specific natural-product data alone. Our existing comparisons to synthesized-molecule baselines establish that general molecular models are inadequate for natural-product tasks, but they do not isolate the effect of the contrastive objective within the natural-product domain. In the revised manuscript we will add the requested controls: (i) a GNN pretrained on the identical natural-product corpus using only the masked-graph objective, and (ii) direct head-to-head comparisons of this baseline against the full NaFM objective on taxonomy classification, gene/microbial analysis, and virtual screening. These results will be reported in a new subsection of the Results and discussed in the context of the referee’s point.
Revision: yes
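The controls promised in the rebuttal amount to a small experiment grid: two objective settings, each evaluated on the same three downstream tasks after pretraining on the same corpus. A minimal sketch of that grid follows. The labels `masked_only` and `full_nafm`, the task identifiers, and the function name are hypothetical; none of them is taken from the manuscript.

```python
from itertools import product

# Objective settings named in the rebuttal; the keys are hypothetical labels.
OBJECTIVES = {
    "masked_only": {"contrastive": False, "masked_graph": True},
    "full_nafm": {"contrastive": True, "masked_graph": True},
}

# Downstream tasks listed in the rebuttal; the identifiers are hypothetical.
TASKS = ["taxonomy_classification", "gene_microbial_analysis", "virtual_screening"]

def ablation_runs():
    """Enumerate every (objective setting, task) pair for the same pretraining corpus."""
    return [
        {"setting": name, "task": task, **flags}
        for (name, flags), task in product(OBJECTIVES.items(), TASKS)
    ]

runs = ablation_runs()
for run in runs:
    print(run["setting"], run["task"])
```

Because the two settings differ only in the `contrastive` flag, any performance gap on a given task can be attributed to the contrastive term rather than to the natural-product corpus itself.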
Circularity Check
No circularity: purely empirical pretraining and downstream evaluation
The paper presents an empirical framework that pretrains a graph neural network on natural-product structures using contrastive and masked objectives, then evaluates on taxonomy classification, gene-level analysis, and virtual screening tasks. No equations, derivations, or first-principles claims appear; performance is measured by standard supervised fine-tuning metrics on held-out data. The central premise that the tailored objectives capture evolutionary scaffold information is tested via comparisons to baselines, not asserted by construction or reduced to fitted parameters. Self-citations, if present, are not load-bearing for any mathematical step. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.