Structure-guided molecular design with contrastive 3D protein-ligand learning

Carles Navarro; Gianni de Fabritiis; Philipp Tholke

arxiv: 2604.19562 · v1 · submitted 2026-04-21 · 💻 cs.LG

Pith reviewed 2026-05-10 02:25 UTC · model grok-4.3

classification 💻 cs.LG

keywords: structure-based drug discovery · contrastive learning · SE(3)-equivariant transformer · molecular generation · protein-ligand interactions · chemical language model · virtual screening · autoregressive generation

The pith

A unified framework combines contrastive 3D structure encoding with autoregressive generation in a multimodal chemical language model to produce target-specific molecules with favorable predicted binding properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the dual problem of modeling 3D protein-ligand interactions accurately and searching enormous chemical spaces for synthetically accessible compounds. It trains an SE(3)-equivariant transformer with contrastive learning to map pockets and ligands into a shared embedding space, then feeds those embeddings into a multimodal chemical language model that generates molecules conditioned on either a pocket or a ligand structure. A learned dataset token further directs outputs toward commercial chemical libraries. A sympathetic reader cares because the method aims to generate drug-like candidates for previously unseen targets directly from pocket geometry, sidestepping some limitations of traditional virtual screening pipelines.
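The abstract does not state the exact contrastive objective. A common choice for aligning two modalities in a shared space is a symmetric InfoNCE loss in the style of CLIP, and the sketch below assumes that form. The function names, the temperature value, and the toy embeddings are illustrative assumptions, not the authors' implementation.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(pocket_emb, ligand_emb, temperature=0.07):
    """Assumed CLIP-style loss: matched pocket/ligand pairs share an index."""
    pockets = [normalize(v) for v in pocket_emb]
    ligands = [normalize(v) for v in ligand_emb]
    # logits[i][j] = temperature-scaled cosine similarity of pocket i and ligand j
    logits = [[sum(a * b for a, b in zip(p, q)) / temperature for q in ligands]
              for p in pockets]
    logits_t = [list(col) for col in zip(*logits)]  # ligand-to-pocket direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy batch (made-up vectors): aligned pairs should give a lower loss than shuffled ones.
pockets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ligands_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
ligands_shuffled = [ligands_aligned[1], ligands_aligned[2], ligands_aligned[0]]

loss_aligned = symmetric_info_nce(pockets, ligands_aligned)
loss_shuffled = symmetric_info_nce(pockets, ligands_shuffled)
print(loss_aligned < loss_shuffled)
```

Minimizing a loss of this shape pulls each pocket toward its own ligand and pushes it away from the other ligands in the batch, which is what makes the shared space usable for retrieval.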

Core claim

We present a unified framework that addresses these challenges by combining contrastive 3D structure encoding with autoregressive molecular generation conditioned on commercial compound spaces. First, we introduce an SE(3)-equivariant transformer that encodes ligand and pocket structures into a shared embedding space via contrastive learning, achieving competitive results in zero-shot virtual screening. Second, we integrate these embeddings into a multimodal Chemical Language Model (MCLM). The model generates target-specific molecules conditioned on either pocket or ligand structures, with a learned dataset token that steers the output toward targeted chemical spaces, yielding candidates in

What carries the argument

SE(3)-equivariant transformer producing contrastive 3D embeddings that condition a multimodal chemical language model (MCLM) for autoregressive generation steered by dataset tokens.
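One plausible reading of "embeddings that condition a language model" is prefix conditioning, where a projected structure embedding and a dataset-token embedding are placed ahead of the molecule tokens. The sketch below is an assumption made for illustration: the helper names, the linear projection, the ordering of the prefix, and the toy dimensions are hypothetical and are not taken from the paper.

```python
def linear(weights, vector):
    """Apply a dense projection: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def build_conditioned_input(structure_emb, projection, dataset_token_emb, token_embs):
    """Assemble [dataset token] + [projected 3D context] + [molecule tokens so far].

    The ordering is a hypothetical choice; an autoregressive model would then
    predict the next molecule token from this sequence.
    """
    context = linear(projection, structure_emb)
    return [dataset_token_emb, context] + list(token_embs)

# Toy values: a 3-dimensional structure embedding mapped into a 2-dimensional model space.
structure_emb = [0.5, -1.0, 2.0]                   # stand-in for an encoder output
projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]    # hypothetical learned 3 -> 2 map
dataset_token_emb = [0.1, 0.2]                     # hypothetical learned dataset token
tokens_so_far = [[0.0, 1.0], [1.0, 0.0]]           # embeddings of tokens generated so far

sequence = build_conditioned_input(structure_emb, projection, dataset_token_emb, tokens_so_far)
print(len(sequence), sequence[1])
```

Swapping `dataset_token_emb` for a different learned vector while keeping the structure embedding fixed is how, under this assumption, the same pocket could be steered toward a different chemical library.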

If this is right

The same embeddings support competitive zero-shot virtual screening without task-specific fine-tuning.
Generation can be conditioned on either the protein pocket or a known ligand structure.
A single learned dataset token routes outputs into desired commercial compound libraries.
The resulting molecules exhibit favorable predicted binding properties across multiple diverse targets.
No target-specific retraining of the generator is required once the contrastive encoder is trained.
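The first consequence, zero-shot screening from a shared embedding space, usually amounts to ranking a library by similarity to the pocket embedding. The sketch below assumes cosine similarity as the score; the function names, ligand identifiers, and vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_library(pocket_embedding, library):
    """Return (ligand_id, score) pairs sorted by descending similarity to the pocket."""
    scored = [(lig_id, cosine(pocket_embedding, emb)) for lig_id, emb in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings; in practice these would come from the trained encoder.
pocket = [0.2, 0.9, 0.1]
library = {
    "lig_a": [0.1, 0.8, 0.2],
    "lig_b": [0.9, 0.1, 0.0],
    "lig_c": [0.3, 0.7, 0.0],
}

ranking = rank_library(pocket, library)
print([lig_id for lig_id, _ in ranking])
```

Because ligand embeddings can be computed once and cached, a new target only needs one encoder pass for its pocket before the whole library can be ranked.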

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be tested for transfer to other 3D biomolecular tasks such as protein-protein interface design.
Larger-scale contrastive pretraining on expanded structure databases might further tighten the embedding space and improve generation specificity.
Combining the generated candidates with retrosynthesis predictors would create an end-to-end design-to-synthesis pipeline.
The contrastive embeddings might serve as a drop-in featurizer for existing docking or affinity prediction tools.

Load-bearing premise

The embeddings produced by the contrastive SE(3)-equivariant transformer can be effectively integrated into the multimodal chemical language model to steer generation toward molecules with favorable predicted binding properties for unseen targets.

What would settle it

Generate molecules for a held-out target protein, dock or assay the top candidates, and check whether their binding scores are statistically indistinguishable from or worse than those of molecules sampled randomly from the same commercial chemical space.
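The comparison described above can be run as a permutation test on docking scores. The sketch below assumes that lower (more negative) scores are better and uses made-up numbers; the function name and the choice of a mean-difference statistic are illustrative assumptions, not a protocol from the paper.

```python
import random

def permutation_p_value(generated, baseline, n_permutations=10000, seed=0):
    """One-sided p-value for 'generated scores are lower on average than baseline'."""
    rng = random.Random(seed)
    n_gen = len(generated)
    observed = sum(generated) / n_gen - sum(baseline) / len(baseline)
    pooled = list(generated) + list(baseline)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel scores at random under the null hypothesis
        diff = sum(pooled[:n_gen]) / n_gen - sum(pooled[n_gen:]) / (len(pooled) - n_gen)
        if diff <= observed + 1e-12:  # small tolerance for floating-point rounding
            hits += 1
    # The add-one correction avoids reporting an impossible p-value of exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical docking scores, for illustration only.
generated_scores = [-9.1, -8.7, -9.4, -8.9, -9.0, -8.8]
random_library_scores = [-7.2, -7.9, -6.8, -7.5, -7.1, -7.4]

p_separated = permutation_p_value(generated_scores, random_library_scores)
p_identical = permutation_p_value(random_library_scores, random_library_scores)
print(p_separated < 0.05, p_identical > 0.05)
```

A large p-value on a real held-out target would mean the generated molecules are not distinguishable from random draws from the same commercial space, which is the outcome that would count against the central claim.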

Figures

Figures reproduced from arXiv: 2604.19562 by Carles Navarro, Gianni de Fabritiis, Philipp Tholke.

**Figure 2.** MCLM architecture. A ligand or pocket 3D context is encoded by the frozen ...

**Figure 3.** Mean search metrics across 15 LIT-PCBA targets for CLIPP-SET ligand search, ...

**Figure 4.** (a) Joint distribution of 3D shape similarity and 2D chemical similarity for ...

**Figure 5.** Per-target comparison of generative and search methods across 15 LIT-PCBA ...

**Figure 6.** Chemical space steering of generated molecules. (a) Per-target Tanimoto similarity ...

Original abstract

Structure-based drug discovery faces the dual challenge of accurately capturing 3D protein-ligand interactions while navigating ultra-large chemical spaces to identify synthetically accessible candidates. In this work, we present a unified framework that addresses these challenges by combining contrastive 3D structure encoding with autoregressive molecular generation conditioned on commercial compound spaces. First, we introduce an SE(3)-equivariant transformer that encodes ligand and pocket structures into a shared embedding space via contrastive learning, achieving competitive results in zero-shot virtual screening. Second, we integrate these embeddings into a multimodal Chemical Language Model (MCLM). The model generates target-specific molecules conditioned on either pocket or ligand structures, with a learned dataset token that steers the output toward targeted chemical spaces, yielding candidates with favorable predicted binding properties across diverse targets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper links a contrastive SE(3)-equivariant encoder for protein-ligand pairs to a conditioned chemical language model for structure-guided molecule generation, but the abstract supplies no experimental numbers or baselines to assess whether the results hold.

The main new piece is the two-stage pipeline: first train an SE(3)-equivariant transformer contrastively so that pocket and ligand embeddings live in a shared space, then feed those embeddings into a multimodal chemical language model that generates molecules autoregressively while a learned dataset token biases the output toward commercial chemical spaces. The claim is that this produces target-specific candidates with good predicted binding across several targets, and that the encoder alone is competitive in zero-shot virtual screening.

That unification is straightforward and addresses a real workflow need in structure-based design, where you want both retrieval and generation in one model. The dataset token is a practical detail that lets the generator stay inside synthetically accessible regions without extra post-filtering. The approach builds directly on existing equivariant transformers and chemical LMs, so the novelty sits in how the two stages are coupled and conditioned rather than in any single component.

The abstract gives no concrete numbers, no list of baselines, no metrics such as enrichment or validity rates, and no description of the test targets or how binding was predicted. That makes it impossible to judge whether the contrastive embeddings actually improve generation or whether the model simply reproduces patterns from the training distribution. The assumption that the learned embeddings transfer cleanly to unseen pockets for generation is plausible but remains untested in the summary provided.

If the full paper contains solid ablations and external validation, the work is worth a serious look for groups doing multimodal protein-ligand modeling. Otherwise it risks being another architecture sketch without reproducible evidence. I would bring the full version to a reading group to walk through the embedding alignment and generation conditioning steps. It deserves peer review so the authors can supply the missing experimental details and let referees check the numbers.

Referee Report

2 major / 2 minor

Summary. The paper introduces a unified framework for structure-based drug discovery with two components. An SE(3)-equivariant transformer, trained with contrastive learning, encodes 3D ligand and pocket structures into a shared embedding space. An autoregressive multimodal Chemical Language Model (MCLM) then generates molecules conditioned on these embeddings and is steered toward commercial chemical spaces via dataset tokens. The paper claims competitive performance in zero-shot virtual screening and generation of candidates with favorable predicted binding properties across diverse targets.

Significance. If the experimental claims hold, the work could advance structure-guided molecular design by bridging equivariant 3D encoders with conditioned generative models, offering a pathway to explore ultra-large commercial libraries more efficiently than traditional virtual screening alone.

major comments (2)

[Abstract] The claim of 'achieving competitive results in zero-shot virtual screening' provides no benchmarks, datasets, baselines, or metrics (e.g., AUC, enrichment factor), which is load-bearing for the first contribution and prevents verification of whether the contrastive SE(3) encoder actually advances the state of the art.
[Abstract] The statement that the MCLM 'yields candidates with favorable predicted binding properties across diverse targets' lacks any description of the binding prediction protocol, the specific targets or commercial spaces tested, or quantitative comparisons, undermining support for the central claim that the embedding integration successfully steers generation.
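For readers unfamiliar with the metrics named in the first comment, the sketch below computes an enrichment factor and a ROC AUC from a ranked list. The definitions are the standard ones; the function names, the toy scores, and the labels are illustrative and are not results from the paper.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / len(labels))

def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))

# Toy screen: 10 compounds, 3 actives (label 1), higher score means ranked earlier.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(scores, labels, fraction=0.2)
auc = roc_auc(scores, labels)
print(round(ef_20, 3), round(auc, 3))
```

An enrichment factor of 1 and an AUC of 0.5 both correspond to random ranking, so these are the baselines any reported number should be read against.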

minor comments (2)

The acronym MCLM is introduced with its expansion, but subsequent sections should consistently use the full name on first use within each major section for clarity.
Ensure all references to 'commercial compound spaces' and 'dataset token' are accompanied by explicit definitions or citations to the relevant training data construction in the methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments regarding the abstract. We address each point below and agree to revise the abstract to provide additional context on our experimental claims.

Referee: [Abstract] The claim of 'achieving competitive results in zero-shot virtual screening' provides no benchmarks, datasets, baselines, or metrics (e.g., AUC, enrichment factor), which is load-bearing for the first contribution and prevents verification of whether the contrastive SE(3) encoder actually advances the state of the art.

Authors: The abstract is a concise summary, while the full details of the zero-shot virtual screening benchmarks, datasets, baselines, and metrics (including AUC and enrichment factors) are provided in the experimental results section of the manuscript. This enables verification of the SE(3)-equivariant encoder's performance. To address the concern, we will revise the abstract to briefly reference the key metrics and competitive results demonstrated.

Revision: yes

Referee: [Abstract] The statement that the MCLM 'yields candidates with favorable predicted binding properties across diverse targets' lacks any description of the binding prediction protocol, the specific targets or commercial spaces tested, or quantitative comparisons, undermining support for the central claim that the embedding integration successfully steers generation.

Authors: The binding prediction protocol, specific targets, commercial spaces, and quantitative comparisons are detailed in the generation and results sections. We will revise the abstract to include a concise description of the protocol and note the favorable predicted binding properties observed.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents a standard two-stage ML pipeline: contrastive training of an SE(3)-equivariant transformer to produce shared embeddings, followed by integration into a multimodal chemical language model for conditioned autoregressive generation. No load-bearing step reduces to its own inputs by definition, no fitted parameter is relabeled as a prediction, and no self-citation chain is invoked to justify uniqueness or core assumptions. The derivation relies on external training data and standard contrastive/generative objectives, remaining self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on the abstract, the paper introduces no free parameters beyond typical model hyperparameters, no axioms beyond standard ones, and no invented entities. It relies on established techniques in equivariant neural networks and contrastive learning.

axioms (1)

Domain assumption: SE(3) equivariance is appropriate for 3D molecular structures.
Standard assumption in geometric deep learning for molecules and proteins.
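The reason this assumption is considered natural is that a rigid rotation or translation of a molecule does not change its chemistry, so the geometric quantities a model relies on should not change either. The sketch below checks this for pairwise interatomic distances under one rigid motion. It uses a rotation about the z-axis only, which is a subset of SE(3), and the coordinates and helper names are made up for illustration.

```python
import math

def rigid_transform(points, angle, translation):
    """Rotate about the z-axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]

def pairwise_distances(points):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

# Toy atom coordinates, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (-0.4, 0.9, 1.1)]
moved = rigid_transform(atoms, angle=0.7, translation=(3.0, -2.0, 5.0))

max_change = max(
    abs(a - b) for a, b in zip(pairwise_distances(atoms), pairwise_distances(moved))
)
print(max_change < 1e-9)
```

An equivariant encoder is built so that its internal features transform predictably under such motions, and an invariant readout such as an embedding used for similarity search then stays the same regardless of how the input structure is posed.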

pith-pipeline@v0.9.0 · 5434 in / 1224 out tokens · 56034 ms · 2026-05-10T02:25:38.537336+00:00 · methodology

