pith.

arxiv: 2606.23745 · v1 · pith:ALAQKH4Enew · submitted 2026-06-21 · 🧬 q-bio.BM · cs.AI · cs.LG

JEDEL: Zero-Shot DNA-Encoded Library Design for Early-Stage Drug Discovery

Pith reviewed 2026-06-26 09:15 UTC · model grok-4.3

classification 🧬 q-bio.BM · cs.AI · cs.LG
keywords DNA-encoded libraries · pharmacophore · zero-shot design · drug discovery · synthesis planning · combinatorial chemistry · binding prediction

The pith

JEDEL maps 3D pharmacophore patterns of ligands to scalable, synthesis-ready DNA-encoded library instructions in a zero-shot fashion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents JEDEL as a way to turn the three-dimensional interaction patterns (pharmacophores) of known active molecules into concrete recipes for making millions of DNA-encoded compounds that can actually be synthesized in the lab. It learns a direct link between those patterns and combinations of real, purchasable building blocks plus proven reactions, so the output libraries are ready for experiments without extra planning steps. Across 18 protein targets, the model produces libraries that score higher on predicted binding strength and pharmacophore recovery than libraries chosen at random or for diversity, all without retraining for each new target. This addresses a common bottleneck where promising molecular ideas stay virtual because turning them into testable physical compounds at large scale is slow and expensive.

Core claim

JEDEL is the first model to map pharmacophore interaction patterns to actionable, scalable synthesis instructions, enabling the design of targeted libraries comprising potentially millions of molecules. Unlike existing generative approaches that produce virtual compounds requiring downstream synthesis planning, JEDEL operates within the space of purchasable building blocks and validated reactions, ensuring that every output is experimentally realizable by construction. JEDEL learns a predictive alignment between pharmacophore geometry and molecular structure and decodes this into combinatorial synthesis routes at scale. Across 18 protein targets, it generates focused libraries that outperform random and diversity-based baselines in predicted binding affinity, pharmacophore recovery, and sample efficiency, without target-specific retraining.

What carries the argument

The predictive alignment between pharmacophore geometry and molecular structure, decoded into combinatorial synthesis routes using only purchasable building blocks and validated reactions.
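The page gives no equations for this alignment, so the sketch below is a hypothetical illustration only, not JEDEL's architecture. It assumes pharmacophores and building blocks have already been embedded in a shared vector space (the embeddings, `decode_library`, and the per-cycle top-k rule are all assumptions) and shows how a learned similarity could be decoded into combinatorial routes that never leave a fixed catalog of purchasable building blocks.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def decode_library(query_emb, cycles, top_k):
    """Hypothetical decoder: keep the top_k building blocks per synthesis cycle,
    ranked by similarity to the query pharmacophore embedding, then enumerate
    every combination of one block per cycle.

    query_emb: assumed embedding of the input pharmacophore.
    cycles: one dict per synthesis cycle, mapping a purchasable building-block
            ID to its assumed embedding in the same space.
    """
    shortlist = []
    for blocks in cycles:
        ranked = sorted(blocks, key=lambda bb: cosine(query_emb, blocks[bb]), reverse=True)
        shortlist.append(ranked[:top_k])
    # Each tuple is one route: (block for cycle 1, block for cycle 2, ...).
    return list(product(*shortlist))
```

Because every route is a tuple of catalog IDs, synthesizability is inherited from the catalog and reaction set rather than checked afterwards; with `top_k` blocks kept in each of three cycles, the decoded library holds `top_k ** 3` routes.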

If this is right

  • Focused libraries generated by JEDEL outperform random and diversity-based baselines in predicted binding affinity across 18 protein targets.
  • The approach achieves higher pharmacophore recovery and sample efficiency without target-specific retraining.
  • Every generated molecule is experimentally realizable by construction since it uses validated reactions and purchasable blocks.
  • JEDEL shifts the paradigm from generating virtual compounds to designing experimentally deployable libraries at the scale of millions of molecules.
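The "millions of molecules" scale follows from combinatorics: a split-and-pool DEL contains one product per combination of building blocks across its synthesis cycles, assuming every combination is compatible with the chosen reactions. A minimal sketch of that arithmetic (the cycle counts in the usage note are illustrative, not taken from the paper):

```python
from math import prod


def library_size(blocks_per_cycle):
    """Number of distinct products in a full combinatorial library:
    the product of building-block counts across synthesis cycles."""
    return prod(blocks_per_cycle)
```

For example, `library_size([100, 200, 150])` returns 3,000,000, so three modest cycles already reach the millions.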

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment approach might extend to other library formats that rely on combinatorial synthesis from standard reagents.
  • Adding filters for properties such as cell permeability during the decoding step could produce libraries better suited for downstream biological tests.
  • Running the model on pharmacophores from multiple ligands for the same target could further improve library focus beyond single-ligand inputs.
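As a minimal illustration of the second extension, the sketch applies a property filter during enumeration. It uses molecular weight as a simple stand-in for permeability-related properties, with Lipinski's 500 Da limit. The additive MW model and `filter_by_mw` are assumptions; a real pipeline would compute each product's properties from the actual reaction outcome.

```python
from itertools import product

MW_CUTOFF = 500.0  # Lipinski rule-of-five molecular-weight limit (Da)


def filter_by_mw(cycles, scaffold_mw=0.0, cutoff=MW_CUTOFF):
    """Enumerate one building block per cycle and keep combinations whose
    estimated product MW stays at or below the cutoff.

    cycles: list of dicts mapping building-block ID -> MW contribution (Da),
            assumed (hypothetically) to already account for atoms lost in
            each coupling step, so contributions simply add up.
    """
    kept = []
    for combo in product(*(c.items() for c in cycles)):
        mw = scaffold_mw + sum(w for _, w in combo)
        if mw <= cutoff:
            kept.append(tuple(bb for bb, _ in combo))
    return kept
```

Filtering inside the enumeration loop, rather than after it, means oversized products are never materialized, which matters when the unfiltered library runs to millions of entries.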

Load-bearing premise

The learned predictive alignment between pharmacophore geometry and molecular structure generalizes across protein targets in a zero-shot manner, and the predicted binding affinities used for evaluation accurately reflect real-world performance without requiring experimental synthesis or binding validation.

What would settle it

Synthesize a few hundred molecules from one JEDEL-designed library and one baseline library for the same protein target, then measure their experimental binding affinities to test whether the JEDEL set shows measurably stronger binding on average.
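A minimal sketch of the analysis this experiment calls for, assuming affinities are measured as Kd values and compared on the log (pKd) scale. Welch's t-test is one reasonable choice because it does not assume equal variances in the two libraries; it is an assumption here, not a protocol the paper specifies.

```python
import math
import statistics


def pkd(kd_molar):
    """Convert a dissociation constant in mol/L to pKd = -log10(Kd)."""
    return -math.log10(kd_molar)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    mean(a) - mean(b), without assuming equal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

A positive `t` means the first group (e.g. the JEDEL set, in pKd units) binds more strongly on average. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.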

Original abstract

We present JEDEL, a framework for generating synthesis-ready DNA-encoded libraries (DELs) directly from three-dimensional pharmacophore representations of active ligands. JEDEL is the first model to map pharmacophore interaction patterns to actionable, scalable synthesis instructions, enabling the design of targeted libraries comprising potentially millions of molecules. Unlike existing generative approaches that produce virtual compounds requiring downstream synthesis planning, JEDEL operates within the space of purchasable building blocks and validated reactions, ensuring that every output is experimentally realizable by construction. JEDEL learns a predictive alignment between pharmacophore geometry and molecular structure and decodes this into combinatorial synthesis routes at scale. Across 18 protein targets, it generates focused libraries that outperform random and diversity-based baselines in predicted binding affinity, pharmacophore recovery, and sample efficiency, without target-specific retraining. JEDEL enables a shift from virtual molecule generation to experimentally deployable library design.
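The abstract does not define pharmacophore recovery. One plausible formulation, assumed here for illustration only, is the fraction of query features (type plus 3D position) that a generated molecule reproduces within a distance tolerance, after the two are aligned. The feature format, the 1.0 Å default tolerance, and `pharmacophore_recovery` are all hypothetical.

```python
import math


def pharmacophore_recovery(query, candidate, tol=1.0):
    """Fraction of query pharmacophore features matched by a candidate.

    query, candidate: lists of (feature_type, (x, y, z)) tuples, assumed to be
    in an already-aligned coordinate frame (units: angstroms).
    A query feature counts as recovered if an unused candidate feature of the
    same type lies within tol of it (greedy one-to-one matching).
    """
    if not query:
        return 0.0
    used = set()
    matched = 0
    for ftype, pos in query:
        for j, (ctype, cpos) in enumerate(candidate):
            if j not in used and ctype == ftype and math.dist(pos, cpos) <= tol:
                used.add(j)
                matched += 1
                break
    return matched / len(query)
```

Greedy matching is a simplification that can depend on feature order; an optimal one-to-one assignment (for example the Hungarian algorithm) would remove that dependence.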

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents JEDEL, a framework for zero-shot generation of synthesis-ready DNA-encoded libraries (DELs) directly from 3D pharmacophore representations of active ligands. It claims to be the first to map pharmacophore interaction patterns to actionable combinatorial synthesis routes using purchasable building blocks and validated reactions, enabling targeted libraries of potentially millions of molecules. Across 18 protein targets, JEDEL-generated libraries are reported to outperform random and diversity-based baselines in predicted binding affinity, pharmacophore recovery, and sample efficiency, without target-specific retraining.

Significance. If the central claims hold with proper validation, this would be a notable contribution to early-stage drug discovery by shifting from virtual compound generation to directly deployable, scalable DEL design. The zero-shot aspect and focus on experimentally realizable outputs address a practical gap in the field. However, the current evidence base relies entirely on unvalidated in silico proxies, limiting immediate significance.

major comments (3)
  1. [Abstract / Evaluation] Abstract and evaluation sections: The reported outperformance on 'predicted binding affinity' across 18 targets supplies no methodological details on the affinity predictor (e.g., its architecture, training data, cross-validation against experimental Kd/IC50 values, or error bars), making it impossible to assess whether the metrics support the zero-shot generalization claim or reduce to self-referential evaluations.
  2. [Results / Methods] Results and methods: No experimental synthesis or binding assay validation is described for any JEDEL-generated libraries; all headline metrics (affinity, recovery, efficiency) rest on in silico predictions whose correlation with real-world performance is unknown, which is load-bearing for the 'actionable' and 'experimentally realizable by construction' framing.
  3. [Model description] § on model architecture (inferred from abstract): The claim that JEDEL 'learns a predictive alignment between pharmacophore geometry and molecular structure' lacks any equations, derivation steps, or ablation studies showing that performance does not arise from fitted parameters or baseline definitions.
minor comments (1)
  1. [Methods] Dataset descriptions for the 18 protein targets and baseline implementations are not provided in sufficient detail for reproducibility.
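Major comment 1 asks how the affinity predictor compares with experimental Kd/IC50 values. A minimal sketch of that check, assuming paired lists of predicted scores and measured values (e.g. pKd) for the same compounds, is a Spearman rank correlation, written here with the standard library only. If the predictor outputs docking-style scores where lower means stronger binding, a good predictor shows a correlation near -1 rather than +1.

```python
import math


def rank(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    if sa == 0 or sb == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sa * sb)


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(predicted), rank(measured))
```

`scipy.stats.spearmanr` computes the same quantity and also reports a p-value.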

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments. We address each major comment point by point below, indicating planned revisions where the manuscript will be updated.

  1. Referee: [Abstract / Evaluation] Abstract and evaluation sections: The reported outperformance on 'predicted binding affinity' across 18 targets supplies no methodological details on the affinity predictor (e.g., its architecture, training data, cross-validation against experimental Kd/IC50 values, or error bars), making it impossible to assess whether the metrics support the zero-shot generalization claim or reduce to self-referential evaluations.

    Authors: We agree that additional details are required. The revised Methods section will include a full description of the affinity predictor architecture, training data sources, cross-validation procedure against experimental Kd/IC50 values, and reporting of error bars to enable proper evaluation of the metrics. revision: yes

  2. Referee: [Results / Methods] Results and methods: No experimental synthesis or binding assay validation is described for any JEDEL-generated libraries; all headline metrics (affinity, recovery, efficiency) rest on in silico predictions whose correlation with real-world performance is unknown, which is load-bearing for the 'actionable' and 'experimentally realizable by construction' framing.

    Authors: This comment correctly identifies that the work is computational. We will revise the Discussion to clarify that 'experimentally realizable by construction' refers specifically to the use of purchasable building blocks and validated reactions, while noting that wet-lab synthesis and binding assays are planned as future work and outside the scope of the current manuscript. The in silico results are presented as such. revision: partial

  3. Referee: [Model description] § on model architecture (inferred from abstract): The claim that JEDEL 'learns a predictive alignment between pharmacophore geometry and molecular structure' lacks any equations, derivation steps, or ablation studies showing that performance does not arise from fitted parameters or baseline definitions.

    Authors: We will expand the model architecture section in the revised manuscript to provide the equations describing the predictive alignment, derivation steps, and ablation studies that demonstrate performance contributions beyond fitted parameters or baseline definitions. revision: yes
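Response 1 promises error bars but the page does not say how they will be computed. One common option, assumed here rather than taken from the paper, is a percentile bootstrap on the per-target difference in means (for example, mean predicted affinity of a JEDEL library minus that of a baseline library). `bootstrap_ci` and its defaults are hypothetical.

```python
import random
import statistics


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b).

    Each group is resampled with replacement independently; a fixed seed
    makes the interval reproducible.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(ra) - statistics.mean(rb))
    diffs.sort()
    lo_idx = int(round(alpha / 2 * n_boot))
    hi_idx = n_boot - 1 - lo_idx  # symmetric upper index
    return diffs[lo_idx], diffs[hi_idx]
```

An interval that excludes zero suggests the difference between libraries is not an artifact of sampling noise, though it says nothing about whether the predicted affinities themselves are accurate.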

standing simulated objections not resolved
  • Experimental synthesis and binding assay validation of JEDEL-generated libraries, as no such wet-lab experiments were conducted in this computational study.

Circularity Check

0 steps flagged

No circularity: no equations, derivations, or self-referential predictions visible

full rationale

The provided abstract and text describe a generative framework for DEL design but contain no mathematical derivations, equations, fitted parameters, or self-citations that reduce any claimed result to its inputs by construction. Performance claims reference 'predicted binding affinity' and baselines, yet no load-bearing step is shown where a prediction is statistically forced by the fitting process itself or where uniqueness is imported via author self-citation. The derivation chain is therefore self-contained relative to external benchmarks, since no internal reduction is exhibited.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or explicit assumptions; therefore no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.1-grok · 5703 in / 1043 out tokens · 25472 ms · 2026-06-26T09:15:03.587726+00:00 · methodology

