ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets

Saket Reddy; Shiwei Liu

arXiv: 2606.06717 · v1 · pith:PJPY4Y54new · submitted 2026-06-04 · 💻 cs.LG · cs.AI · q-bio.BM · q-bio.QM

Pith reviewed 2026-06-28 02:18 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.BM · q-bio.QM

keywords shallow-pocket targets · generative drug design · binding affinity · benchmark · low concavity · KRAS · MYC · CrossDocked2020

The pith

Ligands produced by generative drug design models show weaker predicted binding affinity on shallow-pocket protein targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ShallowBench, a benchmark of 5,780 shallow-pocket targets drawn from CrossDocked2020. Targets are selected by the difference between an Alpha Shape lid volume and the protein atom voxel volume, which picks out low-concavity sites that still offer enough binding surface. When state-of-the-art generative models are run on this set, the ligands they produce show weaker predicted binding on these shallow interfaces. This matters because important targets such as KRAS and MYC are shallow and have historically been considered undruggable. The work argues for models that can handle such pockets.

Core claim

ShallowBench provides a strictly curated benchmark of 5,780 shallow-pocket targets. By computing the difference between an Alpha Shape lid volume and the underlying protein atom voxel volume, targets with low concavity and sufficient surface area are isolated. Evaluating state-of-the-art generative models on this set reveals weaker predicted binding affinity on low-concavity interfaces, highlighting the need for new architectural innovations or loss functions.

What carries the argument

ShallowBench, a benchmark of shallow-pocket targets curated via Alpha Shape lid volume minus protein voxel volume difference.

If this is right

Current generative models require evaluation on shallow-pocket benchmarks to demonstrate effectiveness.
New model architectures or loss functions are needed to better sample ligands for low-concavity targets.
ShallowBench enables systematic testing on historically undruggable targets such as KRAS and MYC.
The benchmark serves as a standard for advancing generative biology models beyond deep-pocket assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying the curation method to additional datasets could create larger or more diverse shallow-pocket collections.
The performance difference may stem from models being trained predominantly on deep-pocket examples.
Future work could explore whether incorporating explicit concavity features into models closes the affinity gap.
Experimental binding assays on ligands generated for ShallowBench targets would provide ground-truth validation.

Load-bearing premise

The difference between an Alpha Shape lid volume and the underlying protein atom voxel volume accurately isolates targets with low concavity while ensuring sufficient surface area for binding.
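To make the premise concrete, here is a minimal sketch of a lid-minus-occupancy score. The abstract gives no implementation details, so everything in it is an assumption: the function names `occupied_volume` and `concavity_score`, the grid spacing, the per-atom radii, and above all the use of an axis-aligned box as a stand-in for the Alpha Shape lid. It shows the arithmetic of the score and is not the authors' pipeline.

```python
from itertools import product


def occupied_volume(atoms, box_min, box_max, spacing=0.5):
    """Approximate the volume inside the box that is covered by atom spheres.

    atoms: list of (x, y, z, radius) tuples.
    box_min, box_max: opposite corners of an axis-aligned box that stands in
    for the Alpha Shape lid region (an assumption, not the paper's method).
    """
    counts = [round((hi - lo) / spacing) for lo, hi in zip(box_min, box_max)]
    voxel_volume = spacing ** 3
    occupied = 0
    for i, j, k in product(*(range(n) for n in counts)):
        # Voxel centre in Cartesian coordinates.
        cx = box_min[0] + (i + 0.5) * spacing
        cy = box_min[1] + (j + 0.5) * spacing
        cz = box_min[2] + (k + 0.5) * spacing
        # A voxel counts as occupied if its centre lies inside any atom sphere.
        if any((cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r
               for x, y, z, r in atoms):
            occupied += 1
    return occupied * voxel_volume


def concavity_score(atoms, box_min, box_max, spacing=0.5):
    """Lid volume minus atom-occupied volume: the empty space under the lid."""
    lid_volume = 1.0
    for lo, hi in zip(box_min, box_max):
        lid_volume *= hi - lo
    return lid_volume - occupied_volume(atoms, box_min, box_max, spacing)
```

Under this reading a small score would mean little empty space beneath the lid, that is, a shallow site. A finer `spacing` reduces the voxelization error at the cost of a cubic increase in the number of voxels visited.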

What would settle it

Demonstrating that generative models achieve predicted binding affinities on ShallowBench targets that match or exceed those on deep-pocket targets would refute the reported performance gap.

Figures

Figures reproduced from arXiv: 2606.06717 by Saket Reddy, Shiwei Liu.

**Figure 2.** Comparison of deep pocket volume to shallow pocket volume.

**Figure 3.** Distribution of concavity and surface area of targets in ShallowBench.

**Figure 4.** Data curation pipeline for the control dataset.

Original abstract

While generative AI models have demonstrated remarkable success in structure-based drug design, they predominantly rely on deep binding pockets and struggle to sample effective ligands for challenging low-pocketability targets, such as the historically "undruggable" oncology targets KRAS and MYC. To address this gap, we introduce ShallowBench, a strictly curated benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. By computing the difference between an Alpha Shape "lid" volume and the underlying protein atom voxel volume, we successfully isolated targets with low concavity while ensuring sufficient surface area for binding. Evaluating various state-of-the-art generative models reveals weaker predicted binding affinity on these low-concavity interfaces. ShallowBench therefore provides a rigorous benchmark for generative biology models and highlights the necessity of new architectural innovations or loss functions capable of navigating these challenging targets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a new shallow-pocket benchmark, ShallowBench, from CrossDocked2020, but the Alpha Shape volume method is not validated against known examples or other descriptors.

The paper's core contribution is a benchmark of 5,780 targets labeled as shallow-pocket, pulled from CrossDocked2020 via the difference between an Alpha Shape lid volume and the protein atom voxel volume. They run several generative models on it and report weaker predicted affinities compared to standard sets. That finding is the main result.

The work does one useful thing: it flags a practical gap. Generative drug design papers usually focus on deep pockets, and targets like KRAS or MYC are known to be hard for that reason. Turning public docking data into a focused test set for low-concavity cases is a reasonable step.

The soft spot is the curation itself. The abstract describes the volume difference but gives no thresholds, no correlation with established pocket tools such as fpocket, and no check against the actual shallow-pocket proteins the authors mention. If the metric does not reliably separate low-concavity interfaces, the performance drop cannot be attributed to pocket shape. The stress-test concern lands: without that check, the benchmark does not yet test the regime it claims to test.

The paper is for groups building or evaluating ligand generators who want a harder test set than the usual deep-pocket collections. A reader already working on undruggable oncology targets would find the curation idea worth looking at, even if the current validation is thin.

It deserves referee time because the underlying problem is real and the data source is public, but any review would need to press for the missing validation experiments and full metrics. I would send it out rather than desk reject.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces ShallowBench, a benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. Curation relies on the difference between an Alpha Shape 'lid' volume and the underlying protein atom voxel volume to isolate low-concavity interfaces while retaining sufficient surface area. Evaluation of state-of-the-art generative drug design models on this set reports weaker predicted binding affinities relative to deeper-pocket cases, with the abstract highlighting implications for targets such as KRAS and MYC.

Significance. If the curation criterion is shown to be valid, the benchmark would fill a clear gap by providing a reproducible test set focused on low-concavity interfaces that current generative models struggle with. The reliance on public CrossDocked2020 data and an independent geometric computation (rather than fitted parameters) is a strength that supports reproducibility and community use.

major comments (1)

[Abstract] The central claim that the benchmark isolates 'targets with low concavity' and that weaker affinities are attributable to this property rests on the Alpha Shape lid-volume difference. No quantitative validation is described: no correlation with known shallow-pocket examples (KRAS, MYC), no comparison to established descriptors such as fpocket depth or concavity scores, and no reported thresholds or sensitivity analysis. This directly undermines attribution of the performance gap to shallow pockets.
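One way the requested comparison could be run is a rank correlation between the lid-volume difference and a depth descriptor taken from another tool. The sketch below implements Spearman correlation in plain Python. The function names are hypothetical, and the inputs are assumed to be two equal-length lists of per-target values; the paper does not say how, or whether, such a comparison was made.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5
```

A rank-based measure is used here because the two descriptors need not be on the same scale or linearly related; a coefficient near zero would suggest the lid-volume difference is measuring something other than what the established descriptor calls depth.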

minor comments (1)

[Abstract] The statement 'weaker predicted binding affinity' is given without naming the specific affinity predictor, the exact metric (e.g., docking score, predicted pKd), or any error bars or statistical tests.
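As an illustration of the kind of statistical test this comment asks for, the sketch below runs a two-sided permutation test on the difference in mean score between a shallow-pocket set and a deep-pocket set. The function name, the default number of permutations, and the fixed seed are assumptions made for the example; the paper does not name its predictor or metric, so the inputs are simply two lists of per-target scores.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(shallow, deep, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns (observed_difference, p_value), where the difference is
    mean(shallow) - mean(deep).
    """
    rng = random.Random(seed)
    observed = mean(shallow) - mean(deep)
    pooled = list(shallow) + list(deep)
    n_shallow = len(shallow)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_shallow]) - mean(pooled[n_shallow:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_permutations + 1)
```

If the scores were docking energies where lower means stronger binding, a positive observed difference would correspond to weaker predicted binding on the shallow set. A permutation test makes no distributional assumption about the scores, which seems prudent when the metric itself is unspecified.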

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The concern regarding quantitative validation of the shallow-pocket curation criterion is valid, and we will strengthen the manuscript accordingly.

Referee: [Abstract] The central claim that the benchmark isolates 'targets with low concavity' and that weaker affinities are attributable to this property rests on the Alpha Shape lid-volume difference. No quantitative validation is described: no correlation with known shallow-pocket examples (KRAS, MYC), no comparison to established descriptors such as fpocket depth or concavity scores, and no reported thresholds or sensitivity analysis. This directly undermines attribution of the performance gap to shallow pockets.

Authors: We agree that the manuscript as submitted lacks explicit quantitative validation of the Alpha Shape lid-volume criterion. In the revised version we will add: (i) a sensitivity analysis varying the lid-volume difference threshold and reporting how the size and composition of the 5,780-target set change; (ii) direct numerical comparison of our selected targets against fpocket depth and concavity scores; and (iii) explicit verification that known low-concavity oncology targets (KRAS, MYC) fall inside the ShallowBench distribution or exhibit comparable lid-volume statistics. These additions will provide the missing empirical grounding for attributing the observed affinity gap to low concavity rather than other factors.

Revision: yes
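The sensitivity analysis promised in item (i) could be sketched as follows. The names `sweep_thresholds` and `jaccard`, the choice to select targets at or below the cutoff, and the example identifiers are all assumptions; the actual threshold and selection rule are not given in the abstract.

```python
def sweep_thresholds(scores, thresholds):
    """For each cutoff, list the targets whose score is at or below it.

    scores: dict mapping target id -> lid-volume difference.
    Returns a list of (threshold, selected_ids) pairs in the given order.
    """
    return [
        (t, sorted(tid for tid, s in scores.items() if s <= t))
        for t in thresholds
    ]


def jaccard(a, b):
    """Overlap of two selections: intersection size over union size."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty selections are treated as identical.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical scores for three made-up targets.
example_scores = {"t1": 10.0, "t2": 50.0, "t3": 120.0}
for threshold, selected in sweep_thresholds(example_scores, [20.0, 100.0]):
    print(threshold, len(selected), selected)
```

Reporting the set size at each cutoff addresses how the count of selected targets moves, while the Jaccard overlap between neighbouring cutoffs addresses how much the composition shifts.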

Circularity Check

0 steps flagged

No significant circularity; benchmark uses independent geometric selection on public data

The paper constructs ShallowBench by applying an Alpha Shape lid-volume difference to the public CrossDocked2020 dataset to select 5,780 targets. This geometric filter is a direct computation on external data and does not reduce any claimed result (weaker binding affinity on selected targets) to a fitted parameter, self-definition, or self-citation chain. Model evaluations are performed on the resulting fixed benchmark set without the predictions being forced by the selection metric itself. No load-bearing self-citations, ansatzes, or renamings are present in the derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on a geometric domain assumption for pocket identification and an ad hoc selection process whose exact parameters are unspecified in the abstract.

free parameters (1)

volume difference threshold
The cutoff value used to define low concavity and select exactly 5,780 targets is not stated and appears chosen to produce the reported set size.

axioms (1)

Domain assumption: Alpha Shape lid volume minus protein voxel volume reliably quantifies low concavity with sufficient binding surface.
Invoked in the abstract to isolate shallow-pocket targets from CrossDocked2020.

