ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets
Pith reviewed 2026-06-28 02:18 UTC · model grok-4.3
The pith
Generative drug design models produce ligands with weaker predicted binding affinity on shallow-pocket protein targets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShallowBench provides a strictly curated benchmark of 5,780 shallow-pocket targets. Targets with low concavity and sufficient surface area are isolated by computing the difference between an Alpha Shape lid volume and the underlying protein atom voxel volume. Evaluating state-of-the-art generative models on this set reveals weaker predicted binding affinity on low-concavity interfaces, which points to a need for new architectural innovations or loss functions.
What carries the argument
ShallowBench, a benchmark of shallow-pocket targets curated by the difference between the Alpha Shape lid volume and the protein atom voxel volume.
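To make the quantity concrete, here is a minimal Python sketch of a lid-minus-occupied volume computed on a voxel grid. It is not the authors' implementation: the abstract does not say how the Alpha Shape lid is constructed, so the lid is passed in as a hypothetical `lid_contains` predicate, and the function name, the `bounds` argument, and the 1 angstrom voxel size are all assumptions made for illustration.

```python
from itertools import product


def lid_minus_occupied_volume(atoms, lid_contains, bounds, voxel=1.0):
    """Return (lid_volume, occupied_volume, difference) in cubic angstroms.

    atoms        -- iterable of (x, y, z, radius) tuples
    lid_contains -- hypothetical predicate: point -> True if inside the lid
    bounds       -- ((xmin, ymin, zmin), (xmax, ymax, zmax)) of the grid
    voxel        -- edge length of one cubic voxel (assumed value)
    """
    lo, hi = bounds
    # Number of voxels along each axis.
    counts = [int(round((h - l) / voxel)) for l, h in zip(lo, hi)]
    lid_voxels = 0
    occupied_voxels = 0
    for idx in product(*(range(n) for n in counts)):
        # Sample each voxel at its centre.
        p = tuple(l + (i + 0.5) * voxel for l, i in zip(lo, idx))
        if not lid_contains(p):
            continue
        lid_voxels += 1
        # A voxel counts as protein if its centre lies inside any atom sphere.
        if any((p[0] - x) ** 2 + (p[1] - y) ** 2 + (p[2] - z) ** 2 <= r * r
               for x, y, z, r in atoms):
            occupied_voxels += 1
    cell = voxel ** 3
    return (lid_voxels * cell,
            occupied_voxels * cell,
            (lid_voxels - occupied_voxels) * cell)


# Toy input: a 4 x 4 x 4 box acting as the lid, with one small atom inside.
toy_atoms = [(0.5, 0.5, 0.5, 0.4)]
toy_result = lid_minus_occupied_volume(
    toy_atoms, lambda p: True, ((0.0, 0.0, 0.0), (4.0, 4.0, 4.0)))
print(toy_result)  # (64.0, 1.0, 63.0)
```

In this sketch a small difference means little empty space under the lid. Whether the paper treats small or large values as "shallow", and what cutoff it applies, is not stated in the abstract.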
If this is right
- Current generative models require evaluation on shallow-pocket benchmarks to demonstrate effectiveness.
- New model architectures or loss functions are needed to better sample ligands for low-concavity targets.
- ShallowBench enables systematic testing on historically undruggable targets such as KRAS and MYC.
- The benchmark serves as a standard for advancing generative biology models beyond deep-pocket assumptions.
Where Pith is reading between the lines
- Applying the curation method to additional datasets could create larger or more diverse shallow-pocket collections.
- The performance difference may stem from models being trained predominantly on deep-pocket examples.
- Future work could explore whether incorporating explicit concavity features into models closes the affinity gap.
- Experimental binding assays on ligands generated for ShallowBench targets would provide ground-truth validation.
Load-bearing premise
The difference between an Alpha Shape lid volume and the underlying protein atom voxel volume accurately isolates targets with low concavity while ensuring sufficient surface area for binding.
What would settle it
Demonstrating that generative models achieve predicted binding affinities on ShallowBench targets that match or exceed those on deep-pocket targets would refute the observation of weaker performance.
Original abstract
While generative AI models have demonstrated remarkable success in structure-based drug design, they predominantly rely on deep binding pockets and struggle to sample effective ligands for challenging low-pocketability targets, such as the historically "undruggable" oncology targets KRAS and MYC. To address this gap, we introduce ShallowBench, a strictly curated benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. By computing the difference between an Alpha Shape "lid" volume and the underlying protein atom voxel volume, we successfully isolated targets with low concavity while ensuring sufficient surface area for binding. Evaluating various state-of-the-art generative models reveals weaker predicted binding affinity on these low-concavity interfaces. ShallowBench therefore provides a rigorous benchmark for generative biology models and highlights the necessity of new architectural innovations or loss functions capable of navigating these challenging targets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ShallowBench, a benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. Curation relies on the difference between an Alpha Shape 'lid' volume and the underlying protein atom voxel volume to isolate low-concavity interfaces while retaining sufficient surface area. Evaluation of state-of-the-art generative drug design models on this set reports weaker predicted binding affinities relative to deeper-pocket cases, with the abstract highlighting implications for targets such as KRAS and MYC.
Significance. If the curation criterion is shown to be valid, the benchmark would fill a clear gap by providing a reproducible test set focused on low-concavity interfaces that current generative models struggle with. The reliance on public CrossDocked2020 data and an independent geometric computation (rather than fitted parameters) is a strength that supports reproducibility and community use.
major comments (1)
- [Abstract] Abstract: The central claim that the benchmark isolates 'targets with low concavity' and that weaker affinities are attributable to this property rests on the Alpha Shape lid-volume difference. No quantitative validation is described—no correlation with known shallow-pocket examples (KRAS, MYC), no comparison to established descriptors such as fpocket depth or concavity scores, and no reported thresholds or sensitivity analysis. This directly undermines attribution of the performance gap to shallow pockets.
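One way to run the comparison the referee asks for is a rank correlation between the lid-volume difference and an established depth descriptor. The sketch below implements Spearman's rho in plain Python. The input lists are hypothetical per-target values, not data from the paper, and no fpocket API is assumed: the descriptor values would have to be exported separately.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for four targets (illustration only).
lid_volume_diff = [40.0, 95.0, 210.0, 480.0]
pocket_depth = [1.2, 2.5, 4.1, 9.8]
rho = spearman(lid_volume_diff, pocket_depth)
```

A rho close to 1 on real data would support the lid-volume difference as a depth proxy. A weak correlation would suggest the criterion captures something other than concavity.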
minor comments (1)
- [Abstract] Abstract: The statement 'weaker predicted binding affinity' is given without naming the specific affinity predictor, the exact metric (e.g., docking score, predicted pKd), or any error bars or statistical tests.
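As an illustration of the kind of statistical test the referee has in mind, the following sketch runs a two-sided permutation test on the difference in mean score between two groups of targets. The scores, group names, permutation count, and seed are assumptions. The abstract does not name the affinity predictor or its sign convention, so the test works on absolute differences.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical predicted-affinity scores (illustration only).
shallow_scores = [-5.0, -5.1, -4.9, -5.2]
deep_scores = [-9.0, -9.1, -8.9, -9.2]
p_value = permutation_test(shallow_scores, deep_scores)
```

A permutation test makes no distributional assumption about the scores, which is convenient when the predictor and its error model are unspecified.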
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The concern regarding quantitative validation of the shallow-pocket curation criterion is valid, and we will strengthen the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that the benchmark isolates 'targets with low concavity' and that weaker affinities are attributable to this property rests on the Alpha Shape lid-volume difference. No quantitative validation is described: no correlation with known shallow-pocket examples (KRAS, MYC), no comparison to established descriptors such as fpocket depth or concavity scores, and no reported thresholds or sensitivity analysis. This directly undermines attribution of the performance gap to shallow pockets.
Authors: We agree that the manuscript as submitted lacks explicit quantitative validation of the Alpha Shape lid-volume criterion. In the revised version we will add: (i) a sensitivity analysis varying the lid-volume difference threshold and reporting how the size and composition of the 5,780-target set changes; (ii) direct numerical comparison of our selected targets against fpocket depth and concavity scores; and (iii) explicit verification that known low-concavity oncology targets (KRAS, MYC) fall inside the ShallowBench distribution or exhibit comparable lid-volume statistics. These additions will provide the missing empirical grounding for attributing the observed affinity gap to low concavity rather than other factors. revision: yes
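The sensitivity analysis promised in item (i) could be sketched as follows. This is an assumed procedure, not the authors' code. It keeps targets whose lid-volume difference is at or below each candidate threshold (the selection direction is an assumption, since the abstract does not state it) and reports how the selected set changes. Target identifiers and values are hypothetical.

```python
def sweep_thresholds(volume_diff, thresholds):
    """Return [(threshold, sorted ids kept)] for each candidate threshold.

    volume_diff -- dict mapping target id -> lid-volume difference
    thresholds  -- iterable of candidate cutoffs (same units as the values)
    """
    results = []
    for t in sorted(thresholds):
        # Assumed direction: a target is kept when its difference is <= t.
        kept = sorted(tid for tid, d in volume_diff.items() if d <= t)
        results.append((t, kept))
    return results


def jaccard(a, b):
    """Overlap between two selections: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical per-target differences (illustration only).
toy_diffs = {"t1": 50.0, "t2": 120.0, "t3": 300.0}
sweep = sweep_thresholds(toy_diffs, [100.0, 200.0])
overlap = jaccard(sweep[0][1], sweep[1][1])
```

Reporting the set size and the Jaccard overlap between neighbouring thresholds shows whether the benchmark composition is stable or hinges on one particular cutoff.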
Circularity Check
No significant circularity; benchmark uses independent geometric selection on public data
The paper constructs ShallowBench by applying an Alpha Shape lid-volume difference to the public CrossDocked2020 dataset to select 5,780 targets. This geometric filter is a direct computation on external data and does not reduce any claimed result (weaker binding affinity on selected targets) to a fitted parameter, self-definition, or self-citation chain. Model evaluations are performed on the resulting fixed benchmark set without the predictions being forced by the selection metric itself. No load-bearing self-citations, ansatzes, or renamings are present in the derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- volume difference threshold
axioms (1)
- domain assumption: Alpha Shape lid volume minus protein voxel volume reliably quantifies low concavity with sufficient binding surface