arxiv: 2605.19579 · v1 · pith:4AHVMGAUnew · submitted 2026-05-19 · 🧬 q-bio.QM

TACK: A statistical evaluation of degradation activity on a novel TArgeting Chimeras Knowledge dataset

Pith reviewed 2026-05-20 02:18 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords PROTACs · targeted protein degradation · machine learning · degradation prediction · dataset · XGBoost · graph neural networks · uncertainty quantification

The pith

Classical machine learning models outperform a specialized graph neural network in predicting PROTAC degradation activity on a new aggregated dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors compile TACK, a dataset of 3,514 PROTACs and 6,561 standardized degradation endpoints drawn from three repositories. Under scaffold-based cross-validation, statistical tests show that XGBoost and multilayer perceptrons reach a higher ROC-AUC (0.85) than the domain-specific PROTAC-STAN graph neural network (0.74) when classifying active compounds. The same model families predict the potency measure pDC50 (R² = 0.66) far more accurately than the maximum degradation extent Dmax (R² = 0.36), and feature ablations indicate that cellular context features and simple protein representations perform comparably to ESM protein language model embeddings. These results matter because they supply concrete evidence on which parts of PROTAC behavior are most amenable to computational prediction and which modeling choices deliver reliable performance.

Core claim

On the TACK dataset, scaffold-based 5×5 cross-validation and statistical tests establish that XGBoost and MLP models reach a ROC-AUC of 0.85 for binary degradation activity classification versus 0.74 for PROTAC-STAN (p < 0.001). Regression models trained on the strongest feature sets attain an R² of 0.66 for pDC50 but only 0.36 for Dmax. Feature ablation further shows that cellular context features and simple protein representations match the performance of complex ESM embeddings, and ensemble-based uncertainty quantification shows that prediction variance correlates with prediction error on both regression tasks (Spearman ρ = 0.36 for pDC50 and 0.69 for Dmax, both p < 0.001).
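
To make the evaluation protocol concrete, here is a minimal sketch of a scaffold-grouped 5×5 split in plain Python. It assumes each compound already carries a scaffold key (for example a Bemis-Murcko scaffold string computed elsewhere). The function name `scaffold_repeated_kfold`, the greedy fold balancing, and the seeding scheme are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def scaffold_repeated_kfold(scaffolds, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for a grouped, repeated k-fold.

    All compounds sharing a scaffold key land in the same fold, so no scaffold
    appears in both the training and the test indices of a split.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    keys = sorted(groups)

    for repeat in range(n_repeats):
        rng = random.Random(seed + repeat)  # hypothetical seeding scheme
        order = keys[:]
        rng.shuffle(order)
        # Stable sort keeps the shuffled order among equally sized groups.
        order.sort(key=lambda k: len(groups[k]), reverse=True)

        folds = [[] for _ in range(n_splits)]
        for key in order:
            # Greedy balancing: put the group into the currently smallest fold.
            smallest = min(range(n_splits), key=lambda f: len(folds[f]))
            folds[smallest].extend(groups[key])

        for fold in range(n_splits):
            test_idx = sorted(folds[fold])
            train_idx = sorted(
                i for f in range(n_splits) if f != fold for i in folds[f]
            )
            yield repeat, fold, train_idx, test_idx


# Toy scaffold labels; real keys would come from a cheminformatics toolkit.
toy_scaffolds = ["A", "A", "B", "C", "C", "C", "D", "E", "F", "G"]
toy_splits = list(scaffold_repeated_kfold(toy_scaffolds))
print(len(toy_splits))
```

Five repeats of five folds give the 25 train/test splits that the fold-level statistics are computed over. Grouping by scaffold is what separates this from a random split: structurally similar compounds cannot leak from training into testing.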

What carries the argument

The TACK dataset of standardized PROTAC degradation endpoints together with scaffold-based cross-validation and feature ablation studies that directly compare classical models against a domain-specific graph neural network.

If this is right

  • Potency predictions can be used more confidently than maximum degradation predictions to prioritize compounds for synthesis and testing.
  • Ensemble-based uncertainty estimates can flag low-confidence predictions so that experimental resources focus on high-variance cases.
  • Simple protein representations and cellular context features can replace complex embeddings in future modeling pipelines without loss of accuracy.
  • Classical methods such as XGBoost or MLP become the default choice for this task unless new architectures demonstrate clear gains on the same standardized data.
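
As an illustration of the uncertainty-flagging idea in the list above, the following sketch summarizes an ensemble's predictions per compound and flags the compounds with the largest spread. The ensemble layout, the function names, and the 20% flagging fraction are assumptions made for this example; the paper's exact ensemble construction is not described on this page.

```python
from statistics import mean, pstdev


def summarize_ensemble(member_predictions):
    """Return a (mean, spread) pair per compound.

    member_predictions[m][i] is the prediction of ensemble member m
    for compound i (layout assumed for this sketch).
    """
    per_compound = zip(*member_predictions)
    return [(mean(preds), pstdev(preds)) for preds in per_compound]


def flag_low_confidence(summary, fraction=0.2):
    """Return sorted indices of the compounds with the largest ensemble spread."""
    n_flag = max(1, round(len(summary) * fraction))
    ranked = sorted(range(len(summary)), key=lambda i: summary[i][1], reverse=True)
    return sorted(ranked[:n_flag])


# Three hypothetical ensemble members predicting pDC50 for five compounds.
toy_predictions = [
    [7.0, 6.0, 8.0, 5.0, 6.5],
    [7.2, 4.0, 8.1, 5.1, 6.4],
    [6.8, 8.0, 7.9, 4.9, 6.6],
]
toy_summary = summarize_ensemble(toy_predictions)
print(flag_low_confidence(toy_summary))
```

In the toy data the members disagree most on the second compound, so that is the one flagged for closer experimental scrutiny.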

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standardization across public repositories may prove more valuable than architectural novelty for other targeted degradation modalities such as molecular glues.
  • The stronger predictability of potency suggests computational screening campaigns could be designed to optimize DC50 values first and treat maximum degradation as a secondary filter.
  • If feature engineering continues to rival sophisticated embeddings, future work could test whether the same pattern holds when the dataset is expanded with new E3 ligase or target protein families.

Load-bearing premise

The data aggregated from three separate repositories, once molecular representations, protein labels, and experimental conditions are standardized, accurately reflect true degradation activities without major unaccounted biases from source differences or protocol variations.

What would settle it

An independent collection of PROTAC degradation measurements obtained under uniform experimental conditions that, when evaluated with the same scaffold-based cross-validation, shows no statistically significant performance gap between classical models and PROTAC-STAN or comparable predictability for pDC50 and Dmax.

Figures

Figures reproduced from arXiv: 2605.19579 by Nils Dunlop, Rocío Mercado, Stefano Ribes.

Figure 1. Pipeline summarizing our methodology. (a) Data curation includes merging and cleaning open-source databases …
Figure 2. (a, b) Histograms of standardized degradation activity for the training and hold-out sets, showing the distributions of …
Figure 3. (a) Parity plots using MLP(†) and XGBoost(†) for pDC50 and Dmax, respectively, on the hold-out test set (25 CV fold models). (b) Performance comparison for XGBoost(‡), MLP(‡), PROTAC-STAN on the binary activity classification task across four metrics. Points show means with 95% confidence intervals from 25 CV folds. Non-overlapping intervals indicate significant differences (Tukey HSD, p < 0.05). Omnibus …
Figure 4. Distribution of the target values across the 5 …
Figure 5. Normality diagnostic of the performance met…
Figure 6. Effect sizes of the performance metric values on the …
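
The Figure 3 caption reports means with 95% confidence intervals over 25 CV folds. The page does not say how those intervals were computed, so the sketch below shows only one common choice: a Student-t interval on the fold scores. The function name and the toy scores are hypothetical, and the t-interval treats folds as independent, which repeated cross-validation only approximates.

```python
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 24 degrees of freedom (n = 25 folds).
T_CRIT_DF24 = 2.064


def mean_ci95(scores, t_crit=T_CRIT_DF24):
    """Return (mean, lower, upper) for a t-based 95% confidence interval.

    The default critical value is only valid for 25 scores; pass the matching
    value when the number of folds differs.
    """
    n = len(scores)
    centre = mean(scores)
    half_width = t_crit * stdev(scores) / n ** 0.5
    return centre, centre - half_width, centre + half_width


# Hypothetical ROC-AUC values for 25 folds (not taken from the paper).
toy_scores = [0.80 + 0.004 * i for i in range(25)]
centre, lower, upper = mean_ci95(toy_scores)
print(f"{centre:.3f} [{lower:.3f}, {upper:.3f}]")
```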
Original abstract

Proteolysis-targeting chimeras (PROTACs) represent a promising therapeutic modality that induces targeted protein degradation by hijacking the ubiquitin-proteasome system. However, rational PROTAC design remains challenging due to the complex interplay between molecular structure, target proteins, E3 ligases, and the cellular context. We present TACK, a statistical evaluation of degradation activity on a novel TArgeting Chimeras Knowledge dataset of 3,514 PROTACs and 6,561 degradation endpoints aggregated from three major repositories with standardized molecular representations, protein annotations, and experimental conditions. Using scaffold-based 5$\times$5 cross-validation, we perform a rigorous statistical comparison of three machine learning methods to predict PROTAC degradation activity across three tasks: $DC_{50}$ and Dmax regression, and binary activity classification. Feature ablation demonstrates that cellular context features and simple protein representations rival complex ESM protein embeddings, highlighting the importance of feature engineering over architectural sophistication. Models trained on the best performing features show that potency ($pDC_{50}$, $R^2=0.66$) is substantially more predictable than maximum degradation (Dmax, $R^2=0.36$). In activity prediction, statistical tests support that classical methods (XGBoost and MLP) significantly outperform PROTAC-STAN, a domain-specific graph neural network model (ROC-AUC: 0.85 vs. 0.74, p<0.001). Finally, we propose an ensemble-based uncertainty quantification approach showing that prediction variance correlates with prediction error ($pDC_{50}$: Spearman $\rho=0.36$, p<0.001; Dmax: $\rho=0.69$, p<0.001), enabling confidence-aware experimental prioritization. Our findings challenge assumptions about specialized architectures for degradation prediction and provide evidence-based guidance for ML-driven PROTAC assessment.
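
The abstract's uncertainty result is a Spearman rank correlation between prediction variance and prediction error. A minimal, dependency-free sketch of that statistic follows; the helper names and the toy numbers are assumptions for illustration, and a statistics library would normally be used instead.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed inputs.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-compound ensemble variances and absolute errors.
toy_variance = [0.10, 0.40, 0.20, 0.90]
toy_abs_error = [0.05, 0.50, 0.30, 1.20]
print(spearman(toy_variance, toy_abs_error))
```

A positive ρ means higher-variance predictions tend to carry larger errors, which is the property that makes variance usable as a confidence signal.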

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the TACK dataset of 3,514 PROTACs and 6,561 degradation endpoints aggregated from three repositories with standardized representations. It evaluates XGBoost, MLP, and PROTAC-STAN via scaffold-based 5×5 cross-validation for pDC50 and Dmax regression plus binary activity classification, reporting that classical methods outperform the GNN (ROC-AUC 0.85 vs. 0.74, p<0.001), potency is more predictable than Dmax (R²=0.66 vs. 0.36), and ensemble uncertainty correlates with error.

Significance. If the data aggregation is robust, the work supplies concrete statistical evidence that feature engineering and classical models can outperform specialized graph architectures for PROTAC degradation prediction, directly challenging prevailing assumptions. The scaffold-based CV, reported R²/AUC values with significance tests, and uncertainty quantification add practical value for prioritizing experiments in this therapeutic area.

major comments (2)
  1. [Data aggregation and standardization (Abstract and Methods)] The headline claims (XGBoost/MLP ROC-AUC 0.85 vs. PROTAC-STAN 0.74; pDC50 R²=0.66 vs. Dmax R²=0.36) rest on the assumption that the 6,561 aggregated endpoints are directly comparable after standardization of molecular representations, protein annotations, and experimental conditions. Degradation readouts such as DC50 and Dmax are known to vary with cell line, assay format, incubation time, and detection method. Without explicit details on exclusion criteria, source-specific bias checks, or sensitivity analyses, residual protocol effects could systematically inflate predictability differences and favor classical models that exploit offsets over structure-aware GNNs. This is load-bearing for the central model-comparison and feature-ablation conclusions.
  2. [Results, model comparison subsection] The statistical tests supporting classical methods over PROTAC-STAN are reported with p<0.001, but the manuscript should specify whether the GNN received identical feature sets (including cellular context and protein representations) or was evaluated under its native graph-only regime. If the latter, the performance gap may reflect input differences rather than architecture, weakening the claim that 'architectural sophistication' is less important than feature engineering.
minor comments (2)
  1. [Abstract] Notation inconsistency: the abstract refers to DC50 regression yet reports pDC50 R²; ensure consistent use of pDC50 throughout.
  2. [Methods] Clarify implementation details of the scaffold-based 5×5 cross-validation to confirm no leakage between train and test folds across the three tasks.
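
On the notation point in minor comment 1: DC50 and pDC50 are related by the usual negative-log (p-scale) transform of a molar concentration. The sketch below assumes DC50 is reported in nanomolar and that pDC50 = -log10(DC50 in mol/L); the page does not state the paper's own unit handling, and the function name is hypothetical.

```python
import math


def pdc50_from_nanomolar(dc50_nm):
    """Convert a DC50 in nM to pDC50 = -log10(DC50 in mol/L) = 9 - log10(DC50 in nM)."""
    if dc50_nm <= 0:
        raise ValueError("DC50 must be a positive concentration")
    return 9.0 - math.log10(dc50_nm)


for dc50_nm in (1.0, 100.0, 1000.0):
    print(dc50_nm, pdc50_from_nanomolar(dc50_nm))
```

Because the scale is logarithmic, a one-unit increase in pDC50 corresponds to a tenfold lower DC50, so regression metrics on pDC50 and on raw DC50 are not interchangeable.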

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. These have prompted us to clarify key aspects of the data aggregation process and model evaluation protocols. We address each major comment point by point below, indicating revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Data aggregation and standardization (Abstract and Methods)] The headline claims (XGBoost/MLP ROC-AUC 0.85 vs. PROTAC-STAN 0.74; pDC50 R²=0.66 vs. Dmax R²=0.36) rest on the assumption that the 6,561 aggregated endpoints are directly comparable after standardization of molecular representations, protein annotations, and experimental conditions. Degradation readouts such as DC50 and Dmax are known to vary with cell line, assay format, incubation time, and detection method. Without explicit details on exclusion criteria, source-specific bias checks, or sensitivity analyses, residual protocol effects could systematically inflate predictability differences and favor classical models that exploit offsets over structure-aware GNNs. This is load-bearing for the central model-comparison and feature-ablation conclusions.

    Authors: We agree that explicit documentation of aggregation choices is essential to support the central claims. The original Methods section describes standardization of molecular representations (canonical SMILES via RDKit), protein targets (UniProt mapping), and activity values (conversion to pDC50 where feasible). To directly address the referee's concern, we have added a dedicated subsection on data curation that specifies exclusion criteria (e.g., removal of duplicates, entries lacking cell-line metadata, or inconsistent assay units) and source-specific bias diagnostics. We also include a sensitivity analysis in the revised Supplementary Information comparing model performance on single-repository subsets versus the aggregated set; the relative ordering of methods and the potency-vs-Dmax predictability gap remain consistent. These additions confirm that residual protocol variation does not drive the reported differences. revision: yes

  2. Referee: [Results, model comparison subsection] The statistical tests supporting classical methods over PROTAC-STAN are reported with p<0.001, but the manuscript should specify whether the GNN received identical feature sets (including cellular context and protein representations) or was evaluated under its native graph-only regime. If the latter, the performance gap may reflect input differences rather than architecture, weakening the claim that 'architectural sophistication' is less important than feature engineering.

    Authors: We appreciate the referee's clarification request. PROTAC-STAN was evaluated strictly in its native graph-only regime as defined in the original publication, using only the molecular graph without cellular-context or protein-sequence features. This choice reflects how the model is typically deployed and allows a direct comparison between a specialized GNN architecture and classical models that incorporate the engineered feature set (including cellular context and simple protein annotations) described in our ablation study. We have revised the Results and Discussion sections to state this input-regime difference explicitly and have added a sentence noting that future work could explore augmenting graph models with the same contextual features. The performance gap therefore illustrates the practical value of feature engineering in the current PROTAC modeling landscape rather than a pure architectural comparison. revision: yes

Circularity Check

0 steps flagged

Empirical ML evaluation on held-out splits shows no circularity

Full rationale

The paper conducts a purely empirical statistical evaluation of ML models for PROTAC degradation prediction. It aggregates data from repositories, applies standardization, performs scaffold-based 5×5 cross-validation, conducts feature ablation, and reports direct performance metrics (ROC-AUC, R²) plus statistical tests on held-out data. No derivation chain, equations, or self-referential quantities are described; central claims rest on external data splits and comparisons rather than reducing to fitted parameters or self-citations by construction. The methodology is self-contained against benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the assumption that standardized aggregation from heterogeneous repositories produces unbiased labels and that scaffold splitting sufficiently controls for molecular similarity; no new physical entities are postulated.

free parameters (1)
  • XGBoost and MLP hyperparameters
    Standard ML training choices that affect reported R² and AUC values but are not enumerated in the abstract.
axioms (1)
  • domain assumption: Standardized molecular representations and protein annotations from three repositories accurately capture experimental degradation activity without source-specific biases.
    Invoked when combining data for the 6,561 endpoints and when claiming feature ablation results generalize.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2623–2631
  2. [2] Jeremy R. Ash, Cas Wognum, Raquel Rodríguez-Pérez, Matteo Aldeghi, Alan C. Cheng, Djork-Arné Clevert, Ola Engkvist, Cheng Fang, Daniel J. Price, Jacqueline M. Hughes-Oliver, and W. Patrick Walters. 2025. Practically Significant Method Comparison Protocols for Machine Learning in Small Molecule Drug Discovery. Journal of Chemical Information and Modeling 6...
  3. [3] Amos Bairoch. 2018. The Cellosaurus, a cell-line knowledge resource. Journal of Biomolecular Techniques 29, 2 (2018), 25–38. doi:10.7171/jbt.18-2902-002
  4. [4] Miklós Békés, David R Langley, and Craig M Crews. 2022. PROTAC targeted protein degraders: the past is prologue. Nature Reviews Drug Discovery 21, 3 (2022), 181–200
  5–6. [5, 6] Guy W. Bemis and Mark A. Murcko. 1996. The Properties of Known Drugs. 1. Molecular Frameworks. Journal of Medicinal Chemistry 39, 15 (Jan. 1996), 2887–2893. doi:10.1021/jm9602928 Publisher: American Chemical Society
  7. [7] Yoav Benjamini and Yosef Hochberg. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 1 (1995), 289–300
  8. [8] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems 24 (2011)
  9. [9] Hong Cai, Gengyuan Yao, Yulong Shi, Tianyi Zhang, and Yuanjia Hu. 2025. PROTAC-PatentDB: A PROTAC patent compound dataset. Scientific Data 12, 1 (2025), 1840
  10. [10] Annabel Cardno, Bryony Kennedy, and Catherine Lindon. 2025. Cellular parameters shaping pathways of targeted protein degradation. Communications Biology 8, 1 (2025), 691
  11. [11] Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. 2004. Ensemble selection from libraries of models. In Proceedings of the Twenty-First International Conference on Machine Learning (Banff, Alberta, Canada) (ICML ’04). Association for Computing Machinery, New York, NY, USA, 18. doi:10.1145/1015330.1015432
  12. [12] Zhenglu Chen, Chunbin Gu, Shuoyan Tan, Xiaorui Wang, Yuquan Li, Mutian He, Ruiqiang Lu, Shijia Sun, Chang-Yu Hsieh, Xiaojun Yao, Huanxiang Liu, and Pheng-Ann Heng. 2025. Interpretable PROTAC degradation prediction with structure-informed deep ternary attention framework. Advanced Science 12, 47 (2025), e08138. doi:10.1002/advs.202508138
  13. [13] Deborah Chirnomas, Keith R Hornberger, and Craig M Crews. 2023. Protein degraders enter the clinic—a new approach to cancer therapy. Nature Reviews Clinical Oncology 20, 4 (2023), 265–278
  14. [14] Michael L Drummond and Christopher I Williams. 2019. In silico modeling of PROTAC-mediated ternary complexes. Journal of Chemical Information and Modeling 59, 4 (2019), 1634–1644
  15. [15] Nils Dunlop, Francisco Erazo, Farzaneh Jalalypour, and Rocío Mercado. 2025. Predicting PROTAC-mediated ternary complexes with AlphaFold3 and Boltz-1. Digital Discovery 4, 12 (2025), 3782–3809
  16. [16] Milton Friedman. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Assoc. 32, 200 (1937), 675–701. doi:10.1080/01621459.1937.10503522
  17. [17] Jingxuan Ge, Shimeng Li, Gaoqi Weng, Huating Wang, Meijing Fang, Huiyong Sun, Yafeng Deng, Chang-Yu Hsieh, Dan Li, and Tingjun Hou. 2025. PROTAC-DB 3.0: an updated database of PROTACs with extended pharmacokinetic parameters. Nucleic Acids Research 53, D1 (2025), D1510–D1515
  18. [18] Yossra Gharbi and Rocío Mercado. 2024. A comprehensive review of emerging approaches in machine learning for de novo PROTAC design. Digital Discovery 3, 11 (2024), 2158–2176
  19. [19] Warren Gilchrist. 2000. Statistical modelling with quantile functions. Chapman and Hall/CRC
  20. [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV) (ICCV ’15). IEEE Computer Society, USA, 1026–1034. doi:10.1109/ICCV.2015.123
  21. [21] Dejun Jiang, Zhenxing Wu, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Zhe Wang, Chao Shen, Dongsheng Cao, Jian Wu, and Tingjun Hou. 2021. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. Journal of Cheminformatics 13, 1 (2021), 12
  22. [22] Jin-Hwa Kim, Jaehyun Jun, and Byoung-Tak Zhang. 2018. Bilinear Attention Networks. In Advances in Neural Information Processing Systems 31. 1571–1581
  23. [23] Gregory Landrum. 2024. RDKit: Open-source Cheminformatics. https://www.rdkit.org Accessed: 2026-02-01
  24. [24] Howard Levene. 1960. Robust tests for equality of variances. Contributions to Probability and Statistics (1960), 278–292
  25. [25] Fenglei Li, Qiaoyu Hu, Xianglei Zhang, Renhong Sun, Zhuanghua Liu, Sanan Wu, Siyuan Tian, Xinyue Ma, Zhizhuo Dai, Xiaobao Yang, et al. 2022. DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs. Nature Communications 13, 1 (2022), 7133
  26. [26] Nir London and Jaime Prilusky. 2024. PROTACpedia. https://protacpedia.weizmann.ac.il. Accessed: 2026-02-01
  27. [27] Divya Nori, Connor W. Coley, and Rocío Mercado. 2022. De novo PROTAC design using graph-based deep generative models. In Proceedings of the NeurIPS 2022 Workshop on AI for Science. doi:10.48550/arXiv.2211.02660
  28. [28] Maja Pavlovic. 2025. Understanding Model Calibration: A gentle introduction and visual exploration of calibration and the expected calibration error (ECE). arXiv:2501.19047 [stat.ME] https://arxiv.org/abs/2501.19047
  29. [29] Andrew Pike, Beth Williamson, Stephanie Harlfinger, Scott Martin, and Dermot F McGinnity. 2020. Optimising proteolysis-targeting chimeras (PROTACs) for oral drug delivery: a drug metabolism and pharmacokinetics perspective. Drug Discovery Today 25, 10 (2020), 1793–1800
    (Page header of the source paper, captured inside this entry: KDD AI for Sciences Track ’26, Aug 09–13, 2026, Jeju, Korea · Ribes et al.)
  30. [30] Xinran Qin, Yinpeng Zhang, Yajunzi Wang, Yintao Zhang, Jiachen Jing, Yuyuan Zhang, Gaoxiang Xu, Haoping Teng, Tianjun Wang, Lei Fu, et al. 2026. TPDdb: the comprehensive database of targeted protein degrader. Nucleic Acids Research 54, D1 (2026), D1683–D1691
  31. [31] Stefano Ribes, Eva Nittinger, Christian Tyrchan, and Rocío Mercado. 2024. Modeling PROTAC degradation activity with machine learning. Artificial Intelligence in the Life Sciences 6 (2024), 100104. doi:10.1016/j.ailsci.2024.100104
  32. [32] Kathleen M Sakamoto, Kyung B Kim, Akiko Kumagai, Frank Mercurio, Craig M Crews, and Raymond J Deshaies. 2001. Protacs: Chimeric molecules that target proteins to the Skp1–Cullin–F box complex for ubiquitination and degradation. Proceedings of the National Academy of Sciences 98, 15 (2001), 8554–8559
  33–34. [33, 34] Ashley R Schneekloth, Mathieu Pucheault, Hyun Seop Tae, and Craig M Crews. 2008. Targeted intracellular protein degradation induced by a small molecule: En route to chemical proteomics. Bioorganic & Medicinal Chemistry Letters 18, 22 (2008), 5904–5908
  35. [35] The UniProt Consortium. 2025. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Research 53, D1 (2025), D609–D617. doi:10.1093/nar/gkae1010
  36. [36] John W Tukey. 1949. Comparing individual means in the analysis of variance. Biometrics (1949), 99–114
  37. [37] Derek van Tilborg and Francesca Grisoni. 2024. Traversing chemical space with active deep learning for low-data drug discovery. Nature Computational Science 4, 10 (2024), 786–796
  38. [38] Gaoqi Weng, Xuanyan Cai, Dongsheng Cao, Hongyan Du, Chao Shen, Yafeng Deng, Qiaojun He, Bo Yang, Dan Li, and Tingjun Hou. 2023. PROTAC-DB 2.0: an updated database of PROTACs. Nucleic Acids Research 51, D1 (2023), D1367–D1372
  39. [39] R. F. Woolson. 2005. Wilcoxon Signed-Rank Test. John Wiley & Sons, Ltd. https://onlinelibrary.wiley.com/doi/pdf/10.1002/0470011815.b2a15177 doi:10.1002/0470011815.b2a15177
  40. [40] Daniel Zaidman, Jaime Prilusky, and Nir London. 2020. PRosettaC: Rosetta based modeling of PROTAC mediated ternary complexes. Journal of Chemical Information and Modeling 60, 9 (2020), 4467–4480
  41. [41] Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, and Jian Tang. 2024. Structure-Informed Protein Language Model. arXiv:2402.05856 doi:10.48550/arXiv.2402.05856
    (Appendix text captured inside this entry) A. Model Details: We developed a multilayer perceptron (MLP) architecture with flexible depth and width. The MLP consists of a variable number of hidden layers with dimensi...
  42. [42] (Figure residue, not a reference) Caption fragment: "The top-left-most model indicates the best performing model for a given metric." Panels: histograms (Value vs. Count) and Q-Q plots (Theoretical quantiles vs. Ordered Values) for ROC-AUC and PR-AUC ...