I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models
Pith reviewed 2026-05-22 09:52 UTC · model grok-4.3
The pith
I-SAFE auditing reveals different distributional profiles in DTI models with similar accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric for location-level coherence, the Wasserstein Coherence Metric for ordinal coherence, and a translation-invariant WCM variant for shape coherence. Instantiated on drug-target interaction prediction using the Davis kinase benchmark, KLIFS binding-pocket annotations, and three models, the framework shows that models with comparable predictive performance can exhibit substantially different distributional response profiles.
What carries the argument
Wasserstein Coherence Metric that quantifies ordinal and shape coherence of model output distributions under perturbations derived from the external structural prior.
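The three kinds of coherence named above (location, transport-based, shape) can be illustrated with a minimal sketch. This is not the paper's definition of QBM or WCM, which this page does not reproduce; the function names `median_shift`, `wasserstein_1d`, and `centered_wasserstein_1d` are hypothetical stand-ins, and the sketch assumes equal-size samples of scalar model outputs before and after a perturbation.

```python
from statistics import mean, median

def median_shift(baseline, perturbed):
    """Location-level summary: how far the median output moved (assumed stand-in for a quantile-based metric)."""
    return median(perturbed) - median(baseline)

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal sizes, the optimal transport plan pairs the sorted values.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return mean(abs(a - b) for a, b in zip(sorted(x), sorted(y)))

def centered_wasserstein_1d(x, y):
    """Translation-invariant variant (assumption): recentre both samples to mean zero first."""
    mx, my = mean(x), mean(y)
    return wasserstein_1d([a - mx for a in x], [b - my for b in y])

# Hypothetical outputs: the perturbation shifts every prediction by 1.0
baseline = [5.0, 5.5, 6.0, 6.5]
shifted = [v + 1.0 for v in baseline]
print(median_shift(baseline, shifted))             # location moved
print(wasserstein_1d(baseline, shifted))           # transport cost sees the shift
print(centered_wasserstein_1d(baseline, shifted))  # shape is unchanged
```

A pure shift moves the first two quantities and leaves the centred distance at zero, which is the sense in which a translation-invariant variant isolates changes in shape.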
If this is right
- Models can be compared and selected according to structural coherence in addition to predictive accuracy.
- The audit can identify reliance on dataset-specific regularities rather than domain-relevant features.
- The framework applies directly to any scientific prediction task where inputs admit structured decomposition and an external prior exists.
Where Pith is reading between the lines
- Combining coherence scores with standard accuracy could produce a joint ranking criterion for model deployment in scientific settings.
- Low coherence on specific perturbation types might guide targeted data collection or architecture adjustments.
Load-bearing premise
The external structural prior accurately encodes task-relevant input structure that can be used to generate meaningful perturbations for the audit.
What would settle it
Running the three coherence metrics on the same set of KLIFS-guided perturbations and finding that the three DTI models produce identical or statistically indistinguishable distributional response profiles.
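One way to make "statistically indistinguishable" concrete is a permutation test on the distance between two models' response distributions. The sketch below is an assumption about how such a check could be run, not a procedure taken from the paper; `permutation_p_value` is a hypothetical name, and the inputs are assumed to be equal-size lists of scalar output changes collected on the same perturbation set.

```python
import random
from statistics import mean

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance for equal-size 1-D samples (sorted pairing)."""
    return mean(abs(a - b) for a, b in zip(sorted(x), sorted(y)))

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """P-value for the null hypothesis that x and y come from the same distribution."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    observed = wasserstein_1d(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel responses at random
        if wasserstein_1d(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_permutations + 1)

# Hypothetical response profiles for two models
model_a = [0.1 * i for i in range(20)]
model_b = [v + 10.0 for v in model_a]
print(permutation_p_value(model_a, model_b))
```

A large p-value across all three metrics and all model pairs would be the outcome that undercuts the paper's central observation.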
Original abstract
Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features, dataset-specific regularities, or distributional biases that are predictive on held-out data but not aligned with domain-relevant structure. To address this limitation, we introduce the I-SAFE (Interventional Secure, Accurate, Fair and Explainable) framework, a post-hoc distributional auditing framework for scientific AI models centered on the Wasserstein Coherence Metric (WCM). Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric (QBM) for location-level coherence, the WCM for ordinal coherence, and a translation-invariant WCM variant for shape coherence. We instantiate I-SAFE on drug-target interaction (DTI) prediction using the Davis kinase benchmark, KLIFS (Kinase-Ligand Interaction Fingerprints and Structures) binding-pocket annotations, and three sequence-based DTI models: DeepConvDTI, DeepDTA, and TAPB. Although the models operate in a comparable predictive regime, I-SAFE reveals substantially different distributional response profiles, a distinction invisible to accuracy-based evaluation. The framework is model-agnostic and applicable to any domain where inputs admit a structured decomposition and an external prior is available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the I-SAFE framework, a post-hoc auditing method for scientific AI models that applies Wasserstein Coherence Metrics (WCM) and a Quantile-Based Metric (QBM) to evaluate output-distribution coherence under perturbations generated from an external structural prior (KLIFS binding-pocket annotations). It demonstrates the approach on three sequence-based drug-target interaction models (DeepConvDTI, DeepDTA, TAPB) trained on the Davis kinase benchmark, claiming that the models exhibit substantially different distributional response profiles despite comparable predictive accuracy.
Significance. If the central claims hold, I-SAFE offers a model-agnostic tool for detecting misalignment between model behavior and domain-relevant structure that standard accuracy metrics miss. The use of Wasserstein distances for ordinal and shape coherence, combined with an external prior, provides a concrete way to audit shortcut exploitation in scientific prediction tasks.
major comments (2)
- [§3.2] §3.2 (Perturbation Generation): The paper does not report whether KLIFS-guided perturbations preserve marginal input statistics such as amino-acid composition or sequence-length distribution across the three models. Without this check, the observed differences in QBM/WCM profiles could reflect architecture-specific sensitivity to any structured input change rather than genuine misalignment with binding-pocket structure, directly undermining the central claim that I-SAFE isolates task-relevant structural coherence.
- [§4.3] §4.3 (Results and Comparison): The claim that the distinctions found by I-SAFE are invisible to accuracy-based evaluation requires explicit quantification of how much of the WCM/QBM separation is explained by residual correlations with non-structural features; the current presentation leaves open the possibility that the metrics are re-detecting known architecture differences rather than new scientific misalignment.
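The marginal-statistics check requested in the first major comment could look roughly like the following. This is a minimal sketch under stated assumptions: sequences are plain strings over the 20 standard amino acids, originals and perturbed sequences are paired by position in the lists, and the helper names are hypothetical rather than taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences):
    """Pooled relative frequency of each standard amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: counts[a] / total for a in AMINO_ACIDS}

def composition_tv_distance(originals, perturbed):
    """Total-variation distance between the two pooled compositions (0 means identical)."""
    p, q = composition(originals), composition(perturbed)
    return 0.5 * sum(abs(p[a] - q[a]) for a in AMINO_ACIDS)

def lengths_preserved(originals, perturbed):
    """True if every perturbed sequence has the same length as its original."""
    return len(originals) == len(perturbed) and all(
        len(a) == len(b) for a, b in zip(originals, perturbed)
    )

# Hypothetical toy sequences: one C -> A substitution
originals = ["ACDE", "FGHI"]
perturbed = ["AADE", "FGHI"]
print(composition_tv_distance(originals, perturbed))
print(lengths_preserved(originals, perturbed))
```

A small distance and preserved lengths would support reading the coherence differences as something other than generic sensitivity to altered input statistics, though it would not by itself rule that out.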
minor comments (2)
- [§2.3] The definition of the translation-invariant WCM variant should include an explicit equation showing how translation invariance is enforced, to allow readers to verify it does not inadvertently remove shape information relevant to the audit.
- [Figure 2] Figure 2 (Distributional response profiles): Axis labels and legend entries are too small for readability; increase font size and add a brief caption explaining the color coding for the three models.
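For the explicit equation requested in the [§2.3] comment, one standard construction is sketched below. It is an assumption about what such an equation could look like, not the paper's definition: the symbol WCM_TI is hypothetical, tau_t denotes a shift of the output axis by t, m_mu is the mean of mu, and bars denote distributions recentred to mean zero. The second identity is the usual mean/shape decomposition of the squared 2-Wasserstein distance (finite second moments assumed), which shows that for p = 2 the infimum is attained by matching means and that only the location term is removed.

```latex
% Assumed notation: \tau_t(x) = x + t; \bar\mu, \bar\nu are \mu, \nu recentred to mean zero.
\mathrm{WCM}_{\mathrm{TI}}(\mu,\nu)
  \;=\; \inf_{t \in \mathbb{R}} W_p\bigl(\mu,\ (\tau_t)_{\#}\nu\bigr),
\qquad
W_2^2(\mu,\nu) \;=\; (m_\mu - m_\nu)^2 \;+\; W_2^2(\bar\mu,\bar\nu).
```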
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. These observations help clarify how to better isolate the contribution of structural priors in the I-SAFE framework. We respond to each major comment below and indicate the revisions we will make.
- Referee: [§3.2] §3.2 (Perturbation Generation): The paper does not report whether KLIFS-guided perturbations preserve marginal input statistics such as amino-acid composition or sequence-length distribution across the three models. Without this check, the observed differences in QBM/WCM profiles could reflect architecture-specific sensitivity to any structured input change rather than genuine misalignment with binding-pocket structure, directly undermining the central claim that I-SAFE isolates task-relevant structural coherence.
Authors: We agree that an explicit check on marginal input statistics would strengthen the interpretation. In the revised manuscript we will add a supplementary table and brief analysis comparing amino-acid composition and sequence-length distributions between the original sequences and the KLIFS-guided perturbations for each of the three models. The perturbations are constructed by targeted residue substitutions within the binding-pocket regions annotated by KLIFS; because the changes are localized and the overall sequence length is unchanged, we expect the marginals to remain largely preserved. Including this verification will directly address the concern that the observed coherence differences could arise from generic sensitivity to any input modification.
Revision: yes
- Referee: [§4.3] §4.3 (Results and Comparison): The claim that the distinctions found by I-SAFE are invisible to accuracy-based evaluation requires explicit quantification of how much of the WCM/QBM separation is explained by residual correlations with non-structural features; the current presentation leaves open the possibility that the metrics are re-detecting known architecture differences rather than new scientific misalignment.
Authors: We acknowledge that a quantitative separation from non-structural factors would make the claim more robust. In the revision we will add a short analysis (new panel or appendix) that reports partial correlations and a simple regression of the WCM/QBM scores against a set of non-structural covariates (model depth, embedding dimension, and basic sequence statistics). This will allow readers to see the fraction of metric separation that remains after controlling for these factors. We maintain that the primary distinction arises from differential sensitivity to the KLIFS structural prior, but the added quantification will clarify the extent to which architecture-specific traits contribute.
Revision: yes
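The partial-correlation analysis promised in the second response can be sketched as follows. This is an illustrative assumption, not the authors' analysis: it controls for a single covariate by residualising both variables with ordinary least squares, and the names `residuals`, `pearson`, and `partial_correlation`, as well as the toy data, are hypothetical.

```python
from statistics import mean

def residuals(y, x):
    """Residuals of an ordinary least-squares fit of y on one covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length samples."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

def partial_correlation(score, structural, covariate):
    """Correlation between score and structural after removing the linear effect of covariate."""
    return pearson(residuals(score, covariate), residuals(structural, covariate))

# Hypothetical data: score = 2 * structural + 3 * covariate
structural = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # e.g. pocket-guided vs control perturbation
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # e.g. a basic sequence statistic
score = [2 * s + 3 * c for s, c in zip(structural, covariate)]
print(partial_correlation(score, structural, covariate))
```

With several covariates the same idea extends to a multiple regression; the single-covariate form is used here only to keep the sketch dependency-free.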
Circularity Check
No circularity: I-SAFE metrics defined directly from external prior and Wasserstein distances
The paper defines the Quantile-Based Metric (QBM), Wasserstein Coherence Metric (WCM), and its translation-invariant variant explicitly as functions of raw model outputs under perturbations generated from the independent KLIFS binding-pocket annotations. These definitions rely on standard Wasserstein distance applied to the resulting output distributions and do not reduce to fitted parameters, self-referential quantities, or prior results by the same authors. The central empirical claim—that the three DTI models exhibit distinct distributional response profiles despite comparable accuracy—is an observation obtained by applying the externally defined metrics, not a tautology. No self-citation chains, uniqueness theorems, or smuggled ansatzes appear in the load-bearing steps of the derivation.
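To make "perturbations generated from the independent KLIFS binding-pocket annotations" concrete, the sketch below pairs a pocket-guided substitution with a size-matched control outside the pocket. It rests on several assumptions: pocket positions are given as 0-based indices already mapped from KLIFS onto the sequence (the mapping is not shown), substitutions are drawn uniformly from the other 19 standard residues, and the control arm is an illustrative addition rather than something this page attributes to the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def substitute(sequence, positions, rng):
    """Replace the residue at each position with a different standard amino acid."""
    residues = list(sequence)
    for i in positions:
        choices = [a for a in AMINO_ACIDS if a != residues[i]]
        residues[i] = rng.choice(choices)
    return "".join(residues)

def paired_perturbations(sequence, pocket_positions, seed=0):
    """Return (pocket-guided, control) variants with the same number of substitutions."""
    rng = random.Random(seed)
    pocket = sorted(set(pocket_positions))
    pocket_set = set(pocket)
    outside = [i for i in range(len(sequence)) if i not in pocket_set]
    # Control: same number of substitutions, all outside the annotated pocket
    control = sorted(rng.sample(outside, k=len(pocket)))
    return substitute(sequence, pocket, rng), substitute(sequence, control, rng)

# Hypothetical sequence and pocket indices
seq = "MKVLAAGIVGLSRQEF"
guided, control = paired_perturbations(seq, [2, 5, 9], seed=1)
print(guided)
print(control)
```

Both variants keep the sequence length fixed, so a black-box predictor can be queried on the original, guided, and control inputs and the resulting output distributions compared.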
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: External structural prior encodes domain knowledge about task-relevant input structure
invented entities (1)
- Wasserstein Coherence Metric (WCM): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Wasserstein Coherence Metric (WCM) defined via optimal transport reordering of output profiles under mechanistic vs spurious perturbations
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.