I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models

Barbara Tarantino; Gennaro Auricchio; Paolo Giudici

arxiv: 2605.21731 · v1 · pith:4VVZE3IUnew · submitted 2026-05-20 · 💻 cs.LG

Pith reviewed 2026-05-22 09:52 UTC · model grok-4.3

keywords: scientific AI auditing · Wasserstein distance · drug-target interaction · distributional coherence · structural perturbations · post-hoc evaluation · model interpretability

The pith

I-SAFE auditing reveals different distributional profiles in DTI models with similar accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the I-SAFE framework to audit black-box scientific AI models by measuring coherence of their output distributions under perturbations guided by an external structural prior. It defines three metrics: a quantile-based measure for location shifts, the Wasserstein Coherence Metric for ordinal consistency, and a translation-invariant version for distributional shape. The approach matters because benchmark accuracy alone cannot distinguish models that capture domain-relevant structure from those that exploit shortcuts or biases. When applied to three sequence-based drug-target interaction models on the Davis benchmark, the audit detects substantially different response profiles that accuracy scores do not reveal.

Core claim

Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric for location-level coherence, the Wasserstein Coherence Metric for ordinal coherence, and a translation-invariant WCM variant for shape coherence. Instantiated on drug-target interaction prediction using the Davis kinase benchmark, KLIFS binding-pocket annotations, and three models, the framework shows that models with comparable predictive performance can,

What carries the argument

Wasserstein Coherence Metric that quantifies ordinal and shape coherence of model output distributions under perturbations derived from the external structural prior.
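
The page does not reproduce the paper's formal definitions, so the following is only a minimal sketch of the three ingredients named above, under stated assumptions: an empirical one-dimensional Wasserstein-1 distance between baseline and perturbed outputs, a mean-centered variant that ignores pure translations, and a median shift as a stand-in for a quantile-based location measure. The function names and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples the optimal coupling matches sorted values,
    so the distance is the mean absolute gap between order statistics.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return mean([abs(x - y) for x, y in zip(sorted(a), sorted(b))])


def centered_wasserstein_1d(a, b):
    """Shift-insensitive variant (assumption): center each sample at its mean first."""
    mean_a, mean_b = mean(a), mean(b)
    return wasserstein_1d([x - mean_a for x in a], [y - mean_b for y in b])


def median_shift(a, b):
    """Location shift between the two samples, measured at the median."""
    return median(b) - median(a)


# Hypothetical model outputs before and after a perturbation.
baseline = [5.0, 5.5, 6.0, 6.5, 7.0]
shifted = [x + 1.0 for x in baseline]   # pure translation of the outputs
stretched = [4.0, 5.0, 6.0, 7.0, 8.0]   # same center, wider spread

print(wasserstein_1d(baseline, shifted))
print(centered_wasserstein_1d(baseline, shifted))
print(centered_wasserstein_1d(baseline, stretched))
print(median_shift(baseline, shifted))
```

In this toy case the plain distance reacts to the translation while the centered variant does not, and only the centered variant isolates the change in spread. That separation of location from shape is the kind of distinction the three metrics are described as drawing.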

If this is right

Models can be compared and selected according to structural coherence in addition to predictive accuracy.
The audit can identify reliance on dataset-specific regularities rather than domain-relevant features.
The framework applies directly to any scientific prediction task where inputs admit structured decomposition and an external prior exists.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Combining coherence scores with standard accuracy could produce a joint ranking criterion for model deployment in scientific settings.
Low coherence on specific perturbation types might guide targeted data collection or architecture adjustments.

Load-bearing premise

The external structural prior accurately encodes task-relevant input structure that can be used to generate meaningful perturbations for the audit.

What would settle it

Running the three coherence metrics on the same set of KLIFS-guided perturbations and finding that the three DTI models produce identical or statistically indistinguishable distributional response profiles.
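
One way to make "statistically indistinguishable" concrete is an exact permutation test on per-seed metric values for two models. This is a hedged sketch, not a procedure described in the paper: the function name and the per-seed scores are hypothetical, and the difference of means is only one possible test statistic.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(scores_a, scores_b):
    """Exact two-sided permutation test on the absolute difference of means."""
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    observed = abs(mean(scores_a) - mean(scores_b))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling the pooled scores into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical coherence scores for two models, one value per seed.
model_x = [0.10, 0.12, 0.11, 0.09, 0.13]
model_y = [0.30, 0.28, 0.33, 0.31, 0.29]

print(permutation_p_value(model_x, model_y))
print(permutation_p_value(model_x, model_x))
```

With five seeds per model there are only 252 relabellings, so the smallest attainable two-sided p-value is 2/252. An audit with so few seeds can show clear separation but has little power to confirm that two profiles are the same.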

Figures

Figures reproduced from arXiv: 2605.21731 by Barbara Tarantino, Gennaro Auricchio, Paolo Giudici.

**Figure 1.** I-SAFE prior-relative coherence contrasts on the Davis benchmark: ∆QBM (a), ∆WCM (b), and ∆TI-WCM (c), computed as spurious minus mechanistic coherence. The dashed line marks no differential coherence; positive values indicate greater coherence under mechanistic perturbations. Error bars denote 95% confidence intervals across five seeds. … and ∆WCM = −0.013 ([−0.057, 0.031]), showing no comparable prior-ali…
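
The caption reports 95% confidence intervals across five seeds but does not say how they were computed. The sketch below assumes a Student-t interval on per-seed contrast values; the function name, the hard-coded critical value, and the example contrasts are illustrative assumptions, not the authors' procedure or data.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 4 degrees of freedom (five seeds).
T_CRIT_DF4 = 2.776


def mean_ci_five_seeds(values):
    """Return (mean, lower, upper) for a 95% t-interval over exactly five seeds."""
    if len(values) != 5:
        raise ValueError("the hard-coded critical value assumes five seeds")
    center = mean(values)
    half_width = T_CRIT_DF4 * stdev(values) / sqrt(5)
    return center, center - half_width, center + half_width


# Hypothetical per-seed contrasts for one model and one metric.
contrasts = [-0.02, 0.01, -0.03, 0.00, -0.01]
center, lower, upper = mean_ci_five_seeds(contrasts)
print(center, lower, upper)

# An interval that straddles zero crosses the dashed "no differential
# coherence" line described in the caption.
print(lower < 0.0 < upper)
```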

Original abstract

Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features, dataset-specific regularities, or distributional biases that are predictive on held-out data but not aligned with domain-relevant structure. To address this limitation, we introduce the I-SAFE (Interventional Secure, Accurate, Fair and Explainable) framework, a post-hoc distributional auditing framework for scientific AI models centered on the Wasserstein Coherence Metric (WCM). Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric (QBM) for location-level coherence, the WCM for ordinal coherence, and a translation-invariant WCM variant for shape coherence. We instantiate I-SAFE on drug–target interaction (DTI) prediction using the Davis kinase benchmark, KLIFS (Kinase–Ligand Interaction Fingerprints and Structures) binding-pocket annotations, and three sequence-based DTI models: DeepConvDTI, DeepDTA, and TAPB. Although the models operate in a comparable predictive regime, I-SAFE reveals substantially different distributional response profiles, a distinction invisible to accuracy-based evaluation. The framework is model-agnostic and applicable to any domain where inputs admit a structured decomposition and an external prior is available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

I-SAFE gives a workable post-hoc audit for DTI models via Wasserstein metrics on KLIFS-guided perturbations, but the claim that it isolates real structural coherence rather than architecture-specific shortcuts still needs tighter controls.

The main takeaway is that this paper offers a concrete way to go beyond accuracy numbers when checking whether drug-target models are actually picking up on binding-pocket structure. They define I-SAFE around three metrics—quantile-based location coherence, the Wasserstein Coherence Metric for ordinal behavior, and a translation-invariant version for shape—and apply them to DeepConvDTI, DeepDTA, and TAPB on the Davis benchmark with KLIFS annotations. The reported result is that the models produce visibly different output distributions under the same perturbations even though their predictive scores look similar. That observation is useful on its face because it shows accuracy alone can mask differences in how models respond to domain-relevant changes.

Referee Report

2 major / 2 minor

Summary. The paper introduces the I-SAFE framework, a post-hoc auditing method for scientific AI models that applies Wasserstein Coherence Metrics (WCM) and a Quantile-Based Metric (QBM) to evaluate output-distribution coherence under perturbations generated from an external structural prior (KLIFS binding-pocket annotations). It demonstrates the approach on three sequence-based drug-target interaction models (DeepConvDTI, DeepDTA, TAPB) trained on the Davis kinase benchmark, claiming that the models exhibit substantially different distributional response profiles despite comparable predictive accuracy.

Significance. If the central claims hold, I-SAFE offers a model-agnostic tool for detecting misalignment between model behavior and domain-relevant structure that standard accuracy metrics miss. The use of Wasserstein distances for ordinal and shape coherence, combined with an external prior, provides a concrete way to audit shortcut exploitation in scientific prediction tasks.

major comments (2)

[§3.2] §3.2 (Perturbation Generation): The paper does not report whether KLIFS-guided perturbations preserve marginal input statistics such as amino-acid composition or sequence-length distribution across the three models. Without this check, the observed differences in QBM/WCM profiles could reflect architecture-specific sensitivity to any structured input change rather than genuine misalignment with binding-pocket structure, directly undermining the central claim that I-SAFE isolates task-relevant structural coherence.
[§4.3] §4.3 (Results and Comparison): The claim that the distinctions found by I-SAFE are 'invisible' to accuracy-based evaluation requires explicit quantification of how much of the WCM/QBM separation is explained by residual correlations with non-structural features; the current presentation leaves open the possibility that the metrics are re-detecting known architecture differences rather than new scientific misalignment.
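
The marginal-statistics check requested in the first major comment can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the substitution scheme, the pocket positions, the example sequence, and the use of total variation distance are all hypothetical choices.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Relative frequency of each standard amino acid in a sequence."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


def composition_shift(original, perturbed):
    """Total variation distance between two amino-acid compositions."""
    p, q = composition(original), composition(perturbed)
    return 0.5 * sum(abs(p[aa] - q[aa]) for aa in AMINO_ACIDS)


def substitute(seq, positions, replacement="A"):
    """Replace residues at the given 0-based positions (hypothetical scheme)."""
    chars = list(seq)
    for i in positions:
        chars[i] = replacement
    return "".join(chars)


# Hypothetical sequence fragment and hypothetical pocket positions.
original = "MKVLGSGAFG"
perturbed = substitute(original, positions=[2, 3])

print(len(original) == len(perturbed))   # substitution keeps the length
print(composition_shift(original, perturbed))
```

A shift near zero across the whole perturbation set would support reading the coherence differences as a response to where the residues changed. A large shift would leave the generic-sensitivity explanation open.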

minor comments (2)

[§2.3] The definition of the translation-invariant WCM variant should include an explicit equation showing how translation invariance is enforced, to allow readers to verify it does not inadvertently remove shape information relevant to the audit.
[Figure 2] Figure 2 (Distributional response profiles): Axis labels and legend entries are too small for readability; increase font size and add a brief caption explaining the color coding for the three models.
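
For orientation on the first minor comment, one standard way to make a Wasserstein distance translation invariant is to minimize over shifts of one distribution. The block below states that construction for one-dimensional outputs. It is offered as a reference point under that assumption and is not claimed to be the definition used in the paper; the symbols $\tau_t$, $m_\mu$, and $m_\nu$ are introduced here for illustration.

```latex
% Shift map and translation-invariant distance (generic construction)
\tau_t(y) = y + t, \qquad
\mathrm{TI}\text{-}W_p(\mu,\nu) = \inf_{t \in \mathbb{R}} W_p\bigl(\mu,\ (\tau_t)_{\#}\nu\bigr).

% In one dimension the Wasserstein distance has a quantile form
W_p^p(\mu,\nu) = \int_0^1 \bigl| F_\mu^{-1}(u) - F_\nu^{-1}(u) \bigr|^p \, du .

% For p = 2 the optimal shift aligns the means m_\mu and m_\nu, giving
\inf_{t \in \mathbb{R}} W_2^2\bigl(\mu,\ (\tau_t)_{\#}\nu\bigr)
  = W_2^2(\mu,\nu) - (m_\mu - m_\nu)^2 .
```

Under this construction only the location component is removed. Differences in spread, skew, or modality still contribute, which is the property the referee asks the authors to make verifiable.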

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. These observations help clarify how to better isolate the contribution of structural priors in the I-SAFE framework. We respond to each major comment below and indicate the revisions we will make.

Referee: [§3.2] §3.2 (Perturbation Generation): The paper does not report whether KLIFS-guided perturbations preserve marginal input statistics such as amino-acid composition or sequence-length distribution across the three models. Without this check, the observed differences in QBM/WCM profiles could reflect architecture-specific sensitivity to any structured input change rather than genuine misalignment with binding-pocket structure, directly undermining the central claim that I-SAFE isolates task-relevant structural coherence.

Authors: We agree that an explicit check on marginal input statistics would strengthen the interpretation. In the revised manuscript we will add a supplementary table and brief analysis comparing amino-acid composition and sequence-length distributions between the original sequences and the KLIFS-guided perturbations for each of the three models. The perturbations are constructed by targeted residue substitutions within the binding-pocket regions annotated by KLIFS; because the changes are localized and the overall sequence length is unchanged, we expect the marginals to remain largely preserved. Including this verification will directly address the concern that the observed coherence differences could arise from generic sensitivity to any input modification. revision: yes
Referee: [§4.3] §4.3 (Results and Comparison): The claim that the distinctions found by I-SAFE are 'invisible' to accuracy-based evaluation requires explicit quantification of how much of the WCM/QBM separation is explained by residual correlations with non-structural features; the current presentation leaves open the possibility that the metrics are re-detecting known architecture differences rather than new scientific misalignment.

Authors: We acknowledge that a quantitative separation from non-structural factors would make the claim more robust. In the revision we will add a short analysis (new panel or appendix) that reports partial correlations and a simple regression of the WCM/QBM scores against a set of non-structural covariates (model depth, embedding dimension, and basic sequence statistics). This will allow readers to see the fraction of metric separation that remains after controlling for these factors. We maintain that the primary distinction arises from differential sensitivity to the KLIFS structural prior, but the added quantification will clarify the extent to which architecture-specific traits contribute. revision: yes

Circularity Check

0 steps flagged

No circularity: I-SAFE metrics defined directly from external prior and Wasserstein distances

The paper defines the Quantile-Based Metric (QBM), Wasserstein Coherence Metric (WCM), and its translation-invariant variant explicitly as functions of raw model outputs under perturbations generated from the independent KLIFS binding-pocket annotations. These definitions rely on standard Wasserstein distance applied to the resulting output distributions and do not reduce to fitted parameters, self-referential quantities, or prior results by the same authors. The central empirical claim—that the three DTI models exhibit distinct distributional response profiles despite comparable accuracy—is an observation obtained by applying the externally defined metrics, not a tautology. No self-citation chains, uniqueness theorems, or smuggled ansatzes appear in the load-bearing steps of the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based on abstract only; framework rests on domain assumption that structural priors are valid and introduces new metrics without explicit free parameters or invented physical entities.

axioms (1)

domain assumption: External structural prior encodes domain knowledge about task-relevant input structure.
Invoked when using KLIFS annotations to guide perturbations in the I-SAFE audit.

invented entities (1)

Wasserstein Coherence Metric (WCM) · no independent evidence
purpose: Quantify ordinal and shape coherence of model output distributions under structural perturbations.
New metric family introduced as core of the auditing framework.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Wasserstein Coherence Metric (WCM) defined via optimal transport reordering of output profiles under mechanistic vs spurious perturbations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Goodfellow, Moritz Hardt, and Been Kim

Julius Adebayo, Justin Gilmer, Michael Muelly, Ian J. Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. InAdvances in Neural Information Processing 11 Systems, volume 31, pages 9525–9536, 2018

work page 2018

[2] [2]

T. W. Anderson. On the distribution of the two-sample Cramér–von Mises criterion.The Annals of Mathematical Statistics, 33(3):1148–1159, 1962

work page 1962

[3] [3]

Invariant Risk Minimization

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization.arXiv preprint arXiv:1907.02893, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907

[4] [4]

On rank graduation metrics for high-dimensional ordinal data.Mathematical Models and Methods in Applied Sciences, pages 1–35, 2026

Gennaro Auricchio, Adelaide Emma Bernardelli, Paolo Giudici, and Giuseppe Toscani. On rank graduation metrics for high-dimensional ordinal data.Mathematical Models and Methods in Applied Sciences, pages 1–35, 2026

work page 2026

[5] [5]

The equivalence of fourier-based and wasserstein metrics on imaging problems

Gennaro Auricchio, Andrea Codegoni, Stefano Gualandi, Giuseppe Toscani, and Marco Veneroni. The equivalence of fourier-based and wasserstein metrics on imaging problems. Rendiconti Lincei, 31(3):627–649, 2020

work page 2020

[6] [6]

A rank graduation box for safe ai.Expert systems with applications, 259:125239, 2025

Golnoosh Babaei, Paolo Giudici, and Emanuela Raffinetti. A rank graduation box for safe ai.Expert systems with applications, 259:125239, 2025

work page 2025

[7] [7]

On the composition of elementary errors.Scandinavian Actuarial Journal, 1928(1):13–74, 1928

Harald Cramér. On the composition of elementary errors.Scandinavian Actuarial Journal, 1928(1):13–74, 1928

work page 1928

[8] [8]

Davis, Jeremy P

Mindy I. Davis, Jeremy P. Hunt, Sanna Herrgard, Pietro Ciceri, Lisa M. Wodicka, Gabriel Pallares, Michael Hocker, Daniel K. Treiber, and Patrick P. Zarrinkar. Comprehensive analysis of kinase inhibitor selectivity.Nature Biotechnology, 29(11):1046–1051, 2011

work page 2011

[9] [9]

Towards A Rigorous Science of Interpretable Machine Learning

Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning.arXiv preprint arXiv:1702.08608, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Causal abstractions of neural networks

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. Causal abstractions of neural networks. InAdvances in Neural Information Processing Systems, volume 34, pages 9574–9586, 2021

work page 2021

[11] [11]

Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, and Noah D. Goodman. Finding alignments between interpretable causal variables and distributed neural repre- sentations. InProceedings of the Third Conference on Causal Learning and Reasoning, volume 236 ofProceedings of Machine Learning Research, pages 160–187, 2024

work page 2024

[12] [12]

Zemel, Wieland Brendel, Matthias Bethge, and Felix A

Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard S. Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. Shortcut learning in deep neural networks.Nature Machine Intelligence, 2:665–673, 2020

work page 2020

[13] [13]

Resolving data bias improves generalization in binding affinity prediction.Nature Machine Intelligence, 7(10):1713–1725, 2025

Dennis Graber, Patrick Stockinger, Fabian Meyer, Siddharth Mishra, Christopher Horn, and Rebecca Buller. Resolving data bias improves generalization in binding affinity prediction.Nature Machine Intelligence, 7(10):1713–1725, 2025

work page 2025

[14] Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik. Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. In Advances in Neural Information Processing Systems, 2021. Datasets and Benchmarks Track

[15] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Mądry. Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems, volume 32, 2019

[16] Georgi K. Kanev, Chris de Graaf, Bart A. Westerman, Iwan J. P. de Esch, and Albert J. Kooistra. KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Research, 49(D1):D562–D569, 2021

[17] Albert J. Kooistra, Georgi K. Kanev, Oscar P. J. van Linden, Rob Leurs, Iwan J. P. de Esch, and Chris de Graaf. KLIFS: a structural kinase–ligand interaction database. Nucleic Acids Research, 44(D1):D365–D371, 2016

[18] Ingoo Lee, Jongsoo Keum, and Hojung Nam. DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLOS Computational Biology, 15(6):e1007129, 2019

[19] Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian D. Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zeiler, Dan Jurafsky, Tatsunori Hashimoto, Peter Hende... 2023

[20] Guanxing Lin, Xinyi Zhang, Zhen Ren, Quan Zou, Prayag Tiwari, Cheng Zhou, and Yi Ding. TAPB: an interventional debiasing framework for alleviating target prior bias in drug–target interaction prediction. Nature Communications, 16:10867, 2025

[21] Mohammad Lotfollahi, Anna Klimovskaia Susmelj, Carlo De Donno, et al. Predicting cellular responses to complex perturbations in high-throughput screens. Molecular Systems Biology, 19:e11517, 2023

[22] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, volume 30, pages 4766–4777, 2017

[23] Andrea Mastropietro, Giuseppe Pasculli, and Jürgen Bajorath. Learning characteristics of graph neural networks predicting protein–ligand affinities. Nature Machine Intelligence, 5:1427–1436, 2023

[24] Hakime Öztürk, Arzucan Özgür, and Elif Ozkirimli. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics, 34(17):i821–i829, 2018

[25] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009

[26] Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B, 78(5):947–1012, 2016

[27] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019

[28] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016

[29] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2014

[30] Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research, 2023

[31] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, 2017

[32] Barbara Tarantino, Sun Kim, Yijingxiu Lu, and Paolo Giudici. Isaac: Auditing causal reasoning in deep models for drug-target interaction, 2026

[33] Derek van Tilborg, Alisa Alenicheva, and Francesca Grisoni. Exposing the limitations of molecular machine learning with activity cliffs. Journal of Chemical Information and Modeling, 62(23):5938–5951, 2022

[34] Cédric Villani. Optimal Transport: Old and New. Springer, Berlin, 2009

[35] Izhar Wallach and Abraham Heifets. Most ligand-based classification benchmarks reward memorization rather than generalization. Journal of Chemical Information and Modeling, 58(5):916–932, 2018

[36] Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Ryan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, and Bo Li. DecodingTrust: a comprehensive assessment of trustworthiness in GPT models. In Advances in Neural Informati... 2023

[37] Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018

[38] Xiao Xu, Robert Lawrence, Kumar Dubey, Ayush Pandey, Ryo Ueno, Fabian Falck, Aditya V. Nori, Rishabh Sharma, Abhay Sharma, and Javier González. RE-IMAGINE: symbolic benchmark synthesis for reasoning evaluation. In Proceedings of the 42nd International Conference on Machine Learning, Proceedings of Machine Learning Research, 2025

[39] Bin Yu and Karl Kumbier. Veridical data science. Proceedings of the National Academy of Sciences, 117(8):3920–3929, 2020

[40] Meng Zhang, Keng Kiat Goh, Ping Zhang, Jingwei Sun, Ronald Lok Xin, and Huan Zhang. LLMScan: causal scan for LLM misbehavior detection. In Proceedings of the 42nd International Conference on Machine Learning, Proceedings of Machine Learning Research, 2025

A Appendix

In this appendix we report the missing proof and all the technical discussion omitted f...