Position: Genomic Model Research Must Move Beyond Anecdotal Evaluation of Interpretability Methods

Ke Li; Mingyu Huang; Shasha Zhou

arxiv: 2606.07607 · v1 · pith:6AXMNM2Vnew · submitted 2026-05-29 · 💻 cs.LG · q-bio.GN



Pith reviewed 2026-06-28 23:14 UTC · model grok-4.3

classification 💻 cs.LG q-bio.GN

keywords: interpretability · genomics · machine learning · transcription factor binding · benchmarking · validation · explainable AI


The pith

Genomic interpretability research relies on anecdotal success stories that different methods often contradict.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that current genomic machine learning papers typically pick one interpretability method and report only cases where it produces plausible-looking results. A benchmarking experiment on transcription factor binding models reveals that multiple common methods frequently disagree on the same predictions, fail to recover known regulatory motifs, and do not match the model's actual decision process. The authors therefore call for evaluation practices modeled on clinical trials, with requirements for consistency checks, faithfulness tests, and biological validation instead of isolated plausibility claims. They outline a tiered reporting framework to enforce this shift.

Core claim

The central claim is that anecdotal validation of interpretability methods in genomics is unreliable because the same model predictions can receive contradictory explanations from different methods, those explanations often miss known sequence motifs, and they do not faithfully track the model's internal computations; therefore genomic IML work must adopt systematic assessment of consistency, faithfulness, and biological validity.

What carries the argument

A benchmarking study on transcription factor binding that compares multiple IML methods on the same models and measures agreement, motif recovery, and faithfulness to model behavior.

Load-bearing premise

The assumption that the inconsistencies observed in one transcription-factor-binding benchmark will appear across other genomic tasks, model architectures, and datasets.

What would settle it

A replication study that applies the same set of IML methods to multiple independent genomic datasets and finds high agreement on explanations plus reliable recovery of known motifs in the great majority of cases.

Figures

Figures reproduced from arXiv: 2606.07607 by Ke Li, Mingyu Huang, Shasha Zhou.

**Figure 1.** Systematic mapping reveals a reliance on anecdotal evaluation practices. (a) Distribution of the number of IML methods employed per study (n = 3,575, the same for all panels). (b) Breakdown of validation strategies. (c) Frequency of validated interpretation instances per paper. (d) Reporting of failure modes.

**Figure 2.** Different IML methods produce inconsistent explanations. (a) Attribution maps from five IML methods on the same CTCF sequence (NTv3 model), showing strikingly different patterns. (b) Mean Spearman rank correlation coefficients between method pairs across all models and datasets. (c) Mean Jaccard similarity of the top-20 attributed positions between method pairs.
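The two agreement measures named in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each method returns one attribution score per sequence position, and that the top-k set is chosen by absolute attribution value (the caption does not say whether signed or absolute scores were used). The function names `average_ranks`, `spearman`, and `top_k_jaccard` are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ascending ranks, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (va * vb) ** 0.5


def top_k_jaccard(a, b, k=20):
    """Jaccard similarity of the k positions with largest |attribution|."""
    def top(v):
        ranked = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
        return set(ranked[:k])
    sa, sb = top(a), top(b)
    return len(sa & sb) / len(sa | sb)


# Toy attribution vectors for two hypothetical methods on a 4-bp sequence.
method_a = [0.9, 0.1, 0.8, 0.0]
method_b = [0.9, 0.8, 0.1, 0.0]
rho = spearman(method_a, method_b)
jac = top_k_jaccard(method_a, method_b, k=2)  # {0, 2} vs {0, 1} -> 1/3
```

A low rank correlation together with a low top-k overlap is the kind of disagreement the caption describes; averaging these values over sequences and method pairs would give matrices like those in panels (b) and (c).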

**Figure 3.** Faithfulness evaluation via perturbation analysis. (a) Sequential deletion (MoRF): prediction probability as top-ranked positions are progressively masked. (b) Sequential insertion: probability recovery as positions are restored to a neutral baseline. Curves show means with shaded standard errors on CTCF (NTv3). (c, d) Distribution of mean AUC scores across all sequences for deletion (c) and insertion (d) …
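The deletion experiment in panel (a) can be illustrated with a small sketch. It is written under stated assumptions rather than taken from the paper: positions are masked with the character `N`, the curve is summarized by a trapezoidal AUC with the x-axis normalized to [0, 1], and `toy_predict` is a stand-in classifier invented for this example. The names `deletion_curve` and `curve_auc` are likewise hypothetical.

```python
def deletion_curve(predict, sequence, attributions, mask_token="N"):
    """Mask positions from most to least attributed (MoRF) and record
    the model output after each masking step."""
    order = sorted(range(len(sequence)),
                   key=lambda i: attributions[i], reverse=True)
    masked = list(sequence)
    probs = [predict("".join(masked))]
    for i in order:
        masked[i] = mask_token
        probs.append(predict("".join(masked)))
    return probs


def curve_auc(probs):
    """Trapezoidal area under the curve, x-axis scaled to [0, 1]."""
    steps = len(probs) - 1
    return sum((probs[i] + probs[i + 1]) / 2 for i in range(steps)) / steps


def toy_predict(seq):
    """Stand-in model (assumption): fires only if the motif GATA is intact."""
    return 1.0 if "GATA" in seq else 0.0


seq = "ACGATAAC"                          # motif occupies positions 2..5
faithful = [0, 0, 1, 1, 1, 1, 0, 0]       # highlights the motif
unfaithful = [1, 1, 0, 0, 0, 0, 1, 1]     # highlights the flanks

auc_faithful = curve_auc(deletion_curve(toy_predict, seq, faithful))
auc_unfaithful = curve_auc(deletion_curve(toy_predict, seq, unfaithful))
# auc_faithful == 0.0625, auc_unfaithful == 0.5625
```

For deletion, a lower AUC means the prediction collapses sooner once the top-ranked positions are masked, which is what a faithful attribution should produce. The insertion variant would run the same loop starting from a fully masked sequence and restoring positions in rank order.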

**Figure 4.** Alignment between IML explanations and biological ground truth. (a) Distribution of motif overlap scores (Perception) across TFs and IML methods on NTv3. Higher values indicate better alignment with UniBind-annotated binding sites. (b, c) Representative examples showing attribution profiles (colored curves) relative to ground-truth motif regions (white background): (b) successful alignment where attributio…
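The exact definition of the Perception score is not given in this caption, so the sketch below uses a simple stand-in: the fraction of the top-k attributed positions that fall inside annotated motif intervals. Everything here is an assumption made for illustration, including the half-open `(start, end)` interval convention, ranking by absolute attribution, and the hypothetical function name `motif_overlap`.

```python
def motif_overlap(attributions, motif_intervals, k=20):
    """Fraction of the k highest-|attribution| positions that lie inside
    any annotated motif interval. Intervals are half-open: [start, end)."""
    k = min(k, len(attributions))
    top = sorted(range(len(attributions)),
                 key=lambda i: abs(attributions[i]), reverse=True)[:k]
    in_motif = {p for start, end in motif_intervals
                for p in range(start, end)}
    return sum(p in in_motif for p in top) / k


# Toy example: one annotated site covering positions 1 and 2.
attr = [0.1, 0.9, 0.8, 0.0, 0.7, 0.2]
score = motif_overlap(attr, [(1, 3)], k=3)  # top-3 = {1, 2, 4} -> 2/3
```

A score near 1 means the method concentrates its highest attributions on the annotated site; a score near the background fraction of motif positions means it does no better than chance.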

**Figure 5.** Per-model and per-TF Spearman rank correlation between IML method pairs. Each subplot reports the pairwise Spearman correlation matrix over the five interpretability methods (DeepLIFT, IG, ISM, LIME, CJ) for one combination of foundation model (DNABERT-2, HyenaDNA, NTv3) and transcription factor (CTCF, MAX, SP1, TBP, GATA1).

**Figure 6.** Per-model and per-TF top-20 Jaccard similarity between IML method pairs. Each subplot reports the pairwise Jaccard similarity of the top-20 attributed positions over the five interpretability methods for one combination of foundation model (DNABERT-2, HyenaDNA, NTv3) and transcription factor (CTCF, MAX, SP1, TBP, GATA1).


**Figure 8.** Faithfulness AUC distributions on HyenaDNA. Distribution of mean AUC scores across all test sequences, stratified by transcription factor (CTCF, MAX, SP1, TBP, GATA1) and IML method, for (a) the deletion experiment (lower AUC indicates better faithfulness) and (b) the insertion experiment (higher AUC indicates better faithfulness).

**Figure 9.** Faithfulness AUC distributions on NTv3. Distribution of mean AUC scores across all test sequences, stratified by transcription factor (CTCF, MAX, SP1, TBP, GATA1) and IML method, for (a) the deletion experiment (lower AUC indicates better faithfulness) and (b) the insertion experiment (higher AUC indicates better faithfulness).

**Figure 10.** Per-TF and per-IML-method Perception scores on DNABERT-2, HyenaDNA, and NTv3 (panels a, b, c).

Original abstract

Advances in machine learning and computational power have unlocked the predictive potential of the human genome, yet biologists now demand that these models also elucidate the underlying biological mechanisms. While interpretable machine learning (IML) techniques have been increasingly applied to bridge this gap, there has been a pervasive reliance on anecdotal validation: the vast majority of research relies on a single IML method and reports only isolated successful instances. Through a benchmarking study on transcription factor binding, we demonstrate the risks of current practices. We show that different IML methods can often (1) yield contradictory explanations for the same predictions, (2) fail to localize known regulatory motifs, and (3) fail to faithfully reflect the model's internal decision process. In light of this, we argue for a validation framework analogous to clinical trials: just as trials require rigorous design and adverse-event reporting, genomic interpretability must move beyond cherry-picked plausibility toward systematic assessment of consistency, faithfulness, and biological validity. To facilitate this, we propose a tiered framework to guide rigorous evaluation and reporting of genomic IML methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a real problem with cherry-picked IML validation in genomics but rests its broad call on one TF-binding benchmark.


This paper's core message is that interpretability work on genomic models too often stops at showing one plausible example with one method. The authors back this with a benchmarking study on transcription factor binding where multiple IML techniques gave conflicting results, missed known motifs, and did not align with how the model actually decided.

That demonstration is the concrete part. It shows the risks they list: contradictory explanations, failure to localize motifs, and lack of faithfulness. The proposed tiered framework, modeled on clinical trials with requirements for consistency, faithfulness, and biological validity, is a direct response.

The paper does well in naming a problem that many in the field have seen but not always quantified. Requiring systematic reporting instead of isolated successes would raise the bar.

The limitation is clear from the abstract: the benchmark is restricted to TF binding. The stress-test note is right that, without evidence that these issues appear in other tasks such as chromatin accessibility or eQTL prediction, or in other model types, the generalization to "the vast majority of research" stays untested. The summary gives no details on the number of models tested or the exact metrics, which makes it hard to judge how strong the evidence is.

No circularity or invented math here; it's an empirical observation turned into a position.

This is for people working on or reviewing genomic ML interpretability papers. A reader already convinced that anecdotal validation is weak will find support, while someone wanting quantitative backing for the framework might want more.

It deserves serious refereeing. The point about evaluation standards matters for a field tied to biology, and referees could help strengthen the empirical base or clarify the scope of the recommendation.

Referee Report

2 major / 1 minor

Summary. The manuscript is a position paper arguing that genomic ML interpretability research relies excessively on anecdotal validation of single IML methods. It supports this via a benchmarking study on transcription factor binding demonstrating that different IML methods often (1) yield contradictory explanations for the same predictions, (2) fail to localize known regulatory motifs, and (3) fail to faithfully reflect the model's internal decision process. The paper advocates replacing such practices with a clinical-trial-style validation framework emphasizing systematic assessment of consistency, faithfulness, and biological validity, and proposes a tiered framework to guide evaluation and reporting.

Significance. If the benchmarking results prove robust and the single-task findings generalize, the paper would highlight a systemic weakness in how IML is applied to genomics and provide a constructive path toward more reliable biological insights from models. The explicit analogy to clinical trials and the tiered framework are practical contributions that could influence community standards.

major comments (2)

[Abstract] Abstract: the three concrete risks are asserted from a benchmarking study, but no details on study design, number of models, quantitative metrics, or statistical tests are supplied, preventing assessment of support for the claims of contradictory explanations, motif localization failure, and lack of faithfulness.
[Benchmarking study and position argument] The position that 'the vast majority of research' must move to a clinical-trial-style framework rests on the single TF-binding benchmark generalizing to broader genomic IML; no multi-task replication, comparison to other targets (e.g., chromatin accessibility), or argument for representativeness of the chosen models and IML suite is provided.

minor comments (1)

[Proposed framework] The tiered framework is introduced at a high level; adding concrete criteria, example metrics, or reporting templates for each tier would improve actionability without altering the central argument.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive suggestions. We address each major comment below.


Referee: [Abstract] Abstract: the three concrete risks are asserted from a benchmarking study, but no details on study design, number of models, quantitative metrics, or statistical tests are supplied, preventing assessment of support for the claims of contradictory explanations, motif localization failure, and lack of faithfulness.

Authors: The abstract is intentionally concise, as is standard. The full manuscript details the benchmarking study design, including the models evaluated, the suite of IML methods, the quantitative metrics used to measure contradictory explanations (e.g., agreement rates), motif localization performance against known ground truth, and faithfulness assessments (e.g., via perturbation or surrogate model tests), along with any statistical comparisons. To improve accessibility, we will revise the abstract to incorporate a brief statement on study scale and the nature of the quantitative findings. revision: yes

Referee: [Benchmarking study and position argument] The position that 'the vast majority of research' must move to a clinical-trial-style framework rests on the single TF-binding benchmark generalizing to broader genomic IML; no multi-task replication, comparison to other targets (e.g., chromatin accessibility), or argument for representativeness of the chosen models and IML suite is provided.

Authors: We selected transcription factor binding precisely because it supplies established biological ground truth (known motifs), enabling rigorous assessment of localization and faithfulness that would be harder on less annotated tasks. The observed inconsistencies illustrate the risks of anecdotal validation even under favorable conditions. While we do not provide multi-task replication here, we will add a dedicated discussion paragraph explaining the choice of TF binding as a representative and stringent test case, acknowledging the single-task scope, and calling for future multi-task studies (including chromatin accessibility) to further test generalizability. revision: partial

Circularity Check

0 steps flagged

No circularity: position rests on direct empirical benchmarking, not self-referential logic or derivations


The paper is a position statement whose central claims are grounded in the authors' own benchmarking study on transcription-factor binding (explicitly described in the abstract as demonstrating contradictory explanations, motif localization failures, and faithfulness issues). No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the provided text. The generalization concern raised by the skeptic is an empirical representativeness issue, not a circular reduction where a result equals its inputs by construction. The proposed tiered framework is presented as a normative recommendation following the observations, not derived from any self-definitional step. This matches the default expectation of no significant circularity for non-derivational papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a position paper without mathematical models, fitted parameters, or formal derivations, so the ledger contains no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5721 in / 1124 out tokens · 25262 ms · 2026-06-28T23:14:07.694254+00:00 · methodology




[31] [31]

Zou , title =

Yongchan Kwon and James Y. Zou , title =

[32] [32]

Tessa Han and Suraj Srinivas and Himabindu Lakkaraju , title =

[33] [33]

Usha Bhalla and Suraj Srinivas and Himabindu Lakkaraju , title =

[34] [34]

Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance , booktitle =

Jonathan Crabb. Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance , booktitle =

[35] [35]

Xuhong Li and Mengnan Du and Jiamin Chen and Yekun Chai and Himabindu Lakkaraju and Haoyi Xiong , title =

[36] [36]

Lukas Klein and Carsten T. L. Navigating the Maze of Explainable

[37] [37]

and Kinney, Justin B

Melnikov, Alexandre and Murugan, Anand and Zhang, Xiaolan and Tesileanu, Tiberiu and Wang, Li and Rogov, Peter and Feizi, Soheil and Gnirke, Andreas and Callan, Jr., Curtis G. and Kinney, Justin B. and Kellis, Manolis and Lander, Eric S. and Mikkelsen, Tarjei S. , title =. Nat. Biotechnol. , year =

[38] [38]

and Araya, Carlos L

Fowler, Douglas M. and Araya, Carlos L. and Fleishman, Sarel J. and Kellogg, Elizabeth H. and Stephany, Jason J. and Baker, David and Fields, Stanley , title =. Nat. Methods , year =

[39] [39]

Zou, James and Huss, Mikael and Abid, Abubakar and Mohammadi, Pejman and Torkamani, Ali and Telenti, Amalio , title =. Nat. Genet. , year =

[40] [40]

Deep learning: new computational modelling techniques for genomics , journal =

Eraslan, G. Deep learning: new computational modelling techniques for genomics , journal =. 2019 , volume =

2019

[41] [41]

Barbadilla-Mart. Nat. Rev. Genet. , title =

[42] [42]

and Dufault, Cameron and Wainberg, Michael and Forster, Duncan and Karimzadeh, Mehran and Goodarzi, Hani and Theis, Fabian J

Consens, Micaela E. and Dufault, Cameron and Wainberg, Michael and Forster, Duncan and Karimzadeh, Mehran and Goodarzi, Hani and Theis, Fabian J. and Moses, Alan and Wang, Bo , journal =. Transformers and genome language models , volume =

[43] [43]

Base-resolution models of transcription-factor binding reveal soft motif syntax , volume =

Avsec,. Base-resolution models of transcription-factor binding reveal soft motif syntax , volume =. Nat. Genet. , number =

[44] [44]

Effective gene expression prediction from sequence by integrating long-range interactions , volume =

Avsec,. Effective gene expression prediction from sequence by integrating long-range interactions , volume =. Nat. Methods , number =

[45] [45]

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , volume =

Alipanahi, Babak and Delong, Andrew and Weirauch, Matthew T and Frey, Brendan J , journal =. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , volume =

[46] [46]

and Reiter, Franziska and Pagani, Michaela and Stark, Alexander , journal =

de Almeida, Bernardo P. and Reiter, Franziska and Pagani, Michaela and Stark, Alexander , journal =. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers , volume =

[47] [47]

Predicting effects of noncoding variants with deep learning--based sequence model , volume =

Zhou, Jian and Troyanskaya, Olga G , journal =. Predicting effects of noncoding variants with deep learning--based sequence model , volume =

[48] [48]

and Yao, Kevin and Chen, Kathleen M

Zhou, Jian and Theesfeld, Chandra L. and Yao, Kevin and Chen, Kathleen M. and Wong, Aaron K. and Troyanskaya, Olga G. , journal =. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk , volume =

[49] [49]

Predicting Splicing from Primary Sequence with Deep Learning , volume =

Kishore Jaganathan and Sofia. Predicting Splicing from Primary Sequence with Deep Learning , volume =. Cell , number =

[50] [50]

Karen Simonyan and Andrea Vedaldi and Andrew Zisserman , title =

[51] [51]

Avanti Shrikumar and Peyton Greenside and Anshul Kundaje , title =

[52] [52]

Daniel Smilkov and Nikhil Thorat and Been Kim and Fernanda B. Vi. SmoothGrad: removing noise by adding noise , journal =. 2017 , eprinttype =

2017

[53] [53]

Mukund Sundararajan and Ankur Taly and Qiqi Yan , title =

[54] [54]

Lundberg and Su

Scott M. Lundberg and Su. A Unified Approach to Interpreting Model Predictions , booktitle =

[55] [55]

Janizek and Pascal Sturmfels and Su

Joseph D. Janizek and Pascal Sturmfels and Su. Explaining Explanations: Axiomatic Feature Interactions for Deep Networks , journal =

[56] [56]

Bioinformatics , volume =

Greenside, Peyton and Shimko, Tyler and Fordyce, Polly and Kundaje, Anshul , title =. Bioinformatics , volume =. 2018 , month =

2018

[57] [57]

and McCandlish, David M

Seitz, Evan E. and McCandlish, David M. and Kinney, Justin B. and Koo, Peter K. , journal =. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models , volume =

[58] [58]

McCandlish and Joshua B

Jakub Otwinowski and David M. McCandlish and Joshua B. Plotkin , title =. Proc. Natl. Acad. Sci. U. S. A. , volume =

[59] [59]

Science , volume =

CW Bakerlee and AN Nguyen Ba and Y Shulgina and JI Rojas Echenique and MM Desai , title =. Science , volume =

[60] [60]

Li and Tan and Ma and Zhong and Yu and Zhou and Ouyang and Zhou and Tan and Hong , title =

[61] [61]

Sofroniew and Deniz Oktay and Zeming Lin and Robert Verkuil and Vincent Q

Thomas Hayes and Roshan Rao and Halil Akin and Nicholas J. Sofroniew and Deniz Oktay and Zeming Lin and Robert Verkuil and Vincent Q. Tran and Jonathan Deaton and Marius Wiggert and Rohil Badkundri and Irhum Shafkat and Jun Gong and Alexander Derry and Raul S. Molina and Neil Thomas and Yousuf A. Khan and Chetan Mishra and Carolyn Kim and Liam J. Bartie a...

[62] [62]

Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval , booktitle =

Pascal Notin and Mafalda Dias and Jonathan Frazer and Javier Marchena. Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval , booktitle =

[63] [63]

Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning , volume =

Wang, Ning and Bian, Jiang and Li, Yuchen and Li, Xuhong and Mumtaz, Shahid and Kong, Linghe and Xiong, Haoyi , journal =. Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning , volume =

[64] [64]

and Schaub, Christoph and Pagani, Michaela and Secchia, Stefano and Furlong, Eileen E

de Almeida, Bernardo P. and Schaub, Christoph and Pagani, Michaela and Secchia, Stefano and Furlong, Eileen E. M. and Stark, Alexander , journal =. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo , volume =

[65] [65]

and Zhang, Ruochi and Ma, Sai and Shrestha, Rojesh and Kartha, Vinay K

Hu, Yan and Horlbeck, Max A. and Zhang, Ruochi and Ma, Sai and Shrestha, Rojesh and Kartha, Vinay K. and Duarte, Fabiana M. and Hock, Conrad and Savage, Rachel E. and Labade, Ajay and Kletzien, Heidi and Meliki, Alia and Castillo, Andrew and Durand, Neva C. and Mattei, Eugenio and Anderson, Lauren J. and Tay, Tristan and Earl, Andrew S. and Shoresh, Noam ...

[66] [66]

and Tasaki, Shinya and Bennett, David A

Sasse, Alexander and Ng, Bernard and Spiro, Anna E. and Tasaki, Shinya and Bennett, David A. and Gaiteri, Christopher and De Jager, Philip L. and Chikina, Maria and Mostafavi, Sara , journal =. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings , volume =

[67] [67]

The power of multiplexed functional analysis of genetic variants , volume =

Gasperini, Molly and Starita, Lea and Shendure, Jay , journal =. The power of multiplexed functional analysis of genetic variants , volume =

[68] [68]

Deep mutational scanning: a new style of protein science , volume =

Fowler, Douglas M and Fields, Stanley , journal =. Deep mutational scanning: a new style of protein science , volume =

[69] [69]

Saporta, Adriel and Gui, Xiaotong and Agrawal, Ashwin and Pareek, Anuj and Truong, Steven Q. H. and Nguyen, Chanh D. T. and Ngo, Van-Doan and Seekins, Jayne and Blankenberg, Francis G. and Ng, Andrew Y. and Lungren, Matthew P. and Rajpurkar, Pranav , journal =. Benchmarking saliency methods for chest X-ray interpretation , volume =

[70] [70]

Evaluation of post-hoc interpretability methods in time-series classification , volume =

Turb. Evaluation of post-hoc interpretability methods in time-series classification , volume =. Nat. Mach. Intell. , number =

[71] [71]

Why Should

Ghada El. Why Should

[72] [72]

and Molinet, Jennifer and Yassour, Moran and Fan, Lin and Adiconis, Xian and Thompson, Dawn A

Vaishnav, Eeshit Dhaval and de Boer, Carl G. and Molinet, Jennifer and Yassour, Moran and Fan, Lin and Adiconis, Xian and Thompson, Dawn A. and Levin, Joshua Z. and Cubillos, Francisco A. and Regev, Aviv , isbn =. The evolution, evolvability and engineering of gene regulatory DNA , volume =. Nature , number =

[73] [73]

and Wagner, Andreas , journal =

Payne, Joshua L. and Wagner, Andreas , journal =. The causes of evolvability and their evolution , volume =

[74] [74]

and Frydman, Judith and Andino, Raul , journal =

Lauring, Adam S. and Frydman, Judith and Andino, Raul , journal =. The role of mutational robustness in RNA virus evolution , volume =

[75] [75]

and Parsons, Todd L

Draghi, Jeremy A. and Parsons, Todd L. and Wagner, G. Mutational robustness can facilitate adaptation , volume =. Nature , number =

[76] [76]

, journal =

Phillips, Patrick C. , journal =. Epistasis ---the essential role of gene interactions in the structure and evolution of genetic systems , volume =

[77] [77]

Pairwise and higher-order genetic interactions during the evolution of a tRNA , volume =

Domingo, J. Pairwise and higher-order genetic interactions during the evolution of a tRNA , volume =. Nature , number =

[78] [78]

Sewall Wright , title =. Proc. XI Int. Congr. Genet. , volume =

[79] [79]

Natural Selection and the Concept of a Protein Space , volume =

Maynard Smith, John , journal =. Natural Selection and the Concept of a Protein Space , volume =

[80] [80]

Mingyu Huang and Shasha Zhou and Yuxuan Chen and Ke Li , title =