
arXiv: 2512.07988 · v3 · submitted 2025-12-08 · cs.LG · cs.GR · cs.HC

HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability

Pith reviewed 2026-05-16 23:51 UTC · model grok-4.3

classification cs.LG · cs.GR · cs.HC
keywords persistent homology · neural network interpretability · latent embeddings · topological data analysis · class separation · model robustness · feature disentanglement

The pith

Persistent homology on neural network activations reveals topological patterns tied to class separation and robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deep learning models perform well on many tasks, yet their internal representations remain hard to inspect. This paper presents HOLE, which computes persistent homology on the activations inside network layers to extract shape-based features such as connected components and holes. It renders these features as cluster flow diagrams, blob graphs, and heatmap dendrograms that track how data structure changes from layer to layer. The authors report that the resulting topological signatures are associated with better class separation, more disentangled features, and greater robustness to input perturbations and model compression. The method thus offers a geometric perspective that complements existing ways of probing model behavior.

Core claim

HOLE extracts topological features from intermediate activations using persistent homology and visualizes them with cluster flow diagrams, blob graphs, and heatmap dendrograms. Evaluation on discriminative models shows these features associate with class separation, feature disentanglement, and robustness to perturbations and compression.
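A cluster flow diagram needs, for each pair of adjacent layers, a count of how many samples move from one cluster to another. The sketch below shows one way to obtain those counts, assuming clusters are the connected components of a distance-threshold graph on each layer's activations (the dimension-zero picture at a single filtration scale). The function names, the threshold, and the one-dimensional toy "activations" are illustrative and not taken from the paper.

```python
from collections import Counter
from math import dist


def threshold_clusters(points, eps):
    """Label points by connected component of the graph joining pairs at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def cluster_flow(labels_a, labels_b):
    """Count samples moving from each cluster in layer A to each cluster in layer B."""
    return Counter(zip(labels_a, labels_b))


# Toy activations for the same five samples at two layers (hypothetical values).
layer_a = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
layer_b = [(0.0,), (0.1,), (3.0,), (3.1,), (3.2,)]

labels_a = threshold_clusters(layer_a, eps=0.5)
labels_b = threshold_clusters(layer_b, eps=0.5)
flow = cluster_flow(labels_a, labels_b)
```

Each key of `flow` is a (source cluster, target cluster) pair and each value is the width of the corresponding band in a Sankey-style drawing.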

What carries the argument

Persistent homology applied directly to the intermediate activations of a neural network to track topological evolution across layers.
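In dimension zero, the persistence computation reduces to single-linkage merging: every point is born at scale 0, and a component dies at the length of the edge that first merges it into another. The sketch below is pure Python and assumes Euclidean distance and a Vietoris-Rips style filtration (the summary above fixes neither). It does not compute higher-dimensional features such as loops, which call for a dedicated library such as GUDHI.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Return H0 bars (birth, death) of the Vietoris-Rips filtration of a point cloud."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges by increasing length (Kruskal's algorithm).
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component dies at this scale
    if n:
        bars.append((0.0, inf))         # the last component never dies
    return bars


# Toy "activations": two tight pairs far apart (hypothetical values).
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
bars = h0_barcode(cloud)
```

A single long finite bar next to several short ones is the pattern one would read as two well-separated clusters.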

Load-bearing premise

That the topological invariants computed on activations reflect meaningful semantic properties such as class separation instead of unrelated geometric artifacts of the embedding spaces.

What would settle it

A test on matched models known to differ sharply in class separation that finds no corresponding difference in their persistent homology barcodes or persistence diagrams at the same layers.
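Deciding whether two diagrams "differ" requires a distance between them. A standard choice is the Wasserstein distance between persistence diagrams, in which points may be matched to each other or to the diagonal. The sketch below is a brute-force version for very small diagrams, using the L-infinity ground metric. The function name and the toy diagrams are hypothetical, and a real analysis would use an optimized solver.

```python
from itertools import permutations


def wasserstein_distance(diag_a, diag_b, p=1):
    """p-Wasserstein distance between two small persistence diagrams (finite bars only).

    Uses the L-infinity ground metric and lets any point match to the diagonal.
    Brute force over all matchings, so only suitable for a handful of points.
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m  # each side is padded with diagonal slots for the other side

    def cost(i, j):
        a = diag_a[i] if i < n else None     # None stands for a diagonal slot
        b = diag_b[j] if j < m else None
        if a is None and b is None:
            return 0.0
        if a is None:
            return ((b[1] - b[0]) / 2) ** p  # b matched to the diagonal
        if b is None:
            return ((a[1] - a[0]) / 2) ** p  # a matched to the diagonal
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** p

    best = min(sum(cost(i, perm[i]) for i in range(size))
               for perm in permutations(range(size)))
    return best ** (1 / p)


# Two toy H0 diagrams (hypothetical values); infinite bars are dropped beforehand.
model_a = [(0.0, 1.0), (0.0, 4.0)]
model_b = [(0.0, 1.5), (0.0, 4.0), (0.0, 0.2)]
d = wasserstein_distance(model_a, model_b)
```

Under the proposed test, matched models with sharply different class separation but a near-zero distance at the same layer would count against the premise.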

Figures

Figures reproduced from arXiv: 2512.07988 by Paul Rosen, Sudhanva Manjunath Athreya.

Figure 1: HOLE provides global interpretability via multiple visualization techniques: (left) Sankey flows (layer-wise represen…
Figure 2: Example (left) persistence diagram and (right) barcode.
Figure 3: HOLE overview shows how during inference, neural net…
Figure 4: (o) The input dataset was used to generate (a-d,i-k) dis…
Figure 5: Examples of the visualizations used to support tasks…
Figure 7: Input noise robustness evaluation on CIFAR-10. Various…
Figure 6: Comparison of (a-d) Sankey diagrams for ViT encoder…
Figure 8: Comparison of (left) Blob graphs and (right) Sankey dia…
Figure 9: ResNet-34 persistent dendrogram + heatmap before and…
Figure 10: Blob visualizations of ViT encoder layer 11 activations…
Original abstract

Deep learning models have achieved remarkable success across various domains, yet their learned representations and decision-making processes remain largely opaque and hard to interpret. This work introduces HOLE (Homological Observation of Latent Embeddings), a method for analyzing and interpreting discriminative neural networks through persistent homology. HOLE extracts topological features from intermediate activations and presents them using a suite of visualization techniques, including cluster flow diagrams, blob graphs, and heatmap dendrograms. These tools facilitate the examination of representation structure and quality across layers. We evaluate HOLE using a range of discriminative models, focusing on representation quality, interpretability across layers, and robustness to input perturbations and model compression. The results indicate that topological analysis reveals patterns associated with class separation, feature disentanglement, and model robustness, providing a complementary perspective for understanding and improving deep learning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces HOLE, a method that applies persistent homology to intermediate activations of discriminative neural networks to extract topological features, which are then visualized via cluster flow diagrams, blob graphs, and heatmap dendrograms. It claims this provides insights into representation quality, layer-wise interpretability, class separation, feature disentanglement, and robustness to perturbations and compression, evaluated across a range of models.

Significance. If validated with appropriate controls, HOLE could provide a useful topological lens for neural network interpretability that complements existing activation-based or attribution methods, potentially helping identify structural changes across layers or under model modifications.

major comments (3)
  1. Abstract and evaluation sections: the claims of revealing patterns associated with class separation, feature disentanglement, and model robustness are supported only by qualitative descriptions; no quantitative metrics (e.g., persistence diagram distances, classification accuracies on topological features), baselines (e.g., random networks or untrained models), or statistical analysis are reported to substantiate the interpretability conclusions.
  2. Method and experiments: the central assumption that persistent homology barcodes on activation point clouds encode learned class structure (rather than incidental geometry of the input manifold or any Lipschitz embedding) is not tested via controls such as randomly initialized networks, label-shuffled training, or linear probes on the same inputs; without these, the interpretability interpretation remains ungrounded.
  3. Evaluation claims: the robustness analysis to input perturbations and model compression lacks specific comparisons (e.g., before/after compression persistence diagrams or correlation with accuracy drops) that would make the robustness findings load-bearing rather than observational.
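One way to turn the qualitative claims into a number, offered as a sketch and not as the paper's method: the finite H0 bars correspond to the edges of a minimum spanning tree, so the fraction of those edges that join two samples of the same class gives a label-aware separation score. Recomputing the score with permuted labels gives a chance baseline. Permuting labels after the fact is a cheap stand-in for, not a replacement of, the label-shuffled training control requested above. The names `mst_edges`, `same_class_fraction`, and `shuffled_baseline` are hypothetical.

```python
import random
from itertools import combinations
from math import dist


def mst_edges(points):
    """Minimum spanning tree edges (i, j) via Kruskal; these are the H0 merge events."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for _, i, j in sorted((dist(points[i], points[j]), i, j)
                          for i, j in combinations(range(len(points)), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def same_class_fraction(tree, labels):
    """Fraction of MST edges whose endpoints share a class label."""
    return sum(labels[i] == labels[j] for i, j in tree) / len(tree)


def shuffled_baseline(tree, labels, trials=1000, seed=0):
    """Mean score when labels are randomly permuted (chance level)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += same_class_fraction(tree, shuffled)
    return total / trials


# Toy data: two classes forming two separated clusters (hypothetical values).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels = [0, 0, 0, 1, 1, 1]
tree = mst_edges(points)
score = same_class_fraction(tree, labels)
baseline = shuffled_baseline(tree, labels)
```

A score well above the baseline on trained models, but not on randomly initialized ones, would be the kind of evidence the major comments ask for.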
minor comments (2)
  1. The visualization techniques (cluster flow diagrams, blob graphs) would benefit from explicit pseudocode or parameter settings (e.g., filtration thresholds, distance metrics used in Vietoris-Rips) to allow reproducibility.
  2. Notation for persistent homology features (e.g., barcodes, persistence diagrams) should be defined more formally with reference to standard definitions to avoid ambiguity for readers unfamiliar with topological data analysis.
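To make the reproducibility request concrete, the sketch below spells out the parameters a Vietoris-Rips construction depends on: the distance metric, the scale threshold, and the maximum simplex dimension. It enumerates simplices by brute force, which is viable only for tiny point clouds. The parameter names and defaults are assumptions, not the paper's settings, and some libraries define the threshold as a radius where this sketch uses a diameter.

```python
from itertools import combinations
from math import dist


def rips_complex(points, eps, max_dim=2, metric=dist):
    """List simplices (as index tuples) of the Vietoris-Rips complex at scale eps.

    A set of points spans a simplex when all pairwise distances are <= eps.
    """
    n = len(points)
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices span a (k-1)-simplex
        for subset in combinations(range(n), k):
            if all(metric(points[a], points[b]) <= eps
                   for a, b in combinations(subset, 2)):
                simplices.append(subset)
    return simplices


# Four corners of a unit square (hypothetical toy input).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hollow = rips_complex(square, eps=1.0)  # sides only: the boundary forms a loop
filled = rips_complex(square, eps=1.5)  # diagonals and triangles now included
```

The loop present at `eps=1.0` and filled in at `eps=1.5` is exactly the kind of feature whose birth and death a barcode records, which is why the threshold range and metric need to be reported.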

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate quantitative controls and comparisons where they strengthen the claims without altering the core contribution.

Point-by-point responses
  1. Referee: Abstract and evaluation sections: the claims of revealing patterns associated with class separation, feature disentanglement, and model robustness are supported only by qualitative descriptions; no quantitative metrics (e.g., persistence diagram distances, classification accuracies on topological features), baselines (e.g., random networks or untrained models), or statistical analysis are reported to substantiate the interpretability conclusions.

    Authors: We agree that the current version relies primarily on qualitative visualizations. In the revised manuscript we will add quantitative metrics, including Wasserstein distances between persistence diagrams across layers and models, as well as baseline comparisons against randomly initialized networks. Statistical tests will be included to support the reported patterns. revision: yes

  2. Referee: Method and experiments: the central assumption that persistent homology barcodes on activation point clouds encode learned class structure (rather than incidental geometry of the input manifold or any Lipschitz embedding) is not tested via controls such as randomly initialized networks, label-shuffled training, or linear probes on the same inputs; without these, the interpretability interpretation remains ungrounded.

    Authors: This is a fair criticism. While the original experiments focus on trained models, we will add the suggested controls—randomly initialized networks and label-shuffled training—in the revised version. These experiments will directly test whether the observed topological signatures arise from learned class structure rather than input geometry alone. revision: yes

  3. Referee: Evaluation claims: the robustness analysis to input perturbations and model compression lacks specific comparisons (e.g., before/after compression persistence diagrams or correlation with accuracy drops) that would make the robustness findings load-bearing rather than observational.

    Authors: We accept that more explicit quantitative links are needed. The revised manuscript will include direct before-and-after persistence diagram comparisons under compression and perturbations, together with reported correlations between topological changes and accuracy drops. revision: yes
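A before-and-after comparison of the kind promised in the last response can be sketched without any model. The example below rounds a toy activation cloud to a uniform grid and reports the largest shift in the sorted H0 death times. Quantizing activations directly is a simplifying assumption standing in for model compression, and all names and values are hypothetical. Rounding moves each coordinate by at most half a grid step, so each pairwise distance, and hence each sorted death time, can change by at most `step * sqrt(dim)`.

```python
import random
from itertools import combinations
from math import dist, sqrt


def h0_deaths(points):
    """Sorted finite H0 death times (minimum spanning tree edge lengths)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for length, i, j in sorted((dist(points[i], points[j]), i, j)
                               for i, j in combinations(range(len(points)), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def quantize(points, step):
    """Round every coordinate to the nearest multiple of step."""
    return [tuple(round(x / step) * step for x in p) for p in points]


rng = random.Random(0)
dim, step = 3, 0.1
cloud = [tuple(rng.uniform(0, 1) for _ in range(dim)) for _ in range(30)]

before = h0_deaths(cloud)
after = h0_deaths(quantize(cloud, step))
max_shift = max(abs(a - b) for a, b in zip(before, after))
bound = step * sqrt(dim)  # worst-case change of any pairwise distance
```

Pairing `max_shift` (or a diagram distance) with the measured accuracy drop at each compression level would give the correlation the referee asks for.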

Circularity Check

0 steps flagged

No circularity: purely observational application of standard persistent homology

full rationale

The paper introduces HOLE as a visualization and analysis pipeline that applies off-the-shelf persistent homology (Vietoris-Rips or equivalent) to intermediate activation point clouds and then renders the resulting barcodes via cluster-flow diagrams, blob graphs, and dendrograms. No equations are presented that derive a new quantity from fitted parameters, no predictions are made that are statistically forced by the same data used to demonstrate them, and no uniqueness theorems or ansatzes are smuggled in via self-citation. All reported patterns (class separation, disentanglement, robustness) are empirical observations from the computed topological features; they are not shown to be equivalent to the input activations by construction. The method is therefore self-contained as an observational tool and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that topological invariants computed via persistent homology on activation vectors carry semantic meaning for model behavior; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption: Persistent homology applied to point clouds in activation space yields features that reflect representation quality and robustness.
    Invoked when the abstract claims the extracted features reveal class separation and disentanglement.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
