Interpretable and Steerable Sequence Learning via Prototypes

Huamin Qu; Liu Ren; Panpan Xu; Yao Ming

arXiv: 1907.09728 · v1 · pith:VLHAPVHUnew · submitted 2019-07-23 · 💻 cs.LG · cs.HC · stat.ML

Pith reviewed 2026-05-24 17:53 UTC · model grok-4.3

classification 💻 cs.LG · cs.HC · stat.ML

keywords interpretable machine learning · sequence modeling · prototype learning · case-based reasoning · model steering · deep learning explanations · interpretable deep networks

The pith

ProSeNet makes sequence predictions by comparing inputs to a small set of learned prototypes that experts can edit directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ProSeNet, a deep sequence model that bases each prediction on similarity to a handful of prototype sequences drawn from the domain. It trains these prototypes under explicit penalties for simplicity, diversity, and sparsity so the resulting set remains human-readable. The same prototypes double as the explanation mechanism, following case-based reasoning rather than feature attribution. Domain experts can then revise the prototypes by hand to steer the model, without touching weights or code. Experiments across ECG, protein, text, and automotive data show accuracy stays comparable to standard deep networks while the prototypes align with human judgment in user studies.

Core claim

ProSeNet obtains its predictions by measuring similarity between an input sequence and a learned set of prototypes, which are representative cases from the training data. The model is optimized with a combined loss that includes classification accuracy plus penalties for prototype complexity, lack of diversity, and excessive number. This yields both high accuracy and explanations in the form of the closest prototypes. Domain experts can then adjust the prototypes manually to inject knowledge, and the steered model retains its accuracy.
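
The prediction step described above can be illustrated with a minimal sketch. It assumes the input sequence has already been encoded into a fixed-length vector, that similarity is the exponential of the negative squared Euclidean distance, and that a linear layer followed by a softmax turns similarity scores into class probabilities. The names `similarity`, `predict`, and the toy values are hypothetical and are not taken from the paper's code.

```python
import math

def similarity(embedding, prototype):
    """Assumed similarity: exp(-squared distance), 1.0 for identical vectors."""
    sq_dist = sum((e - p) ** 2 for e, p in zip(embedding, prototype))
    return math.exp(-sq_dist)

def predict(embedding, prototypes, weights):
    """Return (class probabilities, similarity scores).

    weights[k][c] is the contribution of prototype k to class c.
    """
    sims = [similarity(embedding, p) for p in prototypes]
    n_classes = len(weights[0])
    logits = [sum(s * w[c] for s, w in zip(sims, weights))
              for c in range(n_classes)]
    # Numerically stable softmax over the class logits.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [x / total for x in exps], sims

# Toy setup: two prototypes in a 2-D embedding space, one per class.
prototypes = [[0.0, 0.0], [2.0, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]

probs, sims = predict([0.1, -0.1], prototypes, weights)
# The explanation is the index of the most similar prototype.
nearest = max(range(len(sims)), key=sims.__getitem__)
```

In this toy case the input lies next to the first prototype, so `nearest` is 0 and the first class receives the higher probability. The similarity scores themselves are what a user would inspect as the explanation.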

What carries the argument

The prototypes: a small collection of exemplar sequences that serve as comparison points for new inputs and as the source of case-based explanations.

If this is right

Each prediction comes with an automatic explanation by naming the nearest prototypes.
Steering requires only direct edits to the prototypes, not access to model parameters.
The same training procedure applies to multiple sequence domains without task-specific redesign.
Accuracy remains on par with black-box deep models on real diagnostic and text tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The refinement step could be automated with suggestions drawn from misclassified cases to guide experts.
A similar prototype mechanism might transfer to non-sequence data if an appropriate distance function is supplied.
Interactive editing may surface hidden dataset biases that remain invisible in static black-box models.

Load-bearing premise

Jointly optimizing accuracy with simplicity, diversity, and sparsity produces prototypes that experts can refine without causing accuracy to drop.

What would settle it

A test in which domain experts manually edit the learned prototypes and accuracy on a held-out test set falls more than a few percentage points below the unedited model.
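
One way to picture that test is the sketch below. It uses a plain nearest-prototype classifier as a stand-in for the full model, and the data, the edit, and the `TOLERANCE` threshold are all hypothetical; "a few percentage points" is read here as 0.03 purely for illustration.

```python
def nearest_prototype_label(x, prototypes):
    """prototypes is a list of (vector, label); return the closest one's label."""
    closest = min(prototypes,
                  key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))
    return closest[1]

def accuracy(prototypes, dataset):
    """Fraction of (vector, label) pairs classified correctly."""
    hits = sum(nearest_prototype_label(x, prototypes) == y for x, y in dataset)
    return hits / len(dataset)

def accuracy_drop(original, edited, held_out):
    """Positive values mean the edited prototypes perform worse."""
    return accuracy(original, held_out) - accuracy(edited, held_out)

# Hypothetical held-out set and prototype sets.
held_out = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([1.9, 2.0], 1), ([2.1, 1.8], 1)]
original = [([0.0, 0.0], 0), ([2.0, 2.0], 1)]
edited = [([0.5, 0.5], 0), ([2.0, 2.0], 1)]  # stand-in for an expert's edit

TOLERANCE = 0.03  # assumed reading of "a few percentage points"
drop = accuracy_drop(original, edited, held_out)
claim_survives = drop <= TOLERANCE
```

Here the edit moves one prototype without changing any held-out prediction, so the drop is zero. A real check would repeat this over many edits per task and report the distribution of drops rather than a single number.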

Figures

Figures reproduced from arXiv: 1907.09728 by Huamin Qu, Liu Ren, Panpan Xu, Yao Ming.

**Figure 2.** A user verifying and refining a ProSeNet with…

**Figure 3.** An input sequence and the similarity scores with…

**Figure 4.** Examples of binary sentiment classification on…

**Figure 5.** A prototype protein sequence and its neighboring…

**Figure 7.** The influence of the prototype number and the di…

**Figure 8.** The influence of the diversity regularization term…

**Figure 10.** Additional examples of learned prototype se…

**Figure 11.** Complete list of time series prototypes learned…

Original abstract

One of the major challenges in machine learning nowadays is to provide predictions with not only high accuracy but also user-friendly explanations. Although in recent years we have witnessed increasingly popular use of deep neural networks for sequence modeling, it is still challenging to explain the rationales behind the model outputs, which is essential for building trust and supporting the domain experts to validate, critique and refine the model. We propose ProSeNet, an interpretable and steerable deep sequence model with natural explanations derived from case-based reasoning. The prediction is obtained by comparing the inputs to a few prototypes, which are exemplar cases in the problem domain. For better interpretability, we define several criteria for constructing the prototypes, including simplicity, diversity, and sparsity and propose the learning objective and the optimization procedure. ProSeNet also provides a user-friendly approach to model steering: domain experts without any knowledge on the underlying model or parameters can easily incorporate their intuition and experience by manually refining the prototypes. We conduct experiments on a wide range of real-world applications, including predictive diagnostics for automobiles, ECG, and protein sequence classification and sentiment analysis on texts. The result shows that ProSeNet can achieve accuracy on par with state-of-the-art deep learning models. We also evaluate the interpretability of the results with concrete case studies. Finally, through user study on Amazon Mechanical Turk (MTurk), we demonstrate that the model selects high-quality prototypes which align well with human knowledge and can be interactively refined for better interpretability without loss of performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ProSeNet adds manual prototype editing to sequence models for interpretability, but the no-performance-loss claim after edits lacks the needed before-and-after numbers.

The main point is a sequence model that learns a small set of prototypes from data and lets domain experts edit those prototypes by hand to steer the model. That combination of learned case-based reasoning plus direct human refinement is the practical addition here. They optimize for accuracy plus simplicity, diversity, and sparsity on the prototypes, then test on car diagnostics, ECG, protein sequences, and sentiment analysis. Accuracy matches standard deep models, and an MTurk study shows the prototypes align with what people expect. The case studies give concrete examples of how the explanations look in each domain. That part is useful for anyone who needs both performance and a way for non-ML experts to adjust the model without touching weights.

The soft spot is exactly the one in the stress-test note. The abstract states that manual refinement works without loss of performance, yet there are no reported accuracy figures before versus after edits and no ablation measuring how far an expert can change a prototype before the decision boundary moves. Without those numbers or an explicit stability check in the objective, the steerability claim is an extrapolation rather than a measured result. If the full paper has those tables, the gap closes; otherwise it is the main item a referee would flag. The rest of the optimization and citation pattern looks standard for prototype methods and does not show circularity.

This paper is for groups working on explainable sequence models in applied settings like healthcare or diagnostics. A reader who wants concrete examples of case-based steering would get value from the experiments and user study. It deserves a serious referee because the core architecture and multi-domain tests are solid enough to review even if the steering experiments need more detail.

Recommendation: send to peer review and request the before-and-after accuracy numbers on prototype edits.

Referee Report

2 major / 1 minor

Summary. The paper proposes ProSeNet, a prototype-based deep sequence model for interpretable predictions via case-based reasoning. Prototypes are learned by jointly optimizing accuracy with simplicity, diversity, and sparsity criteria; the model supports steering through manual prototype refinement by domain experts. Experiments on automobile diagnostics, ECG classification, protein sequences, and text sentiment analysis report accuracy parity with state-of-the-art deep models, supported by case studies and an MTurk user study on prototype quality.

Significance. If the steerability result holds with quantitative support, the work would provide a concrete mechanism for expert-driven model editing in sequence domains without requiring ML expertise, addressing a practical gap between high-accuracy black-box models and usable explanations.

major comments (2)

[Abstract] The central steerability claim states that prototypes can be 'manually refined ... without loss of performance,' yet no before/after accuracy numbers, ablation tables, or perturbation experiments are reported for any of the four tasks; this leaves the stability of the learned prototypes under human edits unverified.
[Experiments] The joint objective (accuracy + simplicity + diversity + sparsity) is presented as producing refinable prototypes, but no quantitative stability analysis or controlled edit study measures accuracy drop after refinement, making the 'without loss of performance' guarantee an extrapolation rather than a demonstrated result.

minor comments (1)

Notation for the prototype selection criteria and loss weights could be clarified with an explicit table of symbols and their roles in the objective.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for identifying the need for stronger empirical support of the steerability claim. We address each major comment below.

Referee: [Abstract] The central steerability claim states that prototypes can be 'manually refined ... without loss of performance,' yet no before/after accuracy numbers, ablation tables, or perturbation experiments are reported for any of the four tasks; this leaves the stability of the learned prototypes under human edits unverified.

Authors: We agree that the abstract asserts performance stability under manual refinement without providing explicit before-and-after accuracy figures. While the MTurk study evaluates prototype quality and human alignment, it does not report numerical accuracy changes post-refinement. In the revision we will add a dedicated table (or subsection) with accuracy metrics before and after prototype edits on the tasks where refinement was demonstrated, directly substantiating the claim.

Revision: yes

Referee: [Experiments] The joint objective (accuracy + simplicity + diversity + sparsity) is presented as producing refinable prototypes, but no quantitative stability analysis or controlled edit study measures accuracy drop after refinement, making the 'without loss of performance' guarantee an extrapolation rather than a demonstrated result.

Authors: The referee correctly notes the absence of quantitative stability analysis in the experiments section. The current text relies on the user study for the steerability result without controlled accuracy measurements. We will revise the experiments section to include a quantitative stability analysis reporting accuracy before and after controlled prototype edits, converting the claim from an extrapolation to a measured result.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper introduces ProSeNet as a new architecture whose learning objective is explicitly defined to jointly optimize accuracy together with prototype simplicity, diversity, and sparsity; the steering mechanism is presented as an independent post-hoc manual editing capability. No equation or claim reduces a reported result or prediction to a fitted quantity by construction, nor does any central premise rest on a self-citation chain. Empirical accuracy comparisons and the MTurk user study are external evaluations rather than tautological outputs of the objective itself.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on the ability to learn a small set of prototypes that simultaneously support accurate prediction, human-interpretable explanations, and manual editing. This introduces new entities (the prototypes) and several hyperparameters whose values are not fixed by prior theory.

free parameters (2)

Number of prototypes: hyperparameter controlling how many exemplar cases are learned; must be chosen by the user.
Weights on simplicity, diversity, and sparsity terms: coefficients in the learning objective that balance the three prototype criteria against prediction loss.
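
To make the role of these weights concrete, here is a minimal sketch of how such an objective could be assembled. The functional forms are assumptions chosen for illustration: diversity is a squared hinge on pairwise prototype distances below a threshold `d_min`, and sparsity is an L1 norm on the prototype-to-class weights. The simplicity term is left out because the text above does not say how it is measured. All names (`diversity_penalty`, `sparsity_penalty`, `total_loss`, `lam_diversity`, `lam_sparsity`, `d_min`) are hypothetical.

```python
import math

def diversity_penalty(prototypes, d_min):
    """Penalize every pair of prototypes that sits closer than d_min."""
    total = 0.0
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            dist = math.dist(prototypes[i], prototypes[j])
            total += max(0.0, d_min - dist) ** 2
    return total

def sparsity_penalty(weights):
    """L1 norm of the prototype-to-class weights."""
    return sum(abs(w) for row in weights for w in row)

def total_loss(prediction_loss, prototypes, weights,
               lam_diversity, lam_sparsity, d_min):
    """Prediction loss plus weighted penalties (simplicity term omitted)."""
    return (prediction_loss
            + lam_diversity * diversity_penalty(prototypes, d_min)
            + lam_sparsity * sparsity_penalty(weights))

# Toy values: only the first and third prototypes are closer than d_min.
prototypes = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, -2.0], [0.5, 0.0]]
loss = total_loss(0.4, prototypes, weights,
                  lam_diversity=0.1, lam_sparsity=0.01, d_min=2.0)
```

Raising `lam_diversity` pushes prototypes apart at the expense of fit, and raising `lam_sparsity` drives more prototype-to-class weights toward zero. Both coefficients, like the prototype count, have to be tuned per task.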

axioms (1)

Domain assumption: sequence inputs can be meaningfully compared to a small set of learned prototypes for both prediction and explanation. This is the core premise of the case-based reasoning approach stated in the abstract.

invented entities (1)

Learned prototypes (no independent evidence)
Purpose: serve as the basis for both prediction and human-understandable explanations.
New entities introduced by the model; no independent evidence outside the paper is provided.
