
arxiv: 1611.07478 · v3 · pith:4Q3I55PDnew · submitted 2016-11-22 · 💻 cs.AI

An unexpected unity among methods for interpreting model predictions

classification 💻 cs.AI
keywords: model, predictions, methods, representation, accuracy, additive, complex, features

Understanding why a model made a certain prediction is crucial in many data science fields. Interpretable predictions engender appropriate trust and provide insight into how the model may be improved. However, with large modern datasets the best accuracy is often achieved by complex models even experts struggle to interpret, which creates a tension between accuracy and interpretability. Recently, several methods have been proposed for interpreting predictions from complex models by estimating the importance of input features. Here, we present how a model-agnostic additive representation of the importance of input features unifies current methods. This representation is optimal, in the sense that it is the only set of additive values that satisfies important properties. We show how we can leverage these properties to create novel visual explanations of model predictions. The thread of unity that this representation weaves through the literature indicates that there are common principles to be learned about the interpretation of model predictions that apply in many scenarios.
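The abstract does not spell out the additive representation itself. As a minimal sketch, assuming it is of the Shapley-value kind (the classical set of additive attributions that is unique under a small set of fairness axioms), the idea can be illustrated on a toy model. The names `shapley_values`, `toy_model`, and `baseline`, and the choice to "remove" a feature by replacing it with a baseline value, are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for model(x) relative to model(baseline).

    Assumption: a feature that is "absent" takes its value from `baseline`.
    Cost is exponential in the number of features, so this only suits toy inputs.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` come from x; all others come from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one linear term plus one pairwise interaction.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [2.0, 3.0, 3.0]
```

The attributions are additive in the sense that they sum to `toy_model(x) - toy_model(baseline)`, here 8.0. The interaction term is split evenly between the two features that produce it.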



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Global Aggregations of Local Explanations for Black Box models

    cs.IR 2019-07 unverdicted novelty 6.0

    GALE aggregates local explanations to reveal global model behavior, showing that LIME's global importance measure is unreliable while the proposed aggregations better capture how features affect predictions.