arxiv: 1707.01943 · v3 · pith:D5ECHRD4new · submitted 2017-07-06 · 💻 cs.LG

A causal framework for explaining the predictions of black-box sequence-to-sequence models

classification 💻 cs.LG
keywords black-box, input-output, method, model, predictions, sequence-to-sequence, tokens, across

We interpret the predictions of any black-box structured input-structured output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We focus the general approach on sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
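
The pipeline summarized in the abstract (perturb the input, query the black box, build a graph over tokens, select components) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `toy_black_box` is a hypothetical word-by-word lookup standing in for a real sequence-to-sequence model, random masking replaces the variational autoencoder used for perturbations, a simple co-occurrence frequency replaces the paper's dependency estimate, and thresholded connected components replace the partitioning step.

```python
import random
from collections import defaultdict


def toy_black_box(tokens):
    """Hypothetical stand-in for a black-box seq2seq model (word-by-word lookup)."""
    lexicon = {"the": "le", "cat": "chat", "sleeps": "dort"}
    return [lexicon.get(t, "<unk>") for t in tokens]


def perturb(tokens, rng, p_mask=0.3, filler="<mask>"):
    """Randomly mask tokens; a crude substitute for VAE-based perturbation."""
    return [filler if rng.random() < p_mask else t for t in tokens]


def dependency_weights(model, x, n_samples=500, seed=0):
    """Estimate how often output token j disappears when input token i is perturbed."""
    rng = random.Random(seed)
    y = model(x)
    changed = defaultdict(int)  # number of samples in which input i was perturbed
    lost = defaultdict(int)     # (i, j): input i perturbed and output j missing
    for _ in range(n_samples):
        x_p = perturb(x, rng)
        y_p = set(model(x_p))   # only the black box's responses are used
        for i, (orig, new) in enumerate(zip(x, x_p)):
            if orig == new:
                continue
            changed[i] += 1
            for j, tok in enumerate(y):
                if tok not in y_p:
                    lost[(i, j)] += 1
    return {(i, j): count / changed[i] for (i, j), count in lost.items()}


def explain(weights, threshold=0.8):
    """Group tokens by connected components of the thresholded bipartite graph."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for (i, j), w in weights.items():
        if w >= threshold:
            parent[find(("in", i))] = find(("out", j))

    groups = defaultdict(lambda: ([], []))
    for node in sorted(parent):
        side, idx = node
        groups[find(node)][0 if side == "in" else 1].append(idx)
    return sorted(groups.values())


x = ["the", "cat", "sleeps"]
weights = dependency_weights(toy_black_box, x)
groups = explain(weights)  # list of (input positions, output positions) pairs
```

Each entry of `groups` pairs input token positions with the output token positions that depend on them. In this toy setting every input word aligns with exactly one output word, so each group contains a single input-output pair; a real model would need the richer perturbation and partitioning machinery the abstract describes.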
