Interpreting Blackbox Models via Model Extraction

Carolyn Kim; Hamsa Bastani; Osbert Bastani

arXiv: 1705.08504 · v6 · pith:4GI6Q35Bnew · submitted 2017-05-23 · cs.LG

classification: cs.LG

keywords: decision, blackbox, tree, algorithm, explanations, model, models, several

Interpretability has become incredibly important as machine learning is increasingly used to inform consequential decisions. We propose to construct global explanations of complex, blackbox models in the form of a decision tree approximating the original model; as long as the decision tree is a good approximation, it mirrors the computation performed by the blackbox model. We devise a novel algorithm for extracting decision tree explanations that actively samples new training points to avoid overfitting. We evaluate our algorithm on a random forest trained to predict diabetes risk and on a learned controller for cart-pole. Compared to several baselines, our decision trees are both substantially more accurate and, based on a user study, equally or more interpretable. Finally, we describe several insights provided by our interpretations, including a causal issue validated by a physician.
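To make the idea in the abstract concrete, here is a minimal sketch of extracting a decision tree from a blackbox model while drawing fresh training points at every node. It is not the authors' algorithm. The abstract does not say how new points are sampled, so this sketch assumes uniform sampling inside each node's axis-aligned region. The names `blackbox`, `sample_in_box`, `extract_tree`, `predict`, and `fidelity` are hypothetical, and the blackbox is a toy function rather than a trained model.

```python
import random

def blackbox(x):
    # Hypothetical stand-in for a complex model: any callable returning 0 or 1.
    return int(x[0] + 0.5 * x[1] > 0.8)

def sample_in_box(box, n, rng):
    # Assumption: draw n fresh points uniformly inside the box [(lo, hi), ...].
    return [[rng.uniform(lo, hi) for lo, hi in box] for _ in range(n)]

def gini(labels):
    # Gini impurity for binary labels.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(points, labels):
    # Exhaustive search over features and observed values as thresholds.
    best = None
    n = len(points)
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def extract_tree(model, box, depth, n_samples, rng):
    # Sample new points for THIS node and label them by querying the model,
    # so deep nodes see as many points as the root does.
    points = sample_in_box(box, n_samples, rng)
    labels = [model(p) for p in points]
    majority = int(2 * sum(labels) >= len(labels))
    split = best_split(points, labels) if depth > 0 and gini(labels) > 0 else None
    if split is None:
        return majority  # leaf: predicted label
    _, f, t = split
    left_box = list(box)
    left_box[f] = (box[f][0], t)
    right_box = list(box)
    right_box[f] = (t, box[f][1])
    return (f, t,
            extract_tree(model, left_box, depth - 1, n_samples, rng),
            extract_tree(model, right_box, depth - 1, n_samples, rng))

def predict(tree, x):
    # Internal nodes are (feature, threshold, left, right); leaves are ints.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fidelity(tree, model, box, n, rng):
    # Fraction of fresh points on which the tree agrees with the blackbox.
    test = sample_in_box(box, n, rng)
    return sum(predict(tree, x) == model(x) for x in test) / n

rng = random.Random(0)
box = [(0.0, 1.0), (0.0, 1.0)]
tree = extract_tree(blackbox, box, depth=4, n_samples=200, rng=rng)
score = fidelity(tree, blackbox, box, 2000, rng)
print(f"fidelity to blackbox: {score:.3f}")
```

The quantity reported is fidelity (agreement with the blackbox), not accuracy on ground-truth labels, since the tree is meant to explain the model rather than the data. Resampling per node is the design choice that matters here: a tree fit on a fixed training set has fewer points at each deeper node, which is where overfitting tends to appear.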

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BoolXLLM: LLM-Assisted Explainability for Boolean Models
cs.AI 2026-05 unverdicted novelty 6.0

BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.

The Price of Interpretability
cs.LG 2019-07 unverdicted novelty 6.0

Introduces a framework for constructing ML models via interpretable steps, generalizes standard proxies into a parametrized family of measures, and quantifies the accuracy-interpretability tradeoff via practical algorithms.

Optimal Explanations of Linear Models
cs.LG 2019-07 unverdicted novelty 5.0

An optimization framework decomposes linear models into increasing-complexity sequences using coordinate updates to generate parametrized interpretability metrics.

Assessing Model-Agnostic XAI Methods against EU AI Act Explainability Requirements
cs.CY 2026-03 unverdicted novelty 4.0

A qualitative-to-quantitative scoring framework is proposed to evaluate how well model-agnostic XAI methods support EU AI Act explainability requirements.