The Price of Interpretability
Pith reviewed 2026-05-25 01:25 UTC · model grok-4.3
The pith
Machine learning models built from sequences of interpretable steps quantify the accuracy price of interpretability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Machine learning models are constructed in a sequence of interpretable steps. For a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies, for example sparsity in linear models. These proxies are generalized to a parametrized family of consistent measures of model interpretability. This definition quantifies the price of interpretability as the tradeoff with predictive accuracy, with practical algorithms demonstrated on real and synthetic datasets.
What carries the argument
The sequence of interpretable steps, which defines model construction and supports generalization of interpretability measures.
If this is right
- Algorithms can find models that achieve specified interpretability levels at minimal accuracy cost.
- The approach applies directly to both real-world and synthetic data.
- Interpretability measures become comparable across different model families through the parametrized family.
Where Pith is reading between the lines
- Decision support systems could incorporate explicit accuracy penalties for required levels of interpretability.
- The method might be extended to incorporate other concerns like fairness by defining additional step types.
Load-bearing premise
A natural choice of interpretable steps will recover standard interpretability proxies such as sparsity and produce consistent measures across the parametrized family.
What would settle it
Demonstrating a set of models where the interpretability measure does not correlate with human judgments of interpretability or where increasing interpretability never reduces accuracy.
Figures
read the original abstract
When quantitative models are used to support decision-making on complex and important topics, understanding a model's ``reasoning'' can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the ``price'' of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a mathematical framework in which machine learning models are constructed via a sequence of interpretable steps. It claims that a natural choice of such steps recovers standard interpretability proxies (e.g., sparsity for linear models), generalizes these to a parametrized family of consistent interpretability measures, and thereby quantifies the tradeoff ('price') between interpretability and predictive accuracy, with practical algorithms demonstrated on real and synthetic data.
Significance. If the recovery of standard proxies and the consistency of the parametrized family can be established rigorously, the framework would supply a formal, general definition of interpretability that enables explicit quantification of accuracy-interpretability tradeoffs, a contribution that could structure future work on interpretable ML.
major comments (2)
- [Abstract] Abstract: the central claim that 'a natural choice of interpretable steps recovers standard interpretability proxies' is asserted without any derivation, error analysis, or explicit construction visible; this step is load-bearing for the entire framework and for the subsequent quantification of the price of interpretability.
- [Abstract] Abstract: the assertion that the generalization 'yields a parametrized family of consistent measures of model interpretability' lacks supporting equations, proof outline, or verification that the family is internally consistent or reduces to known proxies; without this, the formal definition of the price cannot be substantiated.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for highlighting the importance of making the abstract claims fully traceable to the manuscript's technical content. We address each point below by directing to the relevant sections where the derivations, constructions, and consistency arguments appear.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that 'a natural choice of interpretable steps recovers standard interpretability proxies' is asserted without any derivation, error analysis, or explicit construction visible; this step is load-bearing for the entire framework and for the subsequent quantification of the price of interpretability.
Authors: The abstract is a concise summary; the explicit constructions appear in Section 3. For linear models we define the sequence of interpretable steps as successive feature selections and show that the resulting interpretability measure equals the number of nonzero coefficients (i.e., the standard sparsity proxy). Analogous derivations are given for decision trees (depth and number of leaves) and for other model families. Consistency and error bounds for these recoveries are established via the general framework introduced in Section 2 and proved in the appendix. revision: no
-
Referee: [Abstract] Abstract: the assertion that the generalization 'yields a parametrized family of consistent measures of model interpretability' lacks supporting equations, proof outline, or verification that the family is internally consistent or reduces to known proxies; without this, the formal definition of the price cannot be substantiated.
Authors: Section 4 introduces the parametrized family by replacing the indicator function in the base interpretability measure with a continuous, monotone function controlled by a parameter. We prove internal consistency (monotonicity, normalization, and invariance under equivalent representations) and show that the family reduces exactly to the standard proxies at the boundary values of the parameter. The price of interpretability is then defined in Section 5 as the minimal accuracy loss subject to a given interpretability level; the algorithms in Section 6 implement this optimization on real and synthetic data. revision: partial
Circularity Check
No significant circularity; framework derives interpretability measures independently
full rationale
The abstract and reader's assessment describe a self-contained framework that defines models via sequences of interpretable steps, shows recovery of standard proxies (e.g., sparsity), generalizes to a parametrized family, and quantifies the accuracy tradeoff. No load-bearing step reduces by the paper's own equations to a fitted input renamed as prediction, nor relies on self-citation chains or ansatzes smuggled from prior work. The central claim remains independent of the inputs by construction, consistent with the reader's circularity score of 3.0 and the absence of any quoted reduction in the material.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Interpreting Blackbox Models via Model Extraction
Hamsa Bastani, Osbert Bastani, and Carolyn Kim. Interpreting Predictive Models for Human-in-the-Loop Ana- lytics. arXiv preprint arXiv:1705.08504, pages 1–45, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[2]
An impact assessment of machine learning risk forecasts on parole board decisions and recidivism
Richard Berk. An impact assessment of machine learning risk forecasts on parole board decisions and recidivism. Journal of Experimental Criminology, 13(2):193–216, 2017
2017
-
[3]
Optimal classification trees
Dimitris Bertsimas and Jack Dunn. Optimal classification trees. Machine Learning, 106(7):1039–1082, 2017
2017
-
[4]
Weinstein, and Ying Daisy Zhuo
Dimitris Bertsimas, Nathan Kallus, Alexander M. Weinstein, and Ying Daisy Zhuo. Personalized diabetes man- agement using electronic medical records. Diabetes Care, 40(2):210–217, 2017
2017
-
[5]
Best subset selection via a modern optimization lens
Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. Annals of Statistics, 44(2):813–852, 2016
2016
-
[6]
Sparse High-Dimensional Regression: Exact Scalable Algorithms and Phase Transitions
Dimitris Bertsimas and Bart Van Parys. Sparse High-Dimensional Regression: Exact Scalable Algorithms and Phase Transitions. Annals of Statistics, to appear, 2019
2019
-
[7]
Classification and regression trees
Leo Breiman. Classification and regression trees. New York: Routledge, 1984
1984
-
[8]
Random Forests
Leo Breiman. Random Forests. Machine Learning, 45(1):5–32, 2001
2001
-
[9]
Statistical modeling: The two cultures
Leo Breiman. Statistical modeling: The two cultures. Statistical science, 16(3):199–231, 2001
2001
-
[10]
Model compression
Cristian Bucil, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’06 , page 535, New York, New York, USA, 2006. ACM, ACM Press
2006
-
[11]
Algorithmic Transparency via Quantitative Input Influence :
Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic Transparency via Quantitative Input Influence :. In 2016 IEEE Symposium on Security and Privacy, 2016
2016
-
[12]
Algorithm aversion: People erroneously avoid algorithms after seeing them err
Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1):114, 2015
2015
-
[13]
Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them.Management Science, 64(3):1155–1170, nov 2016
Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them.Management Science, 64(3):1155–1170, nov 2016. 15 A PREPRINT - J ULY 9, 2019
2016
-
[14]
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez and Been Kim. Towards A Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608, (Ml):1–13, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[15]
Least Angle Regression
Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least Angle Regression. Annals of Statis- tics, 32(2):407–499, apr 2004
2004
-
[16]
Alex A. Freitas. Comprehensible classification models. ACM SIGKDD Explorations Newsletter , 15(1):1–10, 2014
2014
-
[17]
The elements of statistical learning
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Springer series in statistics New York, NY , USA:, 2001
2001
-
[18]
Explaining Explanations: An Overview of Interpretability of Machine Learning
Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. Explaining Ex- planations : An Approach to Evaluating Interpretability of Machine Learning. arXiv preprint arXiv:1806.00069, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[19]
European Union regulations on algorithmic decision-making and a ”right to explanation”
Bryce Goodman and Seth Flaxman. European Union regulations on algorithmic decision-making and a ”right to explanation”. pages 1–9, 2016
2016
-
[20]
Statistical learning with sparsity: the lasso and generalizations
Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. CRC press, 2015
2015
-
[21]
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
Been Kim, Cynthia Rudin, and Julie Shah. The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification. In Neural Information Processing Systems (NIPS) 2014, 2014
2014
-
[22]
I. Y . Kim and O. L. De Weck. Adaptive weighted-sum method for bi-objective optimization: Pareto front generation. Structural and Multidisciplinary Optimization, 29(2):149–158, 2005
2005
-
[23]
Human decisions and machine predictions
Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human decisions and machine predictions. The quarterly journal of economics, 133(1):237–293, 2017
2017
-
[24]
Ali Koc ¸ and David P. Morton. Prioritization via Stochastic Optimization.Management Science, 61(3):586–603, 2014
2014
-
[25]
Interpretable decision sets: a joint framework for description and prediction
Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. Interpretable decision sets: a joint framework for description and prediction. KDD ’16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1:1675–1684, 2016
2016
-
[26]
Interpretable & Explorable Approxima- tions of Black Box Models
Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Interpretable & Explorable Approxima- tions of Black Box Models. FAT/ML, jul 2017
2017
-
[27]
McCormick, and David Madigan
Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics, 9(3):1350–1371, 2015
2015
-
[28]
A general approach for incremental approximation and hierarchical clus- tering
Guolong Lin and David Williamson. A general approach for incremental approximation and hierarchical clus- tering. SIAM Journal Computing, 39(8):3633–3669, 2010
2010
-
[29]
Zachary C. Lipton. The Mythos of Model Interpretability. arXiv preprint arXiv:1606.03490, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[30]
Intelligible Models for Classification and Regression
Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible Models for Classification and Regression. In Pro- ceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining , pages 150–158. ACM, 2012
2012
-
[31]
Does machine learning automate moral hazard and error? American Economic Review, 107(5):476–480, 2017
Sendhil Mullainathan and Ziad Obermeyer. Does machine learning automate moral hazard and error? American Economic Review, 107(5):476–480, 2017
2017
-
[32]
Why Should I Trust You? Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why Should I Trust You? Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016
2016
-
[33]
Tibshirani
Jonathan Taylor and Robert J. Tibshirani. Statistical learning and selective inference.Proceedings of the National Academy of Sciences, 112(25):7629–7634, jun 2015
2015
-
[34]
Tibshirani
Robert J. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996
1996
-
[35]
Supersparse linear integer models for optimized medical scoring systems
Berk Ustun and Cynthia Rudin. Supersparse linear integer models for optimized medical scoring systems. Ma- chine Learning, 102(3):349–391, 2016
2016
-
[36]
Scalable Bayesian Rule Lists
Hongyu Yang, Cynthia Rudin, and Margo Seltzer. Scalable Bayesian Rule Lists. In Proceedings of the 34th International Conference on Machine Learning, 2017. 16 A PREPRINT - J ULY 9, 2019 A Appendix A.1 Proof of Theorem 1 Proof of part (a). Asc(·) is bounded, we havecmax ∈ R such that 0<c (·) ≤cmax. Letm+ ∈ P (m+) be a path of optimal length to the modelm+,...
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.