Towards A Rigorous Science of Interpretable Machine Learning

Canonical reference. 71% of citing Pith papers cite this work as background.

94 Pith papers citing it
Background 71% of classified citations
abstract

As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.

citation-role summary

background: 13 · method: 1

representative citing papers

Forecasting Future Behavior as a Learning Task

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.

In Defense of Information Leakage in Concept-based Models

cs.LG · 2026-06-09 · conditional · novelty 7.0

Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.

Monosemanticity in Recommender Systems

cs.IR · 2026-06-28 · unverdicted · novelty 6.0 · 2 refs

Matryoshka Sparse Autoencoders applied to matrix-factorization embeddings from the Amazon Fashion dataset recover hierarchical monosemantic features that align with metadata and permit targeted intervention.

Explaining Rankings with Hidden Group Bonuses

cs.DS · 2026-05-28 · unverdicted · novelty 6.0 · 2 refs

Introduces a constraint-satisfaction algorithm and complexity results for recovering linear utilities and latent group bonuses to explain observed rankings under hidden sensitive features.

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
