arXiv: 1711.09784 · v1 · pith:CBKWNLIAnew · submitted 2017-11-27 · cs.LG · cs.AI · stat.ML

Distilling a Neural Network Into a Soft Decision Tree

classification: cs.LG, cs.AI, stat.ML
keywords: decision, neural, particular, classification, data, hierarchical, input, knowledge

Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data.
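To make the idea in the abstract concrete, here is a minimal sketch of a soft decision tree and a distillation-style loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the architecture, the loss, or the prediction rule. The sketch assumes a complete binary tree whose inner nodes route an input to the right with a sigmoid probability, leaves that each hold a class distribution, and an output formed by mixing leaf distributions by path probability. The function names (`soft_tree_predict`, `distillation_loss`) and the data layout are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_tree_predict(x, inner, leaves):
    """Forward pass of a soft decision tree (illustrative assumption).

    inner:  list of (weights, bias) for inner nodes, breadth-first order,
            for a complete binary tree (children of node i are 2i+1, 2i+2).
    leaves: list of per-leaf class logits, left to right.
    Returns (class_probs, leaf_path_probs).
    """
    n_inner = len(inner)
    path_prob = [0.0] * (n_inner + len(leaves))
    path_prob[0] = 1.0  # every input reaches the root
    for i, (w, b) in enumerate(inner):
        # Soft routing: probability of taking the right branch at node i.
        p_right = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        path_prob[2 * i + 1] += path_prob[i] * (1.0 - p_right)
        path_prob[2 * i + 2] += path_prob[i] * p_right
    leaf_probs = path_prob[n_inner:]
    n_classes = len(leaves[0])
    out = [0.0] * n_classes
    for pp, logits in zip(leaf_probs, leaves):
        q = softmax(logits)
        for k in range(n_classes):
            out[k] += pp * q[k]
    return out, leaf_probs


def distillation_loss(soft_targets, predicted):
    """Cross-entropy between a trained network's output distribution
    (soft targets) and the tree's predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(soft_targets, predicted) if t > 0)


# Usage: a depth-2 tree (3 inner nodes, 4 leaves) on a 2-feature input.
inner = [([0.5, -0.5], 0.0), ([1.0, 0.0], 0.1), ([0.0, 1.0], -0.1)]
leaves = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 3.0]]
probs, leaf_probs = soft_tree_predict([1.0, 2.0], inner, leaves)
teacher = [0.2, 0.8]  # hypothetical soft targets from a trained network
loss = distillation_loss(teacher, probs)
```

Training against the network's soft targets rather than the one-hot labels is what would make this a distillation setup. Because each input follows a path of explicit routing decisions, the leaf with the highest path probability can be inspected to explain a particular classification.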



Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TILT: Target-induced loss tilting under covariate shift

    cs.LG 2026-05 conditional novelty 7.0

    TILT adds a target-data penalty on an auxiliary predictor component to induce effective importance weighting for unsupervised domain adaptation under covariate shift.

  2. Minimax Rates and Spectral Distillation for Tree Ensembles

    stat.ML 2026-05 unverdicted novelty 7.0

    Spectral analysis of tree ensembles produces minimax rates for random forests governed by kernel eigenvalue decay and enables distillation of RFs and GBMs into compact models via leading eigenfunctions and singular vectors.

  3. Approximation-Free Differentiable Oblique Decision Trees

    cs.LG 2026-05 unverdicted novelty 7.0

    DTSemNet gives an exact, invertible neural-network encoding of hard oblique decision trees that supports direct gradient training for both classification and regression without probabilistic softening or quantized estimators.

  4. Ternary Decision Trees with Locally-Adaptive Uncertainty Zones

    cs.LG 2026-05 unverdicted novelty 6.0

    Ternary decision trees with locally-adaptive uncertainty zones estimated from CART statistics improve decided accuracy over standard trees by blending boundary predictions and flagging uncertain cases.

  5. Prophecy: Inferring Formal Properties from Neuron Activations

    cs.LG 2025-09 unverdicted novelty 6.0

    Prophecy infers formal properties of feed-forward neural networks by extracting rules from neuron activation patterns that imply desirable output behaviors.

  6. The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models

    cs.LG 2025-07 unverdicted novelty 6.0

    Populations of 1-4B parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to 70B+ single models.

  7. SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

    cs.LG 2024-11 unverdicted novelty 6.0

    SkillTree reduces continuous action spaces to discrete skills via a differentiable decision tree in a hierarchical policy, achieving comparable performance to neural skill methods with added skill-level explainability...

  8. SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation

    cs.CV 2026-04 unverdicted novelty 5.0

    Enforcing feature decorrelation during training produces sharper saliency maps and higher accuracy on image classification benchmarks.

  9. Cross-Paradigm Knowledge Distillation: A Comprehensive Study of Bidirectional Transfer Between Random Forests and Deep Neural Networks for Big Data Applications

    cs.LG 2026-05 unverdicted novelty 4.0

    A study of bidirectional knowledge transfer between Random Forests and Deep Neural Networks using proposed distillation methods, evaluated on classification and regression tasks across six datasets.

  10. What does it mean to understand a neural network?

    cs.LG 2019-07 unverdicted novelty 4.0

    Simple training code produces complex neural networks, suggesting that brain learning rules may be easier to understand than mature brain properties and that neuroscience should shift focus accordingly.