Recognition: 2 theorem links
Approximation-Free Differentiable Oblique Decision Trees
Pith reviewed 2026-05-11 03:35 UTC · model grok-4.3
The pith
Hard oblique decision trees can be represented exactly as invertible neural networks for approximation-free training with standard gradient descent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DTSemNet is a semantically equivalent and invertible representation of hard oblique decision trees as neural networks that enables end-to-end training with standard gradient descent, eliminating the need for approximations in both classification and regression. While classification aligns naturally with this formulation, regression remains challenging due to the joint optimization of internal nodes and leaf regressors. To address this, the authors analyze the limitations of STE and introduce an annealed Top-k method that provides accurate gradient signals without approximation. Extensive experiments on classification and regression benchmarks show that DTSemNet-trained oblique DTs outperform state-of-the-art differentiable DTs.
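To make the equivalence claim concrete, here is a minimal sketch of how a hard oblique tree can be encoded so that a network argmax reproduces the tree's decision, in the spirit of the Theorem 1 excerpt quoted later on this page (leaf logits built from ReLU(h) and ReLU(−h) terms). The penalty-style leaf logits and all weights here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# A depth-2 hard oblique tree: internal nodes 0 (root), 1, 2 and leaves 0..3.
# Node i computes h_i(x) = w_i . x + b_i and routes right iff h_i(x) > 0.
W = rng.normal(size=(3, 5))          # illustrative split weights
b = rng.normal(size=3)

def tree_eval(x):
    """Evaluate the hard oblique tree; returns the leaf index reached."""
    h = W @ x + b
    if h[0] <= 0:
        return 1 if h[1] > 0 else 0   # left subtree (node 1)
    return 3 if h[2] > 0 else 2       # right subtree (node 2)

# Root-to-leaf paths as (node, direction) pairs; direction +1 means "right".
PATHS = {0: [(0, -1), (1, -1)], 1: [(0, -1), (1, +1)],
         2: [(0, +1), (2, -1)], 3: [(0, +1), (2, +1)]}

def net_logits(x):
    """One logit per leaf: minus the ReLU penalty accumulated over every
    split decision the leaf's path violates.  The leaf the hard tree
    actually reaches incurs zero penalty, so it is the unique argmax
    for generic x."""
    h = W @ x + b
    relu = lambda z: max(z, 0.0)
    return np.array([-sum(relu(-d * h[i]) for i, d in PATHS[leaf])
                     for leaf in range(4)])

# Semantic equivalence: the network's argmax equals the tree's decision.
X = rng.normal(size=(1000, 5))
assert all(int(np.argmax(net_logits(x))) == tree_eval(x) for x in X)
print("network argmax == hard tree decision on all 1000 samples")
```

Because the forward pass is an exact re-expression of the tree, any gradient step on `W` and `b` is a step on the tree's own parameters, which is what "approximation-free" training means here.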
What carries the argument
DTSemNet, the semantically equivalent and invertible mapping of hard oblique decision trees to neural networks that carries the exact decision logic into a form trainable by gradient descent.
If this is right
- Oblique decision trees can be trained end-to-end like neural networks while remaining fully hard and interpretable at inference time.
- Joint optimization of split directions and leaf regressors becomes feasible without introducing approximation bias that grows with depth.
- The same trained model works for both classification and regression without separate softening schemes.
- The resulting trees can be deployed directly as programmatic policies inside reinforcement-learning agents.
Where Pith is reading between the lines
- The invertibility of the mapping could allow a trained network to be converted back to an explicit tree for post-hoc inspection or regulatory auditing.
- Because the representation is exact, it might be inserted as an interpretable module inside larger hybrid neural architectures.
- The approach could reduce the overfitting that sometimes arises when soft or quantized gradients push splits toward suboptimal local minima.
- Testing on deeper trees or higher-dimensional tabular data would reveal whether the annealed Top-k procedure continues to scale without extra hyper-parameter tuning.
Load-bearing premise
The invertible mapping preserves exact semantic equivalence between the decision tree and the neural network throughout gradient-based optimization, and the annealed Top-k method supplies unbiased gradients that remain accurate as tree depth and data dimension increase.
What would settle it
Training DTSemNet on a low-dimensional dataset with a known globally optimal oblique split and then verifying that the extracted hard tree produces identical predictions and loss values to the trained network on held-out data.
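The settling test above can be sketched as a tiny harness. The depth-1 parameter layout below is hypothetical; the point is that if the mapping is truly invertible, "extracting" the tree is just reading the trained weights back, and the check is prediction-for-prediction agreement on held-out data.

```python
import numpy as np

def extract_tree(W, b, leaf_values):
    """Invertibility in the DTSemNet sense means the trained network's
    parameters ARE the tree's parameters: extraction is just reading
    them off.  (Hypothetical layout for a depth-1 oblique tree.)"""
    return {"w": W, "b": b, "leaves": leaf_values}

def tree_predict(tree, x):
    # Hard split: take the right leaf iff w . x + b > 0.
    return tree["leaves"][int(tree["w"] @ x + tree["b"] > 0)]

def net_predict(W, b, leaf_values, x):
    # Network forward pass: hard argmax over [ReLU(-h), ReLU(h)].
    h = W @ x + b
    logits = np.array([max(-h, 0.0), max(h, 0.0)])  # [left, right]
    return leaf_values[int(np.argmax(logits))]

rng = np.random.default_rng(1)
W, b = rng.normal(size=3), 0.2
leaves = np.array([-1.0, 1.0])
tree = extract_tree(W, b, leaves)
X = rng.normal(size=(500, 3))
agree = all(tree_predict(tree, x) == net_predict(W, b, leaves, x) for x in X)
print("tree and network agree on all held-out points:", agree)
```

Running the same comparison after training, on a dataset with a known globally optimal oblique split, is the experiment this section proposes.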
Original abstract
Decision Trees (DTs) are widely used in safety-critical domains such as medical diagnosis, valued for their interpretability and effectiveness on tabular data. However, training accurate oblique DTs is challenging due to complex optimization landscapes and overfitting risks, particularly in regression. Recent advances have introduced differentiable formulations that enable gradient-based training and joint optimization of decision boundaries and leaf regressors. Yet, existing approaches typically rely on approximations, either through probabilistic softening of boundaries (soft DTs) or quantized gradients such as the Straight-Through Estimator (STE). To overcome these limitations, we propose DTSemNet, a novel, semantically equivalent, and invertible representation of hard oblique DTs as neural networks. DTSemNet enables end-to-end training with standard gradient descent, eliminating the need for approximations in both classification and regression. While classification aligns naturally with this formulation, regression remains challenging due to the joint optimization of internal nodes and leaf regressors. To address this, we analyze the limitations of STE and introduce an annealed Top-k method that provides accurate gradient signals without approximation. Extensive experiments on classification and regression benchmarks show that DTSemNet-trained oblique DTs outperform state-of-the-art differentiable DTs. Furthermore, we demonstrate that DTSemNet can serve as programmatic DT policies in reinforcement learning environments, thereby broadening their applicability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce DTSemNet, a semantically equivalent and invertible neural-network representation of hard oblique decision trees that permits exact end-to-end training by standard gradient descent without any approximation. For the regression case, where splits and leaf regressors must be optimized jointly, the authors replace the Straight-Through Estimator with an annealed Top-k surrogate that is asserted to supply unbiased gradients; extensive experiments on classification and regression benchmarks are reported to show superiority over prior differentiable DT methods, with an additional demonstration of DTSemNet as programmatic policies in reinforcement-learning environments.
Significance. If the claimed exact equivalence and unbiased gradient property are rigorously established, the result would be a meaningful advance for differentiable decision trees: it would allow hard, interpretable oblique DTs to be trained end-to-end on tabular data without the bias introduced by soft boundaries or STE, which is particularly valuable in safety-critical domains and for RL policy learning.
major comments (2)
- [Abstract and §3] Abstract and §3 (DTSemNet construction): the central claim that DTSemNet is a 'semantically equivalent' and 'invertible' representation of hard oblique DTs that remains exact under gradient flow is load-bearing for the entire contribution, yet the manuscript provides no formal proof of invertibility or of preservation of the hard decision boundaries after parameter updates; without this, the 'approximation-free' guarantee cannot be verified.
- [§4.2] §4.2 (annealed Top-k method): the assertion that the annealed Top-k surrogate supplies 'accurate gradient signals without approximation' for joint split/leaf optimization is contradicted by the known risk that bias accumulates under repeated composition with tree depth and with increasing input dimension; the paper must either prove that the bias vanishes exactly at the end of annealing or provide a quantitative bound that does not grow with depth or dimension.
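For context on the second major comment, the forward/backward mismatch that motivates replacing STE can be shown in a few lines. This is a generic illustration of the estimator, not the paper's experiment.

```python
import numpy as np

def ste_forward(h):
    """Hard 0/1 split decision (what the tree actually computes)."""
    return (h > 0).astype(float)

def ste_backward(h, grad_out):
    """Straight-Through Estimator: pretend d(step)/dh == 1 everywhere.
    The true derivative is 0 almost everywhere, so this surrogate
    gradient disagrees with the forward semantics at every point."""
    return grad_out * 1.0

h = np.array([-3.0, -0.01, 0.01, 3.0])
y = ste_forward(h)
g = ste_backward(h, grad_out=np.ones_like(h))
print("forward :", y)   # [0. 0. 1. 1.]
print("backward:", g)   # [1. 1. 1. 1.]  <- nonzero even far from the boundary
```

Composing this mismatch across many internal nodes is the depth-dependent bias risk the referee asks the authors to bound.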
minor comments (2)
- [Abstract] The abstract should explicitly name the benchmark datasets and report error bars or statistical significance for the claimed performance gains.
- [§3 and §4] Notation for the oblique split parameters and the Top-k annealing schedule should be introduced once and used consistently; several equations in §3 and §4 reuse symbols without redefinition.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which identifies key areas where the manuscript's central claims require stronger formal support. We respond to each major comment below and commit to revisions that directly address the concerns while preserving the integrity of the presented results.
Point-by-point responses
Referee: [Abstract and §3] Abstract and §3 (DTSemNet construction): the central claim that DTSemNet is a 'semantically equivalent' and 'invertible' representation of hard oblique DTs that remains exact under gradient flow is load-bearing for the entire contribution, yet the manuscript provides no formal proof of invertibility or of preservation of the hard decision boundaries after parameter updates; without this, the 'approximation-free' guarantee cannot be verified.
Authors: We agree that an explicit formal proof is necessary to rigorously substantiate the semantic equivalence, invertibility, and exactness under gradient flow. The DTSemNet construction encodes each oblique split via a linear transformation and exact hard threshold in the forward pass, with a bijective mapping from tree parameters to network weights that ensures invertibility by design. However, the current manuscript presents this through the architectural definition rather than a standalone theorem. We will add a new subsection in §3 containing a formal theorem and proof establishing (i) semantic equivalence to the original hard oblique DT, (ii) invertibility of the encoding, and (iii) invariance of the hard decision boundaries under gradient-based parameter updates, as the forward computation remains identical to the DT evaluation regardless of how parameters are optimized.
Revision: yes
Referee: [§4.2] §4.2 (annealed Top-k method): the assertion that the annealed Top-k surrogate supplies 'accurate gradient signals without approximation' for joint split/leaf optimization is contradicted by the known risk that bias accumulates under repeated composition with tree depth and with increasing input dimension; the paper must either prove that the bias vanishes exactly at the end of annealing or provide a quantitative bound that does not grow with depth or dimension.
Authors: We acknowledge the legitimate concern about bias accumulation through repeated composition in deeper trees or higher dimensions. The annealed Top-k surrogate is constructed so that the temperature parameter is driven to zero by the end of training, at which point the operator recovers the exact hard selection used in the original DT. While our experiments demonstrate effective joint optimization and superior performance, the manuscript does not derive a formal bound on residual bias. In the revision we will expand §4.2 with an analysis that either (a) proves the bias vanishes exactly under the annealing schedule or (b) supplies a quantitative error bound together with a discussion of its dependence on depth and dimension; if a tight bound independent of these factors cannot be obtained, we will instead provide a clear statement of the conditions under which the bias remains negligible in practice along with additional empirical diagnostics.
Revision: partial
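The annealing behaviour invoked in this response can be illustrated with a tempered softmax, a stand-in for the paper's actual annealed Top-k operator (whose exact form is not reproduced here): at moderate temperature the selection is soft and differentiable, and as the temperature is driven toward zero it converges to the hard argmax used at inference.

```python
import numpy as np

def annealed_select(scores, tau):
    """Tempered softmax over split/leaf scores.  Differentiable for
    tau > 0; approaches a hard one-hot argmax as tau -> 0.  This is
    an illustrative stand-in, not the paper's exact Top-k operator."""
    z = scores / tau
    z = z - z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = np.array([2.0, -1.0, 0.5, 1.9])
for tau in (1.0, 0.1, 0.01):
    w = annealed_select(scores, tau)
    print(f"tau={tau}: weights={np.round(w, 3)}")
# As tau shrinks, the weight mass concentrates on index 0 (the top
# score), recovering the hard selection the deployed tree uses.
```

Whether the residual bias at small but nonzero temperature stays bounded as trees deepen is exactly the open question the referee raises.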
Circularity Check
No circularity: DTSemNet is a direct architectural construction with independent gradient analysis
Full rationale
The paper defines DTSemNet as a novel invertible mapping from hard oblique DTs to neural networks and introduces annealed Top-k as a replacement for STE after analyzing its limitations. No load-bearing step reduces a claimed prediction or equivalence to a fitted parameter, self-citation, or ansatz imported from prior work by the same authors. The central claims rest on the explicit construction of the representation and the proposed gradient method rather than re-expressing inputs by definition. This is the normal case of a self-contained architectural contribution.
Axiom & Free-Parameter Ledger
free parameters (1)
- annealing schedule parameters
axioms (1)
- domain assumption: The mapping from oblique tree to neural network is exactly invertible and preserves the hard decision semantics under back-propagation.
invented entities (1)
- DTSemNet (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Theorem 1 … argmax N_T(x) = T(x) … L'_ℓ(x) is the unique maximum … using ReLU(I'_i(x)) and ReLU(−I'_i(x))"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection (tag: unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  Paper passage: "annealed Top-k … no approximation … forward and backward passes share the same semantics"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.