SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

Hangyu Mao; Lei Yuan; Peng Liu; Rongchang Zuo; Siyuan Li; Yongyan Wen

arxiv: 2411.12173 · v1 · pith:ZHVZWP7Qnew · submitted 2024-11-19 · 💻 cs.LG · cs.AI

Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu

Pith reviewed 2026-05-25 08:31 UTC · model grok-4.3

keywords: explainable reinforcement learning · skill-based reinforcement learning · decision trees · hierarchical policies · robotic control · long-horizon tasks · continuous control

The pith

SkillTree uses a decision tree in the high-level policy to select skills, turning opaque continuous control into explainable skill choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SkillTree to address the lack of transparency in deep reinforcement learning for long-horizon continuous control tasks such as robotic arm manipulation. It reduces the continuous action space to a discrete set of skills by placing a differentiable decision tree at the high level of a hierarchical policy; the tree outputs skill embeddings that direct a low-level policy. This structure produces decisions that can be inspected at the skill level while the authors report performance that remains comparable to black-box skill-based neural networks. A reader would care if the approach allows practical deployment of reinforcement learning agents whose choices can be understood without sacrificing task success.

Core claim

SkillTree reduces complex continuous action spaces into discrete skill spaces through a hierarchical policy in which a differentiable decision tree generates skill embeddings to guide the low-level policy, delivering performance comparable to skill-based neural networks together with skill-level explanations.
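
One way to write this hierarchy down is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $s_t$ and $a_t$ are state and action, $K$ is the number of discrete skills, $e_k$ is the embedding of skill $k$, and $H$ is a hypothetical fixed number of low-level steps per skill.

```latex
k_t \sim \pi_{\text{tree}}(k \mid s_t), \quad k \in \{1, \dots, K\}, \qquad
z_t = e_{k_t}, \qquad
a_{t+h} \sim \pi_{\text{low}}(a \mid s_{t+h}, z_t), \quad h = 0, \dots, H-1 .
```

Under this reading, the explanation attaches to the choice of $k_t$ by the tree, while $\pi_{\text{low}}$ may remain an ordinary neural network.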

What carries the argument

A differentiable decision tree placed inside the high-level policy that selects and embeds skills to steer the low-level controller.
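
To make the idea concrete, here is a minimal sketch of a soft (differentiable) decision tree that maps a state to a distribution over discrete skills and returns the embedding of the most likely one. It is an illustration under stated assumptions, not the authors' implementation: the linear sigmoid splits, the per-leaf skill logits, the heap-ordered tree layout, and all names (`leaf_probabilities`, `select_skill`, `codebook`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probabilities(state, weights, biases, depth):
    """Soft routing through a complete binary tree stored in heap order.

    Inner node i has children 2*i+1 (left) and 2*i+2 (right). Each inner
    node sends probability mass right with sigmoid(w_i . state + b_i).
    Returns the probability of reaching each of the 2**depth leaves.
    """
    n_inner = 2 ** depth - 1
    reach = [1.0] + [0.0] * (2 * n_inner)  # one slot per node, root first
    for i in range(n_inner):
        logit = sum(w * s for w, s in zip(weights[i], state)) + biases[i]
        p_right = sigmoid(logit)
        reach[2 * i + 1] = reach[i] * (1.0 - p_right)
        reach[2 * i + 2] = reach[i] * p_right
    return reach[n_inner:]


def select_skill(state, weights, biases, leaf_logits, codebook, depth):
    """Mix per-leaf skill distributions by leaf reach probability.

    leaf_logits[j] holds one logit per skill for leaf j (assumed layout);
    codebook[k] is the embedding of skill k (assumed to be given).
    Returns (skill index, skill embedding, skill probabilities).
    """
    n_skills = len(codebook)
    skill_p = [0.0] * n_skills
    for p_leaf, logits in zip(leaf_probabilities(state, weights, biases, depth), leaf_logits):
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        for k in range(n_skills):
            skill_p[k] += p_leaf * exps[k] / total
    best = max(range(n_skills), key=skill_p.__getitem__)
    return best, codebook[best], skill_p
```

Every operation above is smooth in the split weights and leaf logits, which is what would allow gradient-based training; the final `argmax` is only used to pick the skill handed to the low-level policy.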

If this is right

The method matches the task performance of skill-based neural networks on complex robotic arm domains.
Decisions become inspectable at the level of individual skills rather than raw actions.
Transparency of the overall decision process increases for long-horizon control problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The tree structure could let engineers trace why a particular skill was chosen on any given timestep.
The same hierarchical split might be tested on other continuous-control domains that currently rely on opaque networks.
If the skill set is expanded or learned rather than fixed, the performance gap to neural baselines could shrink further.
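
The tracing idea in the first extension above can be sketched as follows. This is a hypothetical illustration, assuming the trained tree has been discretized into hard linear splits stored in heap order and that each leaf is labeled with a single skill index; the function name and data layout are not from the paper.

```python
def trace_decision(state, weights, biases, leaf_skills, depth):
    """Follow hard splits from the root and record every test on the way.

    Inner node i has children 2*i+1 (left) and 2*i+2 (right); the branch
    goes right when w_i . state + b_i > 0. Returns the skill index at the
    reached leaf and a list of (node index, split value, direction).
    """
    node = 0
    path = []
    for _ in range(depth):
        value = sum(w * s for w, s in zip(weights[node], state)) + biases[node]
        go_right = value > 0.0
        path.append((node, value, "right" if go_right else "left"))
        node = 2 * node + (2 if go_right else 1)
    n_inner = 2 ** depth - 1  # leaves start right after the inner nodes
    return leaf_skills[node - n_inner], path
```

The returned path is the explanation: for a given timestep it lists which state features were tested, the value each test produced, and which branch was taken before the skill was selected.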

Load-bearing premise

Reducing the continuous action space to a discrete set of skills still leaves enough expressive power to solve the target long-horizon tasks without noticeable loss of performance.

What would settle it

A long-horizon robotic task in which any fixed skill set forces a measurable drop below the success rate of an unrestricted neural policy.

Figures

Figures reproduced from arXiv: 2411.12173 by Hangyu Mao, Lei Yuan, Peng Liu, Rongchang Zuo, Siyuan Li, Yongyan Wen.

**Figure 2.** Discrete skill embedding learning and downstream high-level DT policy learning. After completing the skill learning, …

**Figure 3.** Four long-horizon sparse reward tasks to evaluate.

**Figure 4.** Downstream task learning curves of both our method and baselines. Averaged over 5 independent runs.

**Figure 6.** We fix the skill output of high-level policy and …

**Figure 7.** Skill index output visualization of an episode in …

Original abstract

Deep reinforcement learning (DRL) has achieved remarkable success in various research domains. However, its reliance on neural networks results in a lack of transparency, which limits its practical applications. To achieve explainability, decision trees have emerged as a popular and promising alternative to neural networks. Nonetheless, due to their limited expressiveness, traditional decision trees struggle with high-dimensional long-horizon continuous control tasks. In this paper, we proposes SkillTree, a novel framework that reduces complex continuous action spaces into discrete skill spaces. Our hierarchical approach integrates a differentiable decision tree within the high-level policy to generate skill embeddings, which subsequently guide the low-level policy in executing skills. By making skill decisions explainable, we achieve skill-level explainability, enhancing the understanding of the decision-making process in complex tasks. Experimental results demonstrate that our method achieves performance comparable to skill-based neural networks in complex robotic arm control domains. Furthermore, SkillTree offers explanations at the skill level, thereby increasing the transparency of the decision-making process.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SkillTree swaps a neural high-level policy for a differentiable decision tree that picks skill embeddings, giving skill-level explanations in hierarchical RL, but the abstract supplies zero metrics or baselines to support the comparable-performance claim.

SkillTree puts a differentiable decision tree at the high level of a skill-based RL hierarchy so that skill selection becomes traceable through the tree splits rather than a black-box network. The low-level policy still executes the chosen skills in the usual way. This is the concrete addition: the tree operates on skill embeddings instead of raw actions or values, and it stays differentiable for joint training. Prior decision-tree work in RL tends to target direct action output or value functions, so the placement here is a modest but distinct step inside existing hierarchical skill pipelines. The framing around long-horizon robotic control and the need for skill-level transparency is clear and on target. The paper therefore gives readers a workable template if they already use skill abstractions and want an off-the-shelf explainability layer on top.

The central empirical claim is that performance stays comparable to ordinary skill-based neural networks on robotic-arm tasks. The abstract states this directly but contains no numbers, no baseline list, no variance measures, and no protocol details. That leaves the key assumption, that mapping continuous actions to a discrete skill space loses no material expressiveness, untested in the visible text. If the full experiments later show the tree policy matches the neural baselines within reasonable margins and the tree paths actually yield useful explanations, the contribution is solid incremental work. If the numbers reveal a noticeable gap, the method mainly demonstrates a trade-off rather than a free lunch.

The paper is aimed at people already working on hierarchical RL or explainable control in robotics. A reader who needs new algorithmic machinery will find little beyond the tree placement; a reader who wants a concrete example of skill-level interpretability might pick up the idea. It is coherent on its own terms and engages the relevant literature without obvious internal contradictions, so it deserves a serious referee to examine the experiments and the actual tree sizes and training details.

Referee Report

2 major / 1 minor

Summary. The paper proposes SkillTree, a hierarchical framework for explainable skill-based deep reinforcement learning. It reduces complex continuous action spaces to discrete skill spaces by integrating a differentiable decision tree in the high-level policy to generate skill embeddings that guide a low-level policy. The central claims are that this achieves performance comparable to skill-based neural networks in complex robotic arm control domains while providing skill-level explanations that increase transparency of the decision-making process.

Significance. If the performance claim holds with rigorous evidence, the work could meaningfully advance explainable RL by showing that decision-tree-based skill selection can match neural baselines in long-horizon continuous control without substantial loss of expressiveness. This would be relevant for domains where interpretability is required. The manuscript does not ship machine-checked proofs or open reproducible code, but the empirical focus on robotic tasks provides a concrete testbed for the claims.

major comments (2)

[Abstract] Abstract: the assertion that 'Experimental results demonstrate that our method achieves performance comparable to skill-based neural networks' supplies no metrics, baselines, statistical tests, or experimental protocol. This prevents evaluation of the central empirical claim that the hierarchical tree policy matches neural baselines without material performance loss.
[Method (implied by abstract description)] The reduction of continuous action spaces into discrete skill spaces is load-bearing for the performance claim, yet the manuscript provides no ablation or analysis quantifying expressiveness loss (or lack thereof) when mapping to the skill space.

minor comments (1)

[Abstract] Abstract contains a grammatical error: 'we proposes SkillTree' should read 'we propose SkillTree'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will incorporate revisions to strengthen the manuscript.

Referee: [Abstract] Abstract: the assertion that 'Experimental results demonstrate that our method achieves performance comparable to skill-based neural networks' supplies no metrics, baselines, statistical tests, or experimental protocol. This prevents evaluation of the central empirical claim that the hierarchical tree policy matches neural baselines without material performance loss.

Authors: We agree that the abstract lacks sufficient quantitative detail to allow immediate evaluation of the performance claim. In the revised manuscript we will expand the abstract to report concrete metrics (e.g., mean cumulative reward and task success rate on the robotic-arm domains), name the neural baselines, and briefly indicate the evaluation protocol and number of random seeds used. revision: yes

Referee: [Method (implied by abstract description)] The reduction of continuous action spaces into discrete skill spaces is load-bearing for the performance claim, yet the manuscript provides no ablation or analysis quantifying expressiveness loss (or lack thereof) when mapping to the skill space.

Authors: We acknowledge that an explicit quantification of any expressiveness loss introduced by the skill-space mapping is currently missing. We will add an ablation subsection that compares SkillTree against (i) a low-level policy trained directly on the original continuous action space and (ii) a neural high-level policy, reporting the resulting performance delta and any degradation in long-horizon task completion. revision: yes
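
As a rough illustration of how such a performance delta could be reported, the sketch below summarizes per-seed success rates for two policies and returns the difference of their means. It assumes each run yields a single success rate in [0, 1]; the function names and the returned dictionary layout are hypothetical, and no numbers from the paper are used.

```python
from statistics import mean, stdev


def summarize(success_rates):
    """Mean and sample standard deviation over independent runs (needs >= 2 runs)."""
    return mean(success_rates), stdev(success_rates)


def performance_delta(tree_runs, baseline_runs):
    """Compare a tree-based policy against a baseline across seeds.

    Both arguments are lists of per-seed success rates. A negative
    'delta' means the tree policy scored lower on average.
    """
    tree_mean, tree_sd = summarize(tree_runs)
    base_mean, base_sd = summarize(baseline_runs)
    return {
        "tree": (tree_mean, tree_sd),
        "baseline": (base_mean, base_sd),
        "delta": tree_mean - base_mean,
    }
```

Reporting the spread alongside the delta matters here because a small number of seeds can make a gap look larger or smaller than it is.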

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces SkillTree as a hierarchical framework that combines a differentiable decision tree at the high level to produce skill embeddings guiding a low-level policy, with the central claims resting on experimental demonstrations of performance parity with neural baselines and added skill-level explainability. No equations, derivations, or self-citations appear in the provided text that reduce any claimed result to a fitted quantity or input by construction; the expressiveness of the discrete skill space is treated as an empirical outcome rather than a definitional equivalence, and no load-bearing uniqueness theorems or ansatzes from prior self-work are invoked to force the architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, background axioms, or newly postulated entities; the central claim rests on the unstated premise that discrete skills suffice for the target domains.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 3 internal anchors

[1]

Akkaya, I.; Andrychowicz, M.; Chociej, M.; Litwin, M.; McGrew, B.; Petron, A.; Paino, A.; Plappert, M.; Powell, G.; Ribas, R.; et al. 2019. Solving rubik's cube with a robot hand. arXiv preprint arXiv:1910.07113

work page internal anchor Pith review Pith/arXiv arXiv 2019
[2]

Anderson, A.; Dodge, J.; Sadarangani, A.; Juozapaitis, Z.; Newman, E.; Irvine, J.; Chattopadhyay, S.; Fern, A.; and Burnett, M. 2019. Explaining reinforcement learning to mere mortals: an empirical study. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, 1328--1334

work page 2019
[3]

BAAI, P. 2023. Plan4mc: Skill reinforcement learning and planning for open-world minecraft tasks. arXiv preprint arXiv:2303.16563

work page arXiv 2023
[4]

Bastani, O.; Pu, Y.; and Solar-Lezama, A. 2018. Verifiable reinforcement learning via policy extraction. In Advances in Neural Information Processing Systems, volume 31

work page 2018
[5]

Bewley, T.; and Lawry, J. 2021. Tripletree: A versatile interpretable representation of black box agents and their environments. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 11415--11422

work page 2021
[6]

Cheng, Z.; Wu, X.; Yu, J.; Sun, W.; Guo, W.; and Xing, X. 2024. Statemask: Explaining deep reinforcement learning through state mask. In Advances in Neural Information Processing Systems, volume 36

work page 2024
[7]

Coppens, Y.; Efthymiadis, K.; Lenaerts, T.; Now \'e , A.; Miller, T.; Weber, R.; and Magazzeni, D. 2019. Distilling deep reinforcement learning policies in soft decision trees. In Proceedings of the IJCAI 2019 workshop on explainable artificial intelligence, 1--6

work page 2019
[8]

G.; and Pedreira, C

Costa, V. G.; and Pedreira, C. E. 2023. Recent advances in decision trees: An updated survey. Artificial Intelligence Review, 56(5): 4765--4800

work page 2023
[9]

Dalal, M.; Pathak, D.; and Salakhutdinov, R. R. 2021. Accelerating robotic reinforcement learning via parameterized action primitives. In Advances in Neural Information Processing Systems, volume 34, 21847--21859

work page 2021
[10]

V.; Dasgupta, A.; Krishnamurthy, B.; Jiang, N.; Agarwal, C.; Theocharous, G.; and Subramanian, J

Deshmukh, S. V.; Dasgupta, A.; Krishnamurthy, B.; Jiang, N.; Agarwal, C.; Theocharous, G.; and Subramanian, J. 2023. Explaining RL Decisions with Trajectories. In International Conference on Learning Representations

work page 2023
[11]

Dhebar, Y.; and Deb, K. 2020. Interpretable rule discovery through bilevel optimization of split-rules of nonlinear decision trees for classification problems. IEEE Transactions on Cybernetics, 51(11): 5573--5584

work page 2020
[12]

W.; Li, C.; and Huang, R

Ding, Z.; Hernandez-Leal, P.; Ding, G. W.; Li, C.; and Huang, R. 2020. Cdt: Cascading decision trees for explainable reinforcement learning. arXiv preprint arXiv:2011.07553

work page arXiv 2020
[13]

Frosst, N.; and Hinton, G. 2017. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784

work page internal anchor Pith review Pith/arXiv arXiv 2017
[14]

Fu, J.; Kumar, A.; Nachum, O.; Tucker, G.; and Levine, S. 2020. D4rl: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219

work page internal anchor Pith review Pith/arXiv arXiv 2020
[15]

Greydanus, S.; Koul, A.; Dodge, J.; and Fern, A. 2018. Visualizing and understanding atari agents. In International conference on machine learning, 1792--1801. PMLR

work page 2018
[16]

Guo, W.; Wu, X.; Khan, U.; and Xing, X. 2021. Edge: Explaining deep reinforcement learning policies. In Advances in Neural Information Processing Systems, volume 34, 12222--12236

work page 2021
[17]

Haarnoja, T.; Zhou, A.; Abbeel, P.; and Levine, S. 2018. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Dy, J.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 1861--1870. PMLR

work page 2018
[18]

T.; Wang, Z.; Heess, N.; and Riedmiller, M

Hausman, K.; Springenberg, J. T.; Wang, Z.; Heess, N.; and Riedmiller, M. 2018. Learning an embedding space for transferable robot skills. In International Conference on Learning Representations

work page 2018
[19]

Hein, D.; Udluft, S.; and Runkler, T. A. 2018. Interpretable policies for reinforcement learning by genetic programming. Engineering Applications of Artificial Intelligence, 76: 158--169

work page 2018
[20]

Heuillet, A.; Couthouis, F.; and D \' az-Rodr \' guez, N. 2021. Explainability in deep reinforcement learning. Knowledge-Based Systems, 214: 106685

work page 2021
[21]

Heuillet, A.; Couthouis, F.; and D \' az-Rodr \' guez, N. 2022. Collective explainable AI: Explaining cooperative strategies and agent contribution in multiagent reinforcement learning with shapley values. IEEE Computational Intelligence Magazine, 17(1): 59--71

work page 2022
[22]

Hickling, T.; Zenati, A.; Aouf, N.; and Spencer, P. 2023. Explainability in deep reinforcement learning: A review into current methods and applications. ACM Computing Surveys, 56(5): 1--35

work page 2023
[23]

Jitosho, R.; Lum, T. G. W.; Okamura, A.; and Liu, K. 2023. Reinforcement Learning Enables Real-Time Planning and Control of Agile Maneuvers for Soft Robot Arms. In Tan, J.; Toussaint, M.; and Darvish, K., eds., Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning Research, 1131--1153. PMLR

work page 2023
[24]

Kipf, T.; Li, Y.; Dai, H.; Zambaldi, V.; Sanchez-Gonzalez, A.; Grefenstette, E.; Kohli, P.; and Battaglia, P. 2019. Compile: Compositional imitation learning and execution. In International Conference on Machine Learning, 3418--3428. PMLR

work page 2019
[25]

Kulh \'a nek, J.; Derner, E.; and Babu s ka, R. 2021. Visual navigation in real-world indoor environments using end-to-end deep reinforcement learning. IEEE Robotics and Automation Letters, 6(3): 4345--4352

work page 2021
[26]

K.; Kim, S.; Santiago, C

Landajuela, M.; Petersen, B. K.; Kim, S.; Santiago, C. P.; Glatt, R.; Mundhenk, N.; Pettit, J. F.; and Faissol, D. 2021. Discovering symbolic policies with deep reinforcement learning. In Meila, M.; and Zhang, T., eds., Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 5979--5989. PMLR

work page 2021
[27]

Lee, Y.; Yang, J.; and Lim, J. J. 2019. Learning to coordinate manipulation skills via skill behavior diversification. In International conference on learning representations

work page 2019
[28]

Liu, G.; Schulte, O.; Zhu, W.; and Li, Q. 2019. Toward interpretable deep reinforcement learning with linear model u-trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10--14, 2018, Proceedings, Part II 18, 414--429. Springer

work page 2019
[29]

Liu, G.; Sun, X.; Schulte, O.; and Poupart, P. 2021. Learning tree interpretation from object representation for deep reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, 19622--19636

work page 2021
[30]

Loh, W.-Y. 2011. Classification and regression trees. Wiley interdisciplinary reviews: data mining and knowledge discovery, 1(1): 14--23

work page 2011
[31]

Lynch, C.; Khansari, M.; Xiao, T.; Kumar, V.; Tompson, J.; Levine, S.; and Sermanet, P. 2020. Learning latent plans from play. In Conference on robot learning, 1113--1132. PMLR

work page 2020
[32]

Lyu, D.; Yang, F.; Liu, B.; and Gustafson, S. 2019. SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2970--2977

work page 2019
[33]

H.; Li, D.; Liu, W.; and Hao, J

Ma, Z.; Zhuang, Y.; Weng, P.; Zhuo, H. H.; Li, D.; Liu, W.; and Hao, J. 2021. Learning symbolic rules for interpretable deep reinforcement learning. arXiv preprint arXiv:2103.08228

work page arXiv 2021
[34]

Madumal, P.; Miller, T.; Sonenberg, L.; and Vetere, F. 2020. Explainable reinforcement learning through a causal lens. In Proceedings of the AAAI conference on artificial intelligence, volume 34, 2493--2500

work page 2020
[35]

Mees, O.; Hermann, L.; Rosete-Beas, E.; and Burgard, W. 2022. Calvin: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters, 7(3): 7327--7334

work page 2022
[36]

Milani, S.; Topin, N.; Veloso, M.; and Fang, F. 2024. Explainable reinforcement learning: A survey and comparative review. ACM Computing Surveys, 56(7): 1--36

work page 2024
[37]

A.; Veness, J.; Bellemare, M

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. nature, 518(7540): 529--533

work page 2015
[38]

Molnar, C.; Casalicchio, G.; and Bischl, B. 2020. Interpretable machine learning--a brief history, state-of-the-art and challenges. In Joint European conference on machine learning and knowledge discovery in databases, 417--431. Springer

work page 2020
[39]

L.; Khanna, R.; Neal, L.; Li, F.; and Wong, W.-K

Olson, M. L.; Khanna, R.; Neal, L.; Li, F.; and Wong, W.-K. 2021. Counterfactual state explanations for reinforcement learning agents via generative deep learning. Artificial Intelligence, 295: 103455

work page 2021
[40]

Orfanos, S.; and Lelis, L. H. 2023. Synthesizing programmatic policies with actor-critic algorithms and relu networks. arXiv preprint arXiv:2308.02729

work page arXiv 2023
[41]

Pertsch, K.; Lee, Y.; and Lim, J. 2021. Accelerating reinforcement learning with learned skill priors. In Conference on robot learning, 188--204. PMLR

work page 2021
[42]

Pertsch, K.; Lee, Y.; Wu, Y.; and Lim, J. J. 2021. Demonstration-Guided Reinforcement Learning with Learned Skills. In 5th Annual Conference on Robot Learning

work page 2021
[43]

Quinlan, J. R. 1993. C4. 5: Programs for Machine Learning

work page 1993
[44]

V.; and Kakade, S

Rajeswaran, A.; Lowrey, K.; Todorov, E. V.; and Kakade, S. M. 2017. Towards generalization and simplicity in continuous control. In Advances in neural information processing systems, volume 30

work page 2017
[45]

Shankar, T.; and Gupta, A. 2020. Learning robot skills with temporal variational inference. In International Conference on Machine Learning, 8624--8633. PMLR

work page 2020
[46]

X.; Lim, J

Shi, L. X.; Lim, J. J.; and Lee, Y. 2023. Skill-based Model-based Reinforcement Learning. In Conference on Robot Learning, 2262--2272. PMLR

work page 2023
[47]

Shiarlis, K.; Wulfmeier, M.; Salter, S.; Whiteson, S.; and Posner, I. 2018. Taco: Learning task decomposition via temporal alignment for control. In International Conference on Machine Learning, 4654--4663. PMLR

work page 2018
[48]

Shu, T.; Xiong, C.; and Socher, R. 2018. Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning. In International Conference on Learning Representations

work page 2018
[49]

Silva, A.; Gombolay, M.; Killian, T.; Jimenez, I.; and Son, S.-H. 2020. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning. In International conference on artificial intelligence and statistics, 1855--1865. PMLR

work page 2020
[50]

S.; and Barto, A

Sutton, R. S.; and Barto, A. G. 2018. Reinforcement Learning: An Introduction. MIT press

work page 2018
[51]

Van Den Oord, A.; Vinyals, O.; et al. 2017. Neural discrete representation learning. In Advances in Neural Information Processing Systems, volume 30

work page 2017
[52]

Vasić, M.; Petrović, A.; Wang, K.; Nikolić, M.; Singh, R.; and Khurshid, S. 2022. MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning. Neural Networks, 151: 34--47

work page 2022
[53]

Wabartha, M.; and Pineau, J. 2023. Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning. In International Conference on Learning Representations

work page 2023
[54]

Zhang, K.; Zhang, J.; Xu, P.-D.; Gao, T.; and Gao, D. W. 2021. Explainable AI in deep reinforcement learning models for power system emergency control. IEEE Transactions on Computational Social Systems, 9(2): 419--427

work page 2021
[55]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page
[56]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[1] [1]

Akkaya, I.; Andrychowicz, M.; Chociej, M.; Litwin, M.; McGrew, B.; Petron, A.; Paino, A.; Plappert, M.; Powell, G.; Ribas, R.; et al. 2019. Solving rubik's cube with a robot hand. arXiv preprint arXiv:1910.07113

work page internal anchor Pith review Pith/arXiv arXiv 2019

[2] [2]

Anderson, A.; Dodge, J.; Sadarangani, A.; Juozapaitis, Z.; Newman, E.; Irvine, J.; Chattopadhyay, S.; Fern, A.; and Burnett, M. 2019. Explaining reinforcement learning to mere mortals: an empirical study. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, 1328--1334

work page 2019

[3] [3]

BAAI, P. 2023. Plan4mc: Skill reinforcement learning and planning for open-world minecraft tasks. arXiv preprint arXiv:2303.16563

work page arXiv 2023

[4] [4]

Bastani, O.; Pu, Y.; and Solar-Lezama, A. 2018. Verifiable reinforcement learning via policy extraction. In Advances in Neural Information Processing Systems, volume 31

work page 2018

[5] [5]

Bewley, T.; and Lawry, J. 2021. Tripletree: A versatile interpretable representation of black box agents and their environments. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 11415--11422

work page 2021

[6] [6]

Cheng, Z.; Wu, X.; Yu, J.; Sun, W.; Guo, W.; and Xing, X. 2024. Statemask: Explaining deep reinforcement learning through state mask. In Advances in Neural Information Processing Systems, volume 36

work page 2024

[7] [7]

Coppens, Y.; Efthymiadis, K.; Lenaerts, T.; Now \'e , A.; Miller, T.; Weber, R.; and Magazzeni, D. 2019. Distilling deep reinforcement learning policies in soft decision trees. In Proceedings of the IJCAI 2019 workshop on explainable artificial intelligence, 1--6

work page 2019

[8] [8]

G.; and Pedreira, C

Costa, V. G.; and Pedreira, C. E. 2023. Recent advances in decision trees: An updated survey. Artificial Intelligence Review, 56(5): 4765--4800

work page 2023

[9] [9]

Dalal, M.; Pathak, D.; and Salakhutdinov, R. R. 2021. Accelerating robotic reinforcement learning via parameterized action primitives. In Advances in Neural Information Processing Systems, volume 34, 21847--21859

work page 2021

[10] [10]

V.; Dasgupta, A.; Krishnamurthy, B.; Jiang, N.; Agarwal, C.; Theocharous, G.; and Subramanian, J

Deshmukh, S. V.; Dasgupta, A.; Krishnamurthy, B.; Jiang, N.; Agarwal, C.; Theocharous, G.; and Subramanian, J. 2023. Explaining RL Decisions with Trajectories. In International Conference on Learning Representations

work page 2023

[11] [11]

Dhebar, Y.; and Deb, K. 2020. Interpretable rule discovery through bilevel optimization of split-rules of nonlinear decision trees for classification problems. IEEE Transactions on Cybernetics, 51(11): 5573--5584

work page 2020

[12] [12]

W.; Li, C.; and Huang, R

Ding, Z.; Hernandez-Leal, P.; Ding, G. W.; Li, C.; and Huang, R. 2020. Cdt: Cascading decision trees for explainable reinforcement learning. arXiv preprint arXiv:2011.07553

work page arXiv 2020

[13] [13]

Frosst, N.; and Hinton, G. 2017. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [14]

Fu, J.; Kumar, A.; Nachum, O.; Tucker, G.; and Levine, S. 2020. D4rl: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219

work page internal anchor Pith review Pith/arXiv arXiv 2020

[15] [15]

Greydanus, S.; Koul, A.; Dodge, J.; and Fern, A. 2018. Visualizing and understanding atari agents. In International conference on machine learning, 1792--1801. PMLR

work page 2018

[16] [16]

Guo, W.; Wu, X.; Khan, U.; and Xing, X. 2021. Edge: Explaining deep reinforcement learning policies. In Advances in Neural Information Processing Systems, volume 34, 12222--12236

work page 2021

[17] [17]

Haarnoja, T.; Zhou, A.; Abbeel, P.; and Levine, S. 2018. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Dy, J.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 1861--1870. PMLR

work page 2018

[18] [18]

T.; Wang, Z.; Heess, N.; and Riedmiller, M

Hausman, K.; Springenberg, J. T.; Wang, Z.; Heess, N.; and Riedmiller, M. 2018. Learning an embedding space for transferable robot skills. In International Conference on Learning Representations

work page 2018

[19] [19]

Hein, D.; Udluft, S.; and Runkler, T. A. 2018. Interpretable policies for reinforcement learning by genetic programming. Engineering Applications of Artificial Intelligence, 76: 158--169

work page 2018

[20] [20]

Heuillet, A.; Couthouis, F.; and D \' az-Rodr \' guez, N. 2021. Explainability in deep reinforcement learning. Knowledge-Based Systems, 214: 106685

work page 2021

[21] [21]

Heuillet, A.; Couthouis, F.; and D \' az-Rodr \' guez, N. 2022. Collective explainable AI: Explaining cooperative strategies and agent contribution in multiagent reinforcement learning with shapley values. IEEE Computational Intelligence Magazine, 17(1): 59--71

work page 2022

[22] [22]

Hickling, T.; Zenati, A.; Aouf, N.; and Spencer, P. 2023. Explainability in deep reinforcement learning: A review into current methods and applications. ACM Computing Surveys, 56(5): 1--35

work page 2023

[23] [23]

Jitosho, R.; Lum, T. G. W.; Okamura, A.; and Liu, K. 2023. Reinforcement Learning Enables Real-Time Planning and Control of Agile Maneuvers for Soft Robot Arms. In Tan, J.; Toussaint, M.; and Darvish, K., eds., Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning Research, 1131--1153. PMLR

work page 2023

[24] [24]

Kipf, T.; Li, Y.; Dai, H.; Zambaldi, V.; Sanchez-Gonzalez, A.; Grefenstette, E.; Kohli, P.; and Battaglia, P. 2019. Compile: Compositional imitation learning and execution. In International Conference on Machine Learning, 3418--3428. PMLR

work page 2019

[25] [25]

Kulhánek, J.; Derner, E.; and Babuška, R. 2021. Visual navigation in real-world indoor environments using end-to-end deep reinforcement learning. IEEE Robotics and Automation Letters, 6(3): 4345--4352

work page 2021

[26] [26]

Landajuela, M.; Petersen, B. K.; Kim, S.; Santiago, C. P.; Glatt, R.; Mundhenk, N.; Pettit, J. F.; and Faissol, D. 2021. Discovering symbolic policies with deep reinforcement learning. In Meila, M.; and Zhang, T., eds., Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 5979--5989. PMLR

work page 2021

[27] [27]

Lee, Y.; Yang, J.; and Lim, J. J. 2019. Learning to coordinate manipulation skills via skill behavior diversification. In International conference on learning representations

work page 2019

[28] [28]

Liu, G.; Schulte, O.; Zhu, W.; and Li, Q. 2019. Toward interpretable deep reinforcement learning with linear model u-trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10--14, 2018, Proceedings, Part II 18, 414--429. Springer

work page 2019

[29] [29]

Liu, G.; Sun, X.; Schulte, O.; and Poupart, P. 2021. Learning tree interpretation from object representation for deep reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, 19622--19636

work page 2021

[30] [30]

Loh, W.-Y. 2011. Classification and regression trees. Wiley interdisciplinary reviews: data mining and knowledge discovery, 1(1): 14--23

work page 2011

[31] [31]

Lynch, C.; Khansari, M.; Xiao, T.; Kumar, V.; Tompson, J.; Levine, S.; and Sermanet, P. 2020. Learning latent plans from play. In Conference on robot learning, 1113--1132. PMLR

work page 2020

[32] [32]

Lyu, D.; Yang, F.; Liu, B.; and Gustafson, S. 2019. SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2970--2977

work page 2019

[33] [33]

Ma, Z.; Zhuang, Y.; Weng, P.; Zhuo, H. H.; Li, D.; Liu, W.; and Hao, J. 2021. Learning symbolic rules for interpretable deep reinforcement learning. arXiv preprint arXiv:2103.08228

work page arXiv 2021

[34] [34]

Madumal, P.; Miller, T.; Sonenberg, L.; and Vetere, F. 2020. Explainable reinforcement learning through a causal lens. In Proceedings of the AAAI conference on artificial intelligence, volume 34, 2493--2500

work page 2020

[35] [35]

Mees, O.; Hermann, L.; Rosete-Beas, E.; and Burgard, W. 2022. Calvin: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters, 7(3): 7327--7334

work page 2022

[36] [36]

Milani, S.; Topin, N.; Veloso, M.; and Fang, F. 2024. Explainable reinforcement learning: A survey and comparative review. ACM Computing Surveys, 56(7): 1--36

work page 2024

[37] [37]

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature, 518(7540): 529--533

work page 2015

[38] [38]

Molnar, C.; Casalicchio, G.; and Bischl, B. 2020. Interpretable machine learning--a brief history, state-of-the-art and challenges. In Joint European conference on machine learning and knowledge discovery in databases, 417--431. Springer

work page 2020

[39] [39]

Olson, M. L.; Khanna, R.; Neal, L.; Li, F.; and Wong, W.-K. 2021. Counterfactual state explanations for reinforcement learning agents via generative deep learning. Artificial Intelligence, 295: 103455

work page 2021

[40] [40]

Orfanos, S.; and Lelis, L. H. 2023. Synthesizing programmatic policies with actor-critic algorithms and relu networks. arXiv preprint arXiv:2308.02729

work page arXiv 2023

[41] [41]

Pertsch, K.; Lee, Y.; and Lim, J. 2021. Accelerating reinforcement learning with learned skill priors. In Conference on robot learning, 188--204. PMLR

work page 2021

[42] [42]

Pertsch, K.; Lee, Y.; Wu, Y.; and Lim, J. J. 2021. Demonstration-Guided Reinforcement Learning with Learned Skills. In 5th Annual Conference on Robot Learning

work page 2021

[43] [43]

Quinlan, J. R. 1993. C4.5: Programs for Machine Learning

work page 1993

[44] [44]

Rajeswaran, A.; Lowrey, K.; Todorov, E. V.; and Kakade, S. M. 2017. Towards generalization and simplicity in continuous control. In Advances in neural information processing systems, volume 30

work page 2017

[45] [45]

Shankar, T.; and Gupta, A. 2020. Learning robot skills with temporal variational inference. In International Conference on Machine Learning, 8624--8633. PMLR

work page 2020

[46] [46]

Shi, L. X.; Lim, J. J.; and Lee, Y. 2023. Skill-based Model-based Reinforcement Learning. In Conference on Robot Learning, 2262--2272. PMLR

work page 2023

[47] [47]

Shiarlis, K.; Wulfmeier, M.; Salter, S.; Whiteson, S.; and Posner, I. 2018. Taco: Learning task decomposition via temporal alignment for control. In International Conference on Machine Learning, 4654--4663. PMLR

work page 2018

[48] [48]

Shu, T.; Xiong, C.; and Socher, R. 2018. Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning. In International Conference on Learning Representations

work page 2018

[49] [49]

Silva, A.; Gombolay, M.; Killian, T.; Jimenez, I.; and Son, S.-H. 2020. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning. In International conference on artificial intelligence and statistics, 1855--1865. PMLR

work page 2020

[50] [50]

Sutton, R. S.; and Barto, A. G. 2018. Reinforcement Learning: An Introduction. MIT Press

work page 2018

[51] [51]

Van Den Oord, A.; Vinyals, O.; et al. 2017. Neural discrete representation learning. In Advances in Neural Information Processing Systems, volume 30

work page 2017

[52] [52]

Vasić, M.; Petrović, A.; Wang, K.; Nikolić, M.; Singh, R.; and Khurshid, S. 2022. MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning. Neural Networks, 151: 34--47

work page 2022

[53] [53]

Wabartha, M.; and Pineau, J. 2023. Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning. In International Conference on Learning Representations

work page 2023

[54] [54]

Zhang, K.; Zhang, J.; Xu, P.-D.; Gao, T.; and Gao, D. W. 2021. Explainable AI in deep reinforcement learning models for power system emergency control. IEEE Transactions on Computational Social Systems, 9(2): 419--427

work page 2021
