Toward Template-Free Explainability for Monte Carlo Tree Search

Ayan Mukhopadhyay; Hemant Purohit; Hiba Baroud; MirSaleh Bahavarnia; Siqi Lu; Yixuan Zhang

arxiv: 2605.16524 · v2 · pith:J2CL2OFQnew · submitted 2026-05-15 · 💻 cs.HC · cs.AI


Pith reviewed 2026-05-21 08:21 UTC · model grok-4.3

keywords: Monte Carlo Tree Search · Explainable AI · Large Language Models · Probabilistic Search · Natural Language Explanations · Search Traces · Decision Making Under Uncertainty

The pith

Large language models can generate natural-language explanations for Monte Carlo Tree Search decisions directly from raw tree statistics without templates or formal logic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that lets large language models turn natural-language questions about Monte Carlo Tree Search into explanations by first mapping the question to an intent category, then checking whether the recorded search tree holds enough evidence, and expanding the tree only when needed before producing the answer. The explanations draw on concrete statistics such as visit counts, value estimates, and risk measures rather than hand-crafted rules. A sympathetic reader would care because earlier approaches demanded custom formal constraints that had to be rewritten whenever the underlying decision problem changed, making them brittle for new tasks.

Core claim

The framework maps natural-language questions to a structured set of intent categories, determines whether the existing tree contains sufficient evidence, triggers targeted expansion when needed, and generates explanations using tree statistics such as visit counts, value estimates, and risk information. Experimental results provide the first evidence that LLMs can serve as end-to-end explainers for probabilistic search without requiring intermediate formal representations.
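The four steps in this claim can be sketched as a short control loop. This is a minimal illustration, not the authors' implementation: the `Node` fields, the function names, and the `max_expansions` cap are all assumptions, and the three LLM-backed steps are passed in as plain callables so the flow can be read (and tested) without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Node:
    """One state in a saved search tree (field names are illustrative)."""
    visits: int = 0
    value: float = 0.0  # mean return estimated from rollouts
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> child


def explain(
    question: str,
    root: Node,
    classify_intent: Callable[[str], str],              # stand-in for an LLM call
    has_enough_evidence: Callable[[str, Node], bool],   # stand-in for an LLM call
    expand: Callable[[Node], None],                     # runs extra search iterations
    generate: Callable[[str, str, Node], str],          # stand-in for an LLM call
    max_expansions: int = 3,                            # hypothetical safety cap
) -> str:
    """Intent -> sufficiency check -> optional expansion -> explanation."""
    intent = classify_intent(question)
    for _ in range(max_expansions):
        if has_enough_evidence(intent, root):
            break
        expand(root)  # targeted expansion only when evidence is lacking
    return generate(question, intent, root)
```

Bounding the expansion loop is a design choice of this sketch: it guarantees that an answer is always produced, even if the sufficiency judgment never turns positive.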

What carries the argument

The end-to-end LLM framework that converts natural-language questions into intent categories and produces explanations grounded in raw MCTS tree statistics.

If this is right

Explanations become possible without rewriting formal logic constraints for each new problem domain.
The system can automatically decide whether to expand the search tree before answering a user question.
Users receive explanations tied directly to measurable quantities such as visit counts and value estimates.
The same pipeline applies to any asymmetric search tree produced by bandit-based traversal and simulation-based evaluation.
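For readers unfamiliar with "bandit-based traversal", the sketch below shows the UCB1 rule commonly used for it in UCT-style MCTS (Kocsis and Szepesvári). Whether the paper uses exactly this rule is not stated on this page, so treat it as background; the function names and the default exploration constant are assumptions.

```python
import math
from typing import Dict, Tuple


def ucb1_score(mean_value: float, child_visits: int, parent_visits: int,
               c: float = math.sqrt(2)) -> float:
    """Mean value plus an exploration bonus that shrinks as visits grow."""
    if child_visits == 0:
        return math.inf  # unvisited actions are tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_action(stats: Dict[str, Tuple[float, int]],
                  c: float = math.sqrt(2)) -> str:
    """stats maps action -> (mean_value, visits); returns the action to descend."""
    parent_visits = sum(visits for _, visits in stats.values())
    return max(stats, key=lambda a: ucb1_score(stats[a][0], stats[a][1],
                                               parent_visits, c))
```

Because the rule keeps steering visits toward actions that look good while occasionally revisiting neglected ones, the resulting tree is asymmetric, which is why visit counts themselves carry explanatory signal.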

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could extend to other probabilistic planning algorithms that produce comparable tree statistics.
Interactive systems might let users refine their questions and receive updated explanations in the same session.
Accuracy could be tested by comparing generated text against human-written summaries of the same trees.

Load-bearing premise

Large language models can reliably judge when a search tree contains enough evidence and produce accurate natural-language explanations from visit counts, value estimates, and risk data.

What would settle it

An experiment in which the LLM explanations are compared against the actual tree statistics and found to misstate visit counts or value estimates on a majority of test cases.
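One way such an experiment could be scored is sketched below: pull every number out of an explanation and flag those that match no recorded statistic. This is an illustrative check only; the function names, the tolerance, and the restriction to non-negative numbers are assumptions of the sketch, not the paper's evaluation protocol.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: statistics are cited as plain non-negative numbers.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def misstated_numbers(explanation: str, facts: Iterable[float],
                      tol: float = 0.01) -> List[float]:
    """Return numbers in the explanation that match no recorded statistic."""
    known = list(facts)
    cited = [float(tok) for tok in NUMBER.findall(explanation)]
    return [x for x in cited if not any(abs(x - f) <= tol for f in known)]


def misstatement_rate(cases: Iterable[Tuple[str, Iterable[float]]]) -> float:
    """Share of (explanation, facts) cases with at least one misstated number."""
    items = list(cases)
    bad = sum(1 for text, facts in items if misstated_numbers(text, facts))
    return bad / len(items) if items else 0.0
```

A rate above 0.5 on a test set would correspond to the "majority of test cases" condition above. The check is deliberately crude: it would also flag incidental numbers such as state indices, so a real evaluation would need to filter those.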

Figures

Figures reproduced from arXiv: 2605.16524 by Ayan Mukhopadhyay, Hemant Purohit, Hiba Baroud, MirSaleh Bahavarnia, Siqi Lu, Yixuan Zhang.

**Figure 1.** Overview of the proposed framework. Our explainability module takes two inputs: a question from the end-user in natural language and a saved MCTS tree that records visited states (denoted by nodes in the tree), available actions at each state, visit counts, and value estimates generated by rollouts (or a trained value estimator, e.g., a neural network). The LLM interprets the user’s question, identifies th…

**Figure 2.** The FrozenLake environment used for evaluation. The environment presents the canonical challenge of …

**Figure 3.** Qualitative example of the generated explanation.

Original abstract

Probabilistic search algorithms, such as Monte Carlo Tree Search (MCTS), have proven very effective in solving sequential decision-making tasks under uncertainty. However, interpreting asymmetric search trees that incorporate bandit-based tree traversal and simulation-based value estimation is difficult for end users based solely on raw tree statistics. While prior work requires hand-crafted formal logic constraints that must be updated when the problem changes, we present a framework that enables large language models (LLMs) to generate evidence-grounded explanations of MCTS decisions from recorded search traces in an end-to-end manner. Our framework maps natural-language questions to a structured set of intent categories, determines whether the existing tree contains sufficient evidence, triggers targeted expansion when needed, and generates explanations using tree statistics such as visit counts, value estimates, and risk information. Experimental results provide the first evidence that LLMs can serve as end-to-end explainers for probabilistic search, without requiring intermediate formal representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Generating LLM explanations for MCTS from raw tree statistics, without templates, is a direct idea, but the abstract does not show the experimental grounding behind it.

The main thing to know is that this paper sets up LLMs to turn MCTS search traces into natural-language explanations without any hand-crafted formal rules that need rewriting for each new problem. They describe a pipeline that takes a question, maps it to an intent category, checks if the tree has enough visit counts or value estimates, expands the search when it does not, and then generates the output from those statistics. The abstract positions this as the first evidence for end-to-end LLM explainers in probabilistic search.

Referee Report

2 major / 2 minor

Summary. The paper proposes a framework that uses large language models (LLMs) to generate natural-language explanations for Monte Carlo Tree Search (MCTS) decisions directly from raw search traces. The approach maps user questions to structured intent categories, determines whether the recorded tree statistics (visit counts, value estimates, risk information) contain sufficient evidence, triggers targeted expansion if needed, and produces explanations without relying on hand-crafted formal logic or intermediate representations. The central claim is that experimental results supply the first evidence that LLMs can function as end-to-end explainers for probabilistic search algorithms.

Significance. If the experimental validation holds, the work would represent a meaningful step toward template-free interpretability for asymmetric, bandit-driven search trees that are otherwise difficult for end users to parse. By removing the requirement to maintain problem-specific formal constraints, the framework could broaden the applicability of explainable AI in sequential decision-making domains such as planning and game playing. The absence of machine-checked proofs or parameter-free derivations is offset by the attempt to ground explanations in observable tree statistics.

major comments (2)

[Results / Experimental Evaluation] The manuscript reports no quantitative grounding checks (e.g., token-level citation accuracy to the input visit counts/value estimates or inter-rater agreement metrics with human analysts on the same trees). Because the LLM is responsible for both sufficiency judgment and explanation generation, the lack of independent validation against raw tree data is load-bearing for the end-to-end claim and leaves open the possibility of plausible but unfaithful narratives.
[Framework description] The abstract and methods assert that the LLM 'determines whether the existing tree contains sufficient evidence' and 'triggers targeted expansion when needed,' yet no concrete decision procedure, threshold, or fallback mechanism is specified. Without these details it is impossible to assess whether the sufficiency judgment is reproducible or merely delegates the core reasoning burden to the LLM.
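To make the second comment concrete, here is one shape a reproducible decision procedure could take: a hard visit-count floor, a structured LLM verdict, and a fallback to expansion when the reply cannot be trusted. Everything here is hypothetical, including the `min_visits` threshold and the `{"sufficient": ...}` reply format; the manuscript, as summarized above, specifies none of it.

```python
import json


def decide(raw_reply: str, asked_visits: int, min_visits: int = 30) -> str:
    """Return 'answer' or 'expand' for the queried state-action pair."""
    # Hard floor (hypothetical threshold): too few visits always means expand,
    # regardless of what the LLM says.
    if asked_visits < min_visits:
        return "expand"
    # Structured verdict: expect JSON such as {"sufficient": true}.
    try:
        verdict = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return "expand"  # fallback: an unparseable reply is treated as insufficient
    if isinstance(verdict, dict) and verdict.get("sufficient") is True:
        return "answer"
    return "expand"  # fallback: anything other than an explicit true
```

Defaulting to expansion on every ambiguous path errs on the side of gathering more evidence, at the cost of extra search time.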

minor comments (2)

[Abstract] The abstract would be clearer if it listed the specific intent categories used to map natural-language questions.
[Notation / Preliminaries] Notation for tree statistics (visit counts, value estimates, risk information) should be introduced with explicit symbols or a small table to aid readers unfamiliar with MCTS.
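As an illustration of the second minor comment, the sketch below pairs conventional MCTS symbols (N for visit count, Q for mean value) with a small data structure and prints them as a table. The symbols and field names are common conventions chosen for this sketch, not notation taken from the paper, and risk information is left out because this page does not say how the paper defines it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    visits: int = 0             # N: number of times this node was visited
    total_return: float = 0.0   # W: sum of returns backed up through this node
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> child

    @property
    def value(self) -> float:
        """Q = W / N, defined as 0.0 for unvisited nodes."""
        return self.total_return / self.visits if self.visits else 0.0


def stats_table(root: Node) -> List[str]:
    """One row per root action, sorted by action name."""
    rows = ["action | N | Q"]
    for action, child in sorted(root.children.items()):
        rows.append(f"{action} | {child.visits} | {child.value:.3f}")
    return rows
```

A table like this could serve both as the reader-facing notation aid and as the textual form of the tree that is handed to the LLM.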

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for strengthening the experimental validation and clarifying the framework's operational details. We address each major comment below and commit to revisions that will improve the rigor and reproducibility of the work.

Referee: [Results / Experimental Evaluation] The manuscript reports no quantitative grounding checks (e.g., token-level citation accuracy to the input visit counts/value estimates or inter-rater agreement metrics with human analysts on the same trees). Because the LLM is responsible for both sufficiency judgment and explanation generation, the lack of independent validation against raw tree data is load-bearing for the end-to-end claim and leaves open the possibility of plausible but unfaithful narratives.

Authors: We acknowledge the validity of this observation. Our current experiments center on human evaluations of explanation quality and alignment with tree statistics, but we agree that these do not fully substitute for quantitative grounding metrics. In the revised manuscript we will add token-level citation accuracy measures that verify direct references to visit counts, value estimates, and risk information from the input traces, along with inter-rater agreement statistics for the human analyst assessments. These additions will be placed in the Experimental Evaluation section to provide stronger support for the faithfulness of the generated explanations.

Revision: yes

Referee: [Framework description] The abstract and methods assert that the LLM 'determines whether the existing tree contains sufficient evidence' and 'triggers targeted expansion when needed,' yet no concrete decision procedure, threshold, or fallback mechanism is specified. Without these details it is impossible to assess whether the sufficiency judgment is reproducible or merely delegates the core reasoning burden to the LLM.

Authors: We agree that additional specification is required for reproducibility. The revised manuscript will expand the Framework description to detail the prompting strategy used for the sufficiency judgment, the expected structured output format from the LLM, any response-based decision rules, and the explicit fallback mechanism (such as defaulting to tree expansion or generating a conservative explanation when evidence is insufficient). This will clarify the balance between LLM reasoning and guided procedure without altering the core template-free approach.

Revision: yes
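The inter-rater agreement statistics promised in the first response are often reported as Cohen's kappa for two raters. The rebuttal does not name a statistic, so the sketch below is only one plausible choice; the function name and the convention of returning 1.0 when chance agreement is total are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention of this sketch: both raters used a single label
    return (p_o - p_e) / (1 - p_e)
```

For example, two analysts labelling four explanations as faithful (1) or not (0) with ratings `[1, 1, 0, 0]` and `[1, 0, 0, 0]` agree on three of four items, which gives a kappa of 0.5 once chance agreement is removed.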

Circularity Check

0 steps flagged

No circularity: framework is procedural and externally validated by experiments

The manuscript presents a procedural framework that maps questions to intent categories, checks evidence sufficiency in MCTS trees, and generates natural-language explanations from visit counts and value estimates. No equations, fitted parameters, or first-principles derivations appear in the abstract or description. The central claim rests on experimental evidence that LLMs can perform these steps end-to-end, which is an external capability test rather than a self-referential reduction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to force the result. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the unproven assumption that current LLMs possess sufficient reasoning to map questions to intents and produce evidence-grounded explanations from tree statistics; this capability is treated as given rather than derived or benchmarked within the paper.

axioms (1)

domain assumption: Large language models can accurately interpret MCTS tree statistics and generate faithful explanations from them without hand-crafted formal constraints.
Invoked when the framework is described as mapping questions to explanations using visit counts, value estimates, and risk information.

pith-pipeline@v0.9.0 · 5705 in / 1249 out tokens · 45309 ms · 2026-05-21T08:21:22.647580+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Our framework maps natural-language questions to a structured set of intent categories, determines whether the existing tree contains sufficient evidence, triggers targeted expansion when needed, and generates explanations using tree statistics such as visit counts, value estimates, and risk information."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "the first framework that enables the explainability of probabilistic search trees without any intermediate formal representation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

[1] Amutheezan Sivagnanam, Salah Uddin Kadir, Ayan Mukhopadhyay, Philip Pugliese, Abhishek Dubey, Samitha Samaranayake, and Aron Laszka. Offline vehicle routing problem with online bookings: A novel problem formulation with applications to paratransit. arXiv preprint arXiv:2204.11992, 2022.

[2] Geoffrey Pettet, Ayan Mukhopadhyay, Mykel J Kochenderfer, and Abhishek Dubey. Hierarchical planning for resource allocation in emergency response systems. In Proceedings of the ACM/IEEE 12th international conference on cyber-physical systems, pages 155–166, 2021.

[3] Chao Yu, Jiming Liu, Shamim Nemati, and Guosheng Yin. Reinforcement learning in healthcare: A survey. ACM Computing Surveys (CSUR), 55(1):1–36, 2021.

[4] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.

[5] Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. In European conference on machine learning, pages 282–293. Springer, 2006.

[6] Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.

[7] Yelisey Pitanov, Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, and Aleksandr Panov. Monte-carlo tree search for multi-agent pathfinding: Preliminary results. In International Conference on Hybrid Artificial Intelligence Systems, pages 649–660. Springer, 2023.

[8] Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.

[9] Maciej Świechowski, Konrad Godlewski, Bartosz Sawicki, and Jacek Mańdziuk. Monte carlo tree search: A review of recent modifications and applications. Artificial Intelligence Review, 56(3):2497–2562, 2023.

[10] Saurabh Bagchi, Hyunseung Kim, Tarek Abdelzaher, Homa Alemzadeh, Somali Chaterji, Glen Chou, Yuying Duan, Fanxin Kong, Michael Lemmon, Yin Li, et al. Digital guardians: The past and the future of cyber-physical resilience. arXiv preprint arXiv:2604.14360, 2026. (Pith internal anchor)

[11] Baiting Luo, Yunuo Zhang, Abhishek Dubey, and Ayan Mukhopadhyay. Act as you learn: Adaptive decision-making in non-stationary markov decision processes. arXiv preprint arXiv:2401.01841, 2024.

[12] Hendrik Baier and Michael Kaisers. Towards explainable MCTS. 2021 AAAI Workshop on Explainable Agency in AI, 178, 2021.

[13] Lindsay Wells and Tomasz Bednarz. Explainable ai and reinforcement learning—a systematic review of current approaches and trends. Frontiers in artificial intelligence, 4:550030, 2021.

[14] Ziyan An, Xia Wang, Hendrik Baier, Zirong Chen, Abhishek Dubey, Taylor Johnson, Jonathan Sprinkle, and Meiyi Ma. Logiex: Integrating formal logic and llms for explainable transit planning, 2026.

[15] Ziyan An, Hendrik Baier, Abhishek Dubey, Ayan Mukhopadhyay, and Meiyi Ma. Enabling mcts explainability for sequential planning through computation tree logic, 2024.

[16] Martin Klissarov, R Devon Hjelm, Alexander T Toshev, and Bogdan Mazoure. On the modeling capabilities of large language models for sequential decision making. In The Thirteenth International Conference on Learning Representations, 2024.

[17] Hendrik Baier, Mark T Keane, Sarath Sreedharan, Silvia Tulli, and Abhinav Verma. Explanations for sequential decision-making–an overview. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 40948–40953, 2026.

[18] Wojciech Samek, Grégoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and Klaus-Robert Müller. Explainable AI: interpreting, explaining and visualizing deep learning. Springer Nature, 2019.

[19] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Information fusion, 58:82–115, 2020.

[20] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial intelligence, 267:1–38, 2019.

[21] Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, and Subbarao Kambhampati. Bridging the gap: Providing post-hoc symbolic explanations for sequential decision-making problems with inscrutable representations. arXiv preprint arXiv:2002.01080, 2020.

[22] Pat Langley. Explainable agency in human-robot interaction. In AAAI fall symposium series, pages 504–507, 2016.

[23] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. (Pith internal anchor)

[24] Mark Towers, Ariel Kwiatkowski, John U Balis, Gianluca De Cola, Tristan Deleu, Manuel Goulão, Kallinteris Andreas, Markus Krimmel, Arjun KG, Rodrigo De Lazcano Perez-Vicente, et al. Gymnasium: A standard interface for reinforcement learning environments. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

Table 3: Keyword-based grounding results for generated explanations.

| Keyword Check | Passed / Total | Rate |
| --- | --- | --- |
| Agent Core Decision | 16 / 21 | 76.2% |
| Risk Calculation | 19 / 21 | 90.5% |
| Asked State-Action Pair | 19 / 19 | 100.0% |

...
