
arxiv: 2604.26964 · v1 · submitted 2026-04-14 · 💻 cs.CY · cs.AI · cs.LG

Learning-to-Explain through 20Q Gaming: An Explainable Recommender for Cybersecurity Education

Pith reviewed 2026-05-10 14:20 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.LG
keywords cybersecurity education · explainable AI · reinforcement learning · 20 questions game · recommender systems · interactive learning · policy-based RL · XAI

The pith

A reinforcement learning agent recommends cybersecurity mitigations by playing a 20-questions game that gathers the minimal justifying facts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that reframes the question of why to apply a cybersecurity defense as an interactive 20-questions game. A policy-based reinforcement learning agent queries the user adaptively until it can both select the optimal educational recommendation and produce a short explanation built from the collected answers. This targets the limitations of traditional static training by making learning more responsive and by revealing concepts at adjustable difficulty levels. The approach applies the game structure to cybersecurity topics such as attack vectors and defense strategies, drawing on prior 20Q reinforcement learning techniques to generate concise dialogue traces as explanations.

Core claim

The Explainable Q20 Cybersecurity Recommender (EQ-20CR) casts the justification for a defensive action as a 20-questions game. A policy-based RL agent leads users through a sequence of questions to recognize targeted concepts and continues until it can recommend optimal security education while explaining the decision with a minimal set of evidential facts. The framework is designed to cover a range of cybersecurity concepts, illustrated through case studies, at an adaptive difficulty level.

What carries the argument

The EQ-20CR framework, in which a policy-based reinforcement learning agent plays a 20-questions game to elicit the smallest set of facts that justify a cybersecurity recommendation and produce an explanation trace.
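The interaction pattern described above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the three-concept knowledge base `KB`, the question names, and the `first_discriminating` rule (a hand-written stand-in for the learned RL policy) are all hypothetical. It only shows how a single question sequence can yield both a recommendation and a short dialogue trace that serves as the explanation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy knowledge base: concept -> {question: expected yes/no answer}.
KB: Dict[str, Dict[str, bool]] = {
    "phishing":      {"arrives_by_email": True,  "encrypts_files": False, "targets_database": False},
    "ransomware":    {"arrives_by_email": True,  "encrypts_files": True,  "targets_database": False},
    "sql_injection": {"arrives_by_email": False, "encrypts_files": False, "targets_database": True},
}

Trace = List[Tuple[str, bool]]
Policy = Callable[[List[str], List[str]], str]


def first_discriminating(candidates: List[str], unasked: List[str]) -> str:
    """Stand-in for a learned policy: first question that still splits the candidates."""
    for q in unasked:
        if len({KB[c][q] for c in candidates}) > 1:
            return q
    return unasked[0]


def play(answer: Callable[[str], bool],
         policy: Policy = first_discriminating,
         max_questions: int = 20) -> Tuple[str, Trace]:
    """Ask questions until one concept remains or the question budget is spent."""
    candidates = list(KB)
    unasked = list(next(iter(KB.values())))
    trace: Trace = []
    while len(candidates) > 1 and unasked and len(trace) < max_questions:
        q = policy(candidates, unasked)
        unasked.remove(q)
        a = answer(q)
        trace.append((q, a))
        # Keep only the concepts consistent with the answer just received.
        candidates = [c for c in candidates if KB[c][q] == a]
    recommendation = candidates[0] if candidates else "unknown"
    return recommendation, trace
```

For a simulated user thinking of "ransomware" (`play(lambda q: KB["ransomware"][q])`), this toy run asks two questions and returns the trace `[("arrives_by_email", True), ("encrypts_files", True)]`. A trained policy would replace `first_discriminating`, and a probabilistic belief would replace the hard filtering used here.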

If this is right

  • Both recommendation and explanation emerge from the same sequence of adaptive questions.
  • Question difficulty adjusts automatically to the user's responses.
  • Explanations take the form of short dialogue traces rather than lengthy narratives.
  • The same structure can be applied across different cybersecurity concepts such as attacks and defenses.
  • Training shifts from passive reading to active game-like interaction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The questioning approach could extend to other domains that require clear justification, such as medical treatment choices or financial risk decisions.
  • Effectiveness would hinge on constructing a well-curated set of questions and suitable reward signals for the agent.
  • Empirical validation would require measuring whether users retain and apply the recommended actions better than with conventional tutorials.

Load-bearing premise

That modeling the justification for a cybersecurity mitigation as a 20-questions game lets a reinforcement learning policy learn to produce both accurate recommendations and short, evidence-based explanations.

What would settle it

User studies or simulations in which the trained RL policy produces explanations that are either longer than traditional static advice or fail to improve participant understanding of the recommended defenses.

Figures

Figures reproduced from arXiv: 2604.26964 by Gahangir Hossain, Mary Nusrat, Sarfuddin Bhuiyan.

Figure 1. Ishikawa Diagram or Fishbone Analysis of Cybersecurity Awareness with Q20 Game.
Adjacent text (Section III, "Learning-to-Explain" (LTE) Framework): The proposed LTE framework for cybersecurity is designed as an interactive system where users engage with a question-based chatbot to learn about various cyber threats. This section details the architecture and the underlying algorithm of the framework. The core of the framework is a "…
Figure 2. Training a Policy-based RL Agent in the LTE framework.
Adjacent text (Identification and Explanation): The game ends when a set number of questions (up to 20) have been asked or when the likelihood of a single attack vector exceeds a specific threshold. The system then identifies the most probable attack and provides a detailed justification with an explanation. This explanation highlights how the user's decisions led to …
Figure 3. Entropy-based question ranking process.
Adjacent text (Initial Belief State, Prior Distribution): Equation (7) initializes the system's belief about which cybersecurity concept the user is thinking of:

$$ w'(c_m) = \frac{w(c_m)}{\sum_{m=1}^{M} w(c_m)} \tag{7} $$

where
  • m = 1, …, M indexes the cybersecurity concepts (attack vectors) in the knowledge base, and M is their total number
  • w(c_m): prior weight of concept c_m
  • w'(c_m): normalized probability
Answer Likeli…
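The captions above mention three mechanics: normalizing prior weights (Equation 7), ranking questions by entropy, and stopping at a question budget or a likelihood threshold. The sketch below shows one way these could fit together. The excerpt truncates the paper's answer-likelihood model, so `p_yes` (the probability of a "yes" answer given each concept) is an assumed input, and the `threshold=0.9` default is a hypothetical value that the source does not give.

```python
import math
from typing import Dict, List, Tuple

Belief = Dict[str, float]


def normalize(weights: Dict[str, float]) -> Belief:
    """Equation (7): divide each prior weight by the sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {c: w / total for c, w in weights.items()}


def entropy(belief: Belief) -> float:
    """Shannon entropy in bits; zero-probability concepts contribute nothing."""
    return sum(-p * math.log2(p) for p in belief.values() if p > 0)


def expected_entropy(belief: Belief, p_yes: Dict[str, float]) -> float:
    """Expected posterior entropy after one yes/no question.

    p_yes[c] is an ASSUMED likelihood P(answer = yes | concept c).
    """
    result = 0.0
    for said_yes in (True, False):
        joint = {c: p * (p_yes[c] if said_yes else 1.0 - p_yes[c])
                 for c, p in belief.items()}
        p_answer = sum(joint.values())
        if p_answer > 0:
            result += p_answer * entropy(normalize(joint))
    return result


def rank_questions(belief: Belief,
                   questions: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sort questions by expected information gain, largest first."""
    prior_h = entropy(belief)
    gains = [(q, prior_h - expected_entropy(belief, p_yes))
             for q, p_yes in questions.items()]
    return sorted(gains, key=lambda item: item[1], reverse=True)


def should_stop(belief: Belief, asked: int,
                threshold: float = 0.9, max_questions: int = 20) -> bool:
    """Stop at the question budget or once one concept is likely enough."""
    return asked >= max_questions or max(belief.values()) >= threshold
```

Greedy information gain is used here only as a readable baseline for question ranking. The paper pairs it with a policy-based RL agent, which this sketch does not attempt to reproduce.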
Original abstract

The growing sophistication of contemporary cyber threats necessitates a more effective and adaptive approach to cybersecurity training. Intuitive and adaptive approaches to learning, which are often required, are not provided in traditional learning methods. In this article, we present a new educational framework, "Learning to Explain Cybersecurity with Q20 Game", based on explainable AI (XAI), an educational game to enhance interactivity in learning. We propose a novel, game-inspired framework - the Explainable Q20 Cybersecurity Recommender (EQ-20CR), that learns to elicit the minimal set of evidential facts needed to justify cybersecurity defensive action. By casting "Why should I execute this mitigation?" as a 20 questions (Q20) game, a policy-based reinforcement-learning (RL) agent actively queries an environment until it can both (i) recommend the optimal security education and (ii) explain that decision with a concise dialogue trace. The article draws from "Playing 20 Question Game with Policy-Based Reinforcement Learning" [1] and "Learning-to-Explain: Recommendation Reason Determination through Q20 Gaming" [2]. The framework uses a policy-based reinforcement learning (RL) agent that leads the user through a sequence of questions to recognize and articulate a targeted cybersecurity concept, attack vector, or defense strategy. Furthermore, users are gradually exposed to informative questions by the system, revealing complicated, structured way at an adaptive difficulty level. In this paper, we design the architecture, its application to various concepts of cybersecurity through illustrative case studies, and its transformative potential on the training and awareness of cybersecurity recommendations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a framework, the Explainable Q20 Cybersecurity Recommender (EQ-20CR), which casts the explanation of cybersecurity recommendations as a 20-questions game played by a policy-based RL agent. The agent is intended to query an environment until it can recommend optimal security education and justify a defensive action with a concise explanation built from a minimal set of evidential facts. The work draws on prior 20Q RL research and describes the architecture along with case studies for cybersecurity concepts.

Significance. Should the proposed RL-based 20Q framework be fully specified, implemented, and empirically validated, it has the potential to offer a more interactive and adaptive method for cybersecurity education, improving the explainability of security recommendations and user engagement with complex topics.

major comments (2)
  1. [Abstract and Framework Description] No concrete formulation of the Markov Decision Process is provided, including the state representation for cybersecurity concepts, the action space of questions, or the reward function that encourages minimal and explanatory dialogues. This omission is load-bearing for the claim that the agent learns to elicit minimal facts.
  2. [Evaluation and Results] The paper claims transformative potential on training and awareness but provides no evaluation metrics, experimental results, or error analysis to support the effectiveness of the EQ-20CR in achieving its goals.
minor comments (1)
  1. [References] References [1] and [2] are cited, but the manuscript would benefit from a more detailed comparison of how the new framework extends or differs from these prior works beyond the change of application domain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [Abstract and Framework Description] No concrete formulation of the Markov Decision Process is provided, including the state representation for cybersecurity concepts, the action space of questions, or the reward function that encourages minimal and explanatory dialogues. This omission is load-bearing for the claim that the agent learns to elicit minimal facts.

    Authors: We agree that an explicit MDP formulation would strengthen the presentation. The framework adapts the policy-based RL approach from the cited prior works on 20Q gaming, where states represent partial knowledge of the target cybersecurity concept, actions correspond to yes/no questions about evidential facts, and the reward balances accurate recommendation with dialogue minimality. The manuscript illustrates this through case studies but does not include the formal components. We will add a dedicated subsection in the revised version that specifies the state space (as vectors of cybersecurity facts and user responses), action space (adaptive question selection), transition dynamics, and reward function (negative per question plus terminal bonuses for correct recommendation and concise explanation). revision: yes

  2. Referee: [Evaluation and Results] The paper claims transformative potential on training and awareness but provides no evaluation metrics, experimental results, or error analysis to support the effectiveness of the EQ-20CR in achieving its goals.

    Authors: The current manuscript is a conceptual proposal that defines the EQ-20CR architecture and demonstrates its use via illustrative case studies for cybersecurity concepts. We do not present quantitative results or error analysis, as the focus is on the novel game-based formulation rather than empirical validation. We will revise the manuscript to include a discussion of suitable evaluation metrics (e.g., average dialogue length, recommendation accuracy, and explanation conciseness) and outline planned simulation-based experiments and user studies as future work, while clarifying that full empirical validation lies beyond the scope of this submission. revision: partial
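The reward shape and the evaluation metrics named in the two responses above can be made concrete with a short sketch. Everything numeric here is a hypothetical placeholder (`step_cost`, `correct_bonus`, `concise_bonus`, `concise_limit`), because the manuscript specifies no values. Granting the conciseness bonus only on a correct recommendation is likewise an assumption of this sketch, chosen so that a short but wrong dialogue is never rewarded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    questions_asked: int  # dialogue length
    recommended: str      # concept the agent settled on
    target: str           # concept the (simulated) user had in mind


def episode_return(ep: Episode,
                   step_cost: float = 0.1,
                   correct_bonus: float = 1.0,
                   concise_bonus: float = 0.5,
                   concise_limit: int = 5) -> float:
    """Negative cost per question plus terminal bonuses (all values hypothetical)."""
    total = -step_cost * ep.questions_asked
    if ep.recommended == ep.target:
        total += correct_bonus
        # Assumption: conciseness only counts when the recommendation is right.
        if ep.questions_asked <= concise_limit:
            total += concise_bonus
    return total


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Two of the metrics the rebuttal proposes: dialogue length and accuracy."""
    n = len(episodes)
    return {
        "avg_dialogue_length": sum(e.questions_asked for e in episodes) / n,
        "recommendation_accuracy": sum(e.recommended == e.target for e in episodes) / n,
    }
```

A policy-gradient learner such as REINFORCE [3] would use `episode_return` as the scalar signal for each completed dialogue, while `summarize` could be applied to held-out simulated dialogues.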

Circularity Check

1 step flagged

Central EQ-20CR claim reduces to domain transfer of prior Q20 policy-RL method via self-citation

specific steps
  1. self-citation load-bearing [Abstract]
    "We propose a novel, game-inspired framework - the Explainable Q20 Cybersecurity Recommender (EQ-20CR), that learns to elicit the minimal set of evidential facts needed to justify cybersecurity defensive action. By casting 'Why should I execute this mitigation?' as a 20 questions (Q20) game, a policy-based reinforcement-learning (RL) agent actively queries an environment until it can both (i) recommend the optimal security education and (ii) explain that decision with a concise dialogue trace. The article draws from 'Playing 20 Question Game with Policy-Based Reinforcement Learning' [1] and 'L"

    The claim that the RL agent learns to produce minimal explanatory traces is not derived from any cybersecurity-specific formulation in the paper; it is directly adopted from the two cited prior works. Without defining the state representation, action space, or reward function for the cybersecurity environment, the 'learning' behavior reduces to re-use of the imported Q20 RL policy rather than an independent result.

full rationale

The paper's core contribution is the EQ-20CR framework that 'learns to elicit the minimal set of evidential facts' by casting cybersecurity questions as a 20Q game solved by a policy-based RL agent. This mechanism is explicitly imported from the two cited references without any new MDP definition (states, actions, rewards), equations, or independent derivation. The result is therefore an application of the prior technique rather than a self-contained derivation, satisfying the self-citation load-bearing pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the domain assumption that RL can be used to model question-asking for minimal-fact elicitation in cybersecurity explanations, with no free parameters, new entities, or additional axioms detailed in the abstract.

axioms (1)
  • domain assumption: A 20-questions game can be effectively modeled as a policy-based RL problem for eliciting minimal evidential facts to justify cybersecurity recommendations.
    This assumption is invoked when the paper casts the recommendation process as a Q20 game and proposes the EQ-20CR framework.



Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [1]

    Playing 20 Question Game with Policy-Based Reinforcement Learning,

    H. Hu, X. Wu, B. Luo, C. Tao, C. Xu, W. Wu, and Z. Chen, “Playing 20 Question Game with Policy-Based Reinforcement Learning,” in Proc. 2018 Conf. Empirical Methods Nat. Lang. Process., 2018, pp. 3233–3242

  2. [2]

    Learning-to-Explain: Recommendation Reason Determination Through Q20 Gaming,

    X. Wu, “Learning-to-Explain: Recommendation Reason Determination Through Q20 Gaming,” in Proc. SIGIR 2019 Workshop ExplainAble Recommend. Search (EARS’19), 2019, pp. 1–10

  3. [3]

    Simple statistical gradient-following algorithms for connectionist reinforcement learning,

    R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Mach. Learn., vol. 8, no. 3–4, pp. 229–256, 1992

  4. [4]

    Mastering the game of Go with deep neural networks and tree search,

    D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016

  5. [5]

    Learning-to-Ask: Knowledge Acquisition via 20 Questions,

    Y. Chen, B. Chen, X. Duan, J.-G. Lou, Y. Wang, W. Zhu, and Y. Can, “Learning-to-Ask: Knowledge Acquisition via 20 Questions,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2018, pp. 1216–1225

  6. [6]

    Human-level control through deep reinforcement learning,

    V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015

  7. [7]

    Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning,

    T. Zhao and M. Eskenazi, “Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning,” in Proc. 17th Annu. Meeting Special Interest Group Discourse Dialogue, 2016, pp. 1–10

  8. [8]

    MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling,

    P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gasic, “MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling,” in Proc. 2018 Conf. Empirical Methods Natural Lang. Process., 2018, pp. 5016–5026

  9. [9]

    Q20: Rinna Riddles Your Mind by Asking 20 Questions,

    X. Wu, H. Hu, M. Klyen, K. Tomita, and Z. Chen, “Q20: Rinna Riddles Your Mind by Asking 20 Questions,” in Proc. 24th Annu. Meeting Assoc. Natural Lang. Process., 2018, pp. 1312–1315

  10. [10]

    MITRE ATT&CK®

    The MITRE Corporation, “MITRE ATT&CK®.” [Online]. Available: https://attack.mitre.org/, [Accessed: Sep. 8, 2025]

  11. [11]

    The Cyber Kill Chain®,

    Lockheed Martin, “The Cyber Kill Chain®,” [Online]. Available: https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html, [Accessed: Sep. 8, 2025]

  12. [12]

    Towards Personalized Recommender System: A Gray-Box Modeling Approach,

    N. Mary and G. Hossain, “Towards Personalized Recommender System: A Gray-Box Modeling Approach,” in Proc. 2025 IEEE Int. Conf. Consumer Electron. (ICCE), Las Vegas, NV, USA, 2025, pp. 1–6, doi: 10.1109/ICCE63647.2025.10930197

  13. [13]

    Property analysis of colloidal quantum dot in semiconductor nanostructure,

    N. I. C. Mary and M. A. Islam, "Property analysis of colloidal quantum dot in semiconductor nanostructure," AIP Conference Proceedings, vol. 1919, p. 020034, 2017. [Online]. Available: https://doi.org/10.1063/1.5018552

  14. [14]

    Interactive Approaches to Multiple Criteria Sorting Problems: Entropy-Based Question Selection Methods,

    A. Özarslan and G. Karakaya, “Interactive Approaches to Multiple Criteria Sorting Problems: Entropy-Based Question Selection Methods,” Int. J. Inf. Technol. Decis. Making, vol. 22, no. 1, pp. 279–312, 2023, doi: https://doi.org/10.1142/S0219622022500389

  15. [15]

    Exploring Rich Evidence for Maximum Entropy-based Question Answering,

    D. Shen, “Exploring Rich Evidence for Maximum Entropy-based Question Answering,” Ph.D. dissertation, Universität des Saarlandes, Saarbrücken, Germany, 2008. [Online]. Available: https://d-nb.info/996170383/34

  16. [16]

    Control-Alt-Hack™: White Hat Hacking for Fun and Profit,

    T. Denning, Y. Kohno, and A. Shostack, “Control-Alt-Hack™: White Hat Hacking for Fun and Profit,” in Proc. 44th ACM Technical Symposium on Computer Science Education (SIGCSE '13), Denver, CO, USA, Mar. 2013. [Online]. Available: https://dl.acm.org/doi/10.1145/2445196.2445408

  17. [17]

    Anti-Phishing Phil: The design and evaluation of a game that teaches people not to fall for phish,

    S. Sheng, B. Magnien, P. Kumaraguru, A. Acquisti, L. F. Cranor, J. I. Hong, and E. Nunge, “Anti-Phishing Phil: The design and evaluation of a game that teaches people not to fall for phish,” in Proc. 3rd Symp. Usable Privacy and Security (SOUPS), Pittsburgh, PA, USA, Jul. 2007, pp. 88–99

  18. [18]

    How one typo helped let Russian hackers in,

    CNN, “How one typo helped let Russian hackers in,” CNN Politics, Jun. 27, 2017. [Online]. Available: https://edition.cnn.com/2017/06/27/politics/russia-dnc-hacking-csr/index.html

  19. [19]

    UK hospitals hit with massive ransomware attack,

    G. Brandom, “UK hospitals hit with massive ransomware attack,” The Verge, May 12, 2017. [Online]. Available: https://www.theverge.com/2017/5/12/15630354/nhs-hospitals-ransomware-hack-wannacry-bitcoin. [Accessed: Sep. 08, 2025]

  20. [20]

    TalkTalk hit with record £400k fine over cyber-attack,

    A. Hern, “TalkTalk hit with record £400k fine over cyber-attack,” The Guardian, Oct. 5, 2016. [Online]. Available: https://www.theguardian.com/business/2016/oct/05/talktalk-hit-with-record-400k-fine-over-cyber-attack. [Accessed: Sep. 08, 2025]

  21. [21]

    How Hackers Slipped by British Airways' Defenses,

    “How Hackers Slipped by British Airways' Defenses,” Wired, Sep. 11, 2018. [Online]. Available: https://www.wired.com/story/british-airways-hack-details. [Accessed: Sep. 08, 2025]

  22. [22]

    Mitigating the Information Cocoon Effect in Cognitively Aligned Recommendations: A Human-Centered Approach,

    M. Nusrat, H. R. Mahi, S. Bhuiyan and G. Hossain, “Mitigating the Information Cocoon Effect in Cognitively Aligned Recommendations: A Human-Centered Approach,” 2026 IEEE 16th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 2026, pp. 0324-0330, doi: 10.1109/CCWC67433.2026.11393796

  23. [23]

    Learning-to-Explain through 20Q Gaming: An Explainable Recommender for Cybersecurity Education,

    M. Nusrat, G. Hossain, Kinshuk and S. Bhuiyan, “Learning-to-Explain through 20Q Gaming: An Explainable Recommender for Cybersecurity Education,” 2025 IEEE 16th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Berkeley, CA, USA, 2025, pp. 0381-0388, doi: 10.1109/IEMCON67450.2025.11381070