Learning-to-Explain through 20Q Gaming: An Explainable Recommender for Cybersecurity Education
Pith reviewed 2026-05-10 14:20 UTC · model grok-4.3
The pith
A reinforcement learning agent recommends cybersecurity mitigations by playing a 20-questions game that gathers the minimal justifying facts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Explainable Q20 Cybersecurity Recommender (EQ-20CR) casts the justification of defensive actions as a 20-questions game. A policy-based RL agent leads users through a sequence of questions to recognize targeted concepts and continues until it can recommend optimal security education while explaining the decision with a minimal set of evidential facts. The framework is designed to cover various cybersecurity concepts, illustrated through case studies, at an adaptive difficulty level.
What carries the argument
The EQ-20CR framework, in which a policy-based reinforcement learning agent plays a 20-questions game to elicit the smallest set of facts that justify a cybersecurity recommendation and produce an explanation trace.
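To make the elicitation loop concrete, here is a minimal Python sketch under loudly stated assumptions. The knowledge base `KB`, the fact names in `FACTS`, and the helpers `play` and `explain` are hypothetical, and the question selector is a greedy entropy-based split (in the spirit of [14]) swapped in for the paper's learned RL policy. It shows only how one yes/no question sequence can narrow the candidate mitigations and, at the same time, record the evidential facts that become the explanation trace.

```python
from math import log2

# Hypothetical knowledge base: each candidate mitigation is described by
# yes/no evidential facts. Names and facts are illustrative, not from the paper.
FACTS = ["credential_attack", "email_vector", "patch_available"]
KB = {
    "enable MFA":                  {"credential_attack": True,  "email_vector": False, "patch_available": False},
    "phishing awareness training": {"credential_attack": True,  "email_vector": True,  "patch_available": False},
    "apply vendor patch":          {"credential_attack": False, "email_vector": False, "patch_available": True},
    "network segmentation":        {"credential_attack": False, "email_vector": False, "patch_available": False},
}


def split_entropy(candidates, fact):
    """Expected information (bits) from asking `fact`, assuming a uniform prior over candidates."""
    p = sum(KB[c][fact] for c in candidates) / len(candidates)
    if p in (0.0, 1.0):
        return 0.0  # the question cannot split the remaining candidates
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def play(answer, max_questions=20):
    """Ask yes/no questions until one mitigation remains; return it with the dialogue trace."""
    candidates, asked, trace = list(KB), set(), []
    while len(candidates) > 1 and len(trace) < max_questions:
        open_facts = [f for f in FACTS if f not in asked]
        if not open_facts:
            break
        best = max(open_facts, key=lambda f: split_entropy(candidates, f))
        if split_entropy(candidates, best) == 0.0:
            break  # no remaining question is informative
        reply = answer(best)  # the user (or a simulator) answers yes/no
        asked.add(best)
        trace.append((best, reply))
        candidates = [c for c in candidates if KB[c][best] == reply]
    return candidates, trace


def explain(candidates, trace):
    """Render the gathered facts as a short justification (assumes consistent answers)."""
    facts = "; ".join(f"{f} = {'yes' if r else 'no'}" for f, r in trace)
    return f"Recommend '{candidates[0]}' because {facts}."
```

For example, with a simulated user whose scenario calls for phishing training, `play(lambda fact: KB["phishing awareness training"][fact])` asks two questions, and `explain` then yields `Recommend 'phishing awareness training' because credential_attack = yes; email_vector = yes.` The trace that narrowed the search is exactly the explanation, which is the property the framework relies on.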
If this is right
- Both recommendation and explanation emerge from the same sequence of adaptive questions.
- Question difficulty adjusts automatically to the user's responses.
- Explanations take the form of short dialogue traces rather than lengthy narratives.
- The same structure can be applied across different cybersecurity concepts such as attacks and defenses.
- Training shifts from passive reading to active game-like interaction.
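The source does not say how question difficulty adapts. One simple possibility, offered purely as an assumption, is a transformed up/down staircase: difficulty rises after two consecutive correct responses and falls after any incorrect one. The class `DifficultyStaircase` and its parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DifficultyStaircase:
    """Hypothetical controller: raise the difficulty level after `up_after`
    consecutive correct responses, lower it after any incorrect response."""
    level: int = 1
    min_level: int = 1
    max_level: int = 5
    up_after: int = 2
    streak: int = 0  # consecutive correct responses since the last level change

    def update(self, correct: bool) -> int:
        if correct:
            self.streak += 1
            if self.streak >= self.up_after:
                self.level = min(self.level + 1, self.max_level)
                self.streak = 0
        else:
            self.level = max(self.level - 1, self.min_level)
            self.streak = 0
        return self.level
```

Feeding the responses `[True, True, True, True, False, True]` into a fresh controller produces the levels `[1, 2, 2, 3, 2, 2]`. The asymmetric rule tends to settle near the level at which a learner answers about 71% of questions correctly, which keeps questions challenging without being discouraging.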
Where Pith is reading between the lines
- The questioning approach could extend to other domains that require clear justification, such as medical treatment choices or financial risk decisions.
- Effectiveness would hinge on constructing a well-curated set of questions and suitable reward signals for the agent.
- Empirical validation would require measuring whether users retain and apply the recommended actions better than with conventional tutorials.
Load-bearing premise
That modeling the justification for a cybersecurity mitigation as a 20-questions game lets a reinforcement learning policy learn to produce both accurate recommendations and short, evidence-based explanations.
What would settle it
User studies or simulations in which the trained RL policy produces explanations that are either longer than traditional static advice or fail to improve participant understanding of the recommended defenses.
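A test of this kind needs concrete measurements. As a hedged sketch, the three metrics named in the authors' rebuttal (average dialogue length, recommendation accuracy, and explanation conciseness) could be computed from logged games as follows. The `Dialogue` record and, in particular, the conciseness definition (the minimal number of facts that identify the target divided by the number of questions actually asked) are our assumptions, not definitions from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dialogue:
    """One logged game (hypothetical record format)."""
    questions: List[str]   # question identifiers, in the order asked
    recommended: str       # mitigation the agent recommended
    target: str            # ground-truth mitigation for the scenario
    minimal_facts: int     # assumed known: fewest facts that identify `target`


def evaluate(dialogues: List[Dialogue]) -> dict:
    """Aggregate the three metrics over a non-empty list of logged dialogues."""
    n = len(dialogues)
    return {
        "avg_dialogue_length": sum(len(d.questions) for d in dialogues) / n,
        "recommendation_accuracy": sum(d.recommended == d.target for d in dialogues) / n,
        # 1.0 when the agent asked exactly as many questions as were strictly needed
        "explanation_conciseness": sum(
            d.minimal_facts / max(len(d.questions), 1) for d in dialogues
        ) / n,
    }
```

Comparing these numbers against a static-advice baseline (for which dialogue length could be taken as the number of facts stated) would give the length half of the falsification test; the understanding half still requires user studies.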
Original abstract
The growing sophistication of contemporary cyber threats necessitates a more effective and adaptive approach to cybersecurity training. Traditional learning methods often fail to provide the intuitive and adaptive approaches to learning that this requires. In this article, we present a new educational framework, "Learning to Explain Cybersecurity with Q20 Game", which is based on explainable AI (XAI) and built around an educational game that enhances interactivity in learning. We propose a novel, game-inspired framework - the Explainable Q20 Cybersecurity Recommender (EQ-20CR), that learns to elicit the minimal set of evidential facts needed to justify cybersecurity defensive action. By casting "Why should I execute this mitigation?" as a 20 questions (Q20) game, a policy-based reinforcement-learning (RL) agent actively queries an environment until it can both (i) recommend the optimal security education and (ii) explain that decision with a concise dialogue trace. The article draws from "Playing 20 Question Game with Policy-Based Reinforcement Learning" [1] and "Learning-to-Explain: Recommendation Reason Determination through Q20 Gaming" [2]. The framework uses a policy-based RL agent that leads the user through a sequence of questions to recognize and articulate a targeted cybersecurity concept, attack vector, or defense strategy. Furthermore, the system gradually exposes users to informative questions, revealing complex material in a structured way at an adaptive difficulty level. In this paper, we design the architecture, illustrate its application to various cybersecurity concepts through case studies, and discuss its transformative potential for training and awareness of cybersecurity recommendations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a novel framework, the Explainable Q20 Cybersecurity Recommender (EQ-20CR), which casts cybersecurity recommendation explanation as a 20-questions game played by a policy-based RL agent. The agent is intended to query an environment to recommend optimal security education and provide concise explanations by eliciting minimal evidential facts for defensive actions. It draws from prior work on 20Q RL and describes the architecture along with case studies for cybersecurity concepts.
Significance. Should the proposed RL-based 20Q framework be fully specified, implemented, and empirically validated, it has the potential to offer a more interactive and adaptive method for cybersecurity education, improving the explainability of security recommendations and user engagement with complex topics.
major comments (2)
- [Abstract and Framework Description] No concrete formulation of the Markov Decision Process is provided, including the state representation for cybersecurity concepts, the action space of questions, or the reward function that encourages minimal and explanatory dialogues. This omission is load-bearing for the claim that the agent learns to elicit minimal facts.
- [Evaluation and Results] The paper claims transformative potential on training and awareness but provides no evaluation metrics, experimental results, or error analysis to support the effectiveness of the EQ-20CR in achieving its goals.
minor comments (1)
- [References] The manuscript cites [1] and [2] but would benefit from a more detailed comparison of how the new framework extends or departs from these prior works beyond applying them to a new domain.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the paper.
Point-by-point responses
- Referee: [Abstract and Framework Description] No concrete formulation of the Markov Decision Process is provided, including the state representation for cybersecurity concepts, the action space of questions, or the reward function that encourages minimal and explanatory dialogues. This omission is load-bearing for the claim that the agent learns to elicit minimal facts.
Authors: We agree that an explicit MDP formulation would strengthen the presentation. The framework adapts the policy-based RL approach from the cited prior works on 20Q gaming, where states represent partial knowledge of the target cybersecurity concept, actions correspond to yes/no questions about evidential facts, and the reward balances accurate recommendation with dialogue minimality. The manuscript illustrates this through case studies but does not include the formal components. We will add a dedicated subsection in the revised version that specifies the state space (as vectors of cybersecurity facts and user responses), action space (adaptive question selection), transition dynamics, and reward function (negative per question plus terminal bonuses for correct recommendation and concise explanation). revision: yes
- Referee: [Evaluation and Results] The paper claims transformative potential on training and awareness but provides no evaluation metrics, experimental results, or error analysis to support the effectiveness of the EQ-20CR in achieving its goals.
Authors: The current manuscript is a conceptual proposal that defines the EQ-20CR architecture and demonstrates its use via illustrative case studies for cybersecurity concepts. We do not present quantitative results or error analysis, as the focus is on the novel game-based formulation rather than empirical validation. We will revise the manuscript to include a discussion of suitable evaluation metrics (e.g., average dialogue length, recommendation accuracy, and explanation conciseness) and outline planned simulation-based experiments and user studies as future work, while clarifying that full empirical validation lies beyond the scope of this submission. revision: partial
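One way to write down the formulation promised in the first response, offered as a sketch under our own notational assumptions: F evidential facts, a question set 𝒬 in which each question asks about exactly one fact, a per-question cost c, weights λ_acc and λ_con, a question budget T_max (20 in the Q20 setting), and the policy-gradient estimator of REINFORCE [3]. The conciseness term is our guess at a "terminal bonus for concise explanation", not the authors' definition.

```latex
\begin{aligned}
s_0 &= \mathbf{0} \in \{-1, 0, 1\}^{F}
  && \text{every fact unknown } (0);\ \text{later entries are yes } (1) \text{ or no } (-1) \\
a_t &\sim \pi_\theta(\cdot \mid s_{t-1}), \quad a_t \in \mathcal{Q} \setminus \{a_1, \dots, a_{t-1}\}
  && \text{ask an unasked yes/no question},\ t = 1, \dots, T \\
s_t &= s_{t-1} \text{ with the entry for } a_t \text{ set to the user's answer} \\
r_t &= -c + \mathbb{1}[t = T]\Bigl(\lambda_{\mathrm{acc}}\,\mathbb{1}[\hat{y} = y^{\ast}]
      + \lambda_{\mathrm{con}}\bigl(1 - T / T_{\max}\bigr)\Bigr) \\
G_t &= \sum_{k=t}^{T} r_k, \qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Bigl[\sum_{t=1}^{T}
      \nabla_\theta \log \pi_\theta(a_t \mid s_{t-1})\, G_t\Bigr]
\end{aligned}
```

Here ŷ is the mitigation recommended after the T-th answer and y* is the target. Because every question costs c, the undiscounted return already favors short dialogues; the λ_con term only makes that preference explicit.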
Circularity Check
Central EQ-20CR claim reduces to domain transfer of prior Q20 policy-RL method via self-citation
specific steps
- Self-citation load-bearing [Abstract]:
"We propose a novel, game-inspired framework - the Explainable Q20 Cybersecurity Recommender (EQ-20CR), that learns to elicit the minimal set of evidential facts needed to justify cybersecurity defensive action. By casting 'Why should I execute this mitigation?' as a 20 questions (Q20) game, a policy-based reinforcement-learning (RL) agent actively queries an environment until it can both (i) recommend the optimal security education and (ii) explain that decision with a concise dialogue trace. The article draws from 'Playing 20 Question Game with Policy-Based Reinforcement Learning' [1] and ..."
The claim that the RL agent learns to produce minimal explanatory traces is not derived from any cybersecurity-specific formulation in the paper; it is directly adopted from the two cited prior works. Without defining the state representation, action space, or reward function for the cybersecurity environment, the 'learning' behavior reduces to re-use of the imported Q20 RL policy rather than an independent result.
full rationale
The paper's core contribution is the EQ-20CR framework that 'learns to elicit the minimal set of evidential facts' by casting cybersecurity questions as a 20Q game solved by a policy-based RL agent. This mechanism is explicitly imported from the two cited references without any new MDP definition (states, actions, rewards), equations, or independent derivation. The result is therefore an application of the prior technique rather than a self-contained derivation, satisfying the self-citation load-bearing pattern.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: A 20-questions game can be effectively modeled as a policy-based RL problem for eliciting minimal evidential facts to justify cybersecurity recommendations.
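Whether this assumption holds can at least be probed on a toy problem. The following sketch trains a tabular softmax policy with REINFORCE [3] on a hypothetical four-mitigation, four-question domain; `ANSWERS`, the reward constants, and all function names are illustrative assumptions, not the paper's design. The state is the pair (remaining candidates, questions asked) rather than a fact vector, conciseness is encouraged only through the per-question cost, and no baseline is used, for brevity.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy domain (illustrative only): ANSWERS[target][question] is the
# yes (1) / no (0) answer a truthful user gives. q0 is uninformative, q1 isolates
# target 0, and q2 + q3 together identify every target in two questions.
ANSWERS = {
    0: (1, 1, 1, 1),
    1: (1, 0, 1, 0),
    2: (1, 0, 0, 1),
    3: (1, 0, 0, 0),
}
N_Q = 4
STEP_COST = 1.0       # -c: penalty for every question asked
CORRECT_BONUS = 10.0  # terminal bonus for a correct recommendation

theta = defaultdict(float)  # tabular logits theta[(state, question)]


def policy(state):
    """Softmax over the questions not yet asked in `state` = (remaining, asked)."""
    acts = [q for q in range(N_Q) if q not in state[1]]
    logits = [theta[(state, a)] for a in acts]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return acts, [e / total for e in exps]


def run_episode(target, greedy=False):
    """Play one game; return visited (state, action) pairs, rewards and the recommendation."""
    remaining, asked = tuple(sorted(ANSWERS)), ()
    steps, rewards = [], []
    while len(remaining) > 1 and len(asked) < N_Q:
        state = (remaining, asked)
        acts, probs = policy(state)
        if greedy:
            a = acts[probs.index(max(probs))]
        else:
            a = random.choices(acts, weights=probs)[0]
        reply = ANSWERS[target][a]
        remaining = tuple(t for t in remaining if ANSWERS[t][a] == reply)
        asked = tuple(sorted(asked + (a,)))
        steps.append((state, a))
        rewards.append(-STEP_COST)
    recommendation = remaining[0]  # truthful answers always keep the target
    if recommendation == target:
        rewards[-1] += CORRECT_BONUS  # at least one question is always asked here
    return steps, rewards, recommendation


def train(episodes=2000, alpha=0.05, seed=0):
    """REINFORCE without a baseline: theta += alpha * G_t * grad log pi(a_t | s_t)."""
    random.seed(seed)
    for _ in range(episodes):
        steps, rewards, _ = run_episode(random.choice(list(ANSWERS)))
        ret = 0.0
        for (state, a), r in zip(reversed(steps), reversed(rewards)):
            ret += r  # undiscounted return-to-go G_t
            acts, probs = policy(state)
            for b, p in zip(acts, probs):
                # gradient of log softmax w.r.t. theta[state, b] is 1[b == a] - pi(b | state)
                theta[(state, b)] += alpha * ret * ((1.0 if b == a else 0.0) - p)


def mean_length():
    """Average number of questions the greedy policy asks, over all targets."""
    return sum(len(run_episode(t, greedy=True)[0]) for t in ANSWERS) / len(ANSWERS)
```

Before training, `mean_length()` returns 3.25, because the all-zero logits make the greedy policy ask questions in index order, starting with the useless q0. The optimum for this toy domain is 2.0 (q2, then q3). How close `train()` gets depends on the seed, learning rate, and episode count; subtracting a baseline from G_t would reduce the variance of the updates without biasing them.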
Reference graph
Works this paper leans on
- [1] H. Hu, X. Wu, B. Luo, C. Tao, C. Xu, W. Wu, and Z. Chen, “Playing 20 Question Game with Policy-Based Reinforcement Learning,” in Proc. 2018 Conf. Empirical Methods Nat. Lang. Process., 2018, pp. 3233–3242
- [2] X. Wu, “Learning-to-Explain: Recommendation Reason Determination Through Q20 Gaming,” in Proc. SIGIR 2019 Workshop ExplainAble Recommend. Search (EARS’19), 2019, pp. 1–10
- [3] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Mach. Learn., vol. 8, no. 3–4, pp. 229–256, 1992
- [4] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016
- [5] Y. Chen, B. Chen, X. Duan, J.-G. Lou, Y. Wang, W. Zhu, and Y. Cao, “Learning-to-Ask: Knowledge Acquisition via 20 Questions,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2018, pp. 1216–1225
- [6] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015
- [7] T. Zhao and M. Eskenazi, “Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning,” in Proc. 17th Annu. Meeting Special Interest Group Discourse Dialogue, 2016, pp. 1–10
- [8] P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gasic, “MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling,” in Proc. 2018 Conf. Empirical Methods Natural Lang. Process., 2018, pp. 5016–5026
- [9] X. Wu, H. Hu, M. Klyen, K. Tomita, and Z. Chen, “Q20: Rinna Riddles Your Mind by Asking 20 Questions,” in Proc. 24th Annu. Meeting Assoc. Natural Lang. Process., 2018, pp. 1312–1315
- [10] The MITRE Corporation, “MITRE ATT&CK®.” [Online]. Available: https://attack.mitre.org/, [Accessed: Sep. 8, 2025]
- [11] Lockheed Martin, “The Cyber Kill Chain®,” [Online]. Available: https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html, [Accessed: Sep. 8, 2025]
- [12] N. Mary and G. Hossain, “Towards Personalized Recommender System: A Gray-Box Modeling Approach,” in Proc. 2025 IEEE Int. Conf. Consumer Electron. (ICCE), Las Vegas, NV, USA, 2025, pp. 1–6, doi: 10.1109/ICCE63647.2025.10930197
- [13] N. I. C. Mary and M. A. Islam, “Property analysis of colloidal quantum dot in semiconductor nanostructure,” AIP Conference Proceedings, vol. 1919, p. 020034, 2017. [Online]. Available: https://doi.org/10.1063/1.5018552
- [14] A. Özarslan and G. Karakaya, “Interactive Approaches to Multiple Criteria Sorting Problems: Entropy-Based Question Selection Methods,” Int. J. Inf. Technol. Decis. Making, vol. 22, no. 1, pp. 279–312, 2023, doi: https://doi.org/10.1142/S0219622022500389
- [15] D. Shen, “Exploring Rich Evidence for Maximum Entropy-based Question Answering,” Ph.D. dissertation, Universität des Saarlandes, Saarbrücken, Germany, 2008. [Online]. Available: https://d-nb.info/996170383/34
- [16] T. Denning, T. Kohno, and A. Shostack, “Control-Alt-Hack™: White Hat Hacking for Fun and Profit,” in Proc. 44th ACM Technical Symposium on Computer Science Education (SIGCSE '13), Denver, CO, USA, Mar. 2013. [Online]. Available: https://dl.acm.org/doi/10.1145/2445196.2445408
- [17] S. Sheng, B. Magnien, P. Kumaraguru, A. Acquisti, L. F. Cranor, J. I. Hong, and E. Nunge, “Anti-Phishing Phil: The design and evaluation of a game that teaches people not to fall for phish,” in Proc. 3rd Symp. Usable Privacy and Security (SOUPS), Pittsburgh, PA, USA, Jul. 2007, pp. 88–99
- [18] CNN, “How one typo helped let Russian hackers in,” CNN Politics, Jun. 27, 2017. [Online]. Available: https://edition.cnn.com/2017/06/27/politics/russia-dnc-hacking-csr/index.html
- [19] G. Brandom, “UK hospitals hit with massive ransomware attack,” The Verge, May 12, 2017. [Online]. Available: https://www.theverge.com/2017/5/12/15630354/nhs-hospitals-ransomware-hack-wannacry-bitcoin. [Accessed: Sep. 08, 2025]
- [20] A. Hern, “TalkTalk hit with record £400k fine over cyber-attack,” The Guardian, Oct. 5, 2016. [Online]. Available: https://www.theguardian.com/business/2016/oct/05/talktalk-hit-with-record-400k-fine-over-cyber-attack. [Accessed: Sep. 08, 2025]
- [21] “How Hackers Slipped by British Airways' Defenses,” Wired, Sep. 11, 2018. [Online]. Available: https://www.wired.com/story/british-airways-hack-details. [Accessed: Sep. 08, 2025]
- [22] M. Nusrat, H. R. Mahi, S. Bhuiyan and G. Hossain, “Mitigating the Information Cocoon Effect in Cognitively Aligned Recommendations: A Human-Centered Approach,” 2026 IEEE 16th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 2026, pp. 0324–0330, doi: 10.1109/CCWC67433.2026.11393796
- [23] M. Nusrat, G. Hossain, Kinshuk and S. Bhuiyan, “Learning-to-Explain through 20Q Gaming: An Explainable Recommender for Cybersecurity Education,” 2025 IEEE 16th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Berkeley, CA, USA, 2025, pp. 0381–0388, doi: 10.1109/IEMCON67450.2025.11381070