CyBOKClaw: Human-in-the-Loop CyBOK Mapping for Cybersecurity Curriculum
Pith reviewed 2026-06-30 12:51 UTC · model grok-4.3
The pith
For 98 percent of validation terms, CyBOKClaw's top-5 CyBOK candidates include at least one mapping that experts judge to be an exact match or the closest acceptable placement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CyBOKClaw is an interpretable human-in-the-loop retrieval framework that maps cybersecurity keywords or phrases (KWoPs) to CyBOK. It combines query normalization, curated term expansion, concept-level boosts, topic-description enrichment, and domain-sensitive ranking rules to produce top-k candidate mappings. On the development dataset it reaches 64.73 percent EXA-5 (exact match at top-5), 84.18 percent structural semantic alignment, and 91.88 percent ECA-5 (exact or closest acceptable match at top-5). On the validation dataset the corresponding figures are 81.19, 93.32, and 98.00 percent. The paper takes these results to show that expert-guided top-k usefulness gives a more faithful account of practical CyBOK mapping utility than exact structural matching alone.
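EXA-5 and ECA-5 can both be read as top-k hit rates. EXA-5 counts a KWoP as a hit when a reference exact placement appears among the top five candidates; ECA-5 also counts it when an expert accepts one of those candidates as the closest practical placement. The sketch below is illustrative, not the authors' evaluation code: the `JudgedKWoP` record, its field names, and the placeholder topic IDs are hypothetical, and structural semantic alignment is left out because this summary does not define it.

```python
from dataclasses import dataclass, field


@dataclass
class JudgedKWoP:
    """One evaluated keyword or phrase (hypothetical record layout)."""
    kwop: str
    ranked: list[str]  # system output: CyBOK topic IDs, best first
    exact: set[str]    # reference exact placements
    accepted: set[str] = field(default_factory=set)  # expert-accepted closest placements


def hit_rate(items: list[JudgedKWoP], k: int, include_accepted: bool) -> float:
    """Fraction of KWoPs with at least one qualifying topic among the top k."""
    hits = 0
    for item in items:
        target = (item.exact | item.accepted) if include_accepted else item.exact
        if any(topic in target for topic in item.ranked[:k]):
            hits += 1
    return hits / len(items)


def exa_at_k(items: list[JudgedKWoP], k: int = 5) -> float:
    """EXA-k: only exact placements count."""
    return hit_rate(items, k, include_accepted=False)


def eca_at_k(items: list[JudgedKWoP], k: int = 5) -> float:
    """ECA-k: exact placements or expert-accepted closest placements count."""
    return hit_rate(items, k, include_accepted=True)


# Toy judgments with placeholder topic IDs (not data from the paper).
items = [
    JudgedKWoP("firewall", ["T07", "T02", "T11", "T03", "T05"], exact={"T02"}),
    JudgedKWoP("secure coding", ["T04", "T09", "T01", "T12", "T06"],
               exact={"T20"}, accepted={"T09"}),
    JudgedKWoP("cyber hygiene", ["T08", "T10", "T13", "T14", "T15"], exact={"T30"}),
    JudgedKWoP("phishing", ["T16", "T17", "T18", "T19", "T21"], exact={"T21"}),
]
print(exa_at_k(items), eca_at_k(items))  # 0.5 0.75
```

Because the accepted set only adds to the exact set under this reading, ECA-k can never fall below EXA-k, which is consistent with the ordering of the reported figures (91.88 vs. 64.73 percent on development, 98.00 vs. 81.19 percent on validation).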
What carries the argument
The top-k candidate generator that integrates normalization, term expansion, boosts, enrichment, and ranking rules to support expert review of CyBOK mappings.
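The five stages can be pictured as a single scoring loop over CyBOK topics. The sketch below is a hypothetical illustration, not CyBOKClaw's implementation: the topic table (IDs and descriptions invented, titles borrowed from CyBOK Knowledge Area names), the expansion dictionary, the boost weights, the single ranking rule, and the stage order are all assumptions, and plain token overlap stands in for whatever retrieval scoring the paper actually uses.

```python
import re

# Hypothetical toy topics: (title, description). A real system would load the
# CyBOK Knowledge Trees topic list instead.
TOPICS = {
    "NS-1": ("Network Security", "firewalls intrusion detection network protocols"),
    "MAT-1": ("Malware & Attack Technologies", "malware taxonomy botnets denial of service"),
    "SSL-1": ("Software Security", "secure coding memory safety vulnerabilities"),
}
# Curated term expansion: token -> extra terms (hypothetical entries).
EXPANSIONS = {"ddos": "denial of service", "ids": "intrusion detection"}
# Concept-level boosts: query token -> {topic_id: additive boost} (hypothetical).
CONCEPT_BOOSTS = {"malware": {"MAT-1": 2.0}, "firewall": {"NS-1": 2.0}}
# Domain-sensitive rules: (trigger token, topic-ID prefix, multiplier) (hypothetical).
RANKING_RULES = [("code", "NS-", 0.5)]
STOPWORDS = {"a", "an", "and", "of", "the"}


def normalize(text: str) -> str:
    """Query normalization: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def expand(tokens: list[str]) -> list[str]:
    """Append curated expansion terms, then drop stopwords."""
    extra = [t for tok in tokens for t in EXPANSIONS.get(tok, "").split()]
    return [t for t in tokens + extra if t not in STOPWORDS]


def enriched_terms(topic_id: str) -> set[str]:
    """Topic-description enrichment: match against title plus description."""
    title, description = TOPICS[topic_id]
    return set(normalize(f"{title} {description}").split())


def rank_topics(query: str, k: int = 5) -> list[str]:
    """Return the top-k topic IDs for expert review."""
    tokens = expand(normalize(query).split())
    scores = {}
    for topic_id in TOPICS:
        terms = enriched_terms(topic_id)
        score = float(sum(1 for t in tokens if t in terms))  # lexical overlap
        for t in tokens:  # concept-level boosts
            score += CONCEPT_BOOSTS.get(t, {}).get(topic_id, 0.0)
        for trigger, prefix, factor in RANKING_RULES:  # domain-sensitive rules
            if trigger in tokens and topic_id.startswith(prefix):
                score *= factor
        scores[topic_id] = score
    # Highest score first; ties broken by topic ID so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [topic_id for topic_id, _ in ranked[:k]]


print(rank_topics("DDoS attacks!", k=1))  # ['MAT-1']
```

For example, "DDoS attacks!" normalizes to "ddos attacks", the expansion adds "denial" and "service", and the toy Malware & Attack Technologies topic ranks first. Keeping each stage as an explicit table or rule means an expert can trace why a candidate surfaced, which is the kind of interpretability the framework claims.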
If this is right
- Expert review of a short candidate list better captures practical utility for CyBOK mapping than requiring an exact structural match.
- The combination of normalization, expansion, and ranking rules produces candidate sets that experts accept at high rates.
- Performance on the expert usefulness metric rises from development to validation data (ECA-5 from 91.88 to 98.00 percent), suggesting that the approach generalizes to new terms.
- Structural semantic alignment alone understates the framework's value for curriculum tasks.
Where Pith is reading between the lines
- Curriculum designers could embed the candidate generator in workflow tools to shorten the time spent on manual alignment.
- The same human-in-the-loop pattern might transfer to mapping tasks for other domain bodies of knowledge.
- If larger labeled sets become available, the ranking rules could be tuned further while preserving interpretability for experts.
Load-bearing premise
That the ECA-5 metric, based on expert judgment of the top-5 candidates, faithfully captures practical utility, and that the development and validation datasets are representative of real-world educational KWoPs.
What would settle it
A fresh collection of KWoPs drawn from actual cybersecurity courses, on which experts find that for more than 10 percent of the terms none of the top-5 candidates is an exact match or an acceptable nearest placement.
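The 10 percent threshold can be restated in terms of ECA-5: the miss rate is the complement of ECA-5, so the criterion is met exactly when ECA-5 falls below 90 percent. Applying the same arithmetic to the reported figures, which come from the existing datasets rather than the fresh course-drawn collection the criterion calls for:

```latex
\text{miss rate} = 1 - \text{ECA-5}, \qquad
1 - \text{ECA-5} > 0.10 \iff \text{ECA-5} < 0.90 \\
\text{development: } 1 - 0.9188 = 0.0812, \qquad
\text{validation: } 1 - 0.9800 = 0.0200
```

Both miss rates sit below the 0.10 threshold, but the development margin is only 1.88 percentage points, so a fresh dataset that behaves like the development set rather than the validation set would come close to meeting the criterion.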
Original abstract
This paper presents CyBOKClaw, an interpretable human-in-the-loop retrieval framework for mapping cybersecurity keywords or phrases (KWoPs) to the Cyber Security Body of Knowledge (CyBOK). Rather than treating the task as strict exact classification, the framework is designed as a top-k candidate generator for expert review. It combines query normalization, curated term expansion, concept-level boosts, topic-description enrichment, and domain-sensitive ranking rules. Because educational KWoPs are often broad, ambiguous, and only approximately aligned with CyBOK terminology, strict exact matching provides only a partial account of practical utility. We therefore evaluate the framework using both structural retrieval metrics and an expert-guided top-5 usefulness metric, ECA-5 (Exact or Closest Acceptable Match at top-5), which records whether the returned candidates contain at least one mapping that an expert would judge exact or accept as the nearest practical CyBOK placement. On the development dataset, CyBOKClaw achieves 64.73% EXA-5 (Exact Match at top-5), 84.18% structural semantic alignment, and 91.88% ECA-5; on the validation dataset, it achieves 81.19% EXA-5, 93.32% structural semantic alignment, and 98.00% ECA-5. These results show that expert-guided top-k usefulness provides a more faithful account of practical CyBOK mapping utility than exact structural matching alone, and that CyBOKClaw is effective as a CyBOK-specific expert-support retrieval system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CyBOKClaw, a human-in-the-loop retrieval framework for mapping cybersecurity keywords or phrases (KWoPs) to CyBOK topics. It uses query normalization, curated term expansion, concept boosts, topic enrichment, and domain-sensitive ranking to generate top-k candidates for expert review rather than exact classification. The framework is evaluated on development and validation datasets using EXA-5 (exact match at top-5), structural semantic alignment, and the expert-judged ECA-5 metric (whether at least one top-5 candidate is exact or closest acceptable), reporting 64.73%/84.18%/91.88% on development data and 81.19%/93.32%/98.00% on validation data. The central claim is that ECA-5 provides a more faithful account of practical utility than exact matching alone and that CyBOKClaw is effective as a CyBOK-specific expert-support system.
Significance. If the evaluation methodology is made rigorous, the work would provide a concrete, interpretable retrieval approach tailored to the ambiguities of educational cybersecurity terminology. It deserves explicit credit for reporting numerical results on two distinct datasets and for distinguishing exact-match from expert-acceptance metrics. Such an approach could support curriculum-design tasks where strict classification is insufficient.
major comments (2)
- [Abstract] Abstract (evaluation description): The ECA-5 scores (91.88% development, 98.00% validation) that support the effectiveness claim are defined directly in terms of expert judgment on the system's own top-5 outputs, yet the manuscript provides no information on the number of experts, inter-rater agreement, blinding, conflict-of-interest controls, or sampling procedure. This detail is load-bearing for the central claim that the system is practically useful.
- [Abstract] Abstract (evaluation description): The development and validation datasets are used to report all performance numbers, but no description is given of their construction, size, provenance, or how the KWoPs were sampled or selected. Without this, it is impossible to assess whether the reported ECA-5 figures reflect representative real-world educational mappings or an easy test set.
minor comments (1)
- [Abstract] The abstract introduces multiple acronyms (KWoP, CyBOK, ECA-5, EXA-5) without initial expansion; while common in the subfield, explicit definitions on first use would improve accessibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the evaluation methodology. We address each major comment below and will revise the manuscript to provide the requested details on expert evaluation and dataset construction.
Point-by-point responses
- Referee: [Abstract] Abstract (evaluation description): The ECA-5 scores (91.88% development, 98.00% validation) that support the effectiveness claim are defined directly in terms of expert judgment on the system's own top-5 outputs, yet the manuscript provides no information on the number of experts, inter-rater agreement, blinding, conflict-of-interest controls, or sampling procedure. This detail is load-bearing for the central claim that the system is practically useful.
  Authors: We agree that the manuscript should provide more detail on the expert evaluation process underlying ECA-5 to support the practical-utility claim. In the revised version we will add a dedicated subsection on the expert review protocol, specifying the number of experts, inter-rater agreement statistics, blinding procedures, conflict-of-interest controls, and the sampling procedure used for the judged KWoPs. Revision: yes.
- Referee: [Abstract] Abstract (evaluation description): The development and validation datasets are used to report all performance numbers, but no description is given of their construction, size, provenance, or how the KWoPs were sampled or selected. Without this, it is impossible to assess whether the reported ECA-5 figures reflect representative real-world educational mappings or an easy test set.
  Authors: We agree that the manuscript must describe dataset construction, size, provenance, and sampling to allow assessment of representativeness. The revised manuscript will include an expanded section detailing both the development and validation datasets, their sizes, sources, and the criteria used to sample or select the KWoPs. Revision: yes.
Circularity Check
No circularity: empirical evaluation on held-out data with external expert judgments
Full rationale
The paper reports performance of a retrieval system on explicitly separated development and validation datasets using both structural metrics (EXA-5, semantic alignment) and the ECA-5 human-judgment metric. ECA-5 is defined as an external expert assessment of whether top-5 outputs contain an acceptable mapping; this is standard relevance evaluation and does not reduce the effectiveness claim to a tautology or fitted input by construction. No equations, self-citations, or uniqueness theorems are invoked to derive the results. The central claim rests on reported numbers from independent validation data rather than any self-referential loop.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: expert judgment is a reliable and unbiased way to assess the usefulness of CyBOK mappings.
Reference graph
Works this paper leans on
[1] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval: The Concepts and Technology behind Search. Addison-Wesley, 2nd edn. (2011)
[2] Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Computing Surveys 44(1), 1–50 (2012). https://doi.org/10.1145/2071389.2071390
[3] Doan, A., Halevy, A.Y.: Semantic-integration research in the database community. AI Mag. 26(1), 83–94 (Mar 2005)
[4] Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, 2nd edn. (2013). https://doi.org/10.1007/978-3-642-38721-0
[5] Harden, R.M.: AMEE Guide No. 21: Curriculum mapping: A tool for transparent and authentic teaching and learning. Medical Teacher 23(2), 123–137 (2001). https://doi.org/10.1080/01421590120036547
[6] Horvitz, E.: Principles of mixed-initiative user interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 159–166. CHI ’99, Association for Computing Machinery, New York, NY, USA (1999). https://doi.org/10.1145/302979.303030
[7] Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (Oct 2002). https://doi.org/10.1145/582415.582418
[8] Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge, England (2008), https://nlp.stanford.edu/IR-book/information-retrieval-book.html
[9] National Initiative for Cybersecurity Education (NICE): NICE Workforce Framework for Cybersecurity (NIST Special Publication 800-181 Revision 1). Tech. Rep. NIST SP 800-181 Rev. 1, National Institute of Standards and Technology (2020). https://doi.org/10.6028/NIST.SP.800-181r1
[10] Nautiyal, L., Rashid, A.: CyBOK Mapping Framework for NCSC Certified Degrees: Harvard University Exemplar. Tech. rep., University of Bristol (Jul 2020), https://www.cybok.org/wp-content/uploads/Harvard_University_Examplar_-_Final-29.07.20.pdf
[11] Nautiyal, L., Rashid, A.: CyBOK Mapping Framework for NCSC Certified Degrees: MIT (USA) Exemplar. Tech. rep., University of Bristol (Jul 2020), https://www.cybok.org/wp-content/uploads/MIT_USA_Examplar_-_Final-29.07.20.pdf
[12] Nautiyal, L., Rashid, A.: CyBOK Mapping Framework for NCSC Certified Degrees: University of Bristol Exemplar. Tech. rep., University of Bristol (Aug 2020), https://www.cybok.org/media/downloads/University_of_Bristol_Examplar_-_Final-10.08.20.pdf
[13] Nautiyal, L., Rashid, A.: CyBOK Mapping Framework for NCSC Certified Degrees: University of Oxford Exemplar. Tech. rep., University of Bristol (Jul 2020), https://www.cybok.org/wp-content/uploads/Oxford_Examplar_-_Final-30.07.20.pdf
[14] Nautiyal, L., Rashid, A.: CyBOK Mapping Framework for NCSC Certified Degrees: University of Surrey Exemplar. Tech. rep., University of Bristol (Jul 2020), https://www.cybok.org/wp-content/uploads/Surrey_Examplar_-_Final-16.07.20.pdf
[15] Sakai, T.: On the reliability of information retrieval metrics based on graded relevance. Inf. Process. Manage. 43(2), 531–548 (Mar 2007). https://doi.org/10.1016/j.ipm.2006.07.020
[16] Steinberger, P.: OpenClaw: Your own personal AI assistant. https://openclaw.ai/ (2026)
[17] The CyBOK Project: CyBOK Knowledge Trees Topic List (Version 1.1.0). https://www.cybok.org/media/downloads/CyBOK_Knowledge_trees_topic_list_1.1.0.csv (2021)
[18] The CyBOK Project: The Cyber Security Body of Knowledge. https://www.cybok.org/ (2021)
[19] Voorhees, E.M.: The philosophy of information retrieval evaluation. In: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems. pp. 355–370. CLEF ’01, Springer-Verlag, Berlin, Heidelberg (2001)
[20] Voorhees, E.M., Harman, D.K. (eds.): TREC: Experiment and Evaluation in Information Retrieval. Digital Libraries and Electronic Publishing, MIT Press, Cambridge, MA (2005)
Fig. 2. Example of CyBOK mapping and credit allocation. Module-level mappings and associated credits are aggregated by CyBOK Knowledge Area to produce a curriculum-level coverage profile...