pith. sign in

arxiv: 2606.21059 · v1 · pith:S2NAK5KKnew · submitted 2026-06-19 · 💻 cs.CR

DEFENGRAPH: Knowledge Graph-Enhanced LLMs for Blue Team Cyber Defense

Pith reviewed 2026-06-26 14:09 UTC · model grok-4.3

classification 💻 cs.CR
keywords knowledge graphlarge language modelscybersecurityblue team defenseSIEM alertscyber range exercisesdecision supportgraph retrieval
0
0 comments X

The pith

A dual-layer static-dynamic knowledge graph grounds LLMs to raise accuracy in cyber defense reasoning and actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DEFENGRAPH, an LLM assistant for blue team cyber defense that builds a dual-layer static-dynamic knowledge graph from SIEM alerts, system topology, attacker behaviors, and prior defensive actions. Graph-based path retrieval, LLM-driven contextual filtering, and reasoning-based re-ranking then connect long-term domain knowledge with evolving event context. Evaluations on data from live Red versus Blue cyber range exercises show higher reasoning-recall and ticket-action recall across GPT-4o, LLaMA-3, DeepSeek-R1, and QWen-3 while fault rates stay steady and more correct defense actions surface. The approach targets hallucinations and shallow temporal reasoning that limit plain LLMs in high-stakes, time-evolving settings.

Core claim

DEFENGRAPH integrates a dual-layer Static-Dynamic Knowledge Graph with graph-based path retrieval, LLM-driven contextual filtering, and reasoning-based re-ranking to ground LLM outputs in both long-term domain knowledge and evolving event context from heterogeneous security artifacts, enabling faithful and temporally aware decision support as measured by improved recall metrics on realistic noisy datasets from cyber range exercises.

What carries the argument

Dual-layer Static-Dynamic Knowledge Graph together with graph-based path retrieval, LLM-driven contextual filtering, and reasoning-based re-ranking.

If this is right

  • Reasoning-recall rises from 61.45% to 73.49% on GPT-4o.
  • Ticket-action recall rises from 52.17% to 72.46% on GPT-4o with precision moving from 24.49% to 29.24%.
  • Up to 50 correct defense actions surface versus 36 for the next best baseline.
  • Comparable recall gains appear on LLaMA-3, DeepSeek-R1, and QWen-3.
  • Fault rates remain steady across the tested models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same grounding pattern could apply to other high-stakes domains that require tracking evolving states, such as network operations or industrial control.
  • Structured external memory may allow smaller or less specialized models to reach performance levels otherwise needing larger ones.
  • Maintaining an accurate dynamic layer in live environments would require automated update mechanisms not tested in the exercises.

Load-bearing premise

Knowledge graphs built from SIEM alerts, system topology, attacker behaviors, and prior defensive actions faithfully represent both long-term domain knowledge and evolving event context in a manner that directly enables the observed improvements.

What would settle it

Replacing the constructed knowledge graphs with random connections or incomplete security data and checking whether the recall gains on reasoning and actions disappear would settle whether the graph integration drives the results.

Figures

Figures reproduced from arXiv: 2606.21059 by Ahmad Mohsin, Ahmed Ibrahim, Diksha Goel, Gang Li, Guangsheng Yu, Helge Janicke, Kristen Moore, Minjune Kim, Qin Wang, Zhen Wang.

Figure 1
Figure 1. Figure 1: Top: Wazuh alert for a suspicious login to a Linux server. Bottom: [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Complex and noisy context in Cybersecurity defense. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: System overview of DEFENGRAPH. Historical alerts and tickets are structured into a static KG, while new Wazuh alerts provide dynamic knowledge. The system KG encodes the host and service topology. A static–dynamic KG is built by fusing historical and real-time data. Given a new alert, the system extracts subgraphs, ranks reasoning paths with an LLM, and generates context-aware defense actions aligned with … view at source ↗
Figure 4
Figure 4. Figure 4: Red Team Attack Tracing. Left shows the attack trace from the red team. Red Nodes represent attacking actions performed by red team members. For example, the attack 8 (red09 attack 8) is carried out by the red team member 09. Blue Nodes and Purple Nodes indicate servers or firewalls. Right shows the attacking shell scripts used in red team attack 8, along with the corresponding defense ticket from the blue… view at source ↗
Figure 5
Figure 5. Figure 5: The ACDC Knowledge Graph example demonstrates the nodes and [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Wazuh alert graph query on the entity node RSLGB [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Examples of incorrect aggressive actions suggested by LLMs. Top: [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Output comparison of LLM-based Contextual Filter between the DG (SKG) and the DG-Full. [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 11
Figure 11. Figure 11: Demonstration of Top-K generated relation paths (§III-C) and [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗
Figure 9
Figure 9. Figure 9: Example generated action generated by Full D [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Example generated response by DG (SKG) based on static KG with [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗
read the original abstract

Large Language Models (LLMs) show promise for supporting decision-making in cybersecurity, but their reliability in high-stakes, time-evolving environments remains limited due to hallucinations, poor temporal reasoning, and shallow grounding in system context. We introduce DEFENGRAPH, an LLM-driven assistant designed to support human defenders during cybersecurity incidents. DEFENGRAPH improves contextual reasoning by integrating a dual-layer Static-Dynamic Knowledge Graph (KG) with graph-based path retrieval, LLM-driven contextual filtering, and reasoning-based re-ranking. The framework grounds LLM outputs in both long-term domain knowledge and evolving event context, enabling faithful and temporally aware decision support. We evaluate DEFENGRAPH in a cyber defense setting using knowledge graphs constructed from heterogeneous security artifacts, including SIEM alerts, system topology, attacker behaviors, and prior defensive actions. The evaluation uses data collected during live Red vs. Blue team cyber range exercises simulating attacks on critical infrastructure, which generate realistic and noisy datasets reflecting real-world defender workflows and system dynamics. Evaluations across four prevalent LLMs show that DEFENGRAPH sets a new state-of-the-art: on GPT-4o it boosts reasoning-recall from 61.45\% to 73.49\% and ticket-action recall from 52.17% to 72.46% (precision 24.49\% to 29.24\%), with similar gains on LLaMA-3 (46.99\% to 61.45\%), DeepSeek-R1 (45.78\% to 56.63\%) and QWen-3 (51.81\% to 59.04\%), while surfacing up to 50 correct defense actions versus 36 for the next best baseline and holding fault rates steady.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DEFENGRAPH, an LLM-driven assistant for blue-team cyber defense that augments models via a dual-layer Static-Dynamic Knowledge Graph (KG) built from SIEM alerts, system topology, attacker behaviors, and prior actions. Graph path retrieval, LLM contextual filtering, and reasoning-based re-ranking are used to ground outputs in long-term knowledge and evolving context. On data from live Red-vs-Blue cyber-range exercises, the system is reported to raise reasoning-recall from 61.45% to 73.49% and ticket-action recall from 52.17% to 72.46% (precision 24.49% to 29.24%) on GPT-4o, with analogous gains on LLaMA-3, DeepSeek-R1 and QWen-3, while surfacing up to 50 correct actions versus 36 for the next baseline and keeping fault rates steady.

Significance. If the empirical claims are substantiated with full methodological detail, the work would offer a concrete demonstration that dual-layer KGs can measurably improve LLM reliability for time-sensitive defensive decision support on realistic, noisy security data. The multi-model evaluation and use of live exercise traces are strengths that increase external validity relative to purely synthetic benchmarks.

major comments (3)
  1. [Abstract and Evaluation section] Abstract and Evaluation section: the concrete lifts (reasoning-recall 61.45%→73.49%, ticket-action recall 52.17%→72.46% on GPT-4o) are stated without definitions of the recall/precision metrics, descriptions of baseline implementations, statistical significance tests, or data-exclusion rules applied to the cyber-range dataset. These omissions are load-bearing for any claim that the gains are attributable to the proposed mechanisms rather than confounds or metric choices.
  2. [Methodology / KG construction section] Methodology / KG construction section: the central attribution of performance gains to the Static-Dynamic KG plus path retrieval, filtering, and re-ranking rests on the unverified assumption that the constructed graphs faithfully encode both long-term domain knowledge and evolving event context. No coverage metrics, expert validation of completeness, or ablation isolating graph structure from raw-text augmentation are supplied, leaving open the possibility that improvements arise simply from additional structured context.
  3. [Results table] Results table (action-surfacing numbers): the claim of surfacing up to 50 correct defense actions versus 36 for the next baseline is presented without variance estimates, per-exercise breakdowns, or confirmation that the same set of ground-truth actions was used for all systems, undermining the cross-baseline comparison.
minor comments (2)
  1. [Figure 1 / §3] The notation distinguishing static versus dynamic layers in the KG could be made more explicit, ideally with an accompanying diagram that labels edge types and temporal scopes.
  2. [Related Work] A small number of recent KG-augmented LLM papers in the security domain are not referenced in the related-work section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications and commitments to revisions that strengthen the manuscript's transparency without altering its core claims.

read point-by-point responses
  1. Referee: [Abstract and Evaluation section] Abstract and Evaluation section: the concrete lifts (reasoning-recall 61.45%→73.49%, ticket-action recall 52.17%→72.46% on GPT-4o) are stated without definitions of the recall/precision metrics, descriptions of baseline implementations, statistical significance tests, or data-exclusion rules applied to the cyber-range dataset. These omissions are load-bearing for any claim that the gains are attributable to the proposed mechanisms rather than confounds or metric choices.

    Authors: We agree these details are essential for rigorous interpretation. The revised manuscript will add explicit definitions of reasoning-recall and ticket-action recall (including how ground-truth positives are identified from the annotated cyber-range traces), full descriptions of baseline implementations, results from statistical significance tests such as McNemar's test, and a statement of any data-exclusion rules. These additions will be placed in a new subsection of the Evaluation section. revision: yes

  2. Referee: [Methodology / KG construction section] Methodology / KG construction section: the central attribution of performance gains to the Static-Dynamic KG plus path retrieval, filtering, and re-ranking rests on the unverified assumption that the constructed graphs faithfully encode both long-term domain knowledge and evolving event context. No coverage metrics, expert validation of completeness, or ablation isolating graph structure from raw-text augmentation are supplied, leaving open the possibility that improvements arise simply from additional structured context.

    Authors: The manuscript details KG construction from SIEM alerts, topology, attacker behaviors, and prior actions, with consistent gains across four LLMs supporting the value of the structured dual-layer approach. We will add quantitative coverage metrics for both Static and Dynamic layers. Formal expert validation of completeness was not performed in the original study; we will note this limitation explicitly. A full ablation isolating graph structure from raw-text context was not conducted; we will either add a targeted ablation where feasible or discuss it as future work while maintaining that path retrieval and re-ranking provide benefits beyond unstructured augmentation. revision: partial

  3. Referee: [Results table] Results table (action-surfacing numbers): the claim of surfacing up to 50 correct defense actions versus 36 for the next baseline is presented without variance estimates, per-exercise breakdowns, or confirmation that the same set of ground-truth actions was used for all systems, undermining the cross-baseline comparison.

    Authors: The 50 versus 36 figures represent the maximum correct actions surfaced across the set of exercises. The revision will include variance estimates (standard deviation across exercises), per-exercise breakdowns in an appendix, and an explicit statement of the evaluation protocol confirming that identical ground-truth action sets—derived from the same annotated traces—were used for every system and baseline. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation rests on external baselines and data

full rationale

The paper describes an empirical system (DEFENGRAPH) that constructs dual-layer KGs from SIEM alerts, topology, attacker behaviors and prior actions collected during cyber-range exercises, then measures LLM recall/precision gains against external baselines on the same held-out exercise data. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the abstract or described methodology. The reported lifts (e.g., GPT-4o reasoning-recall 61.45% → 73.49%) are presented as direct experimental outcomes, not derived quantities that reduce to the input construction by definition. The framework is therefore self-contained against the external benchmarks it reports.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, new physical entities, or ad-hoc axioms are stated. The work rests on the domain assumption that heterogeneous security artifacts can be assembled into faithful static and dynamic graphs that improve LLM outputs.

axioms (1)
  • domain assumption Heterogeneous security artifacts (SIEM alerts, topology, attacker behaviors, prior actions) can be assembled into static and dynamic knowledge graphs that faithfully capture both long-term domain knowledge and evolving event context.
    Central premise invoked to justify the dual-layer KG construction and its use for grounding LLM outputs.

pith-pipeline@v0.9.1-grok · 5870 in / 1429 out tokens · 35265 ms · 2026-06-26T14:09:20.639339+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Stop overthinking: A survey on efficient reasoning for large language models,

    Y . Sui, Y .-N. Chuang, G. Wang, J. Zhang, T. Zhang, J. Yuan, H. Liu, A. Wen, S. Zhong, H. Chen, and X. Hu, “Stop overthinking: A survey on efficient reasoning for large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2503.16419

  2. [2]

    Sok: Semantic privacy in large language models,

    B. Ma, Y . Jiang, X. Wang, G. Yu, Q. Wang, C. Sun, C. Li, X. Qi, Y . He, W. Ni, and R. P. Liu, “Sok: Semantic privacy in large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2506.23603

  3. [3]

    When llms meet cybersecurity: a systematic literature review,

    J. Zhang, H. Bu, H. Wen, Y . Liu, H. Fei, R. Xi, L. Li, Y . Yang, H. Zhu, and D. Meng, “When llms meet cybersecurity: a systematic literature review,”Cybersecurity, vol. 8, no. 1, p. 55, Feb 2025. [Online]. Available: https://doi.org/10.1186/s42400-025-00361-w

  4. [4]

    ACM Trans

    H. Xu, S. Wang, N. Li, K. Wang, Y . Zhao, K. Chen, T. Yu, Y . Liu, and H. Wang, “Large language models for cyber security: A systematic literature review,”ACM Trans. Softw. Eng. Methodol., Sep. 2025, just Accepted. [Online]. Available: https://doi.org/10.1145/3769676

  5. [5]

    Accountability and reliability in 6g o-ran: Who is responsible when it fails?

    Y . He, G. Yu, X. Wang, Q. Wang, Z. Niu, W. Ni, and R. P. Liu, “Accountability and reliability in 6g o-ran: Who is responsible when it fails?”IEEE Wireless Communications, vol. 32, no. 2, pp. 52–59, 2025

  6. [6]

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions , volume=

    L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, and T. Liu, “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,”ACM Trans. Inf. Syst., vol. 43, no. 2, Jan. 2025. [Online]. Available: https://doi.org/10.1145/3703155

  7. [7]

    Intellbot: Retrieval augmented llm chatbot for cyber threat knowledge delivery,

    D. R. Arikkat, A. M., N. Binu, P. M., N. Biju, K. S. Arunima, V . P., R. R. K. A., and M. Conti, “Intellbot: Retrieval augmented llm chatbot for cyber threat knowledge delivery,” 2024. [Online]. Available: https://arxiv.org/abs/2411.05442

  8. [8]

    Security and threat detection through cloud-based wazuh deployment,

    S. Moiz, A. Majid, A. Basit, M. Ebrahim, A. A. Abro, and M. Naeem, “Security and threat detection through cloud-based wazuh deployment,” in2024 IEEE 1st Karachi Section Humanitarian Technology Conference (KHI-HTC), 2024, pp. 1–5

  9. [9]

    Enhancing security operations center: Wazuh security event response with retrieval-augmented-generation- driven copilot,

    Ismail, R. Kurnia, F. Widyatama, I. M. Wibawa, Z. A. Brata, Ukasyah, G. A. Nelistiani, and H. Kim, “Enhancing security operations center: Wazuh security event response with retrieval-augmented-generation- driven copilot,”Sensors, vol. 25, no. 3, 2025. [Online]. Available: https://www.mdpi.com/1424-8220/25/3/870

  10. [10]

    Dense passage retrieval for open-domain question answering,

    V . Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, and W.-t. Yih, “Dense passage retrieval for open-domain question answering,” inProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), B. Webber, T. Cohn, Y . He, and Y . Liu, Eds. Online: Association for Computational Linguistics, Nov. 2020, pp. 6...

  11. [11]

    A survey on RAG meeting LLMs: Towards retrieval-augmented large language models,

    W. Fan, Y . Ding, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and Q. Li, “A survey on RAG meeting LLMs: Towards retrieval-augmented large language models,” inACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), 2024

  12. [12]

    Context- aware prompting for llm-based program repair,

    Y . Li, M. Cai, J. Chen, Y . Xu, L. Huang, and J. Li, “Context- aware prompting for llm-based program repair,”Automated Software Engineering (ASE), 2025

  13. [13]

    Make your LLM fully utilize the context,

    S. An, Z. Ma, Z. Lin, N. Zheng, J.-G. Lou, and W. Chen, “Make your LLM fully utilize the context,”Advances in Neural Information Processing Systems (NeurIPS), 2024

  14. [14]

    Machine learning on knowledge graphs for context-aware security monitoring,

    J. S. Garrido, D. Dold, and J. Frank, “Machine learning on knowledge graphs for context-aware security monitoring,” inIEEE International Conference on Cyber Security and Resilience (CSR), 2021

  15. [15]

    From local to global: A graph rag approach to query-focused summarization,

    D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson, “From local to global: A graph rag approach to query-focused summarization,”arXiv preprint arXiv:2404.16130, 2024

  16. [16]

    Sub- graph retrieval enhanced model for multi-hop knowledge base question answering,

    J. Zhang, X. Zhang, J. Yu, J. Tang, J. Tang, C. Li, and H. Chen, “Sub- graph retrieval enhanced model for multi-hop knowledge base question answering,” inAnnual Meeting of the Association for Computational Linguistics (ACL), 2022

  17. [17]

    GPT-4 Technical Report

    OpenAI, “GPT-4 technical report,”CoRR, vol. abs/2303.08774, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2303.08774

  18. [18]

    The llama 3 herd of models,

    A. Grattafiori, A. Dubey, A. Jauhriet al., “The llama 3 herd of models,” 2024. [Online]. Available: https://arxiv.org/abs/2407.21783

  19. [19]

    Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,

    DeepSeek-AI, “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” 2025. [Online]. Available: https: //arxiv.org/abs/2501.12948

  20. [20]

    Qwen3 technical report,

    A. Yang, A. Li, B. Yang,et al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

  21. [21]

    A hybrid RAG system with comprehensive enhancement on complex reasoning,

    Y . Yuan, C. Liu, J. Yuan, G. Sun, S. Li, and M. Zhang, “A hybrid RAG system with comprehensive enhancement on complex reasoning,”arXiv preprint arXiv:2408.05141, 2024

  22. [22]

    spaCy: Industrial-strength Natural Language Processing in Python,

    M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd, “spaCy: Industrial-strength Natural Language Processing in Python,”https:// spacy.io/, 2020

  23. [23]

    Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,

    T. Nguyen, L. Luo, F. Shiri, D. Phung, Y .-F. Li, T. Vu, and G. Haffari, “Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,” inAssociation for Computational Linguistics (ACL), 2024

  24. [24]

    Simple is effective: The roles of graphs and large language models in knowledge-graph-based retrieval-augmented generation,

    M. Li, S. Miao, and P. Li, “Simple is effective: The roles of graphs and large language models in knowledge-graph-based retrieval-augmented generation,” inInternational Conference on Learning Representations, 2025

  25. [25]

    Improving multi-hop knowledge base question answering by learning intermediate supervision signals,

    G. He, Y . Lan, J. Jiang, W. X. Zhao, and J.-R. Wen, “Improving multi-hop knowledge base question answering by learning intermediate supervision signals,”ACM International Conference on Web Search and Data Mining (WSDM), 2021

  26. [26]

    Large language models enhanced collaborative filtering,

    Z. Sun, Z. Si, X. Zang, K. Zheng, Y . Song, X. Zhang, and J. Xu, “Large language models enhanced collaborative filtering,” inACM International Conference on Information and Knowledge Management (CIKM), 2024

  27. [27]

    Think-then- react: Towards unconstrained human action-to-reaction generation,

    W. Tan, B. Li, C. Jin, W. Huang, X. Wang, and R. Song, “Think-then- react: Towards unconstrained human action-to-reaction generation,”

  28. [28]

    Available: https://arxiv.org/abs/2503.16451

    [Online]. Available: https://arxiv.org/abs/2503.16451

  29. [29]

    Cyberq: Generat- ing questions and answers for cybersecurity education using knowledge graph-augmented LLMs,

    G. Agrawal, K. Pal, Y . Deng, H. Liu, and Y .-C. Chen, “Cyberq: Generat- ing questions and answers for cybersecurity education using knowledge graph-augmented LLMs,” inProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024

  30. [30]

    Cyber security knowledge graph based cyber attack attribution framework for space-ground in- tegration information network,

    Z. Zhu, R. Jiang, Y . Jia, J. Xu, and A. Li, “Cyber security knowledge graph based cyber attack attribution framework for space-ground in- tegration information network,” inIEEE International Conference on Communication Technology (ICCT), 2018

  31. [31]

    Cskg4apt: A cy- bersecurity knowledge graph for advanced persistent threat organization attribution,

    Y . Ren, Y . Xiao, Y . Zhou, Z. Zhang, and Z. Tian, “Cskg4apt: A cy- bersecurity knowledge graph for advanced persistent threat organization attribution,”IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 6, pp. 5695–5709, 2023

  32. [32]

    Ctinexus: Automatic cyber threat intelligence knowledge graph construction using large language models,

    Y . Cheng, O. Bajaber, S. A. Tsegai, D. Song, and P. Gao, “Ctinexus: Automatic cyber threat intelligence knowledge graph construction using large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2410.21060

  33. [33]

    Attackg+: Boosting attack graph construction with large language models,

    Y . Zhang, T. Du, Y . Ma, X. Wang, Y . Xie, G. Yang, Y . Lu, and E.-C. Chang, “Attackg+: Boosting attack graph construction with large language models,”Computers & Security, vol. 150, p. 104220, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0167404824005261

  34. [34]

    Actionable cyber threat in- telligence using knowledge graphs and large language models,

    R. Fieblinger, M. T. Alam, and N. Rastogi, “Actionable cyber threat in- telligence using knowledge graphs and large language models,” inIEEE European symposium on security and privacy workshops (EuroS&PW), 2024

  35. [35]

    Kg-ibl: Knowledge graph driven incremental broad learning for few-shot specific emitter identification,

    M. Hua, Y . Zhang, Q. Zhang, H. Tang, L. Guo, Y . Lin, H. Sari, and G. Gui, “Kg-ibl: Knowledge graph driven incremental broad learning for few-shot specific emitter identification,”IEEE Transactions on Information Forensics and Security, vol. 19, pp. 10 016–10 028, 2024

  36. [36]

    Wrongdoing monitor: A graph-based behavioral anomaly detection in cyber security,

    C. Wang and H. Zhu, “Wrongdoing monitor: A graph-based behavioral anomaly detection in cyber security,”IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2703–2718, 2022

  37. [37]

    K-getnid: Knowledge-guided graphs for early and transferable network intrusion detection,

    M. Wang, N. Yang, and N. Weng, “K-getnid: Knowledge-guided graphs for early and transferable network intrusion detection,”IEEE Transactions on Information Forensics and Security, vol. 19, p. 7147–7160, Jan. 2024. [Online]. Available: https://doi.org/10.1109/ TIFS.2024.3431932

  38. [38]

    Knowledge graph reasoning for cyber attack detection,

    E. Gilliard, J. Liu, and A. A. Aliyu, “Knowledge graph reasoning for cyber attack detection,”IET Communications, vol. 18, no. 4, p. 297–308, Feb. 2024. [Online]. Available: https://doi.org/10.1049/cmu2.12736