
arxiv: 2512.12078 · v3 · submitted 2025-12-12 · 💻 cs.CR

The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation

Pith reviewed 2026-05-16 22:21 UTC · model grok-4.3

classification 💻 cs.CR
keywords: cyber threat intelligence · procedural semantics · adversary emulation · behavioral measurement · semantic gap · multi-stage campaigns · structured objects

The pith

Structured CTI describes adversary behavior but omits the ordering, preconditions, and assumptions needed to turn descriptions into executable emulation steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures collections of structured threat intelligence objects to test whether they supply enough detail for emulating complete multi-stage attacks. It finds that campaign objects cover only a minority of known techniques with no detectable reusable sequences, while intrusion sets span more ground but still leave out how steps should be ordered or what conditions must hold. The authors then define a three-stage process that converts the available descriptions into concrete operations and records the extra parameters analysts must add. This process succeeds in producing runnable emulation chains, but only because the missing procedural information is supplied from outside the original objects. The work therefore locates the exact boundary between descriptive knowledge and machine-actionable CTI.

Core claim

Measurements of campaign objects show that only 35.6 percent of techniques appear in at least one campaign, and neither clustering by technique overlap nor longest-common-subsequence analysis recovers any reusable behavioral structure. Intrusion sets cover a larger share of the technique space yet still omit the ordering, preconditions, and environmental assumptions required to operationalize the described behavior. A three-stage methodology that translates behavioral information into executable steps while making analyst-supplied parameters explicit demonstrates that structured CTI can support multi-stage emulation once those parameters are recorded.
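The coverage and overlap measurements can be illustrated with a minimal sketch. It assumes the usual layout of an ATT&CK STIX bundle (`campaign` and `attack-pattern` objects linked by `uses` relationships); the paper's exact counting rules, such as how sub-techniques or revoked objects are handled, are not stated here, so the function names and the toy bundle below are illustrative only.

```python
from itertools import combinations


def campaign_techniques(bundle):
    """Map each campaign id to the set of technique ids it 'uses'."""
    objects = bundle["objects"]
    techniques = {o["id"] for o in objects if o["type"] == "attack-pattern"}
    used = {o["id"]: set() for o in objects if o["type"] == "campaign"}
    for rel in objects:
        if (rel["type"] == "relationship"
                and rel.get("relationship_type") == "uses"
                and rel.get("source_ref") in used
                and rel.get("target_ref") in techniques):
            used[rel["source_ref"]].add(rel["target_ref"])
    return techniques, used


def coverage(techniques, used):
    """Fraction of techniques referenced by at least one campaign."""
    seen = set().union(*used.values())
    return len(seen & techniques) / len(techniques) if techniques else 0.0


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets are treated as no overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy bundle with made-up identifiers, for illustration only.
toy = {"objects": [
    {"type": "attack-pattern", "id": "attack-pattern--a"},
    {"type": "attack-pattern", "id": "attack-pattern--b"},
    {"type": "attack-pattern", "id": "attack-pattern--c"},
    {"type": "attack-pattern", "id": "attack-pattern--d"},
    {"type": "campaign", "id": "campaign--1"},
    {"type": "campaign", "id": "campaign--2"},
    {"type": "relationship", "relationship_type": "uses",
     "source_ref": "campaign--1", "target_ref": "attack-pattern--a"},
    {"type": "relationship", "relationship_type": "uses",
     "source_ref": "campaign--1", "target_ref": "attack-pattern--b"},
    {"type": "relationship", "relationship_type": "uses",
     "source_ref": "campaign--2", "target_ref": "attack-pattern--b"},
]}

techniques, used = campaign_techniques(toy)
toy_coverage = coverage(techniques, used)
pairwise = {(x, y): jaccard(used[x], used[y])
            for x, y in combinations(sorted(used), 2)}
```

In the toy bundle, two of four techniques are referenced by a campaign, so coverage is 0.5. Note that a `uses` relationship yields an unordered set per campaign, which is why overlap can be measured but sequence cannot be read off directly.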

What carries the argument

A three-stage methodology that converts behavioral descriptions from structured objects into executable steps while forcing explicit recording of ordering, preconditions, and environmental assumptions.
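One way to picture "forcing explicit recording" is a data structure that separates what the CTI object states from what the analyst adds. The sketch below is an assumption-laden illustration: this page does not name the three stages, so `extract`, `annotate`, and `build_chain` are hypothetical labels, as are the `Step` fields and the sample commands.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    technique_id: str                 # stated by the CTI object
    description: str                  # stated by the CTI object
    order: Optional[int] = None       # must come from the analyst
    command: Optional[str] = None     # must come from the analyst
    preconditions: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    analyst_supplied: Dict[str, object] = field(default_factory=dict)


def extract(cti_records):
    """Stage 1 (hypothetical name): keep only what the CTI objects state."""
    return [Step(r["technique_id"], r["description"]) for r in cti_records]


def annotate(step, **params):
    """Stage 2 (hypothetical name): add analyst input and log every field."""
    for name, value in params.items():
        setattr(step, name, value)
        step.analyst_supplied[name] = value
    return step


def build_chain(steps):
    """Stage 3 (hypothetical name): order steps, refusing incomplete ones."""
    incomplete = [s.technique_id for s in steps
                  if s.order is None or s.command is None]
    if incomplete:
        raise ValueError(f"missing order or command: {incomplete}")
    return sorted(steps, key=lambda s: s.order)


records = [
    {"technique_id": "T1059",
     "description": "Command and Scripting Interpreter"},
    {"technique_id": "T1082",
     "description": "System Information Discovery"},
]
steps = extract(records)
annotate(steps[0], order=2, command="sh -c 'id'",
         preconditions=["shell access"])
annotate(steps[1], order=1, command="uname -a",
         assumptions=["Linux host"])
chain = build_chain(steps)
```

Calling `build_chain` on the unannotated output of `extract` raises `ValueError`, which mirrors the claim that the objects alone do not yield a runnable chain. The `analyst_supplied` dictionary is the record of what had to be added from outside.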

If this is right

  • Emulation of multi-stage campaigns is possible only when analyst parameters and assumptions are recorded alongside the original objects.
  • Current standards stop short of automation precisely at the points where procedural details such as ordering and preconditions are omitted.
  • Behavioral knowledge remains descriptive rather than directly machine-actionable without additional explicit recording.
  • The boundary between what the objects contain and what must be supplied externally can now be measured for any given collection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standards could add dedicated fields for ordering constraints and precondition checks so that future objects carry more of the procedural layer natively.
  • The translation steps could be applied to other structured intelligence formats to test whether the same gap appears outside the measured collection.
  • Automation pipelines would need standardized ways to store and version the analyst assumptions so that emulations remain reproducible.
  • Longitudinal tracking of the same adversaries could reveal whether the missing procedural details accumulate over time in the objects themselves.
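The first and third extensions above can be sketched together. The property names `x_order_after` and `x_preconditions` are hypothetical and are not part of STIX or ATT&CK; the step names are invented. The sketch derives an execution order from declared ordering constraints and fingerprints the analyst-supplied content so that a change to any assumption yields a new version identifier.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical procedural annotations keyed by an invented step name.
steps = {
    "discover-host": {"x_order_after": [],
                      "x_preconditions": ["shell on host"]},
    "dump-creds": {"x_order_after": ["discover-host"],
                   "x_preconditions": ["local admin"]},
    "lateral-move": {"x_order_after": ["dump-creds"],
                     "x_preconditions": ["valid credentials"]},
}


def execution_order(steps):
    """Return step names so that every step follows its predecessors."""
    graph = {name: set(s["x_order_after"]) for name, s in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def assumptions_fingerprint(steps):
    """SHA-256 over a canonical JSON encoding of the annotations."""
    canonical = json.dumps(steps, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


order = execution_order(steps)
version = assumptions_fingerprint(steps)
```

A topological sort is used rather than a fixed index because ordering constraints are often partial; steps with no declared dependency between them may legitimately run in either order.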

Load-bearing premise

The chosen measurements of the main public collection of structured objects and the two selected case studies are representative enough to support a general claim about a procedural gap across all such intelligence.

What would settle it

Repeating the coverage and sequence measurements on a substantially larger and independently collected set of structured objects and finding high rates of explicit ordering or precondition fields already present inside the objects.
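A replication of this check could start from a simple field scan. The sketch below is a rough illustration, not the paper's procedure: the candidate property names are hypothetical placeholders for whatever ordering or precondition fields a collection might carry, and the sample objects are invented.

```python
def procedural_field_rate(objects, object_type, candidate_fields):
    """Share of objects of one type with any non-empty candidate field."""
    selected = [o for o in objects if o.get("type") == object_type]
    if not selected:
        return 0.0
    hits = sum(
        1 for o in selected
        if any(o.get(f) not in (None, [], "") for f in candidate_fields)
    )
    return hits / len(selected)


# Hypothetical field names; none are defined by STIX or ATT&CK.
CANDIDATES = ["x_order_after", "x_preconditions", "x_step_index"]

sample = [
    {"type": "campaign", "id": "campaign--1"},
    {"type": "campaign", "id": "campaign--2", "x_step_index": 3},
    {"type": "campaign", "id": "campaign--3", "x_preconditions": []},
    {"type": "intrusion-set", "id": "intrusion-set--1"},
]
rate = procedural_field_rate(sample, "campaign", CANDIDATES)
```

A high rate on an independent collection would count against the gap claim; a rate near zero would be consistent with it. Empty lists and empty strings are treated as absent so that placeholder fields do not inflate the result.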

Figures

Figures reproduced from arXiv: 2512.12078 by Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior, Murray Evangelista de Souza, Sidnei Barbieri.

Figure 1. High-level structure of the methodology to convert descriptive CTI into executable multi-stage adversary behavior.
Figure 2. Bar chart showing the number of techniques by ...
Figure 3. Technique-based similarity of 187 MITRE ATT&CK ...
Figure 4. Technique usage across 51 MITRE ATT&CK Enter...
Figure 5. Technique usage across 51 MITRE ATT&CK Enter...
Figure 7. Isolated environment used for executing CTI-derived multi-step behaviors.
Original abstract

Cyber threat intelligence (CTI) encoded in STIX and structured according to the MITRE ATT&CK framework has become a global reference for describing adversary behavior. However, ATT&CK was designed as a descriptive knowledge base rather than a procedural model. We therefore ask whether its structured artifacts contain sufficient behavioral detail to support multi-stage adversary emulation. Through systematic measurements of the ATT&CK Enterprise bundle, we show that campaign objects encode just fragmented slices of behavior. Only 35.6% of techniques appear in at least one campaign, and neither clustering nor sequence analysis reveals any reusable behavioral structure under technique overlap or LCS-based analyses. Intrusion sets cover a broader portion of the technique space, yet omit the procedural semantics required to transform behavioral knowledge into executable chains, including ordering, preconditions, and environmental assumptions. These findings reveal a procedural semantic gap in current CTI standards: they describe what adversaries do, but not exactly how that behavior was operationalized. To assess how far this gap can be bridged in practice, we introduce a three-stage methodology that translates behavioral information from structured CTI into executable steps and makes the necessary environmental assumptions explicit. We demonstrate its viability by instantiating the resulting steps as operations in the MITRE Caldera framework. Case studies of ShadowRay and Soft Cell show that structured CTI can enable the emulation of multi-stage APT campaigns, but only when analyst-supplied parameters and assumptions are explicitly recorded. This, in turn, exposes the precise points at which current standards fail to support automation. Our results clarify the boundary between descriptive and machine-actionable CTI for adversary emulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper measures the ATT&CK Enterprise bundle and finds that campaign objects cover only 35.6% of techniques with no reusable structure detected by clustering or LCS sequence analysis, while intrusion sets omit ordering, preconditions, and assumptions. It introduces a three-stage methodology to translate structured CTI into executable Caldera steps and demonstrates this on ShadowRay and Soft Cell case studies, concluding that current STIX/ATT&CK standards exhibit a procedural semantic gap between descriptive and machine-actionable CTI.

Significance. If the measurements and methodology hold, the work provides empirical quantification of coverage and structure limitations in widely used CTI artifacts, clarifying the boundary between descriptive knowledge bases and emulation-ready models. The direct counts from public data and the explicit-assumption approach in the case studies offer a concrete foundation for future standard improvements and automation efforts in adversary emulation.

major comments (2)
  1. [§4 (ATT&CK bundle measurements)] The general claim of a procedural semantic gap across structured CTI is load-bearing on the representativeness of the ATT&CK Enterprise bundle measurements (campaign coverage at 35.6% and absence of structure via clustering/LCS). STIX permits additional object types (observed-data, indicators, relationships) that could encode procedural details; without analyzing or explicitly justifying their exclusion, the gap may be specific to the chosen subset rather than inherent to the standard.
  2. [§6 (Case studies)] In the case-study section, the three-stage methodology's technique inclusion rules, clustering parameters, and assumption-recording process for ShadowRay and Soft Cell are not shown to be pre-specified. This leaves open the possibility that the demonstrated feasibility of multi-stage emulation is influenced by post-hoc choices, weakening support for the claim that structured CTI can enable emulation only when assumptions are recorded.
minor comments (2)
  1. [Abstract] The abstract introduces 'LCS-based analyses' without expanding the acronym or briefly defining longest common subsequence on first use, reducing accessibility for readers outside sequence-analysis subfields.
  2. [§5 (Methodology)] The description of the three-stage methodology would benefit from an explicit numbered list or flowchart in the main text to improve traceability from CTI objects to Caldera operations.
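For readers unfamiliar with the term raised in minor comment 1: the longest common subsequence (LCS) of two sequences is the longest list of items that appears in both in the same relative order, not necessarily contiguously. The sketch below is the textbook dynamic-programming algorithm applied to two invented technique sequences; how the paper orders techniques before applying LCS is not described on this page.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back from the bottom-right corner to recover the items.
    out = []
    i, j = m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


# Invented sequences of ATT&CK technique IDs, for illustration only.
seq_a = ["T1566", "T1059", "T1082", "T1021"]
seq_b = ["T1059", "T1005", "T1021"]
shared = lcs(seq_a, seq_b)  # ["T1059", "T1021"]
```

A reusable behavioral structure would show up as long shared subsequences across many campaign pairs; short or empty results across the board are what "no reusable structure" means in this context.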

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight important aspects of scope and methodological transparency. We address each point below and have revised the manuscript to strengthen the presentation.

  1. Referee: [§4 (ATT&CK bundle measurements)] The general claim of a procedural semantic gap across structured CTI is load-bearing on the representativeness of the ATT&CK Enterprise bundle measurements (campaign coverage at 35.6% and absence of structure via clustering/LCS). STIX permits additional object types (observed-data, indicators, relationships) that could encode procedural details; without analyzing or explicitly justifying their exclusion, the gap may be specific to the chosen subset rather than inherent to the standard.

    Authors: We focused the measurements on the ATT&CK Enterprise bundle because it constitutes the primary structured representation of adversary techniques and campaigns within STIX. Objects such as observed-data and indicators are designed for detection and evidence rather than encoding the procedural ordering, preconditions, or environmental assumptions required for emulation. STIX relationships link entities but do not supply the missing behavioral semantics identified in our clustering and LCS analyses. We have added an explicit justification subsection in §4 that delineates this scope, explains why supplementary STIX types do not close the procedural gap, and notes that future work could extend the analysis to other object classes. This revision preserves the core claim while clarifying its boundaries. revision: yes

  2. Referee: [§6 (Case studies)] In the case-study section, the three-stage methodology's technique inclusion rules, clustering parameters, and assumption-recording process for ShadowRay and Soft Cell are not shown to be pre-specified. This leaves open the possibility that the demonstrated feasibility of multi-stage emulation is influenced by post-hoc choices, weakening support for the claim that structured CTI can enable emulation only when assumptions are recorded.

    Authors: The three-stage methodology was developed from the §4 measurement findings, with technique inclusion rules based on standard ATT&CK coverage thresholds, clustering parameters drawn from established sequence-analysis practices (e.g., similarity cutoffs used in prior behavioral clustering studies), and the assumption-recording step defined as an integral output of the translation process. To eliminate ambiguity, we have revised §6 to state these choices explicitly, provide their pre-application rationale, and confirm that the parameters were fixed prior to the ShadowRay and Soft Cell instantiations. The revised text also emphasizes that the case studies serve to illustrate the necessity of recording assumptions rather than to claim fully automated translation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; measurements and methodology are self-contained


The paper's derivation chain consists of direct empirical counts (e.g., 35.6% technique coverage in campaigns) and sequence checks (clustering, LCS) performed on the public ATT&CK Enterprise bundle, followed by a forward three-stage translation methodology demonstrated in two case studies. No equations, fitted parameters, or predictions reduce to inputs by construction. No self-citations are load-bearing for the central claim, no uniqueness theorems are imported, and no ansatzes are smuggled. The analysis relies on observable data properties rather than self-referential definitions or renamings of known results, making the procedural-gap conclusion independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on the public ATT&CK bundle as ground truth and introduces no new entities or fitted constants; the main unstated premise is that the chosen bundle and two case studies suffice to characterize the entire class of structured CTI.

axioms (1)
  • domain assumption: The ATT&CK Enterprise bundle is a representative sample of real-world adversary behavior descriptions.
    All coverage percentages and structure conclusions are computed from this single dataset.


