
arxiv: 2606.08700 · v1 · pith:4YBBTYUSnew · submitted 2026-06-07 · 💻 cs.CR

AutoSUT: The Environment Semantics Gap in Structured CTI for Adversary Emulation

Pith reviewed 2026-06-27 17:51 UTC · model grok-4.3

classification 💻 cs.CR
keywords: structured CTI, adversary emulation, ATT&CK, STIX, System Under Test, environment semantics gap, replay-ready emulation, cyber threat intelligence

The pith

ATT&CK structured CTI narrows candidate environments for emulation but cannot produce a unique replay-ready target system.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates how much detail about a System Under Test (SUT) can be extracted from MITRE ATT&CK STIX bundles for use in adversary emulation. It applies metrics for platform coverage, software specificity, vulnerability evidence, and deployment compatibility across the Enterprise, Mobile, and ICS datasets. Platform annotations are common, but version and CPE details are rare: 97.6% of Enterprise software objects lack both. Structured CTI supports narrowing of candidate environments and lower-bound backend-family assignment, yet multiple campaign-compatible SUTs remain when only analyst-authored details vary. The corpus therefore constrains the environment without uniquely determining it.
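
The software-specificity measurement can be pictured as a predicate applied to the objects in a bundle. The sketch below is illustrative only. It assumes that software is represented by STIX objects of type `malware` and `tool`, that platform annotations live in `x_mitre_platforms`, and that `tool_version`, `version`, and `cpe` are the fields that would carry version or CPE evidence. The paper's exact predicates are not stated on this page, and the three-object bundle is invented toy data, not ATT&CK content.

```python
import json

# Toy STIX-like bundle, invented for illustration (not real ATT&CK data).
BUNDLE_JSON = """
{
  "type": "bundle",
  "id": "bundle--00000000-0000-4000-8000-000000000000",
  "objects": [
    {"type": "malware", "id": "malware--a", "name": "ExampleRAT",
     "x_mitre_platforms": ["Windows"]},
    {"type": "tool", "id": "tool--b", "name": "ExampleScanner",
     "x_mitre_platforms": ["Linux", "Windows"], "tool_version": "2.1"},
    {"type": "attack-pattern", "id": "attack-pattern--c",
     "name": "Example Technique", "x_mitre_platforms": ["Windows"]}
  ]
}
"""

# Assumptions: which object types count as software, and which fields
# count as version or CPE evidence. The paper may use different predicates.
SOFTWARE_TYPES = {"malware", "tool"}
VERSION_FIELDS = ("tool_version", "version")
CPE_FIELDS = ("cpe",)


def has_any(obj, fields):
    """True if at least one of the named fields is present and non-empty."""
    return any(obj.get(field) for field in fields)


def software_specificity(bundle):
    """Count software objects and how many lack both version and CPE evidence."""
    software = [o for o in bundle.get("objects", [])
                if o.get("type") in SOFTWARE_TYPES]
    total = len(software)
    with_platform = sum(1 for o in software if o.get("x_mitre_platforms"))
    lacking_both = sum(
        1 for o in software
        if not has_any(o, VERSION_FIELDS) and not has_any(o, CPE_FIELDS)
    )
    return {
        "software_objects": total,
        "with_platform": with_platform,
        "lacking_version_and_cpe": lacking_both,
        "lacking_fraction": lacking_both / total if total else 0.0,
    }


stats = software_specificity(json.loads(BUNDLE_JSON))
print(stats)
```

Reporting the raw counts next to the fraction keeps the denominator visible, so a figure such as 97.6% can be re-derived once the predicates are fixed.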

Core claim

ATT&CK-style structured CTI can narrow candidate environments and support lower-bound backend-family assignment, but structured fields alone are insufficient to derive a replay-ready SUT. Profile confusion decreases from 1.3% when one software item is linked to 0% when two are linked. Keeping corpus-supported elements fixed while varying only analyst-authored details yields multiple distinct, campaign-compatible SUTs, including an executable witness exploiting the same real vulnerability. Structured CTI, therefore, constrains but does not uniquely determine the environment.

What carries the argument

The environment semantics gap, as quantified by platform coverage, software specificity, vulnerability evidence, and deployment compatibility metrics applied to ATT&CK STIX bundles.

If this is right

  • Platform annotations appear frequently enough to support initial narrowing of emulation targets.
  • Software references without versions or CPE identifiers limit the ability to specify exact backend families.
  • Campaign-level CVEs are too sparse to fully specify vulnerability conditions from the corpus alone.
  • Multiple distinct SUTs remain compatible with the same structured CTI when only analyst-authored elements change.
  • Separation between corpus-supported commitments and analyst-authored assumptions improves replay-ready emulation design.
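
One way to make the separation in the last point concrete is to tag every SUT property with its provenance. The sketch below is a hypothetical data model, not the paper's tooling: the `Provenance` enum, the `SutProperty` class, and all property values are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    CORPUS = "corpus-supported"
    ANALYST = "analyst-authored"


@dataclass(frozen=True)
class SutProperty:
    name: str
    value: str
    provenance: Provenance
    source: str  # STIX object id for corpus evidence, or a free-text note


def split_by_provenance(properties):
    """Return (corpus-supported, analyst-authored) properties as two dicts."""
    corpus = {p.name: p.value for p in properties
              if p.provenance is Provenance.CORPUS}
    analyst = {p.name: p.value for p in properties
               if p.provenance is Provenance.ANALYST}
    return corpus, analyst


# Hypothetical SUT description; every value here is an invented example.
sut_properties = [
    SutProperty("platform", "Windows", Provenance.CORPUS,
                "x_mitre_platforms on malware--a"),
    SutProperty("os_version", "example build 1", Provenance.ANALYST,
                "analyst choice, not in the corpus"),
    SutProperty("patch_level", "unpatched", Provenance.ANALYST,
                "analyst choice, not in the corpus"),
]

corpus, analyst = split_by_provenance(sut_properties)
print("corpus-supported:", corpus)
print("analyst-authored:", analyst)
```

Keeping the `source` field mandatory means each commitment can be traced either to a bundle object or to an explicit analyst decision.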

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Emulation workflows could benefit from explicit documentation of which SUT properties derive from CTI versus external sources.
  • Extending STIX bundles with more version-specific and deployment fields might reduce the identified gap.
  • Testing whether adding external CPE or CVE data ever collapses the set of possible SUTs to a single configuration would validate the boundary identified.
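
The last extension can be framed as a small enumeration exercise: hold the corpus-supported fields fixed, enumerate the analyst-authored choices, and check how many candidates survive once external constraints are added. The sketch below assumes a toy configuration space. The field names, the values, and the single external constraint are hypothetical and do not come from the paper.

```python
from itertools import product

# Corpus-supported commitments, held fixed (hypothetical values).
CORPUS_FIXED = {"platform": "Linux", "backend_family": "web server"}

# Analyst-authored dimensions that the corpus leaves open (hypothetical values).
ANALYST_CHOICES = {
    "distribution": ["distro-A", "distro-B"],
    "server_version": ["1.0", "1.1"],
    "deployment": ["bare host", "container"],
}


def candidate_suts(fixed, choices, constraints=()):
    """Enumerate every SUT that matches the fixed fields and all constraints."""
    names = sorted(choices)
    candidates = []
    for combo in product(*(choices[name] for name in names)):
        sut = {**fixed, **dict(zip(names, combo))}
        if all(check(sut) for check in constraints):
            candidates.append(sut)
    return candidates


# With structured CTI alone, every combination remains compatible.
unconstrained = candidate_suts(CORPUS_FIXED, ANALYST_CHOICES)

# Hypothetical external evidence, e.g. a CPE match that pins one version.
external = [lambda sut: sut["server_version"] == "1.0"]
constrained = candidate_suts(CORPUS_FIXED, ANALYST_CHOICES, external)

print(len(unconstrained), "candidates without external data")
print(len(constrained), "candidates with the external version constraint")
```

In this toy space the external constraint shrinks the candidate set but does not collapse it to one configuration. Whether real CPE or CVE data ever does so is the open question the bullet raises.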

Load-bearing premise

The selected metrics of platform coverage, software specificity, vulnerability evidence, and deployment compatibility applied to the ATT&CK STIX bundles suffice to measure the full gap in environment semantics for replay-ready adversary emulation.

What would settle it

Demonstrating a collection of ATT&CK STIX objects from which exactly one replay-ready SUT configuration can be derived without any external data sources would falsify the insufficiency claim.

Figures

Figures reproduced from arXiv: 2606.08700 by Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior, Sidnei Barbieri.

Figure 1: From structured CTI to partial SUT design. Each evidence layer …
Figure 2: Cross-corpus coverage of structured SUT-relevant signals at …
Figure 3: Enterprise software evidence is rarely version-pinned.
Figure 4: Operational CVE funnel in Enterprise STIX data. Most detected …
Figure 5: Compatibility-class distribution for active Enterprise techniques.
Figure 6: Worked examples of the SUT-derivation boundary. The three campaigns show where structured CTI ends and analyst-supplied environment …

Original abstract

Structured Cyber Threat Intelligence (CTI) is increasingly used for adversary emulation, detection evaluation, and cyber range design. However, these workflows still require a target System Under Test (SUT) whose environment is not fully described by public CTI. We measure how much of that environment can be derived from MITRE ATT&CK Structured Threat Information Expression (STIX) bundles. Using the ATT&CK Enterprise, Mobile, and Industrial Control Systems datasets, with CAPEC and FiGHT as comparison datasets, we evaluate platform coverage, software specificity, vulnerability evidence, and deployment compatibility. Platform annotations are common, but software references rarely include versions or Common Platform Enumeration (CPE) identifiers. In Enterprise, 97.6% of software objects lack both, and campaign-level Common Vulnerabilities and Exposures (CVEs) remain sparse. Our results show that ATT&CK-style structured CTI can narrow candidate environments and support lower-bound backend-family assignment, but structured fields alone are insufficient to derive a replay-ready SUT. Profile confusion decreases from 1.3% when one software item is linked to 0% when two are linked. The results identify a boundary between environment details supported by the corpus and the version, vulnerability, and deployment information that must come from external sources. Keeping corpus-supported elements fixed while varying only analyst-authored details yields multiple distinct, campaign-compatible SUTs, including an executable witness exploiting the same real vulnerability. Structured CTI, therefore, constrains but does not uniquely determine the environment, highlighting the need to separate corpus-supported commitments from analyst-authored assumptions in replay-ready emulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper measures the completeness of environment semantics in MITRE ATT&CK STIX bundles (Enterprise, Mobile, ICS) for defining replay-ready Systems Under Test (SUTs) in adversary emulation. It reports that platform annotations are common but 97.6% of Enterprise software objects lack version and CPE identifiers, CVEs are sparse at campaign level, and that fixing corpus-supported elements while varying analyst-authored details produces multiple distinct campaign-compatible SUTs (with profile confusion dropping from 1.3% to 0% when linking two software items). The central claim is that ATT&CK-style structured CTI constrains candidate environments and supports lower-bound backend assignment but is insufficient to uniquely determine a replay-ready SUT, requiring external sources for version, vulnerability, and deployment details.

Significance. If the reported counts and multi-SUT demonstration hold, the work supplies a reproducible, quantitative baseline for the environment semantics gap in structured CTI. The direct use of public ATT&CK bundles, CAPEC, and FiGHT as comparators, together with the concrete executable-witness example, makes the boundary between corpus-supported commitments and analyst-authored assumptions falsifiable and actionable for emulation, detection evaluation, and cyber-range workflows. This strengthens the case for explicit separation of those commitments in future tooling.

minor comments (3)
  1. [§3] §3 (Metrics): The four evaluation dimensions (platform coverage, software specificity, vulnerability evidence, deployment compatibility) are clearly motivated, but an explicit enumeration of the STIX query predicates used to classify 'software objects' and 'lacking both version and CPE' would improve replicability of the 97.6% figure.
  2. [Results] Table 2 (or equivalent results table): The profile-confusion percentages (1.3% → 0%) are useful, but the table should state the exact number of campaigns and software linkages examined so readers can assess statistical stability.
  3. [§4.3] §4.3 (Multi-SUT construction): The claim that 'multiple distinct, campaign-compatible SUTs' exist is load-bearing; a short appendix listing the differing analyst-authored fields (OS version, patch level, etc.) for the two SUTs would make the non-uniqueness concrete without lengthening the main text.
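
The point about statistical stability in comment 2 can be illustrated with a confidence interval on a proportion. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not a method attributed to the paper. The counts (1 of 77 and 13 of 1000) are hypothetical denominators that both round to about 1.3%; the actual numbers of campaigns and linkages are not given on this page.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: the same 1.3% rate at two very different sample sizes.
small_lo, small_hi = wilson_interval(1, 77)
large_lo, large_hi = wilson_interval(13, 1000)

print(f"1 of 77:    [{small_lo:.4f}, {small_hi:.4f}]")
print(f"13 of 1000: [{large_lo:.4f}, {large_hi:.4f}]")
```

The interval is much wider at the smaller sample size, which is why the same percentage is hard to interpret without its denominator.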

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of our findings on the environment semantics gap in ATT&CK STIX bundles and for recommending minor revision. The report correctly identifies our quantitative results on platform coverage, software specificity, and the demonstration that multiple campaign-compatible SUTs remain possible when varying analyst-authored details.

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a pure empirical measurement study on public ATT&CK, CAPEC, and FiGHT datasets. All reported results are direct counts (e.g., 97.6% of Enterprise software objects lack version and CPE) or concrete demonstrations that fixing corpus-supported elements while varying analyst-authored details produces multiple campaign-compatible SUTs. No equations, fitted parameters, predictions, or derivations appear; the central claim that structured CTI constrains but does not uniquely determine replay-ready environments follows immediately from the observed gaps without any self-referential reduction or load-bearing self-citation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the selected public STIX datasets and the four evaluation metrics (platform, software, vulnerability, deployment) capture the environment semantics gap for adversary emulation; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The ATT&CK Enterprise, Mobile, and ICS STIX bundles plus CAPEC and FiGHT are representative of structured CTI used in emulation workflows.
    The study uses these specific corpora to measure coverage and derive the boundary between corpus-supported and analyst-authored elements.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense

    cs.CR 2026-06 unverdicted novelty 5.0

    ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defende...

  2. From Production SIEM to Reusable Cybersecurity Artifacts

    cs.CR 2026-06 unverdicted novelty 4.0

    Methodology turns private production SIEM logs into reusable, anonymized cybersecurity artifacts validated on 37 ATT&CK-mapped challenges and 200 SOCpilot incidents.
