Beyond Collection: Measuring the Detection Efficacy of Modern Security Logging Standards
Pith reviewed 2026-05-08 09:38 UTC · model grok-4.3
The pith
Security logging standards differ significantly in capturing the telemetry needed to detect remote code execution attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SETC framework generates reproducible exploit scenarios in containerized environments and collects telemetry across CIM, OCSF, and ECS; experiments with fifty remote code execution vulnerabilities demonstrate measurable differences in how completely each standard records the indicators required for detection.
What carries the argument
The automated Security Exploit Telemetry Collection (SETC) framework, which runs standardized exploit scenarios and measures telemetry completeness and detectability for each logging standard.
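The review does not reproduce SETC's metric definitions. As a hedged illustration of what "telemetry completeness" could mean, the sketch below scores one normalized event against a list of detection-relevant indicators. The indicator list, the per-schema field paths, and the scoring rule (fraction of indicators with a non-empty value) are all assumptions made for this example, not the paper's definitions. The field paths are meant to resemble ECS, OCSF, and CIM naming and should be checked against each schema's documentation.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical indicator set for an RCE detection (illustrative, not from the paper).
INDICATORS = ["command_line", "parent_process", "user", "source_ip", "destination_port"]

# Illustrative field paths per schema; verify against each schema's official docs.
FIELD_MAP = {
    "ECS": {
        "command_line": "process.command_line",
        "parent_process": "process.parent.name",
        "user": "user.name",
        "source_ip": "source.ip",
        "destination_port": "destination.port",
    },
    "OCSF": {
        "command_line": "process.cmd_line",
        "parent_process": "process.parent_process.name",
        "user": "actor.user.name",
        "source_ip": "src_endpoint.ip",
        "destination_port": "dst_endpoint.port",
    },
    "CIM": {  # CIM events are treated here as flat key/value pairs
        "command_line": "process",
        "parent_process": "parent_process_name",
        "user": "user",
        "source_ip": "src",
        "destination_port": "dest_port",
    },
}


def lookup(event: Mapping[str, Any], dotted: str) -> Any:
    """Walk a dotted path through nested mappings; None if any hop is missing."""
    node: Any = event
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def completeness(event: Mapping[str, Any], schema: str) -> float:
    """Fraction of INDICATORS whose mapped field is present and non-empty."""
    fields = FIELD_MAP[schema]
    present = sum(
        1 for ind in INDICATORS if lookup(event, fields[ind]) not in (None, "", [])
    )
    return present / len(INDICATORS)


# Usage with a made-up ECS-style event that lacks destination.port.
ecs_event = {
    "process": {"command_line": "sh -c id", "parent": {"name": "java"}},
    "user": {"name": "tomcat"},
    "source": {"ip": "10.0.0.5"},
}
print(completeness(ecs_event, "ECS"))  # 0.8
```

Scoring every schema against the same abstract indicator list is what makes the numbers comparable across CIM, OCSF, and ECS; the mapping table is the part that would need to come from the actual schema specifications.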
If this is right
- Security teams can use the measured differences to select a logging standard that supplies more complete attack indicators.
- Standards with identified gaps can be targeted for extension or replacement to improve detection coverage.
- The reproducible methodology supports repeated testing as new vulnerabilities and logging updates appear.
- Organizations gain a concrete basis for deciding among CIM, OCSF, and ECS rather than relying on vendor claims alone.
Where Pith is reading between the lines
- The same container-based testing approach could be applied to other attack classes such as web application or privilege-escalation exploits.
- Standards developers might use the identified missing telemetry fields as direct requirements for future revisions.
- Longer-term monitoring of real incidents could be cross-checked against the paper's completeness rankings to validate or adjust the container results.
Load-bearing premise
The assumption that results from automated exploit runs inside containers accurately represent how the same logging standards would behave under real-world attacks.
What would settle it
Execute the same fifty remote code execution exploits on non-containerized production servers, collect the actual logs produced by each standard, and check whether the completeness scores match those obtained inside the SETC containers.
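One way to "check whether the completeness scores match" is to ask whether the per-exploit ordering survives the move from containers to production hosts. The sketch below computes Spearman's rank correlation between two lists of per-exploit completeness scores. Choosing Spearman's rho, and the score lists themselves, are assumptions for illustration; neither the paper nor the review specifies an agreement measure.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all scores in one list are tied")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-exploit completeness scores for the same four exploits.
container = [0.8, 0.6, 1.0, 0.4]
production = [0.9, 0.5, 1.0, 0.3]
print(spearman(container, production))  # 1.0 (identical ordering)
```

A rho near 1 would suggest the container ranking is preserved; shifts in absolute scores would still need a separate per-exploit comparison.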
Figures
Original abstract
Effective security logging is crucial for the timely and accurate detection of cyber threats; however, the relative effectiveness of various industry-standard logging frameworks remains understudied. This paper addresses this critical gap by presenting the first systematic evaluation of modern security logging standards utilizing a novel methodology built upon the automated Security Exploit Telemetry Collection (SETC) framework. SETC systematically generates reproducible exploit scenarios in containerized environments, collecting rich telemetry across multiple logging standards, including CIM (Common Information Model), OCSF (Open Cybersecurity Schema Framework), and ECS (Elastic Common Schema). The detection efficacy of each logging standard is quantified by measuring telemetry completeness and exploit detectability across standardized logs through detailed experiments involving 50 diverse remote code execution vulnerabilities. The resulting findings identify critical gaps and reveal significant differences in logging standards' abilities to capture key attack indicators. Our contributions include a novel evaluation methodology that enables scalable and reproducible analysis of exploit telemetry, as well as new findings that provide clear, evidence-based guidance for security practitioners to make informed decisions about adopting logging standards.
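The abstract pairs completeness with "exploit detectability" without defining the latter here. One plausible reading, assumed for this sketch only, is that an exploit counts as detectable under a schema when every indicator a detection rule needs is present in that schema's logs. The exploit names, indicator sets, and rule requirements below are hypothetical placeholders, not data from the paper.

```python
from typing import Dict, Set


def detectability_rate(
    required: Dict[str, Set[str]],
    captured: Dict[str, Dict[str, Set[str]]],
    schema: str,
) -> float:
    """Share of exploits whose required indicators were all captured under `schema`.

    required[exploit]         -> indicators a hypothetical detection rule needs
    captured[schema][exploit] -> indicators actually present in that schema's logs
    """
    if not required:
        raise ValueError("no exploits to score")
    logs = captured[schema]
    hits = sum(
        1 for exploit, needs in required.items()
        if needs <= logs.get(exploit, set())  # subset test: every need is covered
    )
    return hits / len(required)


# Placeholder exploits and indicators; not results from the paper.
required = {
    "exploit-A": {"command_line", "parent_process"},
    "exploit-B": {"command_line", "source_ip"},
}
captured = {
    "ECS": {
        "exploit-A": {"command_line", "parent_process", "user"},
        "exploit-B": {"command_line"},
    },
    "CIM": {
        "exploit-A": {"command_line", "parent_process"},
        "exploit-B": {"command_line", "source_ip"},
    },
}
for name in ("ECS", "CIM"):
    print(name, detectability_rate(required, captured, name))
# ECS 0.5
# CIM 1.0
```

Unlike completeness, this all-or-nothing rule means a single missing field can flip an exploit from detectable to undetectable, which is why the two metrics can rank schemas differently.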
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver the first systematic evaluation of the detection efficacy of modern security logging standards (CIM, OCSF, and ECS) via a novel automated Security Exploit Telemetry Collection (SETC) framework. It executes reproducible exploit scenarios for 50 diverse RCE vulnerabilities inside containerized environments, collects telemetry under each schema, and quantifies efficacy through metrics of telemetry completeness and exploit detectability, ultimately identifying critical gaps and significant differences to guide practitioner adoption decisions.
Significance. If the results hold and the testbed is shown to be representative, the work supplies the first large-scale empirical comparison of these schemas on attack-indicator capture, together with a reusable automated methodology. This could directly inform security operations choices and highlight concrete schema deficiencies that standards bodies might address.
major comments (1)
- [§3 and §4] §3 (SETC Methodology) and §4 (Experimental Setup): The central claim that observed differences in completeness and detectability reflect intrinsic properties of the logging standards rests on the assumption that containerized automated exploit execution produces representative telemetry. The paper does not report any validation (e.g., side-by-side comparison with production hosts, full SIEM pipelines, or real network stacks) that container isolation, simplified networking, and absence of production logging agents do not materially change which indicators appear or how completely each schema records them. If this assumption fails, the quantified gaps become testbed artifacts rather than general findings.
minor comments (2)
- [Abstract] The abstract states that the experiments 'identify critical gaps' and 'reveal significant differences' yet supplies no numerical values for completeness percentages, detectability rates, or statistical significance; including at least headline metrics would strengthen the summary.
- The paper introduces the SETC framework as a contribution but does not include a dedicated limitations subsection discussing the scope of the 50 RCE sample or the containerized threat model.
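The minor comments ask for statistical significance but do not name a test. Because every schema is evaluated on the same exploits, detection outcomes are paired; McNemar's exact test is one standard choice for paired binary outcomes and is used here as an assumption. The outcome lists are hypothetical.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    a[i], b[i]: whether exploit i was detected under schema A and under schema B.
    Only discordant pairs are informative; under H0 each falls either way with p = 1/2.
    """
    if len(a) != len(b):
        raise ValueError("outcomes must be paired exploit-by-exploit")
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 1/2)
    return min(1.0, 2 * tail)


# Hypothetical outcomes over 50 exploits: A alone detects 10, B alone detects 1,
# both detect 39.
schema_a = [True] * 10 + [False] * 1 + [True] * 39
schema_b = [False] * 10 + [True] * 1 + [True] * 39
print(mcnemar_exact(schema_a, schema_b))  # 0.01171875
```

With three schemas there are three pairwise comparisons, so a multiple-comparison correction would also be worth reporting alongside any headline p-values.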
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the representativeness of our experimental testbed. We address the single major comment below.
Point-by-point responses
- Referee: [§3 and §4] §3 (SETC Methodology) and §4 (Experimental Setup): The central claim that observed differences in completeness and detectability reflect intrinsic properties of the logging standards rests on the assumption that containerized automated exploit execution produces representative telemetry. The paper does not report any validation (e.g., side-by-side comparison with production hosts, full SIEM pipelines, or real network stacks) that container isolation, simplified networking, and absence of production logging agents do not materially change which indicators appear or how completely each schema records them. If this assumption fails, the quantified gaps become testbed artifacts rather than general findings.
Authors: We agree that the manuscript does not include external validation of the containerized environment against production hosts or full SIEM deployments. The SETC framework was deliberately designed as a controlled, reproducible testbed to isolate the effects of the logging schemas themselves by holding the underlying execution environment constant across all 50 exploits and all three standards. This internal-validity focus enables direct attribution of differences in telemetry completeness and detectability to schema design choices rather than to confounding factors such as varying host configurations or agent implementations. Nevertheless, we acknowledge that this controlled setting may omit certain production artifacts (e.g., richer network-stack telemetry or agent-specific enrichment). In the revised manuscript we will add an explicit “Threats to Validity” subsection (likely in §6) that (1) states the scope limitation, (2) explains the methodological rationale for the containerized design, and (3) outlines how future work could extend the framework to production-like environments. We will also soften the language around “intrinsic properties” to “properties observable under standardized, reproducible conditions.” These changes will make the claims more precise without requiring new experiments.
Revision: partial
Circularity Check
No circularity: empirical measurements from new experiments
full rationale
The paper conducts a direct empirical evaluation by running 50 RCE exploits inside the SETC framework in containerized environments, then measuring telemetry completeness and exploit detectability under CIM, OCSF, and ECS schemas. No equations, fitted parameters, or predictions are defined in terms of the target outcomes; the results follow from the collected logs rather than any self-definitional reduction or load-bearing self-citation. The methodology is introduced as novel and the findings are presented as evidence-based observations from those runs, keeping the derivation chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Selection of 50 RCE vulnerabilities
axioms (1)
- Domain assumption: containerized environments and automated exploit generation accurately simulate real-world logging behavior and attack telemetry.
invented entities (1)
- SETC (Security Exploit Telemetry Collection) framework: no independent evidence
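The ledger's single free parameter is the choice of the 50 RCE vulnerabilities. A percentile bootstrap over exploits is one common way to show how sensitive a headline average is to that choice; it is offered here as an assumed analysis, not one the paper reports. It only measures resampling variability within the selected pool and says nothing about attack classes outside it. The score list in the usage lines is hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float], reps: int = 10_000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean per-exploit score.

    Each replicate resamples the exploit list with replacement, so the interval
    reflects how much the mean depends on which exploits happened to be chosen.
    """
    if not scores or reps < 1:
        raise ValueError("need at least one score and one replicate")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(scores)
    means = sorted(math.fsum(rng.choices(scores, k=n)) / n for _ in range(reps))
    lo = means[int(alpha / 2 * reps)]
    hi = means[max(int((1 - alpha / 2) * reps) - 1, 0)]
    return lo, hi


# Hypothetical per-exploit completeness scores for one schema.
scores = [1.0, 0.8, 0.6, 0.8, 0.4, 1.0, 0.6, 0.8, 0.2, 1.0]
print(bootstrap_ci(scores))
```

A wide interval would signal that a reported difference between schemas could shift with a different selection of vulnerabilities.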
Reference graph
Works this paper leans on
-
[1]
2025 Data Breach Investigations Report,
C. D. Hylender, P. Langlois, A. Pinto, and S. Widup, “2025 Data Breach Investigations Report,” Verizon, Technical Report, 2025, 18th annual edition. Available: https://verizon.com/dbir
2025
-
[2]
Common Information Model Add-on Manual
Splunk, “Common Information Model Add-on Manual.” Available: https://docs.splunk.com/Documentation/CIM/5.1.1/User/ Overview (Accessed 2024-01-24)
2024
-
[3]
Elastic Common Schema (ECS),
Elastic, “Elastic Common Schema (ECS),” n.d. Available: https: //github.com/elastic/ecs
-
[4]
Understanding the Open Cybersecurity Schema Framework,
P. Agbabian, “Understanding the Open Cybersecurity Schema Framework,” 2022. Available: https://github.com/ocsf/ocsf-docs/blob/ main/UnderstandingOCSF.pdf (Accessed 2024-01-24)
2022
-
[5]
SETC: A Vul- nerability Telemetry Collection Framework,
R. Holeman, J. D. Hastings, and V . M. Vaidyan, “SETC: A Vul- nerability Telemetry Collection Framework,” in2024 Cyber Aware- ness and Research Symposium (CARS). IEEE, 2024, pp. 1–7. doi:10.1109/CARS61786.2024.10778761
-
[6]
Implementing ArcSight Common Event Format (CEF) - Version 26
Micro Focus, “Implementing ArcSight Common Event Format (CEF) - Version 26.” Avail- able: https://www.microfocus.com/documentation/arcsight/arcsight- smartconnectors-8.3/cef-implementation-standard/ (Accessed 2024- 01-24)
2024
-
[7]
Overview of the Unified Data Model,
Google, “Overview of the Unified Data Model,” n.d. Available: https: //cloud.google.com/chronicle/docs/event-processing/udm-overview
-
[8]
Vulhub, “Vulhub.” Available: https://github.com/vulhub/vulhub (Accessed 2024-01-24)
2024
-
[9]
Kennedy, J
D. Kennedy, J. O’gorman, D. Kearns, and M. Aharoni,Metasploit: the penetration tester’s guide. No Starch Press, 2011
2011
-
[10]
Design patterns for container- based distributed systems,
B. Burns and D. Oppenheimer, “Design patterns for container- based distributed systems,” inProceedings of the 8th USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud’16. USA: USENIX Association, 2016, p. 108–113. Available: https://www.usenix.org/system/files/conference/ hotcloud16/hotcloud16 burns.pdf (Accessed 2024-01-24)
2016
-
[11]
A Semantic-aware Representation Framework for Online Log Analysis,
W. Menget al., “A Semantic-aware Representation Framework for Online Log Analysis,” in2020 29th International Conference on Computer Communications and Networks (ICCCN), 2020, pp. 1–7. doi:10.1109/ICCCN49398.2020.9209707
-
[12]
Less is more: quantifying the security benefits of debloating web applications,
B. A. Azad, P. Laperdrix, and N. Nikiforakis, “Less is more: quantifying the security benefits of debloating web applications,” inProceedings of the 28th USENIX Conference on Security Symposium, ser. SEC’19. USA: USENIX Association, 2019, p. 1697–1714. Available: https://www.usenix.org/system/files/sec19- azad.pdf (Accessed 2024-01-24)
2019
-
[13]
Mitre att&ck: Design and philosophy,
B. E. Strom, A. Applebaum, D. P. Miller, K. C. Nickels, A. G. Pen- nington, and C. B. Thomas, “Mitre att&ck: Design and philosophy,” inTechnical report. The MITRE Corporation, 2018
2018
-
[14]
APT3 adversary emulation plan,
C. A. Korban, D. P. Miller, A. Pennington, and C. B. Thomas, “APT3 adversary emulation plan,”MITRE, 2017
2017
-
[15]
Cisco systems netflow services export version 9,
B. Claise, “Cisco systems netflow services export version 9,” Tech. Rep., 2004, doi: 10.17487/RFC3954
-
[16]
NetFlow: Information loss or win?
R. Sommer and A. Feldmann, “NetFlow: Information loss or win?” inProceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment, 2002, pp. 173–174. doi:10.1145/637201.637226
-
[17]
In2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020
W. U. Hassan, A. Bates, and D. Marino, “Tactical provenance analysis for endpoint detection and response systems,” in2020 IEEE Sympo- sium on Security and Privacy (SP). IEEE, 2020, pp. 1172–1189. doi:10.1109/SP40000.2020.00096
-
[18]
SHADEW ATCHER: Recommendation-guided Cy- ber Threat Analysis using System Audit Records,
J. Zenget al., “SHADEW ATCHER: Recommendation-guided Cy- ber Threat Analysis using System Audit Records,” in2022 IEEE Symposium on Security and Privacy (SP), 2022, pp. 489–506. doi:10.1109/SP46214.2022.9833669
-
[19]
Individualizing cybersecurity lab exercises with Labtainers,
M. F. Thompson and C. E. Irvine, “Individualizing cybersecurity lab exercises with Labtainers,”IEEE Security & Privacy, vol. 16, no. 2, pp. 91–95, 2018. doi:10.1109/MSP.2018.1870862
-
[20]
Live lesson: Labtainers: A docker-based framework for cybersecurity labs,
C. E. Irvine, M. F. Thompson, M. McCarrin, and J. Khosalim, “Live lesson: Labtainers: A docker-based framework for cybersecurity labs,” in2017 USENIX Workshop on Advances in Security Education (ASE 17), 2017, Conference Proceedings. Available: https://www.usenix. org/system/files/conference/ase17/ase17 paper irvine.pdf (Accessed 2024-01-24)
2017
-
[21]
Building next generation cyber ranges with CRACK,
E. Russo, G. Costa, and A. Armando, “Building next generation cyber ranges with CRACK,”Computers & Security, vol. 95, p. 101837,
-
[22]
doi:10.1016/j.cose.2020.101837
-
[23]
Automating software in- stallation for cyber security research and testing public exploits in crate,
J. Kahlstr ¨om and J. Hedlin, “Automating software in- stallation for cyber security research and testing public exploits in crate,” Master’s thesis, Link ¨opings univer- sitet, 2021. Available: https://www.diva-portal.org/smash/get/diva2: 1574026/FULLTEXT01.pdf (Accessed 2025-05-24)
2021
-
[24]
TestREX: A testbed for repeatable exploits,
S. Dashevskyi, D. R. dos Santos, F. Massacci, and A. Sabetta, “TestREX: A testbed for repeatable exploits,” in 7th Workshop on Cyber Security Experimentation and Test (CSET 14). USENIX Association, 2014, Conference Proceed- ings. Available: https://www.usenix.org/conference/cset14/workshop- program/presentation/dashevskyi (Accessed 2024-01-24)
2014
-
[25]
BugBox: A vulnerability corpus for PHP web applications,
G. Nilson, K. Wills, J. Stuckman, and J. Purtilo, “BugBox: A vulnerability corpus for PHP web applications,” in6th Workshop on Cyber Security Experimentation and Test (CSET 13), 2013, Conference Proceedings. Available: https://www.usenix.org/system/ files/conference/cset13/cset13-nilson.pdf (Accessed 2024-01-24)
2013
-
[26]
Dynamic malware analysis using cuckoo sandbox,
S. Jamalpur, Y . S. Navya, P. Raja, G. Tagore, and G. R. K. Rao, “Dynamic malware analysis using cuckoo sandbox,” in2018 Sec- ond international conference on inventive communication and com- putational technologies (ICICCT). IEEE, 2018, pp. 1056–1060. doi:10.1109/ICICCT.2018.8473346
-
[27]
Reinforcement learning for intelli- gent penetration testing,
M. C. Ghanem and T. M. Chen, “Reinforcement learning for intelli- gent penetration testing,” in2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). IEEE, 2018, pp. 185–192. doi:10.1109/WorldS4.2018.8611595
-
[28]
V APE-BRIDGE: Bridging Open- V AS results for automating Metasploit framework,
K. Vimala and S. Fugkeaw, “V APE-BRIDGE: Bridging Open- V AS results for automating Metasploit framework,” in2022 14th International Conference on Knowledge and Smart Technol- ogy (KST). IEEE, 2022, Conference Proceedings, pp. 69–74. doi:10.1109/KST53302.2022.9729085
-
[29]
Vulnerability exploitation using reinforce- ment learning,
A. AlMajaliet al., “Vulnerability exploitation using reinforce- ment learning,” in2023 IEEE Jordan International Joint Con- ference on Electrical Engineering and Information Technology (JEEIT). IEEE, 2023, Conference Proceedings, pp. 281–286. doi:10.1109/JEEIT58638.2023.10185700
-
[30]
Incalmo: An Au- tonomous LLM-assisted System for Red Teaming Multi-Host Networks, November 2025
B. Singer, K. Lucas, L. Adiga, M. Jain, L. Bauer, and V . Sekar, “On the feasibility of using LLMs to autonomously execute multi-host network attacks,” 2025,arXiv:2501.16466
-
[31]
PentestGPT: Evaluating and harnessing large lan- guage models for automated penetration testing,
G. Denget al., “PentestGPT: Evaluating and harnessing large lan- guage models for automated penetration testing,” in33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 847–864
2024
-
[32]
NYU CTF bench: A scalable open-source benchmark dataset for evaluating llms in offensive security,
M. Shaoet al., “NYU CTF bench: A scalable open-source benchmark dataset for evaluating llms in offensive security,” inAdvances in Neural Information Processing Systems (NeurIPS 2024), vol. 37, 2024, pp. 57 472–57 498
2024
-
[33]
Teams of llm agents can exploit zero-day vulnerabilities,
Y . Zhuet al., “Teams of LLM agents can exploit zero-day vulnera- bilities,” 2024,arXiv:2406.01637
-
[34]
Z. Wang, T. Shi, J. He, M. Cai, J. Zhang, and D. Song, “CyberGym: Evaluating AI agents’ cybersecurity capabilities with real-world vulnerabilities at scale,” 2025. Available: https: //arxiv.org/abs/2506.02548
-
[35]
A comprehensive survey on cyber deception techniques to improve honeypot performance,
A. Javadpour, F. Ja’fari, T. Taleb, M. Shojafar, and C. Benza ¨ıd, “A comprehensive survey on cyber deception techniques to improve honeypot performance,”Computers & Security, p. 103792, 2024. doi:10.1016/j.cose.2024.103792
-
[36]
N. Naik, C. Shang, P. Jenkins, and Q. Shen, “D-FRI-Honeypot: A secure sting operation for hacking the hackers using dynamic fuzzy rule interpolation,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 5, no. 6, pp. 893–907, 2020. doi:10.1109/TETCI.2020.3023447 Appendix A. Vulnerability & Exploit Corpus TABLE 9: CVE Vulnerabilities an...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.