arxiv: 2605.05531 · v1 · submitted 2026-05-07 · 💻 cs.CR

Beyond Collection: Measuring the Detection Efficacy of Modern Security Logging Standards

Ryan Holeman, John Hastings, Varghese Mathew Vaidyan

Pith reviewed 2026-05-08 09:38 UTC · model grok-4.3

classification 💻 cs.CR

keywords security logging, exploit detection, telemetry collection, logging standards, remote code execution, CIM, OCSF, ECS

The pith

Security logging standards differ significantly in capturing the telemetry needed to detect remote code execution attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that industry logging standards are not interchangeable when it comes to giving defenders the data they need to spot attacks. It introduces an automated framework that spins up containerized exploit scenarios, runs fifty different remote code execution cases, and records what each standard actually logs. A reader would care because the choice of logging format directly affects how complete the evidence is when an incident occurs. The experiments quantify telemetry completeness and exploit detectability, revealing concrete gaps in some standards and clearer coverage in others.

Core claim

The SETC framework generates reproducible exploit scenarios in containerized environments and collects telemetry across CIM, OCSF, and ECS; experiments with fifty remote code execution vulnerabilities demonstrate measurable differences in how completely each standard records the indicators required for detection.

What carries the argument

The automated Security Exploit Telemetry Collection (SETC) framework, which runs standardized exploit scenarios and measures telemetry completeness and exploit detectability for each logging standard.
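
As a rough illustration of what measuring telemetry completeness and detectability could look like, the Python sketch below normalizes one raw event into two made-up schemas and scores each. Everything in it is an assumption for illustration: the indicator names, the `schema_a` and `schema_b` field maps, the `normalize`, `completeness`, and `detectable` helpers, and the sample event are hypothetical and are not taken from the paper, from SETC, or from the CIM, OCSF, or ECS specifications. The paper's actual metric definitions are not given on this page.

```python
from typing import Dict, Optional

# Hypothetical attack indicators for an HTTP-delivered exploit (not from the paper).
INDICATORS = ["source_ip", "http_method", "request_url", "request_body"]

# Hypothetical field maps: indicator -> schema field name, or None when the
# schema (as modeled here) has no place to store that indicator.
FieldMap = Dict[str, Optional[str]]
SCHEMAS: Dict[str, FieldMap] = {
    "schema_a": {"source_ip": "src", "http_method": "method",
                 "request_url": "url", "request_body": "body"},
    "schema_b": {"source_ip": "client.ip", "http_method": "verb",
                 "request_url": "uri", "request_body": None},
}


def normalize(raw: dict, field_map: FieldMap) -> dict:
    """Copy each raw indicator into its schema field; unmapped indicators are dropped."""
    return {field: raw[ind] for ind, field in field_map.items()
            if field is not None and ind in raw}


def completeness(event: dict, field_map: FieldMap) -> float:
    """Fraction of indicators that survive normalization with a non-empty value."""
    kept = 0
    for ind in INDICATORS:
        field = field_map.get(ind)
        if field is not None and event.get(field) not in (None, ""):
            kept += 1
    return kept / len(INDICATORS)


def detectable(event: dict, marker: str) -> bool:
    """True if the exploit marker is still visible in any normalized field."""
    return any(marker in str(value) for value in event.values())


# Made-up raw telemetry for one exploit attempt.
raw_event = {"source_ip": "10.0.0.5", "http_method": "POST",
             "request_url": "/api/run", "request_body": "cmd=id"}

results = {}
for name, field_map in SCHEMAS.items():
    event = normalize(raw_event, field_map)
    results[name] = (completeness(event, field_map), detectable(event, "cmd=id"))

for name, (score, seen) in results.items():
    print(f"{name}: completeness={score:.2f} detectable={seen}")
```

In this toy setup the schema with no request-body field scores lower on completeness and also loses the marker a detection rule would need, which is the kind of gap the paper reports measuring.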

If this is right

Security teams can use the measured differences to select a logging standard that supplies more complete attack indicators.
Standards with identified gaps can be targeted for extension or replacement to improve detection coverage.
The reproducible methodology supports repeated testing as new vulnerabilities and logging updates appear.
Organizations gain a concrete basis for deciding among CIM, OCSF, and ECS rather than relying on vendor claims alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same container-based testing approach could be applied to other attack classes such as web application or privilege-escalation exploits.
Standards developers might use the identified missing telemetry fields as direct requirements for future revisions.
Longer-term monitoring of real incidents could be cross-checked against the paper's completeness rankings to validate or adjust the container results.

Load-bearing premise

The assumption that results from automated exploit runs inside containers accurately represent how the same logging standards would behave under real-world attacks.

What would settle it

Execute the same fifty remote code execution exploits on non-containerized production servers, collect the actual logs produced by each standard, and check whether the completeness scores match those obtained inside the SETC containers.
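
One way to frame that check is as a paired comparison of per-exploit completeness scores. The sketch below is a minimal illustration, assuming hypothetical score lists for placeholder standards (`standard_x`, `standard_y`). The numbers, the helper names, and the choice of mean absolute difference plus rank agreement are all assumptions, not the paper's procedure or results.

```python
from statistics import mean
from typing import Dict, List

Scores = Dict[str, List[float]]  # standard name -> per-exploit completeness scores


def mean_abs_diff(a: List[float], b: List[float]) -> float:
    """Average absolute gap between paired scores for the same exploits."""
    if len(a) != len(b):
        raise ValueError("score lists must cover the same exploits")
    return mean(abs(x - y) for x, y in zip(a, b))


def ranking(scores: Scores) -> List[str]:
    """Standards ordered from highest to lowest mean completeness."""
    return sorted(scores, key=lambda name: mean(scores[name]), reverse=True)


# Made-up scores for three exploits; real data would have fifty per standard.
container: Scores = {
    "standard_x": [1.0, 0.75, 0.5],
    "standard_y": [0.5, 0.5, 0.25],
}
production: Scores = {
    "standard_x": [1.0, 0.75, 0.75],
    "standard_y": [0.5, 0.25, 0.25],
}

# Per-standard drift between the two environments.
drift = {name: mean_abs_diff(container[name], production[name]) for name in container}

# Whether the ordering of standards is the same in both environments.
same_order = ranking(container) == ranking(production)

for name, gap in drift.items():
    print(f"{name}: mean absolute difference = {gap:.3f}")
print("ranking preserved:", same_order)
```

A small drift with a preserved ranking would support the container results; a large drift or a changed ranking would suggest testbed artifacts. What counts as "small" is a judgment call this sketch does not settle.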

Figures

Figures reproduced from arXiv: 2605.05531 by John Hastings, Ryan Holeman, Varghese Mathew Vaidyan.

**Figure 1.** SETC network and HTTP log sizes for CVE-2024-38856

**Figure 2.** Vulnerability population attack graph and associated exploit activity

**Figure 3.** Raw HTTP data. Note that data in the figure is truncated using …

**Figure 4.** Hierarchical breakdown of HTTP vulnerability detection capa…

Original abstract

Effective security logging is crucial for the timely and accurate detection of cyber threats; however, the relative effectiveness of various industry-standard logging frameworks remains understudied. This paper addresses this critical gap by presenting the first systematic evaluation of modern security logging standards utilizing a novel methodology built upon the automated Security Exploit Telemetry Collection (SETC) framework. SETC systematically generates reproducible exploit scenarios in containerized environments, collecting rich telemetry across multiple logging standards, including CIM (Common Information Model), OCSF (Open Cybersecurity Schema Framework), and ECS (Elastic Common Schema). The detection efficacy of each logging standard is quantified by measuring telemetry completeness and exploit detectability across standardized logs through detailed experiments involving 50 diverse remote code execution vulnerabilities. The resulting findings identify critical gaps and reveal significant differences in logging standards' abilities to capture key attack indicators. Our contributions include a novel evaluation methodology that enables scalable and reproducible analysis of exploit telemetry, as well as new findings that provide clear, evidence-based guidance for security practitioners to make informed decisions about adopting logging standards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims to deliver the first systematic evaluation of the detection efficacy of modern security logging standards (CIM, OCSF, and ECS) via a novel automated Security Exploit Telemetry Collection (SETC) framework. It executes reproducible exploit scenarios for 50 diverse RCE vulnerabilities inside containerized environments, collects telemetry under each schema, and quantifies efficacy through metrics of telemetry completeness and exploit detectability, ultimately identifying critical gaps and significant differences to guide practitioner adoption decisions.

Significance. If the results hold and the testbed is shown to be representative, the work supplies the first large-scale empirical comparison of these schemas on attack-indicator capture, together with a reusable automated methodology. This could directly inform security operations choices and highlight concrete schema deficiencies that standards bodies might address.

major comments (1)

[§3 and §4] §3 (SETC Methodology) and §4 (Experimental Setup): The central claim that observed differences in completeness and detectability reflect intrinsic properties of the logging standards rests on the assumption that containerized automated exploit execution produces representative telemetry. The paper does not report any validation (e.g., side-by-side comparison with production hosts, full SIEM pipelines, or real network stacks) that container isolation, simplified networking, and absence of production logging agents do not materially change which indicators appear or how completely each schema records them. If this assumption fails, the quantified gaps become testbed artifacts rather than general findings.

minor comments (2)

[Abstract] Abstract: The abstract states that the experiments 'identify critical gaps' and 'reveal significant differences' yet supplies no numerical values for completeness percentages, detectability rates, or statistical significance; including at least headline metrics would strengthen the summary.
The paper introduces the SETC framework as a contribution but does not include a dedicated limitations subsection discussing the scope of the 50 RCE sample or the containerized threat model.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the representativeness of our experimental testbed. We address the single major comment below.

Referee: [§3 and §4] §3 (SETC Methodology) and §4 (Experimental Setup): The central claim that observed differences in completeness and detectability reflect intrinsic properties of the logging standards rests on the assumption that containerized automated exploit execution produces representative telemetry. The paper does not report any validation (e.g., side-by-side comparison with production hosts, full SIEM pipelines, or real network stacks) that container isolation, simplified networking, and absence of production logging agents do not materially change which indicators appear or how completely each schema records them. If this assumption fails, the quantified gaps become testbed artifacts rather than general findings.

Authors: We agree that the manuscript does not include external validation of the containerized environment against production hosts or full SIEM deployments. The SETC framework was deliberately designed as a controlled, reproducible testbed to isolate the effects of the logging schemas themselves by holding the underlying execution environment constant across all 50 exploits and all three standards. This internal-validity focus enables direct attribution of differences in telemetry completeness and detectability to schema design choices rather than confounding factors such as varying host configurations or agent implementations. Nevertheless, we acknowledge that this controlled setting may omit certain production artifacts (e.g., richer network-stack telemetry or agent-specific enrichment). In the revised manuscript we will add an explicit “Threats to Validity” subsection (likely in §6) that (1) states the scope limitation, (2) explains the methodological rationale for the containerized design, and (3) outlines how future work could extend the framework to production-like environments. We will also soften the language around “intrinsic properties” to “properties observable under standardized, reproducible conditions.” These changes will make the claims more precise without requiring new experiments.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical measurements from new experiments

The paper conducts a direct empirical evaluation by running 50 RCE exploits inside the SETC framework in containerized environments, then measuring telemetry completeness and exploit detectability under CIM, OCSF, and ECS schemas. No equations, fitted parameters, or predictions are defined in terms of the target outcomes; the results follow from the collected logs rather than any self-definitional reduction or load-bearing self-citation. The methodology is introduced as novel and the findings are presented as evidence-based observations from those runs, keeping the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the validity of the newly introduced SETC methodology and the representativeness of the chosen vulnerabilities and containerized test environment.

free parameters (1)

Selection of 50 RCE vulnerabilities
The specific set of 50 vulnerabilities is presented as diverse, but selection criteria and diversity metrics are not specified in the abstract.

axioms (1)

domain assumption: Containerized environments and automated exploit generation accurately simulate real-world logging behavior and attack telemetry.
Invoked throughout the description of the SETC framework for collecting comparable telemetry across standards.

invented entities (1)

SETC (Security Exploit Telemetry Collection) framework · no independent evidence
purpose: To systematically generate reproducible exploit scenarios and collect rich telemetry across multiple logging standards
New methodology introduced by the paper to enable the comparative evaluation.

pith-pipeline@v0.9.0 · 5477 in / 1376 out tokens · 74243 ms · 2026-05-08T09:38:24.177177+00:00 · methodology
