From Production SIEM to Reusable Cybersecurity Artifacts
Pith reviewed 2026-06-26 13:53 UTC · model grok-4.3
The pith
A methodology extracts, anonymizes, structures, and validates production SIEM data from a financial SOC to yield reusable artifacts that preserve task-relevant investigative structure inside a declared privacy boundary.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Operational evidence is not automatically scientific evidence. The most realistic Security Operations Center data is production telemetry, yet it remains scientifically inaccessible because raw logs cannot be released; as a result, research relies on synthetic or dated datasets. We treat the boundary between private production telemetry and reusable research artifacts as the design object: a methodology that extracts, anonymizes, structures, and validates SIEM data from a production financial SOC while preserving task-relevant investigative structure within a declared privacy boundary. Two consumers stress the same artifact. As training material, it fails loudly: 37 MITRE ATT&CK-mapped HIKARI challenges work only when anonymization preserves temporal order and entity consistency.
What carries the argument
The extraction-anonymization-structuring-validation pipeline that maintains temporal order and entity consistency inside an explicit privacy boundary.
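To make "entity consistency" concrete, here is a minimal sketch of one common way to get it: keyed pseudonymization with an HMAC, so that the same user or host receives the same pseudonym in every log source. This is an illustration under stated assumptions, not the paper's mechanism (the referee report notes that the manuscript does not specify one). The function names, the field names `user` and `host`, the sample records, and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, kind: str, key: bytes) -> str:
    """Map an entity value to a stable pseudonym: same input and key, same output."""
    message = f"{kind}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Truncating the digest keeps pseudonyms readable but raises the collision
    # risk; a real pipeline would need an explicit collision check.
    return f"{kind}-{digest[:12]}"


def anonymize_events(events, key, entity_fields=("user", "host")):
    """Return copies of the events with entity fields replaced by pseudonyms."""
    anonymized = []
    for event in events:
        clean = dict(event)  # copy so the input records are not mutated
        for field in entity_fields:
            if field in clean:
                clean[field] = pseudonymize(clean[field], field, key)
        anonymized.append(clean)
    return anonymized


# Hypothetical records from two different log sources.
auth_log = [{"ts": 100, "user": "alice", "host": "ws-17", "action": "login"}]
proxy_log = [{"ts": 130, "user": "alice", "host": "ws-17", "action": "download"}]

key = b"example-secret-key"  # assumption: one secret key shared across the pipeline
anon_auth = anonymize_events(auth_log, key)
anon_proxy = anonymize_events(proxy_log, key)

# The same user carries the same pseudonym in both sources.
print(anon_auth[0]["user"] == anon_proxy[0]["user"])
```

Because the pseudonym depends only on the key and the value, cross-log references survive without a shared lookup table; the trade-off is that anyone holding the key can test guesses against the released pseudonyms.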
If this is right
- The same anonymized artifacts support both training challenges mapped to MITRE ATT&CK and quantitative measurement of automated incident response.
- A measurable privacy-utility boundary replaces binary anonymity claims as the evaluation standard for released SOC data.
- Production telemetry can serve as the substrate for reproducible experiments once temporal and entity structure is preserved.
Where Pith is reading between the lines
- Other organizations holding production logs could run equivalent extraction pipelines to enlarge the pool of controlled research artifacts.
- The dual-use testing approach could be applied to evaluate additional automated tools beyond LLMs on the same incident set.
- The privacy boundary definition offers a template for sharing data across regulated sectors where full release remains impossible.
Load-bearing premise
The anonymization steps keep temporal order and entity consistency intact enough for the 37 challenges to run correctly and for the verifier to produce meaningful comparisons without the transformations adding hidden biases.
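The temporal half of this premise can be illustrated with a small sketch. One way to perturb timestamps without reordering events is to shift the first event by a secret offset and rescale every gap between consecutive events by a bounded positive factor. This is an assumed technique chosen for illustration; the paper's actual jitter procedure is not described in the text above, and the function name, parameters, and sample values are hypothetical.

```python
import random


def shift_timestamps(timestamps, offset, max_rel_jitter=0.2, seed=0):
    """Return perturbed timestamps that keep the relative order of the input.

    Each gap between consecutive events (in time order) is multiplied by a
    factor in [1 - max_rel_jitter, 1 + max_rel_jitter]. The factor is always
    positive, so positive gaps stay positive and ties stay ties.
    """
    if not 0 <= max_rel_jitter < 1:
        raise ValueError("max_rel_jitter must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the transformation is reproducible
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    shifted = [0.0] * len(timestamps)
    previous_original = None
    previous_shifted = None
    for i in order:
        if previous_original is None:
            new_ts = float(timestamps[i] + offset)
        else:
            gap = timestamps[i] - previous_original
            factor = 1.0 + rng.uniform(-max_rel_jitter, max_rel_jitter)
            new_ts = previous_shifted + gap * factor
        shifted[i] = new_ts
        previous_original = timestamps[i]
        previous_shifted = new_ts
    return shifted


# Hypothetical event times in seconds, deliberately unsorted and with one tie.
original = [1000, 1005, 1005, 1060, 1002]
shifted = shift_timestamps(original, offset=-500.0, seed=42)

# Every pairwise "happened before" relation is unchanged.
order_preserved = all(
    (original[i] < original[j]) == (shifted[i] < shifted[j])
    for i in range(len(original))
    for j in range(len(original))
)
print(order_preserved)
```

Scaling gaps rather than adding independent noise to each timestamp is the design choice that guarantees ordering. Independent noise can swap two events that are closer together than the noise bound.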
What would settle it
If the 37 HIKARI challenges stop working or the deterministic verifier no longer detects differences between LLM and human actions on the 200 incidents after the anonymization steps are applied, the claimed utility of the artifacts would not hold.
Original abstract
Operational evidence is not automatically scientific evidence. The most realistic Security Operations Center (SOC) data is production telemetry, yet it remains scientifically inaccessible because raw logs cannot be released; as a result, research relies on synthetic or dated datasets. We treat the boundary between private production telemetry and reusable research artifacts as the design object: a methodology that extracts, anonymizes, structures, and validates Security Information and Event Management (SIEM) data from a production financial SOC while preserving task-relevant investigative structure within a declared privacy boundary. Two consumers stress the same artifact. As training material, it fails loudly: 37 MITRE ATT&CK-mapped HIKARI challenges work only when anonymization preserves temporal order and entity consistency. As a measurement substrate, it fails quietly: across 200 SOCpilot incidents, a deterministic verifier detects non-compliant Large Language Model (LLM) actions that are absent from the human baseline. The result is a measurable privacy-utility boundary rather than a formal anonymity claim.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce a methodology that extracts, anonymizes, structures, and validates SIEM data from a production financial SOC to produce reusable cybersecurity artifacts that preserve task-relevant investigative structure within a declared privacy boundary. This is demonstrated via two consumers of the same artifact: as training material, 37 MITRE ATT&CK-mapped HIKARI challenges succeed only when anonymization preserves temporal order and entity consistency; as a measurement substrate, a deterministic verifier on 200 SOCpilot incidents detects non-compliant LLM actions absent from the human baseline, yielding a measurable privacy-utility boundary rather than a formal anonymity claim.
Significance. If the central claims hold, the work addresses a key barrier in cybersecurity research by enabling realistic, production-derived datasets for evaluation and training while respecting privacy constraints. The dual-use design (loud failure on challenges, quiet detection on incidents) and concrete scale (37 challenges, 200 incidents) provide a practical, falsifiable demonstration of the privacy-utility trade-off that could influence how future SOC artifacts are shared.
major comments (2)
- [Methodology (anonymization description)] The anonymization step (extract-anonymize-structure-validate) is load-bearing for the central claim that task-relevant structure is preserved, yet the manuscript provides no concrete mechanism for global entity-ID remapping that maintains cross-log references or for timestamp jittering that preserves relative ordering and causality; without these, the HIKARI challenge results and SOCpilot verifier outcomes could be artifacts of the pipeline rather than evidence of utility.
- [Evaluation / Results] The evaluation reports 37 HIKARI challenges and 200 SOCpilot incidents as concrete outcomes but supplies no error analysis, baseline comparisons against non-anonymized data, or explicit checks that the transformations did not introduce or destroy correlations; this undermines the assertion of a measurable privacy-utility boundary.
minor comments (1)
- [Notation / Definitions] Notation for the privacy boundary and the deterministic verifier should be defined more explicitly (e.g., with a short pseudocode listing or table of invariants) to aid reproducibility.
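As an illustration of what such a listing of invariants might look like, the sketch below encodes two policy rules and checks an action trace against them. The action names, the two rules, and the example traces are hypothetical and are not taken from SOCpilot or from the paper. The point is only that a deterministic verifier returns the same violations for the same trace on every run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    target: str


# Hypothetical policy: an allow-list, plus actions that need prior approval.
ALLOWED_ACTIONS = {"collect_evidence", "open_ticket", "request_approval", "isolate_host"}
REQUIRES_APPROVAL = {"isolate_host"}


def verify(actions):
    """Return a list of (index, reason) violations; an empty list means compliant."""
    violations = []
    approved_targets = set()
    for index, action in enumerate(actions):
        # Invariant 1: every action must be on the allow-list.
        if action.name not in ALLOWED_ACTIONS:
            violations.append((index, f"action not in policy: {action.name}"))
            continue
        if action.name == "request_approval":
            approved_targets.add(action.target)
        # Invariant 2: gated actions need an earlier approval for the same target.
        elif action.name in REQUIRES_APPROVAL and action.target not in approved_targets:
            violations.append(
                (index, f"{action.name} without prior approval for {action.target}")
            )
    return violations


# Hypothetical traces for the same incident.
human_trace = [
    Action("collect_evidence", "host-a"),
    Action("request_approval", "host-a"),
    Action("isolate_host", "host-a"),
]
llm_trace = [
    Action("collect_evidence", "host-a"),
    Action("isolate_host", "host-a"),
    Action("delete_logs", "host-a"),
]

print(verify(human_trace))
print(verify(llm_trace))
```

Running both traces through the same function is what allows a statement such as "non-compliant actions absent from the human baseline" to be checked mechanically rather than judged case by case.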
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight areas where additional detail will strengthen the manuscript, and we address each point below with proposed revisions.
- Referee: [Methodology (anonymization description)] The anonymization step (extract-anonymize-structure-validate) is load-bearing for the central claim that task-relevant structure is preserved, yet the manuscript provides no concrete mechanism for global entity-ID remapping that maintains cross-log references or for timestamp jittering that preserves relative ordering and causality; without these, the HIKARI challenge results and SOCpilot verifier outcomes could be artifacts of the pipeline rather than evidence of utility.
  Authors: We agree that the current description of anonymization is insufficiently concrete. In revision we will expand Section 3 with (a) the precise global entity remapping algorithm, including how cross-log references are maintained via consistent pseudonym assignment and collision avoidance, and (b) the timestamp jitter procedure with explicit bounds and ordering guarantees. These additions will make clear that the reported outcomes depend on preserved structure rather than incidental pipeline effects.
  Revision: yes
- Referee: [Evaluation / Results] The evaluation reports 37 HIKARI challenges and 200 SOCpilot incidents as concrete outcomes but supplies no error analysis, baseline comparisons against non-anonymized data, or explicit checks that the transformations did not introduce or destroy correlations; this undermines the assertion of a measurable privacy-utility boundary.
  Authors: We will add an error-analysis subsection detailing failure cases for both the HIKARI challenges and the SOCpilot verifier, plus statistical checks (e.g., correlation matrices on non-sensitive fields) confirming that transformations do not materially alter task-relevant distributions. Direct comparison against the original non-anonymized logs is not feasible; we will instead introduce a control condition using deliberately inconsistent entity IDs as a degraded baseline.
  Revision: partial
- Not addressed: direct baseline comparisons against the original non-anonymized production logs, which remain inaccessible due to privacy constraints.
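The control condition proposed in the rebuttal (deliberately inconsistent entity IDs as a degraded baseline) can be sketched as follows. This is a hedged illustration of the idea, not the authors' procedure: the helper names, the `entity-N` naming scheme, and the sample values are hypothetical. The check treats a remapping as consistent when it is one-to-one, so that two occurrences refer to the same pseudonym exactly when they referred to the same original entity.

```python
def is_consistent_mapping(original_values, anonymized_values):
    """True if the remapping is one-to-one across all occurrences."""
    if len(original_values) != len(anonymized_values):
        raise ValueError("value lists must have the same length")
    forward, backward = {}, {}
    for orig, anon in zip(original_values, anonymized_values):
        # The same original must always map to the same pseudonym...
        if forward.setdefault(orig, anon) != anon:
            return False
        # ...and two different originals must never share a pseudonym.
        if backward.setdefault(anon, orig) != orig:
            return False
    return True


def consistent_ids(values):
    """Assign one pseudonym per distinct entity (the intended condition)."""
    mapping = {}
    result = []
    for value in values:
        if value not in mapping:
            mapping[value] = f"entity-{len(mapping) + 1}"
        result.append(mapping[value])
    return result


def inconsistent_ids(values):
    """Assign a fresh pseudonym per occurrence (the degraded control)."""
    return [f"entity-{i}" for i, _ in enumerate(values, start=1)]


# Hypothetical sequence of user fields across a few events.
users = ["alice", "bob", "alice", "carol", "bob"]

preserved = consistent_ids(users)
degraded = inconsistent_ids(users)

print(is_consistent_mapping(users, preserved))
print(is_consistent_mapping(users, degraded))
```

If challenge success or verifier output does not change between the preserved and degraded conditions, entity consistency is not actually doing the work the paper attributes to it. That makes this control a useful substitute when the raw logs cannot serve as a baseline.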
Circularity Check
No circularity: claims rest on external validation tasks
full rationale
The paper presents a methodology for extracting and anonymizing SIEM data, validated by success/failure on independent external benchmarks (37 HIKARI challenges requiring preserved order/consistency, and 200 SOCpilot incidents with a deterministic verifier). No equations, self-definitional quantities, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The privacy-utility boundary is demonstrated via these external consumers rather than by construction from the method's own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Anonymization steps can be chosen such that temporal order and entity consistency are retained at a level usable for MITRE ATT&CK-mapped challenges and incident-response verification.
Reference graph
Works this paper leans on
- [1] The MITRE Corporation, “MITRE ATT&CK,” https://attack.mitre.org, 2026, accessed: 2026-06-14
- [2] I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in Int. Conf. on Information Systems Security and Privacy (ICISSP), 2018
- [3] N. Moustafa, J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems,” in Military Communications and Information Systems Conf. (MilCIS), 2015
- [4] S. García, M. Grill, J. Stiborek, A. Zunino, “An empirical comparison of botnet detection methods,” Computers & Security, vol. 45, 2014
- [5] A. D. Kent, “Comprehensive, multi-source cyber-security events data set,” Los Alamos National Laboratory, Tech. Rep., 2015
- [6] J. Glasser, B. Lindauer, “Bridging the gap: A pragmatic approach to generating insider threat data,” in IEEE Security and Privacy Workshops, 2013
- [7] Defense Advanced Research Projects Agency, “Transparent Computing,” Program documentation, 2026, accessed: 2026-06- [Online]. Available: https://www.darpa.mil/research/programs/transparent-computing
- [9] Splunk, “Boss of the SOC Dataset Version 2,” Dataset repository, 2017, accessed: 2026-06-17. [Online]. Available: https://github.com/splunk/botsv2
- [10] J. Zhu, S. He, J. Liu, P. He, Q. Xie, Z. Zheng, M. R. Lyu, “Tools and benchmarks for automated log parsing,” in Int. Conf. on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2019
- [11] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, 2002
- [12] S. Barbieri, Á. L. R. Ferraz, L. A. Pereira Júnior, “AutoSUT: The environment semantics gap in structured CTI for adversary emulation,” [Online]. Available: https://arxiv.org/abs/2606.08700
- [14] S. Barbieri, L. V. d. Meneses, Á. L. R. Ferraz, L. A. Pereira Júnior, “SOCpilot: Verifying policy compliance for LLM-assisted incident response,” 2026. [Online]. Available: https://arxiv.org/abs/2605.05501
- [15] J. Xu, J. Fan, M. H. Ammar, S. B. Moon, “Prefix-preserving IP address anonymization,” in IEEE Int. Conf. on Network Protocols (ICNP), 2002
- [16] S. Barbieri, Á. L. R. Ferraz, L. A. Pereira Júnior, “Hikari: A gamified cyber-range platform for defensive training,” 2026, lab artifact; paper in preparation
- [17] L. Yang, Z. Chen, C. Wang, Z. Zhang, S. Booma, P. Cao, C. Adam, A. Withers, Z. Kalbarczyk, R. K. Iyer, G. Wang, “True Attacks, Attack Attempts, or Benign Triggers? An Empirical Measurement of Network Alerts in a Security Operations Center,” in 33rd USENIX Security Symposium. USENIX Association, 2024, pp. 1525–1542. [Online]. Available: https://www.usenix....
- [18] B. A. AlAhmadi, L. Axon, I. Martinovic, “99% False Positives: A Qualitative Study of SOC Analysts’ Perspectives on Security Alarms,” in 31st USENIX Security Symposium. USENIX Association, 2022, pp. 2783–2800. [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presentation/alahmadi
- [19] M. Vermeer, M. van Eeten, C. Gañán, “Ruling the Rules: Quantifying the Evolution of Rulesets, Alerts and Incidents in Network Intrusion Detection,” in Proceedings of the ACM Asia Conference on Computer and Communications Security. Association for Computing Machinery, 2022, pp. 799–814
- [20] S. Barbieri, Á. L. R. Ferraz, L. A. Pereira Júnior, “TopVenues: A reproducible corpus and tooling substrate for cybersecurity literature reviews,” 2026. [Online]. Available: https://arxiv.org/abs/2606.18320
- [21] Center for Applied Internet Data Analysis, “CAIDA Anonymized Internet Traces Dataset,” Dataset documentation, 2020, accessed: 2026-06-17. [Online]. Available: https://www-old.caida.org/data/passive/passive_dataset.xml
- [22] MAWI Working Group, “MAWI Working Group Traffic Archive,” Dataset documentation, 2026, accessed: 2026-06-17. [Online]. Available: https://mawi.wide.ad.jp/mawi/
- [23] Y. Wang, Y. Wang, J. Liu, Z. Huang, “A Network Gene-Based Framework for Detecting Advanced Persistent Threats,” in 2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing. IEEE, 2014, pp. 47–54
- [24] Y. Ren, Y. Xiao, Y. Zhou, Z. Zhang, Z. Tian, “CSKG4APT: A Cybersecurity Knowledge Graph for Advanced Persistent Threat Organization Attribution,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 6, pp. 5695–5709, 2023
- [25] C. M. C. Viana, C. H. G. Ferreira, F. Murai, A. L. d. Santos, L. A. Pereira Júnior, “Devil in the Noise: Detecting Advanced Persistent Threats with Backbone Extraction,” in 2024 IEEE Symposium on Computers and Communications. IEEE, 2024
- [26] J. de Vries, H. Hoogstraaten, J. van den Berg, S. Daskapan, “Systems for Detecting Advanced Persistent Threats: A Development Roadmap Using Intelligent Data Analysis,” in 2012 International Conference on Cyber Security. IEEE, 2012, pp. 54–61
- [27] F. B. Kokulu, A. Soneji, T. Bao, Y. Shoshitaishvili, Z. Zhao, A. Doupé, G.-J. Ahn, “Matched and Mismatched SOCs: A Qualitative Study on Security Operations Center Issues,” in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. Association for Computing Machinery, 2019, pp. 1955–1970
- [28] D. Schlette, P. Empl, M. Caselli, T. Schreck, G. Pernul, “Do You Play It by the Books? A Study on Incident Response Playbooks and Influencing Factors,” in 2024 IEEE Symposium on Security and Privacy. IEEE, 2024, pp. 3625–3643
- [29] D. W. Woods, R. Böhme, J. Wolff, D. Schwarcz, “Lessons Lost: Incident Response in the Age of Cyber Insurance and Breach Attorneys,” in 32nd USENIX Security Symposium. USENIX Association, 2023, pp. 2259–2273. [Online]. Available: https://www.usenix.org/conference/usenixsecurity23/presentation/woods
- [30] G. Deng, Y. Liu, V. M. Vilches, P. Liu, Y. Li, Y. Xu, M. Pinzger, S. Rass, T. Zhang, Y. Liu, “PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing,” in 33rd USENIX Security Symposium. USENIX Association, 2024. [Online]. Available: https://www.usenix.org/conference/usenixsecurity24/presentation/deng
- [31] M. Bhatt, S. Chennabasappa, Y. Li, C. Nikolaidis, D. Song, S. Wan, F. Ahmad, C. Aschermann, Y. Chen, D. Kapil, D. Molnar, S. Whitman, J. Saxe, “CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models,” https://arxiv.org/abs/2404.13161, 2024, arXiv preprint
- [32] F. B. Schneider, “Enforceable Security Policies,” ACM Transactions on Information and System Security, vol. 3, no. 1, pp. 30–50, 2000
- [33] J. Ligatti, L. Bauer, D. Walker, “Edit Automata: Enforcement Mechanisms for Run-Time Security Policies,” International Journal of Information Security, vol. 4, no. 1, pp. 2–16, 2005
- [34] T. Shi, J. He, Z. Wang, H. Li, L. Wu, W. Guo, D. Song, “Progent: Programmable Privilege Control for LLM Agents,” https://arxiv.org/abs/2504.11703, 2025, arXiv preprint
- [35] H. Wang, C. M. Poskitt, J. Sun, “AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents,” https://arxiv.org/abs/2503.18666, 2025, arXiv preprint