Evolution of Log-Based Detection Rules in Public Repositories
Pith reviewed 2026-05-12 03:02 UTC · model grok-4.3
The pith
Detection rules in public repositories evolve non-monotonically, repeatedly adding and removing logical conditions rather than converging to stable forms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a predicate graph intermediate representation to canonicalize rule logic and a tree alignment procedure to compare versions, the authors examine 6,859 rule histories and determine that roughly 56 percent undergo detection-logic revisions. Evolution is predominantly non-monotonic, with over half of rules both adding and removing clauses over time, and recurring reversions are common. Combining structural metrics with LLM-assisted intent inference and human validation shows that a quarter to a third of rules oscillate between expanding coverage and reducing false positives rather than converging to stable forms.
What carries the argument
The predicate graph intermediate representation that canonicalizes the logical structure of a detection rule, together with a tree alignment procedure for quantifying changes across revisions.
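The paper's implementation is not reproduced here, but the canonicalization idea can be sketched minimally: assuming a rule is a boolean tree over (field, operator, value) predicates (my simplified data shape, not the paper's full IR), flattening nested same-operator nodes and sorting operands makes syntactically different but logically identical rules compare equal.

```python
def canonicalize(node):
    """Canonicalize a boolean expression tree given as nested tuples.

    Leaves are (field, operator, value) predicates; internal nodes are
    ("AND" | "OR", [children]). Flattens nested same-operator nodes and
    sorts operands so that logically identical rules yield identical
    canonical forms. Illustrative only: the paper's predicate graph IR
    is richer than this sketch.
    """
    if node[0] not in ("AND", "OR"):
        return node  # predicate leaf, already canonical
    op, children = node
    flat = []
    for child in map(canonicalize, children):
        if child[0] == op:       # flatten AND(AND(a, b), c) -> AND(a, b, c)
            flat.extend(child[1])
        else:
            flat.append(child)
    return (op, sorted(flat, key=repr))

# Two syntactically different but logically identical rules:
a = ("AND", [("EventID", "=", 1),
             ("AND", [("Image", "endswith", "cmd.exe")])])
b = ("AND", [("Image", "endswith", "cmd.exe"), ("EventID", "=", 1)])
assert canonicalize(a) == canonicalize(b)
```

Once rules are in a canonical form like this, a tree alignment (e.g. a tree edit distance) between consecutive versions counts genuine logic edits rather than formatting churn.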
If this is right
- Rule changes frequently revisit prior decisions instead of strictly accumulating improvements.
- A substantial share of rules continue oscillating between broader coverage and lower false-positive rates throughout their lifetimes.
- The same non-monotonic pattern appears in both community-driven and curated public repositories.
- Detection rule development reflects ongoing trade-offs rather than convergence to an optimal stable state.
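One way to make "non-monotonic" operational, under the simplifying assumption (mine, not the paper's exact definition) that each detection-logic revision is summarized by counts of clauses added and removed:

```python
def classify_history(revisions):
    """Classify a rule history as monotonic, non-monotonic, or unchanged.

    `revisions` is a list of (clauses_added, clauses_removed) counts, one
    pair per detection-logic revision. A history is non-monotonic if
    clauses are both added and removed at some point in its lifetime; a
    simplified reading of the paper's notion, for illustration.
    """
    added = any(a > 0 for a, _ in revisions)
    removed = any(r > 0 for _, r in revisions)
    if added and removed:
        return "non-monotonic"
    if added or removed:
        return "monotonic"
    return "unchanged"

assert classify_history([(2, 0), (0, 1)]) == "non-monotonic"
assert classify_history([(1, 0), (2, 0)]) == "monotonic"
```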
Where Pith is reading between the lines
- Tools that visualize reversion patterns could help analysts avoid repeating past adjustments.
- The oscillation may arise from real-world shifts in threat activity or alert volume that the paper does not directly measure.
- Automated systems could monitor rule histories to flag when a rule is likely to require a reversal of a recent change.
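A monitoring sketch along the lines of the last bullet, assuming (hypothetically; the paper does not publish such a tool) that each revision is summarized as sets of canonical clause fingerprints that were added and removed:

```python
def find_reversions(history):
    """Flag revisions that revert part of an earlier change.

    `history` is a list of (added, removed) pairs of frozensets of
    canonical clause fingerprints, one pair per revision. A revision
    is flagged if it removes a previously added clause or reintroduces
    a previously removed one. Hypothetical helper, not the paper's
    procedure.
    """
    seen_added, seen_removed = set(), set()
    reverting = []
    for i, (added, removed) in enumerate(history):
        if added & seen_removed or removed & seen_added:
            reverting.append(i)
        seen_added |= added
        seen_removed |= removed
    return reverting

h = [
    (frozenset({"proc=cmd.exe"}), frozenset()),   # revision 0: add clause
    (frozenset(), frozenset({"proc=cmd.exe"})),   # revision 1: remove it
    (frozenset({"proc=cmd.exe"}), frozenset()),   # revision 2: re-add it
]
assert find_reversions(h) == [1, 2]
```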
Load-bearing premise
The predicate graph and tree alignment procedure faithfully capture semantic changes in detection logic without introducing artifacts or losing critical operational distinctions.
What would settle it
A domain-expert manual audit of a random sample of rule histories that found the automated method systematically misclassified monotonic changes as non-monotonic, or missed equivalent logic expressed in different syntax.
Figures
read the original abstract
Log-based detection rules remain central to modern security operations, encoding domain expertise that analysts iteratively refine to balance detection coverage against alert volume. Yet while prior work has examined the evolution of network intrusion detection signatures, the longitudinal behavior of log-based detection rules has received little empirical study. We present the first longitudinal analysis of detection rule evolution across two widely used repositories: the community-driven Sigma project and the curated Splunk Security Content (SSC). To compare rule versions based on detection logic rather than surface syntax, we introduce a predicate graph intermediate representation that canonicalizes the logical structure of a rule, together with a tree alignment procedure for analyzing changes across revisions. We apply this method to 6,859 rule histories from Sigma and SSC and find that roughly 56% of rules undergo at least one revision on detection logic. Across rule lifetimes, evolution is predominantly non-monotonic, with over half of rules both adding and removing clauses over time. We further observe recurring reversions, indicating that changes are often revisited rather than strictly accumulated. Combining structural analysis with LLM-based inference and human validation of operational intent shows that roughly a quarter to a third of rules alternate between expanding coverage and reducing false positives, rather than converging toward a stable form. Together, these results reveal that detection rule evolution in public repositories reflects ongoing operational trade-offs rather than steady convergence. Our study raises questions about why rules change the way they do and supports research towards better processes for devising and deploying security rules.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to perform the first longitudinal empirical analysis of detection rule evolution in public log-based security rule repositories (Sigma and SSC). By developing a predicate graph intermediate representation and tree alignment method to track semantic changes in detection logic across revisions, the authors analyze 6,859 rule histories. Key findings include that roughly 56% of rules are revised at least once, evolution is predominantly non-monotonic with over half of rules both adding and removing clauses, recurring reversions occur, and approximately 25-33% of rules alternate between expanding coverage and reducing false positives instead of converging to a stable form. The work concludes that such evolution reflects ongoing operational trade-offs.
Significance. If the methodological components are validated, this would represent a significant contribution by providing the first data-driven view into the iterative refinement process of log-based detection rules, revealing non-convergent patterns that contrast with assumptions of steady improvement. The predicate graph approach for abstracting syntax is a positive methodological step that enables the analysis. The results could inform security practitioners and tool developers on managing rule complexity and change. However, the current lack of strong validation for the IR and alignment reduces the immediate impact, though the empirical pipeline is clearly outlined.
major comments (2)
- [§3 (Introduction of Predicate Graph IR and Tree Alignment)] All headline quantitative results (56% revision rate, non-monotonic evolution in >50% of rules, 25-33% alternation) depend on the predicate graph canonicalization and tree alignment correctly identifying meaningful semantic edits rather than syntactic variations. The manuscript provides internal consistency checks, LLM-assisted labeling, and limited human spot-checks but lacks a rigorous blinded expert validation on a held-out sample to establish the fidelity of the procedure, such as precision/recall against ground-truth semantic changes. This is load-bearing for the central claims about operational trade-offs.
- [§5 (Results - Intent Classification)] The classification of rule changes as alternating between coverage expansion and false-positive reduction is based on LLM inference of intent with human validation. No quantitative details on the strength of this validation (e.g., agreement rates, number of checked samples, or disagreement resolution) are reported, which is necessary to support the specific fraction of rules exhibiting this behavior.
minor comments (2)
- [Abstract and Data Description] The total of 6,859 rule histories is presented without a breakdown by repository (Sigma vs. SSC), time period, or explicit filtering criteria applied to select histories, making it difficult to assess potential selection biases.
- [Throughout] The paper would benefit from more precise reporting of exact counts and percentages alongside the 'roughly' and 'over half' qualifiers to allow readers to better gauge the effect sizes.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. The comments highlight important opportunities to strengthen the validation of our methodological components. We address each major comment below and will revise the manuscript accordingly to improve transparency and rigor.
read point-by-point responses
-
Referee: [§3 (Introduction of Predicate Graph IR and Tree Alignment)] All headline quantitative results (56% revision rate, non-monotonic evolution in >50% of rules, 25-33% alternation) depend on the predicate graph canonicalization and tree alignment correctly identifying meaningful semantic edits rather than syntactic variations. The manuscript provides internal consistency checks, LLM-assisted labeling, and limited human spot-checks but lacks a rigorous blinded expert validation on a held-out sample to establish the fidelity of the procedure, such as precision/recall against ground-truth semantic changes. This is load-bearing for the central claims about operational trade-offs.
Authors: We agree that the fidelity of the predicate graph IR and tree alignment is critical, as our quantitative findings on revision rates, non-monotonic evolution, and alternation patterns depend on it distinguishing semantic edits from syntactic noise. The manuscript reports internal consistency checks, LLM-assisted labeling, and limited human spot-checks, but we acknowledge that these fall short of a rigorous blinded expert validation on a held-out sample with precision/recall metrics. In the revised manuscript, we will add a blinded validation study: experts will independently label ground-truth semantic changes on a held-out sample of rule revisions, enabling computation and reporting of precision and recall for our canonicalization and alignment procedure. This will directly address the load-bearing nature of the method for claims about operational trade-offs. revision: yes
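The promised validation reduces to a standard precision/recall computation, assuming (my data shape, for illustration) that the detector's flagged revisions and the experts' blinded ground-truth labels are available as sets of revision IDs:

```python
def precision_recall(predicted, ground_truth):
    """Precision and recall of the automated change detector against
    blinded expert labels.

    Both arguments are sets of revision IDs judged to contain a genuine
    semantic change. Sketch of the validation the rebuttal promises;
    names and data shapes are assumed.
    """
    tp = len(predicted & ground_truth)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall

pred = {"r1", "r2", "r3", "r4"}   # detector says: semantic change
truth = {"r2", "r3", "r5"}        # experts say: semantic change
p, r = precision_recall(pred, truth)
assert (p, r) == (0.5, 2 / 3)
```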
-
Referee: [§5 (Results - Intent Classification)] The classification of rule changes as alternating between coverage expansion and false-positive reduction is based on LLM inference of intent with human validation. No quantitative details on the strength of this validation (e.g., agreement rates, number of checked samples, or disagreement resolution) are reported, which is necessary to support the specific fraction of rules exhibiting this behavior.
Authors: We thank the referee for noting this gap in reporting. The intent classification combines LLM inference with human validation, but the manuscript does not provide quantitative details on validation strength. In the revised version, we will expand the relevant section to report the number of samples subjected to human validation, inter-annotator agreement rates (e.g., Cohen's kappa or percentage agreement), and the process for resolving disagreements. These additions will offer the necessary transparency and support the reported fraction of rules alternating between coverage expansion and false-positive reduction. revision: yes
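For the inter-annotator agreement rates the response commits to reporting, Cohen's kappa over two annotators' intent labels follows the standard formula kappa = (p_o - p_e) / (1 - p_e); the labels below are illustrative, not from the paper:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    p_o is observed agreement; p_e is chance agreement from each
    annotator's marginal label distribution.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Illustrative intent labels: coverage expansion vs false-positive reduction
a = ["cov", "cov", "fp", "fp", "cov", "fp"]
b = ["cov", "fp", "fp", "fp", "cov", "cov"]
assert abs(cohens_kappa(a, b) - 1 / 3) < 1e-9
```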
Circularity Check
No circularity: results are direct empirical counts from repository data
full rationale
The paper collects 6,859 rule histories from Sigma and SSC repositories, converts each to a predicate graph IR, applies tree alignment to detect changes, and reports aggregate statistics (56% revised, >50% non-monotonic, 25-33% alternating) plus LLM-assisted intent labels. These quantities are straightforward observational tallies and classifications; none are obtained by fitting parameters on a data subset and then treating a related quantity as a 'prediction,' nor are any results defined in terms of themselves. The IR and alignment procedure are introduced as an analysis tool rather than a derived claim whose correctness is presupposed by the output statistics. No load-bearing self-citations, uniqueness theorems, or ansatzes from prior author work appear in the derivation chain. The analysis is therefore self-contained against external benchmarks and receives a score of 0.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The predicate graph canonicalizes the logical structure of rules sufficiently to enable meaningful comparison of detection intent across revisions
invented entities (1)
-
predicate graph intermediate representation
no independent evidence
Reference graph
Works this paper leans on
-
[1]
99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms
Bushra A Alahmadi, Louise Axon, and Ivan Martinovic. 99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms. In USENIX Security Symposium, 2022
work page 2022
-
[2]
SoK: Pragmatic assessment of machine learning for network intrusion detection
Giovanni Apruzzese, Pavel Laskov, and Johannes Schneider. SoK: Pragmatic assessment of machine learning for network intrusion detection. In IEEE European Symposium on Security and Privacy, 2023
work page 2023
-
[3]
DeepLog: Anomaly detection and diagnosis from system logs through deep learning
Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In ACM SIGSAC Conference on Computer and Communications Security, 2017
work page 2017
-
[4]
Fine-grained and accurate source code differencing
Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. Fine-grained and accurate source code differencing. In IEEE/ACM International Conference on Automated Software Engineering, 2014
work page 2014
-
[5]
Automatic generation of HTTP intrusion signatures by selective identification of anomalies
Pedro Garcia-Teodoro, JE Diaz-Verdejo, Juan E Tapiador, and Rolando Salazar-Hernández. Automatic generation of HTTP intrusion signatures by selective identification of anomalies. Computers & Security, 55:159–174, 2015
work page 2015
-
[6]
Characterizing the modification space of signature IDS rules
Ryan Guide, Eric Pauley, Yohan Beugin, Ryan Sheatsley, and Patrick McDaniel. Characterizing the modification space of signature IDS rules. In IEEE Military Communications Conference, 2023
work page 2023
-
[7]
A novel hybrid-based approach of snort automatic rule generator and security event correlation (SARG-SEC)
Ebrima Jaw and Xueming Wang. A novel hybrid-based approach of snort automatic rule generator and security event correlation (SARG-SEC). PeerJ Computer Science, 2022
work page 2022
-
[8]
Mining intrusion detection alarms for actionable knowledge
Klaus Julisch and Marc Dacier. Mining intrusion detection alarms for actionable knowledge. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002
work page 2002
-
[9]
LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems
Gyuwan Kim, Hayoon Yi, Jangho Lee, Yunheung Paek, and Sungroh Yoon. LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems. arXiv:1611.01726, 2016
-
[10]
Toward context-aware alert classification in security operations centers using LLMs
Jonathan Roy and Elhadj Abdourahmane Balde. Toward context-aware alert classification in security operations centers using LLMs. In IEEE International Conference on AI in Cybersecurity, 2026
work page 2026
-
[11]
Revolutionizing SIEM security: An innovative correlation engine design for multi-layered attack detection
Muhammad Sheeraz, Muhammad Hanif Durad, Muhammad Arsalan Paracha, Syed Muhammad Mohsin, Sadia Nishat Kazmi, and Carsten Maple. Revolutionizing SIEM security: An innovative correlation engine design for multi-layered attack detection. Sensors, 24(15):4901, 2024
work page 2024
-
[12]
RuleGenie: SIEM detection rule set optimization
Akansha Shukla, Parth Atulbhai Gandhi, Yuval Elovici, and Asaf Shabtai. RuleGenie: SIEM detection rule set optimization. arXiv preprint arXiv:2505.06701, 2025
-
[13]
Sigma-CLI: Sigma command line interface
SigmaHQ. Sigma-CLI: Sigma command line interface. https://github.com/SigmaHQ/sigma-cli, 2026
work page 2026
-
[14]
Ruling the unruly: Designing effective, low-noise network intrusion detection rules for security operations centers
Koen TW Teuwen, Tom Mulders, Emmanuele Zambon, and Luca Allodi. Ruling the unruly: Designing effective, low-noise network intrusion detection rules for security operations centers. In ACM Asia Conference on Computer and Communications Security, 2025
work page 2025
-
[15]
You cannot escape me: Detecting evasions of SIEM rules in enterprise networks
Rafael Uetz, Marco Herzog, Louis Hackländer, Simon Schwarz, and Martin Henze. You cannot escape me: Detecting evasions of SIEM rules in enterprise networks. In USENIX Security Symposium, 2024
work page 2024
-
[16]
DeepCASE: Semi-supervised contextual analysis of security events
Thijs Van Ede, Hojjat Aghakhani, Noah Spahn, Riccardo Bortolameotti, Marco Cova, Andrea Continella, Maarten Van Steen, Andreas Peter, Christopher Kruegel, and Giovanni Vigna. DeepCASE: Semi-supervised contextual analysis of security events. In IEEE Symposium on Security and Privacy, 2022
work page 2022
-
[17]
Ruling the rules: Quantifying the evolution of rulesets, alerts and incidents in network intrusion detection
Mathew Vermeer, Michel van Eeten, and Carlos Gañán. Ruling the rules: Quantifying the evolution of rulesets, alerts and incidents in network intrusion detection. In ACM Asia Conference on Computer and Communications Security, 2022
work page 2022
-
[18]
Detecting intrusions using system calls: Alternative data models
Christina Warrender, Stephanie Forrest, and Barak Pearlmutter. Detecting intrusions using system calls: Alternative data models. In IEEE Symposium on Security and Privacy, 1999
work page 1999
-
[19]
An architecture for generating semantic-aware signatures
Vinod Yegneswaran, Paul Barford, and Johannes Ullrich. An architecture for generating semantic-aware signatures. In USENIX Security Symposium, 2005
work page 2005
-
[20]
Simple fast algorithms for the editing distance between trees and related problems
Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245–1262, 1989
work page 1989
discussion (0)