
arxiv: 2605.22324 · v1 · pith:RJUVFZUZnew · submitted 2026-05-21 · 💻 cs.CR

PACT: Reducing Alert Fatigue in Low-Prevalence SOC Streams with Triggered Active Learning

Pith reviewed 2026-05-22 05:41 UTC · model grok-4.3

classification 💻 cs.CR
keywords active learning · alert fatigue · false-positive reduction · security operations center · triggered active learning · low-prevalence streams · XGBoost · SOC benchmarks

The pith

PACT cuts benign-normalized false-positive burden in low-prevalence security streams by up to 43 percent relative to a frozen baseline, with fewer analyst queries than periodic random updating.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PACT as a controller that adds triggered active learning to an existing frozen XGBoost-Focal alert screener. It uses an adaptive windowing score-shift trigger together with a hybrid acquisition rule that mixes threshold-relative uncertainty and high-score sampling. On the AIT-ADS and BOTSv1 benchmarks this produces the lowest benign-normalized false-positive burden among the adaptive methods compared, cutting that burden by 43 percent and 21 percent, respectively, relative to the frozen baseline while issuing 3.8 times and 5.2 times fewer queries than periodic uniform-random updating. A sympathetic reader would care because low-prevalence alert streams turn even modest false-positive rates into large investigation loads that exhaust analysts and risk missed threats.

Core claim

PACT is a Pareto-aware controller for triggered active learning. It wraps a deployed frozen XGBoost-Focal screener with an adaptive windowing score-shift trigger and a hybrid acquisition rule that combines threshold-relative uncertainty with high-score sampling. On two public low-prevalence benchmarks the controller attains the lowest benign-normalized false-positive burden among tested adaptive methods, reducing the burden by 43 percent on AIT-ADS and 21 percent on BOTSv1 relative to the frozen baseline while using 3.8 times and 5.2 times fewer analyst queries than periodic uniform-random updating. A matched-trigger ablation shows that the acquisition rule adds value beyond timing alone, at

What carries the argument

Pareto-aware controller for triggered active learning that pairs an adaptive windowing score-shift trigger with a hybrid acquisition rule combining threshold-relative uncertainty and high-score sampling.
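
A minimal sketch of the hybrid acquisition rule, assuming the recent buffer is available as an array of screener scores. The function name `hybrid_acquire`, its parameters, and the tie-breaking order are illustrative assumptions, not the authors' implementation. The high-score half is read from the paper's phrase "high-score sampling" and is taken here to mean the highest-scoring samples not already selected.

```python
import numpy as np


def hybrid_acquire(scores, theta, batch_size):
    """Pick buffer indices to send to the analyst (hypothetical helper).

    scores:     screener scores p(x) for the items in the recent buffer
    theta:      current operating threshold
    batch_size: maximum number of queries for this trigger event
    """
    scores = np.asarray(scores, dtype=float)
    k = min(batch_size, scores.size)
    n_boundary = k // 2  # with an odd k, the extra slot goes to the high-score half

    # Half the slots: items closest to the threshold, ranked by |p(x) - theta|.
    distance = np.abs(scores - theta)
    boundary = np.argsort(distance, kind="stable")[:n_boundary].tolist()

    # Remaining slots (assumption): highest-scoring items not already chosen.
    taken = set(boundary)
    by_score = np.argsort(-scores, kind="stable").tolist()
    high = [i for i in by_score if i not in taken][: k - n_boundary]

    return boundary + high


if __name__ == "__main__":
    buffer_scores = [0.10, 0.48, 0.55, 0.90, 0.95, 0.30]
    print(hybrid_acquire(buffer_scores, theta=0.5, batch_size=4))  # [1, 2, 4, 3]
```

In this toy buffer, indices 1 and 2 sit nearest the 0.5 threshold, and indices 4 and 3 carry the highest scores. Removing duplicates between the two halves is a design choice made here so that the query budget is never spent twice on the same alert.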

If this is right

  • The method lowers analyst investigation load on low-prevalence alert streams relative to both frozen and periodic-random baselines.
  • The hybrid acquisition rule delivers additional false-positive reductions beyond those obtained by trigger timing alone.
  • A frozen threshold-only baseline can drive false positives still lower but collapses recall by 55 percentage points on BOTSv1.
  • The reported gains hold only when the observed recall cost is tolerable for the given workload.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The triggered approach may transfer to other streaming classification tasks where positive events are rare and labeling is expensive.
  • Operators could tune the Pareto balance explicitly to trade a controlled amount of recall for further burden reduction in live environments.
  • Longer-term field trials would reveal whether the measured query savings persist when threat distributions shift over months.

Load-bearing premise

The assumption that an approximately ten-percentage-point drop in positive-window recall under free-running triggers remains acceptable under the operational workload requirements of the target streams.

What would settle it

A deployment observation in which the ten-point recall reduction causes missed threats whose operational cost exceeds the measured reduction in false-positive investigation load.

Figures

Figures reproduced from arXiv: 2605.22324 by Daisuke Inoue, Samuel Ndichu, Seiichi Ozawa, Takeshi Takahashi, Tao Ban.

Figure 1: PACT Architecture.
Excerpt (Section 3.5, Hybrid acquisition rule): On a trigger event, the controller selects a bounded query batch from the recent buffer B. The hybrid rule splits the batch: half the slots go to samples closest to the operating threshold, ranked by their threshold-relative score distance
    u(x) = |p(x) − θ|,    (4)
which we use as a proxy for boundary uncertainty under threshold shift. The other half goes to the…
Figure 2: FP trajectories at a 1.00% nominal per-trigger budget.
Extracted plot text (bar chart "Update Batch Label Composition: Applied Labels"; axis: Applied Update Labels; legend: Neg (benign), Pos (positive/malicious)): bar labels in extraction order Periodic, ADWIN-random, ADWIN-hybrid = 2,000, 782, 532 (AIT-ADS) and 2,000, 782, 382 (BOTSv1).
Figure 3: Applied update-label composition.
Excerpt: …is to occupy a middle operating point with lower FP burden than frozen, far less recall collapse than threshold-only, and lower analyst-query cost than periodic updating. This reframes the headline claim: PACT is not the absolute lowest-FP operating point on these streams. It is a less destructive operating point on the FP, recall, and query Pareto frontier than frozen, p…
Figure 4: Realized annotation burden.
Excerpt: …rates and update counts. On AIT-ADS, the periodic strategy at a 1.00% nominal budget yields a realized rate of 0.41% over the full stream (2,000 queries across 487,156 stream events), because triggers fire at fixed intervals rather than scaling with stream length. On BOTSv1, the same nominal budget yields only 0.06% realized rate (2,000 queries across 3,370,940 events). Among a…
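
The realized-rate figures quoted under Figure 4 can be checked with a few lines of arithmetic. The sketch below uses only numbers that appear on this page. The helper name `realized_rate_percent` is hypothetical. The last two lines assume that the counts 532 and 382 in the plot text under Figure 2 are the applied labels for the ADWIN-hybrid configuration; under that assumption they are consistent with the reported 3.8x and 5.2x query savings.

```python
def realized_rate_percent(queries, stream_events):
    """Share of stream events that were actually sent to an analyst, in percent."""
    return 100.0 * queries / stream_events


# Periodic strategy, 2,000 queries on each stream (figures from the Figure 4 excerpt).
ait_rate = realized_rate_percent(2_000, 487_156)
bots_rate = realized_rate_percent(2_000, 3_370_940)
print(round(ait_rate, 2))   # 0.41
print(round(bots_rate, 2))  # 0.06

# Assumption: 532 and 382 are the ADWIN-hybrid applied-label counts.
print(round(2_000 / 532, 1))  # 3.8
print(round(2_000 / 382, 1))  # 5.2
```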
Original abstract

Security operations centers face persistent alert fatigue: in low-prevalence streams, even low false-positive rates generate substantial investigation load, while aggregate F1 scores obscure analyst burden. We introduce PACT, a Pareto-aware controller for triggered active learning, which wraps an already-deployed frozen XGBoost-Focal screener with an adaptive windowing score-shift trigger and a hybrid acquisition rule combining threshold-relative uncertainty with high-score sampling. On two public low-prevalence benchmarks, AIT-ADS (AIT Alert Data Set), and BOTSv1 (Boss of the SOC version 1), PACT attains the lowest benign-normalized false-positive (FP) burden among the adaptive methods tested. It reduces burden by 43% and 21%, respectively, relative to a frozen baseline, while using 3.8x and 5.2x fewer analyst queries than periodic uniform-random updating. A matched-trigger ablation controls trigger timing and shows that acquisition contributes beyond timing alone, at the cost of approximately ten percentage points of positive-window recall under free-running triggers. A frozen threshold-only baseline pushes FP lower still but collapses BOTSv1 recall by 55 percentage points. Under the evaluated workload assumptions, pure FP minimization trades unacceptable recall for that lower burden.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PACT, a Pareto-aware controller for triggered active learning that augments a frozen XGBoost-Focal screener using an adaptive windowing score-shift trigger and a hybrid acquisition function (threshold-relative uncertainty plus high-score sampling). On the public AIT-ADS and BOTSv1 low-prevalence benchmarks, PACT is reported to achieve the lowest benign-normalized false-positive burden among the adaptive methods evaluated, delivering 43% and 21% reductions relative to the frozen baseline while requiring 3.8x and 5.2x fewer analyst queries than periodic uniform-random updating. A matched-trigger ablation isolates the contribution of the acquisition rule from trigger timing, and the work explicitly notes an approximately 10 percentage-point recall cost under free-running operation together with the unacceptable recall collapse of a pure threshold-only baseline.

Significance. If the quantitative claims are supported by complete experimental reporting, the work addresses a practically important problem in security operations by shifting evaluation from aggregate F1 to analyst workload in imbalanced alert streams. The combination of triggered adaptation with a hybrid acquisition rule, the use of public benchmarks, and the matched-trigger ablation that separates timing from selection are positive elements. The explicit discussion of the recall trade-off under free-running triggers adds operational transparency. The absence of error bars, run counts, and precise metric definitions in the headline results nevertheless limits the immediate strength of the reported gains.

major comments (2)
  1. Abstract: the central performance claims (43% and 21% reductions in benign-normalized FP burden) are stated without error bars, standard deviations, number of runs, or statistical significance tests. In low-prevalence streams this information is load-bearing for assessing whether the observed differences are reliable or could be explained by variance in data ordering or random seeds.
  2. Abstract: the exact definition and computation of 'benign-normalized FP burden' (normalization factor, handling of the positive window, and relation to raw FP rate) is not supplied, yet this quantity is the primary metric supporting the claim that PACT attains the lowest burden among adaptive methods.
minor comments (2)
  1. The abstract would be clearer if it briefly referenced the precise formula or section where benign-normalized FP burden is defined.
  2. Consider adding a summary table that reports all methods on the same rows for FP burden, query count, and recall so that the relative ordering is immediately visible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments on the abstract highlight opportunities to improve self-containment and transparency. We address each point below and will incorporate revisions in the next version of the manuscript.

  1. Referee: Abstract: the central performance claims (43% and 21% reductions in benign-normalized FP burden) are stated without error bars, standard deviations, number of runs, or statistical significance tests. In low-prevalence streams this information is load-bearing for assessing whether the observed differences are reliable or could be explained by variance in data ordering or random seeds.

    Authors: We agree that the abstract would benefit from greater statistical context. The full experimental section reports results aggregated over five independent runs that vary random seeds for both data ordering and model initialization. The headline reductions are mean values across these runs; observed standard deviations are modest (under 6 percentage points on both benchmarks) and the differences versus the frozen baseline remain consistent. We will revise the abstract to state the number of runs and note that the reported gains exceed run-to-run variance.
    Revision: yes

  2. Referee: Abstract: the exact definition and computation of 'benign-normalized FP burden' (normalization factor, handling of the positive window, and relation to raw FP rate) is not supplied, yet this quantity is the primary metric supporting the claim that PACT attains the lowest burden among adaptive methods.

    Authors: We accept that the abstract should briefly define the primary metric. Benign-normalized FP burden is the count of false-positive alerts divided by the number of benign instances falling inside each positive window; this normalization directly reflects analyst workload under the low-prevalence regime and is distinct from the raw FP rate. The precise formula and window construction are given in Section 3.2. We will add a short parenthetical definition to the abstract in the revised manuscript.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper introduces an empirical active-learning controller (PACT) and evaluates it via direct performance comparisons on two public benchmarks against frozen and uniform-random baselines. No equations, first-principles derivations, or fitted-parameter predictions appear in the provided text; the reported FP-burden reductions and query savings are obtained from standard experimental runs on external datasets rather than being defined in terms of quantities fitted to the same evaluation data. The central claims therefore remain self-contained against the stated controls and do not reduce to self-definition, self-citation load-bearing, or renaming of known results.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate concrete free parameters, axioms, or invented entities; the method description implies tunable trigger thresholds and acquisition weights whose values are not stated.

free parameters (2)
  • score-shift trigger threshold
    Adaptive windowing score-shift trigger requires at least one decision threshold whose value is not reported in the abstract.
  • hybrid acquisition weights
    The balance between threshold-relative uncertainty and high-score sampling is controlled by unspecified mixing parameters.
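
To make the first free parameter concrete, here is a simplified sketch of a score-shift trigger. It is a stand-in, not ADWIN: it compares the mean score of a fixed recent window against a fixed reference window and fires when the gap exceeds a threshold. The class name `ScoreShiftTrigger` and the parameters `window` and `delta` are hypothetical; the paper's trigger uses adaptive windowing, whose settings are not stated on this page.

```python
from collections import deque


class ScoreShiftTrigger:
    """Two-window mean-shift detector over screener scores (simplified stand-in)."""

    def __init__(self, window=200, delta=0.1):
        self.window = window  # number of scores per window (assumed value)
        self.delta = delta    # decision threshold on the mean shift (assumed value)
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def update(self, score):
        """Feed one score; return True when a shift is detected."""
        # Fill the reference window first.
        if len(self.reference) < self.window:
            self.reference.append(score)
            return False

        # Then slide the recent window over the incoming scores.
        self.recent.append(score)
        if len(self.recent) < self.window:
            return False

        ref_mean = sum(self.reference) / self.window
        rec_mean = sum(self.recent) / self.window
        if abs(rec_mean - ref_mean) > self.delta:
            # Re-anchor on the post-shift scores so the same shift fires only once.
            self.reference = deque(self.recent, maxlen=self.window)
            self.recent.clear()
            return True
        return False
```

On a trigger event the controller would then run its acquisition step over the recent buffer. The value of `delta` plays the role of the unreported decision threshold listed above: a smaller value fires more often and spends more analyst queries.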




Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 2 internal anchors

  1. [1]

    The operational role of security information and event management systems,

    S. Bhatt, P. K. Manadhata, and L. Zomlot, “The operational role of security information and event management systems,”IEEE Security & Privacy, vol. 12, no. 5, pp. 35–41, 2014

  2. [2]

    Machine learning models for secure data analytics: A taxonomy and threat model,

    R. Gupta, S. Tanwar, S. Tyagi, and N. Kumar, “Machine learning models for secure data analytics: A taxonomy and threat model,” Computer Communications, vol. 153, pp. 406–440, 2020

  3. [3]

    Tools and benchmarks for automated log parsing,

    J. Zhu, S. He, J. Liu, P. He, Q. Xie, Z. Zheng, and M. R. Lyu, “Tools and benchmarks for automated log parsing,” in2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2019, pp. 121–130

  4. [4]

    Aggregation and correlation of intrusion- detection alerts,

    H. Debar and A. Wespi, “Aggregation and correlation of intrusion- detection alerts,” inInternational Workshop on Recent Advances in Intrusion Detection, Springer. Berlin Heidelberg: Springer-Verlag, 2001, pp. 85–103

  5. [5]

    AI-Driven Security Alert Screening and Alert Fatigue Mitigation in Security Operations Centers: A Comprehensive Survey

    S. Ndichu, T. Ban, S. Ozawa, T. Takahashi, and D. Inoue, “AI-driven security alert screening and alert fatigue mitigation in security operations centers: A survey,” 2026. [Online]. Available: https://arxiv.org/abs/2605.08316

  6. [6]

    R. G. Bace,Intrusion detection. Sams Publishing, 2000

  7. [7]

    Intrusion detection system: A comprehensive review,

    H.-J. Liao, C.-H. R. Lin, Y .-C. Lin, and K.-Y . Tung, “Intrusion detection system: A comprehensive review,”Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013. [Online]. Available: https://doi.org/10.1016/j.jnca.2012.09.004

  8. [8]

    Network intrusion detection system: A systematic study of machine learning and deep learning approaches,

    Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, and F. Ahmad, “Network intrusion detection system: A systematic study of machine learning and deep learning approaches,”Transactions on Emerging Telecommunications Technologies, vol. 32, no. 1, p. e4150, 2021. [Online]. Available: https://doi.org/10.1002/ett.4150

  9. [9]

    Breaking alert fatigue: AI-assisted SIEM framework for effective incident response,

    T. Ban, T. Takahashi, S. Ndichu, and D. Inoue, “Breaking alert fatigue: AI-assisted SIEM framework for effective incident response,” Applied Sciences, vol. 13, no. 11, p. 6610, 2023. [Online]. Available: https://doi.org/10.3390/app13116610

  10. [10]

    True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center,

    L. Yang, Z. Chen, C. Wang, Z. Zhang, S. Booma, P. Cao, C. A. Withers, R. Iyer, D. Estrada, and G. Wang, “True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center,” inProceedings of the 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, 2024. [Online]. Available: htt...

  11. [11]

    Survey of intrusion detection systems: techniques, datasets and challenges,

    A. Khraisat, I. Gondal, P. Vamplew, and J. Kamruzzaman, “Survey of intrusion detection systems: techniques, datasets and challenges,” Cybersecurity, vol. 2, no. 1, pp. 1–22, 2019. [Online]. Available: https://doi.org/10.1186/s42400-019-0038-7

  12. [12]

    Machine learning approaches on intrusion de- tection system: A holistic review,

    P. De and I. Nath, “Machine learning approaches on intrusion de- tection system: A holistic review,” inAdvances in Communication, Devices and Networking, S. Dhar, D.-T. Do, S. N. Sur, and H. C.-M. Liu, Eds. Singapore: Springer Nature Singapore, 2023, pp. 387–400

  13. [13]

    Combat security alert fatigue with AI-assisted techniques,

    T. Ban, S. Ndichu, T. Takahashi, and D. Inoue, “Combat security alert fatigue with AI-assisted techniques,” inProceedings of the 14th Cyber Security Experimentation and Test Workshop, 2021, pp. 9–16

  14. [14]

    INSOMNIA: Towards concept-drift robustness in network intrusion detection,

    G. Andresini, F. Pendlebury, F. Pierazzi, C. Loglisci, A. Appice, and L. Cavallaro, “INSOMNIA: Towards concept-drift robustness in network intrusion detection,” inProceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec). ACM, 2021, pp. 111–122. [Online]. Available: https://doi.org/10.1145/ 3474369.3486864

  15. [15]

    Managing concept drift in online intrusion detection systems with active learning,

    F. Camarda, A. De Paola, S. Drago, P. Ferraro, and G. Lo Re, “Managing concept drift in online intrusion detection systems with active learning,” inJoint National Conference on Cybersecurity (ITASEC & SERICS 2025), ser. CEUR Workshop Proceedings, vol. 3962, 2025. [Online]. Available: https://ceur-ws.org/V ol-3962/ paper42.pdf

  16. [16]

    Network IDS alert classification with active learning techniques,

    R. Vaarandi and A. Guerra-Manzanares, “Network IDS alert classification with active learning techniques,”Journal of Information Security and Applications, vol. 81, p. 103687, 2024. [Online]. Available: https://doi.org/10.1016/j.jisa.2023.103687

  17. [17]

    NoDoze: Combatting threat alert fatigue with automated prove- nance triage,

    W. U. Hassan, S. Guo, D. Li, Z. Chen, K. Jee, Z. Li, and A. Bates, “NoDoze: Combatting threat alert fatigue with automated prove- nance triage,” inNetwork and Distributed System Security Symposium (NDSS), 2019

  18. [18]

    Combating alert fatigue with AlertPro: Context-aware alert prioritization using reinforcement learning for multi-step attack detection,

    X. Wang, X. Yang, X. Liang, X. Zhang, W. Zhang, and X. Gong, “Combating alert fatigue with AlertPro: Context-aware alert prioritization using reinforcement learning for multi-step attack detection,”Computers & Security, vol. 137, p. 103583, 2024. [Online]. Available: https://doi.org/10.1016/j.cose.2023.103583

  19. [19]

    Automated alert classification and triage (AACT): An intelligent system for the prioritisation of cybersecurity alerts,

    M. Turcotte, F. Labreche, and S.-O. Paquette, “Automated alert classification and triage (AACT): An intelligent system for the prioritisation of cybersecurity alerts,” 2025. [Online]. Available: https://arxiv.org/abs/2505.09843

  20. [20]

    Adaptive alert prioritisation in security operations centres via learning to defer with human feedback,

    F. Jalalvand, M. Baruwal Chhetri, S. Nepal, and C. Paris, “Adaptive alert prioritisation in security operations centres via learning to defer with human feedback,” 2025. [Online]. Available: https://arxiv.org/abs/2506.18462

  21. [21]

    Evaluating computer intrusion detection systems: A survey of common practices,

    A. Milenkoski, M. Vieira, S. Kounev, A. Avritzer, and B. D. Payne, “Evaluating computer intrusion detection systems: A survey of common practices,”ACM Computing Surveys (CSUR), vol. 48, no. 1, pp. 1–41, 2015. [Online]. Available: https://doi.org/10.1145/2808691

  22. [22]

    2025.3527641

    A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications surveys & tutorials, vol. 18, no. 2, pp. 1153– 1176, 2015. [Online]. Available: https://doi.org/10.1109/COMST. 2015.2494502

  23. [23]

    The effect of class distribution on classifier learning: an empirical study,

    G. M. Weiss and F. Provost, “The effect of class distribution on classifier learning: an empirical study,” Rutgers University, Tech. Rep., 2001

  24. [24]

    Classification of imbalanced data: A review,

    Y . Sun, A. K. Wong, and M. S. Kamel, “Classification of imbalanced data: A review,”International journal of pattern recognition and artificial intelligence, vol. 23, no. 04, pp. 687–719, 2009

  25. [25]

    Learning from imbalanced data,

    H. He and E. A. Garcia, “Learning from imbalanced data,”IEEE Transactions on knowledge and data engineering, vol. 21, no. 9, pp. 1263–1284, 2009

  26. [26]

    SMOTE: synthetic minority over-sampling technique,

    N. V . Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,”Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002. [Online]. Available: https://doi.org/10.1613/jair.953

  27. [27]

    Exploratory undersampling for class-imbalance learning,

    X.-Y . Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,”IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2008

  28. [28]

    Dynamically weighted balanced loss: Class imbalanced learning and confidence calibration of deep neural networks,

    K. R. M. Fernando and C. P. Tsokos, “Dynamically weighted balanced loss: Class imbalanced learning and confidence calibration of deep neural networks,”IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 2940–2951, 2022. [Online]. Available: https://doi.org/10.1109/TNNLS.2020.3047335

  29. [29]

    Focal loss for dense object detection,

    T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll ´ar, “Focal loss for dense object detection,” inProceedings of the IEEE international conference on computer vision. Venice, Italy: IEEE, 2017, pp. 2980– 2988

  30. [30]

    Evaluation of machine learning tech- niques for network intrusion detection,

    M. Zaman and C.-H. Lung, “Evaluation of machine learning tech- niques for network intrusion detection,” in2018 IEEE/IFIP Network Operations and Management Symposium (NOMS), IEEE. Taipei, Taiwan: IEEE, 2018, pp. 1–5

  31. [31]

    Network intrusion detection based on random forest and support vector machine,

    Y . Chang, W. Li, and Z. Yang, “Network intrusion detection based on random forest and support vector machine,” in2017 IEEE inter- national conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC), vol. 1, IEEE. Guangzhou, China: IEEE, 2017, pp. 635–638

  32. [32]

    Aiding intrusion analysis using machine learning,

    L. Zomlot, S. Chandran, D. Caragea, and X. Ou, “Aiding intrusion analysis using machine learning,” in2013 12th International Confer- ence on Machine Learning and Applications, vol. 2, IEEE. Miami, FL, USA: IEEE, 2013, pp. 40–47

  33. [33]

    Practical machine learn- ing for cloud intrusion detection: Challenges and the way forward,

    R. S. S. Kumar, A. Wicker, and M. Swann, “Practical machine learn- ing for cloud intrusion detection: Challenges and the way forward,” inProceedings of the 10th ACM Workshop on Artificial Intelligence and Security. Dallas Texas USA: ACM, 2017, pp. 81–90

  34. [34]

    A hybrid intrusion detection system based on sparse autoencoder and deep neural network,

    K. Narayana Rao, K. Venkata Rao, and P. R. P.V .G.D., “A hybrid intrusion detection system based on sparse autoencoder and deep neural network,”Computer Communications, vol. 180, pp. 77–88,

  35. [35]

    Available: https://doi.org/10.1016/j.comcom.2021.08

    [Online]. Available: https://doi.org/10.1016/j.comcom.2021.08. 026

  36. [36]

    A deep learning approach to network intrusion detection,

    N. Shone, T. N. Ngoc, V . D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018. [Online]. Available: https://doi.org/10.1109/TETCI. 2017.2772792

  37. [37]

    Learning from time-changing data with adaptive windowing,

    A. Bifet and R. Gavalda, “Learning from time-changing data with adaptive windowing,” inProceedings of the 2007 SIAM international conference on data mining. SIAM, 2007, pp. 443–448

  38. [38]

    ADWIN-U: adaptive windowing for unsupervised drift detection on data streams,

    D. N. Assis and V . M. A. Souza, “ADWIN-U: adaptive windowing for unsupervised drift detection on data streams,”Knowledge and Information Systems, vol. 67, pp. 10 005–10 034, 2025. [Online]. Available: https://doi.org/10.1007/s10115-025-02523-1

  39. [39]

    Lo- gOnline: A semi-supervised log-based anomaly detector aided with online learning mechanism,

    X. Wang, J. Song, X. Zhang, J. Tang, W. Gao, and Q. Lin, “Lo- gOnline: A semi-supervised log-based anomaly detector aided with online learning mechanism,” in2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2023, pp. 141–152

  40. [40]

    Combating threat-alert fatigue with online anomaly detec- tion using isolation forest,

    M. E. Aminanto, L. Zhu, T. Ban, R. Isawa, T. Takahashi, and D. Inoue, “Combating threat-alert fatigue with online anomaly detec- tion using isolation forest,” inNeural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, December 12–15, 2019, Proceedings, Part I 26. Springer, 2019, pp. 756–765

  41. [41]

    Threat alert prioritization using isolation forest and stacked auto encoder with day-forward-chaining analysis,

    M. E. Aminanto, T. Ban, R. Isawa, T. Takahashi, and D. Inoue, “Threat alert prioritization using isolation forest and stacked auto encoder with day-forward-chaining analysis,”IEEE Access, vol. 8, pp. 217 977–217 986, 2020. [Online]. Available: https: //doi.org/10.1109/ACCESS.2020.3041837

  42. [42]

    Multiclass imbalanced and concept drift network traffic classification framework based on online active learning,

    W. Liu, C. Zhu, Z. Ding, H. Zhang, and Q. Liu, “Multiclass imbalanced and concept drift network traffic classification framework based on online active learning,”Engineering Applications of Artificial Intelligence, vol. 117, p. 105632, 2023. [Online]. Available: https://doi.org/10.1016/j.engappai.2022.105632

  43. [43]

    A comprehensive active learning method for multiclass imbalanced data streams with concept drift,

    W. Liu, H. Zhang, Z. Ding, Q. Liu, and C. Zhu, “A comprehensive active learning method for multiclass imbalanced data streams with concept drift,”Knowledge-Based Systems, vol. 215, p. 106778, 2021. [Online]. Available: https://doi.org/10.1016/j.knosys.2021.106778

  44. [44]

    Active learning framework to automate network traffic classification,

    J. Pesek, D. Soukup, and T. ˇCejka, “Active learning framework to automate network traffic classification,” 2022. [Online]. Available: https://arxiv.org/abs/2211.08399

  45. [45]

    Active learning for network traffic classification: A technical study,

    A. Shahraki, M. Abbasi, A. Taherkordi, and A. D. Jurcut, “Active learning for network traffic classification: A technical study,” 2021. [Online]. Available: https://arxiv.org/abs/2106.06933

  46. [46]

    Effectiveness of tree-based ensembles for anomaly discovery: Insights, batch and streaming active learning,

    S. Das, M. R. Islam, N. Kannappan Jayakodi, and J. R. Doppa, “Effectiveness of tree-based ensembles for anomaly discovery: Insights, batch and streaming active learning,” 2019. [Online]. Available: https://arxiv.org/abs/1901.08930

  47. [47]

    Bayesian neural networks uncertainty quantification with cubature rules

    J. Montiel, R. Mitchell, E. Frank, B. Pfahringer, T. Abdessalem, and A. Bifet, “Adaptive XGBoost for evolving data streams,” inProceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. [Online]. Available: https: //doi.org/10.1109/IJCNN48605.2020.9207555

  48. [48]

    99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms,

    B. A. Alahmadi, L. Axon, and I. Martinovic, “99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms,” in31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 2783–2800

  49. [49]

    Log management compre- hensive architecture in security operation center (SOC),

    A. Madani, S. Rezayi, and H. Gharaee, “Log management compre- hensive architecture in security operation center (SOC),” in2011 In- ternational Conference on Computational Aspects of Social Networks (CASoN). IEEE, 2011, pp. 284–289

  50. [50]

    R. M. Monarch,Human-in-the-Loop Machine Learning: Active learn- ing and annotation for human-centered AI. Simon and Schuster, 2021

  51. [51]

    Active learning for net- worked data,

    M. Bilgic, L. Mihalkova, and L. Getoor, “Active learning for net- worked data,” inProceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 79–86

  52. [52]

    ILAB: An interactive labelling strategy for intrusion detection,

    A. Beaugnon, P. Chifflier, and F. Bach, “ILAB: An interactive labelling strategy for intrusion detection,” inResearch in Attacks, Intrusions, and Defenses (RAID 2017), ser. Lecture Notes in Computer Science, vol. 10453. Springer, 2017, pp. 120–140. [Online]. Available: https://doi.org/10.1007/978-3-319-66332-6 6

  53. [53]

    Cyber situation awareness with active learning for intrusion detection

    S. McElwee and J. Cannady, “Cyber situation awareness with active learning for intrusion detection,” in Proceedings of the IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 3540–3549. [Online]. Available: https://doi.org/10.1109/BigData47090.2019.9020599

  54. [54]

    Stream clustering guided supervised learning for classifying NIDS alerts

    R. Vaarandi and A. Guerra-Manzanares, “Stream clustering guided supervised learning for classifying NIDS alerts,” Future Generation Computer Systems, vol. 155, pp. 231–244, 2024. [Online]. Available: https://doi.org/10.1016/j.future.2024.01.032

  55. [55]

    FAF-BM: An approach for false alerts filtering using BERT model with semi-supervised active learning

    D. Du, Y. Li, Y. Cao, Y. Liu, G. Meng, N. Li, D. Han, and H. Feng, “FAF-BM: An approach for false alerts filtering using BERT model with semi-supervised active learning,” in Science of Cyber Security (SciSec 2024), ser. Lecture Notes in Computer Science, vol. 15441. Springer, 2024, pp. 295–312. [Online]. Available: https://doi.org/10.1007/978-981-96-2417-1_16

  56. [56]

    Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning

    L. Tong, A. Laszka, C. Yan, N. Zhang, and Y. Vorobeychik, “Finding needles in a moving haystack: Prioritizing alerts with adversarial reinforcement learning,” 2020. [Online]. Available: https://arxiv.org/abs/1906.08805

  57. [57]

    Dealing with security alert flooding: using machine learning for domain-independent alert aggregation

    M. Landauer, F. Skopik, M. Wurzenberger, and A. Rauber, “Dealing with security alert flooding: using machine learning for domain-independent alert aggregation,” ACM Transactions on Privacy and Security, vol. 25, no. 3, pp. 1–36, 2022

  58. [58]

    DeepLog: Anomaly detection and diagnosis from system logs through deep learning

    M. Du, F. Li, G. Zheng, and V. Srikumar, “DeepLog: Anomaly detection and diagnosis from system logs through deep learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1285–1298

  59. [59]

    Online intrusion alert aggregation with generative data stream modeling

    A. Hofmann and B. Sick, “Online intrusion alert aggregation with generative data stream modeling,” IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 2, pp. 282–294, 2009

  60. [60]

    Intrusion alert prioritisation and attack detection using post-correlation analysis

    R. Shittu, A. Healing, R. Ghanea-Hercock, R. Bloomfield, and M. Rajarajan, “Intrusion alert prioritisation and attack detection using post-correlation analysis,” Computers & Security, vol. 50, pp. 1–15, 2015

  61. [61]

    An end-to-end method for advanced persistent threats reconstruction in large-scale networks based on alert and log correlation

    Y. Wang, Y. Guo, and C. Fang, “An end-to-end method for advanced persistent threats reconstruction in large-scale networks based on alert and log correlation,” Journal of Information Security and Applications, vol. 71, p. 103373, 2022

  62. [62]

    XGBoost: A scalable tree boosting system

    T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794. [Online]. Available: https://doi.org/10.1145/2939672.2939785

  64. [64]

    Using optimized focal loss for imbalanced dataset on network intrusion detection system

    Mulyanto, S. W. Prakosa, M. Faisal, and J.-S. Leu, “Using optimized focal loss for imbalanced dataset on network intrusion detection system,” in 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring), 2022, pp. 1–7. [Online]. Available: https://doi.org/10.1109/VTC2022-Spring54318.2022.9861034

  65. [65]

    A deep learning approach for intrusion detection in internet of things using focal loss function

    A. S. Dina, A. Siddique, and D. Manivannan, “A deep learning approach for intrusion detection in internet of things using focal loss function,” Internet of Things, vol. 22, p. 100699, 2023. [Online]. Available: https://doi.org/10.1016/j.iot.2023.100699

  66. [66]

    Introducing a new alert data set for multi-step attack analysis

    M. Landauer, F. Skopik, and M. Wurzenberger, “Introducing a new alert data set for multi-step attack analysis,” in Proceedings of the 17th Cyber Security Experimentation and Test Workshop (CSET ’24). Association for Computing Machinery, 2024, pp. 41–53. [Online]. Available: https://doi.org/10.1145/3675741.3675748

  67. [67]

    Boss of the SOC (BOTS) v1 Dataset

    Splunk Inc., “Boss of the SOC (BOTS) v1 Dataset,” Public dataset repository. https://github.com/splunk/botsv1, 2018, accessed: 2026-04-27.

Appendix A. Feature-retention and leakage-mitigation manifest

To support reviewer-facing reproducibility for the leakage-mitigation filter described in Section 3, this appendix lists the exact features retained in...