
arxiv: 2606.22581 · v1 · pith:AZM3OUEPnew · submitted 2026-06-21 · 💻 cs.CR · cs.AI

From CVE to CWE: Syscall-Based HIDS Generalisation

Pith reviewed 2026-06-26 10:02 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords syscall traces · host intrusion detection · CWE generalization · one-class anomaly detection · CVE to CWE transfer · false positive rate calibration · Isolation Forest · One-Class SVM

The pith

Syscall anomaly detectors trained on multiple CVEs sharing a CWE class can detect unseen CVEs in that class for some weakness types but not others.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether training one-class anomaly detectors on normal syscall behavior from several CVEs in the same CWE class allows detection of a new CVE in that class. Experiments use six scenarios grouped into CWE-307, CWE-89, and CWE-434, with 66-dimensional feature vectors and detectors calibrated to fixed false positive rates. The combined profile for CWE-307 achieves an F1 score of 0.6976 at 5% FPR, while the other classes show F1 scores of 0.21 or lower. Transfer performance depends primarily on the breadth of the normal training data rather than the shared CWE label. This indicates that CWE-level generalization in HIDS is feasible selectively with existing syscall features.

Core claim

A combined CWE-level normal profile supports detection of an unseen CVE within the same class for CWE-307 broken authentication, reaching F1 = 0.6976 at target FPR = 0.05, but the same approach collapses to F1 <= 0.21 for CWE-89 SQL injection and CWE-434 unrestricted file upload. Cross-CVE transfer is asymmetric and governed by the breadth of the source normal profile rather than the CWE label.

What carries the argument

The combined CWE-level normal profile extracted from multiple CVEs to train one-class anomaly detectors on sliding-window syscall feature vectors.
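
A minimal sketch of the calibration idea behind this mechanism, assuming that higher anomaly scores mean "more anomalous" and using synthetic Gaussian scores in place of real detector outputs. The function names (`calibrate_threshold`, `realised_fpr`) and the two pooled score distributions are hypothetical; this is not the paper's Algorithm 1.

```python
import math
import random

def calibrate_threshold(normal_scores, target_fpr):
    """Pick a threshold so that about target_fpr of normal scores exceed it.

    Assumption of this sketch: higher score = more anomalous.
    """
    ordered = sorted(normal_scores)
    # Number of calibration windows that may be flagged at this target.
    allowed = min(math.floor(target_fpr * len(ordered)), len(ordered) - 1)
    # The threshold is the largest score that must not be flagged.
    return ordered[len(ordered) - allowed - 1]

def realised_fpr(normal_scores, threshold):
    """Fraction of normal windows whose score exceeds the threshold."""
    flagged = sum(score > threshold for score in normal_scores)
    return flagged / len(normal_scores)

rng = random.Random(0)
# Pooled normal scores from two hypothetical CVE scenarios of one CWE class.
calib = ([rng.gauss(0.0, 1.0) for _ in range(2000)]
         + [rng.gauss(0.5, 1.2) for _ in range(2000)])
held_out = ([rng.gauss(0.0, 1.0) for _ in range(2000)]
            + [rng.gauss(0.5, 1.2) for _ in range(2000)])

threshold = calibrate_threshold(calib, target_fpr=0.05)
print(realised_fpr(calib, threshold), realised_fpr(held_out, threshold))
```

Because the threshold is set from normal data only, the realised FPR on held-out normal windows stays near the target only when those windows resemble the calibration pool, which is the representativeness question the rest of this page keeps returning to.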

If this is right

  • Self-detection on the same CVE works reliably across the tested families.
  • Combining CVEs into a single normal profile improves results only for certain classes like CWE-307.
  • Transfer success between CVEs is direction-dependent and tied to normal profile breadth.
  • Feature filtering does not substantially change transferability.
  • Reporting at calibrated false positive rates is essential for valid comparisons.
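
The direction-dependence noted above can be illustrated with a deliberately simple toy that is not the paper's detector: a one-dimensional feature, a percentile-interval "detector", and two made-up normal profiles, one broad and one narrow. All names and distributions below are assumptions chosen for illustration.

```python
import random

def fit_interval(normal_values, target_fpr=0.05):
    """Toy one-class model: the central (1 - target_fpr) interval of normal data."""
    ordered = sorted(normal_values)
    cut = int(len(ordered) * target_fpr / 2)
    return ordered[cut], ordered[len(ordered) - cut - 1]

def false_positive_rate(interval, normal_values):
    """Fraction of normal values that fall outside the fitted interval."""
    low, high = interval
    flagged = sum(not (low <= v <= high) for v in normal_values)
    return flagged / len(normal_values)

rng = random.Random(0)
broad_normal = [rng.uniform(0.0, 10.0) for _ in range(1000)]   # hypothetical broad profile
narrow_normal = [rng.uniform(4.0, 6.0) for _ in range(1000)]   # hypothetical narrow profile

# Fit on one profile, evaluate on the other profile's normal data.
fpr_broad_to_narrow = false_positive_rate(fit_interval(broad_normal), narrow_normal)
fpr_narrow_to_broad = false_positive_rate(fit_interval(narrow_normal), broad_normal)
print(fpr_broad_to_narrow, fpr_narrow_to_broad)
```

The toy covers only the false-positive side: a model fitted on the broad profile accepts the narrow profile's normal data, while the reverse direction flags most of it. The F1 values reported in the paper also depend on how attack windows score, which this sketch does not model.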

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Collecting diverse normal traces could benefit detection more than strict CWE grouping.
  • Syscall features appear insufficient for reliable generalization in SQL injection and file upload classes.
  • Operational systems might benefit from prioritizing weakness classes with consistent normal behavior across exploits.
  • Further tests with additional CVEs per class would clarify the conditions for successful generalization.

Load-bearing premise

The normal syscall profiles collected from the chosen training CVEs are representative of normal behavior for other unseen CVEs in the same CWE class.

What would settle it

A new experiment showing F1 scores below 0.3 for the combined CWE-307 detector when applied to an additional unseen CVE from the same class under identical calibration would falsify the generalization claim for that family.

Figures

Figures reproduced from arXiv: 2606.22581 by Alexander V. Kozachok, Shamil G. Magomedov, Stanislav G. Vyugov.

Figure 1. Calibrated one-class CWE detection pipeline. Normal-only training and calibration on the left; target FPR sets the threshold; the decision on the right is binary at window level and supports self, cross-CVE and combined evaluations.
Figure 2. Per-CVE detection (left) versus the CWE-level detector studied in this paper (right). The right panel pools the normal profiles of several CVEs that share a CWE class into a single combined detector.
Figure 3. Cross-CVE transfer F1 inside each CWE family. The diagonals show self-detection (uncalibrated). The off-diagonal cells show the F1 of an anomaly model fitted on the source CVE and applied to the target CVE.
Figure 4. F1 across protocols and CWE families. “Best transfer” is taken over the seven feature sets of …
Figure 5. Combined CWE-307 detector under three target FPRs. The realised FPR tracks the target FPR closely, which is the desired behaviour of Algorithm 1. Recall and F1 grow with α, while precision declines smoothly.
… memory-management patterns specific to the runtime (PHP vs. EPS) rather than the upload-then-execute motif. The feature extractor does not surface a single upload-then-execute temporal motif, so the re…
Figure 6. Two-sample KS distance between the normal-window distributions of the two CWE-307 scenarios. Resource-size and PID-switch features dominate the shifted group; the lseek and pread byte counters are essentially identical in both normals.
… with additional CWE classes (e.g. CWE-22 path traversal, CWE-94 code injection) that are not yet covered here. Our feature extractor follows Peng Guo [11]; richer represent…
Figure 7. Effect of feature filters on cross-CVE transfer F1 inside CWE-307. The most aggressive normal-domain stability filter (stable) destroys the strong direction; importance-only top-20 (score_l0p0) preserves most of it.
… with structural patterns that are invariant to application identity – graph-based syscall models [11] and contrastive prototype embeddings [25] are concrete candidates – or adopt explicit mult…
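
The per-feature comparison described in the Figure 6 caption can be sketched as a two-sample Kolmogorov-Smirnov distance computed feature by feature. This is a minimal pure-Python illustration; the feature names (`resource_size`, `lseek_bytes`) and the synthetic values are hypothetical placeholders, not the paper's data or code.

```python
import bisect
import random

def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    # The supremum is attained at one of the observed sample points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance

rng = random.Random(0)
# Hypothetical normal-window feature values for two scenarios of one CWE class.
scenario_a = {
    "resource_size": [rng.gauss(100.0, 10.0) for _ in range(500)],
    "lseek_bytes": [rng.gauss(5.0, 1.0) for _ in range(500)],
}
scenario_b = {
    "resource_size": [rng.gauss(160.0, 10.0) for _ in range(500)],  # shifted
    "lseek_bytes": [rng.gauss(5.0, 1.0) for _ in range(500)],       # unchanged
}

shift = {name: ks_distance(scenario_a[name], scenario_b[name]) for name in scenario_a}
for name, d in sorted(shift.items(), key=lambda item: -item[1]):
    print(f"{name}: KS = {d:.3f}")
```

A distance near 1 marks a feature whose normal distribution differs almost completely between the two scenarios, while a distance near 0 marks a feature that behaves the same in both.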
Original abstract

Host intrusion detection systems (HIDS) based on system-call traces are typically trained and evaluated against individual Common Vulnerabilities and Exposures (CVE) instances. In operational settings, however, defenders need to recognise new exploits of an already known type of weakness. We empirically examine whether a one-class anomaly detector trained on the normal behaviour of a set of CVEs that share a Common Weakness Enumeration (CWE) class generalises to a different, unseen CVE inside the same class. Using six scenarios drawn from LID-DS-2021 and grouped into three CWE families (CWE-307 broken authentication, CWE-89 SQL injection, CWE-434 unrestricted file upload), we extract a 66-dimensional Peng-Guo-style feature vector per sliding window and train Isolation Forest and SGD One-Class SVM detectors with normal-only thresholds calibrated to fixed target false positive rates. We define and answer four research questions covering self-detection, asymmetric cross-CVE transfer, the value of a combined CWE-level normal profile, and the effect of feature filtering on transferability. The combined CWE-307 detector reaches F1 = 0.6976 at calibration target FPR = 0.05 (precision = 0.8994, recall = 0.5698), whereas CWE-89 and CWE-434 collapse to F1 <= 0.21 under the same protocol. Cross-CVE transfer turns out to be strongly direction-dependent and dominated by the breadth of the source normal profile rather than by the CWE label. We conclude that CWE-level generalisation in HIDS is empirically attainable for some but not all weakness families with current syscall features, and we argue that calibrated FPR is a methodological prerequisite for honest reporting in this setting.
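
As a quick consistency check on the headline numbers in the abstract, the reported F1 can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The helper name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported for the combined CWE-307 detector at target FPR = 0.05.
f1 = f1_score(0.8994, 0.5698)
print(round(f1, 4))  # 0.6976
```

The recomputed value matches the reported F1 = 0.6976 to four decimal places, so the three figures are mutually consistent.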

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript empirically examines whether one-class anomaly detectors (Isolation Forest, SGD One-Class SVM) trained on normal syscall profiles from multiple CVEs sharing a CWE class can generalize to an unseen CVE in the same class. Using six scenarios from LID-DS-2021 grouped into CWE-307, CWE-89, and CWE-434, 66-dimensional Peng-Guo-style features are extracted per sliding window; detectors are trained on normal-only data with thresholds calibrated to fixed target FPRs. Four research questions address self-detection, asymmetric cross-CVE transfer, the value of combined CWE-level profiles, and feature filtering. Key result: the combined CWE-307 detector reaches F1=0.6976 (precision=0.8994, recall=0.5698) at FPR=0.05, while CWE-89 and CWE-434 yield F1<=0.21; transfer is direction-dependent and dominated by source-profile breadth rather than CWE label. The authors conclude CWE-level generalization is attainable for some but not all families under current features.

Significance. If the empirical findings hold after verification of representativeness, the work demonstrates that CWE grouping can support generalization in syscall HIDS for certain weakness families when source normal profiles are sufficiently broad, while providing a cautionary example for others. It contributes concrete, FPR-calibrated performance numbers on held-out CVEs and stresses calibrated FPR as a methodological requirement for honest reporting. This could guide practical detector design in operational settings where new exploits of known weakness types must be caught.

major comments (2)
  1. [Experimental setup and research questions] The central claim of CWE-class generalization rests on the assumption that normal profiles from the selected training CVEs within each CWE are representative of the class. However, no details are provided on CVE selection criteria within families, intra-CWE profile variance, or any test confirming the held-out CVE lies within the support of the training normal distribution. This assumption is load-bearing, especially given the abstract's own observation that transfer is dominated by source-profile breadth rather than CWE label.
  2. [Results (performance tables and RQ answers)] The reported F1, precision, and recall values (e.g., CWE-307 combined detector at target FPR=0.05) are presented without statistical significance tests, confidence intervals, or explicit description of data splits and cross-validation. This makes it impossible to assess whether observed differences across CWE families are reliable, directly affecting verification of the claim that generalization succeeds for CWE-307 but collapses for the others.
minor comments (2)
  1. [Methods] The 66-dimensional feature vector is described as 'Peng-Guo-style' but would benefit from an explicit reference or brief definition in the methods to aid reproducibility.
  2. [Experimental design] Clarify whether the same six scenarios are used across all four research questions or if subsets are employed for specific transfer experiments.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional transparency on CVE selection and statistical reporting will strengthen the manuscript and will revise accordingly. Point-by-point responses to the major comments follow.

  1. Referee: [Experimental setup and research questions] The central claim of CWE-class generalization rests on the assumption that normal profiles from the selected training CVEs within each CWE are representative of the class. However, no details are provided on CVE selection criteria within families, intra-CWE profile variance, or any test confirming the held-out CVE lies within the support of the training normal distribution. This assumption is load-bearing, especially given the abstract's own observation that transfer is dominated by source-profile breadth rather than CWE label.

    Authors: We acknowledge that the paper does not explicitly detail CVE selection criteria or provide intra-CWE variance statistics. The six scenarios were the complete set available in LID-DS-2021 that map to the three studied CWE classes, with assignment following the dataset's official CWE labels. No formal test of distributional support was performed. The empirical results, particularly the strong dependence on source-profile breadth, already illustrate the limits of the CWE label as a predictor. We will add a new subsection describing the selection process, basic variance metrics across normal profiles within each CWE, and an explicit discussion of the representativeness assumption as a limitation. revision: yes

  2. Referee: [Results (performance tables and RQ answers)] The reported F1, precision, and recall values (e.g., CWE-307 combined detector at target FPR=0.05) are presented without statistical significance tests, confidence intervals, or explicit description of data splits and cross-validation. This makes it impossible to assess whether observed differences across CWE families are reliable, directly affecting verification of the claim that generalization succeeds for CWE-307 but collapses for the others.

    Authors: We agree that the lack of confidence intervals and formal tests reduces interpretability. The experiments follow the fixed scenario splits provided by LID-DS-2021; cross-validation is not applicable given the small number of CVEs per CWE. We will augment all performance tables with bootstrap 95% confidence intervals computed over the test windows and add a limitations paragraph noting that formal significance testing between CWE families is under-powered with only three groups. These additions will allow readers to better gauge the reliability of the reported differences. revision: yes
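
A percentile bootstrap over test windows, of the kind the simulated response proposes, could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: labels are 0/1 per window, the toy predictions are synthetic, and the names `f1_from_labels` and `bootstrap_f1_ci` are hypothetical.

```python
import random

def f1_from_labels(y_true, y_pred):
    """Window-level F1 from 0/1 labels (1 = attack window)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_f1_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling windows with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_from_labels([y_true[i] for i in idx],
                                    [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Synthetic example: 100 attack windows, 300 normal windows.
rng = random.Random(1)
y_true = [1] * 100 + [0] * 300
y_pred = [1 if (t == 1 and rng.random() < 0.6) or (t == 0 and rng.random() < 0.05) else 0
          for t in y_true]

ci_low, ci_high = bootstrap_f1_ci(y_true, y_pred)
print(f1_from_labels(y_true, y_pred), ci_low, ci_high)
```

One caveat: resampling individual windows treats them as independent, and overlapping sliding windows from the same trace are not, so intervals computed this way are likely too narrow. Resampling whole recordings would be a more conservative choice.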

Circularity Check

0 steps flagged

No significant circularity; purely empirical evaluation with direct measurements

The paper conducts an empirical study training one-class anomaly detectors (Isolation Forest, SGD One-Class SVM) on syscall feature vectors from selected CVEs within CWE families and measuring performance (F1, precision, recall) on held-out CVEs. No derivations, equations, fitted parameters renamed as predictions, or self-citations are load-bearing for the central claims. Reported metrics are direct experimental outcomes on the test splits; the direction-dependent transfer results are likewise measured outcomes rather than constructed by definition. The representativeness assumption is an empirical premise open to falsification by the experiments themselves and does not reduce the reported results to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that Peng-Guo-style syscall features plus sliding windows are sufficient to separate normal from anomalous behavior within a CWE class; no free parameters are explicitly fitted beyond the choice of target FPR, and no new entities are postulated.

axioms (1)
  • Domain assumption: syscall traces from different CVEs within the same CWE share enough normal-behavior structure for a single one-class model to generalize.
    This premise is required for the cross-CVE transfer experiments to be meaningful and is invoked by grouping the six scenarios into three CWE families.



Reference graph

Works this paper leans on

34 extracted references · 27 canonical work pages

  [1] Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for Unix processes. In: Proc. IEEE Symp. Security and Privacy, pp. 120–128 (1996). doi:10.1109/SECPRI.1996.502675

  [2] Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998). doi:10.3233/JCS-980109

  [3] Liao, Y., Vemuri, V.R.: Use of k-nearest neighbor classifier for intrusion detection. Comput. Secur. 21(5), 439–448 (2002). doi:10.1016/S0167-4048(02)00514-X

  [4] Kang, D.K., Fuller, D., Honavar, V.: Learning classifiers for misuse and anomaly detection using a bag of system calls representation. In: Proc. IEEE SMC Information Assurance Workshop, pp. 118–125 (2005). doi:10.1109/IAW.2005.1495944

  [5] Maggi, F., Matteucci, M., Zanero, S.: Detecting intrusions through system call sequence and argument analysis. IEEE Trans. Depend. Secur. Comput. 7(4), 381–395 (2010). doi:10.1109/TDSC.2008.69

  [6] Creech, G., Hu, J.: A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns. IEEE Trans. Comput. 63(4), 807–819 (2014). doi:10.1109/TC.2013.13

  [7] Grimmer, M., Roehling, M.M., Kreusel, D., Rechert, K.: A modern and sophisticated host based intrusion detection data set. In: D-A-CH Security 2019, pp. 135–

  [8] Grimmer, M., Kaelble, T., Rucks, F., Pirl, J.: LID-DS 2021 – A modern host-based intrusion detection data set. Mendeley Data, v3 (2021). doi:10.17632/4xj3p3z5kj.3

  [9] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation Forest. In: Proc. ICDM 2008, pp. 413–… IEEE (2008). doi:10.1109/ICDM.2008.17

  [10] Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001). doi:10.1162/089976601750264965

  [11] Guo, P.: Intrusion detection based on complete system call information. In: Proc. DSAI 2024, pp. 1–5. ACM (2024). doi:10.1145/3677892.3677893

  [12] El Khairi, A., Caselli, M., Knierim, C., Peter, A., Continella, A.: Contextualizing system calls in containers for anomaly-based intrusion detection. In: Proc. ACM Cloud Computing Security Workshop (CCSW), pp. 9–21 (2022). doi:10.1145/3560810.3564266

  [13] Sommer, R., Paxson, V.: Outside the closed world: on using machine learning for network intrusion detection. In: Proc. IEEE Symp. Security and Privacy, pp. 305–316 (2010). doi:10.1109/SP.2010.25

  [14] Tunde-Onadele, O., He, J., Dai, T., Gu, X.: A study on container vulnerability exploit detection. In: Proc. IEEE IC2E, pp. 121–127 (2019). doi:10.1109/IC2E.2019.00026

  [15] Lin, Y., Tunde-Onadele, O., Gu, X.: CDL: Classified distributed learning for detecting security attacks in containerized applications. In: Proc. ACSAC, pp. 179–188 (2020). doi:10.1145/3427228.3427236

  [16] Lin, Y., Tunde-Onadele, O., Gu, X., He, J., Latapie, H.: SHIL: Self-supervised hybrid learning for security attack detection in containerized applications. In: Proc. IEEE ACSOS, pp. 41–50 (2022). doi:10.1109/ACSOS55765.2022.00022

  [17] Tunde-Onadele, O., Lin, Y., Gu, X., He, J., Latapie, H.: A self-supervised machine learning framework for online container security attack detection. ACM Trans. Auton. Adapt. Syst. 19(3), 17 (2024). doi:10.1145/3665795

  [18] Suneja, S., Kanso, A., Le, M., Isci, C.: SecQuant: quantifying container security exposure. In: Proc. ESORICS 2022, LNCS 13554, pp. 525–546. Springer (2022). doi:10.1007/978-3-031-17143-7_26

  [19] Aghaei, E., Shadid, W., Al-Shaer, E.: ThreatZoom: CVE2CWE using hierarchical neural network. In: Proc. SecureComm 2020, LNICST 335, pp. 23–41. Springer (2020). doi:10.1007/978-3-030-63086-7_2

  [20] Das, S.S., Serra, E., Halappanavar, M., Pothen, A., Al-Shaer, E.: V2W-BERT: A framework for effective hierarchical multiclass classification of software vulnerabilities. In: Proc. IEEE DSAA 2021, pp. 1–12 (2021). doi:10.1109/DSAA53316.2021.9564227

  [21] Pan, S., Bao, L., Xia, X., Lo, D., Li, S.: Fine-grained commit-level vulnerability type prediction by CWE tree structure. In: Proc. ICSE 2023, pp. 957–969 (2023). doi:10.1109/ICSE48619.2023.00088

  [22] Li, L., Ding, S.H.H., Tian, Y., Fung, B.C.M., Charland, P., Ou, W., Song, L., Chen, C.: VulANalyzeR: Explainable binary vulnerability detection with multi-task learning and attentional graph convolution. ACM Trans. Priv. Secur. 26(3), 1–25 (2023). doi:10.1145/3585386

  [23] Atiiq, S.A., Gehrmann, C., Dahlen, K., Khalil, K.: From generalist to specialist: exploring CWE-specific vulnerability detection. In: Proc. ARES 2024, pp. 1–12 (2024). doi:10.1145/3664476.3670872

  [24] Uddin, M.A., Aryal, S., Bouadjenek, M.R., Al-Hawawreh, M., Talukder, M.A.: Hierarchical classification for intrusion detection system: effective design and empirical analysis. arXiv:2403.13013 (2024). https://arxiv.org/abs/2403.13013

  [25] Lopez-Martin, M., Sanchez-Esguevillas, A., Arribas, J.I., Carro, B.: Supervised contrastive learning over prototype-label embeddings for network intrusion detection. Inf. Fusion 79, 200–228 (2022). doi:10.1016/j.inffus.2021.09.014

  [26] Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: Network anomaly detection: methods, systems and tools. IEEE Commun. Surv. Tutor. 16(1), 303–336 (2014). doi:10.1109/SURV.2013.052213.00046

  [27] Garcia-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: Anomaly-based network intrusion detection: techniques, systems and challenges. Comput. Secur. 28(1–2), 18–28 (2009). doi:10.1016/j.cose.2008.08.003

  [28] Aslan, Ö., Samet, R.: A comprehensive review on malware detection approaches. IEEE Access 8, 6249–6271 (2020). doi:10.1109/ACCESS.2019.2963724

  [29] MITRE Corporation: Common Weakness Enumeration, version 4.15. https://cwe.mitre.org/ (Accessed: 1 May 2026)

  [30] MITRE Corporation: CVE Program. https://www.cve.org/ (Accessed: 1 May 2026)

  [31] Zhang, J., Wei, F., Hu, X., Yang, B., Xie, F., Liu, S.: MCLDM: multi-channel contrastive learning network for intrusion detection. Comput. Netw. 237, 110083 (2023). doi:10.1016/j.comnet.2023.110083

  [32] Canbek, G., Temizel, T.T., Sagiroglu, S.: PToPI: A comprehensive review, analysis, and knowledge representation of binary classification performance measures/metrics. SN Comput. Sci. 4, 13 (2022). doi:10.1007/s42979-022-01409-1

  [33] Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Adv. Neural Inf. Process. Syst. 20 (NIPS 2007), pp. 1177–1184. MIT Press (2008)