From CVE to CWE: Syscall-Based HIDS Generalisation
Pith reviewed 2026-06-26 10:02 UTC · model grok-4.3
The pith
Syscall anomaly detectors trained on multiple CVEs sharing a CWE class can detect unseen CVEs in that class for some weakness types but not others.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A combined CWE-level normal profile supports detection of an unseen CVE within the same class for CWE-307 broken authentication, reaching F1 = 0.6976 at target FPR = 0.05, but the same approach collapses to F1 <= 0.21 for CWE-89 SQL injection and CWE-434 unrestricted file upload. Cross-CVE transfer is asymmetric and governed by the breadth of the source normal profile rather than the CWE label.
What carries the argument
The combined CWE-level normal profile extracted from multiple CVEs to train one-class anomaly detectors on sliding-window syscall feature vectors.
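As a rough illustration of that machinery, the sketch below builds one feature vector per sliding window and pools the normal windows of every CVE except the held-out one. The bag-of-syscalls counts are a stand-in assumption, not the paper's 66-dimensional Peng-Guo-style features, and the function names, vocabulary, and scenario labels are hypothetical.

```python
from collections import Counter


def window_counts(trace, vocab, size, step):
    """Slide a fixed-size window over a syscall trace and count each call.

    Returns one count vector (ordered like vocab) per window. This
    bag-of-syscalls vector is an illustrative stand-in feature only.
    """
    vectors = []
    for start in range(0, len(trace) - size + 1, step):
        counts = Counter(trace[start:start + size])
        vectors.append([counts[name] for name in vocab])
    return vectors


def combined_normal_profile(normal_by_cve, held_out):
    """Pool normal windows from every CVE in a class except the held-out one."""
    pooled = []
    for cve, windows in normal_by_cve.items():
        if cve != held_out:
            pooled.extend(windows)
    return pooled


# Toy trace and vocabulary (hypothetical, for illustration only).
vocab = ["open", "read", "write", "close"]
trace = ["open", "read", "read", "write", "close", "open"]
windows = window_counts(trace, vocab, size=4, step=2)

# Hypothetical scenario labels; the real ones come from the dataset.
normal_by_cve = {"cve_a": windows, "cve_b": windows[:1], "cve_c": windows}
profile = combined_normal_profile(normal_by_cve, held_out="cve_c")
```

A one-class detector would then be fitted on `profile` alone and evaluated on traces from the held-out scenario.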
If this is right
- Self-detection on the same CVE works reliably across the tested families.
- Combining CVEs into a single normal profile improves results only for certain classes like CWE-307.
- Transfer success between CVEs is direction-dependent and tied to normal profile breadth.
- Feature filtering does not substantially change transferability.
- Reporting at calibrated false positive rates is essential for valid comparisons.
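The calibration step mentioned in the last point can be sketched as picking a score quantile on held-back normal windows. This is a minimal sketch under the assumption that higher scores mean more anomalous (detectors that score the other way would need a sign flip); `calibrate_threshold` and `realised_fpr` are hypothetical helper names, and the scores are toy values.

```python
import math


def calibrate_threshold(normal_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of normal scores exceed it.

    Assumes higher score = more anomalous. Windows are flagged only when
    their score is strictly above the returned threshold.
    """
    if not normal_scores:
        raise ValueError("need at least one normal score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(normal_scores)
    # Number of normal windows we allow ourselves to flag.
    allowed = math.floor(target_fpr * len(ordered))
    return ordered[len(ordered) - allowed - 1]


def realised_fpr(normal_scores, threshold):
    """Fraction of normal windows flagged at the given threshold."""
    flagged = sum(1 for s in normal_scores if s > threshold)
    return flagged / len(normal_scores)


# Toy calibration scores: 0.00, 0.01, ..., 0.99.
calibration_scores = [i / 100 for i in range(100)]
threshold = calibrate_threshold(calibration_scores, target_fpr=0.05)
fpr_on_calibration = realised_fpr(calibration_scores, threshold)
```

The target is only guaranteed on the calibration data itself; the rate realised on normal traffic from a different CVE can differ, which is why the target and the realised value are worth reporting separately.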
Where Pith is reading between the lines
- Collecting diverse normal traces could benefit detection more than strict CWE grouping.
- Syscall features appear insufficient for reliable generalization in SQL injection and file upload classes.
- Operational systems might benefit from prioritizing weakness classes with consistent normal behavior across exploits.
- Further tests with additional CVEs per class would clarify the conditions for successful generalization.
Load-bearing premise
The normal syscall profiles collected from the chosen training CVEs are representative of normal behavior for other unseen CVEs in the same CWE class.
What would settle it
A new experiment showing F1 scores below 0.3 for the combined CWE-307 detector when applied to an additional unseen CVE from the same class under identical calibration would falsify the generalization claim for that family.
Original abstract
Host intrusion detection systems (HIDS) based on system-call traces are typically trained and evaluated against individual Common Vulnerabilities and Exposures (CVE) instances. In operational settings, however, defenders need to recognise new exploits of an already known type of weakness. We empirically examine whether a one-class anomaly detector trained on the normal behaviour of a set of CVEs that share a Common Weakness Enumeration (CWE) class generalises to a different, unseen CVE inside the same class. Using six scenarios drawn from LID-DS-2021 and grouped into three CWE families (CWE-307 broken authentication, CWE-89 SQL injection, CWE-434 unrestricted file upload), we extract a 66-dimensional Peng-Guo-style feature vector per sliding window and train Isolation Forest and SGD One-Class SVM detectors with normal-only thresholds calibrated to fixed target false positive rates. We define and answer four research questions covering self-detection, asymmetric cross-CVE transfer, the value of a combined CWE-level normal profile, and the effect of feature filtering on transferability. The combined CWE-307 detector reaches F1 = 0.6976 at calibration target FPR = 0.05 (precision = 0.8994, recall = 0.5698), whereas CWE-89 and CWE-434 collapse to F1 <= 0.21 under the same protocol. Cross-CVE transfer turns out to be strongly direction-dependent and dominated by the breadth of the source normal profile rather than by the CWE label. We conclude that CWE-level generalisation in HIDS is empirically attainable for some but not all weakness families with current syscall features, and we argue that calibrated FPR is a methodological prerequisite for honest reporting in this setting.
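The reported CWE-307 figures can be checked for internal consistency, since F1 is the harmonic mean of precision and recall. The helper name `f1_score` below is only illustrative; the two inputs are the values quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted for the combined CWE-307 detector.
f1 = f1_score(0.8994, 0.5698)
print(round(f1, 4))  # prints 0.6976
```

The result matches the quoted F1 = 0.6976 to four decimal places.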
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript empirically examines whether one-class anomaly detectors (Isolation Forest, SGD One-Class SVM) trained on normal syscall profiles from multiple CVEs sharing a CWE class can generalize to an unseen CVE in the same class. Using six scenarios from LID-DS-2021 grouped into CWE-307, CWE-89, and CWE-434, 66-dimensional Peng-Guo-style features are extracted per sliding window; detectors are trained on normal-only data with thresholds calibrated to fixed target FPRs. Four research questions address self-detection, asymmetric cross-CVE transfer, the value of combined CWE-level profiles, and feature filtering. Key result: the combined CWE-307 detector reaches F1 = 0.6976 (precision = 0.8994, recall = 0.5698) at a calibration target FPR of 0.05, while CWE-89 and CWE-434 yield F1 <= 0.21 under the same protocol; transfer is direction-dependent and dominated by source-profile breadth rather than CWE label. The authors conclude that CWE-level generalization is attainable for some but not all families with current syscall features.
Significance. If the empirical findings hold after verification of representativeness, the work demonstrates that CWE grouping can support generalization in syscall HIDS for certain weakness families when source normal profiles are sufficiently broad, while providing a cautionary example for others. It contributes concrete, FPR-calibrated performance numbers on held-out CVEs and stresses calibrated FPR as a methodological requirement for honest reporting. This could guide practical detector design in operational settings where new exploits of known weakness types must be caught.
major comments (2)
- [Experimental setup and research questions] The central claim of CWE-class generalization rests on the assumption that normal profiles from the selected training CVEs within each CWE are representative of the class. However, no details are provided on CVE selection criteria within families, intra-CWE profile variance, or any test confirming the held-out CVE lies within the support of the training normal distribution. This assumption is load-bearing, especially given the abstract's own observation that transfer is dominated by source-profile breadth rather than CWE label.
- [Results (performance tables and RQ answers)] The reported F1, precision, and recall values (e.g., CWE-307 combined detector at target FPR=0.05) are presented without statistical significance tests, confidence intervals, or explicit description of data splits and cross-validation. This makes it impossible to assess whether observed differences across CWE families are reliable, directly affecting verification of the claim that generalization succeeds for CWE-307 but collapses for the others.
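The representativeness concern in the first major comment could be probed with a simple diagnostic: calibrate on source normal windows, then measure how many normal windows of the held-out CVE are flagged. The sketch below is one possible check, not something the paper reports; it swaps in distance to the source centroid as a stand-in anomaly score instead of Isolation Forest or One-Class SVM, and the data and function names are toy assumptions.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def distance_scores(vectors, centre):
    """Stand-in anomaly score: Euclidean distance to the source centroid."""
    return [math.dist(v, centre) for v in vectors]


def out_of_support_rate(source_normal, target_normal, target_fpr):
    """Fraction of target *normal* windows flagged by a source-calibrated threshold."""
    centre = centroid(source_normal)
    source_scores = sorted(distance_scores(source_normal, centre))
    allowed = math.floor(target_fpr * len(source_scores))
    threshold = source_scores[len(source_scores) - allowed - 1]
    target_scores = distance_scores(target_normal, centre)
    return sum(s > threshold for s in target_scores) / len(target_scores)


# Toy profiles: a broad one and a narrow one around the same centre.
broad = [[float(10 * x), 0.0] for x in range(-10, 11)]
narrow = [[float(x), 0.0] for x in range(-2, 3)]

broad_to_narrow = out_of_support_rate(broad, narrow, target_fpr=0.05)
narrow_to_broad = out_of_support_rate(narrow, broad, target_fpr=0.05)
```

On this toy data the broad source covers the narrow target, while the narrow source flags almost all of the broad target's normal windows, which mirrors the direction-dependence the paper attributes to source-profile breadth.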
minor comments (2)
- [Methods] The 66-dimensional feature vector is described as 'Peng-Guo-style' but would benefit from an explicit reference or brief definition in the methods to aid reproducibility.
- [Experimental design] Clarify whether the same six scenarios are used across all four research questions or if subsets are employed for specific transfer experiments.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that additional transparency on CVE selection and statistical reporting will strengthen the manuscript and will revise accordingly. Point-by-point responses to the major comments follow.
- Referee: [Experimental setup and research questions] The central claim of CWE-class generalization rests on the assumption that normal profiles from the selected training CVEs within each CWE are representative of the class. However, no details are provided on CVE selection criteria within families, intra-CWE profile variance, or any test confirming the held-out CVE lies within the support of the training normal distribution. This assumption is load-bearing, especially given the abstract's own observation that transfer is dominated by source-profile breadth rather than CWE label.
  Authors: We acknowledge that the paper does not explicitly detail CVE selection criteria or provide intra-CWE variance statistics. The six scenarios were the complete set available in LID-DS-2021 that map to the three studied CWE classes, with assignment following the dataset's official CWE labels. No formal test of distributional support was performed. The empirical results, particularly the strong dependence on source-profile breadth, already illustrate the limits of the CWE label as a predictor. We will add a new subsection describing the selection process, basic variance metrics across normal profiles within each CWE, and an explicit discussion of the representativeness assumption as a limitation.
  Revision: yes
- Referee: [Results (performance tables and RQ answers)] The reported F1, precision, and recall values (e.g., CWE-307 combined detector at target FPR=0.05) are presented without statistical significance tests, confidence intervals, or explicit description of data splits and cross-validation. This makes it impossible to assess whether observed differences across CWE families are reliable, directly affecting verification of the claim that generalization succeeds for CWE-307 but collapses for the others.
  Authors: We agree that the lack of confidence intervals and formal tests reduces interpretability. The experiments follow the fixed scenario splits provided by LID-DS-2021; cross-validation is not applicable given the small number of CVEs per CWE. We will augment all performance tables with bootstrap 95% confidence intervals computed over the test windows and add a limitations paragraph noting that formal significance testing between CWE families is under-powered with only three groups. These additions will allow readers to better gauge the reliability of the reported differences.
  Revision: yes
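The bootstrap intervals promised in the second response could look roughly like the percentile bootstrap below. This is a sketch under stated assumptions, not the authors' procedure: the labels are toy data, the function names are hypothetical, and windows are resampled independently.

```python
import random


def f1_from_labels(y_true, y_pred):
    """F1 for the attack class from binary labels (1 = attack)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_interval(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling test windows with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_from_labels([y_true[i] for i in idx],
                                    [y_pred[i] for i in idx]))
    stats.sort()
    cut = int(n_resamples * alpha / 2)  # resamples trimmed from each tail
    return stats[cut], stats[n_resamples - cut - 1]


# Toy test set: 40 attack windows (30 detected), 60 normal windows (5 flagged).
y_true = [1] * 40 + [0] * 60
y_pred = [1] * 30 + [0] * 10 + [1] * 5 + [0] * 55
point = f1_from_labels(y_true, y_pred)
low, high = bootstrap_f1_interval(y_true, y_pred)
```

Overlapping sliding windows from the same trace are correlated, so resampling individual windows may understate the uncertainty; resampling whole recordings would be a more conservative variant.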
Circularity Check
No significant circularity; purely empirical evaluation with direct measurements
The paper conducts an empirical study training one-class anomaly detectors (Isolation Forest, SGD One-Class SVM) on syscall feature vectors from selected CVEs within CWE families and measuring performance (F1, precision, recall) on held-out CVEs. No derivations, equations, fitted parameters renamed as predictions, or self-citations are load-bearing for the central claims. Reported metrics are direct experimental outcomes on the test splits; the direction-dependent transfer results are likewise measured outcomes rather than constructed by definition. The representativeness assumption is an empirical premise open to falsification by the experiments themselves and does not reduce the reported results to tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: syscall traces from different CVEs within the same CWE share enough normal-behavior structure for a single one-class model to generalize.