NetVAD: Foundation-Model Representation Learning for Identifier-Free Unsupervised Intrusion Detection
Pith reviewed 2026-06-28 16:31 UTC · model grok-4.3
The pith
A frozen foundation model representation inside an identifier-free VAE detects network attacks at 98% micro F1 when trained only on benign traffic.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NetVAD is a strictly identifier-free variational autoencoder that takes representations from a frozen Foundation Model, projects them into a task-specific latent space, and is trained solely on benign traffic. On ToN-IoT this produces a 98% Micro F1-score and 96% Macro F1-score at an operational false positive rate while reporting results for every attack class; the model reaches 99.6% F1 on Okiru botnet traffic yet shows limitations on single-packet reconnaissance. The architecture relies on the foundation-model representations to encode sufficient structure of the benign manifold so that attack traffic yields measurably higher reconstruction loss.
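The gap between the 98% Micro and 96% Macro figures is easier to read with the two averages side by side. The following is a minimal sketch, assuming that F1 is computed per attack class from binary detection counts, that Macro F1 is the unweighted mean over classes, and that Micro F1 pools the counts first; the paper's exact averaging procedure is not stated on this page. The class names and counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # attack flows flagged as anomalous
    fp: int  # benign flows flagged as anomalous
    fn: int  # attack flows that were missed


def f1(c: Counts) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no counts."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def micro_f1(per_class: dict) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    pooled = Counts(
        tp=sum(c.tp for c in per_class.values()),
        fp=sum(c.fp for c in per_class.values()),
        fn=sum(c.fn for c in per_class.values()),
    )
    return f1(pooled)


def macro_f1(per_class: dict) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(c) for c in per_class.values()) / len(per_class)


# Hypothetical counts: one large, easy class and one small, hard class.
counts = {
    "botnet": Counts(tp=990, fp=5, fn=10),
    "recon": Counts(tp=20, fp=5, fn=30),
}
print(f"micro={micro_f1(counts):.3f} macro={macro_f1(counts):.3f}")
```

Under these assumptions the large class dominates the Micro average, so a weak small class lowers Macro F1 much more than Micro F1.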
What carries the argument
The identifier-free VAE decoder that projects frozen foundation-model network representations into a task-specific latent space and scores anomalies by reconstruction loss.
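For readers unfamiliar with the foundation-model plus VAE pipeline, the conventional objective and anomaly score can be written as below. This is a sketch in assumed notation, not the paper's own formulation: a squared-error reconstruction term, a standard normal prior, and a KL weight are the usual choices, and this page does not confirm which of them NetVAD uses. Here \mu_\phi(h) denotes the mean of the encoder distribution.

```latex
% Assumed notation (not from the paper): x is a flow, h = f_{\mathrm{FM}}(x) its frozen
% foundation-model representation, q_\phi the VAE encoder, g_\theta the decoder mean,
% \beta a KL weight, \tau a detection threshold.
\mathcal{L}(\theta,\phi;h)
  = \mathbb{E}_{q_\phi(z \mid h)}\!\left[\lVert h - g_\theta(z) \rVert_2^2\right]
  + \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid h) \,\Vert\, \mathcal{N}(0, I)\right)

s(x) = \lVert h - g_\theta(\mu_\phi(h)) \rVert_2^2,
\qquad \text{flag } x \text{ as anomalous if } s(x) > \tau
```

Only \theta and \phi are trained; the foundation model f_{\mathrm{FM}} stays frozen, and training uses benign flows only.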
If this is right
- Performance holds across multiple attack classes when foundation-model representations replace hand-crafted features.
- Specialized decoder architectures are required to model the complex benign manifold precisely enough for attack separation.
- Large-scale pre-training of the backbone is essential; removing it causes a measurable performance drop.
- Flow-based foundation models remain limited on single-packet reconnaissance events even with the VAE head.
Where Pith is reading between the lines
- The same frozen-representation approach could be tested on other security tasks that currently require task-specific labeled data.
- Extending the decoder to incorporate packet-level features might close the gap on reconnaissance detection without retraining the backbone.
- Deployment in networks that drop identifiers would benefit directly from the identifier-free design.
- The per-class transparency requirement suggests future unsupervised detectors should report attack-type breakdowns rather than aggregate scores only.
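The last point, reporting a breakdown by attack type rather than an aggregate only, can be sketched as follows. The function name, class labels, scores, and threshold are all hypothetical; the sketch assumes each attack flow carries one anomaly score and is flagged when that score exceeds the threshold.

```python
from collections import defaultdict


def per_class_detection_rate(records, threshold):
    """Fraction of attack flows flagged per class.

    records: iterable of (class_label, anomaly_score) pairs for attack flows.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for label, score in records:
        totals[label] += 1
        if score > threshold:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Made-up scores: many easy botnet flows, a few hard scanning flows.
records = [("okiru", 0.90)] * 16 + [
    ("scanning", 0.12),
    ("scanning", 0.40),
    ("scanning", 0.08),
    ("scanning", 0.21),
]
threshold = 0.30

rates = per_class_detection_rate(records, threshold)
aggregate = sum(score > threshold for _, score in records) / len(records)

print(f"aggregate detection rate: {aggregate:.2f}")
for label, rate in sorted(rates.items()):
    print(f"{label}: {rate:.2f}")
```

With these illustrative numbers the aggregate rate looks healthy while the smaller class is mostly missed, which is the situation a per-class table makes visible.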
Load-bearing premise
The representations produced by the frozen foundation model contain sufficient information about the benign traffic manifold that a task-specific VAE decoder can reliably assign higher reconstruction loss to attack traffic across all classes.
What would settle it
A held-out evaluation in which attack traffic yields reconstruction losses indistinguishable from those of benign traffic would falsify the separation claim.
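One way to make "indistinguishable" concrete is the probability that a randomly chosen attack flow has a higher loss than a randomly chosen benign flow (the AUROC). The sketch below uses a plain pairwise count; the loss values are invented, and the choice of AUROC as the separation measure is an assumption rather than something the paper prescribes.

```python
def auroc(benign_losses, attack_losses):
    """P(attack loss > benign loss) over all pairs, with ties counted as 0.5.

    A value near 1.0 means clean separation; a value near 0.5 means the
    two loss distributions are indistinguishable by this measure.
    """
    wins = 0.0
    for a in attack_losses:
        for b in benign_losses:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(attack_losses) * len(benign_losses))


# Hypothetical held-out reconstruction losses.
benign = [0.10, 0.12, 0.15, 0.11, 0.14]
separable_attack = [0.40, 0.55, 0.38, 0.60]
overlapping_attack = [0.10, 0.13, 0.15, 0.11]

print(auroc(benign, separable_attack))
print(auroc(benign, overlapping_attack))
```

Running this per attack class, rather than on pooled attack traffic, would show which classes carry the separation and which do not.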
Original abstract
Detecting zero-day exploits in production networks requires robust Intrusion Detection Systems (IDS). However, current unsupervised models struggle to match the performance of supervised classifiers, which are trained for specific attacks only. To bridge this gap, we leverage the emerging capabilities of Network Foundation Models. We propose NetVAD, a strictly identifier-free Variational Autoencoder that projects representations from a frozen Foundation Model into a task-specific latent space, trained solely on benign traffic. Evaluated on ToN-IoT and IoT-23, NetVAD achieves highly competitive unsupervised performance. On ToN-IoT, it achieves a 98% Micro F1-score and a 96% Macro F1-score at an operational false positive rate. Unlike prior work, we show the model's performance transparently for all attack-classes of the datasets. While the architecture excels at discerning complex botnet behaviour (99.6% F1 on Okiru), our evaluation reveals limitations of flow-based Foundation Models in detecting single-packet reconnaissance events. Finally, a comprehensive ablation study confirms that while large-scale pre-training is essential to prevent performance degrading, specialised decoder architectures are necessary to precisely model the complex benign manifold, ensuring attacks are caught more reliably, due to a higher reconstruction loss.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes NetVAD, a strictly identifier-free unsupervised IDS that feeds frozen representations from a network foundation model into a task-specific VAE decoder trained exclusively on benign traffic. It reports competitive performance on ToN-IoT (98% micro F1, 96% macro F1 at operational FPR) and IoT-23, provides per-class F1 scores (strong on botnets like Okiru at 99.6%, weaker on single-packet reconnaissance), and includes an ablation study claiming both large-scale pre-training and specialized decoder architectures are required.
Significance. If the central performance claims hold under rigorous evaluation, the work would meaningfully advance unsupervised network anomaly detection by showing how foundation-model representations can enable a VAE to reliably separate attacks via reconstruction error without identifiers or attack-specific supervision. The explicit per-class transparency and acknowledgment of limitations on reconnaissance flows are positive; the ablation on pre-training and decoder design adds useful evidence if properly controlled.
major comments (2)
- [Experiments / Evaluation] Evaluation protocol (Experiments section): the abstract and reported F1 scores (98% micro / 96% macro on ToN-IoT) are presented without any description of dataset splits, number of independent runs, error bars, or statistical tests. This absence directly undermines verification of the central performance claim and the ablation conclusions.
- [§4 (or equivalent evaluation subsection)] Threshold selection and operational FPR: the paper states results 'at an operational false positive rate' but provides no concrete procedure for choosing or validating this threshold on held-out benign data, which is load-bearing for the reported F1 numbers and for claims of practical utility.
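The first comment asks for independent runs with error bars. A minimal sketch of that reporting step is shown below; the per-run values are placeholders and are not results from the paper, and `summarize_runs` is a hypothetical helper name.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over independent runs."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


# Placeholder Macro F1 values from five runs with different seeds.
macro_f1_runs = [0.91, 0.93, 0.92, 0.94, 0.90]

mean, std = summarize_runs(macro_f1_runs)
print(f"Macro F1: {mean:.3f} ± {std:.3f} over {len(macro_f1_runs)} runs")
```

The sample standard deviation (dividing by n - 1) is used here because the runs are treated as a sample of possible seeds; with only a handful of runs the resulting error bar is itself a rough estimate.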
minor comments (2)
- [Abstract / Introduction] The abstract claims the model is 'strictly identifier-free' yet does not explicitly contrast this with prior work that may have used flow identifiers; a short clarification in the introduction would help.
- [Method] Notation for the VAE latent space and reconstruction loss could be introduced earlier with a small diagram to aid readers unfamiliar with the foundation-model + VAE pipeline.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the evaluation protocol and threshold selection. We address each major comment below and will revise the manuscript accordingly to enhance reproducibility and clarity.
- Referee: [Experiments / Evaluation] Evaluation protocol (Experiments section): the abstract and reported F1 scores (98% micro / 96% macro on ToN-IoT) are presented without any description of dataset splits, number of independent runs, error bars, or statistical tests. This absence directly undermines verification of the central performance claim and the ablation conclusions.
  Authors: We agree that the Experiments section lacks sufficient detail on dataset splits, the number of independent runs, error bars, and statistical tests, which is necessary for full verification of the reported F1 scores and ablation results. In the revised manuscript, we will expand the evaluation protocol description to specify the exact train/test splits (benign-only training), the number of independent runs (e.g., 5 runs with reported means and standard deviations), inclusion of error bars on all metrics, and any statistical tests performed. These additions will directly address the concern.
  Revision: yes
- Referee: [§4 (or equivalent evaluation subsection)] Threshold selection and operational FPR: the paper states results 'at an operational false positive rate' but provides no concrete procedure for choosing or validating this threshold on held-out benign data, which is load-bearing for the reported F1 numbers and for claims of practical utility.
  Authors: We acknowledge that the specific procedure for selecting and validating the operational threshold on held-out benign data was not detailed, which is important for interpreting the F1 scores and practical claims. We will revise the evaluation subsection to explicitly describe the threshold selection method, including how a target FPR is achieved and validated using held-out benign traffic (e.g., via a validation split to set the threshold corresponding to a specific FPR level). This will be incorporated in the next version.
  Revision: yes
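The threshold procedure described in the second response can be sketched as follows: pick the threshold on a benign validation split so that at most the target fraction of those flows is flagged, then measure the realised FPR on separate held-out benign traffic. The function names, scores, and the 5% target are illustrative assumptions, not values from the paper.

```python
import math


def threshold_at_fpr(benign_val_scores, target_fpr):
    """Smallest observed score tau such that at most target_fpr of the
    validation benign scores are strictly greater than tau."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(benign_val_scores)
    n = len(ordered)
    allowed = math.floor(target_fpr * n)  # benign flows allowed above tau
    return ordered[n - allowed - 1]


def false_positive_rate(benign_scores, threshold):
    """Fraction of benign flows flagged, using the rule score > threshold."""
    return sum(s > threshold for s in benign_scores) / len(benign_scores)


# Hypothetical anomaly scores for two disjoint benign splits.
val_benign = [i / 100 for i in range(100)]
test_benign = [(i + 0.5) / 100 for i in range(100)]

tau = threshold_at_fpr(val_benign, target_fpr=0.05)
print("threshold:", tau)
print("validation FPR:", false_positive_rate(val_benign, tau))
print("held-out FPR:", false_positive_rate(test_benign, tau))
```

The held-out FPR need not match the validation FPR exactly, which is why reporting both would support the "operational false positive rate" claim.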
Circularity Check
No significant circularity detected
The paper presents a standard unsupervised VAE trained solely on benign traffic using frozen foundation-model representations, with performance evaluated on separate attack classes in ToN-IoT and IoT-23. No equations, fitted parameters, or self-citations are shown that reduce the reported F1 scores or reconstruction losses to definitions by construction. The central claim rests on external pre-trained models and a conventional reconstruction objective, with explicit ablations and per-class results that remain independently falsifiable; the derivation chain is therefore self-contained.
Reference graph
Works this paper leans on
-
[1]
netfound: Foundation model for network security,
S. Guthula, R. Beltiukov, N. Battula, W. Guo, A. Gupta, and I. Monga, “netfound: Foundation model for network security,” 2025. [Online]. Available: https://arxiv.org/abs/2310.17025
Pith/arXiv arXiv 2025
-
[2]
Lens: A knowledge-guided foundation model for network traffic,
X. Li, C. Qian, Q. Wang, J. Kong, Y. Wang, Z. Yao, B. Ji, L. Cheng, G. Zhou, and H. Shao, “Lens: A knowledge-guided foundation model for network traffic,” arXiv e-prints, pp. arXiv–2402, 2024
2024
-
[3]
Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification,
X. Lin, G. Xiong, G. Gou, Z. Li, J. Shi, and J. Yu, “Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification,” in Proceedings of the ACM Web Conference 2022, 2022, pp. 633–642
2022
-
[4]
Robust iot security using isolation forest and one class svm algorithms,
A. Zahoor, W. Abbasi, M. Z. Babar, and A. Aljohani, “Robust iot security using isolation forest and one class svm algorithms,” Scientific Reports, vol. 15, no. 1, p. 36586, 2025
2025
-
[5]
Local intrinsic dimensionality of iot networks for unsupervised intrusion detection,
M. Gorbett, H. Shirazi, and I. Ray, “Local intrinsic dimensionality of iot networks for unsupervised intrusion detection,” in IFIP Annual Conference on Data and Applications Security and Privacy. Springer, 2022, pp. 143–161
2022
-
[6]
TON IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems,
A. Alsaedi, N. Moustafa, Z. Tari, A. Mahmood, and A. Anwar, “TON IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems,” IEEE Access, vol. 8, pp. 165130–165150, 2020
2020
-
[7]
Shortcut learning in deep neural networks,
R. Geirhos, J.-H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann, “Shortcut learning in deep neural networks,” Nature Machine Intelligence, vol. 2, no. 11, pp. 665–673, 2020
2020
-
[8]
One-class intrusion detection with dynamic graphs,
A. Liuliakov, A. Schulz, L. Hermes, and B. Hammer, “One-class intrusion detection with dynamic graphs,” in International Conference on Artificial Neural Networks. Springer, 2023, pp. 537–549
2023
-
[9]
Towards model generalization for intrusion detection: Unsupervised machine learning techniques,
M. Verkerken, L. D’hooge, T. Wauters, B. Volckaert, and F. De Turck, “Towards model generalization for intrusion detection: Unsupervised machine learning techniques,” Journal of Network and Systems Management, vol. 30, no. 1, p. 12, 2022
2022
-
[10]
Netgpt: Generative pretrained transformer for network traffic,
X. Meng, C. Lin, Y. Wang, and Y. Zhang, “Netgpt: Generative pretrained transformer for network traffic,” arXiv preprint arXiv:2304.09513, 2023
arXiv 2023
-
[11]
Mm4flow: A pre-trained multi-modal model for versatile network traffic analysis,
L. Yang, L. Liu, J. Huang, Z. Liu, S. Liang, S. Fu, and Y. Wang, “Mm4flow: A pre-trained multi-modal model for versatile network traffic analysis,” in Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 2025, pp. 1664–1678
2025
-
[12]
Traffic-moe: A sparse foundation model for network traffic analysis,
J. Zhou, C. Sun, M. Shen, S. Yu, and Q. Xuan, “Traffic-moe: A sparse foundation model for network traffic analysis,” arXiv preprint arXiv:2601.00357, 2026
arXiv 2026
-
[13]
J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016
Pith/arXiv arXiv 2016
-
[14]
U-net: Convolutional networks for biomedical image segmentation,
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241
2015
-
[15]
Mobilenetv2: Inverted residuals and linear bottlenecks,
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520
2018
-
[16]
S. Zagoruyko and N. Komodakis, “Wide residual networks,” arXiv preprint arXiv:1605.07146, 2016
Pith/arXiv arXiv 2016
-
[17]
Predtrad–prediction-based transformer for anomaly detection in multivariate time series data,
J. Schuster, A. Wölfel, F. Brunner, and C. Bergler, “Predtrad–prediction-based transformer for anomaly detection in multivariate time series data,” in Proc. Interspeech 2025, 2025, pp. 3873–3877
2025
-
[18]
Cyclical annealing schedule: A simple approach to mitigating KL vanishing,
H. Fu, C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin, “Cyclical annealing schedule: A simple approach to mitigating KL vanishing,” CoRR, vol. abs/1903.10145, 2019. [Online]. Available: http://arxiv.org/abs/1903.10145
Pith/arXiv arXiv 1903
-
[19]
IoT-23: A labeled dataset with malicious and benign IoT network traffic,
S. Garcia, A. Parmisano, and M. J. Erquiaga, “IoT-23: A labeled dataset with malicious and benign IoT network traffic,” Jan. 2020, version 1.0.0. [Online]. Available: https://doi.org/10.5281/zenodo.4743746
Appendix: OCSVM evaluation and scalability
To further contextualise our baseline selection, we conducted secondary experiments using a One-Class Suppo...
This indicates that without network shortcut identifiers (e.g., IP addresses), the overlapping distributions of raw flow statistics hinder the ability to establish a clear decision boundary. Interestingly, when the OCSVM was supplied with the Foundation Model embeddings, performance improved to a 61.2% Macro F1 on ToN-IoT and 33.9% on IoT-
This supports the notion that the Foundation Model effectively clusters behavioural representations in the latent space. However, because the overall performance remained substantially lower than both the Isolation Forest and NetVAD, the Isolation Forest was used as a primary baseline for our study.