netFound: Principled Design for Network Foundation Models

Arpit Gupta; Haarika Manda; Inder Monga; Jaber Daneshamooz; Satyandra Guthula; Sylee Beltiukov; Walter Willinger; Wenbo Guo

arxiv: 2310.17025 · v5 · submitted 2023-10-25 · 💻 cs.NI · cs.AI


Sylee Beltiukov, Satyandra Guthula, Haarika Manda, Jaber Daneshamooz, Wenbo Guo, Walter Willinger, Arpit Gupta, Inder Monga

Pith reviewed 2026-05-24 06:42 UTC · model grok-4.3

classification 💻 cs.NI cs.AI

keywords network foundation models · traffic representation learning · protocol-aware tokenization · burst-flow hierarchical attention · privacy-by-construction · exogenous context discrimination · embedding anisotropy


The pith

netFound applies four principles from model failure diagnostics to produce network embeddings with 0.95 F1 on context discrimination.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing network foundation models fail by exploiting dataset shortcuts rather than learning genuine traffic patterns, producing collapsed embedding spaces, and ignoring exogenous network conditions. The paper translates these diagnostic problems into four concrete design principles: protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction input design. netFound implements these principles, pretrains on a billion-token-scale corpus, and yields representations with lower anisotropy and stronger alignment to domain-expert features. This matters because reusable embeddings that actually reflect real traffic behavior could support many downstream analysis tasks without repeated full retraining. The design also excludes payload and IP addresses to preserve privacy by construction.

Core claim

netFound is a network foundation model whose architecture is motivated by diagnostic analysis of why prior models fail. By incorporating protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction input design, and pretraining on a large-scale corpus of 4.2 billion flows, it produces high-quality representations with lower anisotropy, significantly higher alignment with domain-expert features, and an F1 of 0.95 on exogenous context discrimination where existing models score below 0.62. It outperforms baselines in both frozen-encoder and end-to-end fine-tuned evaluations while excluding payload and IP addresses.
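This summary does not define the anisotropy measure. One common proxy, used here as an assumption and not as the paper's metric, is the mean pairwise cosine similarity of a set of embeddings: values near 0 suggest directions are spread out, while values near 1 suggest a collapsed space. A minimal sketch with hypothetical helper names and synthetic vectors:

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs (an anisotropy proxy)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


rng = random.Random(0)
# Zero-mean Gaussian vectors: directions are spread over the whole space.
spread = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(100)]
# Same vectors plus a large shared offset: all vectors point roughly one way.
collapsed = [[x + 10.0 for x in vec] for vec in spread]

print(f"spread:    {mean_pairwise_cosine(spread):.3f}")
print(f"collapsed: {mean_pairwise_cosine(collapsed):.3f}")
```

On real model outputs the same function would be applied to a sample of flow embeddings; the synthetic data above only illustrates the two extremes.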

What carries the argument

Four design principles—protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction input design—that translate identified failure modes into architectural choices for the netFound model.

If this is right

Pretrained netFound embeddings carry useful structure that improves performance in frozen-encoder settings across benchmarks.
The model remains the top performer in all end-to-end fine-tuned evaluations.
Representations exhibit lower anisotropy and higher alignment with domain-expert features than prior models.
Privacy is preserved by design: payload and IP addresses are excluded from the model input, and the paper still reports top benchmark performance.
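A frozen-encoder evaluation is usually read as training only a small probe on top of fixed embeddings. The sketch below illustrates that setup under stated assumptions: `frozen_encoder` is a hypothetical stand-in for a pretrained model, the data is a toy problem, and the probe is a plain logistic regression. None of this is the paper's evaluation code.

```python
import math
import random


def frozen_encoder(x):
    """Stand-in for a pretrained encoder; its parameters are never updated here."""
    return [x[0] + x[1], x[0] - x[1]]


def train_linear_probe(features, labels, epochs=200, lr=0.1):
    """Fit logistic-regression weights on fixed features with per-sample gradient steps."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w, b, f):
    """Return class 1 when the linear score is positive, else class 0."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


rng = random.Random(0)
# Toy inputs: the label depends on x0 + x1, which the stand-in encoder exposes.
inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
labels = [1 if x[0] + x[1] > 0 else 0 for x in inputs]

features = [frozen_encoder(x) for x in inputs]  # computed once, then held fixed
w, b = train_linear_probe(features[:150], labels[:150])
held_out = list(zip(features[150:], labels[150:]))
accuracy = sum(predict(w, b, f) == y for f, y in held_out) / len(held_out)
print(f"probe accuracy on held-out toy data: {accuracy:.2f}")
```

Because only `w` and `b` are trained, any accuracy the probe reaches has to come from structure already present in the embeddings, which is the point of this evaluation style.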

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar translation of diagnostic failure analysis into explicit design rules could be tested in other sequence or graph domains where shortcut exploitation occurs.
The released full pipeline from raw PCAPs to inference and the 4.2-billion-flow dataset enable direct checks of whether the gains hold on traffic from different network operators.
Better modeling of exogenous conditions may improve robustness when models encounter previously unseen network environments or policy changes.

Load-bearing premise

The diagnostic findings from prior work correctly identify the root causes of failure in existing network foundation models, and the four proposed design principles directly mitigate those causes without introducing compensating weaknesses or evaluation artifacts.

What would settle it

Evaluation of netFound on a new dataset engineered to retain the same shortcuts as the pretraining data but alter the genuine traffic patterns, checking whether the F1 on context discrimination drops below 0.8 or anisotropy increases to prior-model levels.
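The check described above reduces to computing F1 on the engineered dataset and comparing it with the 0.8 cutoff. The sketch below assumes a binary F1 with class 1 as the positive class; the summary does not say whether the reported 0.95 is binary, macro, or micro averaged, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def holds_up(y_true, y_pred, threshold=0.8):
    """True when F1 stays at or above the cutoff named in the text."""
    return f1_score(y_true, y_pred) >= threshold


# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(f1_score(y_true, y_pred), holds_up(y_true, y_pred))
```

In the toy example precision and recall are both 3/4, so the F1 is 0.75 and the check fails against the 0.8 cutoff.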

Figures

Figures reproduced from arXiv: 2310.17025 by Arpit Gupta, Haarika Manda, Inder Monga, Jaber Daneshamooz, Satyandra Guthula, Sylee Beltiukov, Walter Willinger, Wenbo Guo.

**Figure 2.** Data extraction, featurization, and protocol-aware tokenization: pipeline for converting the packet traces into tokens with metadata. After the flows are extracted from packet traces, we collect the relevant fields into features at different granularities, following which we convert them into tokens.
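The pipeline in the Figure 2 caption (flows extracted from packet traces, fields collected at different granularities, features converted to tokens) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the `Packet` fields, the inter-arrival gap rule that delimits bursts, its 10 ms default, and the `field=value` token format are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float        # arrival time in seconds
    proto: str       # "TCP" or "UDP"
    length: int      # packet length in bytes
    flags: int = 0   # TCP flags byte; ignored for UDP


def split_into_bursts(packets, max_gap=0.01):
    """Group one flow's packets into bursts; a gap above max_gap starts a new burst."""
    bursts = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        if bursts and pkt.ts - bursts[-1][-1].ts <= max_gap:
            bursts[-1].append(pkt)
        else:
            bursts.append([pkt])
    return bursts


def tokenize_packet(pkt):
    """Emit one token per header field, choosing the fields by protocol."""
    tokens = [f"proto={pkt.proto}", f"len={pkt.length}"]
    if pkt.proto == "TCP":
        tokens.append(f"flags=0x{pkt.flags:02x}")
    return tokens


def tokenize_flow(packets):
    """Return a list of bursts, each a flat list of packet tokens."""
    return [
        [tok for pkt in burst for tok in tokenize_packet(pkt)]
        for burst in split_into_bursts(packets)
    ]


flow = [
    Packet(0.000, "TCP", 60, 0x02),
    Packet(0.002, "TCP", 60, 0x12),
    Packet(0.500, "TCP", 1500, 0x18),
]
for i, burst_tokens in enumerate(tokenize_flow(flow)):
    print(i, burst_tokens)
```

Keeping the burst boundary in the output, instead of flattening the flow into one sequence, is what would let a hierarchical model attend within a burst first and across bursts second.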

**Figure 3.** Pre-training: the hierarchical transformer uses a …

**Figure 4.** The token prediction performance between net…

**Figure 5.** The testing performance of netFound and baselines …

Original abstract

Network foundation models promise reusable representations for diverse traffic analysis tasks, but recent diagnostic works have revealed fundamental problems: models exploit dataset shortcuts rather than learning genuine traffic patterns, produce collapsed embedding spaces, and fail to capture the exogenous network conditions that shape real-world behavior. We translate these diagnostic insights into four concrete design principles: protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction input design, and build netFound, a network foundation model whose architecture is motivated by this failure analysis. We pretrain netFound on a billion-token-scale corpus over 5000 GPU hours, and demonstrate that it produces high-quality representations with lower anisotropy, significantly higher alignment with domain-expert features, and an F1 of 0.95 on exogenous context discrimination where existing state-of-the-art models score below 0.62, while preserving privacy by excluding payload and IP addresses. netFound demonstrates significant improvements in frozen-encoder evaluation, showing that pretrained embeddings themselves carry useful structure, and remains the top performer across all benchmarks in end-to-end fine-tuned settings. We release full open-source code, weights for three model sizes on HuggingFace, a containerized pipeline from raw PCAPs to downstream inference, and the full 4.2 billion flows pretraining dataset to facilitate reproducibility and further research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

netFound turns diagnostic critiques of network foundation models into four concrete design rules and ships strong benchmark numbers plus full open artifacts, but the gains are not yet isolated from scale and data volume.


The main thing to know is that this paper takes recent failure analyses of network foundation models (shortcut learning, collapsed embeddings, missing context) and converts them into four explicit principles: protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction. They pretrain on 4.2 billion flows for 5000 GPU hours and report an F1 of 0.95 on exogenous context discrimination where prior models sit below 0.62, along with lower anisotropy and better alignment to expert features. They also release code, three model sizes on Hugging Face, a full pipeline, and the dataset itself. That release alone gives the work practical value for anyone who wants to start from a large, privacy-preserving network corpus rather than scraping their own. The architecture choices are motivated by the diagnostics rather than generic scaling, which is a step beyond simply training a bigger transformer on more packets.

The soft spot is the absence of ablations that hold model size, data volume, and training compute fixed while toggling individual principles. Without those variants it remains possible that the reported deltas come from corpus scale or training duration rather than the specific design rules. The abstract also gives limited detail on baselines, splits, and statistical tests, though the full paper presumably expands on them.

This paper is aimed at researchers working on traffic analysis and domain-specific foundation models who need a reproducible starting point. A reader who values open artifacts and empirical comparisons will find usable material here. It deserves a serious referee because the combination of grounded design, concrete numbers, and released resources is substantial enough to warrant expert scrutiny even if revisions are needed on the causal claims.

Referee Report

2 major / 2 minor

Summary. The paper claims that diagnostic insights into failures of existing network foundation models (shortcut exploitation, embedding collapse, failure to capture exogenous conditions) can be translated into four design principles—protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction—to produce netFound. After pretraining on a 4.2B-flow corpus (billion-token scale, 5000 GPU hours), netFound yields lower-anisotropy embeddings with higher alignment to domain-expert features, achieves F1=0.95 on exogenous context discrimination (vs. <0.62 for prior SOTA), preserves privacy by excluding payload/IP, and leads both frozen-encoder and fine-tuned benchmarks. Full code, three model weights, containerized pipeline, and the pretraining dataset are released.

Significance. If the results hold under controlled evaluation, the work is significant for supplying the first architecture whose components are explicitly derived from published failure diagnostics rather than scale alone, for demonstrating usable structure in the frozen pretrained embeddings, and for the unusually complete reproducibility package (dataset, pipeline, weights). These elements would materially advance the design of reusable traffic representations.

major comments (2)

[Evaluation / Experimental sections] The central claim—that the four design principles directly mitigate the diagnosed failure modes—rests on the reported performance deltas (F1 0.95 vs. <0.62, improved anisotropy and alignment). No ablation variants are described that remove or disable individual principles while holding model size, data volume, and training compute fixed; therefore the deltas cannot be attributed to the principles rather than corpus scale or training duration.
[Abstract and §5 (results)] Abstract and results sections report quantitative improvements but supply no details on baseline implementations, statistical significance tests, train/validation/test splits, or potential evaluation confounds (e.g., distribution shift between pretraining and downstream tasks). These omissions prevent assessment of whether the numbers support the claim that netFound representations are genuinely higher-quality.
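The ablation design requested in the first major comment can be written down as a set of run configurations: one full model plus one variant per principle with that principle disabled, with model size, data volume, and compute budget identical across runs. The sketch below is only an illustration; `RunConfig`, its field names, and the placeholder values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

PRINCIPLES = (
    "protocol_aware_tokenization",
    "operational_context_embedding",
    "burst_flow_hierarchical_attention",
    "privacy_by_construction_input",
)


@dataclass(frozen=True)
class RunConfig:
    name: str
    enabled: frozenset
    # Held fixed across every variant; the values are placeholders, not the paper's.
    model_size: str = "base"
    pretraining_flows: int = 1_000_000
    gpu_hour_budget: int = 100


def leave_one_out_variants():
    """Full model plus one variant per principle with that principle disabled."""
    full = frozenset(PRINCIPLES)
    variants = [RunConfig("full", full)]
    for principle in PRINCIPLES:
        variants.append(RunConfig(f"no_{principle}", full - {principle}))
    return variants


for cfg in leave_one_out_variants():
    disabled = sorted(set(PRINCIPLES) - cfg.enabled)
    print(cfg.name, "disabled:", disabled, "size:", cfg.model_size)
```

Comparing each `no_*` run against `full` under the same budget is what would let a performance delta be attributed to a single principle instead of to scale or training duration.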

minor comments (2)

[Model architecture section] Notation for the hierarchical attention mechanism and context-embedding layers should be defined more explicitly (e.g., with a small diagram or pseudocode) to allow readers to verify alignment with the stated design principles.
[Input design / privacy section] The privacy-by-construction claim would be strengthened by an explicit statement of what information is discarded at tokenization time and confirmation that no IP or payload bytes reach the model.
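The explicit statement requested in the second minor comment could take the form of an allow-list applied before tokenization, plus a check that fails loudly if a forbidden field gets through. This is a sketch under assumptions: the field names in `ALLOWED_FIELDS` and `FORBIDDEN_FIELDS` are hypothetical examples, and the paper's actual field selection is not given in this summary.

```python
# Hypothetical field names; only the IP and payload exclusion comes from the text.
ALLOWED_FIELDS = frozenset({"proto", "length", "tcp_flags", "direction"})
FORBIDDEN_FIELDS = frozenset({"src_ip", "dst_ip", "payload"})


def to_model_input(packet: dict) -> dict:
    """Keep only allow-listed header fields; everything else is dropped."""
    return {k: v for k, v in packet.items() if k in ALLOWED_FIELDS}


def assert_private(model_input: dict) -> None:
    """Raise if any forbidden field is present in what the model would see."""
    leaked = FORBIDDEN_FIELDS.intersection(model_input)
    if leaked:
        raise ValueError(f"forbidden fields reached the model input: {sorted(leaked)}")


raw = {
    "src_ip": "192.0.2.1",       # documentation-range address, for illustration
    "dst_ip": "198.51.100.7",
    "proto": "TCP",
    "length": 1500,
    "tcp_flags": 0x18,
    "payload": b"\x16\x03\x01",
}
clean = to_model_input(raw)
assert_private(clean)  # passes: neither IPs nor payload remain
print(sorted(clean))
```

An allow-list is the safer default here because any field added to the capture format later is dropped unless someone explicitly admits it.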

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to the manuscript.


Referee: [Evaluation / Experimental sections] The central claim—that the four design principles directly mitigate the diagnosed failure modes—rests on the reported performance deltas (F1 0.95 vs. <0.62, improved anisotropy and alignment). No ablation variants are described that remove or disable individual principles while holding model size, data volume, and training compute fixed; therefore the deltas cannot be attributed to the principles rather than corpus scale or training duration.

Authors: We agree that the manuscript does not include ablation studies that isolate the contribution of each design principle under controlled conditions. Each principle is explicitly motivated by a specific failure mode from prior diagnostic literature, and netFound is shown to outperform prior models that lack these components. However, without ablations we cannot rigorously rule out contributions from scale or training factors. In the revised manuscript we will add ablation experiments training variants with individual principles disabled while holding model size, data volume, and compute fixed.

revision: yes

Referee: [Abstract and §5 (results)] Abstract and results sections report quantitative improvements but supply no details on baseline implementations, statistical significance tests, train/validation/test splits, or potential evaluation confounds (e.g., distribution shift between pretraining and downstream tasks). These omissions prevent assessment of whether the numbers support the claim that netFound representations are genuinely higher-quality.

Authors: The full manuscript provides an experimental setup description in §5, but we acknowledge that explicit details on baseline re-implementations, statistical significance testing, precise train/validation/test splits, and distribution-shift analysis are insufficient. In the revised version we will expand §5 and the appendix to include these elements, enabling direct assessment of result validity and potential confounds.

revision: yes

Circularity Check

0 steps flagged

No circularity; design principles are motivated by external citations and results are empirical benchmarks


The manuscript contains no equations, derivations, or fitted parameters presented as predictions. It cites prior diagnostic works to motivate four design principles, then reports empirical performance on frozen-encoder and fine-tuned tasks against external state-of-the-art models. No self-citation is load-bearing for a mathematical claim, and no step reduces by construction to its own inputs. The central claims rest on observable benchmark deltas rather than internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the premise that the four design principles derived from prior diagnostics are sufficient to produce genuinely better representations; no free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)

Domain assumption: Diagnostic insights from prior work on shortcut exploitation, collapsed embeddings, and missing exogenous context correctly identify the root causes that must be fixed.
The paper states that it translates these diagnostic insights into the four design principles.


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 3 internal anchors

[1]

On the effectiveness of machine and deep learning for cyber security,

G. Apruzzese, M. Colajanni, L. Ferretti, A. Guido, and M. Marchetti, “On the effectiveness of machine and deep learning for cyber security,” 2018 10th International Conference on Cyber Conflict (CyCon), pp. 371–390, 2018. [Online]. Available: https://api. semanticscholar.org/CorpusID:49656174

work page 2018
[2]

A survey on machine learning techniques for cyber security in the last decade,

K. Shaukat, S. Luo, V . Varadharajan, I. A. Hameed, and M. Xu, “A survey on machine learning techniques for cyber security in the last decade,” IEEE Access, vol. 8, pp. 222 310–222 354, 2020

work page 2020
[3]

Outside the closed world: On using machine learning for network intrusion detection,

R. Sommer and V . Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in 2010 IEEE Symposium on Security and Privacy , 2010, pp. 305–316

work page 2010
[4]

Underspecification presents challenges for credibility in modern machine learning,

A. D’Amour, K. Heller, D. Moldovan, B. Adlam, B. Alipanahi, A. Beutel, C. Chen et al. , “Underspecification presents challenges for credibility in modern machine learning,” Journal of Machine Learning Research, 2022. [Online]. Available: http://jmlr.org/papers/ v23/20-1335.html

work page 2022
[5]

In search of netunicorn: A data-collection platform to develop generalizable ml models for network security problems,

R. Beltiukov, W. Guo, A. Gupta, and W. Willinger, “In search of netunicorn: A data-collection platform to develop generalizable ml models for network security problems,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2023

work page 2023
[6]

A look behind the curtain: Traffic classification in an increasingly encrypted web,

I. Akbari, M. A. Salahuddin, L. Ven, N. Limam, R. Boutaba, B. Mathieu, S. Moteau, and S. Tuffin, “A look behind the curtain: Traffic classification in an increasingly encrypted web,” Proc. ACM Meas. Anal. Comput. Syst. , vol. 5, no. 1, feb 2021. [Online]. Available: https://doi.org/10.1145/3447382

work page doi:10.1145/3447382 2021
[7]

Ac-dc: Adaptive ensemble classification for network traffic identifi- cation,

X. Jiang, S. Liu, S. Naama, F. Bronzino, P. Schmitt, and N. Feamster, “Ac-dc: Adaptive ensemble classification for network traffic identifi- cation,” 2023

work page 2023
[8]

Fine-grained TLS services classification with reject option,

J. Luxemburk and T. ˇCejka, “Fine-grained TLS services classification with reject option,” Computer Networks , vol. 220, 2022. [Online]. Available: http://arxiv.org/abs/2202.11984

work page arXiv 2022
[9]

Error prevalence in nids datasets: A case study on cic-ids-2017 and cse- cic-ids-2018,

L. Liu, G. Engelen, T. Lynar, D. Essam, and W. Joosen, “Error prevalence in nids datasets: A case study on cic-ids-2017 and cse- cic-ids-2018,” in 2022 IEEE Conference on Communications and Network Security (CNS) , 2022, pp. 254–262

work page 2017
[10]

Ai/ml for network security: The emperor has no clothes,

A. S. Jacobs, R. Beltiukov, W. Willinger, R. A. Ferreira, A. Gupta, and L. Z. Granville, “Ai/ml for network security: The emperor has no clothes,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS) , 2022

work page 2022
[11]

Dos and don’ts of machine learning in computer security,

D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro, and K. Rieck, “Dos and don’ts of machine learning in computer security,” in 31st USENIX Security Symposium (USENIX Security 22) . Boston, MA: USENIX Association, Aug. 2022, pp. 3971–3988. [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presen...

work page 2022
[12]

A comprehensive survey on pretrained foundation models: A history from bert to chatgpt,

C. Zhou, Q. Li, C. Li, J. Yu, Y . Liu, G. Wang, K. Zhang, C. Ji, Q. Yan, L. He, H. Peng, J. Li, J. Wu, Z. Liu, P. Xie, C. Xiong, J. Pei, P. S. Yu, and L. Sun, “A comprehensive survey on pretrained foundation models: A history from bert to chatgpt,” 2023

work page 2023
[13]

Gpt-4 technical report,

OpenAI, “Gpt-4 technical report,” 2023

work page 2023
[14]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre- training of deep bidirectional transformers for language understand- ing,” ArXiv, vol. abs/1810.04805, 2019

work page internal anchor Pith review Pith/arXiv arXiv 2019
[15]

Vision transformer architecture search,

X. Su, S. You, J. Xie, M. Zheng, F. Wang, C. Qian, C. Zhang, X. Wang, and C. Xu, “Vision transformer architecture search,” ArXiv, vol. abs/2106.13700, 2021

work page arXiv 2021
[16]

Pinot: Programmable infrastructure for networking,

R. Beltiukov, S. Chandrasekaran, A. Gupta, and W. Willinger, “Pinot: Programmable infrastructure for networking,” in Proceedings of the Applied Networking Research Workshop , ser. ANRW ’23. New York, NY , USA: Association for Computing Machinery, 2023, p. 51–53. [Online]. Available: https://doi.org/10.1145/3606464.3606485

work page doi:10.1145/3606464.3606485 2023
[17]

Experience-driven research on programmable networks,

H. Kim, X. Chen, J. Brassil, and J. Rexford, “Experience-driven research on programmable networks,” SIGCOMM Comput. Commun. Rev., vol. 51, no. 1, p. 10–17, mar 2021. [Online]. Available: https://doi.org/10.1145/3457175.3457178

work page doi:10.1145/3457175.3457178 2021
[18]

Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification,

X. Lin, G. Xiong, G. Gou, Z. Li, J. Shi, and J. Yu, “Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification,” in Proceedings of the ACM Web Conference 2022 , ser. WWW ’22. New York, NY , USA: Association for Computing Machinery, 2022, p. 633–642. [Online]. Available: https://doi.org/10.1145/34...

work page doi:10.1145/3485447.3512217 2022
[19]

Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation,

R. Zhao, M. Zhan, X. Deng, Y . Wang, Y . Wang, G. Gui, and Z. Xue, “Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation,” Proceedings of the AAAI Conference on Artificial Intelligence , vol. 37, no. 4, pp. 5420–5427, Jun. 2023. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/...

work page 2023
[20]

Multi- classification approaches for classifying mobile app traffic,

G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, “Multi- classification approaches for classifying mobile app traffic,” J. Netw. Comput. Appl., vol. 103, pp. 131–145, 2018

work page 2018
[21]

FlowPrint: Semi-Supervised Mobile-App Fingerprinting on Encrypted Network Traffic,

T. van Ede, R. Bortolameotti, A. Continella, J. Ren, D. J. Dubois, M. Lindorfer, D. Choffness, M. van Steen, and A. Peter, “FlowPrint: Semi-Supervised Mobile-App Fingerprinting on Encrypted Network Traffic,” in NDSS. The Internet Society, 2020

work page 2020
[22]

A survey on encrypted network traffic analysis applications, techniques, and countermeasures,

E. Papadogiannaki and S. Ioannidis, “A survey on encrypted network traffic analysis applications, techniques, and countermeasures,” ACM Comput. Surv. , vol. 54, no. 6, jul 2021. [Online]. Available: https://doi.org/10.1145/3457904

work page doi:10.1145/3457904 2021
[23]

End-to-end en- crypted traffic classification with one-dimensional convolution neural networks,

W. Wang, M. Zhu, J. Wang, X. Zeng, and Z. Yang, “End-to-end en- crypted traffic classification with one-dimensional convolution neural networks,” 2017 IEEE International Conference on Intelligence and Security Informatics (ISI) , pp. 43–48, 2017

work page 2017
[24]

Characterization of tor traffic using time based features,

A. H. Lashkari, G. Draper-Gil, M. S. I. Mamun, and A. A. Ghorbani, “Characterization of tor traffic using time based features,” in Inter- national Conference on Information Systems Security and Privacy , 2017

work page 2017
[25]

Flag: Flow representation generator based on self-supervised learning for encrypted traffic classification,

W. Wei, T. Ju, H. Liao, W. Zhao, and H. Gu, “Flag: Flow representation generator based on self-supervised learning for encrypted traffic classification,” in 5th Asia-Pacific Workshop on Networking (APNet 2021) , ser. APNet 2021. New York, NY , USA: Association for Computing Machinery, 2022, p. 14–20. [Online]. Available: https://doi.org/10.1145/3469393.3469394 14

work page doi:10.1145/3469393.3469394 2021
[26]

A comparative study of network traffic representations for novelty detection,

K. Yang, S. Kpotufe, and N. Feamster, “A comparative study of network traffic representations for novelty detection,” CoRR, vol. abs/2006.16993, 2020. [Online]. Available: https://arxiv.org/abs/ 2006.16993

work page arXiv 2006
[27]

New directions in automated traffic analysis,

J. Holland, P. Schmitt, N. Feamster, and P. Mittal, “New directions in automated traffic analysis,” Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS) , 2020

work page 2021
[28]

Deep-full-range: A deep learning based network encrypted traffic classification and intrusion detection framework,

Y . Zeng, H. Gu, W. Wenting, and Y . Guo, “Deep-full-range: A deep learning based network encrypted traffic classification and intrusion detection framework,” IEEE Access, vol. PP, pp. 1–1, 01 2019

work page 2019
[29]

Kitsune: An ensemble of autoencoders for online network intrusion detection,

Y . Mirsky, T. Doitshman, Y . Elovici, and A. Shabtai, “Kitsune: An ensemble of autoencoders for online network intrusion detection,” in 25th Annual Network and Distributed System Security Symposium, NDSS. The Internet Society, 2018

work page 2018
[30]

Machine learning for botnet detection: An optimized feature selection approach,

M. Lefoane, I. Ghafir, S. Kabir, and I.-U. Awan, “Machine learning for botnet detection: An optimized feature selection approach,” in The 5th International Conference on Future Networks & Distributed Systems, ser. ICFNDS 2021. New York, NY , USA: Association for Computing Machinery, 2022, p. 195–200. [Online]. Available: https://doi.org/10.1145/3508072.3508102

work page doi:10.1145/3508072.3508102 2021
[31]

A survey on data-driven software vulnerability assessment and prioritization,

T. H. M. Le, H. Chen, and M. A. Babar, “A survey on data-driven software vulnerability assessment and prioritization,” ACM Comput. Surv. , vol. 55, no. 5, dec 2022. [Online]. Available: https://doi.org/10.1145/3529757

work page doi:10.1145/3529757 2022
[32]

Network traffic classifier with convolutional and recurrent neural networks for internet of things,

M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, “Network traffic classifier with convolutional and recurrent neural networks for internet of things,” IEEE access , vol. 5, pp. 18 042– 18 050, 2017

work page 2017
[33]

Flowpic: Encrypted internet traffic classifi- cation is as easy as image recognition,

T. Shapira and Y . Shavitt, “Flowpic: Encrypted internet traffic classifi- cation is as easy as image recognition,”IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 680–687, 2019

work page 2019
[34]

A neural attention model for real-time network intrusion detection,

M. Tan, A. Iacovazzi, N.-M. M. Cheung, and Y . Elovici, “A neural attention model for real-time network intrusion detection,” in 2019 IEEE 44th conference on local computer networks (LCN) . IEEE, 2019, pp. 291–299

work page 2019
[35]

Deep packet: A novel approach for encrypted traffic classification using deep learning,

M. Lotfollahi, M. Jafari Siavoshani, R. Shirali Hossein Zade, and M. Saberian, “Deep packet: A novel approach for encrypted traffic classification using deep learning,” Soft Computing , vol. 24, no. 3, pp. 1999–2012, 2020

work page 1999
[36]

Large-scale mobile app iden- tification using deep learning,

S. Rezaei, B. Kroencke, and X. Liu, “Large-scale mobile app iden- tification using deep learning,” IEEE Access , vol. 8, pp. 348–362, 2019

work page 2019
[37]

Encrypted network traffic classification using deep and parallel network-in- network models,

Z. Bu, B. Zhou, P. Cheng, K. Zhang, and Z.-H. Ling, “Encrypted network traffic classification using deep and parallel network-in- network models,” Ieee Access, vol. 8, pp. 132 950–132 959, 2020

work page 2020
[38]

Byte segment neural network for network traffic classification,

R. Li, X. Xiao, S. Ni, H. Zheng, and S. Xia, “Byte segment neural network for network traffic classification,” in 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS) . IEEE, 2018, pp. 1–10

work page 2018
[39]

Mt-flowformer: A semi-supervised flow transformer for encrypted traffic classification,

R. Zhao, X. Deng, Z. Yan, J. Ma, Z. Xue, and Y . Wang, “Mt-flowformer: A semi-supervised flow transformer for encrypted traffic classification,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , ser. KDD ’22. New York, NY , USA: Association for Computing Machinery, 2022, p. 2576–2584. [Online]. Available: https://do...

work page arXiv 2022
[40]

Inferring streaming video quality from encrypted traffic: Practical models and deployment experience,

F. Bronzino, P. Schmitt, S. Ayoubi, G. Martins, R. Teixeira, and N. Feamster, “Inferring streaming video quality from encrypted traffic: Practical models and deployment experience,” Proc. ACM Meas. Anal. Comput. Syst. , vol. 3, no. 3, dec 2019. [Online]. Available: https://doi.org/10.1145/3366704

work page doi:10.1145/3366704 2019
[41]

Privateeye: Scalable and privacy-preserving compro- mise detection in the cloud,

B. Arzani, S. Ciraci, S. Saroiu, A. Wolman, J. W. Stokes, G. Outhred, and L. Diwu, “Privateeye: Scalable and privacy-preserving compro- mise detection in the cloud,” in Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation , ser. NSDI’20. USA: USENIX Association, 2020, p. 797–816

work page 2020
[42]

Pert: Payload encoding representation from transformer for encrypted traffic classification,

H. Y . He, Z. Guo Yang, and X. N. Chen, “Pert: Payload encoding representation from transformer for encrypted traffic classification,” in 2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K), 2020, pp. 1–8

work page 2020
[43]

Flow-mae: Leveraging masked autoencoder for accurate, efficient and robust malicious traffic classification,

Z. Hang, Y . Lu, Y . Wang, and Y . Xie, “Flow-mae: Leveraging masked autoencoder for accurate, efficient and robust malicious traffic classification,” in Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses , ser. RAID ’23. New York, NY , USA: Association for Computing Machinery, 2023, p. 297–314. [Online]. Avail...

work page arXiv 2023
[44]

Mtsecurity: Privacy-preserving malicious traffic classification using graph neural network and transformer,

J. Yang, X. Jiang, Y . Lei, W. Liang, Z. Ma, and S. Li, “Mtsecurity: Privacy-preserving malicious traffic classification using graph neural network and transformer,”IEEE Transactions on Network and Service Management, pp. 1–1, 2024

work page 2024
[45]

Trafficgpt: Breaking the token barrier for efficient long traffic analysis and genera- tion,

J. Qu, X. Ma, and J. Li, “Trafficgpt: Breaking the token barrier for efficient long traffic analysis and genera- tion,” ArXiv, vol. abs/2403.05822, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:268351552

work page arXiv 2024
[46] Q. Wang, C. Qian, X. Li, Z. Yao, and H. Shao, “Lens: A foundation model for network traffic in cybersecurity,” ArXiv, vol. abs/2402.03646, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:267628222

[47] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” 2021.

[48] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” 2021.

[49] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” 2022.

[50] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” 2014.

[51] C. Zhu, W. Ping, C. Xiao, M. Shoeybi, T. Goldstein, A. Anandkumar, and B. Catanzaro, “Long-short transformer: Efficient transformers for language and vision,” in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34. Curran Associates, Inc., 2021, pp. 17 723–17 736. [Online]...

[52] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, “Hierarchical attention networks for document classification,” in NAACL, 2016.

[53] W. Guo, D. Mu, X. Xing, M. Du, and D. Song, “Deepvsa: Facilitating value-set analysis with deep learning for postmortem program analysis,” in USENIX Security Symposium, 2019.

[54] P. Nawrot, S. Tworkowski, M. Tyrolski, L. Kaiser, Y. Wu, C. Szegedy, and H. Michalewski, “Hierarchical transformers are more efficient language models,” in Findings of the Association for Computational Linguistics: NAACL 2022, M. Carpuat, M.-C. de Marneffe, and I. V. Meza Ruiz, Eds. Seattle, United States: Association for Computational Linguistics, Jul...

[55] F. Almeida and G. Xexéo, “Word embeddings: A survey,” 2023.

[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[57] A. Wettig, T. Gao, Z. Zhong, and D. Chen, “Should you mask 15% in masked language modeling?” in Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, A. Vlachos and I. Augenstein, Eds. Dubrovnik, Croatia: Association for Computational Linguistics, May 2023, pp. 2985–3000. [Online]. Available: https:/...

[58] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[59] T. van Ede, R. Bortolameotti, A. Continella, J. Ren, D. J. Dubois, M. Lindorfer, D. R. Choffnes, M. van Steen, and A. Peter, “Flowprint: Semi-supervised mobile-app fingerprinting on encrypted network traffic,” Proceedings 2020 Network and Distributed System Security Symposium, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:211265114

[60] G. Draper-Gil, A. H. Lashkari, M. S. I. Mamun, and A. A. Ghorbani, “Characterization of encrypted and vpn traffic using time-related features,” in International Conference on Information Systems Security and Privacy, 2016.

[61] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in Proceedings of the 4th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, INSTICC. SciTePress, 2018, pp. 108–116.

[62] “Patator,” https://github.com/lanjelot/patator

[63] T. K. Ho, “Random decision forests,” in Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1. IEEE, 1995, pp. 278–282.

[64] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[65] Student, “The probable error of a mean,” Biometrika, pp. 1–25, 1908.

[66] X. Wu, W. Guo, J. Yan, B. Coskun, and X. Xing, “From grim reality to practical solution: Malware classification in real-world noise,” in 2023 IEEE Symposium on Security and Privacy (SP), 2023, pp. 2602–2619.

[67] I. Beltagy, M. E. Peters, and A. Cohan, “Longformer: The long-document transformer,” 2020.

[68] A. Alshamrani, S. Myneni, A. Chowdhary, and D. Huang, “A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1851–1877, 2019.

[69] M. Kaya and H. Ş. Bilge, “Deep metric learning: A survey,” Symmetry, vol. 11, no. 9, p. 1066, 2019.

[70] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021.

[71] A. Dietmüller, S. Ray, R. Jacob, and L. Vanbever, “A new hope for network model generalization,” in Proceedings of the 21st ACM Workshop on Hot Topics in Networks, ser. HotNets ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 152–159. [Online]. Available: https://doi.org/10.1145/3563766.3564104

[72] Z. Wei, J. Chen, M. Goldblum, Z. Wu, T. Goldstein, and Y.-G. Jiang, “Towards transferable adversarial attacks on vision transformers,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 2668–2676.

[73] S. Qiu, Q. Liu, S. Zhou, and W. Huang, “Adversarial attack and defense technologies in natural language processing: A survey,” Neurocomputing, vol. 492, pp. 278–307, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231222003861

Appendix

In this appendix, we provide brief information about traffic distribution between di...

