
arxiv: 2508.02001 · v2 · submitted 2025-08-04 · 💻 cs.NI · cs.LG

Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC

Pith reviewed 2026-05-19 01:28 UTC · model grok-4.3

classification 💻 cs.NI cs.LG
keywords network traffic analysis · SmartNIC offloading · network foundation models · localized modeling · pattern-aware convolution · encrypted traffic analysis · edge computing · low-latency analysis

The pith

Nepco offloads network foundation models to SmartNIC to deliver versatile traffic analysis with millisecond-scale latency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the trade-off between versatile network foundation models and efficient hardware offloading is not fundamental but a consequence of polarized design choices in traffic processing, model architecture, and analysis execution. Starting from the observation that discriminative information sits in localized byte regions, it builds a system that embeds raw byte sequences directly and uses a convolutional architecture with scoring and gating to locate patterns. According to the authors, this preserves multi-task versatility while cutting end-to-end latency by 328x, to the millisecond scale, on SmartNIC hardware. This matters because pervasive encryption makes traffic hard to label at scale, and security operations need real-time analysis at the edge to avert service degradation and further vulnerabilities. If the approach holds up, security tools could run foundation-model analysis at the network edge rather than waiting on slower centralized processing.

Core claim

We present Nepco, which offloads network foundation models to SmartNIC for traffic analysis. The key observation is that discriminative information concentrates in localized byte regions, which motivates a hardware-friendly pipeline that embeds raw byte sequences directly and a pattern-aware convolutional architecture whose scoring and gating mechanisms exploit translation invariance to extract salient semantic signatures. Prototyped on the Nvidia BlueField-3, Nepco achieves macro F1 competitive with the best of eight state-of-the-art network foundation models while reducing end-to-end latency by 328x, to the millisecond scale.
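Macro F1, the accuracy metric in the headline claim, is the unweighted mean of per-class F1 scores, so rare traffic classes count as much as common ones. A minimal Python sketch of that definition follows; the function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: the single "vpn" flow is misclassified as "web".
y_true = ["web", "web", "web", "web", "vpn", "tor"]
y_pred = ["web", "web", "web", "web", "web", "tor"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 17/27, i.e. 0.6296
```

Plain accuracy on the same toy data is 5/6, which hides the fact that one class is never predicted correctly; macro F1 exposes it.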

What carries the argument

Pattern-aware convolutional architecture with scoring and gating mechanisms that uses translation invariance to dynamically locate and extract salient semantic signatures from localized byte sequences.
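The mechanism described above can be illustrated with a toy sketch. This is not the authors' architecture, whose details the summary does not give: it assumes a single output channel, a sigmoid gate, and a softmax score over window positions, and every name (`pattern_pool`, `rand_kernel`, the random weights) is hypothetical. It only shows why sliding shared filters over bytes and pooling with position scores yields the same feature wherever a pattern occurs.

```python
import math
import random


def conv1d(emb, kernel):
    """Valid 1-D cross-correlation: one scalar per window of len(kernel) vectors."""
    k = len(kernel)
    return [
        sum(x * w for j in range(k) for x, w in zip(emb[i + j], kernel[j]))
        for i in range(len(emb) - k + 1)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def pattern_pool(byte_seq, table, feat_k, gate_k, score_k):
    """Gate each window's feature, then pool windows by their softmax score."""
    emb = [table[b] for b in byte_seq]           # direct byte embedding
    feats = conv1d(emb, feat_k)                  # what the window contains
    gates = [sigmoid(g) for g in conv1d(emb, gate_k)]  # how much to let through
    weights = softmax(conv1d(emb, score_k))      # where to look
    return sum(w * g * f for w, g, f in zip(weights, gates, feats))


def rand_kernel(rng, width, dim):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]


# Hypothetical random parameters; a real model would learn these.
rng = random.Random(0)
DIM, WIDTH = 4, 3
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(256)]
feat_k, gate_k, score_k = (rand_kernel(rng, WIDTH, DIM) for _ in range(3))

# The same 3-byte pattern placed at two offsets in a zero-filled 32-byte buffer.
pattern = bytes([0x16, 0x03, 0x01])
early = bytes(8) + pattern + bytes(21)
late = bytes(20) + pattern + bytes(9)
out_early = pattern_pool(early, table, feat_k, gate_k, score_k)
out_late = pattern_pool(late, table, feat_k, gate_k, score_k)
print(abs(out_early - out_late) < 1e-9)  # True: the pooled feature ignores the offset
```

The two outputs agree because both buffers contain the same set of windows, only in a different order, and the score-weighted sum does not depend on order.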

If this is right

  • Versatile multi-task analysis becomes feasible at the network edge without relying on extensive labeled data.
  • End-to-end latency drops to milliseconds, enabling real-time security operations that avoid service degradation.
  • Direct embedding of raw bytes eliminates bottlenecks from complex preprocessing steps.
  • Multiengine execution on SmartNIC supports collaborative analysis for high-volume traffic.
  • The design stays competitive in macro F1 with global foundation models across diverse encrypted traffic tasks.
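Direct raw-byte embedding, mentioned in the list above, can be sketched as a plain table lookup: each byte value indexes one row of a small table, with no tokenizer or field parser in between. The sketch below is an assumption-laden illustration rather than Nepco's pipeline; `EMBED_DIM`, `MAX_BYTES`, `PAD_ID`, and the random initial values are all hypothetical.

```python
import random

EMBED_DIM = 8    # hypothetical embedding width
MAX_BYTES = 64   # hypothetical fixed input length
PAD_ID = 256     # extra table row used for padding short packets


def make_table(seed=0):
    """256 rows for byte values plus one all-zero padding row."""
    rng = random.Random(seed)
    table = [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)] for _ in range(256)]
    table.append([0.0] * EMBED_DIM)
    return table


def embed_packet(raw: bytes, table):
    """Truncate or pad to MAX_BYTES, then look up one vector per byte."""
    ids = list(raw[:MAX_BYTES])
    ids += [PAD_ID] * (MAX_BYTES - len(ids))
    return [table[i] for i in ids]


table = make_table()
emb = embed_packet(b"\x16\x03\x01", table)  # three bytes, padded to 64 positions
print(len(emb), len(emb[0]))  # 64 8
```

Because the only per-byte work is an index into a fixed table, the cost is constant per byte, which is the property that makes this kind of step attractive on constrained hardware.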

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The localized focus might extend to other data types with regional features, potentially improving efficiency in areas like video stream analysis or log processing.
  • Adapting this to CPU-only environments could test whether the hardware offload is essential or whether the architecture alone suffices.
  • This could inspire hybrid models that switch between localized and global modes based on traffic type.
  • Scaling to larger SmartNIC clusters might handle even higher throughput in data centers.

Load-bearing premise

The assumption that discriminative traffic information is concentrated in localized byte regions, which allows shifting from global to local modeling.

What would settle it

Running Nepco on datasets where critical features span entire packets or across multiple packets, and observing whether the macro F1 remains competitive or the latency reduction holds on the target SmartNIC hardware.
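One way to make such a test concrete is to compare the span of a feature with the receptive field of a convolutional stack, that is, the number of consecutive input bytes a single output position can see. The sketch below uses the standard receptive-field recurrence for 1-D convolutions; the layer configuration is hypothetical and is not Nepco's, and features wider than the window may still interact after pooling, which is exactly what the proposed experiment would probe.

```python
def receptive_field(layers):
    """Input bytes visible to one output position of stacked 1-D convolutions.

    layers: iterable of (kernel_size, stride, dilation), ordered input to output.
    """
    field, jump = 1, 1
    for kernel, stride, dilation in layers:
        field += (kernel - 1) * dilation * jump  # growth contributed by this layer
        jump *= stride                           # spacing between adjacent outputs
    return field


def window_covers(layers, first_byte, last_byte):
    """True if a feature spanning first_byte..last_byte fits in one window."""
    return receptive_field(layers) >= last_byte - first_byte + 1


# Hypothetical stack: three width-3 convolutions with dilations 1, 2, 4.
stack = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]
print(receptive_field(stack))        # 15
print(window_covers(stack, 0, 11))   # True: a 12-byte feature fits
print(window_covers(stack, 0, 63))   # False: a 64-byte dependency does not
```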

Figures

Figures reproduced from arXiv: 2508.02001 by Chungang Lin, Guanming Che, Haitong Luo, Meng Shen, Ruijie Zhao, Ruiqi Meng, Tianyu Zuo, Weiyao Zhang, Xuying Meng, Yujun Zhang, Zhiwei Xu, Ziyue Huang.

Figure 1: Classification performance and model efficiency comparisons between
Figure 2: Comparison of classification performance and model efficiency
Figure 3: Comparison of traffic scalability during fine-tuning between pre
Figure 4: Changes in classification performance of the pre-trained convolutional
Figure 5: The framework of NetConv. to capture localized byte sequence patterns is a promising approach for improving the classification performance of pre-trained convolutional models. In summary, the above observations demonstrate that pre-trained convolutions can effectively address the challenges faced by pre-trained Transformers in terms of model efficiency (O1) and traffic scalability (O2). Furthermore, we ide…
Figure 6: Inference efficiency comparison of different methods.
Figure 7: Comparison results in the few-shot learning scenario. X-shot means the model has access to X labeled traffic samples per traffic category.
Original abstract

Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand edge analysis to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, via network foundation models for low label dependency, and 2) efficient analysis, via hardware offloading for low analysis latency. However, versatility and efficiency have appeared fundamentally incompatible to co-achieve, with prior work consistently sacrificing one for the other, yet we show that this incompatibility is a consequence of polarized design choices across the three components of traffic analysis systems, i.e., traffic processing, model architecture, and analysis execution. In response, we present Nepco, a versatile yet efficient network traffic analysis system that offloads network foundation models to SmartNIC. Our key observation is that discriminative traffic information is concentrated in localized byte regions, motivating versatile yet efficient localized byte-sequence modeling rather than inefficient global modeling. To exploit this without incurring the latency bottlenecks of complex encoding steps, we employ a hardware-friendly processing pipeline that directly embeds raw byte sequences. Crucially, to maintain versatility across diverse tasks, we propose a pattern-aware convolutional architecture equipped with dedicated scoring and gating mechanisms. By exploiting translation invariance, this design dynamically locates and extracts salient semantic signatures. We prototype Nepco on the Nvidia BlueField-3 SmartNIC with multiengine collaborative analysis execution. The experimental results demonstrate that Nepco achieves macro F1 competitive with the best performances achieved by 8 state-of-the-art network foundation models, while reducing end-to-end latency by 328x to the millisecond scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents Nepco, a system for versatile yet efficient network traffic analysis that offloads network foundation models to SmartNIC hardware. Motivated by the observation that discriminative traffic information is concentrated in localized byte regions, it replaces global modeling with localized byte-sequence modeling via a hardware-friendly raw-byte embedding pipeline and a pattern-aware convolutional architecture that uses scoring and gating to exploit translation invariance. Prototyped on Nvidia BlueField-3 with multi-engine collaborative execution, it reports macro F1 scores competitive with the best of 8 state-of-the-art network foundation models while achieving a 328× end-to-end latency reduction to the millisecond scale.

Significance. If the central claims hold under rigorous evaluation, the work is significant because it directly addresses the apparent incompatibility between versatile (low-label) foundation-model approaches and efficient (low-latency) hardware-offloading approaches in encrypted traffic analysis. The combination of raw-byte processing, translation-invariant convolutional design, and SmartNIC offloading offers a concrete path toward edge-deployable, real-time analysis that prior work has treated as fundamentally opposed.

major comments (2)
  1. [Abstract] Abstract (key observation paragraph): The premise that 'discriminative traffic information is concentrated in localized byte regions' is stated as the motivating observation without an accompanying ablation or explicit task coverage. This premise directly justifies the localized modeling choice, the raw-byte pipeline, and the scoring/gating architecture. The manuscript must therefore demonstrate either (a) performance collapse on tasks that require long-range or cross-packet context (e.g., stateful protocol reconstruction, slow-rate DDoS, multi-turn C2) when global context is withheld, or (b) inclusion of such tasks in the reported evaluation suite; absent this evidence the versatility half of the headline result remains under-supported.
  2. [Experimental evaluation] Experimental evaluation section (results paragraph): The abstract asserts macro F1 competitiveness with 8 SOTA foundation models and a 328× latency reduction, yet the provided text supplies no dataset names, task definitions, baseline implementations, cross-validation details, or error bars. Because these quantities are load-bearing for both the accuracy and latency claims, the full manuscript must report them with sufficient granularity to allow independent verification.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'multiengine collaborative analysis execution' is introduced without a one-sentence description of the collaboration mechanism or how it interacts with the convolutional pipeline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of supporting our key claims on localized modeling and ensuring experimental reproducibility. We address each major comment below and will revise the manuscript to strengthen these elements while preserving the core contributions.

Point-by-point responses
  1. Referee: [Abstract] Abstract (key observation paragraph): The premise that 'discriminative traffic information is concentrated in localized byte regions' is stated as the motivating observation without an accompanying ablation or explicit task coverage. This premise directly justifies the localized modeling choice, the raw-byte pipeline, and the scoring/gating architecture. The manuscript must therefore demonstrate either (a) performance collapse on tasks that require long-range or cross-packet context (e.g., stateful protocol reconstruction, slow-rate DDoS, multi-turn C2) when global context is withheld, or (b) inclusion of such tasks in the reported evaluation suite; absent this evidence the versatility half of the headline result remains under-supported.

    Authors: We agree that the motivating observation benefits from explicit support to fully substantiate the versatility claim. Our reported evaluation already spans multiple tasks with varying context requirements, including encrypted traffic classification and intrusion detection scenarios that implicitly test localized pattern extraction. To directly address the concern, we will add a targeted ablation study in the revised manuscript comparing localized versus global modeling on tasks such as stateful protocol reconstruction and slow-rate DDoS detection. This will quantify any performance differences and demonstrate that the localized approach maintains competitive results without requiring full global context for the tasks considered. revision: yes

  2. Referee: [Experimental evaluation] Experimental evaluation section (results paragraph): The abstract asserts macro F1 competitiveness with 8 SOTA foundation models and a 328× latency reduction, yet the provided text supplies no dataset names, task definitions, baseline implementations, cross-validation details, or error bars. Because these quantities are load-bearing for both the accuracy and latency claims, the full manuscript must report them with sufficient granularity to allow independent verification.

    Authors: We acknowledge that the experimental section requires greater explicitness for independent verification. The full manuscript contains the underlying dataset details, task definitions, and baseline comparisons, but these are not presented with sufficient granularity in the current draft. We will revise the experimental evaluation section to include a summary table listing all datasets (with references and statistics), precise task definitions, baseline model implementations and citations, cross-validation methodology, and error bars (standard deviation across runs) for both macro F1 scores and end-to-end latency measurements. This will make the competitiveness and latency reduction claims fully verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on experimental validation rather than self-referential derivation

Full rationale

The paper's central claims of competitive macro F1 and 328x latency reduction are presented as direct experimental outcomes from a prototype on Nvidia BlueField-3 SmartNIC. The key observation about localized byte regions is explicitly framed as a motivating assumption that leads to design choices (localized modeling, raw-byte embedding, pattern-aware convolution), but no equations, fitted parameters, or self-citations are shown that reduce these claims or the architecture back to the inputs by construction. The derivation chain is self-contained because performance is measured externally against 8 SOTA models rather than predicted from internal fits or prior author results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the unverified observation that discriminative information is localized in byte regions and on the assumption that a convolutional architecture with scoring/gating preserves versatility across tasks. No explicit free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption: Discriminative traffic information is concentrated in localized byte regions rather than requiring global modeling.
    This is presented as the key observation motivating the entire design; if false, the localized modeling approach would not deliver versatility.



