Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC
Pith reviewed 2026-05-19 01:28 UTC · model grok-4.3
The pith
Nepco offloads network foundation models to a SmartNIC to deliver versatile traffic analysis with millisecond-scale latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present Nepco, which offloads network foundation models to a SmartNIC for traffic analysis. The key observation is that discriminative information concentrates in localized byte regions. This motivates a hardware-friendly pipeline that embeds raw byte sequences directly, and a pattern-aware convolutional architecture whose scoring and gating mechanisms exploit translation invariance to extract semantic signatures. Prototyped on the Nvidia BlueField-3, Nepco achieves macro F1 competitive with the best of eight state-of-the-art network foundation models while reducing end-to-end latency by 328x, to the millisecond scale.
What carries the argument
Pattern-aware convolutional architecture with scoring and gating mechanisms that uses translation invariance to dynamically locate and extract salient semantic signatures from localized byte sequences.
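A minimal sketch of why a convolution with scoring and gating can be translation invariant. This is not the paper's NetConv implementation: the embedding width, kernel width, softmax scoring, sigmoid gating, random (untrained) weights, and all function names below are assumptions chosen for illustration. The same byte signature is placed at two offsets in a constant-filled payload, and the pooled output is compared.

```python
import math
import random

EMBED_DIM = 4      # hypothetical embedding width
KERNEL_WIDTH = 3   # hypothetical convolution window, in bytes


def make_params(seed=0):
    """Random byte-embedding table and one convolution kernel (untrained)."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(256)]
    kernel = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for _ in range(KERNEL_WIDTH)]
    return table, kernel


def window_responses(payload, table, kernel):
    """Embed raw bytes directly, then slide the kernel over every window."""
    x = [table[b] for b in payload]
    responses = []
    for t in range(len(x) - len(kernel) + 1):
        r = 0.0
        for j, weights in enumerate(kernel):
            r += sum(a * w for a, w in zip(x[t + j], weights))
        responses.append(r)
    return responses


def pooled_signature(payload, table, kernel):
    """Score each window (softmax), gate it (sigmoid), and pool to one value."""
    r = window_responses(payload, table, kernel)
    peak = max(r)
    exps = [math.exp(v - peak) for v in r]          # numerically stable softmax
    total = sum(exps)
    scores = [e / total for e in exps]              # assumed window-wise scoring
    gates = [1.0 / (1.0 + math.exp(-v)) for v in r] # assumed gating
    return sum(s * g * v for s, g, v in zip(scores, gates, r))


def place(signature, offset, length=64, fill=0):
    """Build a payload of constant filler bytes with the signature at offset."""
    buf = [fill] * length
    buf[offset:offset + len(signature)] = signature
    return bytes(buf)


if __name__ == "__main__":
    table, kernel = make_params()
    signature = [0x16, 0x03, 0x01, 0x02]  # illustrative bytes only
    early = pooled_signature(place(signature, 5), table, kernel)
    late = pooled_signature(place(signature, 40), table, kernel)
    print(early, late, math.isclose(early, late, abs_tol=1e-9))
```

Because the kernel weights are shared across positions and the pooling ignores position, the two outputs should agree up to floating-point rounding as long as the signature sits at least one kernel width away from the payload edges. With real traffic the surrounding bytes differ, so invariance holds only approximately.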
If this is right
- Versatile multi-task analysis becomes feasible at the network edge without relying on extensive labeled data.
- End-to-end latency drops to milliseconds, enabling real-time security operations that avoid service degradation.
- Direct embedding of raw bytes eliminates bottlenecks from complex preprocessing steps.
- Multi-engine execution on the SmartNIC supports collaborative analysis for high-volume traffic.
- The design keeps macro F1 competitive with global foundation models across diverse encrypted traffic tasks.
Where Pith is reading between the lines
- The localized focus might extend to other data types with regional features, potentially improving efficiency in areas like video stream analysis or log processing.
- Adapting this to CPU-only environments could test whether the hardware offload is essential or the architecture alone suffices.
- This could inspire hybrid models that switch between localized and global modes based on traffic type.
- Scaling to larger SmartNIC clusters might handle even higher throughput in data centers.
Load-bearing premise
The assumption that discriminative traffic information is concentrated in localized byte regions, which allows shifting from global to local modeling.
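One way to probe this premise on a labeled dataset is sketched below. It is an assumption of this review, not a procedure from the paper: estimate the mutual information between the byte value at each offset and the class label, then check how much of the total falls in the top few offsets. The function names and the toy payloads are hypothetical.

```python
import math
from collections import Counter


def mutual_information(values, labels):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(values)
    joint = Counter(zip(values, labels))
    p_val = Counter(values)
    p_lab = Counter(labels)
    mi = 0.0
    for (v, l), c in joint.items():
        mi += (c / n) * math.log2(c * n / (p_val[v] * p_lab[l]))
    return mi


def per_offset_mi(payloads, labels, length):
    """Mutual information between the byte at each offset and the label."""
    return [mutual_information([p[i] for p in payloads], labels)
            for i in range(length)]


def top_k_share(mi, k):
    """Fraction of total per-offset information held by the k best offsets."""
    total = sum(mi)
    if total == 0:
        return 0.0
    return sum(sorted(mi, reverse=True)[:k]) / total


if __name__ == "__main__":
    # Toy data: only offset 2 depends on the label.
    labels = [0, 0, 1, 1]
    payloads = [bytes([1, 2, 0xAA, 4]), bytes([5, 6, 0xAA, 8]),
                bytes([1, 2, 0xBB, 4]), bytes([5, 6, 0xBB, 8])]
    mi = per_offset_mi(payloads, labels, 4)
    print(mi, top_k_share(mi, 1))
```

A share close to 1 for small k would be consistent with localized information. Per-offset statistics cannot see features that only emerge from combinations of distant bytes or from multiple packets, so a high share does not by itself rule out the long-range cases raised in the referee report.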
What would settle it
Running Nepco on datasets where critical features span entire packets or across multiple packets, and observing whether the macro F1 remains competitive or the latency reduction holds on the target SmartNIC hardware.
Original abstract
Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand edge analysis to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, via network foundation models for low label dependency, and 2) efficient analysis, via hardware offloading for low analysis latency. However, versatility and efficiency have appeared fundamentally incompatible to co-achieve, with prior work consistently sacrificing one for the other, yet we show that this incompatibility is a consequence of polarized design choices across the three components of traffic analysis systems, i.e., traffic processing, model architecture, and analysis execution. In response, we present Nepco, a versatile yet efficient network traffic analysis system that offloads network foundation models to SmartNIC. Our key observation is that discriminative traffic information is concentrated in localized byte regions, motivating versatile yet efficient localized byte-sequence modeling rather than inefficient global modeling. To exploit this without incurring the latency bottlenecks of complex encoding steps, we employ a hardware-friendly processing pipeline that directly embeds raw byte sequences. Crucially, to maintain versatility across diverse tasks, we propose a pattern-aware convolutional architecture equipped with dedicated scoring and gating mechanisms. By exploiting translation invariance, this design dynamically locates and extracts salient semantic signatures. We prototype Nepco on the Nvidia BlueField-3 SmartNIC with multiengine collaborative analysis execution. The experimental results demonstrate that Nepco achieves macro F1 competitive with the best performances achieved by 8 state-of-the-art network foundation models, while reducing end-to-end latency by 328x to the millisecond scale.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Nepco, a system for versatile yet efficient network traffic analysis that offloads network foundation models to SmartNIC hardware. Motivated by the observation that discriminative traffic information is concentrated in localized byte regions, it replaces global modeling with localized byte-sequence modeling via a hardware-friendly raw-byte embedding pipeline and a pattern-aware convolutional architecture that uses scoring and gating to exploit translation invariance. Prototyped on Nvidia BlueField-3 with multi-engine collaborative execution, it reports macro F1 scores competitive with the best of 8 state-of-the-art network foundation models while achieving a 328× end-to-end latency reduction to the millisecond scale.
Significance. If the central claims hold under rigorous evaluation, the work is significant because it directly addresses the apparent incompatibility between versatile (low-label) foundation-model approaches and efficient (low-latency) hardware-offloading approaches in encrypted traffic analysis. The combination of raw-byte processing, translation-invariant convolutional design, and SmartNIC offloading offers a concrete path toward edge-deployable, real-time analysis that prior work has treated as fundamentally opposed.
major comments (2)
- [Abstract] Abstract (key observation paragraph): The premise that 'discriminative traffic information is concentrated in localized byte regions' is stated as the motivating observation without an accompanying ablation or explicit task coverage. This premise directly justifies the localized modeling choice, the raw-byte pipeline, and the scoring/gating architecture. The manuscript must therefore demonstrate either (a) performance collapse on tasks that require long-range or cross-packet context (e.g., stateful protocol reconstruction, slow-rate DDoS, multi-turn C2) when global context is withheld, or (b) inclusion of such tasks in the reported evaluation suite; absent this evidence the versatility half of the headline result remains under-supported.
- [Experimental evaluation] Experimental evaluation section (results paragraph): The abstract asserts macro F1 competitiveness with 8 SOTA foundation models and a 328× latency reduction, yet the provided text supplies no dataset names, task definitions, baseline implementations, cross-validation details, or error bars. Because these quantities are load-bearing for both the accuracy and latency claims, the full manuscript must report them with sufficient granularity to allow independent verification.
minor comments (1)
- [Abstract] Abstract: The phrase 'multiengine collaborative analysis execution' is introduced without a one-sentence description of the collaboration mechanism or how it interacts with the convolutional pipeline.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of supporting our key claims on localized modeling and ensuring experimental reproducibility. We address each major comment below and will revise the manuscript to strengthen these elements while preserving the core contributions.
Point-by-point responses
- Referee: [Abstract] Abstract (key observation paragraph): The premise that 'discriminative traffic information is concentrated in localized byte regions' is stated as the motivating observation without an accompanying ablation or explicit task coverage. This premise directly justifies the localized modeling choice, the raw-byte pipeline, and the scoring/gating architecture. The manuscript must therefore demonstrate either (a) performance collapse on tasks that require long-range or cross-packet context (e.g., stateful protocol reconstruction, slow-rate DDoS, multi-turn C2) when global context is withheld, or (b) inclusion of such tasks in the reported evaluation suite; absent this evidence the versatility half of the headline result remains under-supported.
Authors: We agree that the motivating observation benefits from explicit support to fully substantiate the versatility claim. Our reported evaluation already spans multiple tasks with varying context requirements, including encrypted traffic classification and intrusion detection scenarios that implicitly test localized pattern extraction. To directly address the concern, we will add a targeted ablation study in the revised manuscript comparing localized versus global modeling on tasks such as stateful protocol reconstruction and slow-rate DDoS detection. This will quantify any performance differences and demonstrate that the localized approach maintains competitive results without requiring full global context for the tasks considered. revision: yes
- Referee: [Experimental evaluation] Experimental evaluation section (results paragraph): The abstract asserts macro F1 competitiveness with 8 SOTA foundation models and a 328× latency reduction, yet the provided text supplies no dataset names, task definitions, baseline implementations, cross-validation details, or error bars. Because these quantities are load-bearing for both the accuracy and latency claims, the full manuscript must report them with sufficient granularity to allow independent verification.
Authors: We acknowledge that the experimental section requires greater explicitness for independent verification. The full manuscript contains the underlying dataset details, task definitions, and baseline comparisons, but these are not presented with sufficient granularity in the current draft. We will revise the experimental evaluation section to include a summary table listing all datasets (with references and statistics), precise task definitions, baseline model implementations and citations, cross-validation methodology, and error bars (standard deviation across runs) for both macro F1 scores and end-to-end latency measurements. This will make the competitiveness and latency reduction claims fully verifiable. revision: yes
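For reference, the two reporting quantities discussed in this exchange can be computed as in the sketch below. Macro F1 is the unweighted mean of per-class F1 scores, and the error bar shown is the sample standard deviation across runs. The function names, the toy labels, and the per-run scores are placeholders for illustration and are not values from the paper.

```python
from statistics import mean, stdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def summarize(runs):
    """Mean and sample standard deviation across repeated runs."""
    return mean(runs), stdev(runs)


def speedup(baseline_latency, system_latency):
    """Latency reduction factor; both arguments must use the same unit."""
    return baseline_latency / system_latency


if __name__ == "__main__":
    # Placeholder predictions for a three-class task.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(macro_f1(y_true, y_pred))

    # Placeholder macro F1 from three hypothetical runs.
    print(summarize([0.90, 0.92, 0.94]))
```

Reporting the mean and standard deviation for both macro F1 and end-to-end latency, per dataset and per baseline, is what would let a reader judge whether "competitive" differences fall within run-to-run noise.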
Circularity Check
No circularity; claims rest on experimental validation rather than self-referential derivation
Full rationale
The paper's central claims of competitive macro F1 and 328x latency reduction are presented as direct experimental outcomes from a prototype on Nvidia BlueField-3 SmartNIC. The key observation about localized byte regions is explicitly framed as a motivating assumption that leads to design choices (localized modeling, raw-byte embedding, pattern-aware convolution), but no equations, fitted parameters, or self-citations are shown that reduce these claims or the architecture back to the inputs by construction. The derivation chain is self-contained because performance is measured externally against 8 SOTA models rather than predicted from internal fits or prior author results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: discriminative traffic information is concentrated in localized byte regions rather than requiring global modeling.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Our key observation is that discriminative traffic information is concentrated in localized byte regions, motivating versatile yet efficient localized byte-sequence modeling rather than inefficient global modeling."
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "NetConv employs stacked traffic convolution layers, which enhance the ability to capture localized byte-sequence patterns through window-wise byte scoring and sequence-wise byte gating."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.