TraceCodec: A Compiler-Backed Neural Codec for Stateful Multi-Flow Network Traffic Traces

Junhui Ding; Shinan Liu; Xiaohui Xie; Xinchen Zhang

arxiv: 2605.29941 · v1 · pith:G2OMQZTSnew · submitted 2026-05-28 · 💻 cs.NI · cs.LG

TraceCodec: A Compiler-Backed Neural Codec for Stateful Multi-Flow Network Traffic Traces

Junhui Ding , Xinchen Zhang , Xiaohui Xie , Shinan Liu This is my paper

Pith reviewed 2026-06-29 00:32 UTC · model grok-4.3

classification 💻 cs.NI cs.LG

keywords neural codecpacket trace generationmulti-flow network trafficTCP state preservationcompiler-backed decodingCICIDS2017

0 comments

The pith

TraceCodec lifts packets to timed actions with flow slots, then uses a deterministic compiler to render valid PCAPs that match real traces to 0.03%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that neural packet generators fail when they decode directly to raw header fields because that choice mixes learned behavior with deterministic protocol rules and forces heuristic repairs afterward. TraceCodec instead represents each packet as an explicit timed action carrying flow identity and transport cues, trains a latent space over those actions, and delegates all state management and legality to a fixed compiler that produces the final trace. On the CICIDS2017 Monday capture this separation lets the model reproduce packet counts, protocol mix, and flow population within 0.03 percent while raw-field baselines under the same no-repair rule distort those quantities by orders of magnitude. The result matters for any workflow that needs realistic multi-flow PCAPs for testing or security without exposing live traffic.

Core claim

TraceCodec lifts each packet into a timed packet action with explicit flow slots and transport cues, learns a continuous per-packet latent, and lowers the decoded actions to PCAPs via a deterministic compiler that owns endpoint assignment, TCP state, legality constraints, and packet rendering.

What carries the argument

State-aware neural codec that decouples a learned latent over packet actions from a deterministic compiler that enforces protocol constraints and produces the final trace.

If this is right

Downstream traffic models can generate sequences in the packet-action latent space instead of raw header fields.
TCP state transitions and multi-flow interleaving remain intact because the compiler, not the neural decoder, enforces them.
Packet count, protocol composition, and flow population match the source trace to within 0.03 percent under a non-repair policy.
Structural diagnostics confirm preservation of stateful behavior that raw-field decoders fragment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same compiler interface could be swapped for other protocol stacks if the action vocabulary and legality rules are rewritten.
Synthetic traces produced this way could serve as drop-in replacements for real PCAPs in privacy-sensitive evaluation pipelines.
The latent space might support controlled editing of traffic properties such as flow duration or protocol mix by operating directly on the continuous actions.

Load-bearing premise

The deterministic compiler can correctly own endpoint assignment, TCP state, legality constraints, and packet rendering for all decoded actions without introducing systematic biases that affect the learned latent space.

What would settle it

Generate traces from the model on a held-out day of CICIDS2017 or another capture; if TCP state-transition frequencies or active-flow counts deviate by more than 1 percent from the real trace while the same non-repair raw-field baseline stays closer, the separation claim is falsified.

Figures

Figures reproduced from arXiv: 2605.29941 by Junhui Ding, Shinan Liu, Xiaohui Xie, Xinchen Zhang.

**Figure 1.** Figure 1: Comparison of decode interfaces. Raw-field decode entangles behavior with protocol consequences, making generated rows repair-dependent. TraceCodec learns packet actions above a clean contract and compiles them into state-consistent packets. The core problem is the decode interface. Raw packet fields mix two kinds of variables: behavioral choices that a model should learn and protocol consequences that pac… view at source ↗

**Figure 2.** Figure 2: TRACECODEC overview. Packet traces are lifted into timed packet actions and continuous per-packet latents; decoding maps sampled latents back to timed actions before deterministic state realization emits the final PCAP [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Protocol state preservation analysis on CICIDS2017 Monday. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: TCP transition and state consistency on CICIDS2017 Monday. Each heatmap is a row [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Active-flow and multi-flow interleaving on CICIDS2017 Monday. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Protocol state preservation analysis on MAWI. This repeats the decoded-PCAP diagnostic [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗

**Figure 7.** Figure 7: TCP transition and state consistency on MAWI. Each heatmap is a row-normalized transition [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Active-flow and multi-flow interleaving on MAWI, using the same structural diagnostics as [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

read the original abstract

Critical networking workflows require high-fidelity packet captures (PCAPs) for testing, security analysis, and protocol validation, not just statistical flow-level summaries. Recent packet generators have demonstrated protocol-constrained PCAP synthesis, but they universally decode directly to raw packet fields. That interface entangles learned behavioral choices with deterministic protocol consequences, which forces packet realization to depend on post-hoc heuristic repair. We identify this decode interface as the fundamental bottleneck and present TraceCodec, a state-aware neural codec for stateful multi-flow traces. TraceCodec lifts each packet into a timed packet action with explicit flow slots and transport cues, then learns a continuous per-packet latent. A deterministic compiler lowers decoded actions back to PCAPs, owning endpoint assignment, TCP state, legality constraints, and packet rendering. The latent layer exposes a generator-facing sequence space, so downstream traffic models can operate on packet-action latents rather than raw header fields. On CICIDS2017 Monday, TraceCodec matches packet count, protocol composition, and flow population to within 0.03%. Raw-field baselines under the same non-repair policy distort flow counts and TCP state by orders of magnitude. Structural diagnostics show that TraceCodec preserves TCP state transitions and multi-flow interleaving that raw-field decoders fragment. This work establishes a new foundation for high-fidelity packet-trace generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TraceCodec's split between a neural latent on packet actions and a deterministic compiler for state looks like a useful interface move, but the 0.03% fidelity numbers rest on an underspecified compiler whose rules are not shown or ablated.

read the letter

The new piece is the explicit handoff: the model outputs timed actions with flow slots and transport cues, then a compiler owns endpoint assignment, TCP state, legality, and rendering. That separation lets downstream generators work in a cleaner sequence space instead of raw headers, and the abstract positions it as fixing the repair problem in prior packet generators.

On CICIDS2017 Monday the numbers are striking—packet count, protocol mix, and flow population within 0.03% while raw-field baselines under the same non-repair rule blow up flow counts and TCP state. The structural checks on state transitions and interleaving also give some evidence the outputs stay coherent.

The main gap is that nothing is said about the compiler itself: its state machine, handling of ambiguous flags, retransmissions, or multi-flow slot resolution. If those rules quietly enforce constraints that the baselines are denied, the performance difference is an artifact of the interface rather than proof the latent learned better behavior. No pseudocode, edge cases, or ablation isolating the compiler appears in the abstract, and the usual details on splits, error bars, or baseline implementations are absent.

This is for people who generate synthetic traces for security testing or protocol work. The framing is worth a serious referee if the full paper supplies the compiler specification and ablations; without them the central claim stays provisional.

Referee Report

2 major / 2 minor

Summary. The paper presents TraceCodec, a state-aware neural codec for stateful multi-flow network traffic traces. Packets are lifted to timed actions with explicit flow slots and transport cues; a continuous per-packet latent is learned; and a deterministic compiler lowers the actions to PCAPs by owning endpoint assignment, TCP state tracking, legality constraints, and packet rendering. On CICIDS2017 Monday the method matches packet count, protocol composition, and flow population to within 0.03 % while preserving TCP state transitions and multi-flow interleaving; raw-field baselines under the same non-repair policy distort these quantities by orders of magnitude.

Significance. If the compiler rules prove to be dataset-independent and the quantitative and structural results are reproducible, the separation of learned action latents from deterministic protocol lowering would constitute a substantive advance for high-fidelity PCAP synthesis and for downstream generators that can operate directly on the latent space.

major comments (2)

[§4] §4 (Compiler): no pseudocode, state-machine diagram, or enumeration of edge cases is supplied for the compiler’s handling of TCP flags, retransmissions, ambiguous multi-flow slot resolution, or endpoint assignment. Without this specification it is impossible to confirm that the reported 0.03 % fidelity gap is produced by the neural latent rather than by unstated deterministic rules unavailable to the raw-field baselines.
[§5] §5 (Evaluation): the 0.03 % match on packet count, protocol composition, and flow population is presented without dataset splits, baseline source code or hyper-parameters, error bars, or ablations that isolate the compiler’s contribution. The structural diagnostics on TCP state transitions are likewise reported without quantitative metrics or statistical tests.

minor comments (2)

[Abstract / §3] Notation for “timed packet action” and “flow slot” is introduced in the abstract but never given a formal definition or example before the results section.
[Figures in §5] Figure captions for the structural diagnostics do not state the exact CICIDS2017 subset or the number of flows examined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us identify areas for improvement in the presentation of TraceCodec. We provide point-by-point responses to the major comments and will update the manuscript accordingly to enhance reproducibility and clarity.

read point-by-point responses

Referee: [§4] §4 (Compiler): no pseudocode, state-machine diagram, or enumeration of edge cases is supplied for the compiler’s handling of TCP flags, retransmissions, ambiguous multi-flow slot resolution, or endpoint assignment. Without this specification it is impossible to confirm that the reported 0.03 % fidelity gap is produced by the neural latent rather than by unstated deterministic rules unavailable to the raw-field baselines.

Authors: We concur that the manuscript would benefit from more explicit documentation of the compiler. We will revise §4 to include pseudocode for the main compiler routines, a state-machine diagram depicting TCP state handling, and a list of edge cases addressed, such as flag combinations, retransmission scenarios, and multi-flow ambiguities. This addition will make clear how the deterministic rules operate independently of the learned latents and ensure they are not the sole source of the observed fidelity. revision: yes
Referee: [§5] §5 (Evaluation): the 0.03 % match on packet count, protocol composition, and flow population is presented without dataset splits, baseline source code or hyper-parameters, error bars, or ablations that isolate the compiler’s contribution. The structural diagnostics on TCP state transitions are likewise reported without quantitative metrics or statistical tests.

Authors: We agree that the evaluation section requires additional details for full reproducibility and to better isolate contributions. In the revised manuscript, we will report the specific dataset splits, provide hyper-parameters, include error bars from repeated experiments, and present ablations focusing on the compiler component. We will also augment the structural analysis with quantitative metrics for TCP state preservation and statistical significance tests. The baseline source code will be released alongside the paper to allow direct comparison. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation; empirical fidelity claims are independent of fitted inputs

full rationale

The provided abstract and context contain no equations, fitted parameters, or self-citations that reduce any reported result to a self-referential definition or construction. The 0.03% fidelity match on CICIDS2017 is presented as an empirical outcome of the neural latent plus deterministic compiler, with the compiler explicitly positioned as external and non-repair. No load-bearing step equates a prediction to its input by definition, renames a known result, or imports uniqueness from prior author work. The central claim remains falsifiable against the external dataset under the stated non-repair policy, satisfying the criteria for a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all such elements would require the methods and equations sections of the full manuscript.

pith-pipeline@v0.9.1-grok · 5781 in / 1024 out tokens · 20309 ms · 2026-06-29T00:32:33.388603+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 16 canonical work pages

[1]

Scapy: Manipulate packets

Philippe Biondi and Scapy community. Scapy: Manipulate packets. https://scapy.net/,
[2]

Accessed: 2026-05-06

2026
[3]

Traffic data repository at the WIDE project

Kenjiro Cho, Kenji Mitsuya, and Akira Kato. Traffic data repository at the WIDE project. InProceedings of the 2000 USENIX Annual Technical Conference, FREENIX Track, pages 263–270, 2000. URLhttps://mawi.wide.ad.jp/mawi/

2000
[4]

Schmitt, and Nick Feamster

Andrew Chu, Xi Jiang, Shinan Liu, Arjun Nitin Bhagoji, Francesco Bronzino, Paul J. Schmitt, and Nick Feamster. Netssm: Multi-flow and state-aware network trace generation using state- space models.Proceedings of the ACM on Networking, 4(CoNEXT1), 2026. doi: 10.1145/ 3786289. URLhttps://doi.org/10.1145/3786289

work page doi:10.1145/3786289 2026
[5]

Trafficllm: Enhancing large language models for network traffic analysis with generic traffic representation, 2025

Tianyu Cui, Xinjie Lin, Sijia Li, Miao Chen, Qilei Yin, Qi Li, and Ke Xu. Trafficllm: Enhancing large language models for network traffic analysis with generic traffic representation, 2025. URLhttps://arxiv.org/abs/2504.04222. arXiv preprint arXiv:2504.04222

arXiv 2025
[6]

Flowchronicle: synthetic network flow generation through pattern set mining.Proceedings of the ACM on Networking, 2(CoNEXT4):1–20, 2024

Joscha Cüppers, Adrien Schoen, Gregory Blanc, and Pierre-Francois Gimenez. Flowchronicle: synthetic network flow generation through pattern set mining.Proceedings of the ACM on Networking, 2(CoNEXT4):1–20, 2024

2024
[7]

Dpdk – The open source data plane development kit accelerating network performance.https://www.dpdk.org/, 2026

DPDK Project. Dpdk – The open source data plane development kit accelerating network performance.https://www.dpdk.org/, 2026. Accessed: 2026-05-06

2026
[8]

PACC: Protocol- aware cross-layer compression for compact network traffic representation, 2026

Zhaochen Guo, Tianyufei Zhou, Honghao Wang, Ronghua Li, and Shinan Liu. PACC: Protocol- aware cross-layer compression for compact network traffic representation, 2026. URLhttps: //arxiv.org/abs/2602.08331. arXiv preprint arXiv:2602.08331

arXiv 2026
[9]

Generative active adapta- tion for drifting and imbalanced network intrusion detection.arXiv preprint arXiv:2503.03022, 2025

Ragini Gupta, Shinan Liu, Ruixiao Zhang, Xinyue Hu, Xiaoyang Wang, Hadjer Benkraouda, Pranav Kommaraju, Phuong Cao, Nick Feamster, and Klara Nahrstedt. Generative active adapta- tion for drifting and imbalanced network intrusion detection.arXiv preprint arXiv:2503.03022, 2025

arXiv 2025
[10]

netfound: Foundation model for network security, 2023

Satyandra Guthula, Navya Battula, Roman Beltiukov, Wenbo Guo, and Arpit Gupta. netfound: Foundation model for network security, 2023. URL https://arxiv.org/abs/2310.17025. CoRR abs/2310.17025

Pith/arXiv arXiv 2023
[11]

Payload encoding represen- tation from transformer for encrypted traffic classification.ZTE Communica- tions, 19(4):90–97, 2021

Hongye He, Zhiguo Yang, and Xiangning Chen. Payload encoding represen- tation from transformer for encrypted traffic classification.ZTE Communica- tions, 19(4):90–97, 2021. URL https://www.zte.com.cn/global/about/magazine/ zte-communications/2021/en202104/researchpaper/en202104010.html

2021
[12]

New directions in automated traffic analysis

Jordan Holland, Paul Schmitt, Nick Feamster, and Prateek Mittal. New directions in automated traffic analysis. InProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 3366–3383, 2021. doi: 10.1145/3460120.3484758

work page doi:10.1145/3460120.3484758 2021
[13]

Flow-based encrypted network traffic clas- sification with graph neural networks.IEEE Transactions on Network and Service Management, 20(2):1224–1237, 2023

Ting-Li Huoh, Yan Luo, Peilong Li, and Tong Zhang. Flow-based encrypted network traffic clas- sification with graph neural networks.IEEE Transactions on Network and Service Management, 20(2):1224–1237, 2023. doi: 10.1109/TNSM.2022.3227500

work page doi:10.1109/tnsm.2022.3227500 2023
[14]

Schmitt, Francesco Bronzino, and Nick Feamster

Xi Jiang, Shinan Liu, Aaron Gember-Jacobson, Arjun Nitin Bhagoji, Paul J. Schmitt, Francesco Bronzino, and Nick Feamster. Netdiffusion: Network data augmentation through protocol- constrained traffic generation.Proceedings of the ACM on Measurement and Analysis of Computing Systems, 8(1), 2024. doi: 10.1145/3639037. URL https://doi.org/10.1145/ 3639037

work page doi:10.1145/3639037 2024
[15]

Robustifying {ML-powered} network classifiers with {PANTS}

Minhao Jin and Maria Apostolaki. Robustifying {ML-powered} network classifiers with {PANTS}. In34th USENIX Security Symposium (USENIX Security 25), pages 7291–7310, 2025

2025
[16]

Et-bert: A contextu- alized datagram representation with pre-training transformers for encrypted traffic classification,

Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, and Jing Yu. Et-bert: A contextu- alized datagram representation with pre-training transformers for encrypted traffic classification,
[17]

arXiv preprint arXiv:2202.06335

URLhttps://arxiv.org/abs/2202.06335. arXiv preprint arXiv:2202.06335. 10

arXiv
[18]

FS-Net: A flow sequence network for encrypted traffic classification

Chang Liu, Longtao He, Gang Xiong, Zigang Cao, and Zhen Li. FS-Net: A flow sequence network for encrypted traffic classification. InIEEE INFOCOM 2019 – IEEE Conference on Computer Communications, pages 1171–1179, 2019. doi: 10.1109/INFOCOM.2019.8737507

work page doi:10.1109/infocom.2019.8737507 2019
[19]

Shinan Liu, Tarun Mangla, Ted Shaowang, Jinjin Zhao, John Paparrizos, Sanjay Krishnan, and Nick Feamster. Amir: Active multimodal interaction recognition from video and network traffic in connected environments.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(1):1–26, 2023

2023
[20]

Goggle: Generative modelling for tabular data by learning relational structure

Tennison Liu, Zhaozhi Qian, Jeroen Berrevoets, and Mihaela van der Schaar. Goggle: Generative modelling for tabular data by learning relational structure. InInternational Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=fPVRcJqspu

2023
[21]

Deep packet: A novel approach for encrypted traffic classification using deep learning.Soft Computing, 24(3):1999–2012, 2020

Mohammad Lotfollahi, Mahdi Jafari Siavoshani, Ramin Shirali Hossein Zade, and Mohammd- sadegh Saberian. Deep packet: A novel approach for encrypted traffic classification using deep learning.Soft Computing, 24(3):1999–2012, 2020. doi: 10.1007/s00500-019-04030-2

work page doi:10.1007/s00500-019-04030-2 1999
[22]

Packet representation learning for traffic classification

Xuying Meng, Yequan Wang, Runxin Ma, Haitong Luo, Xiang Li, and Yujun Zhang. Packet representation learning for traffic classification. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3546–3554, 2022. doi: 10.1145/ 3534678.3539085

arXiv 2022
[23]

Net- mamba: Efficient network traffic classification via pre-training unidirectional mamba

Lingfeng Peng, Xiaohui Xie, Sijiang Huang, Ziyi Wang, and Yong Cui. PTU: Pre-trained model for network traffic understanding. In2024 IEEE 32nd International Conference on Network Protocols (ICNP), pages 1–12, 2024. doi: 10.1109/ICNP61940.2024.10858503

work page doi:10.1109/icnp61940.2024.10858503 2024
[24]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2022
[25]

Ghorbani

Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. Toward generating a new intrusion detection dataset and intrusion traffic characterization. InProceedings of the 4th International Conference on Information Systems Security and Privacy, pages 108–116, 2018. doi: 10.5220/0006639801080116. URL https://www.unb.ca/cic/datasets/ids-2017. html

work page doi:10.5220/0006639801080116 2018
[26]

Packet analysis for network forensics: A comprehensive survey.Forensic Science International: Digital Investigation, 32:200892, 2020

Leslie F Sikos. Packet analysis for network forensics: A comprehensive survey.Forensic Science International: Digital Investigation, 32:200892, 2020

2020
[27]

Tcpreplay: Pcap editing and replaying utilities

Aaron Turner and Fred Klassen. Tcpreplay: Pcap editing and replaying utilities. https: //github.com/appneta/tcpreplay, 2024. Accessed: 2026-05-06

2024
[28]

Neural discrete representation learning

Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. InAdvances in Neural Information Processing Systems, 2017

2017
[29]

Dubois, Martina Lindorfer, David R

Thijs van Ede, Riccardo Bortolameotti, Andrea Continella, Jingjing Ren, Daniel J. Dubois, Martina Lindorfer, David R. Choffnes, Maarten van Steen, and Andreas Peter. Flowprint: Semi-supervised mobile-app fingerprinting on encrypted network traffic. InProceedings of the Network and Distributed System Security Symposium (NDSS), 2020. doi: 10.14722/ndss.2020. 24412

work page doi:10.14722/ndss.2020 2020
[30]

Wang and Binh P

Alex X. Wang and Binh P. Nguyen. Ttvae: Transformer-based generative modeling for tabular data generation.Artificial Intelligence, 340:104292, 2025. doi: 10.1016/j.artint.2025.104292. URLhttps://www.sciencedirect.com/science/article/pii/S0004370225000116

work page doi:10.1016/j.artint.2025.104292 2025
[31]

Net- mamba: Efficient network traffic classification via pre-training unidirectional mamba

Tongze Wang, Xiaohui Xie, Wenduo Wang, Chuyi Wang, Youjian Zhao, and Yong Cui. Net- mamba: Efficient network traffic classification via pre-training unidirectional mamba. In2024 IEEE 32nd International Conference on Network Protocols (ICNP), pages 1–11, 2024. doi: 10.1109/ICNP61940.2024.10858569

work page doi:10.1109/icnp61940.2024.10858569 2024
[32]

End-to-end encrypted traffic classification with one-dimensional convolution neural networks

Wei Wang, Ming Zhu, Jinlin Wang, Xuewen Zeng, and Zhongzhen Yang. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In2017 IEEE Inter- national Conference on Intelligence and Security Informatics (ISI), pages 43–48, 2017. doi: 10.1109/ISI.2017.8004872. 11

work page doi:10.1109/isi.2017.8004872 2017
[33]

EBSNN: Extended byte segment neural network for network traffic classification.IEEE Transactions on Depend- able and Secure Computing, 19(5):3521–3538, 2022

Xi Xiao, Wentao Xiao, Rui Li, Xiapu Luo, Haitao Zheng, and Shutao Xia. EBSNN: Extended byte segment neural network for network traffic classification.IEEE Transactions on Depend- able and Secure Computing, 19(5):3521–3538, 2022. doi: 10.1109/TDSC.2021.3101311

work page doi:10.1109/tdsc.2021.3101311 2022
[34]

Mod- eling tabular data using conditional GAN

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Mod- eling tabular data using conditional GAN. InAdvances in Neural Information Process- ing Systems 32, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/ 254ed7d2de3b23ab10936522dd547b78-Abstract.html

2019
[35]

Practical GAN-based synthetic IP header trace generation using NetShare

Yucheng Yin, Zinan Lin, Minhao Jin, Giulia Fanti, and Vyas Sekar. Practical GAN-based synthetic IP header trace generation using NetShare. InProceedings of the ACM SIGCOMM 2022 Conference, pages 458–472, 2022. doi: 10.1145/3544216.3544251

work page doi:10.1145/3544216.3544251 2022
[36]

Soundstream: An end-to-end neural audio codec.IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495–507, 2021

Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. Soundstream: An end-to-end neural audio codec.IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495–507, 2021

2021
[37]

Mixed-type tabular data synthesis with score-based diffusion in latent space

Hengrui Zhang, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, and George Karypis. Mixed-type tabular data synthesis with score-based diffusion in latent space. InInternational Conference on Learning Representations,
[38]

URLhttps://openreview.net/forum?id=4Ay23yeuz0
[39]

Yet another traffic classifier: A masked autoencoder based traffic transformer with multi- level flow representation

Ruijie Zhao, Mingwei Zhan, Xianwen Deng, Yanhao Wang, Yijun Wang, Guan Gui, and Zhi Xue. Yet another traffic classifier: A masked autoencoder based traffic transformer with multi- level flow representation. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5420–5427, 2023. doi: 10.1609/AAAI.V37I4.25674

work page doi:10.1609/aaai.v37i4.25674 2023
[40]

Bait: Large language model backdoor scanning by inverting attack target

Guangmeng Zhou, Xiongwen Guo, Zhuotao Liu, Tong Li, Qi Li, and Ke Xu. Trafficformer: An efficient pre-trained model for traffic data. InProceedings of the IEEE Symposium on Security and Privacy, pages 1844–1860, 2025. doi: 10.1109/SP61157.2025.00102

work page doi:10.1109/sp61157.2025.00102 2025
[41]

privacy guaranteed,

Yajie Zhou, Fuheng Zhao, Eric Wang, Ayse K. Coskun, Divyakant Agrawal, Amr El Abbadi, and Zaoxing Liu. Prvtel: Lightweight models for private and accurate telemetry data reten- tion. In23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26). USENIX Association, 2026. URL https://www.usenix.org/conference/nsdi26/ technical-sessions. ...

2026

[1] [1]

Scapy: Manipulate packets

Philippe Biondi and Scapy community. Scapy: Manipulate packets. https://scapy.net/,

[2] [2]

Accessed: 2026-05-06

2026

[3] [3]

Traffic data repository at the WIDE project

Kenjiro Cho, Kenji Mitsuya, and Akira Kato. Traffic data repository at the WIDE project. InProceedings of the 2000 USENIX Annual Technical Conference, FREENIX Track, pages 263–270, 2000. URLhttps://mawi.wide.ad.jp/mawi/

2000

[4] [4]

Schmitt, and Nick Feamster

Andrew Chu, Xi Jiang, Shinan Liu, Arjun Nitin Bhagoji, Francesco Bronzino, Paul J. Schmitt, and Nick Feamster. Netssm: Multi-flow and state-aware network trace generation using state- space models.Proceedings of the ACM on Networking, 4(CoNEXT1), 2026. doi: 10.1145/ 3786289. URLhttps://doi.org/10.1145/3786289

work page doi:10.1145/3786289 2026

[5] [5]

Trafficllm: Enhancing large language models for network traffic analysis with generic traffic representation, 2025

Tianyu Cui, Xinjie Lin, Sijia Li, Miao Chen, Qilei Yin, Qi Li, and Ke Xu. Trafficllm: Enhancing large language models for network traffic analysis with generic traffic representation, 2025. URLhttps://arxiv.org/abs/2504.04222. arXiv preprint arXiv:2504.04222

arXiv 2025

[6] [6]

Flowchronicle: synthetic network flow generation through pattern set mining.Proceedings of the ACM on Networking, 2(CoNEXT4):1–20, 2024

Joscha Cüppers, Adrien Schoen, Gregory Blanc, and Pierre-Francois Gimenez. Flowchronicle: synthetic network flow generation through pattern set mining.Proceedings of the ACM on Networking, 2(CoNEXT4):1–20, 2024

2024

[7] [7]

Dpdk – The open source data plane development kit accelerating network performance.https://www.dpdk.org/, 2026

DPDK Project. Dpdk – The open source data plane development kit accelerating network performance.https://www.dpdk.org/, 2026. Accessed: 2026-05-06

2026

[8] [8]

PACC: Protocol- aware cross-layer compression for compact network traffic representation, 2026

Zhaochen Guo, Tianyufei Zhou, Honghao Wang, Ronghua Li, and Shinan Liu. PACC: Protocol- aware cross-layer compression for compact network traffic representation, 2026. URLhttps: //arxiv.org/abs/2602.08331. arXiv preprint arXiv:2602.08331

arXiv 2026

[9] [9]

Generative active adapta- tion for drifting and imbalanced network intrusion detection.arXiv preprint arXiv:2503.03022, 2025

Ragini Gupta, Shinan Liu, Ruixiao Zhang, Xinyue Hu, Xiaoyang Wang, Hadjer Benkraouda, Pranav Kommaraju, Phuong Cao, Nick Feamster, and Klara Nahrstedt. Generative active adapta- tion for drifting and imbalanced network intrusion detection.arXiv preprint arXiv:2503.03022, 2025

arXiv 2025

[10] [10]

netfound: Foundation model for network security, 2023

Satyandra Guthula, Navya Battula, Roman Beltiukov, Wenbo Guo, and Arpit Gupta. netfound: Foundation model for network security, 2023. URL https://arxiv.org/abs/2310.17025. CoRR abs/2310.17025

Pith/arXiv arXiv 2023

[11] [11]

Payload encoding represen- tation from transformer for encrypted traffic classification.ZTE Communica- tions, 19(4):90–97, 2021

Hongye He, Zhiguo Yang, and Xiangning Chen. Payload encoding represen- tation from transformer for encrypted traffic classification.ZTE Communica- tions, 19(4):90–97, 2021. URL https://www.zte.com.cn/global/about/magazine/ zte-communications/2021/en202104/researchpaper/en202104010.html

2021

[12] [12]

New directions in automated traffic analysis

Jordan Holland, Paul Schmitt, Nick Feamster, and Prateek Mittal. New directions in automated traffic analysis. InProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 3366–3383, 2021. doi: 10.1145/3460120.3484758

work page doi:10.1145/3460120.3484758 2021

[13] [13]

Flow-based encrypted network traffic clas- sification with graph neural networks.IEEE Transactions on Network and Service Management, 20(2):1224–1237, 2023

Ting-Li Huoh, Yan Luo, Peilong Li, and Tong Zhang. Flow-based encrypted network traffic clas- sification with graph neural networks.IEEE Transactions on Network and Service Management, 20(2):1224–1237, 2023. doi: 10.1109/TNSM.2022.3227500

work page doi:10.1109/tnsm.2022.3227500 2023

[14] [14]

Schmitt, Francesco Bronzino, and Nick Feamster

Xi Jiang, Shinan Liu, Aaron Gember-Jacobson, Arjun Nitin Bhagoji, Paul J. Schmitt, Francesco Bronzino, and Nick Feamster. Netdiffusion: Network data augmentation through protocol- constrained traffic generation.Proceedings of the ACM on Measurement and Analysis of Computing Systems, 8(1), 2024. doi: 10.1145/3639037. URL https://doi.org/10.1145/ 3639037

work page doi:10.1145/3639037 2024

[15] [15]

Robustifying {ML-powered} network classifiers with {PANTS}

Minhao Jin and Maria Apostolaki. Robustifying {ML-powered} network classifiers with {PANTS}. In34th USENIX Security Symposium (USENIX Security 25), pages 7291–7310, 2025

2025

[16] [16]

Et-bert: A contextu- alized datagram representation with pre-training transformers for encrypted traffic classification,

Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, and Jing Yu. Et-bert: A contextu- alized datagram representation with pre-training transformers for encrypted traffic classification,

[17] [17]

arXiv preprint arXiv:2202.06335

URLhttps://arxiv.org/abs/2202.06335. arXiv preprint arXiv:2202.06335. 10

arXiv

[18] [18]

FS-Net: A flow sequence network for encrypted traffic classification

Chang Liu, Longtao He, Gang Xiong, Zigang Cao, and Zhen Li. FS-Net: A flow sequence network for encrypted traffic classification. InIEEE INFOCOM 2019 – IEEE Conference on Computer Communications, pages 1171–1179, 2019. doi: 10.1109/INFOCOM.2019.8737507

work page doi:10.1109/infocom.2019.8737507 2019

[19] [19]

Shinan Liu, Tarun Mangla, Ted Shaowang, Jinjin Zhao, John Paparrizos, Sanjay Krishnan, and Nick Feamster. Amir: Active multimodal interaction recognition from video and network traffic in connected environments.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(1):1–26, 2023

2023

[20] [20]

Goggle: Generative modelling for tabular data by learning relational structure

Tennison Liu, Zhaozhi Qian, Jeroen Berrevoets, and Mihaela van der Schaar. Goggle: Generative modelling for tabular data by learning relational structure. InInternational Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=fPVRcJqspu

2023

[21] [21]

Deep packet: A novel approach for encrypted traffic classification using deep learning.Soft Computing, 24(3):1999–2012, 2020

Mohammad Lotfollahi, Mahdi Jafari Siavoshani, Ramin Shirali Hossein Zade, and Mohammd- sadegh Saberian. Deep packet: A novel approach for encrypted traffic classification using deep learning.Soft Computing, 24(3):1999–2012, 2020. doi: 10.1007/s00500-019-04030-2

work page doi:10.1007/s00500-019-04030-2 1999

[22] [22]

Packet representation learning for traffic classification

Xuying Meng, Yequan Wang, Runxin Ma, Haitong Luo, Xiang Li, and Yujun Zhang. Packet representation learning for traffic classification. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3546–3554, 2022. doi: 10.1145/ 3534678.3539085

arXiv 2022

[23] [23]

Net- mamba: Efficient network traffic classification via pre-training unidirectional mamba

Lingfeng Peng, Xiaohui Xie, Sijiang Huang, Ziyi Wang, and Yong Cui. PTU: Pre-trained model for network traffic understanding. In2024 IEEE 32nd International Conference on Network Protocols (ICNP), pages 1–12, 2024. doi: 10.1109/ICNP61940.2024.10858503

work page doi:10.1109/icnp61940.2024.10858503 2024

[24] [24]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2022

[25] [25]

Ghorbani

Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. Toward generating a new intrusion detection dataset and intrusion traffic characterization. InProceedings of the 4th International Conference on Information Systems Security and Privacy, pages 108–116, 2018. doi: 10.5220/0006639801080116. URL https://www.unb.ca/cic/datasets/ids-2017. html

work page doi:10.5220/0006639801080116 2018

[26] [26]

Packet analysis for network forensics: A comprehensive survey.Forensic Science International: Digital Investigation, 32:200892, 2020

Leslie F Sikos. Packet analysis for network forensics: A comprehensive survey.Forensic Science International: Digital Investigation, 32:200892, 2020

2020

[27] [27]

Tcpreplay: Pcap editing and replaying utilities

Aaron Turner and Fred Klassen. Tcpreplay: Pcap editing and replaying utilities. https: //github.com/appneta/tcpreplay, 2024. Accessed: 2026-05-06

2024

[28] [28]

Neural discrete representation learning

Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. InAdvances in Neural Information Processing Systems, 2017

2017

[29] [29]

Dubois, Martina Lindorfer, David R

Thijs van Ede, Riccardo Bortolameotti, Andrea Continella, Jingjing Ren, Daniel J. Dubois, Martina Lindorfer, David R. Choffnes, Maarten van Steen, and Andreas Peter. Flowprint: Semi-supervised mobile-app fingerprinting on encrypted network traffic. InProceedings of the Network and Distributed System Security Symposium (NDSS), 2020. doi: 10.14722/ndss.2020. 24412

work page doi:10.14722/ndss.2020 2020

[30] [30]

Wang and Binh P

Alex X. Wang and Binh P. Nguyen. Ttvae: Transformer-based generative modeling for tabular data generation.Artificial Intelligence, 340:104292, 2025. doi: 10.1016/j.artint.2025.104292. URLhttps://www.sciencedirect.com/science/article/pii/S0004370225000116

work page doi:10.1016/j.artint.2025.104292 2025

[31] [31]

Net- mamba: Efficient network traffic classification via pre-training unidirectional mamba

Tongze Wang, Xiaohui Xie, Wenduo Wang, Chuyi Wang, Youjian Zhao, and Yong Cui. Net- mamba: Efficient network traffic classification via pre-training unidirectional mamba. In2024 IEEE 32nd International Conference on Network Protocols (ICNP), pages 1–11, 2024. doi: 10.1109/ICNP61940.2024.10858569

work page doi:10.1109/icnp61940.2024.10858569 2024

[32] [32]

End-to-end encrypted traffic classification with one-dimensional convolution neural networks

Wei Wang, Ming Zhu, Jinlin Wang, Xuewen Zeng, and Zhongzhen Yang. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In2017 IEEE Inter- national Conference on Intelligence and Security Informatics (ISI), pages 43–48, 2017. doi: 10.1109/ISI.2017.8004872. 11

work page doi:10.1109/isi.2017.8004872 2017

[33] [33]

EBSNN: Extended byte segment neural network for network traffic classification.IEEE Transactions on Depend- able and Secure Computing, 19(5):3521–3538, 2022

Xi Xiao, Wentao Xiao, Rui Li, Xiapu Luo, Haitao Zheng, and Shutao Xia. EBSNN: Extended byte segment neural network for network traffic classification.IEEE Transactions on Depend- able and Secure Computing, 19(5):3521–3538, 2022. doi: 10.1109/TDSC.2021.3101311

work page doi:10.1109/tdsc.2021.3101311 2022

[34] [34]

Mod- eling tabular data using conditional GAN

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Mod- eling tabular data using conditional GAN. InAdvances in Neural Information Process- ing Systems 32, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/ 254ed7d2de3b23ab10936522dd547b78-Abstract.html

2019

[35] [35]

Practical GAN-based synthetic IP header trace generation using NetShare

Yucheng Yin, Zinan Lin, Minhao Jin, Giulia Fanti, and Vyas Sekar. Practical GAN-based synthetic IP header trace generation using NetShare. InProceedings of the ACM SIGCOMM 2022 Conference, pages 458–472, 2022. doi: 10.1145/3544216.3544251

work page doi:10.1145/3544216.3544251 2022

[36] [36]

Soundstream: An end-to-end neural audio codec.IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495–507, 2021

Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. Soundstream: An end-to-end neural audio codec.IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495–507, 2021

2021

[37] [37]

Mixed-type tabular data synthesis with score-based diffusion in latent space

Hengrui Zhang, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, and George Karypis. Mixed-type tabular data synthesis with score-based diffusion in latent space. InInternational Conference on Learning Representations,

[38] [38]

URLhttps://openreview.net/forum?id=4Ay23yeuz0

[39] [39]

Yet another traffic classifier: A masked autoencoder based traffic transformer with multi- level flow representation

Ruijie Zhao, Mingwei Zhan, Xianwen Deng, Yanhao Wang, Yijun Wang, Guan Gui, and Zhi Xue. Yet another traffic classifier: A masked autoencoder based traffic transformer with multi- level flow representation. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5420–5427, 2023. doi: 10.1609/AAAI.V37I4.25674

work page doi:10.1609/aaai.v37i4.25674 2023

[40] [40]

Bait: Large language model backdoor scanning by inverting attack target

Guangmeng Zhou, Xiongwen Guo, Zhuotao Liu, Tong Li, Qi Li, and Ke Xu. Trafficformer: An efficient pre-trained model for traffic data. InProceedings of the IEEE Symposium on Security and Privacy, pages 1844–1860, 2025. doi: 10.1109/SP61157.2025.00102

work page doi:10.1109/sp61157.2025.00102 2025

[41] [41]

privacy guaranteed,

Yajie Zhou, Fuheng Zhao, Eric Wang, Ayse K. Coskun, Divyakant Agrawal, Amr El Abbadi, and Zaoxing Liu. Prvtel: Lightweight models for private and accurate telemetry data reten- tion. In23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26). USENIX Association, 2026. URL https://www.usenix.org/conference/nsdi26/ technical-sessions. ...

2026