TraceCodec: A Compiler-Backed Neural Codec for Stateful Multi-Flow Network Traffic Traces
Pith reviewed 2026-06-29 00:32 UTC · model grok-4.3
The pith
TraceCodec lifts packets to timed actions with flow slots, then uses a deterministic compiler to render valid PCAPs whose packet count, protocol composition, and flow population match the real trace to within 0.03%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TraceCodec lifts each packet into a timed packet action with explicit flow slots and transport cues, learns a continuous per-packet latent, and lowers the decoded actions to PCAPs via a deterministic compiler that owns endpoint assignment, TCP state, legality constraints, and packet rendering.
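The split between learned actions and deterministic lowering can be illustrated with a toy sketch. Everything named below (`PacketAction`, `ToyCompiler`, the cue vocabulary, the transition table, the endpoint rule) is a hypothetical stand-in, since the paper's actual action fields and compiler rules are not given on this page. The sketch renders plain dictionaries rather than real PCAP bytes, and it simplifies connection teardown to a single reset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PacketAction:
    """Hypothetical timed packet action: what a decoder would emit."""
    dt: float              # seconds since the previous packet in the trace
    slot: int              # flow slot index, not a concrete 5-tuple
    direction: int         # 0 = initiator to responder, 1 = reverse
    cue: str               # transport cue: "open", "ack", "data", "close"
    payload_len: int = 0   # payload bytes to render


# Assumed toy legality table: (tcp_state, cue, direction) -> (next_state, flags).
TRANSITIONS: Dict[Tuple[str, str, int], Tuple[str, str]] = {
    ("CLOSED", "open", 0): ("SYN_SENT", "S"),
    ("SYN_SENT", "open", 1): ("SYN_RCVD", "SA"),
    ("SYN_RCVD", "ack", 0): ("ESTABLISHED", "A"),
    ("ESTABLISHED", "data", 0): ("ESTABLISHED", "PA"),
    ("ESTABLISHED", "data", 1): ("ESTABLISHED", "PA"),
    ("ESTABLISHED", "close", 0): ("CLOSED", "R"),  # teardown simplified to RST
    ("ESTABLISHED", "close", 1): ("CLOSED", "R"),
}


@dataclass
class FlowState:
    tcp_state: str = "CLOSED"
    next_seq: List[int] = field(default_factory=lambda: [0, 0])  # per direction


class ToyCompiler:
    """Owns endpoints, TCP state, legality, and rendering for each slot."""

    def __init__(self) -> None:
        self.flows: Dict[int, FlowState] = {}
        self.clock = 0.0

    def lower(self, action: PacketAction) -> dict:
        flow = self.flows.setdefault(action.slot, FlowState())
        key = (flow.tcp_state, action.cue, action.direction)
        if key not in TRANSITIONS:
            # Illegal actions are rejected here, not silently patched.
            raise ValueError(f"illegal action for flow state: {key}")
        flow.tcp_state, flags = TRANSITIONS[key]
        self.clock += action.dt
        # Endpoint assignment is a pure function of the slot (an assumption).
        client, server = ("10.0.0.1", 40000 + action.slot), ("10.0.0.2", 80)
        src, dst = (client, server) if action.direction == 0 else (server, client)
        seq = flow.next_seq[action.direction]
        # A SYN consumes one sequence number; payload consumes its length.
        flow.next_seq[action.direction] += action.payload_len + ("S" in flags)
        return {"ts": self.clock, "src": src, "dst": dst,
                "flags": flags, "seq": seq, "len": action.payload_len}


def lower_all(actions: List[PacketAction]) -> List[dict]:
    compiler = ToyCompiler()
    return [compiler.lower(a) for a in actions]
```

In this arrangement the decoder never chooses addresses, flags, or sequence numbers directly. It only picks a slot, a direction, a cue, and timing, and the compiler derives the rest from its own state.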
What carries the argument
A state-aware neural codec that decouples a learned latent over packet actions from a deterministic compiler, which enforces protocol constraints and produces the final trace.
If this is right
- Downstream traffic models can generate sequences in the packet-action latent space instead of raw header fields.
- TCP state transitions and multi-flow interleaving remain intact because the compiler, not the neural decoder, enforces them.
- On CICIDS2017 Monday, packet count, protocol composition, and flow population match the source trace to within 0.03% under a non-repair policy.
- Structural diagnostics confirm preservation of stateful behavior that raw-field decoders fragment.
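The page does not say how the 0.03% figure is computed. One plausible reading is a relative gap per aggregate quantity, which the following sketch computes. The packet representation (a `(protocol, flow_id)` pair) and the function names are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Packet = Tuple[str, str]  # (protocol, flow_id); hypothetical minimal record


def relative_gap(real: float, synth: float) -> float:
    """Return |synth - real| / real as a percentage."""
    if real == 0:
        raise ValueError("real count must be non-zero")
    return 100.0 * abs(synth - real) / real


def fidelity_report(real: Iterable[Packet], synth: Iterable[Packet]) -> Dict[str, float]:
    real, synth = list(real), list(synth)
    report = {
        "packet_count": relative_gap(len(real), len(synth)),
        "flow_population": relative_gap(
            len({flow for _, flow in real}), len({flow for _, flow in synth})
        ),
    }
    real_proto = Counter(proto for proto, _ in real)
    synth_proto = Counter(proto for proto, _ in synth)
    # Only protocols present in the real trace are scored in this sketch.
    for proto, count in real_proto.items():
        report[f"proto:{proto}"] = relative_gap(count, synth_proto.get(proto, 0))
    return report
```

Under this reading, a synthetic trace with 10,003 packets against a real trace of 10,000 has a packet-count gap of 0.03%.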
Where Pith is reading between the lines
- The same compiler interface could be swapped for other protocol stacks if the action vocabulary and legality rules are rewritten.
- Synthetic traces produced this way could serve as drop-in replacements for real PCAPs in privacy-sensitive evaluation pipelines.
- The latent space might support controlled editing of traffic properties such as flow duration or protocol mix by operating directly on the continuous actions.
Load-bearing premise
The deterministic compiler can correctly own endpoint assignment, TCP state, legality constraints, and packet rendering for all decoded actions without introducing systematic biases that affect the learned latent space.
What would settle it
Generate traces from the model on a held-out day of CICIDS2017 or on another capture. If TCP state-transition frequencies or active-flow counts deviate by more than 1 percent from the real trace while a raw-field baseline under the same non-repair policy stays closer, the separation claim is falsified.
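A minimal sketch of the state-transition half of this test follows. It assumes each flow is already summarized as an ordered list of TCP state labels, and it interprets "deviate by more than 1 percent" as the largest absolute difference in relative transition frequency, in percentage points. Both choices are assumptions, not the paper's protocol, and the active-flow-count half of the test is not covered.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Transition = Tuple[str, str]


def transition_freqs(flows: Sequence[Sequence[str]]) -> Dict[Transition, float]:
    """Relative frequency of each consecutive state pair across all flows."""
    counts: Counter = Counter()
    for states in flows:
        counts.update(zip(states, states[1:]))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def max_deviation_pct(real: Dict[Transition, float],
                      synth: Dict[Transition, float]) -> float:
    """Largest absolute frequency difference, in percentage points."""
    keys = set(real) | set(synth)
    return 100.0 * max(
        (abs(real.get(k, 0.0) - synth.get(k, 0.0)) for k in keys), default=0.0
    )


def separation_falsified(real_flows, codec_flows, baseline_flows,
                         threshold_pct: float = 1.0) -> bool:
    """True if the codec misses the threshold while the baseline is closer."""
    real = transition_freqs(real_flows)
    codec_dev = max_deviation_pct(real, transition_freqs(codec_flows))
    base_dev = max_deviation_pct(real, transition_freqs(baseline_flows))
    return codec_dev > threshold_pct and base_dev < codec_dev
```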
Original abstract
Critical networking workflows require high-fidelity packet captures (PCAPs) for testing, security analysis, and protocol validation, not just statistical flow-level summaries. Recent packet generators have demonstrated protocol-constrained PCAP synthesis, but they universally decode directly to raw packet fields. That interface entangles learned behavioral choices with deterministic protocol consequences, which forces packet realization to depend on post-hoc heuristic repair. We identify this decode interface as the fundamental bottleneck and present TraceCodec, a state-aware neural codec for stateful multi-flow traces. TraceCodec lifts each packet into a timed packet action with explicit flow slots and transport cues, then learns a continuous per-packet latent. A deterministic compiler lowers decoded actions back to PCAPs, owning endpoint assignment, TCP state, legality constraints, and packet rendering. The latent layer exposes a generator-facing sequence space, so downstream traffic models can operate on packet-action latents rather than raw header fields. On CICIDS2017 Monday, TraceCodec matches packet count, protocol composition, and flow population to within 0.03%. Raw-field baselines under the same non-repair policy distort flow counts and TCP state by orders of magnitude. Structural diagnostics show that TraceCodec preserves TCP state transitions and multi-flow interleaving that raw-field decoders fragment. This work establishes a new foundation for high-fidelity packet-trace generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents TraceCodec, a state-aware neural codec for stateful multi-flow network traffic traces. Packets are lifted to timed actions with explicit flow slots and transport cues; a continuous per-packet latent is learned; and a deterministic compiler lowers the actions to PCAPs by owning endpoint assignment, TCP state tracking, legality constraints, and packet rendering. On CICIDS2017 Monday the method matches packet count, protocol composition, and flow population to within 0.03% while preserving TCP state transitions and multi-flow interleaving; raw-field baselines under the same non-repair policy distort these quantities by orders of magnitude.
Significance. If the compiler rules prove to be dataset-independent and the quantitative and structural results are reproducible, the separation of learned action latents from deterministic protocol lowering would constitute a substantive advance for high-fidelity PCAP synthesis and for downstream generators that can operate directly on the latent space.
major comments (2)
- [§4, Compiler] No pseudocode, state-machine diagram, or enumeration of edge cases is supplied for the compiler’s handling of TCP flags, retransmissions, ambiguous multi-flow slot resolution, or endpoint assignment. Without this specification it is impossible to confirm that the reported 0.03% fidelity gap is produced by the neural latent rather than by unstated deterministic rules unavailable to the raw-field baselines.
- [§5, Evaluation] The 0.03% match on packet count, protocol composition, and flow population is presented without dataset splits, baseline source code or hyper-parameters, error bars, or ablations that isolate the compiler’s contribution. The structural diagnostics on TCP state transitions are likewise reported without quantitative metrics or statistical tests.
minor comments (2)
- [Abstract / §3] Notation for “timed packet action” and “flow slot” is introduced in the abstract but never given a formal definition or example before the results section.
- [Figures in §5] Figure captions for the structural diagnostics do not state the exact CICIDS2017 subset or the number of flows examined.
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us identify areas for improvement in the presentation of TraceCodec. We provide point-by-point responses to the major comments and will update the manuscript accordingly to enhance reproducibility and clarity.
- Referee: [§4, Compiler] No pseudocode, state-machine diagram, or enumeration of edge cases is supplied for the compiler’s handling of TCP flags, retransmissions, ambiguous multi-flow slot resolution, or endpoint assignment. Without this specification it is impossible to confirm that the reported 0.03% fidelity gap is produced by the neural latent rather than by unstated deterministic rules unavailable to the raw-field baselines.
  Authors: We concur that the manuscript would benefit from more explicit documentation of the compiler. We will revise §4 to include pseudocode for the main compiler routines, a state-machine diagram depicting TCP state handling, and a list of edge cases addressed, such as flag combinations, retransmission scenarios, and multi-flow ambiguities. This addition will make clear how the deterministic rules operate independently of the learned latents and ensure they are not the sole source of the observed fidelity. Revision: yes.
- Referee: [§5, Evaluation] The 0.03% match on packet count, protocol composition, and flow population is presented without dataset splits, baseline source code or hyper-parameters, error bars, or ablations that isolate the compiler’s contribution. The structural diagnostics on TCP state transitions are likewise reported without quantitative metrics or statistical tests.
  Authors: We agree that the evaluation section requires additional details for full reproducibility and to better isolate contributions. In the revised manuscript, we will report the specific dataset splits, provide hyper-parameters, include error bars from repeated experiments, and present ablations focusing on the compiler component. We will also augment the structural analysis with quantitative metrics for TCP state preservation and statistical significance tests. The baseline source code will be released alongside the paper to allow direct comparison. Revision: yes.
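As an illustration of what the promised error bars and compiler ablation could look like in tabular form, the sketch below aggregates a fidelity gap over repeated runs per variant. The variant names and the choice of mean with sample standard deviation are assumptions; the rebuttal does not specify either, and no values from the paper are used.

```python
import statistics
from typing import Dict, Sequence, Tuple


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs."""
    if len(values) < 2:
        raise ValueError("need at least two runs for an error bar")
    return statistics.mean(values), statistics.stdev(values)


def ablation_table(runs: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each variant name to (mean gap, std) across its runs."""
    return {name: mean_and_std(gaps) for name, gaps in runs.items()}


def format_row(name: str, mean: float, std: float) -> str:
    return f"{name}: {mean:.3f}% ± {std:.3f}%"
```

A caller would pass one list of per-seed gaps for each variant, for example a full model and a hypothetical variant whose compiler is replaced by raw-field decoding, and print one row per variant.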
Circularity Check
No circularity in derivation; empirical fidelity claims are independent of fitted inputs
The provided abstract and context contain no equations, fitted parameters, or self-citations that reduce any reported result to a self-referential definition or construction. The 0.03% fidelity match on CICIDS2017 is presented as an empirical outcome of the neural latent plus deterministic compiler, with the compiler explicitly positioned as external and non-repair. No load-bearing step equates a prediction to its input by definition, renames a known result, or imports uniqueness from prior author work. The central claim remains falsifiable against the external dataset under the stated non-repair policy, satisfying the criteria for a self-contained derivation.