UniAlign: A Model-Agnostic Framework for Robust Network Traffic Classification under Distribution Shifts

Chuyi Wang; Tongze Wang; Wenduo Wang; Xiaohui Xie; Yong Cui

arxiv: 2605.17575 · v1 · pith:FV6D27COnew · submitted 2026-05-17 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 13:27 UTC · model grok-4.3

keywords: network traffic classification, distribution shifts, domain alignment, model ensembling, robustness, deep learning

The pith

UniAlign is a model-agnostic framework that makes deep learning network traffic classifiers more robust to distribution shifts through domain alignment and stable ensembling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to solve the drop in accuracy that network traffic classification models experience when network conditions change after training. It does so by pairing domain alignment fine-tuning, which pushes the model toward representations that stay consistent across different conditions, with stable model ensembling, which combines checkpoints from flat regions of the loss surface. A sympathetic reader would care because prior robustness techniques are tied to one model family, fail on modern raw-byte inputs, or demand heavy extra training work. If the claim holds, existing supervised classifiers could be made more reliable in live networks without redesigning features and with only a constant additional training cost.

Core claim

UniAlign combines domain alignment fine-tuning, which encourages the learning of domain-invariant traffic representations across heterogeneous network conditions, with stable model ensembling, which enhances inference robustness by aggregating checkpoints within a flat loss region. The framework integrates into existing supervised NTC models without requiring specific feature modalities or introducing non-constant additional training costs and is tested on shifts arising from encryption schemes, data collection devices, and attack behaviors.
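The training objective is quoted on this page as L = L_CE-LS + alpha * L_align, with L_align = L_mean + L_cov. A minimal sketch of one way such an objective could be computed follows. It rests on several assumptions that this page does not confirm: that L_mean is the squared Euclidean distance between per-domain feature means, that L_cov is the squared Frobenius distance between per-domain covariance matrices, and that L_CE-LS denotes cross-entropy with label smoothing. All function names, `alpha`, and `eps` are hypothetical.

```python
from math import exp, log

def mean_vector(feats):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]

def covariance(feats):
    """Unbiased sample covariance matrix (d x d) of the feature vectors."""
    n = len(feats)
    mu = mean_vector(feats)
    centered = [[x - m for x, m in zip(row, mu)] for row in feats]
    d = len(mu)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def alignment_loss(feats_a, feats_b):
    """Assumed form of L_align = L_mean + L_cov between two domains."""
    mu_a, mu_b = mean_vector(feats_a), mean_vector(feats_b)
    # Assumption: squared Euclidean distance between the domain means.
    l_mean = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_a, cov_b = covariance(feats_a), covariance(feats_b)
    # Assumption: squared Frobenius distance between the covariances.
    l_cov = sum((ca - cb) ** 2
                for row_a, row_b in zip(cov_a, cov_b)
                for ca, cb in zip(row_a, row_b))
    return l_mean + l_cov

def smoothed_cross_entropy(logits, label, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    m = max(logits)
    log_z = m + log(sum(exp(z - m) for z in logits))  # stable log-sum-exp
    log_probs = [z - log_z for z in logits]
    k = len(logits)
    target = [eps / k + (1.0 - eps) * (1.0 if i == label else 0.0)
              for i in range(k)]
    return -sum(t * lp for t, lp in zip(target, log_probs))

def total_loss(logits_batch, labels, feats_a, feats_b, alpha=1.0, eps=0.1):
    """L = L_CE-LS + alpha * L_align (hypothetical weighting and smoothing)."""
    ce = sum(smoothed_cross_entropy(l, y, eps)
             for l, y in zip(logits_batch, labels)) / len(labels)
    return ce + alpha * alignment_loss(feats_a, feats_b)
```

In this sketch the alignment term depends only on the feature vectors a backbone emits, which is one way a loss of this shape could remain independent of the model architecture.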

What carries the argument

The UniAlign framework, which pairs domain alignment fine-tuning for invariant representations with stable model ensembling for robust inference.
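The ensembling half can be pictured with a small sketch. This page does not give the paper's checkpoint selection rule, so the code below assumes one simple possibility: take the longest run of consecutive checkpoints whose validation loss stays within a tolerance of the best observed loss, then average their weights. The function names, the `tolerance` parameter, and the dict-of-lists checkpoint layout are all hypothetical.

```python
def flat_window(val_losses, tolerance=0.05):
    """Longest run of consecutive checkpoints with loss near the minimum.

    Returns a half-open index range (start, end). Assumed selection rule,
    not the paper's documented procedure.
    """
    best = min(val_losses)
    longest = (0, 0)
    start = None
    # A trailing sentinel closes a run that reaches the last checkpoint.
    for i, loss in enumerate(list(val_losses) + [float("inf")]):
        if loss <= best + tolerance:
            if start is None:
                start = i
        elif start is not None:
            if i - start > longest[1] - longest[0]:
                longest = (start, i)
            start = None
    return longest

def average_checkpoints(checkpoints):
    """Element-wise mean of parameter lists across checkpoints."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n
               for vals in zip(*(ckpt[name] for ckpt in checkpoints))]
        for name in checkpoints[0]
    }

# Toy usage: six checkpoints, each with one two-element parameter "w".
val_losses = [0.9, 0.5, 0.31, 0.30, 0.32, 0.6]
checkpoints = [{"w": [float(i), 2.0 * i]} for i in range(6)]
start, end = flat_window(val_losses)
merged = average_checkpoints(checkpoints[start:end])
```

Averaging weights yields a single model at inference time, so under this assumption the ensemble would add no inference cost over one checkpoint.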

If this is right

Existing deep-learning NTC models can retain higher performance when encryption schemes, collection hardware, or attack patterns change after deployment.
The same training pipeline works across different model architectures without custom feature engineering.
Robustness gains arrive at lower total training time than methods built specifically for traffic classification.
The fine-tuning and ensembling steps add only a constant amount of training cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-step pattern of alignment followed by flat-region ensembling could be tested on other classification tasks that suffer distribution shift, such as malware detection or sensor-based activity recognition.
Live network traces with continuous drift would provide a stricter test than the static public datasets used here.

Load-bearing premise

The distribution shifts present in the three public datasets are representative of those encountered in real deployments and the added steps impose no non-constant training costs.

What would settle it

Running UniAlign on a fresh NTC model and a dataset containing distribution shifts absent from the three public ones and finding no gain or even a loss relative to standard training would refute the central claim.

Figures

Figures reproduced from arXiv: 2605.17575 by Chuyi Wang, Tongze Wang, Wenduo Wang, Xiaohui Xie, Yong Cui.

**Figure 1.** Overview of the UniAlign framework. The framework consists of two modules: (left) a domain alignment fine-tuning module that incorporates an additional representation alignment loss to minimize cross-domain feature discrepancies, and (right) a stable model ensembling module that merges model checkpoints located in an identified flat loss valley.

**Figure 2.** Illustration of a sharp minimum and a flat loss valley.

**Figure 3.** Visualization of feature representations produced by …

**Figure 4.** Average training time versus OOD accuracy of different …

**Figure 5.** The impact of different distance metrics on NTC …

**Figure 7.** The impact of label smoothing on training loss scales …

**Figure 8.** NTC performance of different ensembling variants …

Original abstract

Network traffic classification (NTC) models often suffer severe performance degradation when deployed in real-world environments due to distribution shifts caused by changing network conditions. Existing robustness-enhancing approaches are commonly coupled to specific model architectures or data settings, fail to generalize to state-of-the-art raw-byte-based NTC models, or incur significant training overhead. In this paper, we propose UniAlign, a novel model-agnostic framework that improves the robustness of deep learning-based NTC models under distribution shifts. UniAlign combines domain alignment fine-tuning, which encourages the learning of domain-invariant traffic representations across heterogeneous network conditions, with stable model ensembling, which enhances inference robustness by aggregating checkpoints within a flat loss region. The framework can be seamlessly integrated into existing supervised NTC models without requiring specific feature modalities or introducing non-constant additional training costs. We evaluate UniAlign on three public datasets covering diverse distribution shifts, including encryption schemes, data collection devices, and attack behaviors. Experimental results on two representative NTC models demonstrate that, compared with standard training, UniAlign improves average classification accuracy by 2.51% and average F1 score by 2.71%, outperforming the strongest baseline by 1.45% in accuracy and 1.69% in F1 score, while requiring only 12.4% to 53.9% of the training time of all NTC-specific baselines.
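Two quantities follow from the abstract's numbers by simple arithmetic, though the abstract does not state them. The worked lines below assume that every average is taken over the same model and dataset combinations, that the percentages are absolute percentage points, and that the same baseline is the strongest on both metrics. None of this is confirmed on this page, so the results are rough implications rather than reported figures.

```latex
\begin{align*}
\Delta_{\mathrm{acc}}(\text{strongest baseline} - \text{standard training})
  &\approx 2.51 - 1.45 = 1.06 \\
\Delta_{\mathrm{F1}}(\text{strongest baseline} - \text{standard training})
  &\approx 2.71 - 1.69 = 1.02 \\
\text{training-time speedup over NTC-specific baselines}
  &\in \left[\frac{1}{0.539},\ \frac{1}{0.124}\right] \approx [1.86,\ 8.06]
\end{align*}
```

Under those assumptions, UniAlign's reported gain over standard training is a little more than twice that of the strongest baseline, at roughly 1.9 to 8.1 times less training time than the NTC-specific methods.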

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UniAlign gives a workable model-agnostic lift to NTC robustness on the tested shifts, but the 2.5% gains look modest and the datasets may not cover the shifts that matter most in practice.

UniAlign pairs domain alignment fine-tuning with stable ensembling to keep raw-byte network traffic classifiers accurate when conditions change. It stays model-agnostic and adds no non-constant training cost, which is the main practical hook here. On three public datasets that vary encryption, collection hardware, and attack types, the method adds roughly 2.5 points of accuracy and 2.7 points of F1 over plain training. It beats the strongest listed baseline by a smaller margin (1.45 points of accuracy, 1.69 points of F1) while using 12.4% to 53.9% of the training time of the specialized competitors. That combination of modest improvement and lower overhead is the part worth noticing if you run NTC models in production.

The stable-ensembling step, which pulls checkpoints from a flat loss basin, is a clean way to add inference robustness without new architecture changes. The paper applies these ideas directly to the raw-byte setting rather than inventing new components from scratch, so the novelty sits in the specific pairing and the NTC-focused evaluation.

The gains are small enough that they could disappear under different random seeds or slight protocol tweaks, and the abstract gives no error bars or full protocol details to judge stability. The three datasets are a reasonable start, but they still leave open whether the same shifts appear in live networks with temporal drift or unseen protocols; if those dominate, the reported numbers may not travel.

Overall the work is clear on its own terms and engages the domain-adaptation literature without obvious circularity. It is the kind of incremental engineering result that matters for people who actually deploy classifiers and want to avoid full retraining. I would bring it to a reading group to check the exact fine-tuning schedule and ensembling procedure. It deserves peer review because the claims are concrete and falsifiable even if the effect size stays modest.

Referee Report

3 major / 2 minor

Summary. The paper presents UniAlign, a model-agnostic framework for enhancing the robustness of deep learning-based network traffic classification (NTC) models against distribution shifts. It combines domain alignment fine-tuning to learn domain-invariant representations and stable model ensembling to aggregate checkpoints in flat loss regions. The approach is designed to integrate seamlessly into existing supervised NTC models without requiring specific feature modalities or incurring non-constant additional training costs. Evaluations on three public datasets involving shifts from encryption schemes, collection devices, and attack behaviors show that UniAlign improves average classification accuracy by 2.51% and F1 score by 2.71% compared to standard training, outperforming the strongest baseline by 1.45% in accuracy and 1.69% in F1, while using only 12.4% to 53.9% of the training time of NTC-specific baselines.

Significance. If the reported improvements are confirmed to be statistically significant and generalizable beyond the three datasets, UniAlign could provide a valuable, efficient method for making NTC models more robust to real-world variations in network conditions, addressing a practical challenge in deploying such models without architecture-specific modifications or excessive computational overhead.

major comments (3)

[Abstract] The central claims of 2.51% average accuracy improvement and 2.71% F1 improvement (outperforming the strongest baseline by 1.45%/1.69%) are presented without error bars, standard deviations across runs, or details on the number of experimental repetitions and statistical tests. This undercuts the ability to assess whether the robustness gains are reliable or sensitive to random seeds and hyperparameter choices.
[§4.1] The three public datasets are positioned as covering encryption schemes, collection devices, and attack behaviors, yet the manuscript does not provide evidence or discussion that these shifts adequately proxy real-world factors such as temporal drift, new protocols, or mixed adversarial patterns. If they do not, the reported gains may not transfer to deployments.
[§3.2] The domain alignment fine-tuning component is described as incurring no non-constant additional training costs, but the integration details and any dependence on how shifts are realized during fine-tuning are not quantified across the two representative NTC models, leaving the model-agnostic claim partially unsupported.

minor comments (2)

Add error bars or confidence intervals to all quantitative results in tables and figures to support the average improvement claims.
Clarify the exact protocol for stable model ensembling, including how checkpoints are selected within the flat loss region.
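The error bars and significance test requested above could be produced with a few lines of standard-library Python. The sketch below reports the mean and sample standard deviation per method and a paired t statistic over runs that share random seeds. The accuracy values are placeholders invented for illustration and are not results from the paper; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(scores):
    """Mean and sample standard deviation over repeated runs."""
    return mean(scores), stdev(scores)

def paired_t_statistic(treated, control):
    """t statistic for paired samples, one pair per shared random seed.

    Raises ZeroDivisionError if every paired difference is identical.
    """
    diffs = [t - c for t, c in zip(treated, control)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Placeholder accuracies for five seeds. NOT results from the paper.
standard = [90.0, 91.0, 89.0, 90.5, 89.5]
uni_align = [92.0, 93.5, 91.0, 93.5, 92.0]

std_mean, std_sd = summarize(standard)
ua_mean, ua_sd = summarize(uni_align)
t_stat = paired_t_statistic(uni_align, standard)

# With 5 pairs there are 4 degrees of freedom; the two-sided 5% critical
# value of Student's t at 4 degrees of freedom is about 2.776.
significant = abs(t_stat) > 2.776
print(f"standard: {std_mean:.2f} +/- {std_sd:.2f}")
print(f"UniAlign: {ua_mean:.2f} +/- {ua_sd:.2f}")
print(f"paired t = {t_stat:.2f}, significant at 5%: {significant}")
```

Pairing by seed matters here because it removes run-to-run variance shared by both methods, which is the variance most likely to swamp a gain of a few points.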

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the manuscript. We address each major comment below.

Referee: [Abstract] The central claims of 2.51% average accuracy improvement and 2.71% F1 improvement (outperforming the strongest baseline by 1.45%/1.69%) are presented without error bars, standard deviations across runs, or details on the number of experimental repetitions and statistical tests. This undercuts the ability to assess whether the robustness gains are reliable or sensitive to random seeds and hyperparameter choices.

Authors: We agree with the referee that providing statistical details strengthens the claims. Our experiments were repeated 5 times with different random seeds, and we will update the abstract and relevant sections to include standard deviations as error bars. We will also report the number of repetitions and include statistical significance tests (e.g., paired t-test results) to demonstrate that the improvements are reliable.
Revision: yes

Referee: [§4.1] The three public datasets are positioned as covering encryption schemes, collection devices, and attack behaviors, yet the manuscript does not provide evidence or discussion that these shifts adequately proxy real-world factors such as temporal drift, new protocols, or mixed adversarial patterns. If they do not, the reported gains may not transfer to deployments.

Authors: The three datasets were selected to represent key types of distribution shifts in NTC as identified in prior literature. However, we recognize that they may not fully capture all real-world aspects such as long-term temporal drift or novel protocols. In the revision, we will add a dedicated paragraph in §4.1 discussing the scope of these shifts and their relation to real-world conditions, including potential limitations in generalizability.
Revision: partial

Referee: [§3.2] The domain alignment fine-tuning component is described as incurring no non-constant additional training costs, but the integration details and any dependence on how shifts are realized during fine-tuning are not quantified across the two representative NTC models, leaving the model-agnostic claim partially unsupported.

Authors: We appreciate this observation. The domain alignment fine-tuning adds a constant overhead independent of the model architecture by using a domain classifier on top of the existing features. We will revise §3.2 to provide explicit integration steps for the two NTC models and quantify the additional training time, which is constant and minimal (less than 5% overhead), confirming the model-agnostic nature without dependence on specific shift realizations beyond domain labels.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains from direct dataset comparisons

The paper presents UniAlign as a model-agnostic combination of domain alignment fine-tuning and stable model ensembling, then reports measured accuracy/F1 lifts (2.51%/2.71% average) and training-time reductions on three public datasets against baselines. These are straightforward experimental outcomes, not quantities obtained by fitting parameters to the target metric or by any self-referential derivation. No equations, uniqueness theorems, or ansatzes are invoked that collapse back to the inputs; the central claims rest on observable performance deltas rather than on any load-bearing self-citation or definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework rests on standard supervised learning assumptions and the representativeness of the chosen public datasets.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · tag: unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: $L_{\mathrm{align}} = L_{\mathrm{mean}} + L_{\mathrm{cov}}$ ... $L = L_{\mathrm{CE\text{-}LS}} + \alpha L_{\mathrm{align}}$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 6 internal anchors

[1]

Streaming video qoe modeling and prediction: A long short-term memory approach,

N. Eswara, S. Ashique, A. Panchbhai, S. Chakraborty, H. P. Sethuram, K. Kuchi, A. Kumar, and S. S. Channappayya, “Streaming video qoe modeling and prediction: A long short-term memory approach,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 661–673, 2019

work page 2019
[2]

Ptu: Pre-trained model for network traffic understanding,

L. Peng, X. Xie, S. Huang, Z. Wang, and Y . Cui, “Ptu: Pre-trained model for network traffic understanding,” in2024 IEEE 32nd International Conference on Network Protocols (ICNP). IEEE, 2024, pp. 1–12

work page 2024
[3]

Realtime robust malicious traffic detection via frequency domain analysis,

C. Fu, Q. Li, M. Shen, and K. Xu, “Realtime robust malicious traffic detection via frequency domain analysis,” inProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 3431–3446

work page 2021
[4]

Flowlens: Enabling efficient flow classification for ml- based network security applications

D. Barradas, N. Santos, L. Rodrigues, S. Signorello, F. M. Ramos, and A. Madeira, “Flowlens: Enabling efficient flow classification for ml- based network security applications.” inNDSS, 2021

work page 2021
[5]

On the effectiveness of machine and deep learning for cyber security,

G. Apruzzese, M. Colajanni, L. Ferretti, A. Guido, and M. Marchetti, “On the effectiveness of machine and deep learning for cyber security,” in2018 10th international conference on cyber Conflict (CyCon). IEEE, 2018, pp. 371–390

work page 2018
[6]

Appsniffer: Towards robust mobile app fingerprinting against vpn,

S. Oh, M. Lee, H. Lee, E. Bertino, and H. Kim, “Appsniffer: Towards robust mobile app fingerprinting against vpn,” inProceedings of the ACM Web Conference 2023, 2023, pp. 2318–2328

work page 2023
[7]

Fingerprinting obfuscated proxy traffic with encapsulated{TLS}handshakes,

D. Xue, M. Kallitsis, A. Houmansadr, and R. Ensafi, “Fingerprinting obfuscated proxy traffic with encapsulated{TLS}handshakes,” in33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 2689– 2706

work page 2024
[8]

k-fingerprinting: A robust scalable web- site fingerprinting technique,

J. Hayes and G. Danezis, “k-fingerprinting: A robust scalable web- site fingerprinting technique,” in25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 1187–1203

work page 2016
[9]

Robust smart- phone app identification via encrypted network traffic analysis,

V . F. Taylor, R. Spolaor, M. Conti, and I. Martinovic, “Robust smart- phone app identification via encrypted network traffic analysis,”IEEE Transactions on Information Forensics and Security, vol. 13, no. 1, pp. 63–78, 2017

work page 2017
[10]

Fs-net: A flow sequence network for encrypted traffic classification,

C. Liu, L. He, G. Xiong, Z. Cao, and Z. Li, “Fs-net: A flow sequence network for encrypted traffic classification,” inIEEE INFOCOM 2019- IEEE Conference On Computer Communications. IEEE, 2019, pp. 1171–1179

work page 2019
[11]

Flowprint: Semi-supervised mobile-app fingerprinting on encrypted network traf- fic,

T. Van Ede, R. Bortolameotti, A. Continella, J. Ren, D. J. Dubois, M. Lindorfer, D. Choffnes, M. Van Steen, and A. Peter, “Flowprint: Semi-supervised mobile-app fingerprinting on encrypted network traf- fic,” inNetwork and distributed system security symposium (NDSS), vol. 27, 2020

work page 2020
[12]

Deep packet: A novel approach for encrypted traffic classification using deep learning,

M. Lotfollahi, M. Jafari Siavoshani, R. Shirali Hossein Zade, and M. Saberian, “Deep packet: A novel approach for encrypted traffic classification using deep learning,”Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020

work page 1999
[13]

Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification,

X. Lin, G. Xiong, G. Gou, Z. Li, J. Shi, and J. Yu, “Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification,” inProceedings of the ACM Web Conference 2022, 2022, pp. 633–642

work page 2022
[14]

Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation,

R. Zhao, M. Zhan, X. Deng, Y . Wang, Y . Wang, G. Gui, and Z. Xue, “Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 4, 2023, pp. 5420–5427

work page 2023
[15]

Netmamba: Efficient network traffic classification via pre-training unidirectional mamba,

T. Wang, X. Xie, W. Wang, C. Wang, Y . Zhao, and Y . Cui, “Netmamba: Efficient network traffic classification via pre-training unidirectional mamba,” in2024 IEEE 32nd International Conference on Network Protocols (ICNP). IEEE, 2024, pp. 1–11

work page 2024
[16]

Trafficformer: an efficient pre-trained model for traffic data,

G. Zhou, X. Guo, Z. Liu, T. Li, Q. Li, and K. Xu, “Trafficformer: an efficient pre-trained model for traffic data,” in2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2025, pp. 1844–1860

work page 2025
[17]

Mm4flow: A pre-trained multi-modal model for versatile network traffic analysis,

L. Yang, L. Liu, J. Huang, Z. Liu, S. Liang, S. Fu, and Y . Wang, “Mm4flow: A pre-trained multi-modal model for versatile network traffic analysis,” inProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 2025, pp. 1664–1678

work page 2025
[18]

The sweet danger of sugar: Debunking representation learning for encrypted traffic classification,

Y . Zhao, G. Dettori, M. Boffa, L. Vassio, and M. Mellia, “The sweet danger of sugar: Debunking representation learning for encrypted traffic classification,” inProceedings of the ACM SIGCOMM 2025 Conference, 2025, pp. 296–310

work page 2025
[19]

Cd-net: Robust mobile traffic classification against apps updating,

Y . Chen, B. Hou, B. Wu, and H. Hu, “Cd-net: Robust mobile traffic classification against apps updating,”Computers & Security, vol. 150, p. 104214, 2025

work page 2025
[20]

Fg-sat: Efficient flow graph for encrypted traffic classification under environment shifts,

S. Cui, X. Han, D. Han, Z. Wang, W. Wang, B. Jiang, B. Liu, and Z. Lu, “Fg-sat: Efficient flow graph for encrypted traffic classification under environment shifts,”IEEE Transactions on Information Forensics and Security, 2025

work page 2025
[21]

Respond to change with constancy: Instruction-tuning with llm for non-iid network traffic classification,

X. Lin, G. Xiong, G. Gou, W. Dong, J. Yu, Z. Li, and W. Xia, “Respond to change with constancy: Instruction-tuning with llm for non-iid network traffic classification,”IEEE Transactions on Information Forensics and Security, 2025

work page 2025
[22]

Realistic website fingerprinting by augmenting network traces,

A. Bahramali, A. Bozorgi, and A. Houmansadr, “Realistic website fingerprinting by augmenting network traces,” inProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 1035–1049

work page 2023
[23]

Rosetta: Enabling robust tls encrypted traffic classification in diverse network environments with tcp-aware traffic augmentation,

R. Xie, Y . Wang, J. Cao, E. Dong, M. Xu, K. Sun, Q. Li, L. Shen, and M. Zhang, “Rosetta: Enabling robust tls encrypted traffic classification in diverse network environments with tcp-aware traffic augmentation,” inProceedings of the ACM turing award celebration conference-China 2023, 2023, pp. 131–132

work page 2023
[24]

Training robust classifiers for classifying encrypted traffic under dynamic network conditions,

Y . Qing, Q. Yin, X. Deng, X. Zhang, P. Li, Z. Liu, K. Sun, K. Xu, and Q. Li, “Training robust classifiers for classifying encrypted traffic under dynamic network conditions,” inProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 2025, pp. 3564–3578

work page 2025
[25]

Domain general- ization: A survey,

K. Zhou, Z. Liu, Y . Qiao, T. Xiang, and C. C. Loy, “Domain general- ization: A survey,”IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 4, pp. 4396–4415, 2022

work page 2022
[26]

Deep fingerprinting: Undermining website fingerprinting defenses with deep learning,

P. Sirinam, M. Imani, M. Juarez, and M. Wright, “Deep fingerprinting: Undermining website fingerprinting defenses with deep learning,” in Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, 2018, pp. 1928–1943

work page 2018
[27]

Robust multi-tab website fingerprinting attacks in the wild,

X. Deng, Q. Yin, Z. Liu, X. Zhao, Q. Li, M. Xu, K. Xu, and J. Wu, “Robust multi-tab website fingerprinting attacks in the wild,” in2023 IEEE symposium on security and privacy (SP). IEEE, 2023, pp. 1005– 1022

work page 2023
[28]

Flow-mae: Leveraging masked autoencoder for accurate, efficient and robust malicious traffic classifica- tion,

Z. Hang, Y . Lu, Y . Wang, and Y . Xie, “Flow-mae: Leveraging masked autoencoder for accurate, efficient and robust malicious traffic classifica- tion,” inProceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, 2023, pp. 297–314

work page 2023
[29]

Tfe-gnn: A temporal fusion encoder using graph neural networks for fine-grained encrypted traffic classification,

H. Zhang, L. Yu, X. Xiao, Q. Li, F. Mercaldo, X. Luo, and Q. Liu, “Tfe-gnn: A temporal fusion encoder using graph neural networks for fine-grained encrypted traffic classification,” inProceedings of the ACM Web Conference 2023, 2023, pp. 2066–2075

work page 2023
[30]

Deep coral: Correlation alignment for deep do- main adaptation,

B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep do- main adaptation,” inEuropean conference on computer vision. Springer, 2016, pp. 443–450

work page 2016
[31]

Domain generalization via conditional invariant representations,

Y . Li, M. Gong, X. Tian, T. Liu, and D. Tao, “Domain generalization via conditional invariant representations,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

work page 2018
[32]

Sharpness-Aware Minimization for Efficiently Improving Generalization

P. Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, “Sharpness-aware minimization for efficiently improving generalization,”arXiv preprint arXiv:2010.01412, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[33]

Swad: Domain generalization by seeking flat minima,

J. Cha, S. Chun, K. Lee, H.-C. Cho, S. Park, Y . Lee, and S. Park, “Swad: Domain generalization by seeking flat minima,”Advances in Neural Information Processing Systems, vol. 34, pp. 22 405–22 418, 2021

work page 2021
[34]

Surrogate gap minimization improves sharpness-aware training,

J. Zhuang, B. Gong, L. Yuan, Y . Cui, H. Adam, N. Dvornek, S. Tatikonda, J. Duncan, and T. Liu, “Surrogate gap minimization improves sharpness-aware training,”arXiv preprint arXiv:2203.08065, 2022

work page arXiv 2022
[35]

Model-agnostic meta-learning for fast adaptation of deep networks,

C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” inInternational conference on machine learning. PMLR, 2017, pp. 1126–1135

work page 2017
[36]

Learning to generalize: Meta-learning for domain generalization,

D. Li, Y . Yang, Y .-Z. Song, and T. Hospedales, “Learning to generalize: Meta-learning for domain generalization,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

work page 2018
[37]

Domain-adversarial training of neural networks,

Y . Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Lavi- olette, M. March, and V . Lempitsky, “Domain-adversarial training of neural networks,”Journal of machine learning research, vol. 17, no. 59, pp. 1–35, 2016

work page 2016
[38]

Reducing domain gap by reducing style bias,

H. Nam, H. Lee, J. Park, W. Yoon, and D. Yoo, “Reducing domain gap by reducing style bias,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 8690–8699

work page 2021
[39]

Domain generalization by learning and removing domain-specific features,

Y . Ding, L. Wang, B. Liang, S. Liang, Y . Wang, and F. Chen, “Domain generalization by learning and removing domain-specific features,”Ad- vances in Neural Information Processing Systems, vol. 35, pp. 24 226– 24 239, 2022

work page 2022
[40]

Domain generalization with adversarial feature learning,

H. Li, S. J. Pan, S. Wang, and A. C. Kot, “Domain generalization with adversarial feature learning,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 5400–5409

work page 2018
[41]

Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,

A. Kendall, Y . Gal, and R. Cipolla, “Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7482–7491

work page 2018
[42]

Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks,

Z. Chen, V . Badrinarayanan, C.-Y . Lee, and A. Rabinovich, “Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks,” inInternational conference on machine learning. PMLR, 2018, pp. 794–803

work page 2018
[43]

Rethinking the inception architecture for computer vision,

C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826

work page 2016
[44]

Regularizing Neural Networks by Penalizing Confident Output Distributions

G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton, “Reg- ularizing neural networks by penalizing confident output distributions,” arXiv preprint arXiv:1701.06548, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[45]

When does label smoothing help?

R. M ¨uller, S. Kornblith, and G. E. Hinton, “When does label smoothing help?”Advances in neural information processing systems, vol. 32, 2019

work page 2019
[46]

Averaging Weights Leads to Wider Optima and Better Generalization

P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, “Averaging weights leads to wider optima and better generalization,” arXiv preprint arXiv:1803.05407, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[47]

Ensemble of averages: Improving model selection and boosting performance in domain gener- alization,

D. Arpit, H. Wang, Y . Zhou, and C. Xiong, “Ensemble of averages: Improving model selection and boosting performance in domain gener- alization,”Advances in Neural Information Processing Systems, vol. 35, pp. 8265–8277, 2022

[48] N. Wickramasinghe, A. Shaghaghi, G. Tsudik, and S. Jha, “Sok: Decoding the enigma of encrypted network traffic classifiers,” in 2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2025, pp. 1825–1843

[49] S. Zhao, S. Chen, F. Wang, Z. Wei, J. Zhong, and J. Liang, “A large-scale mobile traffic dataset for mobile application identification,” The Computer Journal, vol. 67, no. 4, pp. 1501–1513, 2024

[50] I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani et al., “Toward generating a new intrusion detection dataset and intrusion traffic characterization.” ICISSp, vol. 1, no. 2018, pp. 108–116, 2018

[51] T. Wang, X. Xie, W. Wang, C. Wang, J. Liu, B. Huang, Y. Hu, Y. Zhao, and Y. Cui, “Netmamba+: A framework of pre-trained models for efficient and accurate network traffic classification,” arXiv preprint arXiv:2601.21792, 2026

[52] Y. Mirsky, T. Doitshman, Y. Elovici, and A. Shabtai, “Kitsune: an ensemble of autoencoders for online network intrusion detection,” arXiv preprint arXiv:1802.09089, 2018

[53] L. McInnes, J. Healy, and J. Melville, “Umap: Uniform manifold approximation and projection for dimension reduction,” arXiv preprint arXiv:1802.03426, 2018

[54] T. Shapira and Y. Shavitt, “Flowpic: Encrypted internet traffic classification is as easy as image recognition,” in IEEE INFOCOM 2019-IEEE conference on computer communications workshops (INFOCOM WKSHPS). IEEE, 2019, pp. 680–687

[55] D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” arXiv preprint arXiv:1610.02136, 2016

[56] W. Liu, X. Wang, J. Owens, and Y. Li, “Energy-based out-of-distribution detection,” Advances in neural information processing systems, vol. 33, pp. 21 464–21 475, 2020

[57] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” arXiv preprint arXiv:1706.02690, 2017

[58] X. Liu, Y. Lochman, and C. Zach, “Gen: Pushing the limits of softmax-based out-of-distribution detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 23 946–23 955

[59] V. Sehwag, M. Chiang, and P. Mittal, “Ssd: A unified framework for self-supervised outlier detection,” arXiv preprint arXiv:2103.12051, 2021

[60] Y. Sun, Y. Ming, X. Zhu, and Y. Li, “Out-of-distribution detection with deep nearest neighbors,” in International conference on machine learning. PMLR, 2022, pp. 20 827–20 840

[61] J. Park, Y. G. Jung, and A. B. J. Teoh, “Nearest neighbor guidance for out-of-distribution detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 1686–1695

[62] R. Huang, A. Geng, and Y. Li, “On the importance of gradients for detecting distributional shifts in the wild,” Advances in Neural Information Processing Systems, vol. 34, pp. 677–689, 2021

[63] J. Chen, J. Li, X. Qu, J. Wang, J. Wan, and J. Xiao, “Gaia: Delving into gradient-based attribution abnormality for out-of-distribution detection,” Advances in Neural Information Processing Systems, vol. 36, pp. 79 946–79 958, 2023

[64] S. Behpour, T. L. Doan, X. Li, W. He, L. Gou, and L. Ren, “Gradorth: A simple yet efficient out-of-distribution detection with orthogonal projection of gradients,” Advances in Neural Information Processing Systems, vol. 36, pp. 38 206–38 230, 2023

[65] D. Li, Z. Wang, Y. Chen, R. Jiang, W. Ding, and M. Okumura, “A survey on deep active learning: Recent advances and new frontiers,” IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 4, pp. 5879–5899, 2024

[66] D.-W. Zhou, Q.-W. Wang, Z.-H. Qi, H.-J. Ye, D.-C. Zhan, and Z. Liu, “Deep class-incremental learning: A survey,” arXiv preprint arXiv:2302.03648, vol. 1, no. 2, p. 6, 2023

[67] E. Horowicz, T. Shapira, and Y. Shavitt, “A few shots traffic classification with mini-flowpic augmentations,” in Proceedings of the 22nd ACM internet measurement conference, 2022, pp. 647–654

[68] M. Shen, J. Zhang, L. Zhu, K. Xu, and X. Du, “Accurate decentralized application identification via encrypted traffic analysis using graph neural networks,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2367–2380, 2021

[69] H. Y. He, Z. G. Yang, and X. N. Chen, “Pert: Payload encoding representation from transformer for encrypted traffic classification,” in 2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K). IEEE, 2020, pp. 1–8

[70] W. Zheng, J. Zhong, Q. Zhang, and G. Zhao, “Mtt: an efficient model for encrypted network traffic classification using multi-task transformer,” Applied Intelligence, vol. 52, no. 9, pp. 10 741–10 756, 2022

[71] X. Meng, C. Lin, Y. Wang, and Y. Zhang, “Netgpt: Generative pretrained transformer for network traffic,” arXiv preprint arXiv:2304.09513, 2023

[72] Q. Wang, C. Qian, X. Li, Z. Yao, and H. Shao, “Lens: A foundation model for network traffic in cybersecurity,” arXiv e-prints, pp. arXiv–2402, 2024
