pith. sign in

arxiv: 2508.15100 · v2 · pith:YYDR4J6Bnew · submitted 2025-08-20 · 💻 cs.CR · cs.LG

Shift Detection and Adaptation for Network Intrusion Detection

Pith reviewed 2026-05-21 22:28 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords network intrusion detectiondistribution shiftpseudo-labelingknowledge distillationanomaly detectiononline adaptationcatastrophic forgetting
0
0 comments X

The pith

NetSight adapts supervised network anomaly detectors to distribution shifts online using pseudo-labeling and knowledge distillation without manual labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NetSight as a framework that lets supervised anomaly detection systems keep working as network traffic patterns change over time. Existing approaches either demand costly manual labeling for each shift or require clean unsupervised data that is rarely available in practice. NetSight instead generates its own labels for new data and transfers knowledge from the old model to the new one so that performance on earlier patterns does not collapse. This removes the need for constant human oversight in environments where threats and traffic evolve continuously. A reader would care because the method turns a recurring practical bottleneck into an automated process that can run indefinitely on real networks.

Core claim

NetSight is a framework for supervised anomaly detection in network data that continually detects and adapts to distribution shifts in an online manner. It eliminates manual intervention through a novel pseudo-labeling technique and uses a knowledge distillation-based adaptation strategy to prevent catastrophic forgetting. Evaluated on three long-term network datasets, NetSight demonstrates superior adaptation performance compared to state-of-the-art methods that rely on manual labeling.

What carries the argument

The NetSight framework, which pairs a pseudo-labeling technique that assigns labels to newly arrived shifted data with a knowledge-distillation adaptation step that updates the model while retaining prior knowledge.

If this is right

  • The detector can continue to operate in production networks without pauses for human labeling after each detected shift.
  • Performance remains high on both pre-shift and post-shift traffic because distillation preserves earlier knowledge.
  • The approach scales to long-running streams where manual labeling would become prohibitively expensive.
  • Adaptation occurs automatically once a shift is detected, removing the delay between shift occurrence and model update.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pseudo-labeling plus distillation pattern could be tested on other streaming security tasks such as malware classification where labeled data also becomes stale.
  • If the shift detector inside NetSight is replaced with a lighter heuristic, the overall system might run on resource-constrained edge devices.
  • Combining NetSight with periodic small batches of verified labels could provide a hybrid human-in-the-loop fallback when pseudo-label confidence drops.
  • The method suggests that online adaptation in security need not trade off accuracy on old data for accuracy on new data when distillation is used.

Load-bearing premise

The pseudo-labeling technique produces sufficiently accurate labels for the new distribution without introducing systematic errors that would degrade the adapted model.

What would settle it

Running the adaptation loop on a dataset where ground-truth labels for the shifted portion are known and checking whether the F1 score falls below the level achieved when those same ground-truth labels are supplied instead of the pseudo-labels.

Figures

Figures reproduced from arXiv: 2508.15100 by Andrey Dimanchev, Ehssan Mousavipour, Majid Ghaderi.

Figure 1
Figure 1. Figure 1: An overview of the NetSight workflow, illustrating the key stages from initial training to shift detection, explanation, and adaptation. learning using contrastive learning, and 2) anomaly detection based on pseudo-label generation for new samples. Representation Learning. To detect anomalies, we first need to establish a comprehensive representation of normal data. To achieve this, we leverage InfoNCE [36… view at source ↗
Figure 2
Figure 2. Figure 2: Performance comparison on Kyoto2006+. Approaches trained [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Impact of InfoNCE vs. CRC Loss within NetSight on Ky￾oto2006+ for components. Approaches trained on 2007 part (@T0) and evaluated after adaptation on 2011 (@T1), 2012 (@T2), 2013 (@T3), and 2014 (@T4). evaluated using the SMOTE-enhanced CICDDoS2019 dataset. The experiments followed a single adaptation setup where the first time point (@T1) denotes the model’s performance prior to the shift adaptation, and … view at source ↗
Figure 5
Figure 5. Figure 5: Impact of InfoNCE vs. CRC Loss within NetSight on CI￾CIDS2017 and CICDDoS2019. Approaches trained on CICIDS2017 and evaluated before adaptation (@T1) and after adaptation (@T2) on CICDDoS2019. Distribution-level Voting PointWise Voting @T1 @T2 @T3 @T4 Time 0.85 0.90 0.95 F1-Score (a) F1-Score @T1 @T2 @T3 @T4 Time 0.80 0.85 0.90 0.95 Accuracy (b) Accuracy @T1 @T2 @T3 @T4 Time 0.6 0.8 BACC (c) BACC @T1 @T2 @… view at source ↗
Figure 6
Figure 6. Figure 6: Pseudo-Labeling Comparison: Distribution-level vs. Point [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
read the original abstract

Distribution shift, a change in the statistical properties of data over time, poses a critical challenge for deep learning anomaly detection systems. Existing anomaly detection systems often struggle to adapt to these shifts. Specifically, systems based on supervised learning require costly manual labeling, while those based on unsupervised learning rely on clean data, which is difficult to obtain, for shift adaptation. Both of these requirements are challenging to meet in practice. In this paper, we introduce NetSight, a framework for supervised anomaly detection in network data that continually detects and adapts to distribution shifts in an online manner. NetSight eliminates manual intervention through a novel pseudo-labeling technique and uses a knowledge distillation-based adaptation strategy to prevent catastrophic forgetting. Evaluated on three long-term network datasets, NetSight demonstrates superior adaptation performance compared to state-of-the-art methods that rely on manual labeling, achieving F1-score improvements of up to 11.72%. This proves its robustness and effectiveness in dynamic networks that experience distribution shifts over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NetSight, a framework for online supervised anomaly detection in network intrusion detection that detects distribution shifts and adapts via a novel pseudo-labeling technique combined with knowledge distillation to avoid catastrophic forgetting. It claims F1-score gains of up to 11.72% over manual-labeling baselines on three long-term network datasets.

Significance. If the pseudo-labeling step can be shown to produce reliable labels on shifted data, the work would offer a practical advance for NIDS in dynamic environments by removing the need for ongoing manual labeling while preserving prior knowledge through distillation. The empirical gains on long-term traces are potentially impactful for the field, but their robustness hinges on direct validation of the pseudo-label quality.

major comments (2)
  1. [Evaluation section] The central claim that NetSight outperforms supervised baselines via pseudo-label-driven adaptation requires that the pseudo-labels on post-shift data are sufficiently accurate. The manuscript reports F1 improvements but provides no direct measurement (e.g., precision/recall of pseudo-labels versus held-out ground-truth labels on the shifted regime), leaving open the possibility that gains arise from the distillation regularizer or dataset artifacts rather than reliable pseudo-labels.
  2. [Methods] §4 (or equivalent methods description): the pseudo-labeling procedure is described at a high level without quantitative checks against true post-shift labels that are available for final scoring. If pseudo-label error exceeds a modest threshold, the adaptation loop would be expected to degrade rather than improve performance, undermining the reported 11.72% gains.
minor comments (2)
  1. [Results] Add error bars and statistical significance tests to the F1-score comparisons in the results tables to strengthen the empirical claims.
  2. [Shift Detection] Clarify the exact criteria and thresholds used for shift detection in the online setting.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. The concerns about validating pseudo-label quality are well taken, and we address each major comment below while outlining planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Evaluation section] The central claim that NetSight outperforms supervised baselines via pseudo-label-driven adaptation requires that the pseudo-labels on post-shift data are sufficiently accurate. The manuscript reports F1 improvements but provides no direct measurement (e.g., precision/recall of pseudo-labels versus held-out ground-truth labels on the shifted regime), leaving open the possibility that gains arise from the distillation regularizer or dataset artifacts rather than reliable pseudo-labels.

    Authors: We agree that a direct quantitative evaluation of pseudo-label accuracy against ground truth on post-shift data would provide stronger support for attributing the F1 gains specifically to the pseudo-labeling mechanism rather than other components. While the end-to-end performance improvements across the three long-term datasets are consistent with effective adaptation, we acknowledge that this leaves room for alternative explanations. In the revised manuscript we will add a dedicated analysis in the Evaluation section that computes and reports precision, recall, and F1 of the pseudo-labels versus the available held-out ground-truth labels in the shifted regimes. This will allow readers to assess whether pseudo-label error remains within bounds that support performance gains. revision: yes

  2. Referee: [Methods] §4 (or equivalent methods description): the pseudo-labeling procedure is described at a high level without quantitative checks against true post-shift labels that are available for final scoring. If pseudo-label error exceeds a modest threshold, the adaptation loop would be expected to degrade rather than improve performance, undermining the reported 11.72% gains.

    Authors: We appreciate the referee highlighting the need for more concrete validation of the pseudo-labeling step. The current description in Section 4 emphasizes the integration with knowledge distillation and shift detection but does not include explicit error metrics against ground truth. We will revise the methods section to provide additional detail on the pseudo-label generation process and incorporate quantitative checks (e.g., label error rates or agreement with ground truth) using the labels that are already available for final performance scoring. This will directly address whether the observed adaptation benefits are consistent with acceptable pseudo-label quality. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation or claims

full rationale

The manuscript describes an empirical framework (NetSight) relying on a pseudo-labeling technique and knowledge-distillation adaptation, evaluated via F1-score comparisons on three network datasets. No equations, closed-form derivations, or mathematical predictions are present in the provided text. Performance improvements are reported as experimental outcomes rather than results forced by self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The central claims rest on observable empirical gains and do not reduce to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim implicitly assumes that pseudo-labels can be generated reliably from the current model and that knowledge distillation preserves prior knowledge without additional regularization terms being tuned.

pith-pipeline@v0.9.0 · 5700 in / 1174 out tokens · 33517 ms · 2026-05-21T22:28:31.502443+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 4 internal anchors

  1. [1]

    Anomaly detection: A survey,

    V . Chandola, A. Banerjee, and V . Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, 2009

  2. [2]

    Anomaly-based network intrusion detection: Techniques, systems and challenges,

    P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maci ´a-Fern´andez, and E. V ´azquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Comput. Secur., vol. 28, no. 1-2, 2009

  3. [3]

    Deep Learning for Anomaly Detection: A Survey

    R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” arXiv preprint arXiv:1901.03407 , 2019

  4. [4]

    Anomaly detection in the open world: Normality shift detection, explanation, and adaptation,

    D. Han, Z. Wang, W. Chen, K. Wang, R. Yu, S. Wang, H. Zhang, Z. Wang, M. Jin, and J. Yang, “Anomaly detection in the open world: Normality shift detection, explanation, and adaptation,” in Proc. NDSS, 2023

  5. [5]

    AOC-IDS: Autonomous online framework with contrastive learning for intrusion detection,

    X. Zhang, R. Zhao, Z. Jiang, Z. Sun, Y . Ding, E. C. Ngai, and S.- H. Yang, “AOC-IDS: Autonomous online framework with contrastive learning for intrusion detection,” in Proc. IEEE INFOCOM , 2024

  6. [6]

    Continual learning with strategic selection and forgetting for network intrusion detection,

    X. Zhang, R. Zhao, Z. Jiang, H. Chen, Y . Ding, E. C. Ngai, and S.- H. Yang, “Continual learning with strategic selection and forgetting for network intrusion detection,” in Proc. IEEE INFOCOM , 2025

  7. [7]

    Kitsune: an ensemble of autoencoders for online network intrusion detection,

    Y . Mirsky, T. Doitshman, Y . Elovici, and A. Shabtai, “Kitsune: an ensemble of autoencoders for online network intrusion detection,” in Proc. NDSS, 2018

  8. [8]

    Zerowall: Detecting zero-day web attacks through encoder-decoder recurrent neural networks,

    R. Tang et al. , “Zerowall: Detecting zero-day web attacks through encoder-decoder recurrent neural networks,” in Proc. IEEE INFOCOM, 2020

  9. [9]

    Transcend: Detecting concept drift in malware classification models,

    R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, “Transcend: Detecting concept drift in malware classification models,” in Proc. USENIX Security Symp. , 2017

  10. [10]

    CADE: Detecting and explaining concept drift samples for security applications,

    L. Yang, W. Guo, Q. Hao, A. Ciptadi, A. Ahmadzadeh, X. Xing, and G. Wang, “CADE: Detecting and explaining concept drift samples for security applications,” in Proc. USENIX Security Symp. , 2021

  11. [11]

    Insomnia: Towards concept-drift robustness in network intrusion detection,

    G. Andresini, F. Pendlebury, F. Pierazzi, C. Loglisci, A. Appice, and L. Cavallaro, “Insomnia: Towards concept-drift robustness in network intrusion detection,” in Proc. ACM AISec , 2021

  12. [12]

    Transcending transcend: Revisiting malware classification in the presence of concept drift,

    F. Barbero, F. Pendlebury, F. Pierazzi, and L. Cavallaro, “Transcending transcend: Revisiting malware classification in the presence of concept drift,” in Proc. IEEE S&P , 2022

  13. [13]

    Outside the closed world: On using machine learning for network intrusion detection,

    R. Sommer and V . Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in Proc. IEEE S&P , 2010

  14. [14]

    Casting out demons: Sanitizing training data for anomaly sensors,

    G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo, and A. D. Keromytis, “Casting out demons: Sanitizing training data for anomaly sensors,” in Proc. IEEE S&P , 2008

  15. [15]

    Adaptive anomaly detection via self-calibration and dynamic updating,

    G. F. Cretu-Ciocarlie, A. Stavrou, M. E. Locasto, and S. J. Stolfo, “Adaptive anomaly detection via self-calibration and dynamic updating,” in Proc. RAID, 2009

  16. [16]

    Approaches to adversarial drift,

    A. Kantchelian, S. Afroz, L. Huang, A. C. Islam, B. Miller, M. C. Tschantz, R. Greenstadt, A. D. Joseph, and J. D. Tygar, “Approaches to adversarial drift,” in Proc. ACM AISec , 2013

  17. [17]

    TESSERACT: Eliminating experimental bias in malware classification across space and time,

    F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, and L. Cavallaro, “TESSERACT: Eliminating experimental bias in malware classification across space and time,” in Proc. USENIX Security Symp. , 2019

  18. [18]

    Throwing darts in the dark? detecting bots with limited data using neural data augmentation,

    S. T. Jan, Q. Hao, T. Hu, J. Pu, S. Oswal, G. Wang, and B. Viswanath, “Throwing darts in the dark? detecting bots with limited data using neural data augmentation,” in Proc. IEEE S&P , 2020

  19. [19]

    Amazon sagemaker model monitor: A system for real- time insights into deployed machine learning models,

    D. Nigenda, Z. Karnin, M. B. Zafar, R. Ramesha, A. Tan, M. Donini, and K. Kenthapadi, “Amazon sagemaker model monitor: A system for real- time insights into deployed machine learning models,” in Proc. ACM SIGKDD, 2022

  20. [20]

    A method of few-shot network intrusion detection based on meta-learning framework,

    C. Xu, J. Shen, and X. Du, “A method of few-shot network intrusion detection based on meta-learning framework,”IEEE Trans. Inf. Forensics Secur., vol. 15, 2020

  21. [21]

    Logclass: Anomalous log identification and classifica- tion with partial labels,

    W. Meng et al., “Logclass: Anomalous log identification and classifica- tion with partial labels,” IEEE Trans. Netw. Serv. Manag., vol. 18, no. 2, 2021

  22. [23]

    Lifelong anomaly detection through unlearning,

    M. Du, Z. Chen, C. Liu, R. Oak, and D. Song, “Lifelong anomaly detection through unlearning,” in Proc. ACM CCS , 2019

  23. [24]

    Stabilizing elastic weight consolidation method in practical ml tasks and using weight importances for neural network pruning,

    A. Kutalev and A. Lapina, “Stabilizing elastic weight consolidation method in practical ml tasks and using weight importances for neural network pruning,” arXiv preprint arXiv:2109.10021 , 2021

  24. [25]

    Catastrophic fisher explosion: Early phase fisher matrix impacts generalization,

    S. Jastrzebski, D. Arpit, O. Astrand, G. B. Kerg, H. Wang, C. Xiong, R. Socher, K. Cho, and K. J. Geras, “Catastrophic fisher explosion: Early phase fisher matrix impacts generalization,” in Proc. ICML, 2021

  25. [26]

    Learning without forgetting,

    Z. Li and D. Hoiem, “Learning without forgetting,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 40, no. 12, 2017

  26. [27]

    No Forgetting Learning: Buffer-free Continual Learning Classification

    M. A. Vahedifar and Q. Zhang, “No forgetting learning: Memory-free continual learning,” arXiv preprint arXiv:2503.04638 , 2025

  27. [28]

    Statistical analysis of honeypot data and building of kyoto 2006+ dataset for nids evaluation,

    J. Song, H. Takakura, Y . Okabe, M. Eto, D. Inoue, and K. Nakao, “Statistical analysis of honeypot data and building of kyoto 2006+ dataset for nids evaluation,” in Proc. BADGERS, 2011

  28. [29]

    Anoshift: A distribution shift benchmark for unsupervised anomaly detection,

    M. Dragoi, E. Burceanu, E. Haller, A. Manolache, and F. Brad, “Anoshift: A distribution shift benchmark for unsupervised anomaly detection,” Adv. Neural Inf. Process. Syst. , vol. 35, 2022

  29. [30]

    CIC-IDS2017 Dataset,

    Canadian Institute for Cybersecurity, “CIC-IDS2017 Dataset,” https:// www.unb.ca/cic/datasets/ids-2017.html, 2017, accessed: Apr. 18, 2025

  30. [31]

    Toward generating a new intrusion detection dataset and intrusion traffic characterization

    I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani et al., “Toward generating a new intrusion detection dataset and intrusion traffic characterization.” in Proc. ICISSP, 2018

  31. [32]

    CIC-DDoS2019 Dataset,

    Canadian Institute for Cybersecurity, “CIC-DDoS2019 Dataset,” https:// www.unb.ca/cic/datasets/ddos-2019.html, 2019, accessed: Apr. 18, 2025

  32. [33]

    De- veloping realistic distributed denial of service (ddos) attack dataset and taxonomy,

    I. Sharafaldin, A. H. Lashkari, S. Hakak, and A. A. Ghorbani, “De- veloping realistic distributed denial of service (ddos) attack dataset and taxonomy,” in Proc. ICCST, 2019

  33. [34]

    What makes for good views for contrastive learning?

    Y . Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola, “What makes for good views for contrastive learning?”Adv. Neural Inf. Process. Syst., vol. 33, 2020

  34. [35]

    Supervised contrastive learn- ing,

    P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y . Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, “Supervised contrastive learn- ing,” Adv. Neural Inf. Process. Syst. , vol. 33, 2020

  35. [36]

    Representation Learning with Contrastive Predictive Coding

    A. v. d. Oord, Y . Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748 , 2018

  36. [37]

    Ranking info noise contrastive estimation: Boosting contrastive learning via ranked positives,

    D. T. Hoffmann, N. Behrmann, J. Gall, T. Brox, and M. Noroozi, “Ranking info noise contrastive estimation: Boosting contrastive learning via ranked positives,” in Proc. AAAI, 2022

  37. [38]

    Sch ¨utze, C

    H. Sch ¨utze, C. D. Manning, and P. Raghavan, Introduction to informa- tion retrieval. Cambridge University Press Cambridge, 2008

  38. [39]

    Deep autoencoding gaussian mixture model for unsupervised anomaly detection,

    B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, “Deep autoencoding gaussian mixture model for unsupervised anomaly detection,” in Proc. ICLR, 2018

  39. [40]

    Semisu- pervised hyperspectral image classification using a probabilistic pseudo- label generation framework,

    M. Seydgar, S. Rahnamayan, P. Ghamisi, and A. A. Bidgoli, “Semisu- pervised hyperspectral image classification using a probabilistic pseudo- label generation framework,”IEEE Trans. Geosci. Remote Sens., vol. 60, 2022

  40. [41]

    Overcoming catastrophic forgetting in neural networks,

    J. Kirkpatrick et al. , “Overcoming catastrophic forgetting in neural networks,” Proc. Natl. Acad. Sci. , vol. 114, no. 13, 2017

  41. [42]

    Similarity-preserving knowledge distillation,

    F. Tung and G. Mori, “Similarity-preserving knowledge distillation,” in Proc. ICCV, 2019

  42. [43]

    Encoding ip address as a feature for network intrusion detection,

    E. Shao, “Encoding ip address as a feature for network intrusion detection,” Master’s thesis, Purdue University, 2019

  43. [44]

    Time series / date function- ality,

    The Pandas Development Team, “Time series / date function- ality,” https://pandas.pydata.org/docs/user guide/timeseries.html, 2024, accessed: Jun. 8, 2025

  44. [45]

    Adasyn-random forest based intrusion detection model,

    Z. Chen, L. Zhou, and W. Yu, “Adasyn-random forest based intrusion detection model,” in Proc. SPML, 2021

  45. [46]

    Smote: Synthetic minority over-sampling technique,

    N. V . Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote: Synthetic minority over-sampling technique,” JAIR, vol. 16, 2002

  46. [47]

    Visualizing data using t-sne,

    L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,” JMLR, vol. 9, 2008

  47. [48]

    The matthews correlation coefficient (mcc) is more reliable than balanced accuracy, bookmaker in- formedness, and markedness in two-class confusion matrix evaluation,

    D. Chicco, N. T ¨otsch, and G. Jurman, “The matthews correlation coefficient (mcc) is more reliable than balanced accuracy, bookmaker in- formedness, and markedness in two-class confusion matrix evaluation,” BioData Mining, vol. 14, no. 1, 2021

  48. [49]

    Comparison of the predicted and observed secondary structure of t4 phage lysozyme,

    B. Matthews, “Comparison of the predicted and observed secondary structure of t4 phage lysozyme,” Biochim. Biophys. Acta, vol. 405, no. 2, 1975

  49. [50]

    Anomaly de- tection on attributed networks via contrastive self-supervised learning,

    Y . Liu, Z. Li, S. Pan, C. Gong, C. Zhou, and G. Karypis, “Anomaly de- tection on attributed networks via contrastive self-supervised learning,” IEEE Trans. Neural Netw. Learn. Syst. , vol. 33, no. 6, 2021

  50. [51]

    Feco: Boosting intrusion detection capability in iot networks via contrastive learning,

    N. Wang, S. Shi, Y . Chen, W. Lou, and Y . T. Hou, “Feco: Boosting intrusion detection capability in iot networks via contrastive learning,” IEEE Trans. Dependable Secure Comput. , 2025

  51. [52]

    Gradient episodic memory for continual learning,

    D. Lopez-Paz and M. Ranzato, “Gradient episodic memory for continual learning,” in Proc. NeurIPS, 2017

  52. [53]

    Efficient lifelong learning with a-GEM,

    A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny, “Efficient lifelong learning with a-GEM,” in Proc. ICLR, 2019

  53. [54]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531 , 2015