pith. machine review for the scientific record.

arxiv: 2605.10436 · v1 · submitted 2026-05-11 · 💻 cs.CR · cs.LG · cs.NI

Recognition: 2 Lean theorem links

DRIFT: Drift-Resilient Invariant-Feature Transformer for DGA Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 05:15 UTC · model grok-4.3

classification 💻 cs.CR · cs.LG · cs.NI
keywords DGA detection · temporal drift · invariant features · transformer · self-supervised pre-training · hybrid tokenization · botnet defense · longitudinal evaluation

The pith

A Transformer with hybrid tokenization and self-supervised pre-training learns invariant features that reduce temporal degradation in DGA detection over nine years.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard deep learning detectors for domains produced by Domain Generation Algorithms (DGAs) lose accuracy rapidly as new variants emerge because they overfit to patterns in older data. The paper introduces DRIFT, a Transformer framework that combines character-level encoding for morphological randomness with subword-level encoding for word-based structures, then uses three self-supervised pre-training tasks to build stable representations before fine-tuning. This setup is tested in forward-chaining experiments across a nine-year window from 2017 to 2025, showing less performance loss than prior character- or word-based models. The work matters because botnets rely on continuously evolving DGAs to evade detection, so a method that holds up longer supports more dependable network defense without frequent retraining.

Core claim

Through longitudinal evaluation, the authors establish that their hybrid tokenization strategy, paired with multi-task self-supervised pre-training, produces invariant representations in a Transformer model. Those representations let it maintain higher detection rates against emerging DGA variants than existing baselines when both are assessed in forward-chaining settings over the 2017-2025 period.
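A minimal sketch of the forward-chaining protocol this claim depends on, assuming one labeled record per domain with a first-seen year; the field names and fold boundaries here are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Record:
    domain: str
    label: int   # 0 = benign, 1 = DGA
    year: int    # first-seen year of the sample

def forward_chaining_folds(records, start=2017, end=2025):
    """Yield (cutoff, train, test) splits: train on years <= cutoff,
    test on the following year, so the model never sees the future."""
    for cutoff in range(start, end):
        train = [r for r in records if r.year <= cutoff]
        test = [r for r in records if r.year == cutoff + 1]
        yield cutoff, train, test

# Accuracy is measured per fold; a widening gap across later folds is
# exactly the temporal degradation the paper claims DRIFT reduces.
```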

What carries the argument

Hybrid tokenization that merges character-level encoding to capture stochastic morphological patterns with subword-level encoding for word-based DGAs, allowing three self-supervised pre-training tasks to extract robust structural and contextual invariant features before supervised fine-tuning.
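A rough sketch of what such hybrid tokenization can look like. The character alphabet and the choice of bert-base-uncased WordPiece for the subword branch are our assumptions for illustration; the paper's actual vocabularies are not specified in the text Pith saw.

```python
from transformers import AutoTokenizer

# Character branch: fixed alphabet covering common domain-name characters.
CHAR_VOCAB = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz0123456789-._")}

def char_tokenize(domain: str, max_len: int = 64) -> list[int]:
    """Map each character to an id (0 = unknown / padding)."""
    ids = [CHAR_VOCAB.get(c, 0) for c in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Subword branch: WordPiece recovers dictionary words in word-based DGAs.
subword_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def hybrid_tokenize(domain: str) -> dict:
    return {
        "char_ids": char_tokenize(domain),
        "subword_ids": subword_tok(domain)["input_ids"],
    }

# A high-entropy DGA like "xj3kq9vmz2" shreds into short subword pieces,
# while a dictionary DGA like "greenhousemarket" splits into real words;
# the character branch sees both at uniform granularity.
print(subword_tok.tokenize("greenhousemarket"))  # e.g. ['green', '##house', '##market'] (roughly)
```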

If this is right

  • The model significantly mitigates the temporal performance degradation that affects prior DGA classifiers.
  • It achieves consistently higher accuracy than state-of-the-art character- and word-based baselines across the multi-year forward-chaining tests.
  • The framework supplies a practical foundation for sustained long-term defense against evolving botnet threats without repeated model retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same hybrid tokenization and pre-training approach could be tested on other classes of evolving threats such as polymorphic malware or zero-day exploit kits.
  • Running additional forward-chaining experiments on DGA variants observed after 2025 would directly test whether the learned invariants extend beyond the original study period.
  • Deployment alongside existing network sensors could reduce the frequency of detector updates required by security operations teams.

Load-bearing premise

The invariant representations produced by the hybrid tokenization and the three specific self-supervised pre-training tasks will continue to generalize to DGA variants that appear after the 2017-2025 study window.

What would settle it

A clear drop in detection accuracy when the trained model is evaluated on DGA samples generated after 2025 using the same forward-chaining protocol would show that the claimed drift resilience does not hold.

Figures

Figures reproduced from arXiv: 2605.10436 by Chaeri Jung, Chaeyoung Lee, Seonghoon Jeong.

Figure 1. Longitudinal t-SNE analysis of the 1024-dimensional fused latent vector v_fusion, applied jointly to 25,000 randomly sampled domains. Each point is a domain embedding, color-coded by prediction outcome: TP, TN, FP, and FN. A marker denotes each per-class centroid (benign, DGA), and d denotes the Euclidean distance between the benign and DGA centroids. To emphasize significant temporal shifts while suppressin…
Figure 2. Longitudinal evaluation of DGA detection performance across three baseline models (MIT…
Figure 3. Proposed pre-training strategy, where each backbone processes the input domain as a sequence of character and subword tokens, respectively. To…
Figure 4. Proposed dual-branch architecture for DGA detection. Subword-level…
Figure 5. Per-family TPRs for 147 DGA families and the benign class under three models…
Figure 7. Performance comparison of DRIFT and three baseline models under adaptive drift-mitigation strategies. Solid lines denote periodic retraining (RT) from scratch on accumulated historical data; dashed lines denote continuous learning (CL) via sequential single-epoch updates on each new year's data.
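For context on Figure 7, the two mitigation baselines differ only in how model weights and training data accumulate along the timeline. A minimal sketch, with `model_factory`, `train`, `update`, and `evaluate` as hypothetical callables standing in for the paper's actual training code:

```python
def periodic_retraining(model_factory, train, evaluate, folds):
    """RT: retrain from scratch each year on all accumulated history."""
    history, scores = [], {}
    for year, new_data, test in folds:
        history.extend(new_data)
        model = model_factory()              # fresh weights every fold
        train(model, history)                # full training on all past data
        scores[year] = evaluate(model, test)
    return scores

def continuous_learning(model_factory, update, evaluate, folds):
    """CL: one persistent model, a single-epoch update per new year."""
    model = model_factory()
    scores = {}
    for year, new_data, test in folds:
        update(model, new_data)              # sequential fine-tune, one epoch
        scores[year] = evaluate(model, test)
    return scores
```

RT pays full retraining cost each year; CL is cheap but risks catastrophic forgetting, which is why the caption tracks both against DRIFT.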
original abstract

Domain Generation Algorithms (DGAs) evolve continuously to evade botnet detection, posing a persistent challenge for dependable network defense. While deep learning-based detectors achieve strong performance under static conditions, they suffer severe degradation when facing temporal drift. Through a 9-year longitudinal study (2017-2025), we empirically show that state-of-the-art character- and word-based DGA classifiers rapidly lose effectiveness as new DGA variants emerge. To address this problem, we propose a drift-resilient Transformer-based framework that learns invariant representations through a hybrid tokenization strategy and multi-task self-supervised pre-training. The model integrates (i) character-level encoding to capture stochastic morphological patterns and (ii) subword-level encoding for word-based DGAs. Three pre-training tasks enable the model to learn robust structural and contextual features prior to supervised fine-tuning. Comprehensive evaluations demonstrate that our method significantly mitigates temporal degradation and consistently outperforms state-of-the-art baselines in forward-chaining experiments. The proposed approach offers a dependable foundation for long-term DGA defense in evolving threat landscapes. Our code is available at: https://github.com/snsec-net/2026-DSN-DRIFT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that character- and word-based DGA detectors suffer rapid performance degradation under temporal drift, as shown in a 9-year (2017-2025) longitudinal forward-chaining study. It proposes DRIFT, a Transformer that combines hybrid character/subword tokenization with three self-supervised pre-training tasks to learn invariant structural and contextual features, which are then fine-tuned for classification; the model is reported to mitigate degradation and outperform baselines in the same forward-chaining protocol.

Significance. If the temporal separation in the experimental protocol is correctly enforced, the work would offer a concrete, reproducible approach to building drift-resilient detectors for evolving threats, with the code release aiding verification. The longitudinal design itself is a strength that could influence future security-ML evaluations.

major comments (2)
  1. [Methodology / Experimental Protocol] The manuscript does not state whether the self-supervised pre-training corpus for each forward-chaining fold is restricted to data from the training window only. If pre-training uses the full 2017-2025 collection, future DGA variants leak into the learned representations before supervised training on past data, rendering the drift-resilience claim unsupported by the reported experiments.
  2. [Methodology] No equations, pseudocode, or precise definitions are given for the three self-supervised pre-training tasks. Without these, it is impossible to determine whether the tasks actually enforce invariance to morphological or contextual drift or simply fit additional parameters on the same data distribution.
minor comments (2)
  1. [Abstract] The abstract asserts 'significantly mitigates temporal degradation' and 'consistently outperforms' without quoting any concrete F1, accuracy, or degradation-rate numbers from the forward-chaining tables; adding these would strengthen the summary.
  2. [Experiments] Figure captions and axis labels in the longitudinal performance plots should explicitly indicate the train/test year splits and whether pre-training data boundaries match those splits.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor and methodological clarity that we will address in the revision.

point-by-point responses
  1. Referee: [Methodology / Experimental Protocol] The manuscript does not state whether the self-supervised pre-training corpus for each forward-chaining fold is restricted to data from the training window only. If pre-training uses the full 2017-2025 collection, future DGA variants leak into the learned representations before supervised training on past data, rendering the drift-resilience claim unsupported by the reported experiments.

    Authors: We agree that explicit confirmation of temporal separation is essential to support the drift-resilience claims. In our experiments, the self-supervised pre-training corpus for each forward-chaining fold was restricted exclusively to data from the corresponding training window. We will revise the manuscript to state this restriction clearly in the experimental protocol section. revision: yes

  2. Referee: [Methodology] No equations, pseudocode, or precise definitions are given for the three self-supervised pre-training tasks. Without these, it is impossible to determine whether the tasks actually enforce invariance to morphological or contextual drift or simply fit additional parameters on the same data distribution.

    Authors: We acknowledge that the current manuscript lacks sufficient detail on the pre-training tasks. In the revised version, we will add the mathematical equations, pseudocode, and precise definitions for all three self-supervised pre-training tasks, including an explanation of how each promotes structural and contextual invariance. revision: yes
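Two sketches of what the promised revisions could pin down. First, the temporal restriction confirmed in point 1 amounts to filtering the unlabeled pre-training corpus by the same cutoff as the supervised split; a minimal sketch, reusing the illustrative record fields from the fold sketch above:

```python
def pretraining_corpus(records, cutoff_year):
    """Unlabeled domains for self-supervised pre-training, restricted to the
    fold's training window so post-cutoff DGA variants cannot leak into the
    learned representations."""
    return [r.domain for r in records if r.year <= cutoff_year]
```

Second, for point 2: the three tasks are undefined in the text Pith saw, so the following masked-character objective is purely one plausible instantiation, not the paper's method. It reuses the illustrative character vocabulary from the tokenization sketch above.

```python
import random

MASK_ID = 40  # assumes character ids 1-39 plus a dedicated mask token

def mask_characters(char_ids, mask_prob=0.15, pad_id=0):
    """BERT-style masking over character ids: reconstructing the originals
    forces the encoder to model morphological structure rather than
    memorizing whole domains."""
    inputs, targets = [], []
    for tok in char_ids:
        if tok != pad_id and random.random() < mask_prob:
            inputs.append(MASK_ID)
            targets.append(tok)    # predict the masked character
        else:
            inputs.append(tok)
            targets.append(-100)   # ignored by cross-entropy (PyTorch convention)
    return inputs, targets
```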

Circularity Check

0 steps flagged

No circularity; empirical ML pipeline with independent forward-chaining evaluations

full rationale

The paper proposes a Transformer architecture using hybrid character/subword tokenization and three self-supervised pre-training tasks, followed by supervised fine-tuning and forward-chaining temporal evaluations over 2017-2025 data. No equations, derivations, or mathematical claims are present that could reduce predictions to inputs by construction. The central performance claims rest on empirical results rather than any self-definitional loop, fitted-parameter renaming, or load-bearing self-citation chain. The method is a standard self-supervised-then-fine-tune pipeline whose validity is assessed externally via held-out later-year test folds; no step equates the output to the input by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of standard Transformer components adapted with a hybrid tokenization scheme and three unspecified self-supervised tasks; no new physical entities or mathematical axioms are introduced.

pith-pipeline@v0.9.0 · 5512 in / 1217 out tokens · 43964 ms · 2026-05-12T05:15:14.163360+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

  1. [1]

     A comprehensive measurement study of domain generating malware

     D. Plohmann, K. Yakdan, M. Klatt, J. Bader, and E. Gerhards-Padilla, "A comprehensive measurement study of domain generating malware," in Proc. USENIX Security Symposium (SEC '16), USENIX Association, 2016, pp. 263–278.

  2. [2]

     From throw-away traffic to bots: Detecting the rise of DGA-based malware

     M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou, S. Abu-Nimeh, W. Lee, and D. Dagon, "From throw-away traffic to bots: Detecting the rise of DGA-based malware," in Proc. USENIX Security, Bellevue, WA, Aug. 2012, pp. 491–506. [Online]. Available: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/...

  3. [3]

     The evolving threat landscape of botnets: Comprehensive analysis of detection techniques in the age of artificial intelligence

     A. Mahboubi, K. Luong, H. Aboutorab, H. T. Bui, S. Camtepe, K. Ansari, and B. Barry, "The evolving threat landscape of botnets: Comprehensive analysis of detection techniques in the age of artificial intelligence," Internet of Things (The Netherlands), vol. 33, 2025. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-105013173895&doi...

  4. [4]

     Detecting algorithmically generated malicious domain names

     S. Yadav, A. K. K. Reddy, A. N. Reddy, and S. Ranjan, "Detecting algorithmically generated malicious domain names," in Proc. ACM SIGCOMM, 2010, pp. 48–61.

  5. [5]

     A survey of machine learning and deep learning based DGA detection techniques

     A. M. H. Saeed, D. Wang, H. A. M. Alnedhari, K. Mei, and J. Wang, "A survey of machine learning and deep learning based DGA detection techniques," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13202 LNCS, pp. 133–143, 2022.

  6. [6]

     [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85127037344&doi=10.1007%2f978-3-030-97774-0_12&partnerID=40&md5=a0bad05c4e724d73a6f35f9c6db094f1

  7. [7]

     Advances in artificial intelligence for detecting algorithmically generated domains: Current trends and future prospects

     H. Alqahtani and G. Kumar, "Advances in artificial intelligence for detecting algorithmically generated domains: Current trends and future prospects," Engineering Applications of Artificial Intelligence, vol. 138, p. 109410, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0952197624015689

  8. [8]

     Dom2Vec - detecting DGA domains through word embeddings and AI/ML-driven lexicographic analysis

     L. T. Aravena, P. Casas, J. Bustos-Jiménez, G. Capdehourat, and M. Findrik, "Dom2Vec - detecting DGA domains through word embeddings and AI/ML-driven lexicographic analysis," in Proc. CNSM, 2023, pp. 1–5.

  9. [9]

     Down to earth! guidelines for DGA-based malware detection

     B. C. Cebere, J. L. B. Flueren, S. Sebastián, D. Plohmann, and C. Rossow, "Down to earth! guidelines for DGA-based malware detection," in Proc. RAID '24, Association for Computing Machinery, 2024, pp. 147–165. [Online]. Available: https://doi.org/10.1145/3678890.3678913

  10. [10]

     Botnet DGA domain name classification using transformer network with hybrid embedding

     L. Ding, P. Du, H. Hou, J. Zhang, D. Jin, and S. Ding, "Botnet DGA domain name classification using transformer network with hybrid embedding," Big Data Research, vol. 33, p. 100395, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S221457962300028X

  11. [11]

     A hybrid DGA DefenseNet for detecting DGA domain names based on FastText and deep learning techniques

     J.-L. Chen, J.-F. Qiu, and Y.-H. Chen, "A hybrid DGA DefenseNet for detecting DGA domain names based on FastText and deep learning techniques," Computers & Security, vol. 150, p. 104232, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167404824005388

  12. [12]

     Longitudinal benign and DGA domain name dataset

     C. Lee, C. Jung, and S. Jeong, "Longitudinal benign and DGA domain name dataset," 2026. [Online]. Available: https://dx.doi.org/10.21227/za2s-9e09

  13. [13]

     Wayback machine—top-1m.csv.zip

     Internet Archive, "Wayback machine—top-1m.csv.zip," 2017, accessed on Dec 1, 2025. [Online]. Available: https://web.archive.org/web/20170801000000*/http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

  14. [14]

     Tranco: A research-oriented top sites ranking hardened against manipulation

     V. Le Pochat, T. Van Goethem, S. Tajalizadehkhoob, M. Korczynski, and W. Joosen, "Tranco: A research-oriented top sites ranking hardened against manipulation," in Proc. NDSS 2019, Internet Society, 2019.

  15. [15]

     [Online]. Available: http://dx.doi.org/10.14722/ndss.2019.23386

  16. [16]

     A research-oriented top sites ranking hardened against manipulation

     Tranco, "A research-oriented top sites ranking hardened against manipulation," 2019, accessed on Dec 1, 2025. [Online]. Available: https://tranco-list.eu/

  17. [17]

     False sense of security: Leveraging XAI to analyze the reasoning and true performance of context-less DGA classifiers

     A. Drichel and U. Meyer, "False sense of security: Leveraging XAI to analyze the reasoning and true performance of context-less DGA classifiers," in Proc. RAID 2023, ACM, Oct. 2023, pp. 330–345. [Online]. Available: http://dx.doi.org/10.1145/3607199.3607231

  18. [18]

     LLMs for domain generation algorithm detection

     R. Leyva La O, C. A. Catania, and T. Parlanti, "LLMs for domain generation algorithm detection," 2024. [Online]. Available: https://arxiv.org/abs/2411.03307

  19. [19]

     Legitimate domains and DGA categorized morphologically and by families

     J. R. Gregório and A. M. Cansian, "Legitimate domains and DGA categorized morphologically and by families," 2025. [Online]. Available: https://data.mendeley.com/datasets/nhvyvytn2h/1

  20. [20]

     Detecting domain names generated by DGAs with low false positives in Chinese domain names

     H. Lee, J. Do Yoo, S. Jeong, and H. K. Kim, "Detecting domain names generated by DGAs with low false positives in Chinese domain names," IEEE Access, vol. 12, pp. 123716–123730, 2024. [Online]. Available: http://dx.doi.org/10.1109/ACCESS.2024.3454242

  21. [21]

     Character level based detection of DGA domain names

     B. Yu, J. Pan, J. Hu, A. Nascimento, and M. De Cock, "Character level based detection of DGA domain names," in 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8.

  22. [22]

     Detection of DGA domains based on support vector machine

     Y. Chen, S. Yan, T. Pang, and R. Chen, "Detection of DGA domains based on support vector machine," in Proc. SSIC, 2018, pp. 1–4.

  23. [23]

     An evaluation of DGA classifiers

     R. Sivaguru, C. Choudhary, B. Yu, V. Tymchenko, A. Nascimento, and M. D. Cock, "An evaluation of DGA classifiers," in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 5058–5067.

  24. [24]

     A machine learning framework for domain generation algorithm-based malware detection

     Y. Li, K. Xiong, T. Chin, and C. Hu, "A machine learning framework for domain generation algorithm-based malware detection," IEEE Access, vol. 7, pp. 32765–32782, 2019.

  25. [25]

     Phoenix: DGA-based botnet tracking and intelligence

     S. Schiavoni, F. Maggi, L. Cavallaro, and S. Zanero, "Phoenix: DGA-based botnet tracking and intelligence," in Proc. DIMVA, Springer, 2014, pp. 192–211.

  26. [26]

     Psybog: A scalable botnet detection method for large-scale DNS traffic

     J. Kwon, J. Lee, H. Lee, and A. Perrig, "Psybog: A scalable botnet detection method for large-scale DNS traffic," Computer Networks, vol. 97, pp. 48–73, 2016.

  27. [27]

     Predicting domain generation algorithms with long short-term memory networks

     J. Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, "Predicting domain generation algorithms with long short-term memory networks."

  28. [28]

     [Online]. Available: https://arxiv.org/abs/1611.00791

  29. [29]

     Deep learning framework for domain generation algorithms prediction using long short-term memory

     S. Akarsh, S. Sriram, P. Poornachandran, V. K. Menon, and K. P. Soman, "Deep learning framework for domain generation algorithms prediction using long short-term memory," in Proc. ICACCS, 2019, pp. 666–671.

  30. [30]

     Toward optimal LSTM neural networks for detecting algorithmically generated domain names

     J. Selvi, R. J. Rodríguez, and E. Soria-Olivas, "Toward optimal LSTM neural networks for detecting algorithmically generated domain names," IEEE Access, vol. 9, pp. 126446–126456, 2021.

  31. [31]

     Character-level convolutional networks for text classification

     X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28, Curran Associates, Inc., 2015.

  32. [32]

     Convolutional neural networks for sentence classification

     Y. Kim, "Convolutional neural networks for sentence classification," arXiv preprint arXiv:1408.5882, 2014. [Online]. Available: https://arxiv.org/abs/1408.5882

  33. [34]

     [Online]. Available: https://arxiv.org/abs/1404.2188

  34. [35]

     Real-time detection of dictionary DGA network traffic using deep learning

     K. Highnam, D. Puzio, S. Luo, and N. R. Jennings, "Real-time detection of dictionary DGA network traffic using deep learning," SN Computer Science, vol. 2, no. 2, p. 110, 2021.

  35. [36]

     A DGA domain names detection modeling method based on integrating an attention mechanism and deep neural network

     F. Ren, Z. Jiang, X. Wang, and J. Liu, "A DGA domain names detection modeling method based on integrating an attention mechanism and deep neural network," Cybersecurity, vol. 3, no. 1, p. 4, 2020.

  36. [37]

     PEPC: A deep parallel convolutional neural network model with pre-trained embeddings for DGA detection

     W. Huang, Y. Zong, Z. Shi, L. Wang, and P. Liu, "PEPC: A deep parallel convolutional neural network model with pre-trained embeddings for DGA detection," in 2022 International Joint Conference on Neural Networks (IJCNN), 2022, pp. 1–8.

  37. [38]

     Inline detection of domain generation algorithms with context-sensitive word embeddings

     J. J. Koh and B. Rhodes, "Inline detection of domain generation algorithms with context-sensitive word embeddings," in Proc. Big Data, 2018, pp. 2966–2971.

  38. [39]

     Use of subword tokenization for domain generation algorithm classification

     S. Liew and N. Law, "Use of subword tokenization for domain generation algorithm classification," Cybersecurity, vol. 6, no. 1, p. 49, Sep. 2023.

  39. [40]

     Wilds: A benchmark of in-the-wild distribution shifts

     P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, I. Gao et al., "Wilds: A benchmark of in-the-wild distribution shifts," in Proc. ICML, PMLR, 2021, pp. 5637–5664.

  40. [41]

     FANCI: Feature-based automated NXDomain classification and intelligence

     S. Schüppen, D. Teubert, P. Herrmann, and U. Meyer, "FANCI: Feature-based automated NXDomain classification and intelligence," in Proc. USENIX Security 18, Baltimore, MD, Aug. 2018, pp. 1165–1181. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/schuppen

  41. [42]

     Detection and blocking of DGA-based bot infected computers by monitoring NXDOMAIN responses

     Y. Iuchi, Y. Jin, H. Ichise, K. Iida, and Y. Takai, "Detection and blocking of DGA-based bot infected computers by monitoring NXDOMAIN responses," in Proc. CSCloud, 2020, pp. 82–87.

  42. [43]

     Dial "n" for NXDomain: The scale, origin, and security implications of DNS queries to non-existent domains

     G. Liu, L. Jin, S. Hao, Y. Zhang, D. Liu, A. Stavrou, and H. Wang, "Dial "n" for NXDomain: The scale, origin, and security implications of DNS queries to non-existent domains," in Proc. IMC '23, Association for Computing Machinery, 2023, pp. 198–212. [Online]. Available: https://doi.org/10.1145/3618257.3624805

  43. [44]

     Domain names - implementation and specification

     P. Mockapetris, "Domain names - implementation and specification," RFC Editor, RFC 1035, Nov. 1987. [Online]. Available: https://www.rfc-editor.org/info/rfc1035

  44. [45]

     Public suffix list

     Mozilla Foundation, "Public suffix list," 2025, accessed: 2026-02-23. [Online]. Available: https://publicsuffix.org/list/public_suffix_list.dat

  45. [46]

     tldextract: Accurately separates a URL's subdomain, domain, and public suffix

     J. Kurkowski, "tldextract: Accurately separates a URL's subdomain, domain, and public suffix," 2025, Python package. [Online]. Available: https://github.com/john-kurkowski/tldextract

  46. [47]

     BERT: Pre-training of deep bidirectional transformers for language understanding

     J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2019. [Online]. Available: https://arxiv.org/abs/1810.04805

  47. [48]

     Measuring query latency of top level DNS servers

     J. Liang, J. Jiang, H. Duan, K. Li, and J. Wu, "Measuring query latency of top level DNS servers," in Proc. PAM, Springer, 2013, pp. 145–154.

  48. [49]

     Tweet2vec: Learning tweet embeddings using character-level CNN-LSTM encoder-decoder

     S. Vosoughi, P. Vijayaraghavan, and D. Roy, "Tweet2vec: Learning tweet embeddings using character-level CNN-LSTM encoder-decoder," in Proc. ACM SIGIR '16, Association for Computing Machinery, 2016, pp. 1041–1044. [Online]. Available: https://doi.org/10.1145/2911451.2914762

  49. [50]

     Analyzing the real-world applicability of DGA classifiers

     A. Drichel, U. Meyer, S. Schüppen, and D. Teubert, "Analyzing the real-world applicability of DGA classifiers," in Proc. ARES '20, Association for Computing Machinery, 2020. [Online]. Available: https://doi.org/10.1145/3407023.3407030

  50. [51]

     B-cos networks: Alignment is all we need for interpretability

     M. Böhle, M. Fritz, and B. Schiele, "B-cos networks: Alignment is all we need for interpretability," 2022.

  51. [52]

     Transformers: State-of-the-art natural language processing

     T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, "Transformers: State-of-the-art natural language processing," in Proceedings of the 2020 Conference on Empirical Me...