pith. machine review for the scientific record.

arxiv: 2605.10436 · v1 · submitted 2026-05-11 · 💻 cs.CR · cs.LG · cs.NI

Recognition: 2 Lean theorem links

DRIFT: Drift-Resilient Invariant-Feature Transformer for DGA Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 05:15 UTC · model grok-4.3

classification 💻 cs.CR · cs.LG · cs.NI
keywords DGA detection · temporal drift · invariant features · transformer · self-supervised pre-training · hybrid tokenization · botnet defense · longitudinal evaluation

The pith

A Transformer with hybrid tokenization and self-supervised pre-training learns invariant features that reduce temporal degradation in DGA detection over nine years.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard deep learning detectors for domains produced by Domain Generation Algorithms (DGAs) lose accuracy rapidly as new variants emerge because they overfit to patterns in older data. The paper introduces DRIFT, a Transformer framework that combines character-level encoding for morphological randomness with subword-level encoding for word-based structures, then uses three self-supervised pre-training tasks to build stable representations before fine-tuning. This setup is tested in forward-chaining experiments across a nine-year window from 2017 to 2025, showing less performance loss than prior character- or word-based models. The work matters because botnets rely on continuously evolving DGAs to evade detection, so a method that holds up longer supports more dependable network defense without frequent retraining.

Core claim

Through longitudinal evaluation, the authors establish that their hybrid tokenization strategy, paired with multi-task self-supervised pre-training, produces invariant representations in a Transformer model. Those representations let it maintain higher detection rates against emerging DGA variants than existing baselines when both are assessed in forward-chaining settings over the 2017-2025 period.
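A minimal sketch of the forward-chaining protocol this claim depends on, assuming one labeled record per domain with a first-seen year; the field names and fold boundaries here are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Record:
    domain: str
    label: int   # 0 = benign, 1 = DGA
    year: int    # first-seen year of the sample

def forward_chaining_folds(records, start=2017, end=2025):
    """Yield (cutoff, train, test) splits: train on years <= cutoff,
    test on the following year, so the model never sees the future."""
    for cutoff in range(start, end):
        train = [r for r in records if r.year <= cutoff]
        test = [r for r in records if r.year == cutoff + 1]
        yield cutoff, train, test

# Accuracy is measured per fold; a widening gap across later folds is
# exactly the temporal degradation the paper claims DRIFT reduces.
```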

What carries the argument

Hybrid tokenization that merges character-level encoding to capture stochastic morphological patterns with subword-level encoding for word-based DGAs, allowing three self-supervised pre-training tasks to extract robust structural and contextual invariant features before supervised fine-tuning.
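A rough sketch of what such hybrid tokenization can look like. The character alphabet and the choice of bert-base-uncased WordPiece for the subword branch are our assumptions for illustration; the paper's actual vocabularies are not specified in the text Pith saw.

```python
from transformers import AutoTokenizer

# Character branch: fixed alphabet covering common domain-name characters.
CHAR_VOCAB = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz0123456789-._")}

def char_tokenize(domain: str, max_len: int = 64) -> list[int]:
    """Map each character to an id (0 = unknown / padding)."""
    ids = [CHAR_VOCAB.get(c, 0) for c in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Subword branch: WordPiece recovers dictionary words in word-based DGAs.
subword_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def hybrid_tokenize(domain: str) -> dict:
    return {
        "char_ids": char_tokenize(domain),
        "subword_ids": subword_tok(domain)["input_ids"],
    }

# A high-entropy DGA like "xj3kq9vmz2" shreds into short subword pieces,
# while a dictionary DGA like "greenhousemarket" splits into real words;
# the character branch sees both at uniform granularity.
print(subword_tok.tokenize("greenhousemarket"))  # e.g. ['green', '##house', '##market'] (roughly)
```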

If this is right

  • The model significantly mitigates the temporal performance degradation that affects prior DGA classifiers.
  • It achieves consistently higher accuracy than state-of-the-art character- and word-based baselines across the multi-year forward-chaining tests.
  • The framework supplies a practical foundation for sustained long-term defense against evolving botnet threats without repeated model retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same hybrid tokenization and pre-training approach could be tested on other classes of evolving threats such as polymorphic malware or zero-day exploit kits.
  • Running additional forward-chaining experiments on DGA variants observed after 2025 would directly test whether the learned invariants extend beyond the original study period.
  • Deployment alongside existing network sensors could reduce the frequency of detector updates required by security operations teams.

Load-bearing premise

The invariant representations produced by the hybrid tokenization and the three specific self-supervised pre-training tasks will continue to generalize to DGA variants that appear after the 2017-2025 study window.

What would settle it

A clear drop in detection accuracy when the trained model is evaluated on DGA samples generated after 2025 using the same forward-chaining protocol would show that the claimed drift resilience does not hold.

Figures

Figures reproduced from arXiv: 2605.10436 by Chaeri Jung, Chaeyoung Lee, Seonghoon Jeong.

Figure 1. Longitudinal t-SNE analysis of the 1024-dimensional fused latent vector v_fusion, applied jointly to 25,000 randomly sampled domains. Each point is a domain embedding, color-coded by prediction outcome: TP, TN, FP, and FN. A marker denotes each per-class centroid (benign, DGA), and d denotes the Euclidean distance between the benign and DGA centroids. To emphasize significant temporal shifts while suppressin…
Figure 2. Longitudinal evaluation of DGA detection performance across three baseline models (MIT…
Figure 3. Proposed pre-training strategy, where each backbone processes the input domain as a sequence of character and subword tokens, respectively. To…
Figure 4. Proposed dual-branch architecture for DGA detection. Subword-level…
Figure 5. Per-family TPRs for 147 DGA families and the benign class under three models…
Figure 7. Performance comparison of DRIFT and three baseline models under adaptive drift-mitigation strategies. Solid lines denote periodic retraining (RT) from scratch on accumulated historical data; dashed lines denote continuous learning (CL) via sequential single-epoch updates on each new year's data.
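For context on Figure 7, the two mitigation baselines differ only in how model weights and training data accumulate along the timeline. A minimal sketch, with `model_factory`, `train`, `update`, and `evaluate` as hypothetical callables standing in for the paper's actual training code:

```python
def periodic_retraining(model_factory, train, evaluate, folds):
    """RT: retrain from scratch each year on all accumulated history."""
    history, scores = [], {}
    for year, new_data, test in folds:
        history.extend(new_data)
        model = model_factory()              # fresh weights every fold
        train(model, history)                # full training on all past data
        scores[year] = evaluate(model, test)
    return scores

def continuous_learning(model_factory, update, evaluate, folds):
    """CL: one persistent model, a single-epoch update per new year."""
    model = model_factory()
    scores = {}
    for year, new_data, test in folds:
        update(model, new_data)              # sequential fine-tune, one epoch
        scores[year] = evaluate(model, test)
    return scores
```

RT pays full retraining cost each year; CL is cheap but risks catastrophic forgetting, which is why the caption tracks both against DRIFT.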
original abstract

Domain Generation Algorithms (DGAs) evolve continuously to evade botnet detection, posing a persistent challenge for dependable network defense. While deep learning-based detectors achieve strong performance under static conditions, they suffer severe degradation when facing temporal drift. Through a 9-year longitudinal study (2017-2025), we empirically show that state-of-the-art character- and word-based DGA classifiers rapidly lose effectiveness as new DGA variants emerge. To address this problem, we propose a drift-resilient Transformer-based framework that learns invariant representations through a hybrid tokenization strategy and multi-task self-supervised pre-training. The model integrates (i) character-level encoding to capture stochastic morphological patterns and (ii) subword-level encoding for word-based DGAs. Three pre-training tasks enable the model to learn robust structural and contextual features prior to supervised fine-tuning. Comprehensive evaluations demonstrate that our method significantly mitigates temporal degradation and consistently outperforms state-of-the-art baselines in forward-chaining experiments. The proposed approach offers a dependable foundation for long-term DGA defense in evolving threat landscapes. Our code is available at: https://github.com/snsec-net/2026-DSN-DRIFT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that character- and word-based DGA detectors suffer rapid performance degradation under temporal drift, as shown in a 9-year (2017-2025) longitudinal forward-chaining study. It proposes DRIFT, a Transformer that combines hybrid character/subword tokenization with three self-supervised pre-training tasks to learn invariant structural and contextual features, which are then fine-tuned for classification; the model is reported to mitigate degradation and outperform baselines in the same forward-chaining protocol.

Significance. If the temporal separation in the experimental protocol is correctly enforced, the work would offer a concrete, reproducible approach to building drift-resilient detectors for evolving threats, with the code release aiding verification. The longitudinal design itself is a strength that could influence future security-ML evaluations.

major comments (2)
  1. [Methodology / Experimental Protocol] The manuscript does not state whether the self-supervised pre-training corpus for each forward-chaining fold is restricted to data from the training window only. If pre-training uses the full 2017-2025 collection, future DGA variants leak into the learned representations before supervised training on past data, rendering the drift-resilience claim unsupported by the reported experiments.
  2. [Methodology] No equations, pseudocode, or precise definitions are given for the three self-supervised pre-training tasks. Without these, it is impossible to determine whether the tasks actually enforce invariance to morphological or contextual drift or simply fit additional parameters on the same data distribution.
minor comments (2)
  1. [Abstract] The abstract asserts 'significantly mitigates temporal degradation' and 'consistently outperforms' without quoting any concrete F1, accuracy, or degradation-rate numbers from the forward-chaining tables; adding these would strengthen the summary.
  2. [Experiments] Figure captions and axis labels in the longitudinal performance plots should explicitly indicate the train/test year splits and whether pre-training data boundaries match those splits.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor and methodological clarity that we will address in the revision.

point-by-point responses
  1. Referee: [Methodology / Experimental Protocol] The manuscript does not state whether the self-supervised pre-training corpus for each forward-chaining fold is restricted to data from the training window only. If pre-training uses the full 2017-2025 collection, future DGA variants leak into the learned representations before supervised training on past data, rendering the drift-resilience claim unsupported by the reported experiments.

    Authors: We agree that explicit confirmation of temporal separation is essential to support the drift-resilience claims. In our experiments, the self-supervised pre-training corpus for each forward-chaining fold was restricted exclusively to data from the corresponding training window. We will revise the manuscript to state this restriction clearly in the experimental protocol section. revision: yes

  2. Referee: [Methodology] No equations, pseudocode, or precise definitions are given for the three self-supervised pre-training tasks. Without these, it is impossible to determine whether the tasks actually enforce invariance to morphological or contextual drift or simply fit additional parameters on the same data distribution.

    Authors: We acknowledge that the current manuscript lacks sufficient detail on the pre-training tasks. In the revised version, we will add the mathematical equations, pseudocode, and precise definitions for all three self-supervised pre-training tasks, including an explanation of how each promotes structural and contextual invariance. revision: yes
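Two sketches of what the promised revisions could pin down. First, the temporal restriction confirmed in point 1 amounts to filtering the unlabeled pre-training corpus by the same cutoff as the supervised split; a minimal sketch, reusing the illustrative record fields from the fold sketch above:

```python
def pretraining_corpus(records, cutoff_year):
    """Unlabeled domains for self-supervised pre-training, restricted to the
    fold's training window so post-cutoff DGA variants cannot leak into the
    learned representations."""
    return [r.domain for r in records if r.year <= cutoff_year]
```

Second, for point 2: the three tasks are undefined in the text Pith saw, so the following masked-character objective is purely one plausible instantiation, not the paper's method. It reuses the illustrative character vocabulary from the tokenization sketch above.

```python
import random

MASK_ID = 40  # assumes character ids 1-39 plus a dedicated mask token

def mask_characters(char_ids, mask_prob=0.15, pad_id=0):
    """BERT-style masking over character ids: reconstructing the originals
    forces the encoder to model morphological structure rather than
    memorizing whole domains."""
    inputs, targets = [], []
    for tok in char_ids:
        if tok != pad_id and random.random() < mask_prob:
            inputs.append(MASK_ID)
            targets.append(tok)    # predict the masked character
        else:
            inputs.append(tok)
            targets.append(-100)   # ignored by cross-entropy (PyTorch convention)
    return inputs, targets
```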

Circularity Check

0 steps flagged

No circularity; empirical ML pipeline with independent forward-chaining evaluations

full rationale

The paper proposes a Transformer architecture using hybrid character/subword tokenization and three self-supervised pre-training tasks, followed by supervised fine-tuning and forward-chaining temporal evaluations over 2017-2025 data. No equations, derivations, or mathematical claims are present that could reduce predictions to inputs by construction. The central performance claims rest on empirical results rather than any self-definitional loop, fitted-parameter renaming, or load-bearing self-citation chain. The method is a standard self-supervised-then-fine-tune pipeline whose validity is assessed externally via held-out later-year test folds; no step equates the output to the input by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of standard Transformer components adapted with a hybrid tokenization scheme and three unspecified self-supervised tasks; no new physical entities or mathematical axioms are introduced.

pith-pipeline@v0.9.0 · 5512 in / 1217 out tokens · 43964 ms · 2026-05-12T05:15:14.163360+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

  1. [1]

     A comprehensive measurement study of domain generating malware

     D. Plohmann, K. Yakdan, M. Klatt, J. Bader, and E. Gerhards-Padilla, "A comprehensive measurement study of domain generating malware," in Proc. USENIX Security Symposium (SEC '16), USENIX Association, 2016, pp. 263–278.

  2. [2]

     From throw-away traffic to bots: Detecting the rise of DGA-based malware

     M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou, S. Abu-Nimeh, W. Lee, and D. Dagon, "From throw-away traffic to bots: Detecting the rise of DGA-based malware," in Proc. USENIX Security, Bellevue, WA, Aug. 2012, pp. 491–506. [Online]. Available: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/...

  3. [3]

     The evolving threat landscape of botnets: Comprehensive analysis of detection techniques in the age of artificial intelligence

     A. Mahboubi, K. Luong, H. Aboutorab, H. T. Bui, S. Camtepe, K. Ansari, and B. Barry, "The evolving threat landscape of botnets: Comprehensive analysis of detection techniques in the age of artificial intelligence," Internet of Things (The Netherlands), vol. 33, 2025. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-105013173895&doi...

  4. [4]

     Detecting algorithmically generated malicious domain names

     S. Yadav, A. K. K. Reddy, A. N. Reddy, and S. Ranjan, "Detecting algorithmically generated malicious domain names," in Proc. ACM SIGCOMM, 2010, pp. 48–61.

  5. [5]

     A survey of machine learning and deep learning based DGA detection techniques

     A. M. H. Saeed, D. Wang, H. A. M. Alnedhari, K. Mei, and J. Wang, "A survey of machine learning and deep learning based DGA detection techniques," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13202 LNCS, pp. 133–143, 2022.

  6. [6]

     [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85127037344&doi=10.1007%2f978-3-030-97774-0_12&partnerID=40&md5=a0bad05c4e724d73a6f35f9c6db094f1

  7. [7]

     Advances in artificial intelligence for detecting algorithmically generated domains: Current trends and future prospects

     H. Alqahtani and G. Kumar, "Advances in artificial intelligence for detecting algorithmically generated domains: Current trends and future prospects," Engineering Applications of Artificial Intelligence, vol. 138, p. 109410, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0952197624015689

  8. [8]

     Dom2Vec - detecting DGA domains through word embeddings and AI/ML-driven lexicographic analysis

     L. T. Aravena, P. Casas, J. Bustos-Jiménez, G. Capdehourat, and M. Findrik, "Dom2Vec - detecting DGA domains through word embeddings and AI/ML-driven lexicographic analysis," in Proc. CNSM, 2023, pp. 1–5.

  9. [9]

     Down to earth! guidelines for DGA-based malware detection

     B. C. Cebere, J. L. B. Flueren, S. Sebastián, D. Plohmann, and C. Rossow, "Down to earth! guidelines for DGA-based malware detection," in Proc. RAID '24, Association for Computing Machinery, 2024, pp. 147–165. [Online]. Available: https://doi.org/10.1145/3678890.3678913

  10. [10]

     Botnet DGA domain name classification using transformer network with hybrid embedding

     L. Ding, P. Du, H. Hou, J. Zhang, D. Jin, and S. Ding, "Botnet DGA domain name classification using transformer network with hybrid embedding," Big Data Research, vol. 33, p. 100395, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S221457962300028X

  11. [11]

     A hybrid DGA DefenseNet for detecting DGA domain names based on FastText and deep learning techniques

     J.-L. Chen, J.-F. Qiu, and Y.-H. Chen, "A hybrid DGA DefenseNet for detecting DGA domain names based on FastText and deep learning techniques," Computers & Security, vol. 150, p. 104232, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167404824005388

  12. [12]

     Longitudinal benign and DGA domain name dataset

     C. Lee, C. Jung, and S. Jeong, "Longitudinal benign and DGA domain name dataset," 2026. [Online]. Available: https://dx.doi.org/10.21227/za2s-9e09

  13. [13]

     Wayback machine—top-1m.csv.zip

     Internet Archive, "Wayback machine—top-1m.csv.zip," 2017, accessed on Dec 1, 2025. [Online]. Available: https://web.archive.org/web/20170801000000*/http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

  14. [14]

     Tranco: A research-oriented top sites ranking hardened against manipulation

     V. Le Pochat, T. Van Goethem, S. Tajalizadehkhoob, M. Korczynski, and W. Joosen, "Tranco: A research-oriented top sites ranking hardened against manipulation," in Proc. NDSS 2019, Internet Society, 2019.

  15. [15]

     [Online]. Available: http://dx.doi.org/10.14722/ndss.2019.23386

  16. [16]

     A research-oriented top sites ranking hardened against manipulation

     Tranco, "A research-oriented top sites ranking hardened against manipulation," 2019, accessed on Dec 1, 2025. [Online]. Available: https://tranco-list.eu/

  17. [17]

     False sense of security: Leveraging XAI to analyze the reasoning and true performance of context-less DGA classifiers

     A. Drichel and U. Meyer, "False sense of security: Leveraging XAI to analyze the reasoning and true performance of context-less DGA classifiers," in Proc. RAID 2023, ACM, Oct. 2023, pp. 330–345. [Online]. Available: http://dx.doi.org/10.1145/3607199.3607231

  18. [18]

     LLMs for domain generation algorithm detection

     R. Leyva La O, C. A. Catania, and T. Parlanti, "LLMs for domain generation algorithm detection," 2024. [Online]. Available: https://arxiv.org/abs/2411.03307

  19. [19]

     Legitimate domains and DGA categorized morphologically and by families

     J. R. Gregório and A. M. Cansian, "Legitimate domains and DGA categorized morphologically and by families," 2025. [Online]. Available: https://data.mendeley.com/datasets/nhvyvytn2h/1

  20. [20]

     Detecting domain names generated by DGAs with low false positives in Chinese domain names

     H. Lee, J. Do Yoo, S. Jeong, and H. K. Kim, "Detecting domain names generated by DGAs with low false positives in Chinese domain names," IEEE Access, vol. 12, pp. 123716–123730, 2024. [Online]. Available: http://dx.doi.org/10.1109/ACCESS.2024.3454242

  21. [21]

     Character level based detection of DGA domain names

     B. Yu, J. Pan, J. Hu, A. Nascimento, and M. De Cock, "Character level based detection of DGA domain names," in 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8.

  22. [22]

     Detection of DGA domains based on support vector machine

     Y. Chen, S. Yan, T. Pang, and R. Chen, "Detection of DGA domains based on support vector machine," in Proc. SSIC, 2018, pp. 1–4.

  23. [23]

     An evaluation of DGA classifiers

     R. Sivaguru, C. Choudhary, B. Yu, V. Tymchenko, A. Nascimento, and M. D. Cock, "An evaluation of DGA classifiers," in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 5058–5067.

  24. [24]

     A machine learning framework for domain generation algorithm-based malware detection

     Y. Li, K. Xiong, T. Chin, and C. Hu, "A machine learning framework for domain generation algorithm-based malware detection," IEEE Access, vol. 7, pp. 32765–32782, 2019.

  25. [25]

     Phoenix: DGA-based botnet tracking and intelligence

     S. Schiavoni, F. Maggi, L. Cavallaro, and S. Zanero, "Phoenix: DGA-based botnet tracking and intelligence," in Proc. DIMVA, Springer, 2014, pp. 192–211.

  26. [26]

     Psybog: A scalable botnet detection method for large-scale DNS traffic

     J. Kwon, J. Lee, H. Lee, and A. Perrig, "Psybog: A scalable botnet detection method for large-scale DNS traffic," Computer Networks, vol. 97, pp. 48–73, 2016.

  27. [27]

     Predicting domain generation algorithms with long short-term memory networks

     J. Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, "Predicting domain generation algorithms with long short-term memory networks."

  28. [28]

     [Online]. Available: https://arxiv.org/abs/1611.00791

  29. [29]

     Deep learning framework for domain generation algorithms prediction using long short-term memory

     S. Akarsh, S. Sriram, P. Poornachandran, V. K. Menon, and K. P. Soman, "Deep learning framework for domain generation algorithms prediction using long short-term memory," in Proc. ICACCS, 2019, pp. 666–671.

  30. [30]

     Toward optimal LSTM neural networks for detecting algorithmically generated domain names

     J. Selvi, R. J. Rodríguez, and E. Soria-Olivas, "Toward optimal LSTM neural networks for detecting algorithmically generated domain names," IEEE Access, vol. 9, pp. 126446–126456, 2021.

  31. [31]

     Character-level convolutional networks for text classification

     X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28, Curran Associates, Inc., 2015.

  32. [32]

     Convolutional neural networks for sentence classification

     Y. Kim, "Convolutional neural networks for sentence classification," arXiv preprint arXiv:1408.5882, 2014. [Online]. Available: https://arxiv.org/abs/1408.5882

  33. [34]

     [Online]. Available: https://arxiv.org/abs/1404.2188

  34. [35]

     Real-time detection of dictionary DGA network traffic using deep learning

     K. Highnam, D. Puzio, S. Luo, and N. R. Jennings, "Real-time detection of dictionary DGA network traffic using deep learning," SN Computer Science, vol. 2, no. 2, p. 110, 2021.

  35. [36]

     A DGA domain names detection modeling method based on integrating an attention mechanism and deep neural network

     F. Ren, Z. Jiang, X. Wang, and J. Liu, "A DGA domain names detection modeling method based on integrating an attention mechanism and deep neural network," Cybersecurity, vol. 3, no. 1, p. 4, 2020.

  36. [37]

     PEPC: A deep parallel convolutional neural network model with pre-trained embeddings for DGA detection

     W. Huang, Y. Zong, Z. Shi, L. Wang, and P. Liu, "PEPC: A deep parallel convolutional neural network model with pre-trained embeddings for DGA detection," in 2022 International Joint Conference on Neural Networks (IJCNN), 2022, pp. 1–8.

  37. [38]

     Inline detection of domain generation algorithms with context-sensitive word embeddings

     J. J. Koh and B. Rhodes, "Inline detection of domain generation algorithms with context-sensitive word embeddings," in Proc. Big Data, 2018, pp. 2966–2971.

  38. [39]

     Use of subword tokenization for domain generation algorithm classification

     S. Liew and N. Law, "Use of subword tokenization for domain generation algorithm classification," Cybersecurity, vol. 6, no. 1, p. 49, Sep. 2023.

  39. [40]

     Wilds: A benchmark of in-the-wild distribution shifts

     P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, I. Gao et al., "Wilds: A benchmark of in-the-wild distribution shifts," in Proc. ICML, PMLR, 2021, pp. 5637–5664.

  40. [41]

     FANCI: Feature-based automated NXDomain classification and intelligence

     S. Schüppen, D. Teubert, P. Herrmann, and U. Meyer, "FANCI: Feature-based automated NXDomain classification and intelligence," in Proc. USENIX Security 18, Baltimore, MD, Aug. 2018, pp. 1165–1181. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/schuppen

  41. [42]

     Detection and blocking of DGA-based bot infected computers by monitoring NXDOMAIN responses

     Y. Iuchi, Y. Jin, H. Ichise, K. Iida, and Y. Takai, "Detection and blocking of DGA-based bot infected computers by monitoring NXDOMAIN responses," in Proc. CSCloud, 2020, pp. 82–87.

  42. [43]

     Dial "n" for NXDomain: The scale, origin, and security implications of DNS queries to non-existent domains

     G. Liu, L. Jin, S. Hao, Y. Zhang, D. Liu, A. Stavrou, and H. Wang, "Dial "n" for NXDomain: The scale, origin, and security implications of DNS queries to non-existent domains," in Proc. IMC '23, Association for Computing Machinery, 2023, pp. 198–212. [Online]. Available: https://doi.org/10.1145/3618257.3624805

  43. [44]

     Domain names - implementation and specification

     P. Mockapetris, "Domain names - implementation and specification," RFC Editor, RFC 1035, Nov. 1987. [Online]. Available: https://www.rfc-editor.org/info/rfc1035

  44. [45]

     Public suffix list

     Mozilla Foundation, "Public suffix list," 2025, accessed: 2026-02-23. [Online]. Available: https://publicsuffix.org/list/public_suffix_list.dat

  45. [46]

     tldextract: Accurately separates a URL's subdomain, domain, and public suffix

     J. Kurkowski, "tldextract: Accurately separates a URL's subdomain, domain, and public suffix," 2025, Python package. [Online]. Available: https://github.com/john-kurkowski/tldextract

  46. [47]

     BERT: Pre-training of deep bidirectional transformers for language understanding

     J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2019. [Online]. Available: https://arxiv.org/abs/1810.04805

  47. [48]

     Measuring query latency of top level DNS servers

     J. Liang, J. Jiang, H. Duan, K. Li, and J. Wu, "Measuring query latency of top level DNS servers," in Proc. PAM, Springer, 2013, pp. 145–154.

  48. [49]

     Tweet2vec: Learning tweet embeddings using character-level CNN-LSTM encoder-decoder

     S. Vosoughi, P. Vijayaraghavan, and D. Roy, "Tweet2vec: Learning tweet embeddings using character-level CNN-LSTM encoder-decoder," in Proc. ACM SIGIR '16, Association for Computing Machinery, 2016, pp. 1041–1044. [Online]. Available: https://doi.org/10.1145/2911451.2914762

  49. [50]

     Analyzing the real-world applicability of DGA classifiers

     A. Drichel, U. Meyer, S. Schüppen, and D. Teubert, "Analyzing the real-world applicability of DGA classifiers," in Proc. ARES '20, Association for Computing Machinery, 2020. [Online]. Available: https://doi.org/10.1145/3407023.3407030

  50. [51]

     B-cos networks: Alignment is all we need for interpretability

     M. Böhle, M. Fritz, and B. Schiele, "B-cos networks: Alignment is all we need for interpretability," 2022.

  51. [52]

     Transformers: State-of-the-art natural language processing

     T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, "Transformers: State-of-the-art natural language processing," in Proceedings of the 2020 Conference on Empirical Me...