Identifying DNS-tunneled traffic with predictive models
Pith reviewed 2026-05-25 15:37 UTC · model grok-4.3
The pith
Predictive models detect DNS-tunneled traffic with over 83 percent accuracy when trained on query and response pairs together.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that training on DNS query-response pairs, rather than on queries or responses alone, improves model performance for detecting tunneled SSH, SFTP, and Telnet traffic. Models using IP packet length, name length, and name entropy achieve accuracy greater than 83 percent, and extracting these features reduces the data size by roughly 95 percent.
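A paired feature set requires first matching each query to its response. The page does not say how the paper did this. A common approach, assumed here, keys each message on the client address, the client port and the 16-bit DNS transaction ID. The `DnsMessage` type and the `pair_messages` helper are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DnsMessage:
    """Hypothetical minimal view of one captured DNS packet."""
    txid: int          # 16-bit DNS transaction ID from the header
    client_ip: str     # the client's address (source of a query, destination of a response)
    client_port: int   # the client's UDP port, same convention as client_ip
    is_response: bool  # QR bit: False for a query, True for a response
    ip_len: int        # IP total length in bytes
    qname: str         # name in the question section


def pair_messages(msgs: List[DnsMessage]) -> List[Tuple[DnsMessage, DnsMessage]]:
    """Match each response to the most recent unanswered query with the same key.

    Messages are assumed to be in capture order; responses with no pending
    query are dropped.
    """
    pending: Dict[Tuple[str, int, int], DnsMessage] = {}
    pairs: List[Tuple[DnsMessage, DnsMessage]] = []
    for msg in msgs:
        key = (msg.client_ip, msg.client_port, msg.txid)
        if not msg.is_response:
            pending[key] = msg  # a repeated key overwrites the older query
        elif key in pending:
            pairs.append((pending.pop(key), msg))
    return pairs
```

Queries that never receive an answer stay in `pending` and do not form a pair. The page does not state whether the paper discarded or kept such unmatched messages.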
What carries the argument
The query-response pair feature set of IP packet length, name length, and name entropy extracted from DNS traffic.
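The following sketch computes the three per-message features and the paired vector. Two details are assumptions because the page does not specify them. First, entropy is taken as Shannon entropy in bits per character over the whole name string, dots included. Second, the paired vector is the query's three features followed by the response's three. The function names are illustrative.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(name: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `name`."""
    if not name:
        return 0.0
    n = len(name)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(name).values())


def message_features(ip_len: int, qname: str) -> List[float]:
    """Per-message features: IP packet length, name length, name entropy."""
    return [float(ip_len), float(len(qname)), shannon_entropy(qname)]


def pair_features(q_ip_len: int, q_name: str, r_ip_len: int, r_name: str) -> List[float]:
    """Six-dimensional paired vector: query features, then response features."""
    return message_features(q_ip_len, q_name) + message_features(r_ip_len, r_name)
```

A query-only or response-only model would use `message_features` alone, giving a three-dimensional input.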
If this is right
- Training on query and response pairs improves performance over using only one direction.
- Models achieve accuracy above 83 percent on the tested protocols.
- Feature extraction reduces data size by roughly 95 percent.
- Machine learning serves as a tool for detecting network protocols inside DNS tunnels.
- Only a small subset of network traffic is needed to detect the anomaly.
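The roughly 95 percent figure is a ratio of stored bytes. The sketch below shows that arithmetic and assumes features are stored as six 32-bit floats per pair. The page gives neither the paper's storage format nor its raw capture sizes, so the numbers in the example call are made up for illustration.

```python
FLOAT32_BYTES = 4  # assumption: each feature stored as a 32-bit float


def feature_bytes(n_pairs: int, n_features: int = 6) -> int:
    """Bytes needed to store n_pairs paired feature vectors."""
    return n_pairs * n_features * FLOAT32_BYTES


def size_reduction(raw_bytes: int, kept_bytes: int) -> float:
    """Fraction of the raw capture size saved by keeping only kept_bytes."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return 1.0 - kept_bytes / raw_bytes


# Made-up example: 1,000 pairs whose raw capture occupies 480,000 bytes.
print(size_reduction(480_000, feature_bytes(1_000)))
```

For these invented inputs the result is roughly 0.95. The real ratio depends on how the raw captures were stored.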
Where Pith is reading between the lines
- The same feature set might apply to other tunneling protocols beyond the three tested.
- Real-time monitoring becomes feasible because only extracted features are required.
- Attackers could reduce detection by altering name lengths or entropy in their tunnels.
- Combining this method with other network signals could raise overall detection rates.
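The evasion point can be made concrete with a toy example. The hypothetical `pad_name` helper below appends a repeated character to a random-looking label. Padding lowers the per-character entropy but also lengthens the name, so an attacker who shifts one feature also moves another. The example does not show whether the trained models would actually be evaded.

```python
import math
from collections import Counter


def shannon_entropy(name: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `name`."""
    if not name:
        return 0.0
    n = len(name)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(name).values())


def pad_name(name: str, pad_char: str = "a", target_len: int = 60) -> str:
    """Hypothetical evasion: append pad_char until the name reaches target_len."""
    return name + pad_char * max(0, target_len - len(name))


encoded = "x7k2q9"  # stands in for an encoded tunnel payload label
padded = pad_name(encoded)
for label in (encoded, padded):
    print(len(label), round(shannon_entropy(label), 3))
```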
Load-bearing premise
The features of packet length, name length, and name entropy from query-response pairs are sufficient to distinguish tunneled traffic from normal DNS traffic without much overlap or easy evasion.
What would settle it
Observing that normal DNS traffic produces feature distributions overlapping those of the tested tunnels, or finding a new set of tunnels where the same models drop below 83 percent accuracy.
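Overlap between feature distributions can be measured in several ways, and the page does not say which one is meant. One simple choice, assumed here, is the overlap coefficient of two normalised histograms of a single feature, for example name entropy for normal versus tunneled traffic. A value of 0 means the histograms share no bins, and a value of 1 means they are identical. The bin width is a free parameter of this sketch.

```python
from collections import Counter
from typing import Dict, Sequence


def histogram(values: Sequence[float], bin_width: float) -> Dict[int, float]:
    """Normalised histogram mapping bin index to the fraction of values in it."""
    counts = Counter(int(v // bin_width) for v in values)
    n = len(values)
    return {b: c / n for b, c in counts.items()}


def overlap_coefficient(a: Sequence[float], b: Sequence[float], bin_width: float) -> float:
    """Sum over bins of min(p_a, p_b): 0 for disjoint, 1 for identical histograms."""
    ha, hb = histogram(a, bin_width), histogram(b, bin_width)
    return sum(min(ha.get(k, 0.0), hb.get(k, 0.0)) for k in ha.keys() | hb.keys())
```

A coefficient close to 1 for any of the three features would be the kind of overlap that undermines the load-bearing premise.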
Original abstract
DNS is a distributed, fault tolerant system that avoids a single point of failure. As such it is an integral part of the internet as we use it today and hence deemed a safe protocol which is let through firewalls and proxies with no or little checks. This can be exploited by malicious agents. Network forensics is effective but struggles due to size of data and manual labour. This paper explores to what extent predictive models can be used to predict network traffic, what protocols are tunneled in the DNS protocol and more specifically whether the predictive performance is enhanced when analyzing DNS-queries and responses together and which feature set that can be used for DNS-tunneled network prediction. The tested protocols are SSH, SFTP and Telnet and the machine learning models used are Multi Layered Perceptron and Random Forests. To train the models we extract the IP Packet length, Name length and Name entropy of both the queries and responses in the DNS traffic. With an experimental research strategy it is empirically shown that the performance of the models increases when training the models on the query and respose pairs rather than using only queries or responses. The accuracy of the models is >83% and reduction in data size when features are extracted is roughly 95%. Our results provides evidence that machine learning is a valuable tool in detecting network protocols in a DNS tunnel and that only an small subset of network traffic is needed to detect this anomaly.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that supervised ML models (MLP and Random Forests) can identify DNS-tunneled traffic carrying SSH, SFTP or Telnet by extracting three features (IP packet length, name length, name entropy) from DNS query-response pairs, that accuracy exceeds 83 %, that performance is higher when both directions are used than when queries or responses are used alone, and that feature extraction yields a roughly 95 % reduction in data volume.
Significance. If the performance numbers and the pairing benefit survive proper validation and controls, the work would supply concrete evidence that a very small feature set suffices to flag DNS tunnels, supporting lightweight forensic detection without full-packet retention.
major comments (2)
- Abstract and experimental description: the reported accuracy (>83 %) and the claim of improved performance with query-response pairs are presented without any information on train-test splits, cross-validation procedure, class balance, or whether metrics were computed on held-out data. These omissions render the central empirical claims unverifiable from the given text.
- Abstract (pairing experiment): the performance lift attributed to using query-response pairs (6-dimensional vectors) versus single-direction inputs (3-dimensional vectors) is not accompanied by an ablation that holds input dimensionality fixed (e.g., feature duplication or addition of noise features). Without such a control the reported gain cannot be attributed to joint query-response information rather than simply to the doubled feature count.
minor comments (1)
- Abstract: typos 'respose' for 'response' and 'an small' for 'a small'.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity and rigor.
Point-by-point responses
- Referee: Abstract and experimental description: the reported accuracy (>83 %) and the claim of improved performance with query-response pairs are presented without any information on train-test splits, cross-validation procedure, class balance, or whether metrics were computed on held-out data. These omissions render the central empirical claims unverifiable from the given text.
Authors: We agree that the manuscript does not explicitly describe the train-test split, cross-validation procedure, class balance, or confirmation that metrics were computed on held-out data. These details are necessary for reproducibility. In the revised version we will add a dedicated experimental setup subsection that reports the data partitioning strategy (including the proportion of held-out test data), any cross-validation used, the class distribution in the dataset, and confirmation that all reported accuracies were obtained on unseen test instances. revision: yes
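The setup described in this response can be illustrated with a stratified hold-out split. A stratified split keeps the proportions of SSH, SFTP and Telnet samples the same in the training and test sets. The following is a minimal standard-library sketch. The 30 percent test fraction and the label strings are assumptions, not values from the paper. A model would be fitted on `train` only, and accuracy would be reported on `test` only.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["ssh"] * 10 + ["sftp"] * 10 + ["telnet"] * 10
train, test = stratified_split(labels)
print("train:", Counter(labels[i] for i in train))
print("test: ", Counter(labels[i] for i in test))
```

Reporting the printed class counts for both partitions would also answer the referee's question about class balance.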
- Referee: Abstract (pairing experiment): the performance lift attributed to using query-response pairs (6-dimensional vectors) versus single-direction inputs (3-dimensional vectors) is not accompanied by an ablation that holds input dimensionality fixed (e.g., feature duplication or addition of noise features). Without such a control the reported gain cannot be attributed to joint query-response information rather than simply to the doubled feature count.
Authors: The referee correctly identifies that the current comparison does not control for input dimensionality. The manuscript reports results for 3-feature (query-only or response-only) versus 6-feature (paired) inputs but does not include an ablation that duplicates features or adds noise to reach six dimensions without pairing. We will conduct the suggested control experiments in the revision and include the results to determine whether the observed improvement stems from the joint query-response information or merely from the increase in feature count. revision: yes
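The control the referee asks for can be set up by building several six-dimensional inputs from the same data, so that only the source of the extra three columns differs between them. The variant names and the unit-variance Gaussian noise are assumptions of this sketch. In practice the noise scale might be matched to each feature's spread, and every variant would use the same classifier and the same data split.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def paired(q: Sequence[float], r: Sequence[float]) -> Vector:
    """Query features followed by the matching response features."""
    return list(q) + list(r)


def duplicated(x: Sequence[float]) -> Vector:
    """Single-direction features repeated to reach the paired dimensionality."""
    return list(x) + list(x)


def noise_padded(x: Sequence[float], rng: random.Random) -> Vector:
    """Single-direction features plus one standard-normal noise column per feature."""
    return list(x) + [rng.gauss(0.0, 1.0) for _ in x]


def build_variants(queries: Sequence[Sequence[float]],
                   responses: Sequence[Sequence[float]],
                   seed: int = 0) -> Dict[str, List[Vector]]:
    """Every variant is six-dimensional, so input width is held fixed across the comparison."""
    rng = random.Random(seed)
    return {
        "pair": [paired(q, r) for q, r in zip(queries, responses)],
        "query_dup": [duplicated(q) for q in queries],
        "query_noise": [noise_padded(q, rng) for q in queries],
        "response_dup": [duplicated(r) for r in responses],
        "response_noise": [noise_padded(r, rng) for r in responses],
    }
```

With Random Forests, duplicated columns change how often each underlying feature is offered as a split candidate, so running both the duplication control and the noise control is likely more informative than running only one.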
Circularity Check
No circularity in empirical ML classification pipeline
The paper reports an experimental comparison of standard supervised classifiers (MLP and Random Forest) trained on hand-extracted features (packet length, name length, entropy) from DNS queries, responses, or pairs. No equations, fitted parameters presented as predictions, self-citations, or uniqueness theorems appear in the derivation chain. All performance numbers (>83% accuracy) are direct empirical measurements on the collected traffic, falsifiable by replication, and do not reduce to any quantity defined in terms of the output itself.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the chosen features (packet length, name length, entropy) are informative for the classification task.
- Domain assumption: query-response pairs provide independent information beyond single messages.