arxiv: 1906.11246 · v1 · pith:Y6L4IPJ5new · submitted 2019-06-26 · 💻 cs.CR · cs.LG

Identifying DNS-tunneled traffic with predictive models

Pith reviewed 2026-05-25 15:37 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords DNS tunneling · machine learning · network security · predictive models · query response pairs · SSH · SFTP · Telnet

The pith

Predictive models detect DNS-tunneled traffic with over 83 percent accuracy when trained on query and response pairs together.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether machine learning can identify when protocols such as SSH, SFTP, and Telnet are hidden inside DNS traffic. It trains Multilayer Perceptron (MLP) and Random Forest models on three features extracted from each DNS message: IP packet length, name length, and name entropy. The central result is that performance improves when the models see both queries and responses rather than either alone. This approach matters because DNS tunnels can slip past firewalls, and full traffic logs are too large for manual review. The models reach accuracy above 83 percent, and feature extraction shrinks the required data volume by roughly 95 percent.

Core claim

The paper shows that training on DNS query-response pairs rather than queries or responses alone increases model performance for detecting tunneled SSH, SFTP, and Telnet traffic. Models using IP packet length, name length, and name entropy achieve accuracy greater than 83 percent, with a roughly 95 percent reduction in data size.

What carries the argument

The query-response pair feature set of IP packet length, name length, and name entropy extracted from DNS traffic.
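
To make the feature set concrete, here is a minimal sketch of how the six-dimensional pair vector could be assembled. It assumes the IP packet length has already been read from the capture, and it assumes "name entropy" means Shannon entropy over the characters of the DNS name, in bits per character; the paper's exact definition is not given on this page. The function names `name_entropy`, `message_features`, and `pair_features` are hypothetical.

```python
import math
from collections import Counter


def name_entropy(name):
    """Shannon entropy (bits per character) of the characters in a DNS name.

    Assumption: entropy is computed over single characters of the name.
    """
    if not name:
        return 0.0
    total = len(name)
    counts = Counter(name)
    # Sum -p * log2(p) over the distinct characters.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def message_features(ip_packet_length, name):
    """Three features for one DNS message: packet length, name length, name entropy."""
    return [float(ip_packet_length), float(len(name)), name_entropy(name)]


def pair_features(query, response):
    """Concatenate query and response features into one 6-dimensional vector.

    Each argument is a (ip_packet_length, name) tuple; this input shape is an
    assumption made for the sketch.
    """
    q_len, q_name = query
    r_len, r_name = response
    return message_features(q_len, q_name) + message_features(r_len, r_name)
```

A long, random-looking label such as an encoded payload chunk would typically score higher on both name length and entropy than an ordinary hostname, which is the intuition behind using these features.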

If this is right

  • Training on query and response pairs improves performance over using only one direction.
  • Models achieve accuracy above 83 percent on the tested protocols.
  • Feature extraction reduces data size by roughly 95 percent.
  • Machine learning serves as a tool for detecting network protocols inside DNS tunnels.
  • Only a small subset of network traffic is needed to detect the anomaly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feature set might apply to other tunneling protocols beyond the three tested.
  • Real-time monitoring becomes feasible because only extracted features are required.
  • Attackers could reduce detection by altering name lengths or entropy in their tunnels.
  • Combining this method with other network signals could raise overall detection rates.

Load-bearing premise

The features of packet length, name length, and name entropy from query-response pairs are sufficient to distinguish tunneled traffic from normal DNS traffic without much overlap or easy evasion.

What would settle it

Observing that normal DNS traffic produces feature distributions overlapping those of the tested tunnels, or finding a new set of tunnels where the same models drop below 83 percent accuracy.
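
One way to quantify the overlap mentioned above is a histogram overlap coefficient for a single feature, computed on a sample of normal traffic and a sample of tunneled traffic. The sketch below is an illustration only, not a procedure from the paper; `histogram_overlap` and its `bins` parameter are hypothetical names.

```python
def histogram_overlap(sample_a, sample_b, bins=10):
    """Overlap coefficient in [0, 1] between two samples of one feature.

    0.0 means the binned distributions share no mass; 1.0 means they are
    identical at this bin resolution.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    if hi == lo:
        return 1.0  # every value is identical in both samples
    width = (hi - lo) / bins

    def normalized_hist(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp the maximum value into the last bin.
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(sample) for c in counts]

    hist_a = normalized_hist(sample_a)
    hist_b = normalized_hist(sample_b)
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```

A value near 1.0 for all three features would suggest that the feature set alone cannot separate the two classes, which is the failure condition described above.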

Original abstract

DNS is a distributed, fault tolerant system that avoids a single point of failure. As such it is an integral part of the internet as we use it today and hence deemed a safe protocol which is let through firewalls and proxies with no or little checks. This can be exploited by malicious agents. Network forensics is effective but struggles due to size of data and manual labour. This paper explores to what extent predictive models can be used to predict network traffic, what protocols are tunneled in the DNS protocol and more specifically whether the predictive performance is enhanced when analyzing DNS-queries and responses together and which feature set that can be used for DNS-tunneled network prediction. The tested protocols are SSH, SFTP and Telnet and the machine learning models used are Multi Layered Perceptron and Random Forests. To train the models we extract the IP Packet length, Name length and Name entropy of both the queries and responses in the DNS traffic. With an experimental research strategy it is empirically shown that the performance of the models increases when training the models on the query and respose pairs rather than using only queries or responses. The accuracy of the models is >83% and reduction in data size when features are extracted is roughly 95%. Our results provides evidence that machine learning is a valuable tool in detecting network protocols in a DNS tunnel and that only an small subset of network traffic is needed to detect this anomaly.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that supervised ML models (MLP and Random Forests) can identify DNS-tunneled traffic carrying SSH, SFTP or Telnet by extracting three features (IP packet length, name length, name entropy) from DNS query-response pairs, that accuracy exceeds 83 %, that performance is higher when both directions are used than when queries or responses are used alone, and that feature extraction yields a roughly 95 % reduction in data volume.

Significance. If the performance numbers and the pairing benefit survive proper validation and controls, the work would supply concrete evidence that a very small feature set suffices to flag DNS tunnels, supporting lightweight forensic detection without full-packet retention.

major comments (2)
  1. Abstract and experimental description: the reported accuracy (>83 %) and the claim of improved performance with query-response pairs are presented without any information on train-test splits, cross-validation procedure, class balance, or whether metrics were computed on held-out data. These omissions render the central empirical claims unverifiable from the given text.
  2. Abstract (pairing experiment): the performance lift attributed to using query-response pairs (6-dimensional vectors) versus single-direction inputs (3-dimensional vectors) is not accompanied by an ablation that holds input dimensionality fixed (e.g., feature duplication or addition of noise features). Without such a control the reported gain cannot be attributed to joint query-response information rather than simply to the doubled feature count.
minor comments (1)
  1. Abstract: typo 'respose' for 'response' and 'an small' for 'a small'.
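
The validation details requested in major comment 1 could be reported with a stratified k-fold protocol, which keeps the class mix roughly constant in every held-out fold. The following is a stdlib-only sketch under that assumption; it is not the authors' procedure, and `stratified_k_fold` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with roughly the class mix of `labels` in each fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train_idx, test_idx
```

Accuracy would then be computed only on `test_idx` in each round and reported per fold alongside the class counts, so a reader can tell that no evaluated sample was seen during training.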

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity and rigor.

  1. Referee: Abstract and experimental description: the reported accuracy (>83 %) and the claim of improved performance with query-response pairs are presented without any information on train-test splits, cross-validation procedure, class balance, or whether metrics were computed on held-out data. These omissions render the central empirical claims unverifiable from the given text.

    Authors: We agree that the manuscript does not explicitly describe the train-test split, cross-validation procedure, class balance, or confirmation that metrics were computed on held-out data. These details are necessary for reproducibility. In the revised version we will add a dedicated experimental setup subsection that reports the data partitioning strategy (including the proportion of held-out test data), any cross-validation used, the class distribution in the dataset, and confirmation that all reported accuracies were obtained on unseen test instances. revision: yes

  2. Referee: Abstract (pairing experiment): the performance lift attributed to using query-response pairs (6-dimensional vectors) versus single-direction inputs (3-dimensional vectors) is not accompanied by an ablation that holds input dimensionality fixed (e.g., feature duplication or addition of noise features). Without such a control the reported gain cannot be attributed to joint query-response information rather than simply to the doubled feature count.

    Authors: The referee correctly identifies that the current comparison does not control for input dimensionality. The manuscript reports results for 3-feature (query-only or response-only) versus 6-feature (paired) inputs but does not include an ablation that duplicates features or adds noise to reach six dimensions without pairing. We will conduct the suggested control experiments in the revision and include the results to determine whether the observed improvement stems from the joint query-response information or merely from the increase in feature count. revision: yes
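
The dimensionality control discussed in point 2 can be sketched as two ways of padding a 3-feature single-direction vector to six dimensions without adding any information from the other direction. This is an illustration of the referee's suggestion, not code from the paper; `duplicated_control`, `noise_control`, and `noise_scale` are hypothetical names.

```python
import random


def duplicated_control(single):
    """Pad a 3-feature vector to 6 dimensions by repeating its own features."""
    return list(single) + list(single)


def noise_control(single, rng, noise_scale=1.0):
    """Pad a 3-feature vector to 6 dimensions with Gaussian noise features.

    The noise carries no information about the class label, so any gain over
    the 3-feature baseline cannot come from the extra columns.
    """
    return list(single) + [rng.gauss(0.0, noise_scale) for _ in single]
```

If models trained on true query-response pairs still outperform models trained on either control at the same input width, the gain is more plausibly due to joint query-response information than to feature count. In practice the noise scale would need to be matched to the scale of the real features, which is left open here.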

Circularity Check

0 steps flagged

No circularity in empirical ML classification pipeline

The paper reports an experimental comparison of standard supervised classifiers (MLP and Random Forest) trained on hand-extracted features (packet length, name length, entropy) from DNS queries, responses, or pairs. No equations, fitted parameters presented as predictions, self-citations, or uniqueness theorems appear in the derivation chain. All performance numbers (>83% accuracy) are direct empirical measurements on the collected traffic, falsifiable by replication, and do not reduce to any quantity defined in terms of the output itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions: that the three extracted features capture the difference between normal and tunneled DNS traffic, and that query-response pairs add information beyond single messages. It also presumes that the experimental traffic is representative. No free parameters or invented entities are introduced beyond standard ML training.

axioms (2)
  • domain assumption: The chosen features (packet length, name length, entropy) are informative for the classification task.
    Invoked when the authors state that these features are extracted and used for training.
  • domain assumption: Query-response pairs provide independent information beyond single messages.
    Central to the claim that pairing improves performance.

pith-pipeline@v0.9.0 · 5776 in / 1350 out tokens · 22478 ms · 2026-05-25T15:37:32.007685+00:00 · methodology
