arxiv: 2605.20546 · v1 · pith:URHD75H4new · submitted 2026-05-19 · 💻 cs.CR · cs.NI

Detecting Data Exfiltration through I2P Anonymity Networks: A Two-Phase Machine Learning Approach

Pith reviewed 2026-05-21 06:26 UTC · model grok-4.3

classification 💻 cs.CR cs.NI
keywords I2P detection · data exfiltration · machine learning · network security · anonymity networks · Random Forest · XGBoost · behavioral analysis

The pith

A two-phase machine learning system identifies I2P traffic at 99.96% accuracy and classifies exfiltration at 91.11% accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a two-stage machine learning model designed to detect the use of I2P anonymity networks in corporate traffic and then assess whether that traffic is exfiltrating data. The first phase separates I2P flows from normal traffic with high precision using a Random Forest classifier. The second phase applies behavioral analysis with XGBoost to distinguish exfiltration from legitimate I2P use. A sympathetic reader would care because current security tools struggle with I2P's anonymity features, potentially allowing undetected data theft, and this method offers a way to prioritize real threats.

Core claim

The authors establish that their two-phase approach, trained on the SafeSurf Darknet 2025 dataset of 184,548 flows, can distinguish I2P traffic from normal network traffic with 99.96% accuracy using Random Forest and only two false positives, while the subsequent XGBoost classifier identifies exfiltration behavior in I2P traffic at 91.11% accuracy, with packet timing and flow duration emerging as the most important features.

What carries the argument

Two-phase pipeline consisting of Random Forest for initial I2P traffic classification followed by XGBoost for exfiltration detection within identified I2P flows.
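
The gating structure described above can be sketched as plain control flow. This is a minimal illustration only: the `Flow` record, the score fields, and the two stub rules are hypothetical stand-ins for the trained Random Forest and XGBoost models, whose features and thresholds are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Flow = Dict[str, float]  # hypothetical flow record: feature name -> value


@dataclass
class TriageResult:
    flow_id: int
    is_i2p: bool
    is_exfiltration: bool  # only meaningful when is_i2p is True


def two_phase_triage(
    flows: List[Flow],
    phase1_is_i2p: Callable[[Flow], bool],
    phase2_is_exfil: Callable[[Flow], bool],
) -> List[TriageResult]:
    """Run phase 2 only on flows that phase 1 flags as I2P."""
    results = []
    for flow_id, flow in enumerate(flows):
        i2p = phase1_is_i2p(flow)
        # Flows classified as normal never reach the behavioral classifier.
        exfil = phase2_is_exfil(flow) if i2p else False
        results.append(TriageResult(flow_id, i2p, exfil))
    return results


# Stand-in decision rules, NOT the paper's trained models.
def stub_phase1(flow: Flow) -> bool:
    return flow["i2p_score"] >= 0.5


def stub_phase2(flow: Flow) -> bool:
    return flow["exfil_score"] >= 0.5


flows = [
    {"i2p_score": 0.1, "exfil_score": 0.9},  # normal traffic, skipped by phase 2
    {"i2p_score": 0.9, "exfil_score": 0.2},  # I2P, legitimate use
    {"i2p_score": 0.8, "exfil_score": 0.7},  # I2P, flagged as exfiltration
]
results = two_phase_triage(flows, stub_phase1, stub_phase2)
alerts = [r.flow_id for r in results if r.is_exfiltration]
print(alerts)
```

One consequence of this design is that any I2P flow missed in phase 1 can never be flagged in phase 2, so end-to-end recall is bounded by phase 1 recall.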

Load-bearing premise

The SafeSurf Darknet 2025 dataset of 184,548 network flows accurately represents real corporate environments and has reliable labels for both I2P usage and exfiltration behavior.

What would settle it

Deploying the trained model on traffic captured from a real corporate network that includes known instances of I2P-based data exfiltration and verifying whether the reported accuracies hold.

Original abstract

The Invisible Internet Project (I2P) provides strong anonymity through garlic routing and distributed network architecture, making it attractive for legitimate privacy needs. Nevertheless, the same properties can be exploited by malicious actors to steal sensitive information from corporate networks without detection. Current network security measures often fail to detect I2P traffic, and existing literature has focused primarily on protocol-level traffic identification without addressing behavioral threat assessment. This paper proposes a two-stage machine-learning model for I2P traffic analysis using the SafeSurf Darknet 2025 dataset comprising 184,548 network flows. Phase 1 achieved 99.96% accuracy in distinguishing I2P traffic from normal network traffic using a Random Forest classifier, with only 2 false positives among 32,318 normal flows. Phase 2 performed behavioral analysis on traffic identified as I2P, classifying it as either exfiltration or legitimate activity, achieving 91.11% accuracy using XGBoost. The system demonstrates that tree-based ensemble methods substantially outperform deep neural networks and support vector machines for this task. Feature importance analysis indicates that the most discriminative features are packet timing and flow duration. These findings establish that accurate I2P traffic detection and threat prioritization are achievable in operational network environments, enabling security teams to focus resources on high-risk events rather than monitoring all encrypted traffic.
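
The false-positive figure in the abstract can be turned into a rate with a short calculation. The sketch below uses only the two numbers stated there (2 false positives, 32,318 normal flows); whether 32,318 is a held-out test count or the full normal class is not stated, so the result should be read as a rate over whatever set those flows came from.

```python
false_positives = 2
normal_flows = 32_318

# False-positive rate = FP / (FP + TN) = FP / all normal flows evaluated.
fpr = false_positives / normal_flows
specificity = 1 - fpr

print(f"false-positive rate: {fpr:.4%}")   # false-positive rate: 0.0062%
print(f"specificity: {specificity:.4%}")   # specificity: 99.9938%
```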

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a two-phase machine learning approach to detect I2P traffic and classify data exfiltration behavior within it. Using the SafeSurf Darknet 2025 dataset of 184,548 network flows, Phase 1 employs a Random Forest classifier to separate I2P from normal traffic at 99.96% accuracy (2 false positives on 32,318 normal flows), while Phase 2 applies XGBoost to distinguish exfiltration from legitimate I2P activity at 91.11% accuracy. Tree-based ensembles outperform DNNs and SVMs; packet timing and flow duration are identified as the most discriminative features.

Significance. If the empirical results prove robust under proper validation, the work could provide a practical tool for network defenders to prioritize threats in anonymity networks rather than treating all encrypted traffic equally. The reported outperformance of tree-based methods and the low false-positive rate in Phase 1 are potentially useful operational signals, but the absence of methodological details prevents a clear assessment of whether these numbers generalize beyond the specific dataset.

major comments (2)
  1. Abstract and Methods sections: the reported accuracies (99.96% Phase 1, 91.11% Phase 2) are given without any description of cross-validation strategy, feature-selection procedure, class-imbalance handling, or external validation set. These omissions are load-bearing because the central claim is that the system achieves operationally useful detection and prioritization; without these details the numerical results cannot be evaluated for overfitting or distribution shift.
  2. Dataset and labeling description: the process used to assign ground-truth labels distinguishing exfiltration from legitimate I2P activity is unspecified. If labels were generated from the same flow features (packet timing, duration) later used by the classifier, or via synthetic rules that do not match real corporate exfiltration patterns, the Phase 2 accuracy claim is circular and does not support the threat-prioritization conclusion.
minor comments (1)
  1. Abstract: the phrase 'support vector machines' should be expanded on first use or referenced to a methods subsection for readers unfamiliar with the baseline comparisons.
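
To illustrate why the first major comment asks about class-imbalance handling, the sketch below compares plain accuracy with balanced accuracy for a trivial classifier that always predicts the majority class. The 9:1 class split is hypothetical and is not a claim about the paper's Phase 2 class ratio, which is not reported here.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall, so each class counts equally."""
    recalls: List[float] = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 900 legitimate flows, 100 exfiltration flows.
y_true = ["legitimate"] * 900 + ["exfiltration"] * 100

# A classifier that ignores its input and always predicts the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

Under this assumed split the trivial classifier scores 0.9 on accuracy while detecting no exfiltration at all, which is why a headline accuracy is hard to interpret without the class ratio and per-class metrics.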

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important areas for improving the transparency and reproducibility of our work. We address each major comment below and have revised the manuscript to incorporate the requested details.

Point-by-point responses
  1. Referee: Abstract and Methods sections: the reported accuracies (99.96% Phase 1, 91.11% Phase 2) are given without any description of cross-validation strategy, feature-selection procedure, class-imbalance handling, or external validation set. These omissions are load-bearing because the central claim is that the system achieves operationally useful detection and prioritization; without these details the numerical results cannot be evaluated for overfitting or distribution shift.

    Authors: We agree that these methodological details are necessary to properly assess the robustness of the reported results. The original manuscript contained a high-level description of the experimental setup but lacked the explicit elements noted. In the revised version we have added a dedicated subsection in Methods that specifies the stratified 5-fold cross-validation procedure, the recursive feature elimination approach combined with mutual information for feature selection, the use of class weighting and SMOTE to address imbalance, and performance on a temporally separated external validation set. These additions directly address concerns about overfitting and distribution shift.
    Revision: yes

  2. Referee: Dataset and labeling description: the process used to assign ground-truth labels distinguishing exfiltration from legitimate I2P activity is unspecified. If labels were generated from the same flow features (packet timing, duration) later used by the classifier, or via synthetic rules that do not match real corporate exfiltration patterns, the Phase 2 accuracy claim is circular and does not support the threat-prioritization conclusion.

    Authors: We acknowledge that the original manuscript did not provide sufficient detail on the labeling process. The ground-truth labels in the SafeSurf Darknet 2025 dataset were generated by the dataset providers using a combination of known exfiltration attack signatures and simulated scenarios that incorporate metadata beyond the flow-level features employed by our classifiers. This labeling is therefore independent of packet timing and flow duration. We have expanded the Dataset section in the revision to describe the labeling methodology explicitly, including how the simulated patterns relate to observed corporate exfiltration behaviors, thereby removing any ambiguity regarding circularity.
    Revision: yes
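
The stratified 5-fold procedure named in the first response can be sketched in a few lines of standard-library Python. This is an illustrative splitter, not the authors' code: the function name, the seed, and the 4:1 toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that preserve class ratios per fold."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)

    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical toy labels with a 4:1 class ratio.
labels = [0] * 80 + [1] * 20
splits = list(stratified_k_fold(labels, k=5))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

In a setup like this, any resampling step such as SMOTE would normally be fitted on the training indices of each fold only, so that synthetic samples derived from test flows cannot leak into training.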

Circularity Check

0 steps flagged

No circularity in reported ML accuracies

Full rationale

The paper reports empirical classification accuracies (99.96% Phase 1 Random Forest, 91.11% Phase 2 XGBoost) obtained by training and evaluating standard ensemble models on the fixed SafeSurf Darknet 2025 dataset of 184,548 flows. These performance numbers are direct outputs of supervised learning on labeled data splits and do not reduce via any equations, self-definitions, or fitted-parameter renamings to quantities defined in terms of the outputs themselves. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the provided claims; the derivation chain is the conventional ML pipeline and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the provided dataset contains reliable ground-truth labels for both I2P traffic and exfiltration events and that the chosen flow features generalize beyond the training distribution.

axioms (1)
  • domain assumption: Network flow features such as packet timing and flow duration are sufficient to distinguish I2P exfiltration from legitimate activity.
    Invoked via the feature importance analysis that identifies these as the most discriminative signals.
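
As a concrete reading of "packet timing and flow duration", the sketch below derives a flow duration and inter-arrival-time statistics from per-packet timestamps. The feature names and the choice of statistics are hypothetical; the dataset's actual feature definitions are not given here.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def timing_features(timestamps: Sequence[float]) -> Dict[str, float]:
    """Compute simple timing features from packet timestamps (seconds)."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        # A single-packet flow has no duration and no inter-arrival gaps.
        return {"flow_duration": 0.0, "iat_mean": 0.0, "iat_std": 0.0}
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "flow_duration": ts[-1] - ts[0],  # first packet to last packet
        "iat_mean": mean(gaps),           # mean inter-arrival time
        "iat_std": pstdev(gaps),          # population std of inter-arrival times
    }


# Hypothetical flow with four packets.
features = timing_features([0.0, 0.5, 1.0, 3.0])
print(features)
```

Features of this kind depend on how the traffic was generated and captured, which is part of why the axiom above is flagged as a domain assumption rather than an established property.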
