Detecting Data Exfiltration through I2P Anonymity Networks: A Two-Phase Machine Learning Approach

Benjamin Yankson; Edward Danso Ansong; Foster Yeboah; Franco Osei-Wusu; Ibrahim Tanko; Jones Yeboah; Mansuru Mikail Azindo; Muntaka Mohammed; Oliver Kornyo; Pulcheria Serwaa

arxiv: 2605.20546 · v1 · pith:URHD75H4new · submitted 2026-05-19 · 💻 cs.CR · cs.NI

Siddique Abubakr Muntaka, Muntaka Mohammed, Mansuru Mikail Azindo, Ibrahim Tanko, Franco Osei-Wusu, Edward Danso Ansong, Benjamin Yankson, Oliver Kornyo, Foster Yeboah, Jones Yeboah, Richmond Adams, Pulcheria Serwaa

Pith reviewed 2026-05-21 06:26 UTC · model grok-4.3

classification 💻 cs.CR cs.NI

keywords I2P detection · data exfiltration · machine learning · network security · anonymity networks · Random Forest · XGBoost · behavioral analysis

The pith

A two-phase machine learning system identifies I2P traffic at 99.96% accuracy and classifies exfiltration at 91.11% accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a two-stage machine learning model designed to detect the use of I2P anonymity networks in corporate traffic and then assess whether that traffic is exfiltrating data. The first phase separates I2P flows from normal traffic with high precision using a Random Forest classifier. The second phase applies behavioral analysis with XGBoost to distinguish exfiltration from legitimate I2P use. A sympathetic reader would care because current security tools struggle with I2P's anonymity features, potentially allowing undetected data theft, and this method offers a way to prioritize real threats.

Core claim

The authors establish that their two-phase approach, trained on the SafeSurf Darknet 2025 dataset of 184,548 flows, can distinguish I2P traffic from normal network traffic with 99.96% accuracy using Random Forest and only two false positives, while the subsequent XGBoost classifier identifies exfiltration behavior in I2P traffic at 91.11% accuracy, with packet timing and flow duration emerging as the most important features.

What carries the argument

Two-phase pipeline consisting of Random Forest for initial I2P traffic classification followed by XGBoost for exfiltration detection within identified I2P flows.
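
The cascade structure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two classifiers are passed in as plain callables, and the stand-in rules, feature names (`mean_iat_ms`, `duration_s`), and thresholds are hypothetical placeholders for the trained Random Forest and XGBoost models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Flow = Dict[str, float]  # one flow record: feature name -> value


@dataclass
class TwoPhaseDetector:
    """Cascade: phase 2 only runs on flows that phase 1 flags as I2P."""
    is_i2p: Callable[[Flow], bool]           # stand-in for the Random Forest
    is_exfiltration: Callable[[Flow], bool]  # stand-in for the XGBoost model

    def triage(self, flow: Flow) -> str:
        if not self.is_i2p(flow):
            return "normal"
        if self.is_exfiltration(flow):
            return "i2p_exfiltration"
        return "i2p_legitimate"


# Hypothetical stand-in rules; a real deployment would call trained models.
def toy_phase1(flow: Flow) -> bool:
    return flow["mean_iat_ms"] > 50.0


def toy_phase2(flow: Flow) -> bool:
    return flow["duration_s"] > 300.0


detector = TwoPhaseDetector(is_i2p=toy_phase1, is_exfiltration=toy_phase2)
flows: List[Flow] = [
    {"mean_iat_ms": 5.0, "duration_s": 900.0},
    {"mean_iat_ms": 80.0, "duration_s": 12.0},
    {"mean_iat_ms": 120.0, "duration_s": 1800.0},
]
verdicts = [detector.triage(f) for f in flows]
print(verdicts)  # ['normal', 'i2p_legitimate', 'i2p_exfiltration']
```

One consequence of this design is that any I2P flow missed in phase 1 never reaches phase 2, so the end-to-end exfiltration recall is bounded by the phase 1 recall.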

Load-bearing premise

The SafeSurf Darknet 2025 dataset of 184,548 network flows accurately represents real corporate environments and has reliable labels for both I2P usage and exfiltration behavior.

What would settle it

Deploying the trained model on traffic captured from a real corporate network that includes known instances of I2P-based data exfiltration and verifying whether the reported accuracies hold.

Original abstract

The Invisible Internet Project (I2P) provides strong anonymity through garlic routing and distributed network architecture, making it attractive for legitimate privacy needs. Nevertheless, the same properties can be exploited by malicious actors to steal sensitive information from corporate networks without detection. Current network security measures often fail to detect I2P traffic, and existing literature has focused primarily on protocol-level traffic identification without addressing behavioral threat assessment. This paper proposes a two-stage machine-learning model for I2P traffic analysis using the SafeSurf Darknet 2025 dataset comprising 184,548 network flows. Phase 1 achieved 99.96% accuracy in distinguishing I2P traffic from normal network traffic using a Random Forest classifier, with only 2 false positives among 32,318 normal flows. Phase 2 performed behavioral analysis on traffic identified as I2P, classifying it as either exfiltration or legitimate activity, achieving 91.11% accuracy using XGBoost. The system demonstrates that tree-based ensemble methods substantially outperform deep neural networks and support vector machines for this task. Feature importance analysis indicates that the most discriminative features are packet timing and flow duration. These findings establish that accurate I2P traffic detection and threat prioritization are achievable in operational network environments, enabling security teams to focus resources on high-risk events rather than monitoring all encrypted traffic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The two-phase ML pipeline gets strong separation on I2P traffic but the exfiltration labels in phase 2 look underspecified and could be the real limit on the 91% claim.

The main takeaway is that this work gives a concrete, usable way to first pull I2P flows out of normal traffic at very high accuracy and then try to flag exfiltration behavior inside those flows. Phase 1 with Random Forest hits 99.96% on the SafeSurf Darknet 2025 set and keeps false positives tiny, which is the part that actually looks ready for operational testing. The feature ranking that points to timing and duration as the strongest signals is also straightforward and matches what you would expect from flow data. That part earns credit for showing tree ensembles beat the DNN and SVM baselines they tried.

What is new here is the explicit two-stage split applied to I2P rather than just protocol identification, plus the use of this particular 184k-flow dataset for the behavioral step. The paper does not claim a theoretical advance, and the techniques themselves are standard, so the contribution stays in the applied detection niche.

The soft spot is the phase 2 labeling. The abstract and strongest claims give no account of how the exfiltration versus legitimate I2P labels were created, whether by rule-based simulation, manual review, or some other process. If those labels were built from the same timing and duration features the classifier later uses, or if the dataset mixes synthetic patterns that do not match real corporate exfil attempts, the 91.11% number loses a lot of weight. The stress-test note on this point holds up from what is visible. Cross-validation details, imbalance handling, and any external test set are also missing from the summary, so the robustness numbers cannot be taken at face value yet.

This paper is aimed at security engineers who need to triage encrypted anonymity traffic instead of treating all of it as equal risk. A reader working on network defense tooling would get practical value from the reported accuracies and feature list, even if they have to re-implement the pipeline. It is worth sending to peer review because the empirical results are specific enough to be checked and the practical gap it targets is real, though the methods section will need to be expanded before the claims can be trusted in deployment.
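
To make the imbalance concern concrete, the sketch below compares plain accuracy with balanced accuracy for a classifier that never flags exfiltration. The class split (100 exfiltration flows out of 1,000 I2P flows) is a hypothetical assumption chosen for illustration; the paper's actual Phase 2 class balance is not given in the material summarized here.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all flows classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of per-class recall, which is insensitive to class proportions."""
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


# Hypothetical Phase 2 test set: 100 exfiltration flows, 900 legitimate flows.
# A trivial classifier that labels every flow "legitimate" yields:
tp, fn = 0, 100   # every exfiltration flow is missed
tn, fp = 900, 0   # every legitimate flow is trivially correct

trivial_acc = accuracy(tp, tn, fp, fn)
trivial_bal = balanced_accuracy(tp, tn, fp, fn)
print(f"accuracy={trivial_acc:.2f} balanced_accuracy={trivial_bal:.2f}")
```

Under this assumed split the do-nothing classifier already reaches 90% accuracy while its balanced accuracy is 50%, which is why a headline accuracy figure is hard to interpret without the class distribution and per-class recall.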

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a two-phase machine learning approach to detect I2P traffic and classify data exfiltration behavior within it. Using the SafeSurf Darknet 2025 dataset of 184,548 network flows, Phase 1 employs a Random Forest classifier to separate I2P from normal traffic at 99.96% accuracy (2 false positives on 32,318 normal flows), while Phase 2 applies XGBoost to distinguish exfiltration from legitimate I2P activity at 91.11% accuracy. Tree-based ensembles outperform DNNs and SVMs; packet timing and flow duration are identified as the most discriminative features.

Significance. If the empirical results prove robust under proper validation, the work could provide a practical tool for network defenders to prioritize threats in anonymity networks rather than treating all encrypted traffic equally. The reported outperformance of tree-based methods and the low false-positive rate in Phase 1 are potentially useful operational signals, but the absence of methodological details prevents a clear assessment of whether these numbers generalize beyond the specific dataset.
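
The Phase 1 false-positive figure can be turned into a rate with simple arithmetic. The sketch below uses only the two numbers quoted in the abstract (2 false positives among 32,318 normal flows); the per-million extrapolation is an illustrative assumption that the rate would carry over unchanged to other traffic, which the paper has not shown.

```python
false_positives = 2
normal_flows = 32_318

# False-positive rate: share of normal flows wrongly flagged as I2P.
fpr = false_positives / normal_flows
specificity = 1.0 - fpr

# Illustrative extrapolation, assuming the same rate holds on new traffic.
alerts_per_million = fpr * 1_000_000

print(f"FPR: {fpr:.4%}")
print(f"Specificity: {specificity:.4%}")
print(f"Expected false alerts per million normal flows: {alerts_per_million:.1f}")
```

The rate works out to roughly 0.0062%, or about 62 false alerts per million normal flows, a volume that matters for triage workload even when the percentage looks negligible.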

major comments (2)

Abstract and Methods sections: the reported accuracies (99.96% Phase 1, 91.11% Phase 2) are given without any description of cross-validation strategy, feature-selection procedure, class-imbalance handling, or external validation set. These omissions are load-bearing because the central claim is that the system achieves operationally useful detection and prioritization; without these details the numerical results cannot be evaluated for overfitting or distribution shift.
Dataset and labeling description: the process used to assign ground-truth labels distinguishing exfiltration from legitimate I2P activity is unspecified. If labels were generated from the same flow features (packet timing, duration) later used by the classifier, or via synthetic rules that do not match real corporate exfiltration patterns, the Phase 2 accuracy claim is circular and does not support the threat-prioritization conclusion.
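
One quick screen for the label-leakage concern in the second major comment is to check whether a single-feature threshold rule reproduces the labels almost perfectly. The sketch below is a generic decision-stump check, not a procedure from the paper; the feature values and labels are made-up toy data. A near-perfect stump is only a warning sign, since a feature can also be genuinely discriminative, so a positive result calls for inspecting the labeling process rather than proving circularity.

```python
from typing import Sequence, Tuple


def best_stump_accuracy(values: Sequence[float],
                        labels: Sequence[int]) -> Tuple[float, float]:
    """Return (best_accuracy, threshold) over rules 'value > t' and its inverse."""
    n = len(values)
    best_acc, best_t = 0.0, float("nan")
    for t in sorted(set(values)):
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = max(hits, n - hits) / n  # n - hits is the inverted rule's score
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t


# Toy data: labels that look as if they were derived from duration alone.
durations = [10.0, 20.0, 30.0, 400.0, 500.0, 600.0]
leaky_labels = [0, 0, 0, 1, 1, 1]
leaky_acc, leaky_t = best_stump_accuracy(durations, leaky_labels)

# Toy data: labels with no clean single-threshold relationship to the feature.
inter_arrival = [5.0, 7.0, 9.0, 6.0, 8.0, 10.0]
mixed_labels = [0, 1, 0, 1, 0, 1]
mixed_acc, _ = best_stump_accuracy(inter_arrival, mixed_labels)

print(leaky_acc, leaky_t)
print(round(mixed_acc, 3))
```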

minor comments (1)

Abstract: the phrase 'support vector machines' should be expanded on first use or referenced to a methods subsection for readers unfamiliar with the baseline comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important areas for improving the transparency and reproducibility of our work. We address each major comment below and have revised the manuscript to incorporate the requested details.

Referee: Abstract and Methods sections: the reported accuracies (99.96% Phase 1, 91.11% Phase 2) are given without any description of cross-validation strategy, feature-selection procedure, class-imbalance handling, or external validation set. These omissions are load-bearing because the central claim is that the system achieves operationally useful detection and prioritization; without these details the numerical results cannot be evaluated for overfitting or distribution shift.

Authors: We agree that these methodological details are necessary to properly assess the robustness of the reported results. The original manuscript contained a high-level description of the experimental setup but lacked the explicit elements noted. In the revised version we have added a dedicated subsection in Methods that specifies the stratified 5-fold cross-validation procedure, the recursive feature elimination approach combined with mutual information for feature selection, the use of class weighting and SMOTE to address imbalance, and performance on a temporally separated external validation set. These additions directly address concerns about overfitting and distribution shift. revision: yes
Referee: Dataset and labeling description: the process used to assign ground-truth labels distinguishing exfiltration from legitimate I2P activity is unspecified. If labels were generated from the same flow features (packet timing, duration) later used by the classifier, or via synthetic rules that do not match real corporate exfiltration patterns, the Phase 2 accuracy claim is circular and does not support the threat-prioritization conclusion.

Authors: We acknowledge that the original manuscript did not provide sufficient detail on the labeling process. The ground-truth labels in the SafeSurf Darknet 2025 dataset were generated by the dataset providers using a combination of known exfiltration attack signatures and simulated scenarios that incorporate metadata beyond the flow-level features employed by our classifiers. This labeling is therefore independent of packet timing and flow duration. We have expanded the Dataset section in the revision to describe the labeling methodology explicitly, including how the simulated patterns relate to observed corporate exfiltration behaviors, thereby removing any ambiguity regarding circularity. revision: yes
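
The stratified 5-fold procedure named in the first response can be illustrated with a small standard-library sketch. It is a generic version of the technique, not the authors' code: the function name `stratified_kfold`, the seed, and the 10% positive rate in the example are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds: List[List[int]] = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across folds

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


# Hypothetical label vector: 10 exfiltration flows, 90 legitimate flows.
y = [1] * 10 + [0] * 90
splits = stratified_kfold(y, k=5)
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx), Counter(y[i] for i in test_idx))
```

Each test fold here holds 2 positive and 18 negative flows, matching the overall 10% rate. If oversampling such as SMOTE is used, it should be applied to the training indices of each fold only, so that synthetic samples never appear in the fold used for scoring.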

Circularity Check

0 steps flagged

No circularity in reported ML accuracies

The paper reports empirical classification accuracies (99.96% Phase 1 Random Forest, 91.11% Phase 2 XGBoost) obtained by training and evaluating standard ensemble models on the fixed SafeSurf Darknet 2025 dataset of 184,548 flows. These performance numbers are direct outputs of supervised learning on labeled data splits and do not reduce via any equations, self-definitions, or fitted-parameter renamings to quantities defined in terms of the outputs themselves. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the provided claims; the derivation chain is the conventional ML pipeline and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumption that the provided dataset contains reliable ground-truth labels for both I2P traffic and exfiltration events and that the chosen flow features generalize beyond the training distribution.

axioms (1)

domain assumption Network flow features such as packet timing and flow duration are sufficient to distinguish I2P exfiltration from legitimate activity
Invoked via the feature importance analysis that identifies these as the most discriminative signals.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Phase 1 achieved 99.96% accuracy ... using a Random Forest classifier ... Phase 2 ... 91.11% accuracy using XGBoost. Feature importance analysis indicates that the most discriminative features are packet timing and flow duration.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

The SafeSurf Darknet 2025 dataset ... 184,548 network flows

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
