BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection
Pith reviewed 2026-05-10 15:56 UTC · model grok-4.3
The pith
On the BRIDGE benchmark, TCH-Net outperforms all twelve baselines (p < 0.05, Wilcoxon) and records the highest F1 under leave-one-dataset-out evaluation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BRIDGE unifies CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT through a 46-feature semantic canonical vocabulary grounded in CICFlowMeter nomenclature with genuine-equivalence-only mapping and explicit zero-filling, where per-dataset coverage ranges from 15% to 93%. Under the leave-one-dataset-out protocol, all five evaluated architectures record mean F1 between 0.39 and 0.47, establishing the first community generalisation baseline at mean LODO F1 of 0.5577. TCH-Net fuses a three-path temporal branch, a provenance-conditioned contextual branch, and a statistical branch via Cross-Branch Gated Attention Fusion with learnable sigmoid gates, achieving F1 of 0.8296 plus or minus 0
What carries the argument
The 46-feature semantic canonical vocabulary for unifying heterogeneous datasets and the TCH-Net architecture that combines temporal residual convolutional-BiGRU paths, contextual information, statistical features, and Cross-Branch Gated Attention Fusion for dynamic feature-wise mixing.
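The feature-wise gating idea can be illustrated with a minimal sketch. It covers only the sigmoid-gate mixing step, not the attention component of CB-GAF, and it assumes static per-feature gate parameters; the paper's gates may instead be computed from the inputs. The function name `gated_fusion` and the toy vectors are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real logit to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(branches, gate_logits):
    """Mix equally sized branch vectors feature by feature.

    branches:    list of B vectors of length D (one embedding per branch)
    gate_logits: list of B vectors of length D (assumed learnable parameters)
    Returns a length-D vector whose d-th entry is
    sum over b of sigmoid(gate_logits[b][d]) * branches[b][d].
    """
    if len(branches) != len(gate_logits):
        raise ValueError("one gate vector is required per branch")
    dim = len(branches[0])
    fused = [0.0] * dim
    for h, logits in zip(branches, gate_logits):
        if len(h) != dim or len(logits) != dim:
            raise ValueError("all branch and gate vectors must share one length")
        for d in range(dim):
            fused[d] += sigmoid(logits[d]) * h[d]
    return fused


# Toy stand-ins for the temporal, contextual, and statistical embeddings.
temporal = [1.0, 2.0]
contextual = [3.0, 4.0]
statistical = [5.0, 6.0]

# Zero logits give every gate the value 0.5, so each branch contributes equally.
neutral = gated_fusion([temporal, contextual, statistical], [[0.0, 0.0]] * 3)

# A large positive logit opens a gate; a large negative one closes it.
temporal_only = gated_fusion(
    [temporal, contextual, statistical],
    [[50.0, 50.0], [-50.0, -50.0], [-50.0, -50.0]],
)
```

Because each gate acts on a single feature of a single branch, the mix can favour one branch for some features and another branch for others, which is what "dynamic feature-wise mixing" suggests.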
If this is right
- Standard architectures achieve mean LODO F1 between 0.39 and 0.47, exposing a measurable generalization gap.
- TCH-Net records the highest LODO F1 overall and outperforms all twelve baselines (p < 0.05, Wilcoxon test).
- The BRIDGE benchmark supplies a reproducible methodology that shifts evaluation from single-dataset optimisation to cross-environment generalisation.
- All reported metrics hold across five random seeds with low variance.
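The leave-one-dataset-out protocol behind these numbers can be sketched as a simple loop: hold out one dataset, train on the rest, score on the held-out one, and average. The sketch below is an illustration under stated assumptions, not the paper's pipeline: the detector is a toy one-feature threshold, and the dataset names `A`, `B`, `C` and their values are invented.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """Binary F1 computed from true-positive, false-positive, false-negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit_threshold(rows):
    """Toy stand-in for a detector: midpoint between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (mean(pos) + mean(neg)) / 2


def lodo(datasets):
    """Return per-dataset held-out F1 and the mean over all hold-outs."""
    scores = {}
    for held_out in datasets:
        # Train on every dataset except the held-out one.
        train = [row for name, rows in datasets.items()
                 if name != held_out for row in rows]
        threshold = fit_threshold(train)
        y_true = [y for _, y in datasets[held_out]]
        y_pred = [1 if x > threshold else 0 for x, _ in datasets[held_out]]
        scores[held_out] = f1_score(y_true, y_pred)
    return scores, mean(list(scores.values()))


# Invented data: (feature value, label). Dataset C is shifted relative to A and B.
toy = {
    "A": [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)],
    "B": [(0.0, 0), (0.3, 0), (0.7, 1), (1.0, 1)],
    "C": [(2.0, 0), (2.1, 0), (2.8, 1), (2.9, 1)],
}
scores, mean_f1 = lodo(toy)
```

On this toy data the shifted dataset drags the learned threshold away from A and B, so the mean held-out F1 is far below what any single dataset would give in isolation. That is the kind of gap the protocol is designed to expose.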
Where Pith is reading between the lines
- The same semantic unification approach could be applied to other security tasks that face mismatched feature spaces across data sources.
- The gated fusion mechanism may transfer to additional multi-view problems such as combining network flows with device metadata.
- Adding streaming or real-time IoT traces to BRIDGE would test whether the reported gains survive deployment conditions.
Load-bearing premise
The 46-feature semantic canonical vocabulary with genuine-equivalence-only mapping and zero-filling preserves enough discriminative information across datasets whose coverage varies from 15% to 93% without systematic bias from the mapping rules or per-dataset missingness patterns.
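A minimal sketch of what mapping into a canonical vocabulary with explicit zero-filling might look like follows. Everything named here is hypothetical: the four-feature vocabulary stands in for the 46 canonical features, and the dataset names and native column names are invented, not taken from BRIDGE.

```python
# Hypothetical 4-feature stand-in for the 46-feature canonical vocabulary.
CANONICAL = ["flow_duration", "fwd_packets", "bwd_packets", "bytes_per_second"]

# Hypothetical native-to-canonical mappings. Only features judged genuinely
# equivalent are listed; anything else is left unmapped on purpose.
MAPPINGS = {
    "dataset_a": {
        "Flow Duration": "flow_duration",
        "Fwd Pkts": "fwd_packets",
        "Bwd Pkts": "bwd_packets",
        "Bytes/s": "bytes_per_second",
    },
    "dataset_b": {
        "dur": "flow_duration",
    },
}


def to_canonical(row, mapping):
    """Project one native record onto the canonical feature order.

    Canonical features with no mapped native column are explicitly set to 0.0.
    """
    out = {name: 0.0 for name in CANONICAL}  # explicit zero-fill
    for native, canonical in mapping.items():
        if native in row:
            out[canonical] = float(row[native])
    return [out[name] for name in CANONICAL]


def coverage(mapping):
    """Fraction of canonical features that the dataset can actually populate."""
    return len(set(mapping.values()) & set(CANONICAL)) / len(CANONICAL)


vector_b = to_canonical({"dur": 3.5, "unmapped_field": 9}, MAPPINGS["dataset_b"])
coverage_a = coverage(MAPPINGS["dataset_a"])
coverage_b = coverage(MAPPINGS["dataset_b"])
```

In this toy setup `dataset_b` populates one of four canonical features, so three of its four columns are always zero. The premise above concerns exactly that situation at the 15% end of the coverage range.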
What would settle it
A sixth IoT dataset whose required feature mappings produce zero-filled values that correlate with labels in a way that drops TCH-Net's F1 below the mean of the twelve baselines under the same leave-one-dataset-out protocol.
Original abstract
IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training practically impossible without discarding semantic interpretability or introducing data integrity violations. No prior work has addressed both problems with a formally specified, reproducible methodology. This paper does. We introduce BRIDGE (Benchmark Reference for IoT Domain Generalisation Evaluation), the first formally specified heterogeneous multi-dataset benchmark for IoT intrusion detection, unifying CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT through a 46-feature semantic canonical vocabulary grounded in CICFlowMeter nomenclature, with genuine-equivalence-only feature mapping, explicit zero-filling, and per-dataset coverage from 15% to 93%. A leave-one-dataset-out (LODO) protocol makes the generalisation gap precisely measurable: all five evaluated architectures achieve mean LODO F1 between 0.39 and 0.47, and we establish the first community generalisation baseline at mean LODO F1 = 0.5577, a result that shifts the agenda from single-benchmark optimisation toward cross-environment generalisation. We propose TCH-Net, a multi-branch network fusing a three-path Temporal branch (residual convolutional-BiGRU, stride-downsampled BiGRU, pre-LayerNorm Transformer), a provenance-conditioned Contextual branch, and a Statistical branch via Cross-Branch Gated Attention Fusion (CB-GAF) with learnable sigmoid gates for dynamic feature-wise mixing. Across five random seeds, TCH-Net achieves F1 = 0.8296 +/- 0.0028, AUC = 0.9380 +/- 0.0025, and MCC = 0.6972 +/- 0.0056, outperforming all twelve baselines (p < 0.05, Wilcoxon) and recording the highest LODO F1 overall. BRIDGE and the full pipeline are at https://github.com/Ammar-ss/TCH-Net.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BRIDGE, a formally specified heterogeneous benchmark that unifies five IoT botnet datasets (CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, N-BaIoT) into a 46-feature semantic canonical space via genuine-equivalence mapping and explicit zero-filling, together with a leave-one-dataset-out (LODO) protocol. It proposes TCH-Net, a multi-branch network with temporal (residual conv-BiGRU, stride-downsampled BiGRU, pre-LayerNorm Transformer), provenance-conditioned contextual, and statistical branches fused by Cross-Branch Gated Attention Fusion (CB-GAF) using learnable sigmoid gates. The authors report that TCH-Net attains in-domain F1 = 0.8296 ± 0.0028, AUC = 0.9380 ± 0.0025, MCC = 0.6972 ± 0.0056, outperforms twelve baselines (Wilcoxon p < 0.05), and records the highest LODO F1, establishing a community baseline of mean LODO F1 = 0.5577.
Significance. If the central claims hold, the work supplies the first reproducible, formally specified multi-dataset benchmark for cross-domain IoT intrusion detection and a competitive multi-branch baseline that demonstrably improves both in-domain accuracy and LODO generalization. The public GitHub artifacts (code, pipeline, and BRIDGE definition) are a clear strength that supports future community use. The shift from single-benchmark optimization to measurable domain generalization is a substantive contribution to the ML-for-security literature.
major comments (2)
- [Section 3] BRIDGE construction (Section 3): zero-filling of features whose per-dataset coverage ranges from 15% to 93% creates a potential dataset identifier. Because the manuscript provides no ablation that isolates zero-fill artifacts from semantic content and no statistical check that missingness is independent of the botnet label, it is impossible to rule out that the reported LODO gains (including the 0.5577 baseline) partly reflect exploitation of missingness patterns rather than transferable botnet signals. This directly affects the validity of the cross-domain generalization claim.
- [Section 5] Experimental protocol (Section 5): the LODO results are presented with error bars and Wilcoxon tests, yet the manuscript does not report the exact train/test splits, the precise feature-mapping implementation, or any sensitivity analysis to the zero-filling rule. Without these, the link between the TCH-Net architecture and the claimed outperformance cannot be independently verified.
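The dataset-identifier concern in the first major comment can be made concrete with a small check: if each dataset has a distinct set of always-zero columns, the zero pattern alone reveals which dataset a record came from. The sketch below is a heuristic illustration with invented data; a column that is always zero may hold genuine zeros rather than zero-filled values, so a real check would need the mapping tables.

```python
def zero_mask(rows, tol=0.0):
    """Return a tuple of booleans, True where a column is zero in every row."""
    n_cols = len(rows[0])
    return tuple(all(abs(r[j]) <= tol for r in rows) for j in range(n_cols))


def mask_identifies_dataset(datasets):
    """Report each dataset's zero mask and whether all masks are distinct."""
    masks = {name: zero_mask(rows) for name, rows in datasets.items()}
    all_distinct = len(set(masks.values())) == len(masks)
    return masks, all_distinct


# Invented canonical-space records for two hypothetical datasets.
toy = {
    "A": [[1.0, 2.0, 0.0], [0.5, 1.0, 0.0]],  # third column always zero
    "B": [[3.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # second and third always zero
}
masks, all_distinct = mask_identifies_dataset(toy)
```

When `all_distinct` is true, a model can in principle recover dataset identity from the zero pattern without using any traffic semantics, which is why an ablation separating the two effects matters.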
minor comments (2)
- [Abstract] The abstract states that 'all five evaluated architectures achieve mean LODO F1 between 0.39 and 0.47'; clarify whether this set includes TCH-Net or only the twelve baselines.
- [Section 4] Notation for the CB-GAF gates (learnable sigmoid parameters) should be introduced with an equation or explicit pseudocode in the architecture section to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help strengthen the validity and reproducibility of our claims regarding BRIDGE and TCH-Net. We address each major comment point by point below, outlining the specific revisions we will make.
- Referee: [Section 3] BRIDGE construction (Section 3): zero-filling of features whose per-dataset coverage ranges from 15% to 93% creates a potential dataset identifier. Because the manuscript provides no ablation that isolates zero-fill artifacts from semantic content and no statistical check that missingness is independent of the botnet label, it is impossible to rule out that the reported LODO gains (including the 0.5577 baseline) partly reflect exploitation of missingness patterns rather than transferable botnet signals. This directly affects the validity of the cross-domain generalization claim.
Authors: We agree this is a substantive concern: zero-filling could encode dataset-specific signals if missingness correlates with labels or dataset identity. In the revision we will add (i) an ablation comparing zero-filling against mean imputation (per-dataset) and against dropping low-coverage features entirely, and (ii) a statistical check (Pearson correlation and chi-square tests between missingness indicators and botnet labels within each dataset). These results, together with updated LODO numbers under the alternative strategies, will be placed in Section 3 and a new appendix subsection. This directly addresses whether the reported generalization reflects transferable botnet signals.
revision: yes
- Referee: [Section 5] Experimental protocol (Section 5): the LODO results are presented with error bars and Wilcoxon tests, yet the manuscript does not report the exact train/test splits, the precise feature-mapping implementation, or any sensitivity analysis to the zero-filling rule. Without these, the link between the TCH-Net architecture and the claimed outperformance cannot be independently verified.
Authors: We accept that the current manuscript lacks sufficient detail for full independent reproduction. We will expand Section 5 with the exact train/test split ratios, random seeds, and per-fold sample counts used in every LODO experiment. The 46-feature canonical mapping (including genuine-equivalence rules and zero-fill policy) will be documented with pseudocode and a coverage table in an expanded Section 3. We will also add a sensitivity analysis that varies the zero-filling rule and reports its effect on both in-domain and LODO F1/AUC/MCC. The public GitHub repository already contains the pipeline; the paper revisions will make all parameters explicit without requiring external code inspection.
revision: yes
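The chi-square check the authors promise in their first response could take roughly the following form. This is a sketch under assumptions, not the authors' code: it tests a single binary indicator (for example "value is zero") against a binary label, uses no continuity correction, and takes the p-value from the one-degree-of-freedom chi-square tail. The function name and toy inputs are hypothetical.

```python
import math


def chi_square_2x2(indicator, labels):
    """Pearson chi-square test of independence for two binary sequences.

    Returns (statistic, p_value). With one degree of freedom the upper tail
    of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    counts = [[0, 0], [0, 0]]  # counts[indicator][label]
    for m, y in zip(indicator, labels):
        counts[int(m)][int(y)] += 1
    n = sum(sum(r) for r in counts)
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    if 0 in row or 0 in col:
        # A constant indicator or a constant label makes the test
        # uninformative, e.g. a feature zero-filled for a whole dataset.
        return 0.0, 1.0
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            statistic += (counts[i][j] - expected) ** 2 / expected
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Indicator perfectly aligned with the label: strong dependence.
dependent = chi_square_2x2([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0])

# Indicator balanced across both labels: no dependence.
independent = chi_square_2x2([1, 0, 1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 1, 0, 0])

# Indicator constant within the dataset: degenerate case.
constant = chi_square_2x2([0, 0, 0, 0], [1, 0, 1, 0])
```

For tables with small expected counts an exact test would be more appropriate than this approximation.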
Circularity Check
No circularity: empirical benchmark and model evaluation is self-contained
The paper defines BRIDGE via explicit feature mapping rules and zero-filling to a 46-feature space, then evaluates TCH-Net and baselines under a LODO protocol on public datasets. No equations, fitted parameters, or self-citations reduce any performance claim (F1, AUC, MCC) to its own inputs by construction. The reported metrics are measured outcomes from held-out data splits; the architecture and benchmark are externally reproducible via GitHub artifacts. No load-bearing step collapses to renaming, ansatz smuggling, or uniqueness imported from prior author work.
Axiom & Free-Parameter Ledger
free parameters (1)
- learnable sigmoid gates in CB-GAF