SE-Enhanced ViT and BiLSTM-Based Intrusion Detection for Secure IIoT and IoMT Environments
Pith reviewed 2026-05-10 18:38 UTC · model grok-4.3
The pith
The SE ViT-BiLSTM hybrid model reaches 99.33 percent accuracy for intrusion detection on IIoT benchmark data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that replacing the multi-head attention in a Vision Transformer with Squeeze-and-Excitation attention and integrating BiLSTM layers yields an intrusion detector that achieves 99.11 percent accuracy and 0.00032 seconds per instance latency on EdgeIIoT before balancing, improving to 99.33 percent accuracy after balancing, while reaching 98.16 percent accuracy and 0.00014 seconds latency on the balanced CICIoMT2024 dataset.
What carries the argument
The SE ViT-BiLSTM architecture, in which Squeeze-and-Excitation attention replaces the Vision Transformer's standard multi-head attention and BiLSTM layers are added to capture sequential dependencies in network traffic.
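To make the mechanism concrete, here is a minimal sketch of Squeeze-and-Excitation gating applied to a short sequence of token embeddings, written in plain Python. It follows the generic SE recipe (average each channel, pass the result through a small bottleneck, apply a sigmoid gate, rescale the channels). The token layout, the reduction size, and the weights `w1` and `w2` are illustrative assumptions, not the paper's configuration.

```python
import math

def se_gate(tokens, w1, w2):
    """Squeeze-and-Excitation over channels.

    tokens: list of T token vectors, each with C channels.
    w1: C x R bottleneck weights, w2: R x C expansion weights (hypothetical values).
    """
    n_tokens = len(tokens)
    n_channels = len(tokens[0])
    # Squeeze: average each channel over all tokens.
    squeezed = [sum(tok[c] for tok in tokens) / n_tokens for c in range(n_channels)]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [max(0.0, sum(squeezed[c] * w1[c][r] for c in range(n_channels)))
              for r in range(len(w1[0]))]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(hidden[r] * w2[r][c] for r in range(len(hidden)))))
             for c in range(n_channels)]
    # Scale: reweight every token channel-wise by its gate.
    scaled = [[tok[c] * gates[c] for c in range(n_channels)] for tok in tokens]
    return scaled, gates

# Toy input: T=2 tokens, C=4 channels, bottleneck size R=2 (all assumed for illustration).
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
w1 = [[0.5, -0.5], [0.5, -0.5], [0.5, -0.5], [0.5, -0.5]]
w2 = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
scaled, gates = se_gate(tokens, w1, w2)
```

Unlike multi-head self-attention, which computes a weight for every pair of tokens, this gate produces one weight per channel, so its cost grows linearly with the number of tokens. That may be one reason for the low latency reported, though the source does not isolate it.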
If this is right
- The model processes each instance in 0.00014 to 0.00053 seconds on the tested datasets, supporting real-time use.
- False-positive rates stay below 0.004 percent across both datasets before and after balancing.
- Data balancing with SMOTE and RandomOverSampler raises accuracy on the imbalanced medical IoT dataset from 96.10 to 98.16 percent.
- The same architecture is shown to apply to both industrial and medical IoT traffic classification tasks.
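The balancing step mentioned in the list above can be sketched in a few lines. This is a simplified, standard-library illustration of the two ideas (random duplication of minority rows, and SMOTE-style interpolation between a sample and a same-class neighbor). It is not the imbalanced-learn implementation, and the toy rows, labels, and the hand-picked neighbor are assumptions; real SMOTE selects the neighbor with a k-nearest-neighbor search.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(list(rng.choice(pool)))
            out_labels.append(cls)
    return out_rows, out_labels

def smote_like_sample(x, neighbor, rng):
    """SMOTE-style synthetic point: a random interpolation between x and a same-class neighbor."""
    u = rng.random()
    return [a + u * (b - a) for a, b in zip(x, neighbor)]

# Toy data, for illustration only.
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0]]
labels = ["benign", "benign", "benign", "attack"]
bal_rows, bal_labels = random_oversample(rows, labels)
synthetic = smote_like_sample([5.0, 5.0], [6.0, 7.0], random.Random(1))
```

A common precaution is to oversample only the training split, so that duplicated or synthetic rows do not leak into the test set. The source does not say which split the authors balanced.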
Where Pith is reading between the lines
- If the Squeeze-and-Excitation replacement is the main driver of gains, the same swap could be tried in other transformer-based detectors for different IoT domains.
- The reported low latency makes the model a candidate for direct deployment on edge hardware, but this would still require validation under live traffic loads.
- Because performance improved after balancing, future work on similar tasks should routinely check class imbalance as a confounding factor.
Load-bearing premise
The two benchmark datasets sufficiently represent the variety of real-world attack patterns and network conditions found in IIoT and IoMT environments.
What would settle it
If the same model, tested on a third independent dataset containing previously unseen attack types and network conditions, yielded accuracy below 95 percent, that would show the reported outperformance does not hold generally.
Original abstract
With the rapid growth of interconnected devices in Industrial and Medical Internet of Things (IIoT and IoMT) ecosystems, ensuring timely and accurate detection of cyber threats has become a critical challenge. This study presents an advanced intrusion detection framework based on a hybrid Squeeze-and-Excitation Attention Vision Transformer-Bidirectional Long Short-Term Memory (SE ViT-BiLSTM) architecture. In this design, the traditional multi-head attention mechanism of the Vision Transformer is replaced with Squeeze-and-Excitation attention, and integrated with BiLSTM layers to enhance detection accuracy and computational efficiency. The proposed model was trained and evaluated on two real-world benchmark datasets, EdgeIIoT and CICIoMT2024, both before and after data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) and RandomOverSampler. Experimental results demonstrate that the SE ViT-BiLSTM model outperforms existing approaches across multiple metrics. Before balancing, the model achieved accuracies of 99.11% (FPR: 0.0013%, latency: 0.00032 sec/inst) on EdgeIIoT and 96.10% (FPR: 0.0036%, latency: 0.00053 sec/inst) on CICIoMT2024. After balancing, performance further improved, reaching 99.33% accuracy with 0.00035 sec/inst latency on EdgeIIoT and 98.16% accuracy with 0.00014 sec/inst latency on CICIoMT2024.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an SE ViT-BiLSTM hybrid architecture for intrusion detection in IIoT and IoMT settings. It replaces the Vision Transformer's multi-head attention with Squeeze-and-Excitation attention, integrates BiLSTM layers, and evaluates the model on the EdgeIIoT and CICIoMT2024 datasets both before and after balancing via SMOTE and RandomOverSampler. The central claim is that this model outperforms prior approaches, with reported accuracies of 99.11% (EdgeIIoT) and 96.10% (CICIoMT2024) before balancing, improving to 99.33% and 98.16% after balancing, alongside low FPR and per-instance latency values.
Significance. If the performance gains can be rigorously attributed to the SE modification and BiLSTM integration rather than balancing or tuning, the work could offer a practical, low-latency IDS approach for constrained IoT environments. The concrete numerical results on named public datasets provide a starting point for comparison, but the absence of ablations, baselines, and statistical validation reduces the strength of the outperformance claim.
major comments (3)
- [§5] §5 (Results and Discussion): The claim that the SE ViT-BiLSTM outperforms existing approaches lacks supporting baseline comparisons or tables listing prior methods' metrics on the same datasets and splits; without these, the reported accuracy/FPR/latency improvements cannot be directly verified as superior.
- [§4.3 and §5] §4.3 (Model Architecture) and §5: No ablation study isolates the contribution of the Squeeze-and-Excitation attention replacement versus a standard ViT-BiLSTM or versus the SMOTE/RandomOverSampler balancing step; the central attribution of gains to the SE modification is therefore unsupported.
- [§4.2] §4.2 (Experimental Setup): The manuscript provides no details on train/validation/test splits, number of random seeds, or statistical measures (error bars, confidence intervals, or p-values), making it impossible to assess whether the high accuracies (e.g., 99.33%) reflect stable performance or overfitting to the benchmark distributions.
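The statistical reporting requested in the last major comment could look like the following sketch, which summarizes per-seed accuracies as a mean, a sample standard deviation, and a normal-approximation 95% confidence interval. The accuracy values are hypothetical placeholders, not results from the paper, and the choice of z = 1.96 is an assumption.

```python
import math
import statistics

def summarize_runs(accuracies, z=1.96):
    """Mean, sample standard deviation, and a normal-approximation 95% CI of the mean."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample std (n - 1 denominator)
    half_width = z * std / math.sqrt(len(accuracies))
    return mean, std, (mean - half_width, mean + half_width)

# Hypothetical per-seed test accuracies (fractions), for illustration only.
runs = [0.90, 0.92, 0.91, 0.91]
mean, std, ci = summarize_runs(runs)
```

With only a handful of seeds, a Student t quantile would give a wider and more appropriate interval than the fixed z value used here.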
minor comments (2)
- [Abstract and §5] Abstract and §5: Latency is reported in sec/inst but without specifying hardware platform or batch size, which limits reproducibility and comparison.
- [§3] §3 (Related Work): The discussion of prior ViT and BiLSTM IDS methods could include more recent 2023-2024 references on attention mechanisms in IoT security to strengthen context.
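The first minor comment turns on how seconds per instance is measured. A minimal timing sketch follows, assuming a generic `predict` callable and a fixed batch size. The stand-in model `dummy_predict`, the batch size of 64, and the warm-up policy are all hypothetical choices for illustration, not the authors' setup.

```python
import time

def seconds_per_instance(predict, batches, warmup=1):
    """Average wall-clock inference time per instance over the given batches."""
    for batch in batches[:warmup]:
        predict(batch)  # warm-up calls are excluded from timing
    n_instances = 0
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
        n_instances += len(batch)
    elapsed = time.perf_counter() - start
    return elapsed / n_instances

# Stand-in model: a hypothetical threshold rule, not the paper's network.
def dummy_predict(batch):
    return [1 if sum(x) > 1.0 else 0 for x in batch]

batch_size = 64
batches = [[[0.1 * i, 0.2] for i in range(batch_size)] for _ in range(10)]
latency = seconds_per_instance(dummy_predict, batches)
```

Because the total time is divided by the number of instances, the figure depends on both batch size and hardware, which is why the referee asks for both to be reported alongside it.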
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight areas where additional details and analyses will strengthen the manuscript. We address each major comment below and will revise the paper accordingly.
-
Referee: §5 (Results and Discussion): The claim that the SE ViT-BiLSTM outperforms existing approaches lacks supporting baseline comparisons or tables listing prior methods' metrics on the same datasets and splits; without these, the reported accuracy/FPR/latency improvements cannot be directly verified as superior.
Authors: We agree that a direct comparison table is needed to substantiate the outperformance claims. In the revised manuscript, we will add a table in Section 5 listing performance metrics (accuracy, FPR, latency) of relevant prior intrusion detection methods on EdgeIIoT and CICIoMT2024, using comparable evaluation settings where possible, to enable verification of the reported improvements. revision: yes
-
Referee: §4.3 (Model Architecture) and §5: No ablation study isolates the contribution of the Squeeze-and-Excitation attention replacement versus a standard ViT-BiLSTM or versus the SMOTE/RandomOverSampler balancing step; the central attribution of gains to the SE modification is therefore unsupported.
Authors: We acknowledge that ablation studies are required to isolate component contributions. We will add ablation experiments in the revised Section 5, including comparisons of the full SE ViT-BiLSTM against a standard ViT-BiLSTM (without SE attention) and against versions trained without SMOTE/RandomOverSampler balancing, to quantify the specific impact of the SE modification and balancing step. revision: yes
-
Referee: §4.2 (Experimental Setup): The manuscript provides no details on train/validation/test splits, number of random seeds, or statistical measures (error bars, confidence intervals, or p-values), making it impossible to assess whether the high accuracies (e.g., 99.33%) reflect stable performance or overfitting to the benchmark distributions.
Authors: We appreciate this observation. In the revised Section 4.2, we will specify the train/validation/test split ratios for each dataset, the number of random seeds used, and include statistical measures such as mean accuracy with standard deviations across runs and confidence intervals to demonstrate result stability. revision: yes
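The ablation promised in the second response amounts to varying one factor at a time while holding the others fixed. A small sketch of such a grid is shown below. The factor names and values are assumptions chosen to mirror the referee's wording, not the authors' actual experiment list.

```python
from itertools import product

# Hypothetical factor levels, named after the components discussed in the report.
ATTENTION = ("multi_head", "squeeze_excitation")
RECURRENT = ("none", "bilstm")
BALANCING = ("none", "smote_plus_random_oversampler")

def ablation_grid():
    """Enumerate every combination so each factor can be compared with the others held fixed."""
    return [
        {"attention": a, "recurrent": r, "balancing": b}
        for a, r, b in product(ATTENTION, RECURRENT, BALANCING)
    ]

def paired_with(config, factor, value):
    """Return the config that differs from `config` only in `factor`."""
    other = dict(config)
    other[factor] = value
    return other

grid = ablation_grid()
full = {"attention": "squeeze_excitation", "recurrent": "bilstm",
        "balancing": "smote_plus_random_oversampler"}
baseline = paired_with(full, "attention", "multi_head")
```

Comparing `full` with `baseline` on the same split and seeds would attribute any difference to the attention swap alone, which is the comparison the referee says is missing.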
Circularity Check
No circularity: empirical evaluation on external benchmarks
The paper presents a hybrid neural architecture (SE ViT-BiLSTM) and reports its classification performance after training and evaluation on two public benchmark datasets (EdgeIIoT, CICIoMT2024), both before and after SMOTE balancing. No mathematical derivation chain, uniqueness theorem, or parameter-free prediction is claimed. The reported accuracies, FPR, and latency figures are standard supervised-learning outcomes obtained by fitting network weights to the given data splits; they are not renamed as 'predictions' derived from the inputs by construction. No self-citations are invoked as load-bearing premises, no ansatz is smuggled via prior work, and no self-definitional loop exists. The central claim therefore remains an empirical observation against external benchmarks rather than a reduction to the paper's own fitted values.
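Since the false-positive rate is one of the reported outcomes, a short sketch of how it is commonly derived from a multi-class confusion matrix may help. The one-vs-rest definition and macro averaging used here are assumptions, since the source does not state which variant the paper uses, and the matrix is a hypothetical example.

```python
def per_class_fpr(confusion):
    """One-vs-rest false-positive rate per class.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        tn = total - tp - fp - fn
        rates.append(fp / (fp + tn) if (fp + tn) else 0.0)
    return rates

# Hypothetical 3-class confusion matrix, for illustration only.
confusion = [
    [90, 5, 5],
    [2, 46, 2],
    [0, 10, 40],
]
rates = per_class_fpr(confusion)
macro_fpr = sum(rates) / len(rates)
```

For scale, a rate of 0.0013 percent corresponds to 1.3 false alarms per 100,000 negative instances.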
Axiom & Free-Parameter Ledger
free parameters (2)
- neural network weights and architecture hyperparameters
- SMOTE and RandomOverSampler parameters
axioms (2)
- domain assumption: The labeled network traffic in EdgeIIoT and CICIoMT2024 is representative of real IIoT and IoMT attack scenarios.
- standard math: Standard supervised learning assumptions hold (i.i.d. samples, fixed label definitions).
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
unclear: Relation between the paper passage and the cited Recognition theorem.
hybrid Squeeze-and-Excitation Attention Vision Transformer-Bidirectional Long Short-Term Memory (SE ViT-BiLSTM) architecture... traditional multi-head attention mechanism of the Vision Transformer is replaced with Squeeze-and-Excitation attention
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
Ablation study... Model #3 ViT∥BiLSTM 32... parallel feature extraction and fusion
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] K. Kamir and C. Sarra, “Machine learning solutions for securing iot-based healthcare: A review,” in 2023 5th International Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE, 2023, pp. 1–8.
- [2] C. Ni and S. C. Li, “Machine learning enabled industrial iot security: Challenges, trends and solutions,” Journal of Industrial Information Integration, vol. 38, p. 100549, 2024.
- [3] A. Gueriani, H. Kheddar, and A. C. Mazari, “Cyber threat detection in iiot and iomt using dnn-gru with multi-head attention,” in 2025 International Conference on Research in Computing at Feminine (RIF). IEEE, 2025, pp. 1–8.
- [4] A. Gueriani, H. Kheddar, and A. C. Mazari, “Explainable bilstm-mha-based ids for iot using shap and zero-day attack detection,” in 2025 International Conference on Artificial Intelligence and Innovative Applications (AIIA). IEEE, 2025, pp. 1–8.
- [5] S. F. Ahmed, M. S. B. Alam, S. Afrin, S. J. Rafa, N. Rafa, and A. H. Gandomi, “Insights into internet of medical things (iomt): Data fusion, security issues and potential solutions,” Information Fusion, vol. 102, p. 102060, 2024.
- [6] F. Xu, S. Liu, and X. Yang, “An efficient privacy-preserving authentication scheme with enhanced security for iomt applications,” Computer Communications, vol. 208, pp. 171–178, 2023.
- [7] M. Wang, N. Yang, and N. Weng, “Securing a smart home with a transformer-based iot intrusion detection system,” Electronics, vol. 12, no. 9, p. 2100, 2023.
- [8] L. Sana, M. M. Nazir, J. Yang, L. Hussain, Y.-L. Chen, C. S. Ku, M. Alatiyyah, and L. Y. Por, “Securing the iot cyber environment: Enhancing intrusion anomaly detection with vision transformers,” IEEE Access, 2024.
- [9] B. Buyuktanir, Ş. Altinkaya, G. Karatas Baydogmus, and K. Yildiz, “Federated learning in intrusion detection: advancements, applications, and future directions,” Cluster Computing, vol. 28, no. 7, pp. 1–25, 2025.
- [10] V. P. Gandi, N. S. L. Jatla, G. Sadhineni, S. Geddamuri, G. K. Chaitanya, and A. Velmurugan, “A comparative study of ai algorithms for anomaly-based intrusion detection,” in 2023 7th International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2023, pp. 530–534.
TABLE III PERFORMANCE OF DIFFERENT VARIANTS OF THE PROPOSED MODELS IN...
- [11] Generic CNN, EdgeIIoT: 98.98 ✗ ✗ ✗ ✗ ✗ ✗
- [12] LSTM/DNN, CICIoMT: 79 ✗ 78 79 76 ✗ ✗
  [26] CNN-LSTM-ResNet-SA, EdgeIIoMT: 33.30 ✗ 33.31 100 49.97 ✗ ✗
  [26] CNN-LSTM-ResNet-SA, CICIoT: 99.88 ✗ 99.89 99.99 99.94 ✗ ✗
- [13] CNN, EdgeIIoT: 96.50 ✗ 97.48 96.50 96.41 ✗ 0.00012
  CNN, CICIoMT: 99.67 ✗ 99.67 99.67 99.66 ✗ 0.00004
  Our ViT-BiLSTM, EdgeIIoT: 99.33 0.0158 99.33 99.33 99.33 0.0013 0.00035
  Our ViT-BiLSTM, CICIoMT: 98.16 0.0578 98.16 98.16 98.15 0.0036 0.00014
Acc, Pr, Rc, and FPR are in (%), Inference time is in (seconds/instance)
- [14] A. Ali, A. W. Septyanto, I. Chaudhary, H. Al Hamadi, H. M. Alzoubi, and Z. F. Khan, “Applied artificial intelligence as event horizon of cyber security,” in 2022 International Conference on Business Analytics for Technology and Security (ICBATS). IEEE, 2022, pp. 1–7.
- [15] O. Ceran, E. Özdoğan, and M. Uysal, “Leveraging graph neural networks for iot attack detection,” Sakarya University Journal of Computer and Information Sciences, vol. 8, no. 2, pp. 223–244, 2025.
- [16] H. Peng, C. Wu, and Y. Xiao, “A bilstm-based iot intrusion detection system with mutual information and focal loss,” in 2024 6th International Conference on Frontier Technologies of Information and Computer (ICFTIC). IEEE, 2024, pp. 1–6.
- [17] A. Gueriani, H. Kheddar, A. C. Mazari, and M. C. Ghanem, “A robust cross-domain ids using bigru-lstm-attention for medical and industrial iot security,” arXiv preprint arXiv:2508.12470, 2025.
- [18] A. Gueriani, H. Kheddar, and A. C. Mazari, “Hybrid resnet-1d-bigru with multi-head attention for cyberattack detection in industrial iot environments,” in 2025 International Conference on Intelligent Computer Systems, Data Science and Applications (IC2SDA). IEEE, 2025, pp. 1–6.
- [19] M. A. Ferrag, O. Friha, D. Hamouda, L. Maglaras, and H. Janicke, “Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications for centralized and federated learning,” IEEE Access, vol. 10, pp. 40281–40306, 2022.
- [20] S. Dadkhah, E. C. P. Neto, R. Ferreira, R. C. Molokwu, S. Sadeghi, and A. A. Ghorbani, “Ciciomt2024: A benchmark dataset for multi-protocol security assessment in iomt,” Internet of Things, vol. 28, p. 101351, 2024.
- [21] A. Gueriani, H. Kheddar, and A. C. Mazari, “Deep reinforcement learning for intrusion detection in IoT: A survey,” in 2023 2nd International Conference on Electronics, Energy and Measurement (IC2EM), vol. 1. IEEE, 2023, pp. 1–7.
- [22] H. Kheddar, M. Hemis, Y. Himeur, D. Megías, and A. Amira, “Deep learning for steganalysis of diverse data types: A review of methods, taxonomy, challenges and future directions,” Neurocomputing, p. 127528, 2024.
- [23] H. Kheddar, Y. Himeur, S. Al-Maadeed, A. Amira, and F. Bensaali, “Deep transfer learning for automatic speech recognition: Towards better generalization,” Knowledge-Based Systems, vol. 277, p. 110851, 2023.
- [24] A. Gueriani, H. Kheddar, and A. C. Mazari, “Adaptive cyber-attack detection in iiot using attention-based lstm-cnn models,” in 2024 International Conference on Telecommunications and Intelligent Systems (ICTIS). IEEE, 2024, pp. 1–6.
- [25] A. Gueriani, H. Kheddar, and A. C. Mazari, “Enhancing iot security with cnn and lstm-based intrusion detection systems,” in 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE, 2024, pp. 1–7.
- [26] K. Kharoubi, S. Cherbal, D. Mechta, and A. Gawanmeh, “Network intrusion detection system using convolutional neural networks: Nids-dl-cnn for iot security,” Cluster Computing, vol. 28, no. 4, p. 219, 2025.
- [27] M. Singh and N. Chauhan, “Convolutional neural network based iot intrusion detection system using edge-iiotset,” in 2024 International Conference on Integrated Circuits, Communication, and Computing Systems (ICIC3S), vol. 1. IEEE, 2024, pp. 1–4.
- [28] N. C. Kavkas and K. Yildiz, “Enhancing lomt security with deep learning based approach for medical iot threat detection,” in 2025 13th International Symposium on Digital Forensics and Security (ISDFS). IEEE, 2025, pp. 1–5.
- [29] T. Sasi, A. H. Lashkari, R. Lu, P. Xiong, and S. Iqbal, “An efficient self attention-based 1d-cnn-lstm network for iot attack detection and identification using network traffic,” Journal of Information and Intelligence, 2024.