Binary Image-Based Intrusion Detection for Operational Technology Networks: Extending the SPHBI Methodology from IoT to Modbus TCP
Pith reviewed 2026-05-08 17:29 UTC · model grok-4.3
The pith
Adding eight application-layer bytes to binary packet images enables 98.1% accurate intrusion detection in Modbus TCP networks with only 63 parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Extending SPHBI to Modbus TCP shows that TCP/IP headers alone yield only 51.8% binary accuracy, because uniform SCADA traffic lacks the header heterogeneity found in IoT traffic. Incorporating eight bytes of application-layer information raises binary accuracy to 98.1% using a model with just 63 parameters. The best model achieves 94.4% multiclass accuracy on nine classes with 56,873 parameters, roughly 430 times fewer than ResNet50-based approaches, and detects seven of eight attack types with over 94% recall; replay attacks remain undetectable from single packets.
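As a rough sanity check on the "430 times" figure, the arithmetic can be sketched as follows. The two ResNet50 totals are the commonly reported Keras counts (with and without the 1000-class classification head); they are not taken from the paper, which does not say which ResNet50 variant it compares against.

```python
# Parameter count reported for the best SPHBI model in the source.
SPHBI_PARAMS = 56_873
CLAIMED_RATIO = 430

# Assumption: commonly reported Keras ResNet50 totals, not from the paper.
RESNET50_WITH_HEAD = 25_636_712
RESNET50_NO_HEAD = 23_587_712

# Baseline size implied by the claimed ratio.
implied_baseline = SPHBI_PARAMS * CLAIMED_RATIO

# Ratios implied by the two assumed baseline sizes.
ratio_with_head = RESNET50_WITH_HEAD / SPHBI_PARAMS
ratio_no_head = RESNET50_NO_HEAD / SPHBI_PARAMS

print(f"implied baseline: {implied_baseline:,} parameters")
print(f"ratio vs. ResNet50 with head:    {ratio_with_head:.1f}")
print(f"ratio vs. ResNet50 without head: {ratio_no_head:.1f}")
```

Under these assumed baselines the claimed ratio falls between the two computed ratios, so the figure is at least of the right order of magnitude.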
What carries the argument
The Single Packet Header Binary Image (SPHBI) approach, which transforms packet headers and limited payload bytes into binary images for classification by small convolutional networks, applied at varying protocol depths.
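The packet-to-image step can be illustrated with a minimal sketch. The layout here is an assumption, since the source does not give the image dimensions: each byte becomes one row of eight 0/1 pixels, most significant bit first. The function name `bytes_to_binary_image` and the example frame are hypothetical, and treating the eight application-layer bytes as the 7-byte MBAP header plus the function code is a guess rather than something the paper states here.

```python
from typing import List


def bytes_to_binary_image(data: bytes, n_bytes: int) -> List[List[int]]:
    """Turn the first n_bytes of a packet into an n_bytes x 8 grid of 0/1 pixels.

    Shorter inputs are zero-padded so every image has the same shape.
    """
    padded = data[:n_bytes].ljust(n_bytes, b"\x00")
    # One row per byte, bits ordered from most to least significant.
    return [[(byte >> (7 - bit)) & 1 for bit in range(8)] for byte in padded]


# Hypothetical Modbus TCP request (read holding registers):
# transaction id 0x0001, protocol id 0x0000, length 0x0006, unit id 0x01,
# function code 0x03, start address 0x0000, quantity 0x000A.
adu = bytes.fromhex("00010000000601030000000a")

# Assumption: image only the MBAP header (7 bytes) plus the function code.
image = bytes_to_binary_image(adu, n_bytes=8)

for row in image:
    print("".join(str(pixel) for pixel in row))
```

A small convolutional or dense classifier would then take this grid as input. A larger image for the deeper protocol variants would follow the same pattern with more bytes.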
If this is right
- TCP/IP headers prove insufficient for OT intrusion detection, unlike in IoT.
- Eight bytes of application-layer data enable 98.1% binary detection accuracy with a 63-parameter model, small enough for edge devices.
- Multiclass accuracy reaches 94.4%, with a 95% confidence interval of 92.9% to 95.9% across 10 random seeds.
- Seven attack types are reliably detected while replay attacks require multi-packet analysis.
- The best model uses roughly 430 times fewer parameters than ResNet50-based image classifiers.
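The reported interval can be related to the seed-level spread with a short calculation. This is a sketch under a stated assumption: the source does not say what "± 2.2pp" denotes, so it is treated here as the sample standard deviation over the 10 seeds, and the interval is a standard two-sided Student-t interval. The helper name `t_interval` is hypothetical.

```python
import math

# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 seeds).
T_CRIT_95_DF9 = 2.262


def t_interval(mean: float, std: float, n: int, t_crit: float) -> tuple:
    """Return (low, high) for mean +/- t_crit * std / sqrt(n)."""
    half_width = t_crit * std / math.sqrt(n)
    return mean - half_width, mean + half_width


# Assumption: 2.2pp is the sample standard deviation across seeds.
low, high = t_interval(mean=94.4, std=2.2, n=10, t_crit=T_CRIT_95_DF9)
print(f"95% CI: [{low:.2f}%, {high:.2f}%]")
```

Under that assumption the result lands within about 0.1pp of the reported [92.9%, 95.9%] on each side, a gap small enough to be explained by rounding of the 2.2pp figure.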
Where Pith is reading between the lines
- Similar binary image methods may apply to other uniform-header protocols in industrial control systems.
- Hybrid systems combining single-packet SPHBI with sequence analysis could handle replay attacks.
- Deployment on actual OT hardware would test whether the reported accuracies hold under live traffic variations.
- The parameter efficiency suggests potential for on-device training or frequent updates in resource-limited settings.
Load-bearing premise
The CIC Modbus 2023 dataset is representative of real-world Modbus TCP traffic and attack behavior, and single-packet binary images carry enough information to distinguish the attacks.
What would settle it
Evaluating the same models on a new Modbus TCP dataset collected from a different industrial environment or including more replay attack instances, and observing whether binary accuracy falls significantly below 98% or multiclass accuracy below 90%.
Original abstract
This paper extends the Single Packet Header Binary Image (SPHBI) intrusion detection methodology from IoT to Modbus TCP, evaluating five approaches spanning a gradient of protocol depth on the CIC Modbus 2023 dataset (11.4 million packets, eight detectable attack types). TCP/IP headers alone achieve only 51.8% binary accuracy, confirming that header-level heterogeneity exploited in IoT traffic is absent in uniform SCADA environments. Adding eight bytes of application-layer information improves binary accuracy to 98.1% with just 63 parameters, directly relevant to per-packet classification on resource-constrained OT edge devices. The best-performing approach achieves 94.4% +/- 2.2pp multiclass accuracy across nine classes (95% CI [92.9%, 95.9%], 10 seeds) with 56,873 parameters, roughly 430 times fewer than comparable ResNet50-based approaches. Per-class recall analysis shows seven of eight detectable attack types identified with recall above 94%, while replay attacks remain structurally undetectable by any single-packet method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends the Single Packet Header Binary Image (SPHBI) methodology from IoT to Modbus TCP operational technology networks. It evaluates five approaches with increasing protocol depth (from TCP/IP headers alone to full application-layer inclusion) on the CIC Modbus 2023 dataset of 11.4 million packets covering eight attack types. TCP/IP headers yield only 51.8% binary accuracy, while adding eight application-layer bytes raises binary accuracy to 98.1% using 63 parameters. The best approach achieves 94.4% ± 2.2pp multiclass accuracy (95% CI [92.9%, 95.9%], 10 seeds) across nine classes with 56,873 parameters—roughly 430 times fewer than ResNet50 baselines. Per-class analysis shows high recall (>94%) for seven of eight detectable attacks, with replay attacks identified as structurally undetectable by single-packet classifiers.
Significance. If the empirical results hold under scrutiny, the work offers a lightweight, parameter-efficient intrusion detection method tailored to resource-constrained OT edge devices. It demonstrates that uniform SCADA traffic lacks the header heterogeneity exploited in IoT settings, but that minimal application-layer bytes suffice for strong binary detection. The dramatic reduction in model size relative to deep learning baselines, combined with explicit acknowledgment of single-packet limitations, provides a practical foundation for real-time per-packet classification in industrial control systems.
major comments (2)
- [Methods/Experimental Setup] The abstract reports concrete performance numbers (98.1% binary accuracy, 94.4% ± 2.2pp multiclass with 95% CI from 10 seeds) and parameter counts (63 and 56,873), yet the manuscript provides no details on model architectures, training procedures, data splits, preprocessing, or hyperparameter selection. This absence is load-bearing for the central performance claims, as it prevents verification that the reported accuracies and efficiency gains are not artifacts of undisclosed implementation choices or selection effects.
- [Dataset and Evaluation] The headline claims rest on the assumption that the CIC Modbus 2023 corpus faithfully represents real-world Modbus TCP traffic distributions, attack timing, and payload variability. The paper itself notes that replay attacks are structurally invisible to any single-packet classifier, but offers no cross-dataset validation, live-network testing, or analysis of whether other attack classes similarly require flow-level or temporal context absent from the binary-image representation. This directly affects the practical applicability asserted for OT networks.
minor comments (2)
- [Abstract/Introduction] The abstract and results would benefit from an explicit statement of how the binary images are constructed (e.g., exact packet-to-image mapping, dimensions, and normalization) to aid reproducibility.
- [Results] Table or figure captions for per-class recall results should include the exact number of samples per class to contextualize the reported recalls above 94%.
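The second minor comment can be made concrete with a small sketch that reports recall together with the per-class sample count (support). The labels and predictions below are toy values invented for illustration, not results from the paper, and `recall_with_support` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recall_with_support(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, int]]:
    """Map each true class to (recall, number of true samples)."""
    support = Counter(y_true)
    # Count correct predictions per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: (hits[label] / n, n) for label, n in support.items()}


# Toy data (hypothetical): "attack_x" stands in for any detectable attack class.
y_true = ["benign"] * 4 + ["attack_x"] * 2 + ["replay"] * 2
y_pred = ["benign", "benign", "benign", "attack_x",
          "attack_x", "attack_x",
          "benign", "benign"]

report = recall_with_support(y_true, y_pred)
for label, (recall, n) in report.items():
    print(f"{label:10s} recall={recall:.2f} support={n}")
```

Printing the support next to each recall makes it clear when a high recall rests on only a handful of samples.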
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments identify key gaps in reproducibility and evaluation scope that we address below. We commit to revisions that strengthen the manuscript without altering its core claims.
- Referee: Methods/Experimental Setup: The abstract reports concrete performance numbers (98.1% binary accuracy, 94.4% ± 2.2pp multiclass with 95% CI from 10 seeds) and parameter counts (63 and 56,873), yet the manuscript provides no details on model architectures, training procedures, data splits, preprocessing, or hyperparameter selection. This absence is load-bearing for the central performance claims, as it prevents verification that the reported accuracies and efficiency gains are not artifacts of undisclosed implementation choices or selection effects.
  Authors: We agree that the absence of these details hinders reproducibility and verification. The manuscript as submitted emphasizes the methodology and empirical outcomes but does not include the implementation specifics. In the revised version we will add a dedicated subsection in Methods that specifies: the neural network architectures for each of the five protocol-depth approaches (layer types, dimensions, activations); training procedures (optimizer, learning rate schedule, batch size, epochs, early stopping); data splits (train/validation/test proportions and how the 11.4 million packets were partitioned); preprocessing steps (byte extraction from Modbus TCP packets, binary-image construction, and normalization); and hyperparameter selection (any tuning process or fixed values). We will also document the ten random seeds used for the reported confidence intervals. These additions will allow independent reproduction of the accuracy and parameter-count results.
  Revision: yes
- Referee: Dataset and Evaluation sections: The headline claims rest on the assumption that the CIC Modbus 2023 corpus faithfully represents real-world Modbus TCP traffic distributions, attack timing, and payload variability. The paper itself notes that replay attacks are structurally invisible to any single-packet classifier, but offers no cross-dataset validation, live-network testing, or analysis of whether other attack classes similarly require flow-level or temporal context absent from the binary-image representation. This directly affects the practical applicability asserted for OT networks.
  Authors: The referee correctly identifies a genuine limitation. CIC Modbus 2023 is a large, publicly available labeled corpus, yet it remains a single source whose fidelity to operational OT traffic cannot be assumed without further evidence. The manuscript already states that replay attacks are undetectable by any single-packet method; we will expand this point. In revision we will insert an extended Limitations and Future Work subsection that (i) analyzes which of the remaining attack classes may also benefit from temporal or flow-level features beyond the current binary-image representation, (ii) explicitly notes the reliance on one dataset and the absence of cross-dataset or live-network validation, and (iii) outlines planned extensions such as multi-packet aggregation. We cannot retroactively perform live-network testing or new cross-dataset experiments within the current study scope, but the added discussion will better bound the claimed applicability to resource-constrained OT edge devices.
  Revision: partial
Circularity Check
No circularity: empirical results on external dataset are self-contained
The paper reports direct empirical measurements of binary and multiclass accuracy for five protocol-depth variants of a binary-image classifier, evaluated on the named external CIC Modbus 2023 corpus (11.4 million packets). No equations, fitted parameters, or predictions are defined in terms of the target metrics; the reported figures (98.1% binary accuracy with 63 parameters, 94.4% multiclass with 56,873 parameters) are obtained via standard training/testing splits and cross-seed averaging. References to the prior SPHBI methodology serve only to describe the extension being tested and do not supply load-bearing premises that reduce the new results to self-citation or self-definition. The central claims therefore rest on independent experimental content rather than any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- Trainable model parameters
axioms (1)
- domain assumption: Binary image representations of packet headers plus limited application data contain sufficient information to distinguish normal Modbus TCP traffic from the eight attack types in the dataset.
Reference graph
Works this paper leans on
- [1] Anthi, E., Williams, L., Burnap, P., Jones, K.: A three-tiered intrusion detection system for industrial control systems. J. Cybersecur. 7(1), tyab006 (2021). https://doi.org/10.1093/cybsec/tyab006
- [2] Baptista, I., Shiaeles, S., Kolokotronis, N.: A novel malware detection system based on machine learning and binary visualization. In: ICC Workshops 2019, pp. 1–6. IEEE (2019)
- [3] Boakye-Boateng, K., Ghorbani, A.A., Lashkari, A.H.: Securing substations with trust, risk posture and multi-agent systems: a comprehensive approach. In: 20th International Conference on Privacy, Security and Trust (PST), pp. 1–12. IEEE, Copenhagen (2023). https://doi.org/10.1109/PST58708.2023.10320154
- [4] El-Sherif, M., Khattab, A., El-Soudani, M.: Intrusion detection using TCP/IP single packet header binary image for IoT networks. Cybersecurity 8(104), 1–23 (2025). https://doi.org/10.1186/s42400-025-00441-x
- [5] Ferrag, M.A., Friha, O., Hamouda, D., Maglaras, L., Janicke, H.: Edge-IIoTset: a new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 10, 40281–40306 (2022)
- [6] Kotsiopoulos, T., Radoglou-Grammatikis, P., Lekka, Z., Mladenov, V., Sarigiannidis, P.: Defending industrial internet of things against Modbus/TCP threats: a combined AI-based detection and SDN-based mitigation solution. Int. J. Inf. Secur. 24(157) (2025). https://doi.org/10.1007/s10207-025-01076-2
- [7] Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S.: Malware images: visualization and automatic classification. In: Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec), pp. 1–7. ACM (2011)
- [8] Russo, S., Zanasi, C., Marasco, I.: Feature extraction for anomaly detection in industrial control systems. In: ITASEC 2024. CEUR Workshop Proceedings, vol. 3731, paper 22 (2024)
- [9] Termanini, A., Al-Abri, D., Bourdoucen, H., Al Maashri, A.: Using machine learning to detect network intrusions in industrial control systems: a survey. Int. J. Inf. Secur. (2024). https://doi.org/10.1007/s10207-024-00916-x
- [10] Wang, W., Zhu, M., Zeng, X., Ye, X., Sheng, Y.: Malware traffic classification using convolutional neural network for representation learning. In: International Conference on Information Networking (ICOIN), pp. 712–717. IEEE (2017)