Anomaly-based Intrusion Detection in Industrial Data with SVM and Random Forests
Pith reviewed 2026-05-24 16:56 UTC · model grok-4.3
The pith
Support vector machines and random forests identify attacks in Modbus and OPC UA industrial traffic.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applied to the Modbus gas pipeline and OPC UA batch processing datasets, SVM and Random Forest both detect the introduced attacks effectively, with Random Forest marginally outperforming SVM. The work also addresses feature extraction and missing data.
What carries the argument
SVM and Random Forest machine learning classifiers operating on features from industrial network traffic for anomaly detection.
Load-bearing premise
The attacks introduced into the test datasets have the same characteristics as those that would occur in actual industrial environments.
What would settle it
Applying the models to network captures from a real industrial incident, or to a third dataset with different attack types, and measuring detection accuracy would support or refute the claim.
Original abstract
Attacks on industrial enterprises are increasing in number as well as in effect. Since the introduction of industrial control systems in the 1970s, industrial networks have been the target of malicious actors. More recently, the political and warfare aspects of attacks on industrial and critical infrastructure are becoming more relevant. In contrast to classic home and office IT systems, industrial IT, so-called OT systems, have an effect on the physical world. Furthermore, industrial devices have long operation times, sometimes several decades. Updates and fixes are tedious and often not possible. The threats to industry, combined with the legacy requirements of industrial environments, create the need for efficient intrusion detection that can be integrated into existing systems. In this work, network data containing industrial operation is analysed with machine learning- and time series-based anomaly detection algorithms in order to discover the attacks introduced to the data. Two different data sets are used, one Modbus-based gas pipeline control traffic and one OPC UA-based batch processing traffic. In order to detect attacks, two machine learning-based algorithms are used, namely SVM and Random Forest. Both perform well, with Random Forest slightly outperforming SVM. Furthermore, extracting and selecting features as well as handling missing data is addressed in this work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies SVM and Random Forest classifiers to anomaly detection on two industrial datasets (Modbus gas-pipeline control traffic and OPC UA batch-processing traffic). After feature extraction/selection and missing-data handling, it claims both algorithms detect the introduced attacks effectively, with Random Forest slightly outperforming SVM.
Significance. If the performance claims hold under proper validation, the work would offer a practical demonstration of standard ML classifiers for legacy OT intrusion detection. However, the absence of domain-expert validation of the synthetic attacks and of any comparison to real attack traces limits the result to a narrow, dataset-specific demonstration rather than a generalizable contribution.
major comments (2)
- [Datasets / Evaluation] Datasets and evaluation sections: the central claim that the classifiers detect attacks in these ICS environments rests on the untested assumption that the synthetically introduced attacks reproduce the statistical and temporal signatures of real-world threats. No domain-expert validation, no comparison against independently collected real attack traces, and no ablation of the chosen features versus injection artifacts are provided; this directly undermines the assertion that the models would generalize to previously unseen intrusions.
- [Abstract / Results] Abstract and results: performance is asserted as 'good' and 'slightly outperforming' without reporting concrete metrics (accuracy, precision, recall, F1), baselines, cross-validation procedure, or error analysis. If these details are also absent from the full evaluation section, the empirical comparison cannot be verified and the superiority claim cannot be assessed.
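The metrics the referee asks for can all be derived from the four confusion-matrix counts. The following is a minimal sketch in plain Python, not the paper's evaluation code; the function name `binary_metrics` and the label vectors are hypothetical and chosen only for illustration (1 = attack, 0 = normal traffic).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not taken from either dataset.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)

# A detector that never raises an alarm still reaches 60% accuracy here,
# which is why accuracy alone cannot support a superiority claim.
always_normal = binary_metrics(y_true, [0] * len(y_true))
```

Reporting precision and recall next to accuracy matters most when attack records are rare, because a classifier that labels everything as normal can still score a high accuracy.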
minor comments (1)
- [Methods] Clarify the exact feature-selection method and the criteria used for handling missing values; these steps are mentioned but not described with sufficient algorithmic detail for reproducibility.
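To illustrate the level of algorithmic detail this comment asks for, here is a sketch of two common missing-value strategies, forward fill and mean imputation. This summary does not say which strategy the paper uses, so both functions and the `readings` column are assumptions made purely for illustration.

```python
from statistics import mean


def forward_fill(values):
    """Replace each None with the most recent observed value.

    Leading None entries stay None because nothing precedes them.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sensor column with gaps (None marks a missing reading).
readings = [10.0, None, 12.0, None, None, 14.0]
ff = forward_fill(readings)
mi = mean_impute(readings)
```

Forward fill preserves the temporal order of a time series, while mean imputation ignores it. Stating which one was applied, and to which columns, is the kind of detail that makes the preprocessing reproducible.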
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our application of SVM and Random Forest to anomaly detection in industrial control system traffic. We address each major comment below.
Point-by-point responses
- Referee: [Datasets / Evaluation] Datasets and evaluation sections: the central claim that the classifiers detect attacks in these ICS environments rests on the untested assumption that the synthetically introduced attacks reproduce the statistical and temporal signatures of real-world threats. No domain-expert validation, no comparison against independently collected real attack traces, and no ablation of the chosen features versus injection artifacts are provided; this directly undermines the assertion that the models would generalize to previously unseen intrusions.
  Authors: The Modbus gas pipeline and OPC UA datasets are established benchmarks in ICS security research, with attacks generated according to documented protocol-specific threat models. We will add an explicit limitations subsection acknowledging the lack of domain-expert validation and real-world trace comparison, plus an ablation study separating selected features from potential injection artifacts. Direct comparison to independently collected real attacks is outside the scope of the current datasets.
  Revision: partial
- Referee: [Abstract / Results] Abstract and results: performance is asserted as 'good' and 'slightly outperforming' without reporting concrete metrics (accuracy, precision, recall, F1), baselines, cross-validation procedure, or error analysis. If these details are also absent from the full evaluation section, the empirical comparison cannot be verified and the superiority claim cannot be assessed.
  Authors: The evaluation section already reports accuracy, precision, recall, and F1 scores for both classifiers on each dataset, together with the 5-fold cross-validation procedure and discussion of false-positive patterns. We will revise the abstract to include the specific numerical results and baseline comparisons.
  Revision: yes
- Unresolved: direct comparison against independently collected real attack traces, which would require new data collection beyond the provided benchmark datasets.
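The rebuttal refers to a 5-fold cross-validation procedure. A minimal sketch of how such folds can be built is shown below. It assumes stratification by class label, which the rebuttal does not state, and the function name `stratified_k_fold` and the label vector are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the records of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


# Hypothetical imbalanced label vector: 40 normal records, 10 attack records.
labels = [0] * 40 + [1] * 10
splits = list(stratified_k_fold(labels, k=5))
```

Each record lands in exactly one test fold, so every record is scored once by a model that never saw it during training. For traffic captured as a time series, shuffled folds can place neighbouring packets on both sides of a split, so a chronological split may be the more conservative check.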
Circularity Check
No circularity: purely empirical ML evaluation on fixed datasets
full rationale
The manuscript applies standard SVM and Random Forest classifiers to two pre-existing industrial datasets (Modbus gas pipeline and OPC UA batch processing) after feature extraction and missing-data handling. No equations, derivations, or first-principles predictions appear. Performance metrics are obtained by direct training and testing on the supplied traces; no parameter is fitted to a subset and then renamed as a prediction of a related quantity. No self-citations are invoked to justify uniqueness or load-bearing premises. The evaluation chain is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (2)
- SVM kernel and regularization parameters
- Random Forest tree count and depth
axioms (2)
- domain assumption: The synthetic attacks inserted into the datasets mirror real attacker behavior in industrial networks
- domain assumption: Selected network features are sufficient to distinguish attacks from normal operation
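The free parameters in the ledger are typically chosen by searching a grid of candidate values. The sketch below only enumerates such a grid. The candidate values and the names `expand_grid`, `n_trees`, and `max_depth` are hypothetical, since the search ranges used in the paper are not given here.

```python
from itertools import product


def expand_grid(grid):
    """Return every combination of the listed values as a dict."""
    names = sorted(grid)
    value_lists = (grid[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical candidate values, for illustration only.
svm_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]}
rf_grid = {"n_trees": [50, 100, 200], "max_depth": [5, 10, None]}

svm_candidates = expand_grid(svm_grid)
rf_candidates = expand_grid(rf_grid)
```

Each candidate would then be scored on validation folds only. Selecting hyperparameters on the same records used for the final reported metrics would inflate the results.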