
arxiv: 1907.10374 · v1 · pith:JOD2DFIGnew · submitted 2019-07-24 · 💻 cs.CR

Anomaly-based Intrusion Detection in Industrial Data with SVM and Random Forests

Pith reviewed 2026-05-24 16:56 UTC · model grok-4.3

classification 💻 cs.CR
keywords anomaly detection · intrusion detection · industrial control systems · SVM · random forest · Modbus · OPC UA · gas pipeline

The pith

Support vector machines and random forests identify attacks in Modbus and OPC UA industrial traffic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates SVM and random forest algorithms for anomaly-based intrusion detection using two datasets: one from a Modbus-based gas pipeline and another from an OPC UA-based batch processing system. Both algorithms detect the attacks added to these datasets effectively. Random forest shows slightly better performance than SVM. Readers might care because attacks on industrial systems can have physical consequences and these systems often cannot be easily updated or patched. The work also covers feature extraction and handling of missing data to make the methods applicable.

Core claim

The application of SVM and Random Forest to the Modbus gas pipeline and OPC UA batch processing datasets demonstrates effective detection of introduced attacks, with Random Forest marginally outperforming SVM, while addressing feature extraction and missing data issues.

What carries the argument

SVM and Random Forest machine learning classifiers operating on features from industrial network traffic for anomaly detection.
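
To make the comparison concrete, here is a minimal sketch of training both classifier families on the same feature matrix with scikit-learn. The data comes from `make_classification` as a synthetic stand-in, not the Modbus or OPC UA traces, and the feature count, class balance, and hyperparameters are assumptions chosen for illustration rather than the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted traffic features (assumption, not the paper's data).
# Roughly 10% of the samples are labelled 1 ("attack") to mimic class imbalance.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# Hyperparameters below are illustrative defaults, not values reported by the authors.
models = {
    # SVMs are sensitive to feature scale, so the features are standardised first.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced")),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0,
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

print(scores)
```

Which model scores higher on this toy data says nothing about the paper's result; the sketch only shows that both classifiers can share one feature pipeline and one held-out split.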

Load-bearing premise

The attacks introduced into the test datasets have the same characteristics as those that would occur in actual industrial environments.

What would settle it

Applying the models to network captures from a real industrial incident, or to a third dataset with different attack types, and measuring detection accuracy there would support or refute the claim.

Figures

Figures reproduced from arXiv: 1907.10374 by Hans Dieter Schotten, Sapna Sinha, Simon D. Duque Anton.

Figure 1: Process Behaviour in DS2 packet 1 800 and from packet 3 000 to packet 3 500, marked by red frames. In the first attack, all sensor and actuator values are set to 0, indicating an error in the process, causing a disruption due to necessary investigation and maintenance. For the second attack, the frequency of the process is cut in half. This causes a change in the process, still indicating activity.
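
The two attack types in the caption can be illustrated on a toy signal. The sketch below assumes an idealised sinusoidal process value and hypothetical attack windows; the start of the first window is a guess, because the caption as extracted only gives its end at packet 1 800. It is not the authors' injection procedure.

```python
import math


def process_value(t, period=100.0):
    """Idealised periodic process signal (hypothetical stand-in for one sensor)."""
    return math.sin(2.0 * math.pi * t / period)


def inject_attacks(n_packets, zero_window, slow_window, period=100.0):
    """Return (values, labels) with two attack windows marked by label 1.

    zero_window: half-open packet range in which every value is forced to 0.
    slow_window: half-open packet range in which the process frequency is halved.
    """
    values, labels = [], []
    for t in range(n_packets):
        if zero_window[0] <= t < zero_window[1]:
            values.append(0.0)  # attack 1: all values set to 0
            labels.append(1)
        elif slow_window[0] <= t < slow_window[1]:
            # attack 2: half the frequency is the same as twice the period
            values.append(process_value(t, period * 2.0))
            labels.append(1)
        else:
            values.append(process_value(t, period))
            labels.append(0)
    return values, labels


# The start of the first window (1500) is an assumption made for this example.
values, labels = inject_attacks(4000, zero_window=(1500, 1800), slow_window=(3000, 3500))
print(sum(labels), "of", len(labels), "packets are labelled as attacks")
```

The first attack is easy to spot because the signal goes flat, while the second keeps the process active and only changes its rhythm, which is why a detector that looks at single packets in isolation may miss it.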
Abstract

Attacks on industrial enterprises are increasing in number as well as in effect. Since the introduction of industrial control systems in the 1970s, industrial networks have been the target of malicious actors. More recently, the political and warfare aspects of attacks on industrial and critical infrastructure are becoming more relevant. In contrast to classic home and office IT systems, industrial IT, so-called OT systems, have an effect on the physical world. Furthermore, industrial devices have long operation times, sometimes several decades. Updates and fixes are tedious and often not possible. The threats to industry, combined with the legacy requirements of industrial environments, create the need for efficient intrusion detection that can be integrated into existing systems. In this work, the network data containing industrial operation is analysed with machine learning- and time series-based anomaly detection algorithms in order to discover the attacks introduced to the data. Two different data sets are used, one Modbus-based gas pipeline control traffic and one OPC UA-based batch processing traffic. In order to detect attacks, two machine learning-based algorithms are used, namely SVM and Random Forest. Both perform well, with Random Forest slightly outperforming SVM. Furthermore, extracting and selecting features as well as handling missing data is addressed in this work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript applies SVM and Random Forest classifiers to anomaly detection on two industrial datasets (Modbus gas-pipeline control traffic and OPC UA batch-processing traffic). After feature extraction/selection and missing-data handling, it claims both algorithms detect the introduced attacks effectively, with Random Forest slightly outperforming SVM.

Significance. If the performance claims hold under proper validation, the work would offer a practical demonstration of standard ML classifiers for legacy OT intrusion detection. However, the absence of domain-expert validation of the synthetic attacks and of any comparison to real attack traces limits the result to a narrow, dataset-specific demonstration rather than a generalizable contribution.

major comments (2)
  1. [Datasets / Evaluation] Datasets and evaluation sections: the central claim that the classifiers detect attacks in these ICS environments rests on the untested assumption that the synthetically introduced attacks reproduce the statistical and temporal signatures of real-world threats. No domain-expert validation, no comparison against independently collected real attack traces, and no ablation of the chosen features versus injection artifacts are provided; this directly undermines the assertion that the models would generalize to previously unseen intrusions.
  2. [Abstract / Results] Abstract and results: performance is asserted as 'good' and 'slightly outperforming' without reporting concrete metrics (accuracy, precision, recall, F1), baselines, cross-validation procedure, or error analysis. If these details are also absent from the full evaluation section, the empirical comparison cannot be verified and the superiority claim cannot be assessed.
minor comments (1)
  1. [Methods] Clarify the exact feature-selection method and the criteria used for handling missing values; these steps are mentioned but not described with sufficient algorithmic detail for reproducibility.
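
As an illustration of the level of detail the minor comment asks for, here is one possible missing-value rule written out in full: forward-filling each field with its last observed value. This is a sketch of a common option, not the method used in the manuscript, and the field names `pressure` and `setpoint` are hypothetical.

```python
def forward_fill(rows, default=0.0):
    """Replace None entries with the last value seen for the same field.

    rows: list of dicts mapping field name to a number or None.
    default: value used when a field is missing before any observation exists.
    """
    last_seen = {}
    filled = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if value is None:
                value = last_seen.get(key, default)
            last_seen[key] = value
            new_row[key] = value
        filled.append(new_row)
    return filled


# Hypothetical packets in which each message carries only some of the fields.
packets = [
    {"pressure": 1.0, "setpoint": None},
    {"pressure": None, "setpoint": 5.0},
    {"pressure": None, "setpoint": None},
]
filled = forward_fill(packets)
print(filled)
```

Stating the rule this explicitly, including what happens to a gap at the very start of a trace, is what makes a preprocessing step reproducible.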

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our application of SVM and Random Forest to anomaly detection in industrial control system traffic. We address each major comment below.

  1. Referee: [Datasets / Evaluation] Datasets and evaluation sections: the central claim that the classifiers detect attacks in these ICS environments rests on the untested assumption that the synthetically introduced attacks reproduce the statistical and temporal signatures of real-world threats. No domain-expert validation, no comparison against independently collected real attack traces, and no ablation of the chosen features versus injection artifacts are provided; this directly undermines the assertion that the models would generalize to previously unseen intrusions.

    Authors: The Modbus gas pipeline and OPC UA datasets are established benchmarks in ICS security research, with attacks generated according to documented protocol-specific threat models. We will add an explicit limitations subsection acknowledging the lack of domain-expert validation and real-world trace comparison, plus an ablation study separating selected features from potential injection artifacts. Direct comparison to independently collected real attacks is outside the scope of the current datasets. revision: partial

  2. Referee: [Abstract / Results] Abstract and results: performance is asserted as 'good' and 'slightly outperforming' without reporting concrete metrics (accuracy, precision, recall, F1), baselines, cross-validation procedure, or error analysis. If these details are also absent from the full evaluation section, the empirical comparison cannot be verified and the superiority claim cannot be assessed.

    Authors: The evaluation section already reports accuracy, precision, recall, and F1 scores for both classifiers on each dataset, together with the 5-fold cross-validation procedure and discussion of false-positive patterns. We will revise the abstract to include the specific numerical results and baseline comparisons. revision: yes
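
For reference, the four metrics named in this exchange can be computed from binary labels as sketched below. The label vectors are made up for illustration and are not results from the paper; label 1 is assumed to mean "attack".

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 attacks among 8 samples, one missed attack, one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced traffic, accuracy alone can look high even when attacks are missed, which is why the referee asks for precision, recall, and F1 alongside it.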

standing simulated objections not resolved
  • Direct comparison against independently collected real attack traces, which would require new data collection beyond the provided benchmark datasets.

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation on fixed datasets


The manuscript applies standard SVM and Random Forest classifiers to two pre-existing industrial datasets (Modbus gas pipeline and OPC UA batch processing) after feature extraction and missing-data handling. No equations, derivations, or first-principles predictions appear. Performance metrics are obtained by direct training and testing on the supplied traces; no parameter is fitted to a subset and then renamed as a prediction of a related quantity. No self-citations are invoked to justify uniqueness or load-bearing premises. The evaluation chain is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Central claim rests on representativeness of the attack-injected datasets and sufficiency of feature engineering, neither of which receives independent evidence in the abstract.

free parameters (2)
  • SVM kernel and regularization parameters
    Hyperparameters chosen to fit the industrial traffic data; values and selection process not reported.
  • Random Forest tree count and depth
    Hyperparameters chosen to fit the industrial traffic data; values and selection process not reported.
axioms (2)
  • domain assumption: The synthetic attacks inserted into the datasets mirror real attacker behavior in industrial networks
    Invoked when claiming detection performance generalizes to operational environments.
  • domain assumption: Selected network features are sufficient to distinguish attacks from normal operation
    Invoked when reporting that both algorithms perform well after feature extraction.
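
The free parameters listed above could be made explicit with a documented search. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data; the grids, the 5-fold setting, and the F1 scoring are assumptions for illustration, since the ledger notes that the paper's values and selection process are not reported.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features (assumption, not the paper's data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative grids covering the parameters named in the ledger.
searches = {
    "svm": GridSearchCV(
        SVC(),
        {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]},
        cv=5, scoring="f1",
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=5, scoring="f1",
    ),
}

best = {}
for name, search in searches.items():
    search.fit(X, y)
    best[name] = search.best_params_

print(best)
```

Reporting the searched grid together with the selected values would let a reader judge how much of the measured difference between the two classifiers comes from tuning effort.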

pith-pipeline@v0.9.0 · 5753 in / 1222 out tokens · 20058 ms · 2026-05-24T16:56:44.136479+00:00 · methodology

