Detection of Anomalous Network Nodes via Hierarchical Prediction and Extreme Value Theory
Pith reviewed 2026-05-24 09:07 UTC · model grok-4.3
The pith
A two-stage method using hierarchical time series prediction of ARP calls followed by extreme value theory flags anomalous network nodes while cutting false positives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Modelling ARP call behaviour via hierarchical time series prediction methods and then exploiting Extreme Value Theory to decide whether deviations are anomalous produces considerably fewer false positives than existing approaches when evaluated on a real-life dataset of over 10M ARP calls from 362 nodes.
What carries the argument
Two-stage pipeline that first generates hierarchical time series forecasts of ARP behaviour and then applies extreme value theory thresholds to the resulting residuals.
If this is right
- Anomalous nodes can be identified from their ARP patterns even when malware has already bypassed signature checks.
- Heavy-tailed internet traffic distributions are handled directly by the extreme value theory stage rather than by ad-hoc rules.
- Security teams receive fewer alerts, directly reducing the alert fatigue reported by professionals.
- The same two-stage structure can be applied to any network protocol that produces count-based time series.
Where Pith is reading between the lines
- The approach might be tested on other industrial protocols such as Modbus or DNP3 to see whether the same residual properties appear.
- Real-time deployment would require checking how often the hierarchical forecasts need retraining as network topology changes.
- Combining the output with node metadata such as device type could further lower the remaining false positives.
- Synthetic injection of known anomalies into the dataset would provide a controlled check on the extreme value theory thresholds.
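The synthetic-injection check in the last bullet could be prototyped along the following lines. This is a minimal sketch, not a procedure taken from the paper: `inject_bursts` and `score_detector` are hypothetical helper names, the additive-burst anomaly model is an assumption, and the fixed cut-off used in the usage lines is only a placeholder for the two-stage detector.

```python
import numpy as np


def inject_bursts(counts, n_bursts, magnitude, rng):
    """Return a copy of `counts` with additive bursts, plus the injected indices.

    Assumption: an anomaly is modelled as a sudden additive jump in the
    per-interval ARP call count. Real malicious behaviour may look different.
    """
    counts = np.asarray(counts).copy()
    idx = rng.choice(counts.size, size=n_bursts, replace=False)
    counts[idx] += magnitude
    return counts, np.sort(idx)


def score_detector(flagged, injected, n_points):
    """Compare flagged indices against the known injected indices."""
    flagged = set(int(i) for i in flagged)
    injected = set(int(i) for i in injected)
    true_pos = len(flagged & injected)
    false_pos = len(flagged - injected)
    return {
        "recall": true_pos / len(injected) if injected else float("nan"),
        "false_positives": false_pos,
        "false_positive_rate": false_pos / (n_points - len(injected)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.poisson(20, size=1000)           # simulated normal ARP counts
    noisy, injected = inject_bursts(clean, n_bursts=5, magnitude=500, rng=rng)
    flagged = np.flatnonzero(noisy > 100)        # placeholder detector only
    print(score_detector(flagged, injected, noisy.size))
```

Because the injected positions are known, recall and false-positive rate can be computed directly, which the unlabeled real-world dataset does not allow.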
Load-bearing premise
The residuals left by the hierarchical time series predictions of ARP behaviour follow heavy-tailed distributions that extreme value theory can reliably threshold to separate normal from anomalous activity.
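For orientation, the premise can be written in the standard peaks-over-threshold form. The review text does not give the paper's equations, so the notation below is an assumption: r_t is the residual, u a high threshold, ξ and σ the generalized Pareto shape and scale, n the number of residuals, N_u the number of residuals above u, and q the target exceedance probability.

```latex
% Residual at time t: observed minus one-step-ahead predicted ARP call count
r_t = y_t - \hat{y}_t

% Excesses over a high threshold u are approximated by a generalized Pareto law
\Pr(r_t - u \le x \mid r_t > u) \approx G_{\xi,\sigma}(x)
  = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}, \qquad x \ge 0,\ \xi \ne 0

% Alert threshold for a target exceedance probability q,
% with n residuals in total and N_u of them above u
z_q = u + \frac{\sigma}{\xi}\left[\left(\frac{q\,n}{N_u}\right)^{-\xi} - 1\right]
```

A fitted ξ > 0 corresponds to a heavy, polynomially decaying tail. A fitted ξ near zero or below it would indicate an exponential or bounded tail, which would cast doubt on the heavy-tail premise even though the threshold can still be computed.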
What would settle it
Applying the method to the 10M+ ARP call dataset and obtaining no measurable drop in false positives relative to a non-EVT baseline would show the central claim does not hold.
Original abstract
Continuously evolving cyber-attacks against industrial networks reduce the effectiveness of signature-based detection methods. Once malware has infiltrated a network (for example, entering via an unsecured device), it can infect further network nodes and carry out malicious activity. Infected nodes can exhibit unusual behaviour in their use of Address Resolution Protocol (ARP) calls within the network. In order to detect such anomalous nodes, we propose a two-stage method: (i) modelling of ARP call behaviour via hierarchical time series prediction methods, and (ii) exploiting Extreme Value Theory (EVT) to robustly detect whether deviations from expected behaviour are anomalous. EVT is able to handle heavy-tailed distributions which are exhibited by internet traffic. Empirical evaluations on a real-life dataset containing over 10M ARP calls from 362 nodes show that the proposed method results in considerably reduced number of false positives, addressing the problem of alert fatigue commonly reported by security professionals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage anomaly detection method for industrial networks: (i) hierarchical time-series models to predict normal ARP call behavior per node, and (ii) Extreme Value Theory applied to the resulting residuals to set thresholds for anomalous deviations. The central empirical claim is that this yields considerably fewer false positives than alternatives when evaluated on a real dataset of >10M ARP calls from 362 nodes, thereby mitigating alert fatigue.
Significance. If the empirical results and EVT assumptions can be rigorously validated, the work offers a practical combination of hierarchical forecasting and extreme-value thresholding for a domain where heavy-tailed traffic is common. The approach directly targets a known operational pain point (alert fatigue) using standard statistical tools rather than purely data-driven black-box models.
major comments (3)
- [Abstract / empirical evaluation] Abstract and empirical evaluation section: the headline claim of 'considerably reduced number of false positives' is presented without any quantitative metrics (e.g., false-positive rates, precision-recall values), baseline comparisons, or description of how ground-truth anomalies were established on the 10M-call dataset. This absence leaves the central performance assertion unsupported.
- [EVT application / residual analysis] Section describing the EVT stage: no QQ-plots, Anderson-Darling or Cramér-von Mises tests, nor fitted GPD shape/scale parameters are reported for the residuals after hierarchical prediction. Without such diagnostics it is impossible to verify that the residuals are approximately stationary and exhibit the heavy tails required for EVT thresholding to be theoretically justified rather than an arbitrary quantile.
- [Method / hierarchical prediction] Method description: details on how the hierarchical time-series models are fitted (choice of hierarchy levels, forecasting horizon, residual extraction) and how EVT parameters are selected are not provided, making reproducibility and sensitivity analysis impossible.
minor comments (2)
- [Method] Notation for the hierarchical levels and the precise definition of the residual process should be introduced with a small diagram or explicit equations to improve clarity.
- [Abstract / introduction] The abstract asserts that internet traffic exhibits heavy-tailed distributions but does not cite the specific literature or dataset characteristics that justify this for ARP traffic in the target industrial setting.
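The residual diagnostics requested in the second major comment might be produced roughly as follows. This is a sketch under stated assumptions rather than the authors' procedure: the threshold quantile of 0.95 and the function name `gpd_diagnostics` are illustrative choices, and the Anderson-Darling statistic is computed by hand because `scipy.stats.anderson` does not cover the generalized Pareto distribution.

```python
import numpy as np
from scipy.stats import cramervonmises, genpareto


def gpd_diagnostics(residuals, u_quantile=0.95):
    """Fit a GPD to threshold excesses and return basic goodness-of-fit output."""
    residuals = np.asarray(residuals, dtype=float)
    u = np.quantile(residuals, u_quantile)            # assumed threshold choice
    excess = np.sort(residuals[residuals > u] - u)
    n = excess.size

    # Maximum-likelihood fit with the location fixed at zero
    shape, _, scale = genpareto.fit(excess, floc=0)
    fitted = genpareto(shape, loc=0, scale=scale)

    # QQ pairs: fitted quantiles (column 0) against sorted excesses (column 1)
    probs = (np.arange(1, n + 1) - 0.5) / n
    qq = np.column_stack([fitted.ppf(probs), excess])

    # Cramér-von Mises test against the fitted distribution. The p-value is
    # optimistic because the parameters were estimated from the same data.
    cvm = cramervonmises(excess, fitted.cdf)

    # Anderson-Darling statistic A^2 (no p-value: critical values for a GPD
    # with estimated parameters need dedicated tables or a bootstrap)
    cdf = np.clip(fitted.cdf(excess), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    a2 = -n - np.mean((2 * i - 1) * (np.log(cdf) + np.log1p(-cdf[::-1])))

    return {
        "threshold": u,
        "n_exceedances": n,
        "shape": shape,
        "scale": scale,
        "qq": qq,
        "cvm_statistic": cvm.statistic,
        "cvm_pvalue": cvm.pvalue,
        "anderson_darling": a2,
    }
```

Plotting the two columns of `qq` against each other gives the QQ-plot; points close to the diagonal suggest the fitted tail is adequate. None of these checks addresses stationarity of the residuals, which would need a separate test.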
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that additional quantitative details, diagnostics, and methodological specifications will strengthen the manuscript and address concerns about unsupported claims and reproducibility. We will revise the paper accordingly.
Point-by-point responses
- Referee: [Abstract / empirical evaluation] Abstract and empirical evaluation section: the headline claim of 'considerably reduced number of false positives' is presented without any quantitative metrics (e.g., false-positive rates, precision-recall values), baseline comparisons, or description of how ground-truth anomalies were established on the 10M-call dataset. This absence leaves the central performance assertion unsupported.
  Authors: We acknowledge that the abstract and evaluation section would benefit from explicit quantitative metrics and clearer description of the evaluation protocol. The real-world dataset is unlabeled, as is typical for operational network traffic; we therefore evaluate via direct comparison of alert volumes against baselines (e.g., per-node EVT without hierarchy, simple thresholding) while validating detected anomalies through post-hoc expert review of a sample of flagged nodes. We will revise the abstract to report specific false-positive reductions (e.g., X% fewer alerts) and expand the empirical section with baseline tables and evaluation details.
  Revision: yes
- Referee: [EVT application / residual analysis] Section describing the EVT stage: no QQ-plots, Anderson-Darling or Cramér-von Mises tests, nor fitted GPD shape/scale parameters are reported for the residuals after hierarchical prediction. Without such diagnostics it is impossible to verify that the residuals are approximately stationary and exhibit the heavy tails required for EVT thresholding to be theoretically justified rather than an arbitrary quantile.
  Authors: We agree that formal diagnostics are needed to justify the EVT application. The residuals exhibit the expected heavy tails due to the nature of ARP traffic, but the original submission omitted the requested visualizations and tests. In revision we will include QQ-plots of the residuals, Anderson-Darling and Cramér-von Mises goodness-of-fit results, and the estimated GPD shape and scale parameters to confirm the modeling assumptions.
  Revision: yes
- Referee: [Method / hierarchical prediction] Method description: details on how the hierarchical time-series models are fitted (choice of hierarchy levels, forecasting horizon, residual extraction) and how EVT parameters are selected are not provided, making reproducibility and sensitivity analysis impossible.
  Authors: We accept that the method section requires more explicit specification for reproducibility. The hierarchy follows the network topology (node level, subnet aggregation, and global), forecasts are one-step ahead, and residuals are computed as observed minus predicted call counts. EVT parameters are fit by maximum-likelihood on exceedances above a high quantile. We will expand the method section with these choices, pseudocode, and parameter-selection procedure in the revised manuscript.
  Revision: yes
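The choices listed in the last response (node, subnet and global levels; one-step-ahead forecasts; residuals as observed minus predicted; maximum-likelihood fit on exceedances above a high quantile) can be strung together in a short sketch. Everything beyond those four choices is an assumption made for illustration: simple exponential smoothing stands in for the paper's forecasting model, bottom-up summation stands in for its hierarchical method, only node-level residuals are thresholded, and the function names, the 0.95 quantile, the risk level and the training length are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto


def one_step_forecasts(series, alpha=0.3):
    """One-step-ahead forecasts via simple exponential smoothing (stand-in model)."""
    series = np.asarray(series, dtype=float)
    forecasts = np.empty_like(series)
    level = series[0]                      # seed the level with the first value
    for t, y in enumerate(series):
        forecasts[t] = level               # issued before y at time t is used
        level = alpha * y + (1 - alpha) * level
    return forecasts


def bottom_up(node_forecasts, subnet_of):
    """Sum node forecasts into subnet-level and global forecasts (stand-in)."""
    subnet_fc = {}
    for node, fc in node_forecasts.items():
        key = subnet_of[node]
        subnet_fc[key] = subnet_fc.get(key, 0.0) + fc
    return subnet_fc, sum(subnet_fc.values())


def evt_threshold(residuals, u_quantile=0.95, risk=1e-3):
    """Peaks-over-threshold alert level from a maximum-likelihood GPD fit."""
    residuals = np.asarray(residuals, dtype=float)
    u = np.quantile(residuals, u_quantile)
    excess = residuals[residuals > u] - u
    shape, _, scale = genpareto.fit(excess, floc=0)
    ratio = risk * residuals.size / excess.size
    if abs(shape) < 1e-9:                  # exponential-tail limit
        return u - scale * np.log(ratio)
    return u + scale / shape * (ratio ** (-shape) - 1)


def flag_nodes(counts, subnet_of, train=1000, risk=1e-3):
    """Fit a threshold per node on the first `train` residuals, flag the rest."""
    forecasts, flags = {}, {}
    for node, series in counts.items():
        forecasts[node] = one_step_forecasts(series)
        resid = np.asarray(series, dtype=float) - forecasts[node]
        threshold = evt_threshold(resid[:train], risk=risk)
        flags[node] = train + np.flatnonzero(resid[train:] > threshold)
    subnet_fc, global_fc = bottom_up(forecasts, subnet_of)
    return flags, subnet_fc, global_fc
```

Fitting the threshold on an earlier window and flagging only later intervals keeps a large anomaly from inflating the tail estimate that is supposed to detect it. How the paper handles this split is not stated in the text above.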
Circularity Check
No circularity: standard two-stage application of forecasting + EVT to observed data
The paper's chain is (1) fit hierarchical time-series models to ARP counts per node, (2) compute residuals, (3) apply EVT (GPD) thresholds to flag extremes. None of these steps is defined in terms of the output it produces, nor does any 'prediction' reduce to a fitted parameter by construction. The central empirical claim rests on external 10M-call dataset performance rather than self-citation or ansatz smuggling. No uniqueness theorems or prior-author results are invoked as load-bearing. This is the normal non-circular case of applying established statistical tools to new data.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Prediction residuals from hierarchical ARP models follow heavy-tailed distributions suitable for EVT