
arxiv: 2304.13941 · v2 · submitted 2023-04-27 · 💻 cs.CR

Detection of Anomalous Network Nodes via Hierarchical Prediction and Extreme Value Theory

Pith reviewed 2026-05-24 09:07 UTC · model grok-4.3

classification 💻 cs.CR
keywords anomaly detection · ARP protocol · hierarchical time series · extreme value theory · network security · false positives · industrial networks · alert fatigue

The pith

A two-stage method using hierarchical time series prediction of ARP calls followed by extreme value theory flags anomalous network nodes while cutting false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that predicting normal ARP call patterns with hierarchical time series models and then applying extreme value theory to the residuals can separate routine variation from malicious deviations in industrial networks. This matters because once malware enters via one device it can spread by altering how nodes request addresses, and signature methods no longer keep up with changing attacks. A reader would care that the approach is tested on more than ten million real ARP records from 362 nodes and produces markedly fewer alerts than standard techniques. If the claim holds, operators gain a way to monitor traffic without constant overload from spurious warnings. The work focuses on the practical problem of alert fatigue rather than abstract detection rates.

Core claim

Modelling ARP call behaviour via hierarchical time series prediction methods and then exploiting Extreme Value Theory to decide whether deviations are anomalous produces considerably fewer false positives than existing approaches when evaluated on a real-life dataset of over 10M ARP calls from 362 nodes.

What carries the argument

Two-stage pipeline that first generates hierarchical time series forecasts of ARP behaviour and then applies extreme value theory thresholds to the resulting residuals.
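The two stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the forecaster is a seasonal-naive stand-in for the hierarchical models, the tail model is a peaks-over-threshold generalized Pareto fit by method of moments, and the function names and the defaults `period=24`, `u_quantile=0.90` and `alert_prob=1e-3` are hypothetical choices made here.

```python
import math


def seasonal_naive_residuals(counts, period=24):
    """Stage 1 stand-in: forecast each point by the value one period earlier."""
    return [counts[t] - counts[t - period] for t in range(period, len(counts))]


def pot_threshold(residuals, u_quantile=0.90, alert_prob=1e-3):
    """Stage 2: residual level exceeded with probability `alert_prob`.

    Fits a generalized Pareto distribution to the excesses over a high
    empirical quantile using method-of-moments estimates (valid for xi < 0.5).
    """
    ordered = sorted(residuals)
    n = len(ordered)
    u = ordered[int(u_quantile * (n - 1))]
    excess = [r - u for r in ordered if r > u]
    n_u = len(excess)
    if n_u < 10:
        raise ValueError("too few exceedances to fit a tail model")
    mean = sum(excess) / n_u
    var = sum((e - mean) ** 2 for e in excess) / (n_u - 1)
    if var == 0.0:
        raise ValueError("exceedances have zero variance")
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape estimate
    sigma = 0.5 * mean * (ratio + 1.0)  # scale estimate
    tail_ratio = alert_prob * n / n_u   # target probability relative to P(X > u)
    if abs(xi) < 1e-9:
        return u - sigma * math.log(tail_ratio)
    return u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)


def flag_anomalies(residuals, threshold):
    """Indices whose residual exceeds the fitted threshold."""
    return [i for i, r in enumerate(residuals) if r > threshold]
```

In use, the threshold would be fitted on residuals from a period believed to be clean and then applied to new residuals with `flag_anomalies`. The figure captions suggest the paper pairs its forecasters with a method called lookout, so this sketch should be read only as an instance of the general forecast-then-threshold idea.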

If this is right

  • Anomalous nodes can be identified from their ARP patterns even when malware has already bypassed signature checks.
  • Heavy-tailed internet traffic distributions are handled directly by the extreme value theory stage rather than by ad-hoc rules.
  • Security teams receive fewer alerts, directly reducing the alert fatigue reported by professionals.
  • The same two-stage structure can be applied to any network protocol that produces count-based time series.
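For readers who want the tail model behind the second bullet written out, the standard peaks-over-threshold approximation is given below. This is the textbook form, not a formula quoted from the paper; the symbols $u$ (threshold), $\xi$ (shape), $\sigma$ (scale), $n$ (sample size), $N_u$ (number of exceedances) and $q$ (target alert probability) are notation assumed here.

```latex
% Excess distribution above a high threshold u (generalized Pareto approximation)
\[
  P(X - u \le y \mid X > u) \;\approx\;
  1 - \left(1 + \frac{\xi\, y}{\sigma}\right)^{-1/\xi},
  \qquad y \ge 0,\quad 1 + \frac{\xi\, y}{\sigma} > 0 .
\]
% xi > 0: heavy (power-law) tail; xi -> 0: exponential tail; xi < 0: bounded tail.

% Level exceeded with probability q, given N_u of n observations above u
\[
  z_q \;=\; u + \frac{\sigma}{\xi}
  \left[\left(\frac{q\, n}{N_u}\right)^{-\xi} - 1\right].
\]
```

An alert rule of this kind flags a residual only when it exceeds $z_q$, so the alert rate is tied to an explicit tail probability rather than to a hand-tuned cut-off.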

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach might be tested on other industrial protocols such as Modbus or DNP3 to see whether the same residual properties appear.
  • Real-time deployment would require checking how often the hierarchical forecasts need retraining as network topology changes.
  • Combining the output with node metadata such as device type could further lower the remaining false positives.
  • Synthetic injection of known anomalies into the dataset would provide a controlled check on the extreme value theory thresholds.
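The synthetic-injection check in the last bullet could look roughly like the sketch below. It is an illustration only: the helper names, the additive-spike anomaly model and the seed are assumptions made here, and any detector that returns flagged indices can be scored with it.

```python
import random


def inject_spikes(residuals, n_spikes, magnitude, seed=0):
    """Return a copy of `residuals` with additive spikes, plus the spike indices."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(residuals)), n_spikes))
    spiked = list(residuals)
    for i in positions:
        spiked[i] += magnitude
    return spiked, positions


def score_detector(flagged, injected, n_points):
    """Recall on injected points and false-alarm rate on the untouched points."""
    flagged, injected = set(flagged), set(injected)
    hits = len(flagged & injected)
    false_alarms = len(flagged - injected)
    recall = hits / len(injected) if injected else 0.0
    false_alarm_rate = false_alarms / (n_points - len(injected))
    return recall, false_alarm_rate


# Hypothetical usage: Gaussian noise as stand-in residuals, fixed cut-off as detector.
noise_rng = random.Random(1)
clean = [noise_rng.gauss(0.0, 1.0) for _ in range(1000)]
spiked, positions = inject_spikes(clean, n_spikes=5, magnitude=20.0)
flagged = [i for i, r in enumerate(spiked) if r > 10.0]
print(score_detector(flagged, positions, len(spiked)))
```

Sweeping `magnitude` downward shows where a given threshold starts to miss injected anomalies, which is the controlled check the bullet has in mind.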

Load-bearing premise

The residuals left by the hierarchical time series predictions of ARP behaviour follow heavy-tailed distributions that extreme value theory can reliably threshold to separate normal from anomalous activity.
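One quick way to probe this premise is to estimate the tail index of the positive residuals. The sketch below uses the Hill estimator, which is a standard tool chosen here for illustration and is not something the paper is reported to use; the function name and the choice of `k` are assumptions. It applies only to positive data with a power-law upper tail.

```python
import math
import random


def hill_tail_index(values, k):
    """Hill estimate of the tail index xi from the k largest positive values."""
    ordered = sorted(values, reverse=True)
    if k < 1 or k >= len(ordered) or ordered[k] <= 0:
        raise ValueError("need 1 <= k < n and positive upper order statistics")
    # Average log-ratio of the top k values to the (k+1)-th largest value.
    return sum(math.log(ordered[i] / ordered[k]) for i in range(k)) / k


# Hypothetical check on simulated Pareto data with tail exponent 2 (xi = 0.5).
rng = random.Random(42)
pareto_sample = [rng.paretovariate(2.0) for _ in range(5000)]
print(hill_tail_index(pareto_sample, k=500))
```

An estimate clearly above zero across a range of `k` values is consistent with a heavy tail; an estimate that drifts toward zero as `k` shrinks suggests the residual tail is lighter than the premise assumes.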

What would settle it

Applying the method to the 10M+ ARP call dataset and obtaining no measurable drop in false positives relative to a non-EVT baseline would show the central claim does not hold.

Figures

Figures reproduced from arXiv: 2304.13941 by Asha Rao, Conrad Sanderson, Hideya Ochiai, Mahdi Abolghasemi, Sevvandi Kandanaarachchi.

Figure 1. ARP scan calls made by LAN-internal malware are not visible to Conventional NADS, as these observe only incoming/outgoing (such as TCP/UDP) traffic. A LAN-security monitoring device attached to an internal LAN can observe this behaviour and protect connected devices, especially IoT.
Figure 2. Places where intrusion detection systems can be deployed in LANs. LAN intrusion detection literature tends to focus on mechanisms that can be deployed at the external gateway.
Figure 3. The two-level hierarchical time series. In this hierarchy, $Y_t$ is the total value of nodes at time $t$, $Y_{A,t}$ is the value of node $A$ at time $t$, and $Y_{B,t}$ is the value of node $B$ at time $t$.
Figure 4. Different tail behaviour. The Weibull distribution has a truncated tail, the Gumbel has an exponentially decaying tail, and the Fréchet has a fatter tail. Fréchet: $G(x) = 0$ for $x \le b$ and $G(x) = \exp\{-((x-b)/a)^{-\alpha}\}$ for $x > b$. Weibull: …
Figure 5. Types of intrusion detection datasets. Accompanying text from Section 4.1 (Dataset): This research uses data from the LAN-Security monitoring project (Sun et al., 2020; Ochiai, 2018), a research collaboration led by Japan and involving 12 ASEAN and SAARC countries. Deployed in late 2018, the research project aimed to improve cyber-readiness and cyber-resilience among the partners. The dataset used in this paper was generated by deploying a …
Figure 6. The LAN monitoring device. This is connected to a LAN as a host, not to a mirror port of the switch. This easy-installation design for monitoring suspicious activities is important especially in ASEAN and SAARC countries, where security incidents are very common because of the lack of detection/protection infrastructure.
Figure 7. ARP calls of 8 representative nodes (out of over 300 nodes) in the LAN data collected as noted in Section 4.1. We report the accuracy of methods here to merely depict the normal behaviour of forecasting models in predicting the pattern of signals.
Figure 8. ETS hourly residuals for 4 nodes in the LAN.
Figure 9. ETS residuals by hour with weeks shown in dotted lines.
Figure 10. Comparison of results using an autoencoder, ETS-lookout, LightGBM-lookout, TSLM-lookout and Zeroinflated-lookout using BU as hierarchical forecasting method.
Figure 11. Comparison of methods using MinT for hierarchical forecasting. Autoencoder used as a comparison method.
Figure 12. Number of anomalies by hour - autoencoder gives many more anomalies (a) ETS (b) TSLM (c) LightGBM (d) Zero-inflated
Figure 13. Anomalies over time using ETS, TSLM, LightGBM and Zero-inflated models.

Original abstract

Continuously evolving cyber-attacks against industrial networks reduce the effectiveness of signature-based detection methods. Once malware has infiltrated a network (for example, entering via an unsecured device), it can infect further network nodes and carry out malicious activity. Infected nodes can exhibit unusual behaviour in their use of Address Resolution Protocol (ARP) calls within the network. In order to detect such anomalous nodes, we propose a two-stage method: (i) modelling of ARP call behaviour via hierarchical time series prediction methods, and (ii) exploiting Extreme Value Theory (EVT) to robustly detect whether deviations from expected behaviour are anomalous. EVT is able to handle heavy-tailed distributions which are exhibited by internet traffic. Empirical evaluations on a real-life dataset containing over 10M ARP calls from 362 nodes show that the proposed method results in considerably reduced number of false positives, addressing the problem of alert fatigue commonly reported by security professionals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a two-stage anomaly detection method for industrial networks: (i) hierarchical time-series models to predict normal ARP call behavior per node, and (ii) Extreme Value Theory applied to the resulting residuals to set thresholds for anomalous deviations. The central empirical claim is that this yields considerably fewer false positives than alternatives when evaluated on a real dataset of >10M ARP calls from 362 nodes, thereby mitigating alert fatigue.

Significance. If the empirical results and EVT assumptions can be rigorously validated, the work offers a practical combination of hierarchical forecasting and extreme-value thresholding for a domain where heavy-tailed traffic is common. The approach directly targets a known operational pain point (alert fatigue) using standard statistical tools rather than purely data-driven black-box models.

major comments (3)
  1. [Abstract / empirical evaluation] Abstract and empirical evaluation section: the headline claim of 'considerably reduced number of false positives' is presented without any quantitative metrics (e.g., false-positive rates, precision-recall values), baseline comparisons, or description of how ground-truth anomalies were established on the 10M-call dataset. This absence leaves the central performance assertion unsupported.
  2. [EVT application / residual analysis] Section describing the EVT stage: no QQ-plots, Anderson-Darling or Cramér-von Mises tests, nor fitted GPD shape/scale parameters are reported for the residuals after hierarchical prediction. Without such diagnostics it is impossible to verify that the residuals are approximately stationary and exhibit the heavy tails required for EVT thresholding to be theoretically justified rather than an arbitrary quantile.
  3. [Method / hierarchical prediction] Method description: details on how the hierarchical time-series models are fitted (choice of hierarchy levels, forecasting horizon, residual extraction) and how EVT parameters are selected are not provided, making reproducibility and sensitivity analysis impossible.
minor comments (2)
  1. [Method] Notation for the hierarchical levels and the precise definition of the residual process should be introduced with a small diagram or explicit equations to improve clarity.
  2. [Abstract / introduction] The abstract states that 'internet traffic is heavy-tailed' but does not cite the specific literature or dataset characteristics that justify this for ARP traffic in the target industrial setting.
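The QQ diagnostic requested in major comment 2 can be produced with very little code. The sketch below is a minimal illustration, assuming excesses over a threshold and already-fitted GPD parameters `xi` and `sigma`; the function names and the plotting position `(i + 0.5) / n` are choices made here, and the correlation summary is an informal check, not a substitute for the Anderson-Darling or Cramér-von Mises tests the referee names.

```python
import math


def gpd_quantile(p, xi, sigma):
    """Quantile function of the generalized Pareto distribution, 0 <= p < 1."""
    if abs(xi) < 1e-9:
        return -sigma * math.log(1.0 - p)
    return (sigma / xi) * ((1.0 - p) ** (-xi) - 1.0)


def gpd_qq_pairs(excesses, xi, sigma):
    """Pair each sorted excess with the model quantile at position (i + 0.5) / n."""
    ordered = sorted(excesses)
    n = len(ordered)
    return [(gpd_quantile((i + 0.5) / n, xi, sigma), x) for i, x in enumerate(ordered)]


def qq_correlation(pairs):
    """Pearson correlation between model and empirical quantiles."""
    n = len(pairs)
    mean_q = sum(q for q, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    cov = sum((q - mean_q) * (x - mean_x) for q, x in pairs)
    var_q = sum((q - mean_q) ** 2 for q, _ in pairs)
    var_x = sum((x - mean_x) ** 2 for _, x in pairs)
    return cov / math.sqrt(var_q * var_x)
```

Plotting the pairs against the line $y = x$ gives the QQ-plot itself; points that bend away from the line in the upper right corner indicate that the fitted tail is too light or too heavy for the largest residuals.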

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that additional quantitative details, diagnostics, and methodological specifications will strengthen the manuscript and address concerns about unsupported claims and reproducibility. We will revise the paper accordingly.

Point-by-point responses
  1. Referee: [Abstract / empirical evaluation] Abstract and empirical evaluation section: the headline claim of 'considerably reduced number of false positives' is presented without any quantitative metrics (e.g., false-positive rates, precision-recall values), baseline comparisons, or description of how ground-truth anomalies were established on the 10M-call dataset. This absence leaves the central performance assertion unsupported.

    Authors: We acknowledge that the abstract and evaluation section would benefit from explicit quantitative metrics and clearer description of the evaluation protocol. The real-world dataset is unlabeled, as is typical for operational network traffic; we therefore evaluate via direct comparison of alert volumes against baselines (e.g., per-node EVT without hierarchy, simple thresholding) while validating detected anomalies through post-hoc expert review of a sample of flagged nodes. We will revise the abstract to report specific false-positive reductions (e.g., X% fewer alerts) and expand the empirical section with baseline tables and evaluation details. revision: yes

  2. Referee: [EVT application / residual analysis] Section describing the EVT stage: no QQ-plots, Anderson-Darling or Cramér-von Mises tests, nor fitted GPD shape/scale parameters are reported for the residuals after hierarchical prediction. Without such diagnostics it is impossible to verify that the residuals are approximately stationary and exhibit the heavy tails required for EVT thresholding to be theoretically justified rather than an arbitrary quantile.

    Authors: We agree that formal diagnostics are needed to justify the EVT application. The residuals exhibit the expected heavy tails due to the nature of ARP traffic, but the original submission omitted the requested visualizations and tests. In revision we will include QQ-plots of the residuals, Anderson-Darling and Cramér-von Mises goodness-of-fit results, and the estimated GPD shape and scale parameters to confirm the modeling assumptions. revision: yes

  3. Referee: [Method / hierarchical prediction] Method description: details on how the hierarchical time-series models are fitted (choice of hierarchy levels, forecasting horizon, residual extraction) and how EVT parameters are selected are not provided, making reproducibility and sensitivity analysis impossible.

    Authors: We accept that the method section requires more explicit specification for reproducibility. The hierarchy follows the network topology (node level, subnet aggregation, and global), forecasts are one-step ahead, and residuals are computed as observed minus predicted call counts. EVT parameters are fit by maximum-likelihood on exceedances above a high quantile. We will expand the method section with these choices, pseudocode, and parameter-selection procedure in the revised manuscript. revision: yes
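The hierarchy and residual definition described in response 3 can be made concrete with a short sketch. It assumes a node-to-subnet mapping and uses bottom-up aggregation, which Figure 10 names as one of the hierarchical methods; the node names, subnet labels and forecast values are hypothetical, and no reconciliation step such as MinT is shown.

```python
def bottom_up(node_values, subnet_of):
    """Aggregate node-level values to subnet totals and a global total."""
    subnet_totals = {}
    for node, value in node_values.items():
        subnet = subnet_of[node]
        subnet_totals[subnet] = subnet_totals.get(subnet, 0.0) + value
    return subnet_totals, sum(node_values.values())


def one_step_residuals(observed, predicted):
    """Residual per series: observed minus predicted call count."""
    return {key: observed[key] - predicted[key] for key in observed}


# Hypothetical one-step-ahead example with three nodes in two subnets.
subnet_of = {"A": "s1", "B": "s1", "C": "s2"}
forecast = {"A": 8.0, "B": 4.0, "C": 5.0}
observed = {"A": 10.0, "B": 4.0, "C": 30.0}

print(bottom_up(forecast, subnet_of))          # subnet and global forecasts
print(one_step_residuals(observed, forecast))  # node-level residuals
```

Because the higher levels are sums of the node forecasts, a large residual at the subnet or global level can be traced back to the nodes that contribute to it.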

Circularity Check

0 steps flagged

No circularity: standard two-stage application of forecasting + EVT to observed data

Full rationale

The paper's chain is (1) fit hierarchical time-series models to ARP counts per node, (2) compute residuals, (3) apply EVT (GPD) thresholds to flag extremes. None of these steps is defined in terms of the output it produces, nor does any 'prediction' reduce to a fitted parameter by construction. The central empirical claim rests on performance on an external 10M-call dataset rather than on self-citation or ansatz smuggling. No uniqueness theorems or prior-author results are invoked as load-bearing. This is the normal non-circular case of applying established statistical tools to new data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that prediction errors admit EVT modeling; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Prediction residuals from hierarchical ARP models follow heavy-tailed distributions suitable for EVT
    Abstract states that EVT is used because internet traffic exhibits heavy-tailed distributions.

pith-pipeline@v0.9.0 · 5699 in / 1270 out tokens · 73139 ms · 2026-05-24T09:07:31.154395+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1]

    In Proceedings - International Conference on Advanced Information Networking and Applications, AINA, Vol

    Cyber threat intelligence from honeypot data using elasticsearch. In Proceedings - International Conference on Advanced Information Networking and Applications, AINA, Vol. 2018-May. 900–906. https://doi.org/10.1109/AINA.2018.00132 Zoltán Balogh, Štefan Koprda, and Jan Francisti

  2. [2]

    In 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT)

    LAN security analysis and design. In 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT). IEEE, 1–6. Tao Ban, Ndichu Samuel, Takeshi Takahashi, and Daisuke Inoue

  3. [3]

    In ACM International Conference Proceeding Series

    Combat Security Alert Fatigue with AI-Assisted Techniques. In ACM International Conference Proceeding Series. 9–16. https://doi.org/10.1145/3474718.3474723 Muhammet Baykara and Resul Das

  4. [4]

    Journal of Information Security and Applications 41 (2018), 103–116

    A novel honeypot based security approach for real-time intrusion detection and prevention systems. Journal of Information Security and Applications 41 (2018), 103–116. https://doi.org/10.1016/j.jisa.2018.06.004 Jarosław Bernacki and Grzegorz Kołaczek

  5. [5]

    International Journal of Computer Network and Information Security 7, 9 (2015), 10–18

    Anomaly detection in network traffic using selected methods of time series analysis. International Journal of Computer Network and Information Security 7, 9 (2015), 10–18. Tristan Carrier, Princy Victor, Ali Tekeoglu, and Arash Lashkari

  6. [6]

    In Proceedings of the 4th International Conference on Big Data Engineering

    Feature Extraction Pipeline and Analysis of Suspicious Events in Large-Scale LANs for Cyberattack Categorization. In Proceedings of the 4th International Conference on Big Data Engineering. 104–112. Stuart Coles, Joanna Bawa, Lesley Trenner, and Pat Dorazio. 2001. An introduction to statistical modeling of extreme values. Vol

  7. [7]

    https://doi.org/10.1109/SECON.2016.7506644 Mingjian Cui, Jianhui Wang, and Meng Yue

    1–8. https://doi.org/10.1109/SECON.2016.7506644 Mingjian Cui, Jianhui Wang, and Meng Yue

  8. [8]

    IEEE Transactions on Smart Grid 10, 5 (2019), 5724–5734

    Machine learning-based anomaly detection for load forecasting under cyberattacks. IEEE Transactions on Smart Grid 10, 5 (2019), 5724–5734. Marina Evangelou and Niall M Adams

  9. [9]

    Computer Fraud and Security 2009, 11 (2009), 7–11

    Recognising and addressing ‘security fatigue’. Computer Fraud and Security 2009, 11 (2009), 7–11. https://doi.org/10.1016/S1361-3723(09)70139-3 Akash Garg and Prachi Maheshwari

  10. [10]

    In 2016 3rd international conference on advanced computing and communication systems (icaccs), Vol

    Performance analysis of snort-based intrusion detection system. In 2016 3rd international conference on advanced computing and communication systems (icaccs), Vol

  11. [11]

    Performance Evaluation 58, 2-3 (2004), 261–284

    Variable heavy tails in Internet traffic. Performance Evaluation 58, 2-3 (2004), 261–284. https://doi.org/10.1016/j.peva.2004.07.008 Kate Hignam, Kai Arulkumaran, Zachary Hanif, and Jennings Nicholas R

  12. [12]

    In ICML Workshop on Uncertainty and Robustness in Deep Learning 2021 and Conference on Applied Machine Learning for Information Security

    BETH Dataset: Real Cybersecurity Data for Anomaly Detection Research. In ICML Workshop on Uncertainty and Robustness in Deep Learning 2021 and Conference on Applied Machine Learning for Information Security. ICML2021. Rob J Hyndman and George Athanasopoulos

  13. [13]

    Procedia computer science 22 (2013), 810–819

    Dynamic isolation of network devices using OpenFlow for keeping LAN secure from intra-LAN attack. Procedia computer science 22 (2013), 810–819. Sevvandi Kandanaarachchi and Rob J Hyndman

  14. [14]

    Journal of Computational and Graphical Statistics 31, 2 (2022), 586–599

    Leave-One-Out Kernel Density Estimates for Outlier Detection. Journal of Computational and Graphical Statistics 31, 2 (2022), 586–599. https://doi.org/10.1080/10618600.2021.2000425 Sevvandi Kandanaarachchi, Hideya Ochiai, and Asha Rao

  15. [15]

    Expert Systems with Applications 201 (2022)

    Honeyboost: Boosting honeypot performance with data fusion and anomaly detection. Expert Systems with Applications 201 (2022). Meatasit Karakate, Hiroshi Esaki, and Hideya Ochiai

  16. [16]

    In 2021 the 11th International Conference on Communication and Network Security

    SDNHive: A Proof-of-Concept SDN and Honeypot System for Defending Against Internal Threats. In 2021 the 11th International Conference on Communication and Network Security. 9–20. Nattawat Khamphakdee, Nunnapus Benjamas, and Saiyan Saiyod

  17. [17]

    In 2014 2nd International Conference on Information and Communication Technology (ICoICT)

    Improving intrusion detection system based on snort rules for network probe attack detection. In 2014 2nd International Conference on Information and Communication Technology (ICoICT). IEEE, 69–74. Ansam Khraisat, Iqbal Gondal, Peter Vamplew, and Joarder Kamruzzaman

  18. [18]

    Cybersecurity 2, 1 (2019), 1–22

    Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2, 1 (2019), 1–22. Timo Kiravuo, Mikko Sarela, and Jukka Manner

  19. [19]

    IEEE Communications Surveys & Tutorials 15, 3 (2013), 1477–1491

    A survey of Ethernet LAN security. IEEE Communications Surveys & Tutorials 15, 3 (2013), 1477–1491. Wilkinson L

  20. [20]

    Visualizing Big Data Outliers Through Distributed Aggregation,

    “Visualizing Big Data Outliers Through Distributed Aggregation,” IEEE Transactions on Visualization and Computer Graphics 24 (2018),

  21. [21]

    In 2010 International conference on computer, mechatronics, control and electronic engineering, Vol

    Research on intelligent intrusion prevention system based on snort. In 2010 International conference on computer, mechatronics, control and electronic engineering, Vol

  22. [22]

    In 2011 IEEE 2nd International Conference on Software Engineering and Service Science

    The research and design of honeypot system applied in the LAN security. In 2011 IEEE 2nd International Conference on Software Engineering and Service Science. IEEE, 360–363. Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner

  23. [23]

    ACM SIGCOMM computer communication review 38, 2 (2008), 69–

    OpenFlow: enabling innovation in campus networks. ACM SIGCOMM computer communication review 38, 2 (2008), 69–

  24. [24]

    In 2015 Military Communications and Information Systems Conference (MilCIS)

    UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS). 1–6. https://doi.org/10.1109/MilCIS.2015.7348942 Seung Yeob Nam, Dongwon Kim, and Jeongeun Kim

  25. [25]

    IEEE communications letters 14, 2 (2010), 187–189

    Enhanced ARP: preventing ARP poisoning-based man-in-the-middle attacks. IEEE communications letters 14, 2 (2010), 187–189. Hideya Ochiai

  26. [26]

    IEICE Technical Report; IEICE Tech

    LAN-security monitoring project. IEICE Technical Report; IEICE Tech. Rep. 120, 19 (2018), 27–38. Vaidyanathan Ramaswami, Kaustubh Jain, Rittwik Jana, and Vaneet Aggarwal

  27. [27]

    Advances in Intelligent Systems and Computing 246 (2014), 23–44

    Modeling heavy tails in traffic sources for network performance evaluation. Advances in Intelligent Systems and Computing 246 (2014), 23–44. https://doi.org/10.1007/978-81-322-1680-3_4 Martin Roesch et al

  28. [28]

    IEEE Open Journal of the Communications Society 2 (2020), 102–112

    Adaptive intrusion detection in the networking of large-scale LANs with segmented federated learning. IEEE Open Journal of the Communications Society 2 (2020), 102–112. Priyanga Dilini Talagala, Rob J. Hyndman, and Kate Smith-Miles

  29. [29]

    (1994): Diagnostic plots for one-dimensional data

    Anomaly Detection in High-Dimensional Data. Journal of Computational and Graphical Statistics 30, 2 (2021), 360–374. https://doi.org/10.1080/10618600.2020.1807997 arXiv:1908.04000 Sean Whalen

  30. [30]

    Node99 [Online Document] (2001)

    An introduction to ARP spoofing. Node99 [Online Document] (2001). Shanika L Wickramasuriya, George Athanasopoulos, and Rob J Hyndman

  31. [31]

    Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. J. Amer. Statist. Assoc. 114, 526 (2019), 804–819. S. Withers

  32. [32]

    Emrah Yasasin, Julian Prester, Gerit Wagner, and Guido Schryen

    Optus breach casts spotlight on cyber resilience. Computer Weekly (2022). Emrah Yasasin, Julian Prester, Gerit Wagner, and Guido Schryen