Latency-Aware Deep Learning Benchmark for Real-Time Cyber-Physical Attack and Fault Classification in Inverter-Dominated Power Grids

A.P. Sakis Meliopoulos; Emad Abukhousa; Saman Zonouz

arXiv: 2605.17256 · v1 · pith:XSGITVL3new · submitted 2026-05-17 · 📡 eess.SY · cs.AI · cs.LG · cs.SY

Pith reviewed 2026-05-19 23:24 UTC · model grok-4.3

classification 📡 eess.SY · cs.AI · cs.LG · cs.SY

keywords deep learning · power system anomaly detection · cyber-physical attacks · fault classification · latency benchmarking · inverter-dominated grids · real-time classification · electromagnetic transient simulation

The pith

Deep learning models reach classification decisions on power grid anomalies in under 15 ms, but end-to-end inference takes 50 to 90 ms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a latency-aware benchmarking framework to test deep learning models for detecting faults and cyber-attacks in power grids using simulated time-domain signals. Eight different neural network types were evaluated on streaming data from inverter-dominated networks, all achieving sub-cycle classification decisions. Yet the full end-to-end latency remained over three cycles, exposing a mismatch with the speed required for actual protection systems. This matters for moving AI methods from labs to real grid operations where delays can cause failures.

Core claim

The work establishes that while eight neural network architectures successfully classified multi-event sequences of physical faults and cyber-attacks in real time with response times below 15 ms, the end-to-end inference latency consistently ranged from 50 to 90 ms, exceeding three cycles and highlighting the gap to protection-grade deployment.
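As a quick check on the cycle arithmetic behind this claim, the sketch below converts the reported figures into power-frequency cycles. It assumes a 60 Hz nominal frequency, which this page does not state; that value is consistent with 50 ms being described as three cycles. The names `NOMINAL_HZ` and `ms_to_cycles` are illustrative and do not come from the paper.

```python
# Assumption: 60 Hz nominal frequency (not stated on this page).
NOMINAL_HZ = 60.0


def ms_to_cycles(latency_ms: float, nominal_hz: float = NOMINAL_HZ) -> float:
    """Express a latency in milliseconds as a number of power-frequency cycles."""
    cycle_ms = 1000.0 / nominal_hz  # duration of one cycle in ms
    return latency_ms / cycle_ms


# Figures quoted in the core claim: 15 ms decision bound, 50 to 90 ms latency.
for label, ms in [("decision bound", 15.0), ("latency, low end", 50.0), ("latency, high end", 90.0)]:
    print(f"{label}: {ms:.0f} ms is about {ms_to_cycles(ms):.2f} cycles")
```

Under that assumption the 15 ms bound sits just below one cycle, while the 50 to 90 ms range spans roughly three to five and a half cycles.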

What carries the argument

The latency-aware benchmarking framework that systematically evaluates models on high-fidelity streaming datasets from an electromagnetic transient simulator.

If this is right

Optimization and hardware acceleration are needed to reduce inference latency.
A reproducible benchmark is established for sub-cycle anomaly detection.
Guidance is provided for transitioning machine learning to real-world protection applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar latency issues may affect other real-time control systems beyond power grids.
Future work could test these models on actual hardware to verify simulation results.

Load-bearing premise

The high-fidelity simulated signals from the electromagnetic transient simulator accurately represent real inverter-dominated power grid behavior under faults and attacks.

What would settle it

Running the same models on data collected from a physical power grid testbed and checking if classification accuracy and latency match the simulated results.

Figures

Figures reproduced from arXiv: 2605.17256 by A.P. Sakis Meliopoulos, Emad Abukhousa, Saman Zonouz.

**Figure 1.** Testbed configuration highlighting Protection Zone 2 (PZ-2) with MUs MU23 and MU32 and representative fault location. Excerpt, II. Methodology, A. High-Fidelity Dataset Generation: The dataset was generated using the industry-grade simulator WinIGS [11], which models grid dynamics, renewable integration, and anomaly injections with microsecond precision. Data acquisition emulated IEC 61850 Sampled Value (SV) strea…

**Figure 2.** Streaming model outputs with and without cyclic confidence filtering. Raw predictions (a) show high jitter and unstable class switching, whereas one-cycle moving average with confidence gating (b) yields smooth and decisive classifications with an explicit model confidence score.

**Figure 3.** Zoomed-in predictions during the CT Ratio Attack. Excerpt, Experiment 3 (Multi-Event Classification): The DL classification system was tested on the Event 1 dataset containing five sequential anomalies. Fig. 4a illustrates the full sequence, where the model successfully tracks transitions between normal states, line faults, and cyberattacks. The model confidence dips align with the onset of each disturbance. Fig. 4…

**Figure 4.** Model prediction analysis across two datasets of streaming events. (a) Event 1 shows all five anomalies in sequence, (b) zoom on a line-to-line fault between phases B and C in Zone 2, (c) Event 2 highlighting GPS spoofing at MU32 between 4.0–4.2 s, and (d) Event 2 showing an out-of-zone SLG fault (5.0–5.2 s) that is detected by confidence drop and labeled as unseen class −1 since it was not part of trainin…
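The one-cycle moving average with confidence gating described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the model emits a per-class probability vector for every sample, and it uses a trailing (causal) window for simplicity where the paper describes a centered one. The gate value `CONF_THRESHOLD` is hypothetical. Only the window length of 80 samples per cycle and the unseen-class label −1 come from the text reproduced on this page.

```python
from collections import deque

N_CYC = 80            # samples per cycle, as quoted from the paper
CONF_THRESHOLD = 0.6  # hypothetical gate value, not taken from the paper
UNSEEN = -1           # low-confidence label, mirroring the Figure 4 caption


def smooth_and_gate(prob_stream, window=N_CYC, threshold=CONF_THRESHOLD):
    """Yield (label, confidence) per sample from a stream of class-probability lists.

    Probabilities are averaged over the most recent `window` samples (trailing
    window). The top averaged probability is the confidence; below `threshold`
    the output label is UNSEEN instead of the arg-max class.
    """
    buf = deque(maxlen=window)  # drops the oldest sample automatically
    for probs in prob_stream:
        buf.append(probs)
        n = len(buf)
        means = [sum(col) / n for col in zip(*buf)]  # per-class moving average
        conf = max(means)
        label = means.index(conf) if conf >= threshold else UNSEEN
        yield label, conf


# Jittery raw predictions that flip class on every sample.
raw = [[0.9, 0.1], [0.4, 0.6]] * 4
print("raw arg-max:", [p.index(max(p)) for p in raw])
print("smoothed   :", [label for label, _ in smooth_and_gate(raw, window=4)])
```

A trailing window adds no look-ahead delay but lags transitions. A centered window, as in the paper, needs half a window of future samples before it can emit an output.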

Original abstract

This work introduces a latency-aware benchmarking framework for evaluating deep learning models in power system anomaly detection using high-fidelity, time-domain signals generated from an industry-grade electromagnetic transient simulator. Eight neural network architectures, ranging from MLPs to Transformers, were systematically evaluated on streaming datasets representing both physical faults and cyber-attacks in inverter-dominated networks. All models successfully classified two representative multi-event sequences in real time with sub-cycle response times below 15 ms. However, although classification decisions occurred within one cycle, the end-to-end inference latency consistently exceeded three cycles, ranging from 50 to 90 ms. These results highlight a critical gap between algorithmic capability and protection-grade deployment, pointing to the need for further optimization and hardware acceleration. The findings establish a reproducible benchmark for sub-cycle anomaly detection and provide guidance for transitioning machine learning methods from research prototypes to real-world protection applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a latency-aware benchmarking framework for deep learning models in real-time cyber-physical attack and fault classification for inverter-dominated power grids. It generates streaming datasets from an industry-grade electromagnetic transient (EMT) simulator and evaluates eight neural network architectures (MLPs to Transformers). All models classify two representative multi-event sequences with sub-cycle response times below 15 ms, but end-to-end inference latencies range from 50 to 90 ms (exceeding three cycles). The work claims this reveals a critical gap for protection-grade deployment and establishes a reproducible benchmark to guide optimization and hardware acceleration.

Significance. If the EMT simulator signals accurately capture real inverter-grid dynamics, the distinction between algorithmic decision time and full end-to-end latency would provide useful practical guidance for transitioning deep learning methods to power system protection applications. The reproducible benchmark aspect could help standardize evaluations in this emerging area.
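One way to make the "algorithmic decision time" side of this distinction concrete is to measure it in stream time: the delay between the known onset of an event and the first sample at which the output label is correct. The sketch below is illustrative only and is not the paper's definition. It assumes 80 samples per cycle (quoted from the paper) and a 60 Hz nominal frequency (an assumption), which gives a 4800 Hz sample rate. The function name and arguments are hypothetical.

```python
SAMPLES_PER_CYCLE = 80  # quoted from the paper
NOMINAL_HZ = 60.0       # assumption: not stated on this page
SAMPLE_RATE_HZ = SAMPLES_PER_CYCLE * NOMINAL_HZ  # 4800 samples per second


def decision_delay_ms(labels, onset_index, target_label, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the delay in ms from event onset to the first correct label.

    `labels` is the per-sample output of the classifier, `onset_index` is the
    sample at which the event starts, and `target_label` is the true class.
    Returns None if the target label never appears after the onset.
    """
    for i in range(onset_index, len(labels)):
        if labels[i] == target_label:
            return (i - onset_index) * 1000.0 / sample_rate_hz
    return None


# Synthetic stream: event of class 3 starts at sample 100, first flagged at sample 148.
stream = [0] * 148 + [3] * 52
print(decision_delay_ms(stream, onset_index=100, target_label=3))
```

This quantity depends only on sample indices, so it excludes the compute and data-handling time that a wall-clock, end-to-end measurement would include.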

major comments (2)

[§3 (Simulation Setup) and abstract] The central claim that results inform protection-grade deployment rests on the unvalidated premise that high-fidelity EMT simulator outputs accurately reproduce real-world inverter control responses, communication delays, and multi-event signatures under faults and cyber-attacks. No hardware-in-the-loop validation, field data comparison, or sensitivity analysis to omitted effects (e.g., PLL dynamics or sensor quantization) is provided, which directly affects whether the reported 50–90 ms latencies are relevant outside simulation.
[§5 (Results)] The classification success is reported only for two representative multi-event sequences with no aggregate metrics (e.g., accuracy, F1-score, or false positive rates across a larger test set), no statistical significance testing, and no ablation on sequence selection. This weakens support for the general claim of 'all models successfully classified' in real time.
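The aggregate metrics named in the second comment could be computed along the lines of the sketch below. It is a generic illustration, not taken from the paper: the labels are made up, class 0 is assumed to mean "normal", and the false positive rate is defined here as the fraction of normal samples flagged as any anomaly class.

```python
def aggregate_metrics(y_true, y_pred, normal_label=0):
    """Compute accuracy, macro-averaged F1, and false positive rate.

    False positive rate is the share of truly normal samples that were
    assigned any label other than `normal_label` (an assumed definition).
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)

    normal_total = sum(t == normal_label for t, _ in pairs)
    false_alarms = sum(t == normal_label and p != normal_label for t, p in pairs)
    fpr = false_alarms / normal_total if normal_total else 0.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "false_positive_rate": fpr}


# Made-up labels: 0 = normal, 1 and 2 = two anomaly classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
print(aggregate_metrics(y_true, y_pred))
```

Reporting these over a held-out set of many sequences, rather than two hand-picked ones, is what the comment asks for.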

minor comments (2)

[Abstract and §1] The abstract and §1 would benefit from explicitly listing the eight architectures evaluated and the precise definition of 'end-to-end inference latency' versus 'classification decision time' to improve clarity for readers.
[Figures] Figure captions and axis labels in the latency plots should include units and confidence intervals for the 50–90 ms range to aid interpretation.
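The request for units and an interval on the latency figures could be met with a simple wall-clock timing harness such as the one sketched below. It is a hedged illustration: `model` stands for any callable that classifies one input window, the warm-up count is arbitrary, and the reported spread is a 5th to 95th percentile range rather than a formal confidence interval. Nothing here reflects how the paper actually measured its 50–90 ms range.

```python
import statistics
import time


def time_inference(model, windows, warmup=5):
    """Return per-window wall-clock latencies in milliseconds.

    `model` is any callable taking one window (hypothetical stand-in for a
    trained network). The first `warmup` windows are run untimed first.
    """
    for w in windows[:warmup]:
        model(w)
    latencies_ms = []
    for w in windows:
        t0 = time.perf_counter()
        model(w)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Summarize latencies as median plus 5th and 95th percentiles, all in ms."""
    cuts = statistics.quantiles(latencies_ms, n=20, method="inclusive")
    return {
        "median_ms": statistics.median(latencies_ms),
        "p05_ms": cuts[0],    # first of 19 cut points: 5th percentile
        "p95_ms": cuts[-1],   # last of 19 cut points: 95th percentile
    }


# Dummy model: sums the window. Real use would call a trained network here.
windows = [[float(i)] * 80 for i in range(50)]
print(summarize(time_inference(sum, windows)))
```

For an end-to-end figure, the timed region would also need to cover data acquisition, preprocessing, and any post-filtering, not only the forward pass.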

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thorough and constructive review of our manuscript. We address each major comment point by point below, with clear indications of planned revisions.

Referee: [§3 (Simulation Setup) and abstract] The central claim that results inform protection-grade deployment rests on the unvalidated premise that high-fidelity EMT simulator outputs accurately reproduce real-world inverter control responses, communication delays, and multi-event signatures under faults and cyber-attacks. No hardware-in-the-loop validation, field data comparison, or sensitivity analysis to omitted effects (e.g., PLL dynamics or sensor quantization) is provided, which directly affects whether the reported 50–90 ms latencies are relevant outside simulation.

Authors: We agree that the study relies on EMT simulation without hardware-in-the-loop validation or field data comparison, which limits direct claims about real-world protection deployment. While the chosen EMT simulator is industry-standard for capturing detailed inverter-grid dynamics, we acknowledge that certain effects such as sensor quantization and specific communication delays are not explicitly modeled. In the revised manuscript we will add a dedicated limitations subsection in the discussion that explicitly states these assumptions and includes a sensitivity analysis for PLL dynamics and related omitted effects within the existing simulation framework. This will better qualify the applicability of the reported latencies.

Revision: partial

Referee: [§5 (Results)] The classification success is reported only for two representative multi-event sequences with no aggregate metrics (e.g., accuracy, F1-score, or false positive rates across a larger test set), no statistical significance testing, and no ablation on sequence selection. This weakens support for the general claim of 'all models successfully classified' in real time.

Authors: The manuscript deliberately focuses on two representative multi-event sequences to illustrate latency behavior under complex, realistic conditions. We recognize, however, that aggregate metrics would strengthen the presentation. We will revise §5 to report accuracy, F1-score, and false-positive rates over a larger test set, include statistical significance testing, and add justification (with supporting ablation where feasible) for the choice of sequences. These additions will support the broader claim of real-time classification capability.

Revision: yes

standing simulated objections not resolved

Hardware-in-the-loop validation or direct comparison against field data remains unaddressed, as these require physical experimental infrastructure and real-grid measurements that are outside the scope and resources of the present simulation-based benchmark study.

Circularity Check

0 steps flagged

No circularity: empirical benchmarking study with direct measurements

This is an empirical benchmarking paper that evaluates neural network architectures on streaming datasets generated from an industry-grade EMT simulator. It reports measured classification success rates and inference latencies without any mathematical derivations, parameter fitting steps, or load-bearing self-citations that reduce claims to inputs by construction. The central results (sub-cycle classification decisions versus 50-90 ms end-to-end latency) follow directly from running the models on the simulated multi-event sequences, with no self-definitional loops or renamed known results. The study is self-contained against its own simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The benchmark depends on the fidelity of the simulation tool and the choice of the two representative sequences as proxies for broader scenarios.

axioms (1)

domain assumption: The electromagnetic transient simulator generates data that is representative of real-world conditions for the purpose of evaluating model performance. This underpins the validity of the classification and latency results for drawing deployment conclusions.

pith-pipeline@v0.9.0 · 5704 in / 1348 out tokens · 49523 ms · 2026-05-19T23:24:24.271740+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: All models successfully classified two representative multi-event sequences in real time with sub-cycle response times below 15 ms. However, although classification decisions occurred within one cycle, the end-to-end inference latency consistently exceeded three cycles, ranging from 50 to 90 ms.

IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction · unclear
Paper passage: a one-cycle centered moving average filter... N_cyc = 80 samples/cycle

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[1] M. J. Reno, S. Brahma, A. Bidram, and M. E. Ropp, “Influence of inverter-based resources on microgrid protection: Part 1: Microgrids in radial distribution systems,” IEEE Power & Energy Magazine, vol. 19, no. 3, pp. 36–46, 2021.

[2] S. Zonouz, K. M. Rogers, R. Berthier, R. B. Bobba, W. H. Sanders, and T. J. Overbye, “SCPSE: Security-oriented cyber-physical state estimation for power grid critical infrastructures,” IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1790–1799, Dec. 2012.

[3] A. S. Meliopoulos, G. J. Cokkinides, P. Myrda, E. Farantatos, R. Elmoudi, B. Fardanesh, G. Stefopoulos, C. Black, and P. Panciatici, “Dynamic estimation-based protection and hidden failure detection and identification: Inverter-dominated power systems,” IEEE Power & Energy Magazine, Jan. 2023.

[4] J. B. Thomas, S. G. Chaudhari, K. V. Shihabudheen, and N. K. Verma, “CNN-based transformer model for fault detection in power system networks,” IEEE Transactions on Instrumentation and Measurement, vol. 72, p. 2504210, 2023.

[5] A. Almalaq, S. Albadran, and M. A. Mohamed, “Deep machine learning model-based cyber-attacks detection in smart power systems,” Mathematics, vol. 10, no. 15, p. 2574, 2022.

[6] Y. M. Khaw, A. A. Jahromi, M. F. M. Arani, S. Sanner, D. Kundur, and M. Kassouf, “A deep learning-based cyberattack detection system for transmission protective relays,” IEEE Transactions on Smart Grid, vol. 12, no. 3, pp. 2554–2565, May 2021.

[7] B. Roy, S. Adhikari, S. Datta, K. J. Devi, A. D. Devi, F. Alsaif, S. Alsulamy, and T. S. Ustun, “Deep learning based relay for online fault detection, classification, and fault location in a grid-connected microgrid,” IEEE Access, vol. 11, pp. 62 677–62 693, 2023. [Online]. Available: https://doi.org/10.1109/ACCESS.2023.3285768

[8] … [Online]. Available: https://arxiv.org/abs/2411.14278v2

[9] M. Mishra and J. G. Singh, “A comprehensive review on deep learning techniques in power system protection: Trends, challenges, applications and future directions,” Results in Engineering, vol. 25, p. 103884, … [Online]. Available: https://doi.org/10.1016/j.rineng.2024.103884

[10] M. K. Hasan, R. A. Abdulkadir, S. Islam, T. R. Gadekallu, and N. Safie, “A review on machine learning techniques for secured cyber-physical systems in smart grid networks,” Energy Reports, vol. 11, pp. 1268–1290, 2024.

[11] WinIGS Integrated Grounding System Analysis for Windows – Version 8.1.5, Advanced Grounding Concepts (AGC), Alpharetta, GA, USA, May 2...
