Large Language Models as Explainable Cyberattack Detectors for Energy Industrial Control Systems

arxiv: 2604.26079 · v1 · submitted 2026-04-28 · 💻 cs.CR


Weiyi Kong, Ahmad Mohammad Saber, Amr Youssef, Deepa Kundur

Pith reviewed 2026-05-07 15:34 UTC · model grok-4.3

classification 💻 cs.CR

keywords large language models · cyberattack detection · industrial control systems · Modbus · intrusion detection · explainable AI · SCADA · energy systems


The pith

An off-the-shelf large language model can triage Modbus traffic in energy industrial control systems into normal or critical events with performance broadly comparable to trained supervised detectors and without any task-specific weight updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether a ready-to-use large language model can act as an auditable, human-in-the-loop detector for critical events in Modbus-based industrial control systems. It converts each Modbus communication into a short string of discretized protocol tokens, then prompts the model to output a binary normal-or-critical label together with a short record that cites the specific tokens it used. On two public ICS Modbus benchmarks, under identical event information and evaluation splits, this pipeline reaches high accuracy that is broadly comparable to strong supervised baselines while needing no weight updates. Intervention diagnostics further indicate that the tokens the model cites are frequently relevant to its own decision rather than incidental.

Core claim

Under matched event information and shared evaluation splits, the LLM-based triage pipeline achieves high predictive performance on both benchmarks and is broadly comparable to strong supervised baselines, while requiring no task-specific weight updates. Intervention-based diagnostics provide evidence that the cited tokens are often decision-relevant to the model's own prediction.

What carries the argument

A prompt-configured large language model that receives a compact token string derived from discretized Modbus protocol fields and returns a normal-or-critical decision plus a token-grounded incident record.

Load-bearing premise

Converting Modbus fields into a compact discretized token string preserves enough information for the LLM to make accurate normal-versus-critical decisions.
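The premise is easier to weigh with a concrete encoding in view. The sketch below is illustrative only and is not the authors' encoder: the `DIR`, `FC`, `EX`, and `B0`/`B1` token names echo the prompt fragments quoted further down this page, while the `ADDR` token, the bin edges in `DT_EDGES`, the `addr_block` width, and the function name `encode_event` are all assumptions made for the example.

```python
# Hypothetical inter-arrival bin edges in seconds; the paper's real edges
# are not stated in this summary.
DT_EDGES = (0.001, 0.01, 0.1, 1.0, 10.0)


def bin_index(value, edges):
    """Index of the first edge that `value` falls strictly below."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def encode_event(direction, function_code, address, dt_seconds,
                 is_exception=False, addr_block=1000):
    """Turn one Modbus event into a compact, space-separated token string."""
    tokens = [
        f"DIR={direction}",                        # e.g. C2S (client to server)
        f"FC={function_code}",                     # Modbus function code
        f"ADDR=A{address // addr_block}",          # coarse address block (assumed token)
        f"DT=B{bin_index(dt_seconds, DT_EDGES)}",  # inter-arrival time bin
    ]
    if is_exception:
        tokens.append("EX=1")                      # exception response flag
    return " ".join(tokens)


if __name__ == "__main__":
    print(encode_event("C2S", 3, 40, 1.0))
    print(encode_event("C2S", 5, 2500, 0.0005, is_exception=True))
```

Any scheme of this kind discards the exact register address and the exact inter-arrival time, which is the information loss the premise has to survive.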

What would settle it

On a held-out Modbus dataset, if the LLM's accuracy falls substantially below matched supervised baselines or if intervention diagnostics show that the cited tokens are not causally tied to the prediction, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2604.26079 by Ahmad Mohammad Saber, Amr Youssef, Deepa Kundur, Weiyi Kong.

**Figure 1.** Overview of the proposed framework. Offline prepa...

**Figure 2.** Example of raw-to-token encoding. A raw Mod...

**Figure 3.** Pipeline and leakage guards. Preprocessing includes...

**Figure 4.** Pass-rate vs 𝜀 for sufficiency and comprehensiveness (95% CIs); dashed line shows the label-flipped baseline.

Original abstract

In modern energy systems, industrial control systems (ICS) and power-system SCADA require intrusion detection that is not only accurate but also auditable by operators. The ICS intrusion-detection landscape is currently dominated by established supervised detectors. In this paper, we study whether an off-the-shelf large language model (LLM) can serve as a complementary, human-in-the-loop layer for Modbus traffic. We cast this as a binary network-side normal/critical decision task on two public ICS Modbus datasets, collapsing attack periods and other safety-critical behaviors into a single critical class. Each Modbus communication instance is converted into a compact token string derived from discretized protocol fields, and a prompt-configured LLM produces a normal/critical alert together with a concise, token-grounded incident record for analyst review. Under matched event information and shared evaluation splits, the resulting LLM-based triage pipeline achieves high predictive performance on both benchmarks and is broadly comparable to strong supervised baselines, while requiring no task-specific weight updates. To assess the audit record, we apply intervention-based diagnostics, including sufficiency- and necessity-style tests, which provide evidence that the cited tokens are often decision-relevant to the model's own prediction. These records are intended as audit signals rather than full human-grounded explanations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows an off-the-shelf LLM can flag critical Modbus events in ICS traffic with token-based explanations and no training, but the abstract reports no performance numbers, so the comparability claim cannot yet be verified.


The main thing here is a training-free LLM pipeline that turns Modbus protocol fields into compact token strings, prompts an off-the-shelf model for normal/critical decisions with short audit records, and then uses intervention diagnostics to check those records. They run this on two public datasets and say it is broadly comparable to strong supervised baselines on shared splits while giving operators in energy systems an auditable record. That combination is the actual new piece: the tokenization step plus the sufficiency/necessity checks applied specifically to Modbus traffic rather than a generic network dataset.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes using an off-the-shelf LLM as a zero-shot, explainable triage layer for binary normal/critical classification of Modbus traffic in energy ICS/SCADA systems. Modbus instances are converted to compact discretized token strings derived from protocol fields; a prompt-configured LLM outputs the label plus a concise, token-grounded incident record. The approach is evaluated on two public ICS Modbus datasets under matched event information and shared splits, claiming high predictive performance broadly comparable to strong supervised baselines without any task-specific weight updates. Intervention-based diagnostics (sufficiency/necessity tests) are applied to provide evidence that cited tokens are decision-relevant to the model's predictions, positioning the records as audit signals for human analysts.

Significance. If the performance claims hold under the stated matching conditions, the work demonstrates a practical route to training-free, human-auditable detection for critical infrastructure that complements existing supervised detectors. Credit is due for the use of public datasets, the absence of task-specific fitting or invented parameters, and the explicit intervention diagnostics that move beyond post-hoc attribution. The approach directly addresses the auditability requirement highlighted in the ICS security literature.

major comments (3)

[Abstract and §4 (Evaluation)] The central claim that the LLM pipeline 'achieves high predictive performance ... and is broadly comparable to strong supervised baselines' is load-bearing yet unsupported by any numeric metrics, confidence intervals, or per-class results in the provided abstract; the full results section must supply these values together with the exact baseline implementations and feature sets to permit verification.
[§3 (Method, discretization step)] The claim of matched event information rests on converting Modbus fields (register values, function codes, addresses) into a compact discretized token string. This quantization necessarily bins or drops fine-grained numeric thresholds and multi-packet timing that supervised baselines receive directly; without an explicit ablation or side-by-side feature-equivalence test, it is unclear whether the LLM input is informationally complete relative to the baselines used for comparison.
[§5 (Intervention diagnostics)] The sufficiency/necessity tests demonstrate that certain tokens influence the LLM's own prediction, but they operate entirely within the discretized token representation. They therefore cannot test whether that representation itself preserves the discriminative information present in the raw Modbus records employed by the supervised baselines.

minor comments (2)

[§3.2] Clarify in the prompt template (likely §3.2) whether the LLM is instructed to output only the binary label plus cited tokens or whether additional free-form text is permitted; this affects reproducibility of the audit record.
[Results figures] Table captions and axis labels in the results figures should explicitly state the evaluation split and the precise definition of the 'critical' class (collapsed attacks plus safety-critical behaviors).

Simulated Authors' Rebuttal

3 responses · 0 unresolved

Thank you for the constructive and detailed review. We appreciate the recognition of the work's focus on training-free, auditable detection for ICS environments and the value placed on public datasets and intervention diagnostics. We address each major comment below, indicating the revisions we will incorporate to improve clarity, completeness, and verifiability.


Referee: [Abstract and §4 (Evaluation)] The central claim that the LLM pipeline 'achieves high predictive performance ... and is broadly comparable to strong supervised baselines' is load-bearing yet unsupported by any numeric metrics, confidence intervals, or per-class results in the provided abstract; the full results section must supply these values together with the exact baseline implementations and feature sets to permit verification.

Authors: We agree that the abstract would benefit from explicit numeric support for the performance claim. In the revised manuscript we will add key metrics (accuracy, F1, per-class precision/recall) with confidence intervals to the abstract. Section 4 will be expanded to report the precise supervised baseline implementations (including libraries, hyperparameters, and training procedures) and the exact feature sets extracted from the raw Modbus records, enabling direct verification and reproduction.
revision: yes

Referee: [§3 (Method, discretization step)] The claim of matched event information rests on converting Modbus fields (register values, function codes, addresses) into a compact discretized token string. This quantization necessarily bins or drops fine-grained numeric thresholds and multi-packet timing that supervised baselines receive directly; without an explicit ablation or side-by-side feature-equivalence test, it is unclear whether the LLM input is informationally complete relative to the baselines used for comparison.

Authors: The discretization was constructed to retain the protocol fields most relevant to the binary normal/critical task while producing compact token strings suitable for prompting. We acknowledge that an explicit ablation would strengthen the matched-information claim. In the revision we will add an ablation study varying discretization granularity (bin widths for register values and address ranges) and report its effect on LLM performance. We will also include a side-by-side mapping table showing how each token corresponds to the raw fields and any timing aggregates supplied to the baselines, together with a discussion of any information loss for multi-packet sequences.
revision: yes

Referee: [§5 (Intervention diagnostics)] The sufficiency/necessity tests demonstrate that certain tokens influence the LLM's own prediction, but they operate entirely within the discretized token representation. They therefore cannot test whether that representation itself preserves the discriminative information present in the raw Modbus records employed by the supervised baselines.

Authors: The intervention tests are scoped to the LLM's internal decision process on the token representation; they are not intended to validate cross-representation equivalence. We will revise §5 to state this limitation explicitly and to clarify that the primary empirical comparison to baselines occurs at the level of task performance under identical event splits. The diagnostics serve to substantiate the audit-record utility rather than to prove representational completeness. We will also note this as a boundary condition for future work that might explore raw-data prompting or hybrid representations.
revision: yes
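The first response promises accuracy and F1 with confidence intervals. One common way to obtain such intervals is a percentile bootstrap over the evaluation set; the sketch below assumes that approach, which the paper may or may not use. The function names, the resample count, and the toy labels are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of events whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_critical(y_true, y_pred, positive="critical"):
    """F1 score treating `positive` as the class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(metric, y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` (resampling events with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = ["normal", "normal", "critical", "critical"]
    y_pred = ["normal", "critical", "critical", "critical"]
    print(accuracy(y_true, y_pred), bootstrap_ci(accuracy, y_true, y_pred))
    print(f1_critical(y_true, y_pred), bootstrap_ci(f1_critical, y_true, y_pred))
```

Resampling individual events treats them as independent; for Modbus traces with strong temporal correlation, resampling whole sessions or time blocks would likely be the more defensible choice.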

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation is self-contained


The paper describes an empirical pipeline that converts public Modbus datasets into discretized token strings, feeds them to an off-the-shelf LLM via prompting, and reports predictive performance against externally trained supervised baselines on matched splits. No equations, fitted parameters, or derivations are presented that reduce to their own inputs by construction. The intervention diagnostics are applied after the fact to inspect token relevance within the LLM's own representation and do not define or force the reported accuracy numbers. The central claim of comparability without weight updates therefore rests on independent data and external benchmarks rather than self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that discretized token strings retain decision-relevant protocol information and that intervention tests can validate token relevance; no explicit free parameters or invented entities are stated in the abstract.

axioms (2)

domain assumption: Discretized protocol fields converted to token strings preserve sufficient information for accurate normal/critical classification by an LLM. Invoked in the data-preparation step described in the abstract.
domain assumption: Intervention-based diagnostics can establish that cited tokens are causally relevant to the model's prediction. Used to support the audit-record claim.
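The second axiom can be made concrete with an ERASER-style check in the spirit of the sufficiency and comprehensiveness curves of Figure 4. The sketch below is an assumption-laden illustration: `toy_critical_score` is a hand-written stand-in for querying the LLM, and the pass criteria against the threshold `eps` are one plausible reading, not the paper's stated definitions.

```python
def toy_critical_score(tokens):
    """Stand-in for the LLM: a pseudo-probability that the event is critical.
    Purely illustrative; the real pipeline would query the model instead."""
    score = 0.1
    if "FC=5" in tokens:    # FC 5 is Write Single Coil in the Modbus standard
        score += 0.6
    if "DT=B0" in tokens:   # shortest inter-arrival bin
        score += 0.25
    return min(score, 1.0)


def sufficiency_gap(score_fn, tokens, cited):
    """Score drop when ONLY the cited tokens are kept (small gap = sufficient)."""
    kept = [t for t in tokens if t in cited]
    return score_fn(tokens) - score_fn(kept)


def comprehensiveness_gap(score_fn, tokens, cited):
    """Score drop when the cited tokens are removed (large gap = necessary)."""
    remaining = [t for t in tokens if t not in cited]
    return score_fn(tokens) - score_fn(remaining)


def passes(tokens, cited, eps, score_fn=toy_critical_score):
    """Assumed pass rules: sufficiency gap at most eps, comprehensiveness gap at least eps."""
    return {
        "sufficiency": sufficiency_gap(score_fn, tokens, cited) <= eps,
        "comprehensiveness": comprehensiveness_gap(score_fn, tokens, cited) >= eps,
    }


if __name__ == "__main__":
    tokens = ["DIR=C2S", "FC=5", "ADDR=A2", "DT=B0"]
    print(passes(tokens, cited={"FC=5"}, eps=0.3))
```

Both gaps are computed on token strings alone, which is why the referee's third major comment stands: a check of this form says nothing about information lost before tokenization.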

pith-pipeline@v0.9.0 · 5526 in / 1384 out tokens · 48890 ms · 2026-05-07T15:34:56.270114+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 11 canonical work pages

[1] Tom B. Brown, Benjamin Mann, Nick Ryder, et al. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., Red Hook, NY, USA, 1877–1901. doi:10.5555/3495724.3495881

[2] Canadian Institute for Cybersecurity (UNB). 2023. CIC Modbus dataset 2023. https://www.unb.ca/cic/datasets/modbus-2023.html Accessed: 2025-11-06

[3] Cybersecurity and Infrastructure Security Agency. 2021. Cyber-Attack Against Ukrainian Critical Infrastructure. https://www.cisa.gov/news-events/ics-alerts/ir-alert-h-16-056-01. ICS Alert IR-ALERT-H-16-056-01, Accessed: 2026-03-27

[4] Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C. Wallace. 2020. ERASER: A Benchmark to Evaluate Rationalized NLP Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 4443–4458. doi:10.18653/v1/2020.acl-main.408

[5] Yousif Hosain and Muhammet Çakmak. 2025. XAI-XGBoost: An Innovative Explainable Intrusion Detection Approach for Securing Internet of Medical Things Systems. Scientific Reports 15, 1 (2025), 22278. doi:10.1038/s41598-025-07790-0

[6] Yan Hu, An Yang, Hong Li, Yuyan Sun, and Limin Sun. 2018. A Survey of Intrusion Detection on Industrial Control Systems. International Journal of Distributed Sensor Networks 14, 8 (2018), 1–13. doi:10.1177/1550147718794615

[7] Naseem Khan, Kashif Ahmad, Aref Al Tamimi, Mohammed M. Alani, Amine Bermak, and Issa Khalil. 2024. Explainable AI-Based Intrusion Detection System for Industry 5.0: An Overview of the Literature, Associated Challenges, the Existing Solutions, and Potential Research Directions. arXiv:2408.03335 https://arxiv.org/abs/2408.03335

[8] Antoine LeMay and Jose M. Fernandez. 2016. Providing SCADA Network Data Sets for Intrusion Detection Research. In 9th Workshop on Cyber Security Experimentation and Test (CSET 16). USENIX Association, Austin, TX, USA. https://www.usenix.org/conference/cset16/workshop-program/presentation/lemay

[9] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Comput. Surveys 55, 9 (2023), 1–35. doi:10.1145/3560815

[10] Yao Liu, Peng Ning, and Michael K. Reiter. 2009. False Data Injection Attacks against State Estimation in Electric Power Grids. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS ’09). Association for Computing Machinery, Chicago, IL, USA, 21–32. doi:10.1145/1653662.1653666

[11] OpenAI. 2024. GPT-4o System Card. https://openai.com/index/gpt-4o-system-card/. Accessed: 2026-03-27

[12] Ahmad Mohammad Saber, Max Mauro Dias Santos, Mohammad Al Janaideh, Amr Youssef, and Deepa Kundur. 2025. A Kolmogorov-Arnold Network for Explainable Detection of Cyberattacks on EV Chargers. arXiv:2503.02281 https://arxiv.org/abs/2503.02281 Accessed: 2025-11-06

[13] Muhammad Azmi Umer, Khurum Nazir Junejo, Muhammad Taha Jilani, and Aditya P. Mathur. 2022. Machine Learning for Intrusion Detection in Industrial Control Systems: Applications, Challenges, and Recommendations. International Journal of Critical Infrastructure Protection 38 (2022), 100516. doi:10.1016/j.ijcip.2022.100516

[14] Sahil Verma, Varich Boonsanong, Minh Hoang, Keegan Hines, John P. Dickerson, and Chirag Shah. 2024. Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review. Comput. Surveys 56, 12 (2024), 312:1–312:42. doi:10.1145/3677119

[15] Hanxiang Xu, Shenao Wang, Ningke Li, Kailong Wang, Yanjie Zhao, Kai Chen, Ting Yu, Yang Liu, and Haoyu Wang. 2024. Large Language Models for Cyber Security: A Systematic Literature Review. arXiv:2405.04760 https://arxiv.org/abs/2405.04760

Appendix: Example Prompts. This appendix lists one representative configuration of the prompts used in our experiments...
[16] **The Principle of Impact:**
- Ask yourself: "Does this command merely *observe* the system (Read), or does it attempt to *change* the system (Write/Control)?"
- **READ operations** (Polling) are the baseline of industrial automation and are generally **Normal**, unless they occur at an impossible speed (Flooding).
- **WRITE/CONTROL operations** actively ...

[17] **The Principle of Rhythm (Time Analysis):**
- Machines are rhythmic; Hackers are bursty.
- **Normal:** Periodic, steady inter-arrival times (e.g., regular polling intervals).
- **Critical:** Sudden deviations from the rhythm. Extremely short intervals (B0/B1) suggest automated flooding or fuzzing. Extremely long gaps followed by activity may suggest a "l...

[18] **The Principle of Protocol Compliance:**
- Any proprietary, undefined, or malformed function codes are immediately **Critical**.
- Exception codes (EX) usually indicate a device failure or a scanner probing invalid addresses.

### INPUT FORMAT LEGEND
You will receive a single log line with discretized tokens:
- **DIR:** Direction (C2S = Client to Server /...

[19] **Semantic Decode:** What is the specific purpose of this FC according to the Modbus protocol standard? (e.g., Is it reading inputs or forcing coils?)

[20] **Intent Check:** Does this combination of Direction + FC + Frequency look like a SCADA master polling a sensor (Benign), or an external actor trying to manipulate the grid (Malicious)?

e-Energy ’26, June 2026, Banff, AB, Canada. Weiyi Kong, Ahmad Mohammad Saber, Amr Youssef, and Deepa Kundur

[21] **Risk Assessment:** If this command succeeds, could it physically trip a breaker or alter a sensor reading?

### OUTPUT FORMAT
Output ONLY a JSON object.
- "label": Must be "normal" or "critical".
- "confidence": Float between 0.0 and 1.0.
- "rationale": A concise, 1-sentence explanation focusing on the *operational impact* (e.g., "Unauthorized attempt to...
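Because the prompt asks for a JSON object only, a downstream consumer would probably want to validate each reply before logging it as an audit record. The sketch below checks just the three fields visible in the truncated output-format fragment above ("label", "confidence", "rationale"); the function name `parse_triage_reply` and the choice to ignore any other fields are assumptions, since the rest of the format is cut off.

```python
import json

ALLOWED_LABELS = {"normal", "critical"}


def parse_triage_reply(raw):
    """Parse and validate one model reply; raise ValueError if it is malformed."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("reply must be a JSON object")

    label = obj.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")

    conf = obj.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must lie in [0.0, 1.0]")

    rationale = obj.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("rationale must be a non-empty string")

    return {"label": label, "confidence": float(conf), "rationale": rationale.strip()}


if __name__ == "__main__":
    # Hypothetical reply text, written for this example.
    reply = '{"label": "critical", "confidence": 0.9, "rationale": "Example rationale."}'
    print(parse_triage_reply(reply))
```

Rejecting malformed replies outright, rather than guessing a label, keeps the audit trail honest about when the model failed to follow the format.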
