From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems
Pith reviewed 2026-05-10 18:56 UTC · model grok-4.3
The pith
Multimodal LLMs can synthesize complete architectural models from fragmented data to support quantitative risk assessment in cyber-physical systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ASTRAL is an architecture-centric security assessment technique, implemented in a prototype tool powered by multimodal LLMs, that extracts and synthesises system representations from disparate data sources. By leveraging prompt chaining, few-shot learning, and architectural reasoning, it supports adaptive threat identification and quantitative risk estimation for cyber-physical systems with incomplete documentation.
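As a structural illustration only (the paper's actual prompts and pipeline are not reproduced here), prompt chaining can be sketched with a stubbed model call; `call_model`, the step texts, and the way each step's output is fed into the next are all hypothetical:

```python
# Hypothetical sketch of a prompt chain for architecture reconstruction.
# `call_model` stands in for a real multimodal LLM API; it is stubbed here
# so the chaining structure itself can be exercised.

def call_model(prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to an LLM endpoint.
    return f"<output of: {prompt.splitlines()[0]}>"

CHAIN = [
    "Step 1: List components visible in the supplied diagrams and logs.",
    "Step 2: Infer data flows and dependencies between those components.",
    "Step 3: Map each flow to candidate threats (e.g., STRIDE categories).",
]

def run_chain(source_material: str) -> list[str]:
    """Feed each step the source material plus the previous step's output."""
    context = source_material
    outputs = []
    for step in CHAIN:
        prompt = f"{step}\n\nContext:\n{context}"
        result = call_model(prompt)
        outputs.append(result)
        context = result  # chain: the next step builds on this output
    return outputs

outputs = run_chain("fragmented PLC config + network capture excerpt")
```

A real implementation would replace the stub with a multimodal LLM API call and carry diagrams or images alongside the text context.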
What carries the argument
ASTRAL, the prototype tool that integrates LLM reasoning with architectural modelling to reconstruct and analyse system structures from incomplete inputs.
If this is right
- Security assessments become feasible for legacy cyber-physical systems whose documentation has become outdated or lost over time.
- Quantitative risk estimates can be generated directly from partial data sources rather than requiring complete diagrams first.
- Threat identification adapts based on the synthesised model instead of relying on static, incomplete records.
- Practitioner evaluations show the outputs support more informed decisions in cyber risk management.
Where Pith is reading between the lines
- The same reconstruction process could extend to other long-lived technical systems where knowledge gaps accumulate, such as energy grids or transportation networks.
- Pairing the outputs with formal verification methods might catch any remaining model inaccuracies before risk numbers are used for decisions.
- Widespread adoption would reduce the frequency of full manual documentation audits for systems that operate for decades.
Load-bearing premise
Multimodal LLMs can accurately reconstruct architectural models and perform threat analysis from fragmented data without introducing errors that invalidate the resulting risk estimates.
What would settle it
Run ASTRAL on a CPS case study with deliberately incomplete data, then have independent experts manually reconstruct the full architecture from the same sources and compare the threats and risk values produced by each method for mismatches.
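That comparison can be scored mechanically once both threat lists exist; a minimal sketch, with invented threat identifiers standing in for real case-study data:

```python
# Hypothetical comparison of tool-derived threats against an
# expert-derived ground truth; the threat labels are invented.

astral_threats = {"spoof-hmi", "tamper-plc-fw", "dos-fieldbus", "leak-historian"}
expert_threats = {"spoof-hmi", "tamper-plc-fw", "dos-fieldbus", "replay-sensor"}

true_positives = astral_threats & expert_threats
precision = len(true_positives) / len(astral_threats)  # share of tool output that is right
recall = len(true_positives) / len(expert_threats)     # share of ground truth it found

missed = expert_threats - astral_threats    # threats the tool failed to surface
invented = astral_threats - expert_threats  # possible hallucinations
```

Precision penalises hallucinated threats and recall penalises missed ones; both would need to be high before the downstream risk numbers are trusted.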
Original abstract
Cyber-physical systems often contend with incomplete architectural documentation or outdated information resulting from legacy technologies, knowledge management gaps, and the complexity of integrating diverse subsystems over extended operational lifecycles. This architectural incompleteness impedes reliable security assessment, as inaccurate or missing architectural knowledge limits the identification of system dependencies, attack surfaces, and risk propagation pathways. To address this foundational challenge, this paper introduces ASTRAL (Architecture-Centric Security Threat Risk Assessment using LLMs), an architecture-centric security assessment technique implemented in a prototype tool powered by multimodal LLMs. The proposed approach assists practitioners in reconstructing and analysing CPS architectures when documentation is fragmented or absent. By leveraging prompt chaining, few-shot learning, and architectural reasoning, ASTRAL extracts and synthesises system representations from disparate data sources. By integrating LLM reasoning with architectural modelling, our approach supports adaptive threat identification and quantitative risk estimation for cyber-physical systems. We evaluated the approach through an ablation study across multiple CPS case studies and an expert evaluation involving 14 experienced cybersecurity practitioners. Practitioner feedback suggests that ASTRAL is useful and reliable for supporting architecture-centric security assessment. Overall, the results indicate that the approach can support more informed cyber risk management decisions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ASTRAL, a multimodal LLM-powered prototype for reconstructing and analyzing cyber-physical system (CPS) architectures from incomplete or fragmented documentation. It uses prompt chaining, few-shot learning, and architectural reasoning to extract system representations, identify threats, and produce quantitative risk estimates. Evaluation consists of an ablation study across CPS case studies plus feedback from 14 cybersecurity practitioners, leading to the conclusion that the approach is useful and reliable for architecture-centric security assessment.
Significance. If the LLM outputs can be shown to yield accurate architectural models and risk values, the work would address a practical gap in securing legacy and complex CPS by enabling security analysis when documentation is missing or outdated, potentially improving risk management in critical infrastructure domains.
major comments (2)
- [Evaluation] The ablation study and expert evaluation with 14 practitioners report only subjective feedback on usefulness and reliability, without quantitative metrics (e.g., precision of extracted dependencies, error rates in risk scores) or direct comparisons of synthesized architectures and risk estimates against independent ground-truth models. This leaves the central claim of reliable quantitative risk estimation vulnerable to unaddressed LLM hallucination or systematic reconstruction errors.
- [Abstract and Evaluation] Practitioner ratings alone cannot substantiate the claim that ASTRAL produces sufficiently accurate outputs for decision-making, as subjective assessments may overlook invented components or incorrect attack-surface mappings that would invalidate downstream risk numbers.
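For the risk scores themselves, an error rate against a reference assessment could be as simple as a mean absolute error over matched threats; all values below are invented for illustration:

```python
# Illustrative error-rate check on risk scores (0-10, CVSS-like scale)
# for threats matched between the tool and a reference assessment.
# All numbers are invented.

tool_scores = {"spoof-hmi": 8.1, "tamper-plc-fw": 9.0, "dos-fieldbus": 6.5}
reference_scores = {"spoof-hmi": 7.5, "tamper-plc-fw": 9.2, "dos-fieldbus": 5.9}

matched = tool_scores.keys() & reference_scores.keys()
mae = sum(abs(tool_scores[t] - reference_scores[t]) for t in matched) / len(matched)
```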
minor comments (2)
- The manuscript would benefit from explicit discussion of hallucination mitigation techniques employed in the prompt chaining and few-shot setup.
- Consider including more details on the specific CPS case studies and data sources used to improve reproducibility of the ablation results.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments on evaluation methodology are well-taken and have prompted revisions to clarify limitations and strengthen the presentation of results. We respond to each major comment below.
Point-by-point responses
- Referee: [Evaluation] The ablation study and expert evaluation with 14 practitioners report only subjective feedback on usefulness and reliability, without quantitative metrics (e.g., precision of extracted dependencies, error rates in risk scores) or direct comparisons of synthesized architectures and risk estimates against independent ground-truth models. This leaves the central claim of reliable quantitative risk estimation vulnerable to unaddressed LLM hallucination or systematic reconstruction errors.
  Authors: We acknowledge that the evaluation relies on ablation studies measuring the contribution of multimodal inputs, prompt chaining, and few-shot learning via expert-rated usefulness and reliability, together with feedback from 14 practitioners. Direct quantitative metrics such as precision of extracted dependencies or error rates against independent ground-truth models are not reported because the case studies were selected specifically for their incomplete documentation, which is the motivating problem and makes authoritative ground truth unavailable by construction. The framework incorporates architectural consistency checks within the prompt chain to reduce hallucination risks. In the revised manuscript we have expanded the evaluation section with an explicit discussion of potential reconstruction errors, added inter-rater agreement statistics from the practitioner study, and inserted a dedicated limitations subsection on the scope of the quantitative risk estimates. These changes address the concern without overstating the current evidence.
  Revision: partial
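Inter-rater agreement statistics of the kind mentioned in the response are commonly reported as Cohen's kappa; a minimal self-contained sketch on invented ratings from two practitioners:

```python
# Cohen's kappa for two raters on the same items; the ratings are
# invented illustrative data, not the paper's.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[l] * cb[l] for l in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["useful", "useful", "not", "useful", "not", "useful"]
b = ["useful", "useful", "not", "not",    "not", "useful"]
kappa = cohens_kappa(a, b)
```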
- Referee: [Abstract and Evaluation] Practitioner ratings alone cannot substantiate the claim that ASTRAL produces sufficiently accurate outputs for decision-making, as subjective assessments may overlook invented components or incorrect attack-surface mappings that would invalidate downstream risk numbers.
  Authors: We agree that subjective practitioner ratings have inherent limitations and cannot alone guarantee absence of invented components or mapping errors. The original wording in the abstract and conclusions was intentionally cautious, stating only that feedback “suggests” usefulness and reliability for supporting assessment. We have revised the abstract to emphasize the assistive role of ASTRAL and have added explicit caveats in the evaluation section regarding possible LLM-induced inaccuracies in component identification and risk propagation. Illustrative excerpts from the case studies have also been included to show how synthesized outputs were cross-checked against practitioner expectations, providing additional qualitative grounding for the reported risk estimates.
  Revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper introduces the ASTRAL technique for reconstructing CPS architectures and estimating risks via multimodal LLMs, prompt chaining, and few-shot learning. Its central claims are supported by an ablation study on multiple case studies plus independent feedback from 14 external cybersecurity practitioners, rather than by internal fitting, self-referential predictions, or equations that reduce to the method's own inputs. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify the core results. The evaluation rests on external case studies and expert judgment rather than on the method's own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Multimodal LLMs can extract, synthesize, and reason about system architectures from fragmented or multimodal data sources.
invented entities (1)
- ASTRAL (no independent evidence)