From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems
Pith reviewed 2026-05-10 18:56 UTC · model grok-4.3
The pith
Multimodal LLMs can synthesize complete architectural models from fragmented data to support quantitative risk assessment in cyber-physical systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ASTRAL is an architecture-centric security assessment technique, implemented in a prototype tool powered by multimodal LLMs, that extracts and synthesises system representations from disparate data sources. By leveraging prompt chaining, few-shot learning, and architectural reasoning, it supports adaptive threat identification and quantitative risk estimation for cyber-physical systems with incomplete documentation.
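As a structural illustration only (the paper's actual prompts and pipeline are not reproduced here), prompt chaining can be sketched with a stubbed model call; `call_model`, the step texts, and the way each step's output is fed into the next are all hypothetical:

```python
# Hypothetical sketch of a prompt chain for architecture reconstruction.
# `call_model` stands in for a real multimodal LLM API; it is stubbed here
# so the chaining structure itself can be exercised.

def call_model(prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to an LLM endpoint.
    return f"<output of: {prompt.splitlines()[0]}>"

CHAIN = [
    "Step 1: List components visible in the supplied diagrams and logs.",
    "Step 2: Infer data flows and dependencies between those components.",
    "Step 3: Map each flow to candidate threats (e.g., STRIDE categories).",
]

def run_chain(source_material: str) -> list[str]:
    """Feed each step the source material plus the previous step's output."""
    context = source_material
    outputs = []
    for step in CHAIN:
        prompt = f"{step}\n\nContext:\n{context}"
        result = call_model(prompt)
        outputs.append(result)
        context = result  # chain: the next step builds on this output
    return outputs

outputs = run_chain("fragmented PLC config + network capture excerpt")
```

A real implementation would replace the stub with a multimodal LLM API call and carry diagrams or images alongside the text context.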
What carries the argument
ASTRAL, the prototype tool that integrates LLM reasoning with architectural modelling to reconstruct and analyse system structures from incomplete inputs.
If this is right
- Security assessments become feasible for legacy cyber-physical systems whose documentation has become outdated or lost over time.
- Quantitative risk estimates can be generated directly from partial data sources rather than requiring complete diagrams first.
- Threat identification adapts based on the synthesised model instead of relying on static, incomplete records.
- Practitioner evaluations show the outputs support more informed decisions in cyber risk management.
Where Pith is reading between the lines
- The same reconstruction process could extend to other long-lived technical systems where knowledge gaps accumulate, such as energy grids or transportation networks.
- Pairing the outputs with formal verification methods might catch any remaining model inaccuracies before risk numbers are used for decisions.
- Widespread adoption would reduce the frequency of full manual documentation audits for systems that operate for decades.
Load-bearing premise
Multimodal LLMs can accurately reconstruct architectural models and perform threat analysis from fragmented data without introducing errors that invalidate the resulting risk estimates.
What would settle it
Run ASTRAL on a CPS case study with deliberately incomplete data, then have independent experts manually reconstruct the full architecture from the same sources and compare the threats and risk values produced by each method for mismatches.
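That comparison can be scored mechanically once both threat lists exist; a minimal sketch, with invented threat identifiers standing in for real case-study data:

```python
# Hypothetical comparison of tool-derived threats against an
# expert-derived ground truth; the threat labels are invented.

astral_threats = {"spoof-hmi", "tamper-plc-fw", "dos-fieldbus", "leak-historian"}
expert_threats = {"spoof-hmi", "tamper-plc-fw", "dos-fieldbus", "replay-sensor"}

true_positives = astral_threats & expert_threats
precision = len(true_positives) / len(astral_threats)  # share of tool output that is right
recall = len(true_positives) / len(expert_threats)     # share of ground truth it found

missed = expert_threats - astral_threats    # threats the tool failed to surface
invented = astral_threats - expert_threats  # possible hallucinations
```

Precision penalises hallucinated threats and recall penalises missed ones; both would need to be high before the downstream risk numbers are trusted.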
Original abstract
Cyber-physical systems often contend with incomplete architectural documentation or outdated information resulting from legacy technologies, knowledge management gaps, and the complexity of integrating diverse subsystems over extended operational lifecycles. This architectural incompleteness impedes reliable security assessment, as inaccurate or missing architectural knowledge limits the identification of system dependencies, attack surfaces, and risk propagation pathways. To address this foundational challenge, this paper introduces ASTRAL (Architecture-Centric Security Threat Risk Assessment using LLMs), an architecture-centric security assessment technique implemented in a prototype tool powered by multimodal LLMs. The proposed approach assists practitioners in reconstructing and analysing CPS architectures when documentation is fragmented or absent. By leveraging prompt chaining, few-shot learning, and architectural reasoning, ASTRAL extracts and synthesises system representations from disparate data sources. By integrating LLM reasoning with architectural modelling, our approach supports adaptive threat identification and quantitative risk estimation for cyber-physical systems. We evaluated the approach through an ablation study across multiple CPS case studies and an expert evaluation involving 14 experienced cybersecurity practitioners. Practitioner feedback suggests that ASTRAL is useful and reliable for supporting architecture-centric security assessment. Overall, the results indicate that the approach can support more informed cyber risk management decisions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ASTRAL, a multimodal LLM-powered prototype for reconstructing and analyzing cyber-physical system (CPS) architectures from incomplete or fragmented documentation. It uses prompt chaining, few-shot learning, and architectural reasoning to extract system representations, identify threats, and produce quantitative risk estimates. Evaluation consists of an ablation study across CPS case studies plus feedback from 14 cybersecurity practitioners, leading to the conclusion that the approach is useful and reliable for architecture-centric security assessment.
Significance. If the LLM outputs can be shown to yield accurate architectural models and risk values, the work would address a practical gap in securing legacy and complex CPS by enabling security analysis when documentation is missing or outdated, potentially improving risk management in critical infrastructure domains.
major comments (2)
- [Evaluation] The ablation study and expert evaluation with 14 practitioners report only subjective feedback on usefulness and reliability, without quantitative metrics (e.g., precision of extracted dependencies, error rates in risk scores) or direct comparisons of synthesized architectures and risk estimates against independent ground-truth models. This leaves the central claim of reliable quantitative risk estimation vulnerable to unaddressed LLM hallucination or systematic reconstruction errors.
- [Abstract and Evaluation] Practitioner ratings alone cannot substantiate the claim that ASTRAL produces sufficiently accurate outputs for decision-making, as subjective assessments may overlook invented components or incorrect attack-surface mappings that would invalidate downstream risk numbers.
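For the risk scores themselves, an error rate against a reference assessment could be as simple as a mean absolute error over matched threats; all values below are invented for illustration:

```python
# Illustrative error-rate check on risk scores (0-10, CVSS-like scale)
# for threats matched between the tool and a reference assessment.
# All numbers are invented.

tool_scores = {"spoof-hmi": 8.1, "tamper-plc-fw": 9.0, "dos-fieldbus": 6.5}
reference_scores = {"spoof-hmi": 7.5, "tamper-plc-fw": 9.2, "dos-fieldbus": 5.9}

matched = tool_scores.keys() & reference_scores.keys()
mae = sum(abs(tool_scores[t] - reference_scores[t]) for t in matched) / len(matched)
```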
minor comments (2)
- The manuscript would benefit from explicit discussion of hallucination mitigation techniques employed in the prompt chaining and few-shot setup.
- Consider including more details on the specific CPS case studies and data sources used to improve reproducibility of the ablation results.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments on evaluation methodology are well-taken and have prompted revisions to clarify limitations and strengthen the presentation of results. We respond to each major comment below.
Point-by-point responses
- Referee: [Evaluation] The ablation study and expert evaluation with 14 practitioners report only subjective feedback on usefulness and reliability, without quantitative metrics (e.g., precision of extracted dependencies, error rates in risk scores) or direct comparisons of synthesized architectures and risk estimates against independent ground-truth models. This leaves the central claim of reliable quantitative risk estimation vulnerable to unaddressed LLM hallucination or systematic reconstruction errors.
  Authors: We acknowledge that the evaluation relies on ablation studies measuring the contribution of multimodal inputs, prompt chaining, and few-shot learning via expert-rated usefulness and reliability, together with feedback from 14 practitioners. Direct quantitative metrics such as precision of extracted dependencies or error rates against independent ground-truth models are not reported because the case studies were selected specifically for their incomplete documentation, which is the motivating problem and makes authoritative ground truth unavailable by construction. The framework incorporates architectural consistency checks within the prompt chain to reduce hallucination risks. In the revised manuscript we have expanded the evaluation section with an explicit discussion of potential reconstruction errors, added inter-rater agreement statistics from the practitioner study, and inserted a dedicated limitations subsection on the scope of the quantitative risk estimates. These changes address the concern without overstating the current evidence.
  Revision: partial
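Inter-rater agreement statistics of the kind mentioned in the response are commonly reported as Cohen's kappa; a minimal self-contained sketch on invented ratings from two practitioners:

```python
# Cohen's kappa for two raters on the same items; the ratings are
# invented illustrative data, not the paper's.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[l] * cb[l] for l in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["useful", "useful", "not", "useful", "not", "useful"]
b = ["useful", "useful", "not", "not",    "not", "useful"]
kappa = cohens_kappa(a, b)
```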
- Referee: [Abstract and Evaluation] Practitioner ratings alone cannot substantiate the claim that ASTRAL produces sufficiently accurate outputs for decision-making, as subjective assessments may overlook invented components or incorrect attack-surface mappings that would invalidate downstream risk numbers.
  Authors: We agree that subjective practitioner ratings have inherent limitations and cannot alone guarantee absence of invented components or mapping errors. The original wording in the abstract and conclusions was intentionally cautious, stating only that feedback “suggests” usefulness and reliability for supporting assessment. We have revised the abstract to emphasize the assistive role of ASTRAL and have added explicit caveats in the evaluation section regarding possible LLM-induced inaccuracies in component identification and risk propagation. Illustrative excerpts from the case studies have also been included to show how synthesized outputs were cross-checked against practitioner expectations, providing additional qualitative grounding for the reported risk estimates.
  Revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper introduces the ASTRAL technique for reconstructing CPS architectures and estimating risks via multimodal LLMs, prompt chaining, and few-shot learning. Its central claims are supported by an ablation study on multiple case studies plus independent feedback from 14 external cybersecurity practitioners, rather than by internal fitting, self-referential predictions, or equations that reduce to the method's own inputs. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify the core results. The evaluation rests on external case studies and expert judgment rather than on the method's own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Multimodal LLMs can extract, synthesize, and reason about system architectures from fragmented or multimodal data sources.
invented entities (1)
- ASTRAL (no independent evidence)