Software architecture meets LLMs: A systematic literature review
6 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SE (6)
years: 2026 (6)
roles: background (1)
polarities: background (1)
citing papers explorer
- Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey
  Developers use LLMs like ChatGPT mainly for knowledge acquisition and code generation at the detailed design level, reporting benefits such as better technology selection and early flaw detection alongside limitations like lengthy outputs, incorrect code, and hallucinations.
- Benchmarking Requirement-to-Architecture Generation with Hybrid Evaluation
  R2ABench benchmark shows LLMs generate syntactically valid software architectures from requirements but produce structurally fragmented results due to weak relational reasoning.
- Architecture Without Architects: How AI Coding Agents Shape Software Architecture
  AI coding agents perform vibe architecting by making prompt-driven architectural choices that produce structurally different systems for identical tasks.
- (How) Do Large Language Models Understand High-Level Message Sequence Charts?
  LLMs achieve only modest understanding of HMSC formal semantics at 52 percent accuracy, performing strongly on basic constructs but weakly on abstractions and traces.
- CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models
  CAKE benchmark shows MCQ accuracy on cloud architecture plateaus near 99% above 3B parameters while free-response scores improve steadily with size, and reasoning steps help but tools hurt small models.
- Can Large Language Models Assist the Comprehension of ROS2 Software Architectures?
  LLMs achieve 98.22% accuracy answering factual questions about ROS2 software architectures, with top models reaching 100%.