Are LLMs Ready for Anti-Pattern Detection in Microservice Architectures?
Pith reviewed 2026-06-26 03:49 UTC · model grok-4.3
The pith
LLMs provide competitive support for detecting several microservice anti-patterns but fall short where explicit structural or cross-service evidence is needed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that LLMs deliver useful support for anti-pattern detection in microservice architectures through prompt-based analysis of static artifacts. They reach competitive precision and recall on several of the 16 patterns when the evidence is local, heterogeneous, or semantically rich, but fall clearly short on patterns that demand explicit structural or cross-service dependency information, where static analysis stays more reliable.
What carries the argument
A prompt-based analysis pipeline feeds static repository artifacts to LLMs and scores their output against the static-analysis baseline MARS, applying the same precision and recall protocol to all 16 annotated anti-patterns.
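The scoring step can be made concrete with a small sketch. The Python below is an illustration only, not the paper's evaluation code. It assumes detections are recorded as (repository, anti-pattern) pairs, a granularity this summary does not confirm, and the labels `AP-1` and `AP-2` are placeholders rather than names from the benchmark.

```python
def precision_recall(predicted, ground_truth):
    """Compute precision and recall separately for each anti-pattern.

    Both arguments are sets of (repository, anti_pattern) pairs.
    A score is None when its denominator is zero (nothing predicted,
    or nothing annotated, for that anti-pattern).
    """
    patterns = {ap for _, ap in predicted | ground_truth}
    scores = {}
    for ap in sorted(patterns):
        pred = {repo for repo, p in predicted if p == ap}
        truth = {repo for repo, p in ground_truth if p == ap}
        true_positives = len(pred & truth)
        precision = true_positives / len(pred) if pred else None
        recall = true_positives / len(truth) if truth else None
        scores[ap] = (precision, recall)
    return scores


# Hypothetical annotations and detector output (placeholder labels).
ground_truth = {("repoA", "AP-1"), ("repoB", "AP-1"), ("repoA", "AP-2")}
detector_output = {("repoA", "AP-1"), ("repoC", "AP-1")}

for ap, (precision, recall) in precision_recall(detector_output, ground_truth).items():
    print(ap, precision, recall)
```

Because the same function scores any detector's output, an LLM and a static tool can be compared on equal terms once both produce pairs in this form.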
If this is right
- Hybrid detection that routes semantic-heavy patterns to LLMs and structural patterns to static tools would cover more cases than either alone.
- Prompt-based LLM checks can be applied to new or heterogeneous repositories without writing additional detection rules.
- Performance gaps on cross-service patterns indicate that LLM outputs would benefit from explicit structural context supplied in the prompt.
- Maintainability assessments for microservices could incorporate LLM scans as a first-pass filter before targeted static verification.
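The hybrid routing idea in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions: the split into "semantic" and "structural" anti-patterns is supplied by the caller, the labels are placeholders, and `fake_llm` and `fake_static` are hypothetical stubs standing in for a real LLM call and a real static analyzer.

```python
from typing import Callable, Dict, Iterable, Set

# A detector takes (repository path, anti-pattern label) and returns True if detected.
Detector = Callable[[str, str], bool]


def hybrid_detect(
    repo: str,
    patterns: Iterable[str],
    semantic_patterns: Set[str],
    llm_detect: Detector,
    static_detect: Detector,
) -> Dict[str, bool]:
    """Route each anti-pattern to exactly one detector and collect the verdicts."""
    findings = {}
    for ap in patterns:
        # Semantic-heavy patterns go to the LLM, everything else to static analysis.
        detector = llm_detect if ap in semantic_patterns else static_detect
        findings[ap] = detector(repo, ap)
    return findings


# Hypothetical stubs; a real system would call a model API and an analyzer here.
def fake_llm(repo: str, ap: str) -> bool:
    return True


def fake_static(repo: str, ap: str) -> bool:
    return False


result = hybrid_detect(
    "path/to/repo",
    patterns=["AP-semantic", "AP-structural"],
    semantic_patterns={"AP-semantic"},
    llm_detect=fake_llm,
    static_detect=fake_static,
)
print(result)
```

Keeping the routing table as plain data means it could be revised per anti-pattern as new precision and recall figures come in.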
Where Pith is reading between the lines
- A combined system might first let an LLM flag candidate anti-patterns and then run static analysis only on the flagged services to reduce overall computation.
- The same prompt pipeline could be tested on other kinds of architectural smells that mix code-level and configuration-level signals.
- If prompts were expanded with dependency graphs extracted by lightweight parsers, the LLM gap on structural patterns might narrow without changing the model.
- Repository-scale evaluation could reveal whether LLMs generalize across different microservice frameworks or primarily succeed on the styles present in the current benchmark.
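The suggestion of expanding prompts with dependency graphs can be illustrated with a short sketch. Everything here is an assumption made for illustration: the prompt wording, the `build_prompt` helper, and the representation of the graph as (caller, callee) pairs produced by some lightweight parser. It does not reproduce the prompts used in the paper.

```python
from typing import Iterable, Tuple


def build_prompt(
    anti_pattern: str,
    definition: str,
    artifact_text: str,
    dependencies: Iterable[Tuple[str, str]],
) -> str:
    """Assemble a detection prompt that states service dependencies explicitly."""
    # Deduplicate and sort the edges so the prompt is stable across runs.
    edges = sorted(set(dependencies))
    edge_lines = "\n".join(f"- {src} -> {dst}" for src, dst in edges)
    if not edge_lines:
        edge_lines = "- (no dependencies extracted)"
    return (
        f"Anti-pattern: {anti_pattern}\n"
        f"Definition: {definition}\n\n"
        f"Service dependencies (caller -> callee):\n{edge_lines}\n\n"
        f"Repository artifacts:\n{artifact_text}\n\n"
        "Is this anti-pattern present? Answer with evidence."
    )


# Hypothetical service names and placeholder content.
prompt = build_prompt(
    anti_pattern="AP-structural",
    definition="(definition text goes here)",
    artifact_text="(selected source and configuration files go here)",
    dependencies=[("orders", "payments"), ("cart", "orders"), ("orders", "payments")],
)
print(prompt)
```

Supplying the edges as text keeps the model unchanged, so any improvement on structural patterns could be attributed to the added context.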
Load-bearing premise
The chosen set of annotated microservice repositories and the single uniform precision-recall protocol give a representative and fair comparison between the LLM approach and static analysis.
What would settle it
Run the same LLMs on a fresh collection of microservice repositories in which anti-patterns with structural dependencies appear more often. If the LLMs score higher than the static baseline on those patterns, the claimed limitation does not hold.
Original abstract
Microservice systems are prone to recurrent architectural anti-patterns (APs) that hinder maintainability, evolvability, and operational quality. Most existing AP detection approaches rely on static analysis and handcrafted rules, which can be effective but are often tool-dependent, limited to explicitly encoded detection logic, and difficult to adapt to heterogeneous repositories. In this paper, we investigate whether large language models (LLMs) are ready to support architectural anti-pattern detection in microservice architectures through a prompt-based analysis pipeline over static repository artifacts. We evaluate three general-purpose LLMs on a curated benchmark of microservice repositories annotated with 16 architectural anti-patterns, and compare their performance against the state-of-the-art static-analysis tool MARS using a uniform evaluation protocol based on precision and recall. Our results show that LLMs can provide useful support for anti-pattern detection, achieving competitive performance on several anti-patterns, especially when the relevant evidence is local, heterogeneous, or semantically rich. At the same time, they exhibit clear limitations on anti-patterns that require explicit structural or cross-service dependency evidence, where static analysis remains more reliable. These findings suggest that LLMs are not yet a replacement for traditional analyzers, but already represent a promising complementary aid for architectural assessment in microservice systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates whether LLMs can support architectural anti-pattern detection in microservice systems via a prompt-based analysis pipeline on static repository artifacts. It evaluates three general-purpose LLMs against the static-analysis tool MARS on a curated benchmark annotated with 16 anti-patterns, using a uniform precision/recall protocol, and concludes that LLMs achieve competitive performance on anti-patterns with local, heterogeneous, or semantically rich evidence but show clear limitations on those requiring explicit structural or cross-service dependency evidence, positioning LLMs as a promising complementary aid rather than a replacement.
Significance. If the empirical results hold under a validated protocol, the work provides concrete evidence on the strengths and limitations of prompt-based LLMs for software architecture analysis tasks. It identifies specific conditions (local vs. structural evidence) where LLMs add value over rule-based tools and vice versa, which can inform the design of hybrid detection systems and guide practitioners in microservice maintenance. The comparison on a multi-AP benchmark contributes empirical data to the AI-for-SE literature.
major comments (1)
- [Evaluation protocol (as described in the abstract and results sections)] The central claim of 'competitive performance on several anti-patterns' under a 'uniform evaluation protocol based on precision and recall' is load-bearing for the paper's conclusions, yet the protocol requires mapping LLM free-form responses to binary detections. This introduces an interpretation step absent from MARS's deterministic rule-based outputs, making direct comparability sensitive to unstated choices in annotation or tie-breaking. Details on this mapping (e.g., manual vs. automated, prompt-dependent rules) must be provided to substantiate the head-to-head results.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the evaluation protocol. We address the concern directly below.
- Referee: [Evaluation protocol (as described in the abstract and results sections)] The central claim of 'competitive performance on several anti-patterns' under a 'uniform evaluation protocol based on precision and recall' is load-bearing for the paper's conclusions, yet the protocol requires mapping LLM free-form responses to binary detections. This introduces an interpretation step absent from MARS's deterministic rule-based outputs, making direct comparability sensitive to unstated choices in annotation or tie-breaking. Details on this mapping (e.g., manual vs. automated, prompt-dependent rules) must be provided to substantiate the head-to-head results.
Authors: We agree that the manuscript would benefit from greater transparency on the mapping from LLM free-form responses to binary detections. While the protocol is uniform in applying identical precision and recall metrics to both LLM and MARS outputs against the same ground-truth annotations, the interpretation step for LLMs is an additional element that requires explicit description. In the revised manuscript we will expand the Evaluation Protocol section to document the mapping procedure, including whether it was manual or automated, the guidelines or rubric applied, and how ambiguities were resolved.
Revision: yes
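To show what an explicit mapping rule might look like, here is a minimal sketch of one automated option. It is not the procedure used in the paper, which the exchange above leaves undocumented. It assumes the prompt asks the model to include a line of the form `VERDICT: YES` or `VERDICT: NO`. That convention and the `to_binary` helper are hypothetical.

```python
import re
from typing import Optional

# Matches a whole line such as "VERDICT: YES" (case-insensitive).
VERDICT_RE = re.compile(r"^\s*VERDICT:\s*(YES|NO)\s*$", re.IGNORECASE | re.MULTILINE)


def to_binary(response: str) -> Optional[bool]:
    """Map a free-form LLM response to a binary detection.

    Returns True or False when the response carries one consistent verdict,
    and None when the verdict is missing or contradictory, so that such
    cases can be counted and reviewed by hand instead of guessed.
    """
    verdicts = {match.upper() for match in VERDICT_RE.findall(response)}
    if len(verdicts) != 1:
        return None
    return verdicts == {"YES"}


print(to_binary("The gateway is bypassed in two services.\nVERDICT: yes"))
print(to_binary("No shared database found.\nVERDICT: NO"))
print(to_binary("It might be present, hard to say."))
```

Returning None for ambiguous responses, and reporting how often that happens, exposes the tie-breaking choices the referee asks about.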
Circularity Check
Empirical evaluation with no derivations, fitted predictions, or load-bearing self-citations
The paper conducts a direct empirical comparison of three LLMs against the external static-analysis tool MARS on a curated benchmark of 16 anti-patterns, reporting precision and recall under a uniform protocol. No equations, parameter fitting, predictions derived from inputs, or self-citation chains appear in the abstract or described methodology. The central claims rest on measured performance numbers against an independent tool and benchmark, not on any reduction to the paper's own definitions or prior author work. This matches the default expectation for non-circular empirical studies.