
arxiv: 2605.22976 · v1 · pith:PWR5LQRInew · submitted 2026-05-21 · 💻 cs.SE · cs.AI

LLM Code Smells: A Taxonomy and Detection Approach

Pith reviewed 2026-05-25 05:42 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords LLM code smells · taxonomy · static analysis · code smell detection · software quality · LLM integration · open-source projects · detection tool

The pith

Nine LLM code smells documented in a taxonomy and detected by SpecDetect4LLM appear in 73.5% of 692 analyzed software systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines nine specific coding practices that represent poor ways of integrating large language models into software applications. It builds a static analysis tool, SpecDetect4LLM, to find these practices automatically in source code. Evaluation across 692 open-source projects shows that the smells are common and that the tool detects them with 91.3% precision and 71.8% recall. Developers need this kind of guidance because poor LLM integration can reduce the reliability and maintainability of the overall system.

Core claim

The authors consolidate and refine the concept of LLM code smells by presenting a self-contained taxonomy and catalog of nine such smells. They develop SpecDetect4LLM, a static source code analysis tool for detecting these smells, and evaluate it on 692 open-source projects comprising 171,194 source files. The results indicate that LLM code smells affect 73.5% of the analyzed systems, with the tool achieving a precision of 91.3% and recall of 71.8%.
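
As a rough cross-check of the headline figures, the sketch below converts the 73.5% prevalence into an approximate project count and derives the F1 score implied by the reported precision and recall. Neither the project count nor the F1 value is stated in the source; both are computed here from the reported numbers, and the function and variable names are illustrative.

```python
# Figures reported in the paper's abstract.
PROJECTS = 692
FILES = 171_194
PREVALENCE = 0.735
PRECISION = 0.913
RECALL = 0.718


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Approximate only: the published percentage is itself rounded.
affected_projects = round(PROJECTS * PREVALENCE)
files_per_project = FILES / PROJECTS
f1 = f1_score(PRECISION, RECALL)

print(f"approx. affected projects: {affected_projects}")
print(f"mean files per project: {files_per_project:.1f}")
print(f"implied F1: {f1:.3f}")
```

Under these figures roughly 509 of the 692 projects contain at least one smell, and the implied F1 is about 0.80, which reflects the gap between the high precision and the more modest recall.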

What carries the argument

A catalog of nine LLM code smells together with the SpecDetect4LLM static analysis rules that map to them.
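
To make the idea of a static rule that maps to a smell concrete, here is a minimal sketch using Python's standard `ast` module. It is not one of SpecDetect4LLM's rules: the page does not enumerate the nine smells or describe how the tool implements them. The rule checked here (a `.create(...)` call with no explicit `temperature` keyword) and the sample snippet are hypothetical, chosen only to show the shape such a check might take.

```python
import ast


def find_calls_missing_keyword(source: str, method: str = "create",
                               keyword: str = "temperature") -> list[int]:
    """Return line numbers of `<obj>.<method>(...)` calls that lack `keyword`.

    Hypothetical rule for illustration; not taken from SpecDetect4LLM.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == method):
            names = {kw.arg for kw in node.keywords}
            # A `**kwargs` splat appears as arg=None and may carry the
            # keyword, so skip it rather than risk a false positive.
            if None in names:
                continue
            if keyword not in names:
                hits.append(node.lineno)
    return sorted(hits)


# Sample input; it is only parsed, never executed.
SAMPLE = '''
client.chat.completions.create(model="m", messages=msgs)
client.chat.completions.create(model="m", messages=msgs, temperature=0)
client.chat.completions.create(**params)
'''

print(find_calls_missing_keyword(SAMPLE))  # [2]
```

Skipping calls that use `**kwargs` trades recall for precision: the rule stays quiet when it cannot see the arguments. A purely syntactic check of this kind would plausibly show the same pattern as the reported metrics, with precision higher than recall.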

Load-bearing premise

The nine LLM code smells accurately capture inadequate integration practices that undermine software system quality, and the static analysis rules in SpecDetect4LLM correctly map to these smells without significant false classifications.

What would settle it

A controlled study comparing quality metrics such as bug rates or maintenance effort between systems containing the detected smells and equivalent refactored versions without them.

Original abstract

Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM integration coding practices must be documented to help developers mitigate such issues. Following our earlier work on LLM code smells, this paper consolidates and refines the concept by presenting a self-contained taxonomy and a catalog of nine LLM code smells. We also create SpecDetect4LLM, a static source code analysis tool for their detection, and conduct extensive empirical evaluations of its detection effectiveness (precision and recall) as well as the prevalence of LLM code smells across 692 open-source software projects (171,194 source files). Our results show that LLM code smells affect 73.5% of the analyzed systems, with a detection precision of 91.3% and a recall of 71.8%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper consolidates prior work into a self-contained taxonomy of nine LLM code smells, presents the SpecDetect4LLM static analysis tool for detecting them, and reports an empirical evaluation on 692 open-source projects (171,194 files) claiming that the smells affect 73.5% of systems with tool precision of 91.3% and recall of 71.8%.

Significance. If the taxonomy validly identifies integration practices that degrade quality and the detection rules map to them without substantial misclassification, the catalog and tool could give developers concrete guidance for LLM usage in production codebases. The scale of the corpus evaluation would add practical value if the metrics are independently corroborated.

major comments (2)
  1. [Evaluation] Evaluation section: the reported precision of 91.3% and recall of 71.8% are presented without any description of how ground-truth labels for the nine smells were obtained, how inter-rater agreement was measured, or what controls were applied for selection bias in the 692-project corpus; these omissions directly undermine the reliability of the central effectiveness and prevalence claims.
  2. [Taxonomy] Taxonomy and §3 (or equivalent): the nine smells are asserted to capture inadequate LLM integration practices that undermine system quality, yet the manuscript supplies no external expert validation, quality-impact correlation study, or comparison against independent oracles beyond the authors' internal definitions; this makes both the 73.5% prevalence figure and the tool's mapping dependent on unverified internal consistency.
minor comments (1)
  1. The abstract states the work 'consolidates and refines' an earlier taxonomy but provides neither a citation to that prior work nor a concise delta table showing which smells were added, removed, or redefined.
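
The first major comment turns on the fact that precision and recall are only defined relative to a ground-truth label set. The sketch below shows that dependency under simple assumptions: findings are modeled as `(file, line)` pairs, and both sets are hypothetical data invented for this example, not results from the paper.

```python
def precision_recall(detected: set, ground_truth: set) -> tuple[float, float]:
    """Compute precision and recall of `detected` against `ground_truth`."""
    tp = len(detected & ground_truth)   # reported and truly smelly
    fp = len(detected - ground_truth)   # reported but not in ground truth
    fn = len(ground_truth - detected)   # in ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical findings, each a (file, line) pair.
detected = {("a.py", 10), ("a.py", 42), ("b.py", 7), ("c.py", 3)}
ground_truth = {("a.py", 10), ("b.py", 7), ("c.py", 3),
                ("d.py", 99), ("e.py", 5)}

p, r = precision_recall(detected, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Changing a single entry in `ground_truth` changes both numbers, which is why the labeling protocol and the agreement between annotators matter for interpreting the reported 91.3% and 71.8%.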

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below. We will revise the manuscript to provide the requested details on ground-truth labeling and inter-rater agreement. For the taxonomy, we will add explicit discussion of its derivation and limitations while maintaining that the internal definitions are grounded in prior work.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the reported precision of 91.3% and recall of 71.8% are presented without any description of how ground-truth labels for the nine smells were obtained, how inter-rater agreement was measured, or what controls were applied for selection bias in the 692-project corpus; these omissions directly undermine the reliability of the central effectiveness and prevalence claims.

    Authors: We agree that these methodological details are essential for assessing the reliability of the metrics. The ground-truth labels were created via manual review of a stratified sample of files by two authors, with disagreements resolved through discussion; inter-rater agreement was measured using Cohen's kappa (value to be reported). Corpus selection followed criteria from prior LLM studies to reduce bias. In the revision we will insert a new subsection (likely 4.2) detailing the full labeling protocol, agreement statistics, and bias controls. revision: yes

  2. Referee: [Taxonomy] Taxonomy and §3 (or equivalent): the nine smells are asserted to capture inadequate LLM integration practices that undermine system quality, yet the manuscript supplies no external expert validation, quality-impact correlation study, or comparison against independent oracles beyond the authors' internal definitions; this makes both the 73.5% prevalence figure and the tool's mapping dependent on unverified internal consistency.

    Authors: The taxonomy consolidates and refines definitions from our earlier published work on LLM code smells, where initial examples were drawn from real-world LLM usage patterns reported in the literature. We did not perform a new external expert survey or correlation study in this manuscript. We will add a limitations paragraph acknowledging this and noting that future work could include such validation. The prevalence and tool results are presented as tied to the stated definitions; we will make this dependency explicit in the revised text. revision: partial
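
The first response says inter-rater agreement was measured with Cohen's kappa but leaves the value unreported. For readers unfamiliar with the statistic, here is a minimal sketch of how it is computed for two raters. The labels are hypothetical and say nothing about the authors' actual agreement.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten files.
rater_a = ["smell", "smell", "clean", "clean", "smell",
           "clean", "clean", "smell", "clean", "clean"]
rater_b = ["smell", "clean", "clean", "clean", "smell",
           "clean", "smell", "smell", "clean", "clean"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

In this example the raters agree on 8 of 10 items (0.80 observed) against 0.52 expected by chance, giving a kappa of about 0.58. Raw agreement alone would overstate reliability, which is presumably why the referee asks for the statistic rather than a simple agreement rate.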

Circularity Check

0 steps flagged

No circularity: empirical metrics from external projects are independent of author definitions

Full rationale

The paper's central results (prevalence 73.5%, precision 91.3%, recall 71.8%) are obtained by applying SpecDetect4LLM to 692 external open-source projects and counting matches against the nine author-defined smells. No equations, fitted parameters, or self-referential reductions appear in the provided text; the taxonomy is presented as self-contained and the evaluation numbers are direct empirical counts rather than predictions derived from the definitions themselves. The mention of 'earlier work' is a normal citation and does not carry the load-bearing claim. This is a standard empirical software-engineering study whose reported figures do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that the defined smells are meaningful indicators of poor LLM integration and that static rules can detect them reliably; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: LLM code smells can be identified through static source code analysis rules
    The paper states that SpecDetect4LLM is a static source code analysis tool for detection of the smells.

pith-pipeline@v0.9.0 · 5713 in / 1242 out tokens · 32996 ms · 2026-05-25T05:42:09.687032+00:00 · methodology


