
arxiv: 2604.16359 · v2 · pith:DCYCOXISnew · submitted 2026-03-18 · 💻 cs.SE

LLM4Log: A Systematic Review of Large Language Model-based Log Analysis

Pith reviewed 2026-05-21 09:55 UTC · model grok-4.3

classification 💻 cs.SE
keywords: large language models, log analysis, systematic review, anomaly detection, log parsing, root cause analysis, software reliability, AIOps

The pith

LLM techniques now cover the full log analysis pipeline from statement generation to anomaly detection and root cause analysis, per a review of 145 papers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper performs a systematic review to map the use of large language models across software log analysis. It examines the complete pipeline starting with logging statement generation and maintenance, moving through log parsing and structuring, and ending with downstream tasks such as anomaly detection, failure prediction, root cause analysis, and log summarization. The review extracts common design patterns including prompting, retrieval grounding, fine-tuning, tool augmentation, and verification, while also assessing evaluation methods, datasets, and reproducibility. A reader would care because logs drive reliability in large systems yet remain hard to analyze at scale, and LLMs introduce both new capabilities and risks such as hallucinations or high costs.

Core claim

By following a structured search and screening protocol completed in November 2025, the authors identified 145 unique papers and organized them through a unified task-driven taxonomy. They summarize recurring design patterns such as prompting with in-context learning, retrieval grounding, fine-tuning, tool and agent augmentation, and output verification. The review further analyzes evaluation practices, datasets, metrics, and reproducibility issues, then derives key lessons and open challenges centered on robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.

What carries the argument

The end-to-end pipeline from upstream logging-statement generation and maintenance to log parsing/structuring and downstream tasks, organized by a unified task-driven taxonomy across seven logging tasks.

If this is right

  • LLMs enable semantic generalization across evolving and semi-structured logs where traditional methods struggle.
  • Real-world deployment must address context limits, latency, cost, privacy constraints, and hallucinations.
  • Verification and grounding mechanisms are required to produce faithful outputs suitable for operators.
  • Evaluation practices need greater focus on long-tail events and robustness under data drift.
  • Standardization of datasets and metrics would improve reproducibility across studies.
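
One concrete form the verification point above could take is a check that an LLM-proposed log template actually reproduces the raw line it was derived from. The sketch below is illustrative only: the `<*>` wildcard convention, the function names, and the sample log line are assumptions made for this example and are not taken from the paper.

```python
import re

WILDCARD = "<*>"  # assumed placeholder convention, not taken from the paper


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template such as 'Received block <*> of size <*>' into a regex."""
    parts = [re.escape(p) for p in template.split(WILDCARD)]
    # Each wildcard becomes one capturing group for a whitespace-free token.
    return re.compile("^" + r"(\S+)".join(parts) + "$")


def verify_template(raw_line: str, template: str):
    """Return the extracted parameters if the template fits the line, else None."""
    match = template_to_regex(template).match(raw_line)
    return list(match.groups()) if match else None


# Hypothetical log line and two candidate templates.
line = "Received block blk_123 of size 67108864 from 10.0.0.5"
good = "Received block <*> of size <*> from <*>"
bad = "Received packet <*> of size <*> from <*>"  # invented token "packet"

print(verify_template(line, good))  # ['blk_123', '67108864', '10.0.0.5']
print(verify_template(line, bad))   # None
```

A check like this only catches templates that contradict the raw text; it says nothing about whether the split between constants and variables is the right one.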

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid approaches that pair LLMs with conventional structured parsers could reduce hallucinations while retaining semantic strengths.
  • Industry teams may need explicit decision criteria for choosing LLM-based log tools over established statistical methods in production.
  • Targeted experiments could measure how specific verification techniques affect faithfulness on public log datasets under controlled drift.
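
The hybrid idea in the first bullet can be sketched as a parser that consults a conventional template cache first and calls a model only on a miss, accepting the model's proposal only if it fits the line. Everything here is hypothetical: the template store, the `parse` routing, and `fake_llm`, which stands in for a real model call so the example runs offline.

```python
import re
from typing import Callable, Optional

# Hypothetical template cache; "<*>" marks a variable field (assumed convention).
KNOWN_TEMPLATES = ["Connection from <*> closed", "Disk usage at <*> percent"]


def matches(template: str, line: str) -> bool:
    """True if the line fits the template with each wildcard as one token."""
    pattern = r"\S+".join(re.escape(p) for p in template.split("<*>"))
    return re.fullmatch(pattern, line) is not None


def parse(line: str, llm_propose: Callable[[str], str]) -> Optional[str]:
    """Try cached templates first; ask the model only on a miss, then verify."""
    for template in KNOWN_TEMPLATES:
        if matches(template, line):
            return template
    candidate = llm_propose(line)
    if matches(candidate, line):  # reject proposals that do not fit the line
        KNOWN_TEMPLATES.append(candidate)
        return candidate
    return None  # unverified proposal: leave the line for human review


def fake_llm(line: str) -> str:
    """Stand-in for a model call: masks every token that contains a digit."""
    return " ".join("<*>" if any(c.isdigit() for c in t) else t for t in line.split())


print(parse("Connection from 10.0.0.1 closed", fake_llm))  # cache hit
print(parse("Job 42 finished", fake_llm))                  # verified model proposal
```

Routing known patterns around the model is one way to limit cost and latency; whether it also reduces hallucinations in practice is the kind of question the third bullet proposes to test.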

Load-bearing premise

The structured search and manual screening protocol completed in November 2025 captured a representative and unbiased set of 145 papers without significant omissions.

What would settle it

An independent replication of the search protocol that yields substantially more or fewer papers or reveals major relevant works omitted from the collection would show the review does not represent the full literature.
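
A replication of this kind could be summarized with a simple set comparison between the original corpus and the re-collected one. The sketch below assumes each paper is reduced to a stable identifier (for example a DOI or arXiv ID); the identifiers and the function name are placeholders, not data from the review.

```python
from __future__ import annotations


def compare_corpora(original: set[str], replication: set[str]) -> dict:
    """Report papers found by only one search, plus the Jaccard overlap."""
    shared = original & replication
    union = original | replication
    return {
        "missed_by_original": sorted(replication - original),
        "missed_by_replication": sorted(original - replication),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder identifiers, purely illustrative.
original = {"paper-a", "paper-b", "paper-c"}
replication = {"paper-b", "paper-c", "paper-d", "paper-e"}

report = compare_corpora(original, replication)
print(report["missed_by_original"])     # ['paper-d', 'paper-e']
print(report["missed_by_replication"])  # ['paper-a']
print(report["jaccard"])                # 0.4
```

What counts as a "substantial" difference is a judgment call that the page does not quantify; the overlap score only makes the comparison explicit.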

Figures

Figures reproduced from arXiv: 2604.16359 by Jinqiu Yang, Tse-Hsun Chen, Zeyang Ma.

Figure 1: LLM-based Log Analysis Across the Pipeline: From Logging to Downstream Tasks
Figure 2: Temporal and overall distribution of LLM4Log analysis papers across tasks. Note: Pie chart labels are shown as n (u),
Figure 3: An example log parsing result from Hadoop.
Figure 4: Common workflow for LLM-enabled downstream log analysis.
Figure 5: An illustrative overview of log window representations: raw logs are parsed into templates and parameters, then
Original abstract

Software systems generate massive, evolving, semi-structured logs that are central to reliability engineering and AIOps, yet difficult to analyze at scale under drift and limited labels. Recent advances in pretrained Transformer models and instruction-tuned large language models (LLMs) have reshaped log analysis by enabling semantic generalization and cross-source evidence integration, but also introducing deployment risks such as context limits, latency and cost, privacy constraints, and hallucinations. This paper presents LLM4Log, a systematic review of LLM-based log analysis across the end-to-end pipeline, from upstream logging-statement generation and maintenance to log parsing/structuring and downstream tasks including anomaly detection, failure prediction, root cause analysis, and log summarization. Following a structured search and manual screening protocol, we completed literature collection in November 2025 and identified 145 unique papers across seven logging tasks. We organize the research area through a unified, task-driven taxonomy, summarize common design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, and verification), and analyze evaluation practices, datasets, metrics, and reproducibility. Based on these cross-paper analyses, we summarize key lessons and open challenges for reliable real-world adoption. We emphasize robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. This paper presents LLM4Log, a systematic review of LLM-based log analysis covering the full pipeline from upstream logging-statement generation and maintenance through log parsing/structuring to downstream tasks such as anomaly detection, failure prediction, root cause analysis, and log summarization. Following a structured search and manual screening protocol completed in November 2025, the authors identify 145 unique papers, organize them via a task-driven taxonomy, summarize design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, verification), analyze evaluation practices/datasets/metrics/reproducibility, and distill key lessons plus open challenges for reliable adoption, with emphasis on robustness under drift, grounding/faithfulness, and verifiable deployment.

Significance. If the 145-paper corpus is representative, the review would be a useful synthesis of an emerging area at the intersection of LLMs and AIOps/reliability engineering. It could help researchers and practitioners by unifying disparate tasks under one taxonomy, cataloging reusable design patterns, and surfacing cross-cutting issues such as context limits, hallucinations, and long-tail robustness. The focus on reproducibility, datasets, and deployment-oriented challenges adds practical value beyond a simple enumeration of papers.

major comments (1)
  1. [§3] §3 (Literature Search and Screening): The central claim that the structured search and manual screening protocol yielded a representative set of 145 unique papers is load-bearing for the taxonomy, lessons, and open challenges. However, the manuscript does not provide the exact search strings, queried databases, time bounds, full inclusion/exclusion criteria, or inter-rater reliability statistics. Without these details it is impossible to assess selection bias or omissions (e.g., recent arXiv-only or non-English work), directly undermining confidence in the completeness of the synthesis.
minor comments (2)
  1. [Abstract, §1] Abstract and §1: The literature collection date is given as November 2025. Clarify whether this is the actual completion date or a projected one, and ensure consistency with the submission timeline.
  2. [Throughout] Throughout: Some citations to the 145 papers appear only in tables or supplementary material; ensure every referenced work is explicitly cited in the main text on first mention for traceability.
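
The inter-rater reliability statistic requested in the major comment is commonly reported as Cohen's kappa when two screeners label the same candidate papers. The following is a minimal sketch of that calculation; the include/exclude decisions are invented for illustration and do not come from the manuscript, which (per the referee) reports no such statistic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented screening decisions for eight candidate papers.
screener_1 = ["inc", "inc", "exc", "exc", "inc", "exc", "inc", "exc"]
screener_2 = ["inc", "exc", "exc", "exc", "inc", "exc", "inc", "inc"]

kappa = cohens_kappa(screener_1, screener_2)
print(kappa)  # 0.5
```

Here the screeners agree on 6 of 8 papers (0.75) against a chance agreement of 0.5, which gives a kappa of 0.5.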

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential utility of LLM4Log as a synthesis of an emerging area. We agree that methodological transparency is essential for a systematic review and will revise the manuscript accordingly to address the concern raised.

read point-by-point responses
  1. Referee: [§3] §3 (Literature Search and Screening): The central claim that the structured search and manual screening protocol yielded a representative set of 145 unique papers is load-bearing for the taxonomy, lessons, and open challenges. However, the manuscript does not provide the exact search strings, queried databases, time bounds, full inclusion/exclusion criteria, or inter-rater reliability statistics. Without these details it is impossible to assess selection bias or omissions (e.g., recent arXiv-only or non-English work), directly undermining confidence in the completeness of the synthesis.

    Authors: We acknowledge that the current version of the manuscript does not include the full details of the search protocol, which limits the ability to evaluate potential selection biases or omissions. We agree this information is necessary to support the claim of a representative corpus. In the revised manuscript we will expand Section 3 (and add an appendix if space is constrained) to report: the precise search strings employed across each database, the complete list of queried sources (arXiv, Google Scholar, ACM Digital Library, IEEE Xplore, and any others), the time bounds (literature collected through November 2025), the full inclusion and exclusion criteria applied at each screening stage, and any inter-rater reliability statistics or justification for their absence. These additions will directly address concerns about representativeness, including coverage of recent arXiv-only or non-English work. revision: yes

Circularity Check

0 steps flagged

No circularity in systematic literature review synthesis

full rationale

This paper is a systematic review that identifies and synthesizes 145 external papers on LLM-based log analysis via a described structured search and manual screening protocol completed in November 2025. No mathematical derivations, fitted parameters, predictions, or self-referential definitions appear in the provided text. The central claims report on external literature findings, design patterns, and challenges rather than reducing any result to the authors' own inputs or prior self-citations by construction. The search protocol is presented as an independent method for literature collection and does not exhibit self-definition or load-bearing reduction to unverified self-references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The review rests on standard systematic-review methodology assumptions rather than new free parameters or invented entities.

axioms (1)
  • domain assumption: A structured search combined with manual screening can produce a representative sample of the LLM-for-log-analysis literature.
    Invoked when stating that 145 unique papers were identified after literature collection in November 2025.



Reference graph

Works this paper leans on

224 extracted references · 224 canonical work pages · 28 internal anchors

  1. [1]

    Llm-based event log analysis techniques: A survey.arXiv preprint arXiv:2502.00677, 2025

    Siraaj Akhtar, Saad Khan, and Simon Parkinson. Llm-based event log analysis techniques: A survey.arXiv preprint arXiv:2502.00677, 2025

  2. [2]

    Logfit: Log anomaly detection using fine-tuned language models.IEEE Transactions on Network and Service Management, 21(2):1715–1723, 2024

    Crispin Almodovar, Fariza Sabrina, Sarvnaz Karimi, and Salahuddin Azad. Logfit: Log anomaly detection using fine-tuned language models.IEEE Transactions on Network and Service Management, 21(2):1715–1723, 2024

  3. [3]

    Apache log4j 2

    Apache Software Foundation. Apache log4j 2. Online documentation, 2024. URL https://logging.apache.org/log4j/2.x/

  4. [4]

    A comparative study on large language models for log parsing

    Merve Astekin, Max Hort, and Leon Moonen. A comparative study on large language models for log parsing. InProceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024), pages 36–47. ACM, 2024. doi: 10.1145/3674805.3686684. URL https://doi.org/10.1145/3674805.3686684

  5. [5]

    AnomalyExplainerBot: Explainable AI for LLM-based anomaly detection using BERTViz & Captum, 2025

    Prasasthy Balasubramanian, Dumindu Kankanamge, Ekaterina Gilman, and Mourad Oussalah. AnomalyExplainerBot: Explainable AI for LLM-based anomaly detection using BERTViz & Captum, 2025

  6. [6]

    System log parsing with large language models: A review.arXiv preprint arXiv:2504.04877, 2025

    Viktor Beck, Max Landauer, Markus Wurzenberger, Florian Skopik, and Andreas Rauber. System log parsing with large language models: A review.arXiv preprint arXiv:2504.04877, 2025. 37

  7. [7]

    APT-LLM: Embedding-based anomaly detection of cyber advanced persistent threats using large language models

    Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, and Talal Rahwan. APT-LLM: Embedding-based anomaly detection of cyber advanced persistent threats using large language models. InProceedings of the 13th International Symposium on Digital Forensics and Security (ISDFS 2025), pages 1–6, 2025. doi: 10.1109/ISDFS65363.2025.11011912

  8. [8]

    Learning long-term dependencies with gradient descent is difficult.IEEE Transactions on Neural Networks, 5(2):157–166, 1994

    Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult.IEEE Transactions on Neural Networks, 5(2):157–166, 1994. doi: 10.1109/72.279181

  9. [9]

    Auto-logging: Ai-centred logging instrumentation

    Jasmin Bogatinovski and Odej Kao. Auto-logging: Ai-centred logging instrumentation. In2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 95–100, 2023. doi: 10 .1109/ICSE-NIER58687.2023.00023. URL https://doi.org/10.1109/ICSE-NIER58687.2023.00023

  10. [10]

    Good enough to learn: LLM-based anomaly detection in ECU logs without reliable labels, 2025

    Bogdan Bogdan, Arina Cazacu, and Laura Vasilie. Good enough to learn: LLM-based anomaly detection in ECU logs without reliable labels, 2025. URL https://arxiv.org/abs/2507.01077. Accepted to IEEE Intelligent Vehicles Symposium (IV) 2025

  11. [11]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Jeff Dean, et al. On the opportunities and risks of foundation m...

  12. [12]

    Language models are few-shot learners.Advances in neural information processing systems, 33: 1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33: 1877–1901, 2020

  13. [13]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021. URL https://www .usenix.org/conference/...

  14. [14]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  15. [15]

    Bert-log: Anomaly detection for system logs based on pre-trained language model.Applied Artificial Intelligence, 36(1):2145642, 2022

    Song Chen and Hai Liao. Bert-log: Anomaly detection for system logs based on pre-trained language model.Applied Artificial Intelligence, 36(1):2145642, 2022. doi: 10 .1080/08839514.2022.2145642. URL https://www .tandfonline.com/doi/full/10.1080/08839514.2022.2145642

  16. [16]

    Epas: Efficient online log parsing via asynchronous scheduling of llm queries

    Xiaolei Chen, Jie Shi, Jia Chen, Peng Wang, and Wei Wang. Epas: Efficient online log parsing via asynchronous scheduling of llm queries. InProceedings of the 41st IEEE International Conference on Data Engineering (ICDE 2025), pages 4025–4037. IEEE, 2025. doi: 10.1109/ICDE.2025.00318. URL https://ieeexplore.ieee.org/abstract/document/11113127

  17. [17]

    Automatic root cause analysis via large language models for cloud incidents

    Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, et al. Automatic root cause analysis via large language models for cloud incidents. InProceedings of the Nineteenth European Conference on Computer Systems, pages 674–688, 2024

  18. [18]

    Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei

    Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, volume 30, 2017. URL https://papers .neurips.cc/paper/7017-deep- reinforcement-learning-from-human-preferences

  19. [19]

    ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

    Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. Electra: Pre-training text encoders as discriminators rather than generators.arXiv preprint arXiv:2003.10555, 2020

  20. [20]

    Semantic hierarchical classification applied to anomaly detection using system logs with a bert model.Applied Sciences, 14(13), 2024

    Clara Corbelle, Victor Carneiro, and Fidel Cacheda. Semantic hierarchical classification applied to anomaly detection using system logs with a bert model.Applied Sciences, 14(13), 2024. ISSN 2076-3417. doi: 10 .3390/app14135388. URL https://www .mdpi.com/2076- 3417/14/13/5388

  21. [21]

    Aetherlog: Log-based root cause analysis by integrating large language models with knowledge graphs

    Tianyu Cui, Ruowei Fu, Changchang Liu, Yuhe Ji, Wenwei Gu, Shenglin Zhang, Yongqian Sun, and Dan Pei. Aetherlog: Log-based root cause analysis by integrating large language models with knowledge graphs. In2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE), pages 49–60. IEEE, 2025

  22. [22]

    Logeval: A comprehensive benchmark suite for llms in log analysis.Empirical Softw

    Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Chenyu Zhao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, and Dan Pei. Logeval: A comprehensive benchmark suite for llms in log analysis.Empirical Softw. Engg., 30(6), October 2025. ISSN 1382-3256. doi: 10 .1007/s10664-025-10701-6. URL https://doi .or...

  23. [23]

    Logram: Efficient log parsing using𝑛 n-gram dictionaries

    Hetong Dai, Heng Li, Che-Shao Chen, Weiyi Shang, and Tse-Hsun Chen. Logram: Efficient log parsing using𝑛 n-gram dictionaries. IEEE Transactions on Software Engineering, 48(3):879–892, 2020

  24. [24]

    loguru: Python logging made simple

    Delgan and contributors. loguru: Python logging made simple. Online documentation, 2024. URL https://github .com/Delgan/loguru. 38

  25. [25]

    QLoRA: Efficient Finetuning of Quantized LLMs

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms, 2023. URL https://arxiv.org/abs/2305.14314

  26. [26]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  27. [27]

    Pdlogger: Automated logging framework for practical software development, 2025

    Shengcheng Duan, Yihua Xu, Sheng Zhang, Shen Wang, and Yue Duan. Pdlogger: Automated logging framework for practical software development, 2025. URL https://arxiv.org/abs/2507.19951

  28. [28]

    Unsupervised log parsing based on large language models and entropy

    Yiqi Duan, Jianliang Xu, Changyu Fan, and Zixin Liu. Unsupervised log parsing based on large language models and entropy. In2025 11th International Symposium on System Security, Safety, and Reliability (ISSSR), pages 1–10. IEEE, 2025

  29. [29]

    Early exploration of using ChatGPT for log-based anomaly detection on parallel file systems logs

    Chris Egersdoerfer, Di Zhang, and Dong Dai. Early exploration of using ChatGPT for log-based anomaly detection on parallel file systems logs. InProceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), pages 315–316. Association for Computing Machinery, 2023. doi: 10.1145/3588195.3595943. URL https:...

  30. [30]

    Insightai: Root cause analysis in large log files with private data using large language model

    Maryam Ekhlasi, Anurag Prakash, Maxime Lamothe, and Michel Dagenais. Insightai: Root cause analysis in large log files with private data using large language model. In2025 IEEE/ACM 4th International Conference on AI Engineering – Software Engineering for AI (CAIN), pages 31–41, 2025. doi: 10.1109/CAIN66642.2025.00012

  31. [31]

    A novel framework for detecting anomalies in network security using LLM and deep learning.Journal of Electrical Systems, 21(1s):294–302, 2025

    Mahmoudreza Entezami, Shahabeddin Rahimi Harsini, David Houshangi, and Zahra Entezami. A novel framework for detecting anomalies in network security using LLM and deep learning.Journal of Electrical Systems, 21(1s):294–302, 2025. doi: 10 .52783/jes.8791

  32. [32]

    How Contextual are Contextualized Word Representations? C omparing the Geometry of BERT , ELM o, and GPT -2 Embeddings

    Kawin Ethayarajh. How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Confere...

  33. [33]

    Regulation (eu) 2016/679 of the european parliament and of the council (general data protection regulation)

    European Parliament and Council of the European Union. Regulation (eu) 2016/679 of the european parliament and of the council (general data protection regulation). Official Journal of the European Union, L119, 2016. URL https://eur-lex.europa.eu/eli/reg/2016/679/oj

  34. [34]

    Log anomaly detection by leveraging llm-based parsing and embedding with attention mechanism

    Asma Fariha, Vida Gharavian, Masoud Makrehchi, Shahryar Rahnamayan, Sanaa Alwidian, and Akramul Azim. Log anomaly detection by leveraging llm-based parsing and embedding with attention mechanism. In2024 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pages 859–863. IEEE, 2024

  35. [35]

    Codebert: A pre-trained model for programming and natural languages

    Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. Codebert: A pre-trained model for programming and natural languages. InFindings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547. Association for Computational Linguistics, 2020. doi: 10 .18653/...

  36. [36]

    Security and privacy controls for information systems and organizations (nist special publication 800-53 revision 5),

    Joint Task Force. Security and privacy controls for information systems and organizations (nist special publication 800-53 revision 5),

  37. [37]

    URL https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final

  38. [38]

    Amy Foster and Selva Kumar.COMBINING LLMS AND SHELL LOGS TO PREDICT BACKUP FAILURES. 07 2025

  39. [39]

    Where do developers log? an empirical study on logging practices in industry

    Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie. Where do developers log? an empirical study on logging practices in industry. InCompanion Proceedings of the 36th International Conference on Software Engineering, pages 24–33. ACM, 2014. doi: 10.1145/2591062.2591175

  40. [40]

    End-to-end log statement generation at block-level.Journal of Systems and Software, 216:112146, 2024

    Ying Fu, Meng Yan, Pinjia He, Chao Liu, Xiaohong Zhang, and Dan Yang. End-to-end log statement generation at block-level.Journal of Systems and Software, 216:112146, 2024. doi: 10.1016/j.jss.2024.112146

  41. [41]

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christopher Endres, Thorsten Holz, and Mario Fritz. Not what you ´ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, 2023. URL https://arxiv.org/abs/2302.12173

  42. [42]

    LogLLM: Log-based anomaly detection using large language models.arXiv preprint,

    Wei Guan, Jian Cao, Shiyou Qian, and Jianqi Gao. LogLLM: Log-based anomaly detection using large language models.arXiv preprint,

  43. [43]

    URL https://arxiv.org/abs/2411.08561

  44. [44]

    H. Guo, S. Yuan, and X. Wu. Logbert: Log anomaly detection via bert. In2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021

  45. [45]

    Logformer: A pre-train and tuning pipeline for log anomaly detection

    Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, and Qi Tian. Logformer: A pre-train and tuning pipeline for log anomaly detection. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 135–143, 2024

  46. [46]

    Owl: A large language model for it operations

    Hongcheng Guo, Jian Yang, Jiaheng Liu, Liqun Yang, Linzheng Chai, Jiaqi Bai, Junran Peng, Xiaorong Hu, Chao Chen, Dongfeng Zhang, Xu Shi, Tieqiao Zheng, Liangfan Zheng, Bo Zhang, Ke Xu, and Zhoujun Li. Owl: A large language model for it operations. InProceedings of the Twelfth International Conference on Learning Representations (ICLR 2024), 2024. URL htt...

  47. [47]

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 8342–8360. Association for Computational Linguistics, 2020. doi: 10 .18653/v1/2...

  48. [48]

    Llm meets ml: Data-efficient anomaly detection on unstable logs.ACM Transactions on Software Engineering and Methodology, 2025

    Fatemeh Hadadi, Qinghua Xu, Domenico Bianculli, and Lionel Briand. Llm meets ml: Data-efficient anomaly detection on unstable logs.ACM Transactions on Software Engineering and Methodology, 2025. doi: 10 .1145/3771283. URL https://doi .org/10.1145/3771283. arXiv:2406.07467

  49. [49]

    Llmelog: An approach for anomaly detection based on llm-enriched log events

    Minghua He, Tong Jia, Chiming Duan, Huaqian Cai, Ying Li, and Gang Huang. Llmelog: An approach for anomaly detection based on llm-enriched log events. In2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 132–143. IEEE, 2024

  50. [50]

    Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. Drain: An online log parsing approach with fixed depth tree. In2017 IEEE International Conference on Web Services (ICWS), pages 33–40, 2017. doi: 10.1109/ICWS.2017.13

  51. [51]

    A survey on automated log analysis for reliability engineering.ACM computing surveys (CSUR), 54(6):1–37, 2021

    Shilin He, Pinjia He, Zhuangbin Chen, Tianyi Yang, Yuxin Su, and Michael R Lyu. A survey on automated log analysis for reliability engineering.ACM computing surveys (CSUR), 54(6):1–37, 2021

  52. [52]

    Parameter-efficient log anomaly detection based on pre-training model and lora

    Shiming He, Ying Lei, Ying Zhang, Kun Xie, and Pradip Kumar Sharma. Parameter-efficient log anomaly detection based on pre-training model and lora. In2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pages 207–217. IEEE, 2023

  53. [53]

    Benchmarking open-source large language models for log level suggestion

    Yi Wen Heng, Zeyang Ma, Zhenhao Li, Dong Jae Kim, and Tse-Hsun Chen. Benchmarking open-source large language models for log level suggestion. In2025 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 314–325, 2025. doi: 10.1109/ICST62969.2025.10988921

  54. [54]

    Diagnosing robotics systems issues with large language models–a case study

    Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, and Mark Niklas Mueller. Diagnosing robotics systems issues with large language models–a case study. InICLR 2025 Workshop on Foundation Models in the Wild

  55. [55]

    Long short-term memory.Neural Computation, 9(8):1735–1780, 1997

    Sepp Hochreiter and J"urgen Schmidhuber. Long short-term memory.Neural Computation, 9(8):1735–1780, 1997. doi: 10 .1162/ neco.1997.9.8.1735

  56. [56]

    Reguly, Kálmán Tornai, Tamás Zsedrovits, and Zoltán Máthé

    András Horváth, András Oláh, Attila Pintér, Bálint Siklósi, Gergely Lukács, István Z. Reguly, Kálmán Tornai, Tamás Zsedrovits, and Zoltán Máthé. Anomaly detection algorithms for real-time log data analysis at scale.IEEE Access, 13:136288–136311, 2025. doi: 10.1109/ACCESS.2025.3565575

  57. [57]

    Research on log anomaly detection based on sentence-BERT

    Changze Hu, Yu Fang, Jinhua Wu, Haoyang Li, and Geng Wang. Research on log anomaly detection based on sentence-BERT. Electronics, 12(17):3580, 2023. doi: 10.3390/electronics12173580. URL https://www.mdpi.com/2079-9292/12/17/3580

  58. [58]

    Demystifying and extracting fault-indicating information from logs for failure diagnosis

    Junjie Huang, Zhihan Jiang, Jinyang Liu, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Cong Feng, Hui Dong, Zengyin Yang, and Michael R Lyu. Demystifying and extracting fault-indicating information from logs for failure diagnosis. In 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 511–522. IEEE, 2024

  59. [59]

    Junjie Huang, Zhihan Jiang, Zhuangbin Chen, and Michael R. Lyu. No more labelled examples? an unsupervised log parser with llms. In Proceedings of the ACM on Software Engineering (FSE 2025), number FSE. ACM, 2025. doi: 10.1145/3729377. URL https://dl.acm.org/doi/10.1145/3729377

  60. [60]

    Hitanomaly: Hierarchical transformers for anomaly detection in system log

    Shaohan Huang, Yi Liu, Carol J. Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan. Hitanomaly: Hierarchical transformers for anomaly detection in system log. IEEE Transactions on Network and Service Management, 17(2):2064–2076, 2020. doi: 10.1109/TNSM.2020.3034647

  61. [61]

    Improving log-based anomaly detection by pre-training hierarchical transformers

    Shaohan Huang, Yi Liu, Carol J. Fung, He Wang, Hailong Yang, and Zhongzhi Luan. Improving log-based anomaly detection by pre-training hierarchical transformers. IEEE Transactions on Computers, 72(9):2656–2667, 2023. doi: 10.1109/TC.2023.3257518

  62. [62]

    LogRules: Enhancing log analysis capability of large language models through rules

    Xin Huang, Ting Zhang, and Wen Zhao. LogRules: Enhancing log analysis capability of large language models through rules. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 452–470, Albuquerque, New Mexico, April 2025. Association for Computational Linguistics. ISBN 979-8-89176-...

  63. [63]

    Xuanbo Huang, Kaiping Xue, Lutong Chen, Jiangping Han, Jian Li, and David S. L. Wei. Forensix: Automated network forensics and diagnostics for beyond-5g and 6g networks using large language models. IEEE Network, 39(5):74–80, 2025. doi: 10.1109/MNET.2025.3579925

  64. [64]

    Yintong Huo, Cheryl Lee, Yuxin Su, Shiwen Shan, Jinyang Liu, and Michael R. Lyu. Evlog: Evolving log analyzer for anomalous logs identification. arXiv preprint arXiv:2306.01509, 2023. URL https://arxiv.org/abs/2306.01509

  65. [65]

    Fall: Prior failure detection in large scale system based on language model

    Jaeyoon Jeong, Insung Baek, Byungwoo Bang, Junyeon Lee, Uiseok Song, and Seoung Bum Kim. Fall: Prior failure detection in large scale system based on language model. IEEE Transactions on Dependable and Secure Computing, 22(1):279–291, 2025. doi: 10.1109/TDSC.2024.3396166

  66. [66]

    Lemad: LLM-empowered multi-agent system for anomaly detection in power grid services

    Xin Ji, Le Zhang, Wenya Zhang, Fang Peng, Yifan Mao, Xingchuang Liao, and Kui Zhang. Lemad: LLM-empowered multi-agent system for anomaly detection in power grid services. Electronics, 14(15):3008, 2025. doi: 10.3390/electronics14153008. URL https://doi.org/10.3390/electronics14153008

  67. [67]

    Adapting large language models to log analysis with interpretable domain knowledge

    Yuhe Ji, Yilun Liu, Feiyu Yao, Minggui He, Shimin Tao, Xiaofeng Zhao, Chang Su, Xinhua Yang, Weibin Meng, Yuming Xie, Boxing Chen, Shenglin Zhang, and Yongqian Sun. Adapting large language models to log analysis with interpretable domain knowledge. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, CIKM ’25, pa...

  68. [68]

    Survey of hallucination in natural language generation

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tianle Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation, 2023. URL https://arxiv.org/abs/2304.04710

  69. [69]

    Zhihan Jiang, Jinyang Liu, Zhuangbin Chen, Yichen Li, Junjie Huang, Yintong Huo, Pinjia He, Jiazhen Gu, and Michael R. Lyu. Lilac: Log parsing using llms with adaptive parsing cache. Proceedings of the ACM on Software Engineering, 1(FSE), 2024. doi: 10.1145/3643733. URL https://doi.org/10.1145/3643733

  70. [70]

    Language Models (Mostly) Know What They Know

    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Dawn Drain, Nelson Chen, Yuntao Bai, Jared Kaplan, Sam McCandlish, Dario Amodei, Ethan Chen, and Catherine Olsson. Language models (mostly) know what they know, 2022. URL https://arxiv.org/abs/2207.05221

  71. [71]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. URL https://arxiv.org/abs/2001.08361

  72. [72]

    Exploring semantic vs. syntactic features for unsupervised learning on application log files

    Crystal Karlsen, Denis Copstein, Yue Luo, Benjamin Schwartzentruber, Tim Niblett, and Olivier Rouyer. Exploring semantic vs. syntactic features for unsupervised learning on application log files. In 2023 International Conference on Cyber Security and Networks (CSNet), 2023. doi: 10.1109/CSNet59123.2023.10339765

  73. [73]

    Benchmarking large language models for log analysis, security, and interpretation

    Egil Karlsen, Xiao Luo, Nur Zincir-Heywood, and Malcolm Heywood. Benchmarking large language models for log analysis, security, and interpretation. Journal of Network and Systems Management, 32:59, 2024. doi: 10.1007/s10922-024-09831-x

  74. [74]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2...

  75. [75]

    NIST Special Publication 800-92: Guide to computer security log management

    Karen Kent and Murugiah Souppaya. NIST Special Publication 800-92: Guide to computer security log management, 2006. URL https://csrc.nist.gov/publications/detail/sp/800-92/final

  76. [76]

    Log-based anomaly detection without log parsing

    Van-Hoang Le and Hongyu Zhang. Log-based anomaly detection without log parsing. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM, 2021

  77. [77]

    Log parsing with prompt-based few-shot learning

    Van-Hoang Le and Hongyu Zhang. Log parsing with prompt-based few-shot learning. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023), pages 2438–2449. IEEE, 2023. doi: 10.1109/ICSE48619.2023.00204. URL https://conf.researchr.org/details/icse-2023/icse-2023-technical-track/165/Log-Parsing-with-Prompt-based-Few-shot-Learning

  78. [78]

    Log parsing: How far can chatgpt go?

    Van-Hoang Le and Hongyu Zhang. Log parsing: How far can chatgpt go? In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, ASE ’23, page 1699–1704. IEEE Press, 2024. ISBN 9798350329964. doi: 10.1109/ASE56229.2023.00206. URL https://doi.org/10.1109/ASE56229.2023.00206

  79. [79]

    Prelog: A pre-trained model for log analytics

    Van-Hoang Le and Hongyu Zhang. Prelog: A pre-trained model for log analytics. Proceedings of the ACM on Management of Data, 2(3):1–28, 2024. doi: 10.1145/3654966. URL https://dl.acm.org/doi/10.1145/3654966. Presented at SIGMOD 2024

  80. [80]

    Unleashing the true potential of semantic-based log parsing with pre-trained language models

    Van-Hoang Le, Yi Xiao, and Hongyu Zhang. Unleashing the true potential of semantic-based log parsing with pre-trained language models. In Proceedings of the 47th International Conference on Software Engineering (ICSE 2025). IEEE/ACM, 2025. URL https://conf.researchr.org/details/icse-2025/icse-2025-research-track/80/Unleashing-the-True-Potential-of-Semanti...
