pith. machine review for the scientific record.

arxiv: 2604.16359 · v1 · submitted 2026-03-18 · 💻 cs.SE

Recognition: no theorem link

LLM4Log: A Systematic Review of Large Language Model-based Log Analysis

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 08:18 UTC · model grok-4.3

classification 💻 cs.SE
keywords large language models · log analysis · systematic review · anomaly detection · root cause analysis · log parsing · AIOps · reliability engineering

The pith

A review of 145 papers maps LLM use across seven log analysis tasks and distills patterns plus adoption challenges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Software systems produce large volumes of evolving logs that support reliability engineering yet resist scalable analysis because of concept drift and scarce labels. Large language models address this by enabling semantic understanding and integration of evidence across sources for tasks ranging from log statement generation to anomaly detection, failure prediction, root cause analysis, and summarization. The paper follows a structured search protocol to collect literature through November 2025 and identifies 145 relevant papers. It organizes the work under a single task-driven taxonomy, catalogs recurring design patterns such as prompting, retrieval grounding, fine-tuning, and agent augmentation, and examines evaluation datasets, metrics, and reproducibility. A reader cares because the synthesis surfaces concrete lessons on robustness, faithfulness, and verifiable behavior that must be solved before these models can move from research prototypes to production AIOps systems.

Core claim

Following a structured search and manual screening protocol completed in November 2025, the review identifies 145 unique papers on LLM-based log analysis across seven tasks. It synthesizes the field through a unified task-driven taxonomy, summarizes common design patterns including prompting and in-context learning, retrieval grounding, fine-tuning, tool and agent augmentation, and verification, and analyzes evaluation practices, datasets, metrics, and reproducibility. From these cross-paper findings it distills key lessons and open challenges for reliable real-world adoption, with emphasis on robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.
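Two of those patterns, retrieval grounding and in-context prompting, compose naturally into one pipeline. A minimal sketch, assuming a hypothetical triage assistant; the log lines, helper names, and prompt wording here are invented for illustration, not taken from any surveyed system:

```python
from collections import Counter
import math

def tokens(line: str) -> Counter:
    return Counter(line.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # cosine similarity over bag-of-words token counts
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # retrieval grounding: pull the k historical log lines most similar
    # to the incoming line, to anchor the model in real evidence
    scored = sorted(corpus, key=lambda doc: cosine(tokens(query), tokens(doc)),
                    reverse=True)
    return scored[:k]

def build_prompt(new_log: str, history: list[str]) -> str:
    # prompting + in-context learning: the retrieved lines become the
    # context the model is instructed to reason over
    context = "\n".join(f"- {h}" for h in retrieve(new_log, history))
    return (
        "You are a log triage assistant. Using ONLY the evidence below,\n"
        "say whether the new log line looks anomalous and why.\n"
        f"Evidence:\n{context}\n"
        f"New log line: {new_log}"
    )

history = [
    "INFO dfs.DataNode: block blk_42 served to /10.0.0.5",
    "ERROR dfs.DataNode: block blk_42 checksum verification failed",
    "INFO dfs.FSNamesystem: allocated block blk_99",
]
prompt = build_prompt(
    "ERROR dfs.DataNode: block blk_77 checksum verification failed", history)
```

Production systems surveyed in the review replace the toy bag-of-words retriever with embedding indices, but the shape (retrieve, then prompt over the retrieved evidence) is the same.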

What carries the argument

A unified task-driven taxonomy that classifies LLM-based log analysis research into seven tasks from upstream logging-statement generation through parsing and downstream analysis while cross-cutting common design patterns.
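The parsing stage in that pipeline can be illustrated without any LLM at all. A minimal sketch in the spirit of template-based parsers such as Drain; the masking rules and example lines below are invented for illustration:

```python
import re

# Hypothetical masking rules: replace volatile parameters with a <*>
# placeholder so that lines produced by the same logging statement
# collapse to one shared template. Order matters: IPs before bare numbers.
RULES = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+(:\d+)?\b"), "<*>"),  # IP[:port]
    (re.compile(r"\bblk_-?\d+\b"), "<*>"),                  # HDFS block ids
    (re.compile(r"\b\d+\b"), "<*>"),                        # bare numbers
]

def to_template(line: str) -> str:
    for pattern, repl in RULES:
        line = pattern.sub(repl, line)
    return line

a = to_template("Received block blk_3587 of size 67108864 from 10.251.42.84")
b = to_template("Received block blk_9012 of size 67108864 from 10.251.30.6")
# both collapse to: "Received block <*> of size <*> from <*>"
```

LLM-based parsers surveyed in the review aim at the same template/parameter split, but infer it semantically rather than from hand-written rules.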

If this is right

  • Adoption of the identified design patterns such as retrieval grounding and verification steps can improve output reliability in LLM log analyzers.
  • Evaluation practices must incorporate tests for drift and long-tail events to match real deployment conditions.
  • Grounding mechanisms become necessary for any operator-facing outputs to maintain faithfulness.
  • Reproducibility gaps indicate that shared datasets and standardized benchmarks will accelerate progress.
  • Deployment considerations around latency, cost, and privacy point to hybrid systems that pair LLMs with lighter verification layers.
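The last point can be made concrete. A minimal sketch of such a lighter verification layer, assuming a convention (invented here) that the model back-quotes every log snippet it cites, so faithfulness reduces to a substring check against the source window:

```python
import re

def extract_quoted(answer: str) -> list[str]:
    # pull the back-quoted snippets the model claims come from the logs
    return re.findall(r"`([^`]+)`", answer)

def grounded(answer: str, window: list[str]) -> bool:
    # verification layer: every quoted snippet must appear verbatim in
    # some line of the source log window, else the answer is rejected
    return all(any(q in line for line in window)
               for q in extract_quoted(answer))

window = [
    "2026-03-18 09:14:02 ERROR conn timeout to db-replica-3",
    "2026-03-18 09:14:03 WARN retry 1/5 for db-replica-3",
]
faithful = "Root cause likely `conn timeout to db-replica-3`, then `retry 1/5`."
hallucinated = "The node crashed after `kernel panic at 09:14`."
# grounded(faithful, window) -> True; grounded(hallucinated, window) -> False
```

A check this cheap runs on every response, which is exactly why hybrid designs put it outside the model rather than relying on the model to police itself.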

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy offers a ready structure for future reviews that compare LLM methods directly against earlier non-LLM log analysis techniques.
  • Privacy and context-length constraints may push development of domain-specific smaller models fine-tuned only on log data.
  • Lessons on hallucination risks could transfer to LLM applications in other software engineering tasks such as code review or incident summarization.
  • If the design patterns prove stable, they could serve as a template for LLM pipelines that process other forms of semi-structured operational data.

Load-bearing premise

The structured search protocol and manual screening captured a representative and unbiased sample of all relevant LLM-based log analysis papers published up to November 2025.

What would settle it

A later exhaustive search that locates a substantially larger or materially different set of papers on LLM-based log analysis published before November 2025 would show the collection was incomplete.

Figures

Figures reproduced from arXiv: 2604.16359 by Jinqiu Yang, Tse-Hsun Chen, Zeyang Ma.

Figure 1: LLM-based Log Analysis Across the Pipeline: From Logging to Downstream Tasks.
Figure 2: Temporal and overall distribution of LLM4Log analysis papers across tasks. Note: pie chart labels are shown as n (u), …
Figure 3: An example log parsing result from Hadoop.
Figure 4: Common workflow for LLM-enabled downstream log analysis.
Figure 5: An illustrative overview of log window representations: raw logs are parsed into templates and parameters, then …
Original abstract

Software systems generate massive, evolving, semi-structured logs that are central to reliability engineering and AIOps, yet difficult to analyze at scale under drift and limited labels. Recent advances in pretrained Transformer models and instruction-tuned large language models (LLMs) have reshaped log analysis by enabling semantic generalization and cross-source evidence integration, but also introducing deployment risks such as context limits, latency/cost, privacy constraints, and hallucinations. This paper presents LLM4Log, a systematic review of LLM-based log analysis across the end-to-end pipeline, from upstream logging-statement generation and maintenance to log parsing/structuring and downstream tasks including anomaly detection, failure prediction, root cause analysis, and log summarization. Following a structured search and manual screening protocol, we completed literature collection in November 2025 and identified 145 unique papers across seven logging tasks. We synthesize the research area through a unified, task-driven taxonomy, summarize common design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, and verification), and analyze evaluation practices, datasets, metrics, and reproducibility. Based on these cross-paper analyses, we distill key lessons and open challenges for reliable real-world adoption. We emphasize robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript presents LLM4Log, a systematic review of large language model applications to log analysis in software systems. Following a structured search and manual screening protocol completed in November 2025, the authors identify 145 unique papers across seven tasks (logging statement generation/maintenance, parsing/structuring, anomaly detection, failure prediction, root cause analysis, and summarization). They synthesize the literature into a unified task-driven taxonomy, catalog common design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, verification), analyze evaluation practices/datasets/metrics/reproducibility, and distill lessons plus open challenges centered on drift robustness, output faithfulness, and deployable designs.

Significance. If the screened corpus is representative, the review supplies a timely consolidation of a fast-growing intersection between LLMs and AIOps/reliability engineering. By extracting cross-paper patterns and explicitly linking them to practical risks (context limits, hallucinations, privacy), it offers researchers a shared reference frame and gives practitioners concrete guidance on moving from prototypes to verifiable production systems. The absence of internal circularity and the direct grounding in the 145-paper corpus strengthen its utility as a field map.

minor comments (3)
  1. [§3] §3 (Search Protocol): the PRISMA-style flow diagram would be clearer if it explicitly reported the number of papers excluded at each screening stage rather than only final counts.
  2. [Table 2] Table 2 (Design Patterns): several rows list 'hybrid' approaches without a footnote defining the exact combination criteria, which could confuse readers comparing prompting-only vs. retrieval-augmented entries.
  3. [§5.3] §5.3 (Reproducibility): the discussion of dataset availability would be strengthened by adding a column or supplementary table indicating which of the 145 papers release code or data.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and their recommendation to accept. We are pleased that the review recognizes the timeliness of the systematic consolidation of LLM-based log analysis research and its practical value for both researchers and practitioners in AIOps.

Circularity Check

0 steps flagged

No significant circularity in systematic review synthesis

full rationale

This paper is a systematic literature review that follows standard SE review protocols: structured search, manual screening, and synthesis of 145 external papers into a task-driven taxonomy. No internal mathematical derivations, fitted parameters, self-definitional loops, or load-bearing self-citations exist. The taxonomy, design patterns, and lessons are presented as direct outcomes of the screened external literature rather than reductions to the paper's own inputs. The methodology is self-contained against external benchmarks and does not rely on any circular chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the paper is a literature synthesis that relies on standard systematic-review methodology.

pith-pipeline@v0.9.0 · 5535 in / 1061 out tokens · 33463 ms · 2026-05-15T08:18:09.103339+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

224 extracted references · 224 canonical work pages · 26 internal anchors

  1. [1]

    Llm-based event log analysis techniques: A survey.arXiv preprint arXiv:2502.00677, 2025

    Siraaj Akhtar, Saad Khan, and Simon Parkinson. Llm-based event log analysis techniques: A survey.arXiv preprint arXiv:2502.00677, 2025

  2. [2]

    Logfit: Log anomaly detection using fine-tuned language models.IEEE Transactions on Network and Service Management, 21(2):1715–1723, 2024

    Crispin Almodovar, Fariza Sabrina, Sarvnaz Karimi, and Salahuddin Azad. Logfit: Log anomaly detection using fine-tuned language models.IEEE Transactions on Network and Service Management, 21(2):1715–1723, 2024

  3. [3]

    Apache log4j 2

    Apache Software Foundation. Apache log4j 2. Online documentation, 2024. URL https://logging.apache.org/log4j/2.x/

  4. [4]

    A comparative study on large language models for log parsing

    Merve Astekin, Max Hort, and Leon Moonen. A comparative study on large language models for log parsing. InProceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024), pages 36–47. ACM, 2024. doi: 10.1145/3674805.3686684. URL https://doi.org/10.1145/3674805.3686684

  5. [5]

    AnomalyExplainerBot: Explainable AI for LLM-based anomaly detection using BERTViz & Captum, 2025

    Prasasthy Balasubramanian, Dumindu Kankanamge, Ekaterina Gilman, and Mourad Oussalah. AnomalyExplainerBot: Explainable AI for LLM-based anomaly detection using BERTViz & Captum, 2025

  6. [6]

    System log parsing with large language models: A review.arXiv preprint arXiv:2504.04877, 2025

    Viktor Beck, Max Landauer, Markus Wurzenberger, Florian Skopik, and Andreas Rauber. System log parsing with large language models: A review.arXiv preprint arXiv:2504.04877, 2025. 37

  7. [7]

    APT-LLM: Embedding-based anomaly detection of cyber advanced persistent threats using large language models

    Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, and Talal Rahwan. APT-LLM: Embedding-based anomaly detection of cyber advanced persistent threats using large language models. InProceedings of the 13th International Symposium on Digital Forensics and Security (ISDFS 2025), pages 1–6, 2025. doi: 10.1109/ISDFS65363.2025.11011912

  8. [8]

    Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, and Yejin Choi

    Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult.IEEE Transactions on Neural Networks, 5(2):157–166, 1994. doi: 10.1109/72.279181

  9. [9]

    Auto-logging: Ai-centred logging instrumentation

    Jasmin Bogatinovski and Odej Kao. Auto-logging: Ai-centred logging instrumentation. In2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 95–100, 2023. doi: 10 .1109/ICSE-NIER58687.2023.00023. URL https://doi.org/10.1109/ICSE-NIER58687.2023.00023

  10. [10]

    Good enough to learn: LLM-based anomaly detection in ECU logs without reliable labels, 2025

    Bogdan Bogdan, Arina Cazacu, and Laura Vasilie. Good enough to learn: LLM-based anomaly detection in ECU logs without reliable labels, 2025. URL https://arxiv.org/abs/2507.01077. Accepted to IEEE Intelligent Vehicles Symposium (IV) 2025

  11. [11]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Jeff Dean, et al. On the opportunities and risks of foundation m...

  12. [12]

    Language models are few-shot learners.Advances in neural information processing systems, 33: 1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33: 1877–1901, 2020

  13. [13]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021. URL https://www .usenix.org/conference/...

  14. [14]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  15. [15]

    Bert-log: Anomaly detection for system logs based on pre-trained language model.Applied Artificial Intelligence, 36(1):2145642, 2022

    Song Chen and Hai Liao. Bert-log: Anomaly detection for system logs based on pre-trained language model.Applied Artificial Intelligence, 36(1):2145642, 2022. doi: 10 .1080/08839514.2022.2145642. URL https://www .tandfonline.com/doi/full/10.1080/08839514.2022.2145642

  16. [16]

    Epas: Efficient online log parsing via asynchronous scheduling of llm queries

    Xiaolei Chen, Jie Shi, Jia Chen, Peng Wang, and Wei Wang. Epas: Efficient online log parsing via asynchronous scheduling of llm queries. InProceedings of the 41st IEEE International Conference on Data Engineering (ICDE 2025), pages 4025–4037. IEEE, 2025. doi: 10.1109/ICDE.2025.00318. URL https://ieeexplore.ieee.org/abstract/document/11113127

  17. [17]

    Automatic root cause analysis via large language models for cloud incidents

    Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, et al. Automatic root cause analysis via large language models for cloud incidents. InProceedings of the Nineteenth European Conference on Computer Systems, pages 674–688, 2024

  18. [18]

    Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei

    Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, volume 30, 2017. URL https://papers .neurips.cc/paper/7017-deep- reinforcement-learning-from-human-preferences

  19. [19]

    Electra: Pre-training text encoders as discriminators rather than generators.arXiv preprint arXiv:2003.10555, 2020

    Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. Electra: Pre-training text encoders as discriminators rather than generators.arXiv preprint arXiv:2003.10555, 2020

  20. [20]

    Semantic hierarchical classification applied to anomaly detection using system logs with a bert model.Applied Sciences, 14(13), 2024

    Clara Corbelle, Victor Carneiro, and Fidel Cacheda. Semantic hierarchical classification applied to anomaly detection using system logs with a bert model.Applied Sciences, 14(13), 2024. ISSN 2076-3417. doi: 10 .3390/app14135388. URL https://www .mdpi.com/2076- 3417/14/13/5388

  21. [21]

    Aetherlog: Log-based root cause analysis by integrating large language models with knowledge graphs

    Tianyu Cui, Ruowei Fu, Changchang Liu, Yuhe Ji, Wenwei Gu, Shenglin Zhang, Yongqian Sun, and Dan Pei. Aetherlog: Log-based root cause analysis by integrating large language models with knowledge graphs. In2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE), pages 49–60. IEEE, 2025

  22. [22]

    Logeval: A comprehensive benchmark suite for llms in log analysis.Empirical Softw

    Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Chenyu Zhao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, and Dan Pei. Logeval: A comprehensive benchmark suite for llms in log analysis.Empirical Softw. Engg., 30(6), October 2025. ISSN 1382-3256. doi: 10 .1007/s10664-025-10701-6. URL https://doi .or...

  23. [23]

    Logram: Efficient log parsing using𝑛 n-gram dictionaries

    Hetong Dai, Heng Li, Che-Shao Chen, Weiyi Shang, and Tse-Hsun Chen. Logram: Efficient log parsing using𝑛 n-gram dictionaries. IEEE Transactions on Software Engineering, 48(3):879–892, 2020

  24. [24]

    loguru: Python logging made simple

    Delgan and contributors. loguru: Python logging made simple. Online documentation, 2024. URL https://github .com/Delgan/loguru. 38

  25. [25]

    QLoRA: Efficient Finetuning of Quantized LLMs

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms, 2023. URL https://arxiv.org/abs/2305.14314

  26. [26]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  27. [27]

    Pdlogger: Automated logging framework for practical software development, 2025

    Shengcheng Duan, Yihua Xu, Sheng Zhang, Shen Wang, and Yue Duan. Pdlogger: Automated logging framework for practical software development, 2025. URL https://arxiv.org/abs/2507.19951

  28. [28]

    Unsupervised log parsing based on large language models and entropy

    Yiqi Duan, Jianliang Xu, Changyu Fan, and Zixin Liu. Unsupervised log parsing based on large language models and entropy. In2025 11th International Symposium on System Security, Safety, and Reliability (ISSSR), pages 1–10. IEEE, 2025

  29. [29]

    Early exploration of using ChatGPT for log-based anomaly detection on parallel file systems logs

    Chris Egersdoerfer, Di Zhang, and Dong Dai. Early exploration of using ChatGPT for log-based anomaly detection on parallel file systems logs. InProceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), pages 315–316. Association for Computing Machinery, 2023. doi: 10.1145/3588195.3595943. URL https:...

  30. [30]

    Insightai: Root cause analysis in large log files with private data using large language model

    Maryam Ekhlasi, Anurag Prakash, Maxime Lamothe, and Michel Dagenais. Insightai: Root cause analysis in large log files with private data using large language model. In2025 IEEE/ACM 4th International Conference on AI Engineering – Software Engineering for AI (CAIN), pages 31–41, 2025. doi: 10.1109/CAIN66642.2025.00012

  31. [31]

    A novel framework for detecting anomalies in network security using LLM and deep learning.Journal of Electrical Systems, 21(1s):294–302, 2025

    Mahmoudreza Entezami, Shahabeddin Rahimi Harsini, David Houshangi, and Zahra Entezami. A novel framework for detecting anomalies in network security using LLM and deep learning.Journal of Electrical Systems, 21(1s):294–302, 2025. doi: 10 .52783/jes.8791

  32. [32]

    How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings

    Kawin Ethayarajh. How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Confere...

  33. [33]

    Regulation (eu) 2016/679 of the european parliament and of the council (general data protection regulation)

    European Parliament and Council of the European Union. Regulation (eu) 2016/679 of the european parliament and of the council (general data protection regulation). Official Journal of the European Union, L119, 2016. URL https://eur-lex.europa.eu/eli/reg/2016/679/oj

  34. [34]

    Log anomaly detection by leveraging llm-based parsing and embedding with attention mechanism

    Asma Fariha, Vida Gharavian, Masoud Makrehchi, Shahryar Rahnamayan, Sanaa Alwidian, and Akramul Azim. Log anomaly detection by leveraging llm-based parsing and embedding with attention mechanism. In2024 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pages 859–863. IEEE, 2024

  35. [35]

    Codebert: A pre-trained model for programming and natural languages

    Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. Codebert: A pre-trained model for programming and natural languages. InFindings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547. Association for Computational Linguistics, 2020. doi: 10 .18653/...

  36. [36]

    Security and privacy controls for information systems and organizations (nist special publication 800-53 revision 5),

    Joint Task Force. Security and privacy controls for information systems and organizations (nist special publication 800-53 revision 5),

  37. [37]

    URL https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final

  38. [38]

    Amy Foster and Selva Kumar.COMBINING LLMS AND SHELL LOGS TO PREDICT BACKUP FAILURES. 07 2025

  39. [39]

    Where do developers log? an empirical study on logging practices in industry

    Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie. Where do developers log? an empirical study on logging practices in industry. InCompanion Proceedings of the 36th International Conference on Software Engineering, pages 24–33. ACM, 2014. doi: 10.1145/2591062.2591175

  40. [40]

    End-to-end log statement generation at block-level.Journal of Systems and Software, 216:112146, 2024

    Ying Fu, Meng Yan, Pinjia He, Chao Liu, Xiaohong Zhang, and Dan Yang. End-to-end log statement generation at block-level.Journal of Systems and Software, 216:112146, 2024. doi: 10.1016/j.jss.2024.112146

  41. [41]

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christopher Endres, Thorsten Holz, and Mario Fritz. Not what you ´ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, 2023. URL https://arxiv.org/abs/2302.12173

  42. [42]

    LogLLM: Log-based anomaly detection using large language models.arXiv preprint,

    Wei Guan, Jian Cao, Shiyou Qian, and Jianqi Gao. LogLLM: Log-based anomaly detection using large language models.arXiv preprint,

  43. [43]

    URL https://arxiv.org/abs/2411.08561

  44. [44]

    H. Guo, S. Yuan, and X. Wu. Logbert: Log anomaly detection via bert. In2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021

  45. [45]

    Logformer: A pre-train and tuning pipeline for log anomaly detection

    Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, and Qi Tian. Logformer: A pre-train and tuning pipeline for log anomaly detection. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 135–143, 2024

  46. [46]

    Owl: A large language model for it operations

    Hongcheng Guo, Jian Yang, Jiaheng Liu, Liqun Yang, Linzheng Chai, Jiaqi Bai, Junran Peng, Xiaorong Hu, Chao Chen, Dongfeng Zhang, Xu Shi, Tieqiao Zheng, Liangfan Zheng, Bo Zhang, Ke Xu, and Zhoujun Li. Owl: A large language model for it operations. InProceedings of the Twelfth International Conference on Learning Representations (ICLR 2024), 2024. URL htt...

  47. [47]

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 8342–8360. Association for Computational Linguistics, 2020. doi: 10 .18653/v1/2...

  48. [48]

    Llm meets ml: Data-efficient anomaly detection on unstable logs.ACM Transactions on Software Engineering and Methodology, 2025

    Fatemeh Hadadi, Qinghua Xu, Domenico Bianculli, and Lionel Briand. Llm meets ml: Data-efficient anomaly detection on unstable logs.ACM Transactions on Software Engineering and Methodology, 2025. doi: 10 .1145/3771283. URL https://doi .org/10.1145/3771283. arXiv:2406.07467

  49. [49]

    Llmelog: An approach for anomaly detection based on llm-enriched log events

    Minghua He, Tong Jia, Chiming Duan, Huaqian Cai, Ying Li, and Gang Huang. Llmelog: An approach for anomaly detection based on llm-enriched log events. In2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 132–143. IEEE, 2024

  50. [50]

    Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. Drain: An online log parsing approach with fixed depth tree. In2017 IEEE International Conference on Web Services (ICWS), pages 33–40, 2017. doi: 10.1109/ICWS.2017.13

  51. [51]

    A survey on automated log analysis for reliability engineering.ACM computing surveys (CSUR), 54(6):1–37, 2021

    Shilin He, Pinjia He, Zhuangbin Chen, Tianyi Yang, Yuxin Su, and Michael R Lyu. A survey on automated log analysis for reliability engineering.ACM computing surveys (CSUR), 54(6):1–37, 2021

  52. [52]

    Parameter-efficient log anomaly detection based on pre-training model and lora

    Shiming He, Ying Lei, Ying Zhang, Kun Xie, and Pradip Kumar Sharma. Parameter-efficient log anomaly detection based on pre-training model and lora. In2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pages 207–217. IEEE, 2023

  53. [53]

    Benchmarking open-source large language models for log level suggestion

    Yi Wen Heng, Zeyang Ma, Zhenhao Li, Dong Jae Kim, and Tse-Hsun Chen. Benchmarking open-source large language models for log level suggestion. In2025 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 314–325, 2025. doi: 10.1109/ICST62969.2025.10988921

  54. [54]

    Diagnosing robotics systems issues with large language models–a case study

    Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, and Mark Niklas Mueller. Diagnosing robotics systems issues with large language models–a case study. InICLR 2025 Workshop on Foundation Models in the Wild

  55. [55]

    Long short-term memory.Neural Computation, 9(8):1735–1780, 1997

    Sepp Hochreiter and J"urgen Schmidhuber. Long short-term memory.Neural Computation, 9(8):1735–1780, 1997. doi: 10 .1162/ neco.1997.9.8.1735

  56. [56]

    Reguly, Kálmán Tornai, Tamás Zsedrovits, and Zoltán Máthé

    András Horváth, András Oláh, Attila Pintér, Bálint Siklósi, Gergely Lukács, István Z. Reguly, Kálmán Tornai, Tamás Zsedrovits, and Zoltán Máthé. Anomaly detection algorithms for real-time log data analysis at scale.IEEE Access, 13:136288–136311, 2025. doi: 10.1109/ACCESS.2025.3565575

  57. [57]

    Research on log anomaly detection based on sentence-BERT.Electronics, 12(17):3580, 2023

    Changze Hu, Yu Fang, Jinhua Wu, Haoyang Li, and Geng Wang. Research on log anomaly detection based on sentence-BERT.Electronics, 12(17):3580, 2023. doi: 10.3390/electronics12173580. URL https://www.mdpi.com/2079-9292/12/17/3580

  58. [58]

    Demystifying and extracting fault-indicating information from logs for failure diagnosis

    Junjie Huang, Zhihan Jiang, Jinyang Liu, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Cong Feng, Hui Dong, Zengyin Yang, and Michael R Lyu. Demystifying and extracting fault-indicating information from logs for failure diagnosis. In2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 511–522. IEEE, 2024

  59. [59]

    Junjie Huang, Zhihan Jiang, Zhuangbin Chen, and Michael R. Lyu. No more labelled examples? an unsupervised log parser with llms. InProceedings of the ACM on Software Engineering (FSE 2025), number FSE. ACM, 2025. doi: 10 .1145/3729377. URL https: //dl.acm.org/doi/10.1145/3729377

  60. [60]

    Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan

    Shaohan Huang, Yi Liu, Carol J. Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan. Hitanomaly: Hierarchical transformers for anomaly detection in system log.IEEE Transactions on Network and Service Management, 17(2):2064–2076, 2020. doi: 10 .1109/ TNSM.2020.3034647

  61. [61]

    Fung, He Wang, Hailong Yang, and Zhongzhi Luan

    Shaohan Huang, Yi Liu, Carol J. Fung, He Wang, Hailong Yang, and Zhongzhi Luan. Improving log-based anomaly detection by pre-training hierarchical transformers.IEEE Transactions on Computers, 72(9):2656–2667, 2023. doi: 10.1109/TC.2023.3257518

  62. [62]

    Xin Huang, Ting Zhang, and Wen Zhao. LogRules: Enhancing log analysis capability of large language models through rules. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 452–470, Albuquerque, New Mexico, April 2025. Association for Computational Linguistics. ISBN 979-8-89176-...

  63. [63]

    Xuanbo Huang, Kaiping Xue, Lutong Chen, Jiangping Han, Jian Li, and David S. L. Wei. Forensix: Automated network forensics and diagnostics for beyond-5G and 6G networks using large language models. IEEE Network, 39(5):74–80, 2025. doi: 10.1109/MNET.2025.3579925

  64. [64]

    Yintong Huo, Cheryl Lee, Yuxin Su, Shiwen Shan, Jinyang Liu, and Michael R. Lyu. EvLog: Evolving log analyzer for anomalous logs identification. arXiv preprint arXiv:2306.01509, 2023. URL https://arxiv.org/abs/2306.01509

  65. [65]

    Jaeyoon Jeong, Insung Baek, Byungwoo Bang, Junyeon Lee, Uiseok Song, and Seoung Bum Kim. Fall: Prior failure detection in large scale system based on language model. IEEE Transactions on Dependable and Secure Computing, 22(1):279–291, 2025. doi: 10.1109/TDSC.2024.3396166

  66. [66]

    Xin Ji, Le Zhang, Wenya Zhang, Fang Peng, Yifan Mao, Xingchuang Liao, and Kui Zhang. Lemad: LLM-empowered multi-agent system for anomaly detection in power grid services. Electronics, 14(15):3008, 2025. doi: 10.3390/electronics14153008. URL https://doi.org/10.3390/electronics14153008

  67. [67]

    Yuhe Ji, Yilun Liu, Feiyu Yao, Minggui He, Shimin Tao, Xiaofeng Zhao, Chang Su, Xinhua Yang, Weibin Meng, Yuming Xie, Boxing Chen, Shenglin Zhang, and Yongqian Sun. Adapting large language models to log analysis with interpretable domain knowledge. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, CIKM ’25, pa...

  68. [68]

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tianle Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation, 2023. URL https://arxiv.org/abs/2304.04710

  69. [69]

    Zhihan Jiang, Jinyang Liu, Zhuangbin Chen, Yichen Li, Junjie Huang, Yintong Huo, Pinjia He, Jiazhen Gu, and Michael R. Lyu. LILAC: Log parsing using LLMs with adaptive parsing cache. Proceedings of the ACM on Software Engineering, 1(FSE), 2024. doi: 10.1145/3643733. URL https://doi.org/10.1145/3643733

  70. [70]

    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Dawn Drain, Nelson Chen, Yuntao Bai, Jared Kaplan, Sam McCandlish, Dario Amodei, Ethan Chen, and Catherine Olsson. Language models (mostly) know what they know, 2022. URL https://arxiv.org/abs/2207.05221

  71. [71]

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. URL https://arxiv.org/abs/2001.08361

  72. [72]

    Crystal Karlsen, Denis Copstein, Yue Luo, Benjamin Schwartzentruber, Tim Niblett, and Olivier Rouyer. Exploring semantic vs. syntactic features for unsupervised learning on application log files. In 2023 International Conference on Cyber Security and Networks (CSNet), 2023. doi: 10.1109/CSNet59123.2023.10339765

  73. [73]

    Egil Karlsen, Xiao Luo, Nur Zincir-Heywood, and Malcolm Heywood. Benchmarking large language models for log analysis, security, and interpretation. Journal of Network and Systems Management, 32:59, 2024. doi: 10.1007/s10922-024-09831-x

  74. [74]

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2...

  75. [75]

    Karen Kent and Murugiah Souppaya. NIST Special Publication 800-92: Guide to computer security log management, 2006. URL https://csrc.nist.gov/publications/detail/sp/800-92/final

  76. [76]

    Van-Hoang Le and Hongyu Zhang. Log-based anomaly detection without log parsing. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM, 2021

  77. [77]

    Van-Hoang Le and Hongyu Zhang. Log parsing with prompt-based few-shot learning. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023), pages 2438–2449. IEEE, 2023. doi: 10.1109/ICSE48619.2023.00204. URL https://conf.researchr.org/details/icse-2023/icse-2023-technical-track/165/Log-Parsing-with-Prompt-based-Few-shot-Learning

  78. [78]

    Van-Hoang Le and Hongyu Zhang. Log parsing: How far can ChatGPT go? In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, ASE ’23, pages 1699–1704. IEEE Press, 2024. ISBN 9798350329964. doi: 10.1109/ASE56229.2023.00206. URL https://doi.org/10.1109/ASE56229.2023.00206

  79. [79]

    Van-Hoang Le and Hongyu Zhang. PreLog: A pre-trained model for log analytics. Proceedings of the ACM on Management of Data, 2(3):1–28, 2024. doi: 10.1145/3654966. URL https://dl.acm.org/doi/10.1145/3654966. Presented at SIGMOD 2024

  80. [80]

    Van-Hoang Le, Yi Xiao, and Hongyu Zhang. Unleashing the true potential of semantic-based log parsing with pre-trained language models. In Proceedings of the 47th International Conference on Software Engineering (ICSE 2025). IEEE/ACM, 2025. URL https://conf.researchr.org/details/icse-2025/icse-2025-research-track/80/Unleashing-the-True-Potential-of-Semanti...

Showing first 80 references.