pith. machine review for the scientific record.

arxiv: 2604.16359 · v1 · submitted 2026-03-18 · 💻 cs.SE

Recognition: no theorem link

LLM4Log: A Systematic Review of Large Language Model-based Log Analysis

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 08:18 UTC · model grok-4.3

classification 💻 cs.SE
keywords large language models · log analysis · systematic review · anomaly detection · root cause analysis · log parsing · AIOps · reliability engineering

The pith

A review of 145 papers maps LLM use across seven log analysis tasks and distills patterns plus adoption challenges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Software systems produce large volumes of evolving logs that support reliability engineering yet resist scalable analysis because of concept drift and scarce labels. Large language models address this by enabling semantic understanding and integration of evidence across sources for tasks ranging from log statement generation to anomaly detection, failure prediction, root cause analysis, and summarization. The paper follows a structured search protocol to collect literature through November 2025 and identifies 145 relevant papers. It organizes the work under a single task-driven taxonomy, catalogs recurring design patterns such as prompting, retrieval grounding, fine-tuning, and agent augmentation, and examines evaluation datasets, metrics, and reproducibility. A reader cares because the synthesis surfaces concrete lessons on robustness, faithfulness, and verifiable behavior that must be solved before these models can move from research prototypes to production AIOps systems.

Core claim

Following a structured search and manual screening protocol completed in November 2025, the review identifies 145 unique papers on LLM-based log analysis across seven tasks. It synthesizes the field through a unified task-driven taxonomy, summarizes common design patterns including prompting and in-context learning, retrieval grounding, fine-tuning, tool and agent augmentation, and verification, and analyzes evaluation practices, datasets, metrics, and reproducibility. From these cross-paper findings it distills key lessons and open challenges for reliable real-world adoption, with emphasis on robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.
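Two of those patterns, retrieval grounding and in-context prompting, compose naturally into one pipeline. A minimal sketch, assuming a hypothetical triage assistant; the log lines, helper names, and prompt wording here are invented for illustration, not taken from any surveyed system:

```python
from collections import Counter
import math

def tokens(line: str) -> Counter:
    return Counter(line.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # cosine similarity over bag-of-words token counts
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # retrieval grounding: pull the k historical log lines most similar
    # to the incoming line, to anchor the model in real evidence
    scored = sorted(corpus, key=lambda doc: cosine(tokens(query), tokens(doc)),
                    reverse=True)
    return scored[:k]

def build_prompt(new_log: str, history: list[str]) -> str:
    # prompting + in-context learning: the retrieved lines become the
    # context the model is instructed to reason over
    context = "\n".join(f"- {h}" for h in retrieve(new_log, history))
    return (
        "You are a log triage assistant. Using ONLY the evidence below,\n"
        "say whether the new log line looks anomalous and why.\n"
        f"Evidence:\n{context}\n"
        f"New log line: {new_log}"
    )

history = [
    "INFO dfs.DataNode: block blk_42 served to /10.0.0.5",
    "ERROR dfs.DataNode: block blk_42 checksum verification failed",
    "INFO dfs.FSNamesystem: allocated block blk_99",
]
prompt = build_prompt(
    "ERROR dfs.DataNode: block blk_77 checksum verification failed", history)
```

Production systems surveyed in the review replace the toy bag-of-words retriever with embedding indices, but the shape (retrieve, then prompt over the retrieved evidence) is the same.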

What carries the argument

A unified task-driven taxonomy that classifies LLM-based log analysis research into seven tasks from upstream logging-statement generation through parsing and downstream analysis while cross-cutting common design patterns.
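The parsing stage in that pipeline can be illustrated without any LLM at all. A minimal sketch in the spirit of template-based parsers such as Drain; the masking rules and example lines below are invented for illustration:

```python
import re

# Hypothetical masking rules: replace volatile parameters with a <*>
# placeholder so that lines produced by the same logging statement
# collapse to one shared template. Order matters: IPs before bare numbers.
RULES = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+(:\d+)?\b"), "<*>"),  # IP[:port]
    (re.compile(r"\bblk_-?\d+\b"), "<*>"),                  # HDFS block ids
    (re.compile(r"\b\d+\b"), "<*>"),                        # bare numbers
]

def to_template(line: str) -> str:
    for pattern, repl in RULES:
        line = pattern.sub(repl, line)
    return line

a = to_template("Received block blk_3587 of size 67108864 from 10.251.42.84")
b = to_template("Received block blk_9012 of size 67108864 from 10.251.30.6")
# both collapse to: "Received block <*> of size <*> from <*>"
```

LLM-based parsers surveyed in the review aim at the same template/parameter split, but infer it semantically rather than from hand-written rules.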

If this is right

  • Adoption of the identified design patterns such as retrieval grounding and verification steps can improve output reliability in LLM log analyzers.
  • Evaluation practices must incorporate tests for drift and long-tail events to match real deployment conditions.
  • Grounding mechanisms become necessary for any operator-facing outputs to maintain faithfulness.
  • Reproducibility gaps indicate that shared datasets and standardized benchmarks will accelerate progress.
  • Deployment considerations around latency, cost, and privacy point to hybrid systems that pair LLMs with lighter verification layers.
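The last point can be made concrete. A minimal sketch of such a lighter verification layer, assuming a convention (invented here) that the model back-quotes every log snippet it cites, so faithfulness reduces to a substring check against the source window:

```python
import re

def extract_quoted(answer: str) -> list[str]:
    # pull the back-quoted snippets the model claims come from the logs
    return re.findall(r"`([^`]+)`", answer)

def grounded(answer: str, window: list[str]) -> bool:
    # verification layer: every quoted snippet must appear verbatim in
    # some line of the source log window, else the answer is rejected
    return all(any(q in line for line in window)
               for q in extract_quoted(answer))

window = [
    "2026-03-18 09:14:02 ERROR conn timeout to db-replica-3",
    "2026-03-18 09:14:03 WARN retry 1/5 for db-replica-3",
]
faithful = "Root cause likely `conn timeout to db-replica-3`, then `retry 1/5`."
hallucinated = "The node crashed after `kernel panic at 09:14`."
# grounded(faithful, window) -> True; grounded(hallucinated, window) -> False
```

A check this cheap runs on every response, which is exactly why hybrid designs put it outside the model rather than relying on the model to police itself.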

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy offers a ready structure for future reviews that compare LLM methods directly against earlier non-LLM log analysis techniques.
  • Privacy and context-length constraints may push development of domain-specific smaller models fine-tuned only on log data.
  • Lessons on hallucination risks could transfer to LLM applications in other software engineering tasks such as code review or incident summarization.
  • If the design patterns prove stable, they could serve as a template for LLM pipelines that process other forms of semi-structured operational data.

Load-bearing premise

The structured search protocol and manual screening captured a representative and unbiased sample of all relevant LLM-based log analysis papers published up to November 2025.

What would settle it

A later exhaustive search that locates a substantially larger or materially different set of papers on LLM-based log analysis published before November 2025 would show the collection was incomplete.

Figures

Figures reproduced from arXiv: 2604.16359 by Jinqiu Yang, Tse-Hsun Chen, Zeyang Ma.

Figure 1: LLM-based Log Analysis Across the Pipeline: From Logging to Downstream Tasks.
Figure 2: Temporal and overall distribution of LLM4Log analysis papers across tasks. Note: pie chart labels are shown as n (u), …
Figure 3: An example log parsing result from Hadoop.
Figure 4: Common workflow for LLM-enabled downstream log analysis.
Figure 5: An illustrative overview of log window representations: raw logs are parsed into templates and parameters, then …
Original abstract

Software systems generate massive, evolving, semi-structured logs that are central to reliability engineering and AIOps, yet difficult to analyze at scale under drift and limited labels. Recent advances in pretrained Transformer models and instruction-tuned large language models (LLMs) have reshaped log analysis by enabling semantic generalization and cross-source evidence integration, but also introducing deployment risks such as context limits, latency/cost, privacy constraints, and hallucinations. This paper presents LLM4Log, a systematic review of LLM-based log analysis across the end-to-end pipeline, from upstream logging-statement generation and maintenance to log parsing/structuring and downstream tasks including anomaly detection, failure prediction, root cause analysis, and log summarization. Following a structured search and manual screening protocol, we completed literature collection in November 2025 and identified 145 unique papers across seven logging tasks. We synthesize the research area through a unified, task-driven taxonomy, summarize common design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, and verification), and analyze evaluation practices, datasets, metrics, and reproducibility. Based on these cross-paper analyses, we distill key lessons and open challenges for reliable real-world adoption. We emphasize robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript presents LLM4Log, a systematic review of large language model applications to log analysis in software systems. Following a structured search and manual screening protocol completed in November 2025, the authors identify 145 unique papers across seven tasks (logging statement generation/maintenance, parsing/structuring, anomaly detection, failure prediction, root cause analysis, and summarization). They synthesize the literature into a unified task-driven taxonomy, catalog common design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, verification), analyze evaluation practices/datasets/metrics/reproducibility, and distill lessons plus open challenges centered on drift robustness, output faithfulness, and deployable designs.

Significance. If the screened corpus is representative, the review supplies a timely consolidation of a fast-growing intersection between LLMs and AIOps/reliability engineering. By extracting cross-paper patterns and explicitly linking them to practical risks (context limits, hallucinations, privacy), it offers researchers a shared reference frame and gives practitioners concrete guidance on moving from prototypes to verifiable production systems. The absence of internal circularity and the direct grounding in the 145-paper corpus strengthen its utility as a field map.

minor comments (3)
  1. [§3] §3 (Search Protocol): the PRISMA-style flow diagram would be clearer if it explicitly reported the number of papers excluded at each screening stage rather than only final counts.
  2. [Table 2] Table 2 (Design Patterns): several rows list 'hybrid' approaches without a footnote defining the exact combination criteria, which could confuse readers comparing prompting-only vs. retrieval-augmented entries.
  3. [§5.3] §5.3 (Reproducibility): the discussion of dataset availability would be strengthened by adding a column or supplementary table indicating which of the 145 papers release code or data.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and their recommendation to accept. We are pleased that the review recognizes the timeliness of the systematic consolidation of LLM-based log analysis research and its practical value for both researchers and practitioners in AIOps.

Circularity Check

0 steps flagged

No significant circularity in systematic review synthesis

full rationale

This paper is a systematic literature review that follows standard SE review protocols: structured search, manual screening, and synthesis of 145 external papers into a task-driven taxonomy. No internal mathematical derivations, fitted parameters, self-definitional loops, or load-bearing self-citations exist. The taxonomy, design patterns, and lessons are presented as direct outcomes of the screened external literature rather than reductions to the paper's own inputs. The methodology is self-contained against external benchmarks and does not rely on any circular chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the paper is a literature synthesis that relies on standard systematic-review methodology.

pith-pipeline@v0.9.0 · 5535 in / 1061 out tokens · 33463 ms · 2026-05-15T08:18:09.103339+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

224 extracted references · 224 canonical work pages · 26 internal anchors

  1. [1]

    Llm-based event log analysis techniques: A survey.arXiv preprint arXiv:2502.00677, 2025

    Siraaj Akhtar, Saad Khan, and Simon Parkinson. Llm-based event log analysis techniques: A survey.arXiv preprint arXiv:2502.00677, 2025

  2. [2]

    Logfit: Log anomaly detection using fine-tuned language models.IEEE Transactions on Network and Service Management, 21(2):1715–1723, 2024

    Crispin Almodovar, Fariza Sabrina, Sarvnaz Karimi, and Salahuddin Azad. Logfit: Log anomaly detection using fine-tuned language models.IEEE Transactions on Network and Service Management, 21(2):1715–1723, 2024

  3. [3]

    Apache log4j 2

    Apache Software Foundation. Apache log4j 2. Online documentation, 2024. URL https://logging.apache.org/log4j/2.x/

  4. [4]

    A comparative study on large language models for log parsing

    Merve Astekin, Max Hort, and Leon Moonen. A comparative study on large language models for log parsing. InProceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024), pages 36–47. ACM, 2024. doi: 10.1145/3674805.3686684. URL https://doi.org/10.1145/3674805.3686684

  5. [5]

    AnomalyExplainerBot: Explainable AI for LLM-based anomaly detection using BERTViz & Captum, 2025

    Prasasthy Balasubramanian, Dumindu Kankanamge, Ekaterina Gilman, and Mourad Oussalah. AnomalyExplainerBot: Explainable AI for LLM-based anomaly detection using BERTViz & Captum, 2025

  6. [6]

    System log parsing with large language models: A review.arXiv preprint arXiv:2504.04877, 2025

    Viktor Beck, Max Landauer, Markus Wurzenberger, Florian Skopik, and Andreas Rauber. System log parsing with large language models: A review.arXiv preprint arXiv:2504.04877, 2025. 37

  7. [7]

    APT-LLM: Embedding-based anomaly detection of cyber advanced persistent threats using large language models

    Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, and Talal Rahwan. APT-LLM: Embedding-based anomaly detection of cyber advanced persistent threats using large language models. InProceedings of the 13th International Symposium on Digital Forensics and Security (ISDFS 2025), pages 1–6, 2025. doi: 10.1109/ISDFS65363.2025.11011912

  8. [8]

    Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, and Yejin Choi

    Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult.IEEE Transactions on Neural Networks, 5(2):157–166, 1994. doi: 10.1109/72.279181

  9. [9]

    Auto-logging: Ai-centred logging instrumentation

    Jasmin Bogatinovski and Odej Kao. Auto-logging: Ai-centred logging instrumentation. In2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 95–100, 2023. doi: 10 .1109/ICSE-NIER58687.2023.00023. URL https://doi.org/10.1109/ICSE-NIER58687.2023.00023

  10. [10]

    Good enough to learn: LLM-based anomaly detection in ECU logs without reliable labels, 2025

    Bogdan Bogdan, Arina Cazacu, and Laura Vasilie. Good enough to learn: LLM-based anomaly detection in ECU logs without reliable labels, 2025. URL https://arxiv.org/abs/2507.01077. Accepted to IEEE Intelligent Vehicles Symposium (IV) 2025

  11. [11]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Jeff Dean, et al. On the opportunities and risks of foundation m...

  12. [12]

    Language models are few-shot learners.Advances in neural information processing systems, 33: 1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33: 1877–1901, 2020

  13. [13]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021. URL https://www .usenix.org/conference/...

  14. [14]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  15. [15]

    Bert-log: Anomaly detection for system logs based on pre-trained language model.Applied Artificial Intelligence, 36(1):2145642, 2022

    Song Chen and Hai Liao. Bert-log: Anomaly detection for system logs based on pre-trained language model.Applied Artificial Intelligence, 36(1):2145642, 2022. doi: 10 .1080/08839514.2022.2145642. URL https://www .tandfonline.com/doi/full/10.1080/08839514.2022.2145642

  16. [16]

    Epas: Efficient online log parsing via asynchronous scheduling of llm queries

    Xiaolei Chen, Jie Shi, Jia Chen, Peng Wang, and Wei Wang. Epas: Efficient online log parsing via asynchronous scheduling of llm queries. InProceedings of the 41st IEEE International Conference on Data Engineering (ICDE 2025), pages 4025–4037. IEEE, 2025. doi: 10.1109/ICDE.2025.00318. URL https://ieeexplore.ieee.org/abstract/document/11113127

  17. [17]

    Automatic root cause analysis via large language models for cloud incidents

    Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, et al. Automatic root cause analysis via large language models for cloud incidents. InProceedings of the Nineteenth European Conference on Computer Systems, pages 674–688, 2024

  18. [18]

    Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei

    Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, volume 30, 2017. URL https://papers .neurips.cc/paper/7017-deep- reinforcement-learning-from-human-preferences

  19. [19]

    Electra: Pre-training text encoders as discriminators rather than generators.arXiv preprint arXiv:2003.10555, 2020

    Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. Electra: Pre-training text encoders as discriminators rather than generators.arXiv preprint arXiv:2003.10555, 2020

  20. [20]

    Semantic hierarchical classification applied to anomaly detection using system logs with a bert model.Applied Sciences, 14(13), 2024

    Clara Corbelle, Victor Carneiro, and Fidel Cacheda. Semantic hierarchical classification applied to anomaly detection using system logs with a bert model.Applied Sciences, 14(13), 2024. ISSN 2076-3417. doi: 10 .3390/app14135388. URL https://www .mdpi.com/2076- 3417/14/13/5388

  21. [21]

    Aetherlog: Log-based root cause analysis by integrating large language models with knowledge graphs

    Tianyu Cui, Ruowei Fu, Changchang Liu, Yuhe Ji, Wenwei Gu, Shenglin Zhang, Yongqian Sun, and Dan Pei. Aetherlog: Log-based root cause analysis by integrating large language models with knowledge graphs. In2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE), pages 49–60. IEEE, 2025

  22. [22]

    Logeval: A comprehensive benchmark suite for llms in log analysis.Empirical Softw

    Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Chenyu Zhao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, and Dan Pei. Logeval: A comprehensive benchmark suite for llms in log analysis.Empirical Softw. Engg., 30(6), October 2025. ISSN 1382-3256. doi: 10 .1007/s10664-025-10701-6. URL https://doi .or...

  23. [23]

    Logram: Efficient log parsing using𝑛 n-gram dictionaries

    Hetong Dai, Heng Li, Che-Shao Chen, Weiyi Shang, and Tse-Hsun Chen. Logram: Efficient log parsing using𝑛 n-gram dictionaries. IEEE Transactions on Software Engineering, 48(3):879–892, 2020

  24. [24]

    loguru: Python logging made simple

    Delgan and contributors. loguru: Python logging made simple. Online documentation, 2024. URL https://github .com/Delgan/loguru. 38

  25. [25]

    QLoRA: Efficient Finetuning of Quantized LLMs

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms, 2023. URL https://arxiv.org/abs/2305.14314

  26. [26]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  27. [27]

    Pdlogger: Automated logging framework for practical software development, 2025

    Shengcheng Duan, Yihua Xu, Sheng Zhang, Shen Wang, and Yue Duan. Pdlogger: Automated logging framework for practical software development, 2025. URL https://arxiv.org/abs/2507.19951

  28. [28]

    Unsupervised log parsing based on large language models and entropy

    Yiqi Duan, Jianliang Xu, Changyu Fan, and Zixin Liu. Unsupervised log parsing based on large language models and entropy. In2025 11th International Symposium on System Security, Safety, and Reliability (ISSSR), pages 1–10. IEEE, 2025

  29. [29]

    Early exploration of using ChatGPT for log-based anomaly detection on parallel file systems logs

    Chris Egersdoerfer, Di Zhang, and Dong Dai. Early exploration of using ChatGPT for log-based anomaly detection on parallel file systems logs. InProceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), pages 315–316. Association for Computing Machinery, 2023. doi: 10.1145/3588195.3595943. URL https:...

  30. [30]

    Insightai: Root cause analysis in large log files with private data using large language model

    Maryam Ekhlasi, Anurag Prakash, Maxime Lamothe, and Michel Dagenais. Insightai: Root cause analysis in large log files with private data using large language model. In2025 IEEE/ACM 4th International Conference on AI Engineering – Software Engineering for AI (CAIN), pages 31–41, 2025. doi: 10.1109/CAIN66642.2025.00012

  31. [31]

    A novel framework for detecting anomalies in network security using LLM and deep learning.Journal of Electrical Systems, 21(1s):294–302, 2025

    Mahmoudreza Entezami, Shahabeddin Rahimi Harsini, David Houshangi, and Zahra Entezami. A novel framework for detecting anomalies in network security using LLM and deep learning.Journal of Electrical Systems, 21(1s):294–302, 2025. doi: 10 .52783/jes.8791

  32. [32]

    How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings

    Kawin Ethayarajh. How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Confere...

  33. [33]

    Regulation (eu) 2016/679 of the european parliament and of the council (general data protection regulation)

    European Parliament and Council of the European Union. Regulation (eu) 2016/679 of the european parliament and of the council (general data protection regulation). Official Journal of the European Union, L119, 2016. URL https://eur-lex.europa.eu/eli/reg/2016/679/oj

  34. [34]

    Log anomaly detection by leveraging llm-based parsing and embedding with attention mechanism

    Asma Fariha, Vida Gharavian, Masoud Makrehchi, Shahryar Rahnamayan, Sanaa Alwidian, and Akramul Azim. Log anomaly detection by leveraging llm-based parsing and embedding with attention mechanism. In2024 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pages 859–863. IEEE, 2024

  35. [35]

    Codebert: A pre-trained model for programming and natural languages

    Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. Codebert: A pre-trained model for programming and natural languages. InFindings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547. Association for Computational Linguistics, 2020. doi: 10 .18653/...

  36. [36]

    Security and privacy controls for information systems and organizations (nist special publication 800-53 revision 5),

    Joint Task Force. Security and privacy controls for information systems and organizations (nist special publication 800-53 revision 5),

  37. [37]

    URL https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final

  38. [38]

    Amy Foster and Selva Kumar.COMBINING LLMS AND SHELL LOGS TO PREDICT BACKUP FAILURES. 07 2025

  39. [39]

    Where do developers log? an empirical study on logging practices in industry

    Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie. Where do developers log? an empirical study on logging practices in industry. InCompanion Proceedings of the 36th International Conference on Software Engineering, pages 24–33. ACM, 2014. doi: 10.1145/2591062.2591175

  40. [40]

    End-to-end log statement generation at block-level.Journal of Systems and Software, 216:112146, 2024

    Ying Fu, Meng Yan, Pinjia He, Chao Liu, Xiaohong Zhang, and Dan Yang. End-to-end log statement generation at block-level.Journal of Systems and Software, 216:112146, 2024. doi: 10.1016/j.jss.2024.112146

  41. [41]

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christopher Endres, Thorsten Holz, and Mario Fritz. Not what you ´ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, 2023. URL https://arxiv.org/abs/2302.12173

  42. [42]

    LogLLM: Log-based anomaly detection using large language models.arXiv preprint,

    Wei Guan, Jian Cao, Shiyou Qian, and Jianqi Gao. LogLLM: Log-based anomaly detection using large language models.arXiv preprint,

  43. [43]

    URL https://arxiv.org/abs/2411.08561

  44. [44]

    H. Guo, S. Yuan, and X. Wu. Logbert: Log anomaly detection via bert. In2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021

  45. [45]

    Logformer: A pre-train and tuning pipeline for log anomaly detection

    Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, and Qi Tian. Logformer: A pre-train and tuning pipeline for log anomaly detection. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 135–143, 2024

  46. [46]

    Owl: A large language model for it operations

    Hongcheng Guo, Jian Yang, Jiaheng Liu, Liqun Yang, Linzheng Chai, Jiaqi Bai, Junran Peng, Xiaorong Hu, Chao Chen, Dongfeng Zhang, Xu Shi, Tieqiao Zheng, Liangfan Zheng, Bo Zhang, Ke Xu, and Zhoujun Li. Owl: A large language model for it operations. InProceedings of the Twelfth International Conference on Learning Representations (ICLR 2024), 2024. URL htt...

  47. [47]

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 8342–8360. Association for Computational Linguistics, 2020. doi: 10 .18653/v1/2...

  48. [48]

    Llm meets ml: Data-efficient anomaly detection on unstable logs.ACM Transactions on Software Engineering and Methodology, 2025

    Fatemeh Hadadi, Qinghua Xu, Domenico Bianculli, and Lionel Briand. Llm meets ml: Data-efficient anomaly detection on unstable logs.ACM Transactions on Software Engineering and Methodology, 2025. doi: 10 .1145/3771283. URL https://doi .org/10.1145/3771283. arXiv:2406.07467

  49. [49]

    Llmelog: An approach for anomaly detection based on llm-enriched log events

    Minghua He, Tong Jia, Chiming Duan, Huaqian Cai, Ying Li, and Gang Huang. Llmelog: An approach for anomaly detection based on llm-enriched log events. In2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 132–143. IEEE, 2024

  50. [50]

    Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. Drain: An online log parsing approach with fixed depth tree. In2017 IEEE International Conference on Web Services (ICWS), pages 33–40, 2017. doi: 10.1109/ICWS.2017.13

  51. [51]

    A survey on automated log analysis for reliability engineering.ACM computing surveys (CSUR), 54(6):1–37, 2021

    Shilin He, Pinjia He, Zhuangbin Chen, Tianyi Yang, Yuxin Su, and Michael R Lyu. A survey on automated log analysis for reliability engineering.ACM computing surveys (CSUR), 54(6):1–37, 2021

  52. [52]

    Parameter-efficient log anomaly detection based on pre-training model and lora

    Shiming He, Ying Lei, Ying Zhang, Kun Xie, and Pradip Kumar Sharma. Parameter-efficient log anomaly detection based on pre-training model and lora. In2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pages 207–217. IEEE, 2023

  53. [53]

    Benchmarking open-source large language models for log level suggestion

    Yi Wen Heng, Zeyang Ma, Zhenhao Li, Dong Jae Kim, and Tse-Hsun Chen. Benchmarking open-source large language models for log level suggestion. In2025 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 314–325, 2025. doi: 10.1109/ICST62969.2025.10988921

  54. [54]

    Diagnosing robotics systems issues with large language models–a case study

    Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, and Mark Niklas Mueller. Diagnosing robotics systems issues with large language models–a case study. InICLR 2025 Workshop on Foundation Models in the Wild

  55. [55]

    Long short-term memory.Neural Computation, 9(8):1735–1780, 1997

    Sepp Hochreiter and J"urgen Schmidhuber. Long short-term memory.Neural Computation, 9(8):1735–1780, 1997. doi: 10 .1162/ neco.1997.9.8.1735

  56. [56]

    Reguly, Kálmán Tornai, Tamás Zsedrovits, and Zoltán Máthé

    András Horváth, András Oláh, Attila Pintér, Bálint Siklósi, Gergely Lukács, István Z. Reguly, Kálmán Tornai, Tamás Zsedrovits, and Zoltán Máthé. Anomaly detection algorithms for real-time log data analysis at scale.IEEE Access, 13:136288–136311, 2025. doi: 10.1109/ACCESS.2025.3565575

  57. [57]

    Research on log anomaly detection based on sentence-BERT.Electronics, 12(17):3580, 2023

    Changze Hu, Yu Fang, Jinhua Wu, Haoyang Li, and Geng Wang. Research on log anomaly detection based on sentence-BERT.Electronics, 12(17):3580, 2023. doi: 10.3390/electronics12173580. URL https://www.mdpi.com/2079-9292/12/17/3580

  58. [58]

    Demystifying and extracting fault-indicating information from logs for failure diagnosis

    Junjie Huang, Zhihan Jiang, Jinyang Liu, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Cong Feng, Hui Dong, Zengyin Yang, and Michael R Lyu. Demystifying and extracting fault-indicating information from logs for failure diagnosis. In2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pages 511–522. IEEE, 2024

  59. [59]

    Junjie Huang, Zhihan Jiang, Zhuangbin Chen, and Michael R. Lyu. No more labelled examples? an unsupervised log parser with llms. InProceedings of the ACM on Software Engineering (FSE 2025), number FSE. ACM, 2025. doi: 10 .1145/3729377. URL https: //dl.acm.org/doi/10.1145/3729377

  60. [60]

    Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan

    Shaohan Huang, Yi Liu, Carol J. Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan. Hitanomaly: Hierarchical transformers for anomaly detection in system log.IEEE Transactions on Network and Service Management, 17(2):2064–2076, 2020. doi: 10 .1109/ TNSM.2020.3034647

  61. [61]

    Fung, He Wang, Hailong Yang, and Zhongzhi Luan

    Shaohan Huang, Yi Liu, Carol J. Fung, He Wang, Hailong Yang, and Zhongzhi Luan. Improving log-based anomaly detection by pre-training hierarchical transformers.IEEE Transactions on Computers, 72(9):2656–2667, 2023. doi: 10.1109/TC.2023.3257518

  62. [62]

    Xin Huang, Ting Zhang, and Wen Zhao. LogRules: Enhancing log analysis capability of large language models through rules. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 452–470, Albuquerque, New Mexico, April 2025. Association for Computational Linguistics. ISBN 979-8-89176-...

  63. [63]

    Xuanbo Huang, Kaiping Xue, Lutong Chen, Jiangping Han, Jian Li, and David S. L. Wei. Forensix: Automated network forensics and diagnostics for beyond-5G and 6G networks using large language models. IEEE Network, 39(5):74–80, 2025. doi: 10.1109/MNET.2025.3579925

  64. [64]

    Yintong Huo, Cheryl Lee, Yuxin Su, Shiwen Shan, Jinyang Liu, and Michael R. Lyu. EvLog: Evolving log analyzer for anomalous logs identification. arXiv preprint arXiv:2306.01509, 2023. URL https://arxiv.org/abs/2306.01509

  65. [65]

    Jaeyoon Jeong, Insung Baek, Byungwoo Bang, Junyeon Lee, Uiseok Song, and Seoung Bum Kim. Fall: Prior failure detection in large scale system based on language model. IEEE Transactions on Dependable and Secure Computing, 22(1):279–291, 2025. doi: 10.1109/TDSC.2024.3396166

  66. [66]

    Xin Ji, Le Zhang, Wenya Zhang, Fang Peng, Yifan Mao, Xingchuang Liao, and Kui Zhang. Lemad: LLM-empowered multi-agent system for anomaly detection in power grid services. Electronics, 14(15):3008, 2025. doi: 10.3390/electronics14153008. URL https://doi.org/10.3390/electronics14153008

  67. [67]

    Yuhe Ji, Yilun Liu, Feiyu Yao, Minggui He, Shimin Tao, Xiaofeng Zhao, Chang Su, Xinhua Yang, Weibin Meng, Yuming Xie, Boxing Chen, Shenglin Zhang, and Yongqian Sun. Adapting large language models to log analysis with interpretable domain knowledge. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, CIKM ’25, pa...

  68. [68]

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tianle Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation, 2023. URL https://arxiv.org/abs/2304.04710

  69. [69]

    Zhihan Jiang, Jinyang Liu, Zhuangbin Chen, Yichen Li, Junjie Huang, Yintong Huo, Pinjia He, Jiazhen Gu, and Michael R. Lyu. LILAC: Log parsing using LLMs with adaptive parsing cache. Proceedings of the ACM on Software Engineering, 1(FSE), 2024. doi: 10.1145/3643733. URL https://doi.org/10.1145/3643733

  70. [70]

    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Dawn Drain, Nelson Chen, Yuntao Bai, Jared Kaplan, Sam McCandlish, Dario Amodei, Ethan Chen, and Catherine Olsson. Language models (mostly) know what they know, 2022. URL https://arxiv.org/abs/2207.05221

  71. [71]

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. URL https://arxiv.org/abs/2001.08361

  72. [72]

    Crystal Karlsen, Denis Copstein, Yue Luo, Benjamin Schwartzentruber, Tim Niblett, and Olivier Rouyer. Exploring semantic vs. syntactic features for unsupervised learning on application log files. In 2023 International Conference on Cyber Security and Networks (CSNet), 2023. doi: 10.1109/CSNet59123.2023.10339765

  73. [73]

    Egil Karlsen, Xiao Luo, Nur Zincir-Heywood, and Malcolm Heywood. Benchmarking large language models for log analysis, security, and interpretation. Journal of Network and Systems Management, 32:59, 2024. doi: 10.1007/s10922-024-09831-x

  74. [74]

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2...

  75. [75]

    Karen Kent and Murugiah Souppaya. NIST Special Publication 800-92: Guide to computer security log management, 2006. URL https://csrc.nist.gov/publications/detail/sp/800-92/final

  76. [76]

    Van-Hoang Le and Hongyu Zhang. Log-based anomaly detection without log parsing. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM, 2021

  77. [77]

    Van-Hoang Le and Hongyu Zhang. Log parsing with prompt-based few-shot learning. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023), pages 2438–2449. IEEE, 2023. doi: 10.1109/ICSE48619.2023.00204. URL https://conf.researchr.org/details/icse-2023/icse-2023-technical-track/165/Log-Parsing-with-Prompt-based-Few-shot-Learning

  78. [78]

    Van-Hoang Le and Hongyu Zhang. Log parsing: How far can ChatGPT go? In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, ASE ’23, pages 1699–1704. IEEE Press, 2024. ISBN 9798350329964. doi: 10.1109/ASE56229.2023.00206. URL https://doi.org/10.1109/ASE56229.2023.00206

  79. [79]

    Van-Hoang Le and Hongyu Zhang. PreLog: A pre-trained model for log analytics. Proceedings of the ACM on Management of Data, 2(3):1–28, 2024. doi: 10.1145/3654966. URL https://dl.acm.org/doi/10.1145/3654966. Presented at SIGMOD 2024

  80. [80]

    Van-Hoang Le, Yi Xiao, and Hongyu Zhang. Unleashing the true potential of semantic-based log parsing with pre-trained language models. In Proceedings of the 47th International Conference on Software Engineering (ICSE 2025). IEEE/ACM, 2025. URL https://conf.researchr.org/details/icse-2025/icse-2025-research-track/80/Unleashing-the-True-Potential-of-Semanti...

Showing first 80 references.