AnomalyGen: Enhancing Log-Based Anomaly Detection with Code-Guided Data Augmentation

Chenxi Mao; Shiwen Shan; Xinyu Li; Yanlin Wang; Yintong Huo; Yuxin Su; Zibin Zheng

arxiv: 2604.11107 · v1 · submitted 2026-04-13 · 💻 cs.SE

Pith reviewed 2026-05-10 15:17 UTC · model grok-4.3

classification 💻 cs.SE

keywords: anomaly detection, log analysis, data augmentation, source code, control flow graphs, large language models, software systems

The pith

AnomalyGen generates labeled log sequences from source code to augment training data for better anomaly detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Log anomaly detection models often fail on unseen but valid log sequences because existing training datasets only cover a small fraction of the execution paths in the source code. The paper proposes AnomalyGen to create more training examples by analyzing the code to find valid paths and using language models to make them look real with proper parameters and labels. A sympathetic reader would care because this could mean fewer false alarms in monitoring large software systems without needing to collect more real anomalous data. The key is that the generated data must match what actually happens at runtime. If successful, it allows models to learn a fuller picture of normal behavior from the code itself.

Core claim

The central claim is that training data for log-based anomaly detection can be effectively augmented by synthesizing sequences directly from the program's source code. This is done through building log-oriented control flow graphs to list possible paths, using chain-of-thought reasoning in large language models to ensure logical consistency and generate runtime parameters, and applying domain heuristics to assign labels. As a result, anomaly detection models can better handle valid but previously unseen execution paths that were causing false positives.

What carries the argument

AnomalyGen, a three-stage framework that builds Log-Oriented Control Flow Graphs (LCFGs) from source code, applies LLM Chain-of-Thought reasoning for verification and parameter generation, and labels sequences using domain heuristics.
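
As a rough illustration of the path-enumeration stage, the sketch below walks a toy acyclic control flow graph and keeps only the log templates met along each path. The graph, node names, and templates are hypothetical and chosen for illustration; this is not the paper's LCFG construction, which also involves pruning and dominance analysis that the sketch omits.

```python
from typing import Dict, List, Optional

# Hypothetical toy graph (node -> successors); not taken from the paper.
CFG: Dict[str, List[str]] = {
    "entry": ["check"],
    "check": ["ok", "fail"],
    "ok": ["exit"],
    "fail": ["exit"],
    "exit": [],
}

# Hypothetical log templates attached to nodes; None means the node logs nothing.
LOGS: Dict[str, Optional[str]] = {
    "entry": "Receiving block <*>",
    "check": None,
    "ok": "Received block <*> of size <*>",
    "fail": "Exception in receiveBlock for block <*>",
    "exit": None,
}


def enumerate_log_paths(cfg, logs, start="entry", end="exit"):
    """Depth-first walk over an acyclic graph, collecting log templates per path."""
    paths: List[List[str]] = []

    def walk(node: str, emitted: List[str]) -> None:
        template = logs.get(node)
        if template is not None:
            emitted = emitted + [template]  # copy so sibling branches stay independent
        if node == end:
            paths.append(emitted)
            return
        for successor in cfg[node]:
            walk(successor, emitted)

    walk(start, [])
    return paths


LOG_PATHS = enumerate_log_paths(CFG, LOGS)
for path in LOG_PATHS:
    print(" -> ".join(path))
```

The sketch assumes the graph has no cycles; loops in real code would need bounding or unrolling before enumeration.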

If this is right

Augmented datasets lead to improved performance for a wide range of anomaly detection models on real-world systems.
Both the static analysis component for path enumeration and the LLM-based verification step are essential to the gains.
The approach works for both supervised and unsupervised detection techniques.
Public release of the framework and datasets enables further research on code-guided augmentation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the method scales to larger codebases, it could significantly reduce the data collection burden for training reliable anomaly detectors in industry.
Similar code-to-data synthesis might be applied to other software analysis tasks like test generation or performance modeling.
Future work could test if the generated sequences remain effective when the source code evolves over time with new features.

Load-bearing premise

The generated log sequences must accurately represent real runtime behaviors and have correct anomaly labels so that they enhance rather than confuse the training of detection models.

What would settle it

Training models on the original data versus the augmented data and observing no improvement or a decrease in detection accuracy on a separate test set of real logs.
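
The comparison described above can be sketched as follows: compute F1 on the same held-out real logs for a model trained on the original data and one trained on the augmented data, then look at the difference. The labels and predictions below are made-up placeholders, not results from the paper.

```python
def f1_score(y_true, y_pred):
    """Binary F1 with 1 = anomaly, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical held-out labels and model outputs (illustrative only).
Y_TRUE = [1, 0, 0, 1, 0, 0, 1, 0]
PRED_BASELINE = [1, 1, 1, 1, 0, 0, 0, 0]   # model trained on original data
PRED_AUGMENTED = [1, 0, 0, 1, 0, 1, 1, 0]  # model trained on augmented data

F1_BASELINE = f1_score(Y_TRUE, PRED_BASELINE)
F1_AUGMENTED = f1_score(Y_TRUE, PRED_AUGMENTED)
F1_DELTA = F1_AUGMENTED - F1_BASELINE

print(f"baseline={F1_BASELINE:.3f} augmented={F1_AUGMENTED:.3f} delta={F1_DELTA:+.3f}")
```

A delta at or below zero on real test logs would count against the claim; a positive delta alone does not show that the generated data is realistic.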

Figures

Figures reproduced from arXiv: 2604.11107 by Chenxi Mao, Shiwen Shan, Xinyu Li, Yanlin Wang, Yintong Huo, Yuxin Su, Zibin Zheng.

**Figure 1.** The three-stage workflow of AnomalyGen. Phase I performs log-based graph pruning (1), subgraph extraction (2), and LCFG …

**Figure 2.** Log-aware node labeling and call graph pruning.

**Figure 3.** An example of LCFG construction: AST analysis extracts log-critical elements, and dominance analysis determines the valid …

**Figure 4.** Phase II: Recursive Log Merging with CoT Verification Process.

**Figure 5.** Example of Phase II: a caller context and candidate callee path are provided to the LLM, which evaluates logical consistency …

**Figure 6.** Structured output produced by CoT verification.

**Figure 7.** Average F1-score improvement under different encoding schemes across model paradigms.

**Figure 8.** Unsupervised Deep Learning Performance Under Different Augmentation Ratio

**Figure 9.** Supervised Deep Learning Performance Under Different Augmentation Ratio

**Figure 10.** Machine Learning Performance Under Different Augmentation Ratio

Original abstract

Log-based anomaly detection is fundamentally constrained by training data sparsity. Our empirical study reveals that public benchmark datasets cover less than 10% of source code log templates. Consequently, models frequently misclassify unseen but valid execution paths as anomalies, leading to false alarms. To address this, we propose AnomalyGen, a novel framework that augments training data by synthesizing labeled log sequences from source code. AnomalyGen combines log-oriented static analysis with Large Language Model (LLM) reasoning in three stages: (1) building Log-Oriented Control Flow Graphs (LCFGs) to enumerate structurally valid execution paths; (2) applying LLM Chain-of-Thought (CoT) reasoning to verify logical consistency and generate realistic runtime parameters (e.g., block IDs, IP addresses); and (3) labeling generated sequences with domain heuristics. Evaluations on HDFS and Zookeeper across 12 diverse anomaly detection models show AnomalyGen consistently improves performance. Deep learning models achieved average F1-score gains of 2.18% (HDFS) and 1.69% (Zookeeper), with an unsupervised Transformer on HDFS jumping from 0.818 to 0.970. Ablation results show that both static analysis and LLM-based verification are necessary: removing them reduces F1 by up to 8.7 and 10.7 percentage points, respectively. Our framework and datasets are publicly available to facilitate future research.
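
The coverage figure in the abstract is a ratio of log templates seen in a benchmark to log templates present in the source code. A minimal sketch of that ratio, assuming templates have already been extracted and normalized to comparable strings (the templates below are hypothetical):

```python
def template_coverage(source_templates, dataset_templates):
    """Fraction of source-code log templates that also occur in the dataset."""
    source = set(source_templates)
    if not source:
        raise ValueError("source_templates must not be empty")
    covered = source & set(dataset_templates)
    return len(covered) / len(source)


# Hypothetical templates; real ones would come from static analysis and log parsing.
SOURCE_TEMPLATES = [
    "Receiving block <*>",
    "Received block <*> of size <*>",
    "Exception in receiveBlock for block <*>",
    "Deleting block <*> file <*>",
]
DATASET_TEMPLATES = [
    "Receiving block <*>",
    "Verification succeeded for <*>",  # appears in logs but not in SOURCE_TEMPLATES
]

COVERAGE = template_coverage(SOURCE_TEMPLATES, DATASET_TEMPLATES)
print(f"coverage = {COVERAGE:.0%}")
```

Exact string matching is an assumption here; in practice, matching parsed log templates against source-code logging statements usually needs a normalization step.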

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

AnomalyGen uses log-oriented control flow graphs plus LLM chain-of-thought to generate synthetic log sequences from source code, producing modest F1 gains on HDFS and Zookeeper but leaving the realism of the generated data open to question.

The core idea here is straightforward: public log benchmarks miss most of the execution paths that actually exist in the source code, so models flag valid but unseen sequences as anomalies. AnomalyGen tries to fix that by building LCFGs to list valid paths, then letting an LLM fill in parameters and check consistency before labeling with heuristics. The evaluations run the augmented data through twelve different detectors and show average F1 lifts of roughly two percent, with one unsupervised transformer on HDFS moving from 0.818 to 0.970. Ablations indicate that dropping either the static analysis or the LLM step hurts results noticeably.

Referee Report

2 major / 4 minor

Summary. The manuscript introduces AnomalyGen, a framework that augments sparse training data for log-based anomaly detection by synthesizing labeled log sequences directly from source code. It constructs Log-Oriented Control Flow Graphs (LCFGs) to enumerate structurally valid execution paths, employs LLM Chain-of-Thought reasoning to generate realistic runtime parameters and verify consistency, and applies domain heuristics for labeling. Evaluations across HDFS and Zookeeper using 12 anomaly detection models report consistent F1-score gains (average 2.18% for deep learning models on HDFS), with an unsupervised Transformer improving from 0.818 to 0.970 on HDFS; ablations indicate both static analysis and LLM verification are necessary.

Significance. If the generated sequences are distributionally faithful to real executions and correctly labeled, the work directly tackles the data sparsity problem the authors quantify (public benchmarks cover <10% of source-code log templates), potentially reducing false alarms on unseen valid paths. Strengths include the public release of the framework and datasets, the breadth of the 12-model evaluation, and the ablation results that isolate component contributions. These elements support reproducibility and could influence practical log anomaly detection pipelines.

major comments (2)

[Evaluation] Evaluation section (performance claims): The headline result—an unsupervised Transformer F1-score rising from 0.818 to 0.970 on HDFS—is large enough to be load-bearing for the central claim. Yet the manuscript provides no direct evidence (manual inspection, distributional comparison to real traces, or parameter-validity checks) that LLM-synthesized values (block IDs, IP addresses, etc.) are executable or that heuristic labels are accurate. Without such validation, the gains could arise from models exploiting generation artifacts rather than learning improved coverage of valid paths.
[Ablation study] Ablation study: While removing LCFG construction or LLM verification reduces F1 by up to 8.7 and 10.7 points respectively, the ablations do not include any metric of generated-data fidelity (e.g., fraction of paths that match real executions or statistical tests on parameter distributions). This omission leaves open whether the retained data truly augments coverage of unseen but valid paths or merely supplies easier-to-classify examples.
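
One of the fidelity metrics named in the major comments, the fraction of generated paths that match real executions, could be computed along these lines. The event IDs are hypothetical, and treating a path as a tuple of event IDs is an assumption of this sketch.

```python
def path_match_fraction(generated_paths, real_paths):
    """Share of distinct generated paths that also occur among the real paths."""
    real = {tuple(path) for path in real_paths}
    generated = {tuple(path) for path in generated_paths}
    if not generated:
        return 0.0
    matched = sum(1 for path in generated if path in real)
    return matched / len(generated)


# Hypothetical event-ID sequences (illustrative only).
REAL_PATHS = [["E1", "E2", "E3"], ["E1", "E4"]]
GENERATED_PATHS = [
    ["E1", "E2", "E3"],  # also observed in real traces
    ["E1", "E4"],        # also observed in real traces
    ["E1", "E5", "E3"],  # not observed
    ["E2", "E2"],        # not observed
]

MATCH_FRACTION = path_match_fraction(GENERATED_PATHS, REAL_PATHS)
print(f"matched fraction = {MATCH_FRACTION:.2f}")
```

Because the method targets valid paths that are absent from the training data, a low match fraction is not by itself a defect; the metric is most informative when computed against real traces that were held out from both training and generation.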

minor comments (4)

The abstract states that the framework and datasets are publicly available; the main text should include an explicit repository URL and commit hash for reproducibility.
[Methodology] Methodology section: Provide the exact domain heuristics used for labeling and the full LLM prompts (including CoT instructions) so readers can assess potential label noise or hallucination risks.
[Evaluation] Evaluation section: Report statistical significance (e.g., paired t-tests or bootstrap confidence intervals) for the F1 improvements and include error bars on the per-model results.
[Discussion] Discussion: Add a limitations paragraph addressing possible failure modes of LCFG construction (e.g., incomplete static analysis of complex control flow) and how they might affect downstream detection performance.
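
The bootstrap confidence interval suggested in the minor comments can be sketched with the standard library alone. The per-model F1 scores below are placeholders, not the paper's numbers, and resampling over models (rather than over test sequences or random seeds) is an assumption of this sketch.

```python
import random


def bootstrap_mean_diff_ci(baseline, augmented, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (augmented - baseline)."""
    if len(baseline) != len(augmented) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(augmented, baseline)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-model F1 scores (illustrative only).
F1_BASE = [0.80, 0.85, 0.90, 0.78, 0.88]
F1_AUG = [0.84, 0.86, 0.93, 0.83, 0.90]

CI_LOW, CI_HIGH = bootstrap_mean_diff_ci(F1_BASE, F1_AUG)
print(f"95% CI for mean F1 gain: [{CI_LOW:.3f}, {CI_HIGH:.3f}]")
```

An interval that excludes zero would support a consistent gain across the sampled models; with only a handful of models the interval is coarse and should be read with care.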

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the need for stronger validation of the synthesized data. We address each major point below and propose revisions to incorporate direct fidelity checks.

Referee: [Evaluation] Evaluation section (performance claims): The headline result—an unsupervised Transformer F1-score rising from 0.818 to 0.970 on HDFS—is large enough to be load-bearing for the central claim. Yet the manuscript provides no direct evidence (manual inspection, distributional comparison to real traces, or parameter-validity checks) that LLM-synthesized values (block IDs, IP addresses, etc.) are executable or that heuristic labels are accurate. Without such validation, the gains could arise from models exploiting generation artifacts rather than learning improved coverage of valid paths.

Authors: We agree that direct validation of the generated sequences would provide stronger support for the headline claims. The current manuscript relies on indirect evidence: the large performance gains occur only when both LCFG construction and LLM verification are present, and the ablations show substantial drops (up to 10.7 points) when either is removed. This pattern is difficult to explain solely by artifacts, as random or invalid sequences would not systematically improve coverage of unseen valid paths. Nevertheless, we will add a new subsection in the revised manuscript that reports (1) statistical comparisons (e.g., Kolmogorov-Smirnov tests) of parameter distributions between generated and real traces, (2) the fraction of generated paths that match observed real executions, and (3) a manual audit of 200 randomly sampled sequences for executability and label correctness. These additions will directly address the concern that gains might stem from artifacts.

Revision: yes
Referee: [Ablation study] Ablation study: While removing LCFG construction or LLM verification reduces F1 by up to 8.7 and 10.7 points respectively, the ablations do not include any metric of generated-data fidelity (e.g., fraction of paths that match real executions or statistical tests on parameter distributions). This omission leaves open whether the retained data truly augments coverage of unseen but valid paths or merely supplies easier-to-classify examples.

Authors: We acknowledge that the existing ablation results measure only downstream F1 impact and do not quantify data fidelity. In the revision we will augment the ablation study with explicit fidelity metrics: the percentage of generated execution paths that appear in the original real traces, and distributional similarity tests on runtime parameters. These metrics will be reported for the full pipeline versus the ablated variants, allowing readers to assess whether the retained data improves coverage of valid paths rather than merely providing easier examples.

Revision: yes
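
The distributional comparison mentioned in the rebuttal rests on the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical CDFs. A minimal pure-Python sketch follows; the parameter values are hypothetical, and the sketch returns only the statistic, not a p-value (`scipy.stats.ks_2samp` provides both if SciPy is available).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max absolute gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs are step functions, so checking every observed value suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical numeric parameter values, e.g. block sizes (illustrative only).
REAL_VALUES = [1, 2, 3, 4]
GENERATED_VALUES = [3, 4, 5, 6]

KS_D = ks_statistic(REAL_VALUES, GENERATED_VALUES)
print(f"KS statistic = {KS_D:.2f}")
```

A statistic near 0 indicates similar distributions and a value near 1 indicates nearly disjoint ones. Categorical parameters such as IP addresses would need a different comparison.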

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper derives its performance gains from an external pipeline: LCFG construction from source code, LLM CoT parameter synthesis, heuristic labeling, and then training/evaluation on held-out portions of standard HDFS and Zookeeper benchmarks. Reported F1 improvements (including the 0.818→0.970 Transformer jump) and ablations are measured against fixed test sets that are not used in generation or labeling; no equation or step reduces the final metric to a fitted parameter or self-defined quantity. No self-citation load-bearing steps, imported uniqueness theorems, or ansatz smuggling appear. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Based solely on the abstract: no explicit free parameters, axioms, or invented entities are detailed beyond the introduced LCFG construct and the reliance on LLM capabilities.

invented entities (1)

Log-Oriented Control Flow Graphs (LCFGs): no independent evidence
purpose: Enumerate structurally valid execution paths from source code for log synthesis
New construct introduced for the method; no independent evidence or prior citation in abstract.

[1] [1]

GPT-4o System Card

2024. GPT-4o System Card. arXiv:2410.21276 [cs.CL] https://arxiv.org/abs/2410.21276

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Adrninistrator. 2025. java-callgraph2: Programs for producing static call graphs for Java programs. https://github.com/Adrninistrator/java-callgraph2

work page 2025

[3] [3]

Stefan Andonov and Gjorgji Madjarov. 2023. LogGC: Novel Approach for Graph-based Log Anomaly Detection. In2023 IEEE International Conference on Data Mining Workshops (ICDMW). 1194–1202. doi:10.1109/ICDMW60847.2023.00156

work page doi:10.1109/icdmw60847.2023.00156 2023

[4] [4]

Chen, A.X

M. Chen, A.X. Zheng, J. Lloyd, M.I. Jordan, and E. Brewer. 2004. Failure diagnosis using decision trees. InInternational Conference on Autonomic Computing, 2004. Proceedings.36–43. doi:10.1109/ICAC.2004.1301345

work page doi:10.1109/icac.2004.1301345 2004

[5] [5]

Zhuangbin Chen, Jinyang Liu, Wenwei Gu, Yuxin Su, and Michael R. Lyu. 2022. Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection. arXiv:2107.05908 [cs.SE] https://arxiv.org/abs/2107.05908

work page arXiv 2022

[6] [6]

DeepSeek-AI. 2024. DeepSeek-V3 Technical Report. arXiv:2412.19437 [cs.CL] https://arxiv.org/abs/2412.19437

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

Zishuo Ding, Heng Li, and Weiyi Shang. 2022. Logentext: Automatically generating logging texts using neural machine translation. In2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 349–360

work page 2022

[8] [8]

Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. InProceedings of the 2017 ACM SIGSAC conference on computer and communications security. 1285–1298

work page 2017

[9] [9]

Chiming Duan, Minghua He, Pei Xiao, Tong Jia, Xin Zhang, Zhewei Zhong, Xiang Luo, Yan Niu, Lingzhe Zhang, Yifan Wu, Siyu Yu, Weijie Hong, Ying Li, and Gang Huang. 2025. LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation. arXiv:2510.03288 [cs.LG] https://arxiv.org/abs/2510.03288

work page arXiv 2025

[10] [10]

Evelyn Fix and Joseph L. Hodges. 1989. Discriminatory Analysis - Nonparametric Discrimination: Consistency Properties.International Statistical Review57 (1989), 238. https://api.semanticscholar.org/CorpusID:120323383

work page 1989

[11] [11]

Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, and Qi Tian. 2024. Logformer: A pre-train and tuning pipeline for log anomaly detection. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 135–143

work page 2024

[12] [12]

Shilin He, Pinjia He, Zhuangbin Chen, Tianyi Yang, Yuxin Su, and Michael R. Lyu. 2021. A Survey on Automated Log Analysis for Reliability Engineering.ACM Comput. Surv.54, 6, Article 130 (July 2021), 37 pages. doi:10.1145/3460345

[13]

Shilin He, Jieming Zhu, Pinjia He, and Michael R. Lyu. 2016. Experience Report: System Log Analysis for Anomaly Detection. In 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE). 207–218. doi:10.1109/ISSRE.2016.21

[14]

Shilin He, Jieming Zhu, Pinjia He, and Michael R Lyu. 2020. Loghub: A large collection of system log datasets towards automated log analytics. arXiv preprint arXiv:2008.06448 (2020).

[15]

Guang-Bin Huang, Yan-Qiu Chen, and H. A. Babri. 2000. Classification ability of single hidden layer feedforward neural networks. Trans. Neur. Netw. 11, 3 (May 2000), 799–801. doi:10.1109/72.846750

[16]

Shaohan Huang, Yi Liu, Carol Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan. 2020. HitAnomaly: Hierarchical Transformers for Anomaly Detection in System Log. IEEE Transactions on Network and Service Management 17, 4 (2020), 2064–2076. doi:10.1109/TNSM.2020.3034647

[17]

Yintong Huo, Yichen Li, Yuxin Su, Pinjia He, Zifan Xie, and Michael R Lyu. 2023. Autolog: A log sequence synthesis framework for anomaly detection. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 497–509.

[18]

Van-Hoang Le and Hongyu Zhang. 2022. Log-based anomaly detection with deep learning: How far are we? In Proceedings of the 44th International Conference on Software Engineering. 1356–1367.

[19]

Van-Hoang Le and Hongyu Zhang. 2024. PreLog: A Pre-trained Model for Log Analytics. Proc. ACM Manag. Data 2, 3, Article 163 (May 2024), 28 pages. doi:10.1145/3654966

[20]

Xiaoyun Li, Pengfei Chen, Linxiao Jing, Zilong He, and Guangba Yu. 2020. Swisslog: Robust and unified deep learning based log anomaly detection for diverse faults. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). IEEE, 92–103.

[21]

Yichen Li, Yintong Huo, Zhihan Jiang, Renyi Zhong, Pinjia He, Yuxin Su, Lionel C Briand, and Michael R Lyu. 2024. Exploring the effectiveness of LLMs in automated logging statement generation: An empirical study. IEEE Transactions on Software Engineering (2024).

[22]

Yichen Li, Yintong Huo, Renyi Zhong, Zhihan Jiang, Jinyang Liu, Junjie Huang, Jiazhen Gu, Pinjia He, and Michael R Lyu. 2024. Go static: Contextualized logging statement generation. Proceedings of the ACM on Software Engineering 1, FSE (2024), 609–630.

[23]

Zhong Li, Jiayang Shi, and Matthijs Van Leeuwen. 2024. Graph Neural Networks based Log Anomaly Detection and Explanation. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (Lisbon, Portugal) (ICSE-Companion ’24). Association for Computing Machinery, New York, NY, USA, 306–307. doi:10.1145/3639478.3643084

[24]

Zhong Li, Jiayang Shi, and Matthijs Van Leeuwen. 2024. Graph neural networks based log anomaly detection and explanation. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 306–307.

[25]

Yilun Liu, Shimin Tao, Weibin Meng, Jingyu Wang, Wenbing Ma, Yuhang Chen, Yanqing Zhao, Hao Yang, and Yanfei Jiang. 2024. Interpretable online log analysis using large language models with prompt strategies. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 35–46.

[26]

Yilun Liu, Shimin Tao, Weibin Meng, Feiyu Yao, Xiaofeng Zhao, and Hao Yang. 2024. Logprompt: Prompt engineering towards zero-shot and interpretable log analysis. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 364–365.

[27]

Siyang Lu, Xiang Wei, Yandong Li, and Liqiang Wang. 2018. Detecting Anomaly in Big Data System Logs Using Convolutional Neural Network. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Cong...

[28]

Lipeng Ma, Weidong Yang, Bo Xu, Sihang Jiang, Ben Fei, Jiaqing Liang, Mingjie Zhou, and Yanghua Xiao. 2024. Knowlog: Knowledge enhanced pre-trained language model for log understanding. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–13.

[29]

Antonio Mastropaolo, Luca Pascarella, and Gabriele Bavota. 2022. Using deep learning to generate complete log statements. In Proceedings of the 44th International Conference on Software Engineering. 2279–2290.

[30]

Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, and Rong Zhou. 2019. Loganomaly: unsupervised detection of sequential and quantitative anomalies in unstructured logs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (Macao, China) (IJCAI’19). AAAI Press...

[31]

Karthik Nagaraj, Charles Killian, and Jennifer Neville. 2012. Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). USENIX Association, San Jose, CA, 353–366. https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/nagaraj

[32]

Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, and Odej Kao. 2020. Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs. In 2020 IEEE International Conference on Data Mining (ICDM). 1196–1201. doi:10.1109/ICDM50108.2020.00148

[33]

Brian A Nejmeh. 1988. NPATH: A measure of execution path complexity and its applications. Commun. ACM 31, 2 (1988), 188–200.

[34]

Jiaxing Qi, Zhongzhi Luan, Shaohan Huang, Yukun Wang, Carol Fung, Hailong Yang, and Depei Qian. 2022. Adanomaly: Adaptive Anomaly Detection for System Logs with Adversarial Learning. In NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. 1–5. doi:10.1109/NOMS54207.2022.9789917

[35]

Yoli Shavit, Kathy Razmadze, Gary Mataev, Hanan Shteingart, Eitan Zahavi, and Zachi Binshtock. 2024. SemantiLog: Log-based Anomaly Detection with Semantic Similarity. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 2438–2439.

[36]

Danny van Bruggen, Federico Tomassetti, Roger Howell, Malte Langkabel, Nicholas Smith, Artur Bosch, Malte Skoruppa, Cruz Maximilien, ThLeu, Panayiotis, Sebastian Kirsch, Simon, Johann Beleites, Wim Tibackx, jean pierre L, André Rouél, edefazio, Daan Schipper, Mathiponds, Why you want to know, Ryan Beckett, ptitjes, kotari4u, Marvin Wyrich, Ricardo Morais,...

doi:10.5281/zenodo.3842713

[37]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL] https://arxiv.org/abs/2201.11903

[38]

Junjielong Xu, Ziang Cui, Yuan Zhao, Xu Zhang, Shilin He, Pinjia He, Liqun Li, Yu Kang, Qingwei Lin, Yingnong Dang, et al. 2024. Unilog: Automatic logging via LLM and in-context learning. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–12.

[39]

Kenji Yamanishi and Yuko Maruyama. 2005. Dynamic syslog mining for network failure monitoring. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (Chicago, Illinois, USA) (KDD ’05). Association for Computing Machinery, New York, NY, USA, 499–508. doi:10.1145/1081870.1081927

[40]

Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, and Wenbin Zhang. 2021. Plelog: Semi-supervised log-based anomaly detection via probabilistic label estimation. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 230–231.

[41]

Shi Ying, Bingming Wang, Lu Wang, Qingshan Li, Yishi Zhao, Jianga Shang, Hao Huang, Guoli Cheng, Zhe Yang, and Jiangyi Geng. 2021. An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples. ACM Trans. Knowl. Discov. Data 15, 3, Article 34 (April 2021), 22 pages. doi:10.1145/3441448

[42]

Boxi Yu, Jiayi Yao, Qiuai Fu, Zhiqing Zhong, Haotian Xie, Yaoliang Wu, Yuchi Ma, and Pinjia He. 2024. Deep learning or classical machine learning? An empirical study on log-based anomaly detection. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–13.

[43]

Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, Junjie Chen, Xiaoting He, Randolph Yao, Jian-Guang Lou, Murali Chintalapati, Furao Shen, and Dongmei Zhang. 2019. Robust log-based anomaly detection on unstable log data (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA.... doi:10.1145/3338906.3338931

[44]

Yiheng Zhang, Yunkang Cao, Xiaohao Xu, and Weiming Shen. 2024. LogiCode: an LLM-Driven Framework for Logical Anomaly Detection. arXiv:2406.04687 [cs.LG] https://arxiv.org/abs/2406.04687

[45]

Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, and Michael R Lyu. 2023. Loghub: A large collection of system log datasets for AI-driven log analytics. In 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 355–366.
