DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks
Pith reviewed 2026-05-09 23:52 UTC · model grok-4.3
The pith
DeepParse has an LLM synthesize reusable regex masks from small log samples, then applies them deterministically inside the Drain algorithm to reach 97.6% average parsing accuracy at lower cost than pure heuristic or LLM methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeepParse automatically mines reusable variable patterns from small log samples using an LLM and then applies them deterministically through the Drain algorithm. By separating the reasoning phase from execution, the method enables accurate, scalable, and cost-efficient log structuring without relying on brittle handcrafted rules or per-line neural inference.
What carries the argument
LLM-synthesized regex masks that are inserted into the Drain log-parsing algorithm to replace its heuristic variable detection step.
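As a rough illustration of this mechanism, the sketch below applies a few regex masks to raw lines and then groups lines by their masked form. The masks, the placeholder tokens, and `group_by_template` are hypothetical stand-ins: the paper's actual LLM-produced patterns and its Drain integration are not shown in this text, and the grouping here is a much simpler substitute for Drain's fixed-depth tree.

```python
import re
from collections import defaultdict

# Hypothetical masks of the kind an LLM might propose from a small log sample.
# Order matters: more specific patterns run before the generic number mask.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\bblk_-?\d+\b"), "<BLK>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def apply_masks(line: str) -> str:
    """Replace every mask match with its placeholder token, in list order."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


def group_by_template(lines):
    """Simplified stand-in for Drain: group lines whose masked form is identical."""
    groups = defaultdict(list)
    for line in lines:
        groups[apply_masks(line)].append(line)
    return dict(groups)


logs = [
    "Received block blk_-1608999687919862906 of size 91178 from 10.250.19.102",
    "Received block blk_7503483334202473044 of size 233217 from 10.251.215.16",
    "Connection from 10.0.0.5:50010 closed",
]

for template, members in group_by_template(logs).items():
    print(len(members), template)
```

The point of the design is that the masks are produced once and then reused: every later line costs only a few regex substitutions, with no model call per line.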
If this is right
- Average variable extraction accuracy reaches 97.6 percent across sixteen standard log-parsing benchmarks.
- Consistency of the extracted templates exceeds that of both heuristic and LLM-only baselines.
- False-alarm rate in a downstream anomaly detector drops by more than thirty percent.
- End-to-end inference latency for the anomaly pipeline falls by thirty-six percent relative to heuristic baselines.
Where Pith is reading between the lines
- Occasional LLM synthesis steps could replace continuous model inference for any text domain whose underlying patterns change slowly.
- The same one-time-pattern-then-deterministic-match design might improve other high-volume text-processing pipelines that currently pay for model calls on every record.
- Engineers would still need a practical rule for deciding when accumulated log drift requires the LLM synthesis step to be rerun.
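One possible rerun rule, offered only as a sketch and not as anything the paper proposes, is to watch how often recent lines land in templates that were unknown when the masks were synthesized. The `DriftMonitor` class, its window size, and its threshold are all assumptions chosen for illustration.

```python
from collections import deque


class DriftMonitor:
    """Flags drift when too many recent lines map to previously unseen templates."""

    def __init__(self, known_templates, window=1000, threshold=0.05):
        self.known = set(known_templates)
        self.recent = deque(maxlen=window)  # 1 = unseen template, 0 = known
        self.threshold = threshold

    def observe(self, template: str) -> bool:
        """Record one parsed line; return True if re-synthesis looks warranted."""
        self.recent.append(0 if template in self.known else 1)
        # Wait for a full window so a few early misses do not trigger a rerun.
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) > self.threshold
```

A monitor like this would sit after the parser. When `observe` returns True, a fresh sample of recent lines could be sent to the LLM and the known-template set rebuilt. Suitable values for the window and threshold would have to be tuned per system.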
Load-bearing premise
The patterns the LLM extracts from small samples remain general enough to handle new and evolving log formats without needing fresh LLM calls or manual rule updates for each change.
What would settle it
Running DeepParse on a fresh collection of logs whose formats differ from the original samples and finding that its variable-extraction accuracy falls below that of either a full LLM parser or carefully hand-tuned rules.
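The comparison could be scored along the following lines. This is a minimal sketch which assumes that accuracy means the fraction of lines whose predicted template exactly matches a reference template. The paper's precise Parsing Accuracy definition is not given in this text, and both function names are hypothetical.

```python
def parsing_accuracy(predicted, reference):
    """Fraction of lines whose predicted template equals the reference template."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def claim_is_undermined(deepparse_acc, llm_acc, hand_tuned_acc):
    """True if DeepParse scores below at least one of the two comparison parsers."""
    return deepparse_acc < max(llm_acc, hand_tuned_acc)
```

All three parsers would need to be scored on the same fresh logs against the same reference templates for the comparison to be meaningful.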
Original abstract
Modern distributed systems produce massive, heterogeneous logs essential for reliability, security, and anomaly detection. Converting these free-form messages into structured templates (log parsing) is challenging due to evolving formats and limited labeled data. Machine-learning-based parsers like Drain are fast but accuracy often degrades on complex variables, while Large Language Models (LLMs) offer better generalization but incur prohibitive inference costs. This paper presents DeepParse, a hybrid framework that automatically mines reusable variable patterns from small log samples using an LLM, then applies them deterministically through the Drain algorithm. By separating the reasoning phase from execution, DeepParse enables accurate, scalable, and cost-efficient log structuring without relying on brittle handcrafted rules or per-line neural inference. Across 16 benchmark datasets, DeepParse achieves higher accuracy in variable extraction (97.6% average Parsing Accuracy) and better consistency than both heuristic and LLM-only baselines. Integrating DeepParse into an anomaly detection pipeline reduced false alarms by over 30% and reduced inference latency by 36% compared to heuristic baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DeepParse, a hybrid log parsing framework that uses an LLM to automatically synthesize reusable regex masks (variable patterns) from small log samples, which are then applied deterministically within the Drain algorithm. This separates the reasoning phase (LLM) from execution (heuristic), aiming to achieve high accuracy and consistency on heterogeneous logs without handcrafted rules or per-line LLM inference. On 16 benchmark datasets, it reports 97.6% average parsing accuracy in variable extraction, outperforming both heuristic and LLM-only baselines in consistency; integration into an anomaly detection pipeline yields >30% false-alarm reduction and 36% lower inference latency versus heuristic baselines.
Significance. If the empirical claims are substantiated, the hybrid design offers a practical advance for log parsing in distributed systems by mitigating the accuracy limitations of pure heuristics like Drain on complex variables while avoiding the high inference costs of standalone LLMs. The downstream gains in anomaly detection and latency reduction highlight potential impact on reliability and security pipelines handling evolving log formats.
major comments (1)
- [Evaluation] Evaluation section: The central claims rest on 97.6% average Parsing Accuracy, superior consistency, >30% false-alarm reduction, and 36% latency improvement across 16 benchmarks, yet the manuscript supplies no details on experimental setup, data splits, pattern validation procedures, controls for selection bias, or full per-dataset results. This absence prevents assessment of reproducibility and validity of the reported gains.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recommending major revision. We agree that the Evaluation section requires substantially more detail to support reproducibility and to allow proper assessment of the reported gains.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: The central claims rest on 97.6% average Parsing Accuracy, superior consistency, >30% false-alarm reduction, and 36% latency improvement across 16 benchmarks, yet the manuscript supplies no details on experimental setup, data splits, pattern validation procedures, controls for selection bias, or full per-dataset results. This absence prevents assessment of reproducibility and validity of the reported gains.
  Authors: We agree that the current manuscript lacks the necessary experimental details. In the revised version we will expand the Evaluation section with: (1) a complete description of the experimental setup, including the LLM used for mask synthesis, hardware, and software versions; (2) explicit references and characteristics of all 16 benchmark datasets; (3) the sample-selection procedure for regex mining together with controls for selection bias (e.g., random sampling across multiple runs); (4) the pattern-validation protocol, including how masks were tested for reusability on held-out lines; and (5) full per-dataset tables reporting Parsing Accuracy, consistency scores, anomaly-detection false-alarm rates, and latency for DeepParse and all baselines. These additions will enable independent reproduction of the 97.6% average accuracy, the >30% false-alarm reduction, and the 36% latency improvement.
  Revision: yes
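The controls named in items (3) and (4) could take a shape like the sketch below: draw a small random sample for mask synthesis, score the resulting masks on the remaining held-out lines, and repeat with different seeds. Everything here is an assumption for illustration. `synthesize` stands in for the LLM step and `score` for whatever metric the authors adopt, and neither is specified in the text.

```python
import random
import statistics


def repeated_holdout(lines, synthesize, score, sample_size=50, runs=5, seed=0):
    """Return the mean and population std-dev of held-out scores over several random splits."""
    scores = []
    for run in range(runs):
        rng = random.Random(seed + run)      # a different, reproducible split per run
        shuffled = list(lines)
        rng.shuffle(shuffled)
        sample = shuffled[:sample_size]      # lines shown to the synthesis step
        held_out = shuffled[sample_size:]    # lines never seen during synthesis
        masks = synthesize(sample)
        scores.append(score(masks, held_out))
    return statistics.mean(scores), statistics.pstdev(scores)
```

Reporting the spread alongside the mean would show how much the headline accuracy depends on which lines happened to be sampled.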
Circularity Check
No significant circularity; empirical method with benchmark validation
The paper describes a hybrid log-parsing pipeline (LLM pattern synthesis from small samples followed by deterministic Drain application) and reports empirical results on 16 datasets (97.6% average parsing accuracy, latency and false-alarm reductions). No equations, derivations, or self-citations are present in the provided text that reduce any claimed result to its own inputs by construction. The central claims rest on external benchmark comparisons rather than fitted parameters renamed as predictions or self-referential definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: an LLM can synthesize generalizable regex patterns from small log samples that remain effective on unseen logs.