NLLog: Lightweight, Explainable SOC Anomaly Detection via Log-to-Language Rewriting

Daisuke Inoue; Samuel Ndichu; Seiichi Ozawa; Takeshi Takahashi; Tao Ban

arxiv: 2606.04957 · v1 · pith:6A7FOUJ3new · submitted 2026-06-03 · 💻 cs.CR · cs.IR · cs.LG

Pith reviewed 2026-06-28 05:33 UTC · model grok-4.3

classification 💻 cs.CR · cs.IR · cs.LG

keywords log anomaly detection · natural language rewriting · explainable machine learning · security operations center · tree ensemble classification · TF-IDF weighting · HDFS logs · BGL logs

The pith

NLLog rewrites log templates into natural-language sentences for accurate anomaly detection and explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NLLog as a pipeline that converts parsed log templates into readable WHO-WHAT-SEVERITY sentences. These sentences are then pooled using term-frequency-inverse-document-frequency weighting and classified with tree ensembles, with TreeSHAP used to attribute evidence back to the sentences for review. The approach exceeds reproduced baselines on HDFS and BGL datasets while keeping false-positive rates low and running on commodity hardware. A reader would care because it turns rigid machine logs into something both machines and analysts can work with directly for security monitoring.
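
To make the rewrite step concrete, here is a minimal Python sketch of a deterministic template-to-sentence mapping. The rule table, the sentence wording, the `rewrite` function, and the fallback behavior are all hypothetical illustrations; the paper's actual rules are not given in the text reviewed here.

```python
import re
from typing import Tuple

# Hypothetical rule table: (pattern over a parsed template, WHO, WHAT, SEVERITY).
# These entries are illustrative assumptions, not rules taken from the paper.
RULES = [
    (re.compile(r"Receiving block <\*> src: <\*> dest: <\*>"),
     "A data node", "is receiving a block", "informational"),
    (re.compile(r"Exception in receiveBlock for block <\*>"),
     "A data node", "failed while receiving a block", "error"),
]


def rewrite(template: str) -> Tuple[str, bool]:
    """Map one parsed template to (sentence, used_fallback).

    The mapping is a pure function of the template string, so the same
    template always yields the same sentence.
    """
    for pattern, who, what, severity in RULES:
        if pattern.search(template):
            return f"{who} {what}; severity is {severity}.", False
    # Assumed fallback: keep the literal words and drop wildcard placeholders.
    words = " ".join(tok for tok in template.split() if tok != "<*>")
    return f"The system reported: {words}; severity is unknown.", True
```

Returning the `used_fallback` flag alongside the sentence is a design choice of this sketch: it lets a later step count how often the fallback path fires on a new corpus.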

Core claim

NLLog deterministically rewrites parsed templates into WHO-WHAT-SEVERITY sentences, pools them with TF-IDF weighting, classifies sessions with tree ensembles, and back-projects evidence with TreeSHAP for analyst review. On HDFS and BGL corpora it exceeds two reproduced matched-protocol baselines; across HDFS, BGL, and the AIT Alert Data Set it sustains low false-positive rates with commodity-hardware latency suitable for security operations center triage. Coverage, sparse-versus-dense, faithfulness, and adversarial ablations show that fallback sufficiency is corpus-dependent, that an enrollment-time coverage check can surface refinement requirements before deployment, and that an auditable deterministic rewrite combined with lightweight dense encoding provides a measurable representation layer for log-anomaly detection and triage.
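
One plausible reading of "pools them with TF-IDF weighting" is sketched below: each distinct sentence in a session is treated as a term, its weight is term frequency times a smoothed inverse document frequency, and the session vector is the weighted average of per-sentence vectors. The function names, the smoothing formula, and the toy embedding lookup are assumptions of this sketch, not details confirmed by the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def fit_idf(sessions: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed IDF per sentence: ln((1 + N) / (1 + df)) + 1."""
    n = len(sessions)
    df: Counter = Counter()
    for session in sessions:
        df.update(set(session))  # count each sentence once per session
    return {sent: math.log((1 + n) / (1 + d)) + 1.0 for sent, d in df.items()}


def pool(session: Sequence[str], idf: Dict[str, float],
         embed: Dict[str, List[float]], dim: int) -> List[float]:
    """TF-IDF weighted average of sentence vectors for one session."""
    pooled = [0.0] * dim
    total = 0.0
    for sent, count in Counter(session).items():
        if sent not in idf or sent not in embed:
            continue  # assumption: unseen sentences are skipped
        weight = count * idf[sent]
        total += weight
        for j, value in enumerate(embed[sent]):
            pooled[j] += weight * value
    return [x / total for x in pooled] if total > 0 else pooled
```

Under this scheme a sentence that appears in every session gets the minimum weight of 1.0, so rarer sentences pull the pooled vector toward themselves. The resulting fixed-length vector is what a tree ensemble would consume.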

What carries the argument

The deterministic rewrite of parsed log templates into WHO-WHAT-SEVERITY natural-language sentences that serves as the input representation for TF-IDF pooling and tree-ensemble classification.

If this is right

An enrollment-time coverage check can identify when the rewrite needs refinement before a system is deployed.
Adversarial ablations indicate the representation layer remains usable even under targeted perturbations.
The method delivers explainable outputs via TreeSHAP attributions that map directly back to the rewritten sentences.
Commodity-hardware latency supports triage workflows in live security operations centers.
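
The enrollment-time coverage check mentioned in the first item above can be sketched as a simple ratio: the share of log events whose template has an explicit rewrite rule rather than a fallback. The function name, the `has_rule` predicate, the report fields, and the 0.95 threshold are hypothetical choices for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def coverage_report(template_ids: Iterable[str],
                    has_rule: Callable[[str], bool],
                    threshold: float = 0.95) -> Dict[str, object]:
    """Summarize how much of a corpus the explicit rewrite rules cover.

    template_ids: one template ID per log event in the enrollment sample.
    has_rule:     returns True when a template has an explicit rule.
    threshold:    assumed minimum event-level coverage before deployment.
    """
    counts = Counter(template_ids)
    total = sum(counts.values())
    covered = sum(c for t, c in counts.items() if has_rule(t))
    ratio = covered / total if total else 0.0
    # List uncovered templates, most frequent first, as refinement candidates.
    uncovered = sorted((t for t in counts if not has_rule(t)),
                       key=lambda t: (-counts[t], t))
    return {
        "coverage": ratio,
        "needs_refinement": ratio < threshold,
        "uncovered": uncovered,
    }
```

Weighting by event count rather than by distinct template is a deliberate choice here: a rare uncovered template matters less for overall coverage than a frequent one, although a rare template can still carry the anomaly signal, which is why the uncovered list is returned in full.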

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sentence-rewrite step could be tested on non-security event logs such as application performance traces to see whether the representation transfers.
If the rewrite proves stable across streaming inputs, it could support continuous rather than session-based detection.
Connecting the coverage check to automated template refinement might reduce manual maintenance of log parsers over time.

Load-bearing premise

The rewrite of log templates into natural-language sentences keeps enough detail from the original logs to support accurate anomaly classification without losing critical signals.

What would settle it

A controlled test on a log corpus where the rewrite step removes a distinguishing anomaly cue that the original templates contain, causing NLLog performance to fall below the reproduced baselines.

Figures

Figures reproduced from arXiv: 2606.04957 by Daisuke Inoue, Samuel Ndichu, Seiichi Ozawa, Takeshi Takahashi, Tao Ban.

**Figure 2.** AIT-ADS chronological evaluation. We report Precision, Recall, F1, FPR, and AUC-PR (emphasizing AUC-PR on skewed datasets) over five random seeds (42, 1337, 31415, 2025, 8675309), using an 80:20 stratified split with 3-fold stratified cross-validation for model selection.

**Figure 3.** Template IDs (T) versus WWS representations across three…

**Figure 4.** Encoder accuracy–latency trade-off on BGL.

**Figure 6.** Faithfulness tests for TreeSHAP sentence attribution.

**Figure 7.** Mimicry padding sweep on HDFS and BGL.

5. Related Work

Log anomaly detection has developed along several distinct lines. Classical approaches emphasize parsing, template extraction, and invariant mining [4], [5], [7]–[9], [40]. These methods are often interpretable and efficient, but they typically operate on event IDs, counts, or mined rules rather than semantically richer text, and they may require sub…

**Figure 9.** ROC and precision-recall curves on BGL under the FPR…

**Figure 10.** AIT-ADS sessionization sensitivity across maximum session…

**Figure 8.** Classifier evaluation: F1 with per-session inference time (top) and false-positive rate on log scale (bottom) across the three datasets.

**Table 11.** IDF-poisoning stress test on HDFS and BGL.

| Dataset | Setting | Prec. (%) | Rec. (%) | F1 (%) | FPR (%) |
|---|---|---|---|---|---|
| HDFS | Clean train / clean test | 99.62 | 100.00 | 99.81 | 0.01 |
| HDFS | Contaminated IDF | 99.62 | 99.97 | 99.79 | 0.01 |
| HDFS | Frozen clean IDF | 99.62 | 100.00 | 99.81 | 0.01 |
| BGL | Clean train / clean test | 99.35… | | | |

**Figure 11.** HDFS sentence-level log-odds attribution for a false positive and a true positive. Positive bars increase the anomaly score; negative bars decrease…

Original abstract

System-generated logs underpin security monitoring, yet their rigid template-based format hinders both automated analysis and human comprehension. We present NLLog (Natural-Language Log), a lightweight pipeline that deterministically rewrites parsed templates into WHO-WHAT-SEVERITY sentences, pools them with term-frequency-inverse-document-frequency weighting, classifies sessions with tree ensembles, and back-projects evidence with TreeSHAP for analyst review. On Hadoop Distributed File System (HDFS) and Blue Gene/L (BGL) corpora, NLLog exceeds two reproduced matched-protocol baselines; across HDFS, BGL, and the AIT Alert Data Set, it sustains low false-positive rates with commodity-hardware latency suitable for security operations center triage. Coverage, sparse-versus-dense, faithfulness, and adversarial ablations show that fallback sufficiency is corpus-dependent, that an enrollment-time coverage check can surface refinement requirements before deployment, and that an auditable deterministic rewrite combined with lightweight dense encoding provides a measurable representation layer for log-anomaly detection and triage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

NLLog adds a deterministic rewrite of log templates into readable sentences before standard TF-IDF and tree classification, which gives usable explanations and modest gains on HDFS and BGL, but the rewrite risks dropping signals and the results lack error bars or split details.

The main point is that NLLog rewrites parsed log templates into fixed WHO-WHAT-SEVERITY sentences, then applies TF-IDF pooling, tree ensembles, and TreeSHAP to produce both detections and explanations. On the HDFS and BGL sets it beats two reproduced baselines while keeping false-positive rates low and running fast enough for SOC use. The pipeline is lightweight and the ablations on coverage, faithfulness, and adversarial cases give practical guidance on when the rewrite holds up.

The rewrite step is the clearest addition. Turning rigid templates into natural-language form makes the model outputs easier for analysts to review, and the back-projection via TreeSHAP is a direct way to surface which parts of the sentence drove a decision. The enrollment-time coverage check is also a sensible engineering touch that flags when a new corpus might need template adjustments.

The rewrite is also the main soft spot. Mapping everything to a fixed sentence schema can strip numeric values, timestamps, or template-specific tokens that matter for anomalies. The abstract mentions faithfulness ablations, but without a direct check such as mutual information between original features and the rewritten sentences it is hard to know whether the reported gains come from better representation or from the rewrite itself acting as a filter. The lack of error bars and exact dataset splits further weakens the performance claims, even if the overall direction looks reasonable.
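
The mutual-information check suggested above can be sketched in a few lines. This version uses the template ID as the "original feature" and the rewritten sentence as the other variable, which is an assumption of the sketch; the function names are hypothetical. Because the rewrite is a deterministic function of the template, the mutual information equals the entropy of the sentences and can never exceed the entropy of the templates, so the gap between the two measures how much the rewrite merges distinct templates.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X; Y) in bits for paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log2((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def preserved_fraction(templates: Sequence[Hashable],
                       sentences: Sequence[Hashable]) -> float:
    """Share of template entropy that survives the rewrite (1.0 = lossless)."""
    h = entropy(templates)
    return mutual_information(templates, sentences) / h if h > 0 else 1.0
```

A preserved fraction below 1.0 only shows that templates were merged; whether the merged templates differed in anomaly relevance would still need a label-aware check such as the raw-token ablation.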

This paper is aimed at practitioners building log-based detection tools who already use parsing and want an interpretable layer on top. A reader focused on security operations or explainable ML for logs will find the pipeline and ablation results worth examining. The work is coherent on its own terms and the claims are testable, so it deserves peer review even though the validation could be tightened.

Referee Report

2 major / 2 minor

Summary. The paper proposes NLLog, a lightweight pipeline for SOC log anomaly detection that parses templates, deterministically rewrites them into WHO-WHAT-SEVERITY natural-language sentences, applies TF-IDF pooling, classifies with tree ensembles, and back-projects explanations via TreeSHAP. It claims to exceed two reproduced matched-protocol baselines on the HDFS and BGL corpora and to sustain low false-positive rates across the HDFS, BGL, and AIT datasets at commodity-hardware latency. It supports these claims with coverage, sparse-versus-dense, faithfulness, and adversarial ablations, which show that fallback sufficiency is corpus-dependent and that an enrollment-time coverage check is valuable.

Significance. If the empirical claims hold, the work provides a practical, explainable representation layer for log-based anomaly detection that balances automation with analyst review, potentially aiding SOC triage without heavy compute. The explicit ablations and emphasis on deterministic rewrite are strengths that allow falsifiable assessment of the representation choice.

major comments (2)

[Abstract] Abstract (pipeline and results paragraphs): The central claim of outperformance and low FP rates rests on the rewrite step preserving distinguishing anomaly signals, yet the faithfulness ablations are described only at high level without an explicit preservation metric (e.g., mutual information between original template features and rewritten sentences, or an ablation replacing the rewrite with raw template tokens) on the exact HDFS/BGL splits used for the headline numbers.
[Abstract] Abstract (results paragraph): Reported gains over reproduced baselines and low false-positive rates are stated without error bars, exact train/test splits, or statistical significance tests, which leaves open the possibility that post-hoc choices or variance could affect the cross-dataset claims.

minor comments (2)

The description of fallback sufficiency being 'corpus-dependent' would benefit from a table or figure quantifying coverage percentages per dataset.
Consider clarifying in the methods whether the WHO-WHAT-SEVERITY mapping is fully deterministic with no tunable thresholds or learned components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, indicating planned revisions to improve clarity and rigor without altering the core claims.

Referee: [Abstract] Abstract (pipeline and results paragraphs): The central claim of outperformance and low FP rates rests on the rewrite step preserving distinguishing anomaly signals, yet the faithfulness ablations are described only at high level without an explicit preservation metric (e.g., mutual information between original template features and rewritten sentences, or an ablation replacing the rewrite with raw template tokens) on the exact HDFS/BGL splits used for the headline numbers.

Authors: We agree that the faithfulness section would be strengthened by an explicit preservation metric computed on the precise HDFS/BGL splits. The existing faithfulness ablation demonstrates that the deterministic rewrite maintains anomaly signals through downstream performance, but we will add mutual information between original template features and rewritten sentences plus a raw-token ablation on those exact splits in the revised manuscript.

revision: yes

Referee: [Abstract] Abstract (results paragraph): Reported gains over reproduced baselines and low false-positive rates are stated without error bars, exact train/test splits, or statistical significance tests, which leaves open the possibility that post-hoc choices or variance could affect the cross-dataset claims.

Authors: The comment is correct; the abstract and results would benefit from greater statistical transparency. We will revise to report error bars from repeated runs, state the exact train/test splits, and include statistical significance tests for the headline comparisons on HDFS and BGL.

revision: yes
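
As an illustration of the statistical reporting promised above, the sketch below computes the metrics named in the Figure 2 caption (Precision, Recall, F1, FPR) from confusion counts and summarizes one metric across repeated runs as mean and sample standard deviation. The function names are hypothetical, and no values from the paper are used.

```python
import statistics
from typing import Dict, Sequence, Tuple


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Session-level detection metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false alarms among normal sessions
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds (std is 0.0 for one run)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std
```

In use, `metrics` would be called once per seed on that seed's test split, and `mean_std` applied to the resulting list of F1 or FPR values so that each headline number can be reported with an error bar.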

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline with external benchmarks and ablations

The paper describes a deterministic rewrite pipeline followed by TF-IDF pooling, tree-ensemble classification, and TreeSHAP explanation, evaluated on public corpora (HDFS, BGL, AIT) against reproduced baselines with multiple ablations (coverage, faithfulness, adversarial). No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the provided text. Performance claims rest on external data splits and direct comparisons rather than reducing to the method's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, background axioms, or new postulated entities; ledger left empty pending full text.

Reference graph

Works this paper leans on

44 extracted references · 7 canonical work pages · 1 internal anchor

[1]

The base-rate fallacy and the difficulty of intrusion detection,

S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,” ACM Transactions on Information and System Security, vol. 3, no. 3, pp. 186–205, 2000

2000
[2]

Outside the closed world: On using machine learning for network intrusion detection,

R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in IEEE Symposium on Security and Privacy. IEEE, 2010, pp. 305–316

2010
[3]

A survey on intrusion detection systems and techniques,

S. Sharma, B. B. Gupta, and K. Ali, “A survey on intrusion detection systems and techniques,” Journal of Network and Computer Applications, vol. 181, p. 103082, 2021

2021
[4]

Drain: An online log parsing approach with fixed depth tree,

P. He, J. Zhu, Z. Zheng, and M. R. Lyu, “Drain: An online log parsing approach with fixed depth tree,” in IEEE International Conference on Web Services (ICWS). IEEE, 2017, pp. 33–40

2017
[5]

Experience report: System log analysis for anomaly detection,

S. He, J. Zhu, P. He, and M. R. Lyu, “Experience report: System log analysis for anomaly detection,” in IEEE 27th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2016, pp. 207–218

2016
[6]

Deeplog: Anomaly detection and diagnosis from system logs through deep learning,

M. Du, F. Li, G. Zheng, and V. Srikumar, “Deeplog: Anomaly detection and diagnosis from system logs through deep learning,” in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 1285–1298

2017
[7]

A survey on log anomaly detection,

J. Zhu, S. He, and J. Liu, “A survey on log anomaly detection,” Journal of Systems and Software, vol. 193, p. 111477, 2022

2022
[8]

Mining invariants from console logs for system problem detection,

J.-G. Lou, Q. Fu, S. Yang, Y. Xu, and J. Li, “Mining invariants from console logs for system problem detection,” in Proceedings of the USENIX Annual Technical Conference. USENIX Association, 2010, pp. 1–14

2010
[9]

A survey on log analysis for anomaly detection,

W. Meng, Y. Liu, and Q. Zhu, “A survey on log analysis for anomaly detection,” Computers & Security, vol. 97, p. 101945, 2020

2020
[10]

Robust log-based anomaly detection on unstable log data,

X. Zhang, Y. Xu, Q. Lin, B. Qiao, H. Zhang, Y. Dang, C. Xie, X. Yang, Q. Cheng, Z. Li, J. Chen, and D. He, “Robust log-based anomaly detection on unstable log data,” in Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 2019, pp. 807–810

2019
[11]

Self-supervised log parsing,

S. Nedelkoski, J. Bogatinovski, A. Acker, J. Cardoso, and O. Kao, “Self-supervised log parsing,” arXiv preprint arXiv:2003.07905, 2020

work page arXiv 2020
[12]

Deep learning for anomaly detection: A review,

G. Pang, C. Shen, L. Cao, and A. v. d. Hengel, “Deep learning for anomaly detection: A review,” ACM Computing Surveys, vol. 54, no. 2, pp. 1–38, 2021

2021
[13]

LogLLaMA: Transformer-based log anomaly detection with LLaMA,

Z. Yang and I. G. Harris, “Logllama: Transformer-based log anomaly detection with llama,” 2025. [Online]. Available: https://arxiv.org/abs/2503.14849

work page arXiv 2025
[14]

Semi-supervised log-based anomaly detection via probabilistic label estimation,

L. Yang, J. Chen, Z. Wang, W. Wang, J. Jiang, X. Dong, and W. Zhang, “Semi-supervised log-based anomaly detection via probabilistic label estimation,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021, pp. 1448–1460

2021
[15]

Log-based anomaly detection without log parsing,

V.-H. Le and H. Zhang, “Log-based anomaly detection without log parsing,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 492–504

2021
[16]

Logprompt: Prompt engineering towards zero-shot and interpretable log analysis,

Y. Liu, S. Tao, W. Meng, F. Yao, X. Zhao, and H. Yang, “Logprompt: Prompt engineering towards zero-shot and interpretable log analysis,” in 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2024, pp. 364–365

2024
[17]

Loggpt: Log anomaly detection via gpt,

X. Han, S. Yuan, and M. Trabelsi, “Loggpt: Log anomaly detection via gpt,” in 2023 IEEE International Conference on Big Data (BigData), 2023, pp. 1117–1122

2023
[18]

Bert: Pre-training of deep bidirectional transformers for language understanding,

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, 2019, pp. 4171–4186

2019
[19]

Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers,

W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou, “Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers,” in Advances in Neural Information Processing Systems (NeurIPS), 2020, pp. 5776–5788

2020
[20]

Log-based anomaly detection with deep learning: how far are we?

V.-H. Le and H. Zhang, “Log-based anomaly detection with deep learning: how far are we?” Journal of Systems and Software, vol. 188, p. 111300, 2022

2022
[21]

Semi-supervised log anomaly detection through semantic context extraction,

X. Yang, P. Chen, Z. He, Y. Gao, J. Liu, B. Qiao, Y. Dang, and Q. Lin, “Semi-supervised log anomaly detection through semantic context extraction,” in Proceedings of the International Conference on Software Engineering (ICSE). IEEE, 2021, pp. 1176–1187

2021
[22]

Logbert: Log anomaly detection via bert,

H. Guo, S. Yuan, and X. Wu, “Logbert: Log anomaly detection via bert,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8

2021
[23]

Term-weighting approaches in automatic text retrieval,

G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information processing & management, vol. 24, no. 5, pp. 513–523, 1988

1988
[24]

Detecting large-scale system problems by mining console logs,

W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 117–132. [Online]. Available: https://doi.org/10.1145/1629575.1629587

work page doi:10.1145/1629575.1629587 2009
[25]

Loghub: A large collection of system log datasets for ai-driven log analytics,

J. Zhu, S. He, P. He, J. Liu, and M. R. Lyu, “Loghub: A large collection of system log datasets for ai-driven log analytics,” 2023. [Online]. Available: https://arxiv.org/abs/2008.06448

work page arXiv 2023
[26]

What supercomputers say: A study of five system logs,

A. Oliner and J. Stearley, “What supercomputers say: A study of five system logs,” in 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’07), 2007, pp. 575–584

2007
[27]

Introducing a new alert data set for multi-step attack analysis,

M. Landauer, F. Skopik, and M. Wurzenberger, “Introducing a new alert data set for multi-step attack analysis,” in Proceedings of the 17th Cyber Security Experimentation and Test Workshop, ser. CSET ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 41–53. [Online]. Available: https://doi.org/10.1145/3675741.3675748

work page doi:10.1145/3675741.3675748 2024
[28]

Maintainable log datasets for evaluation of intrusion detection systems,

M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “Maintainable log datasets for evaluation of intrusion detection systems,” IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466–3482, 2023

2023
[29]

Orthrus: achieving high quality of attribution in provenance-based intrusion detection systems,

B. Jiang, T. Bilot, N. El Madhoun, K. Al Agha, A. Zouaoui, S. Iqbal, X. Han, and T. Pasquier, “Orthrus: achieving high quality of attribution in provenance-based intrusion detection systems,” in Proceedings of the 34th USENIX Conference on Security Symposium, ser. SEC ’25. USA: USENIX Association, 2025

2025
[30]

PROGRAPHER: An anomaly detection system based on provenance graph embedding,

F. Yang, J. Xu, C. Xiong, Z. Li, and K. Zhang, “PROGRAPHER: An anomaly detection system based on provenance graph embedding,” in Proceedings of the 32nd USENIX Conference on Security Symposium, ser. SEC ’23. USA: USENIX Association, 2023

2023
[31]

Threatrace: Detecting and tracing host-based threats in node level through provenance graph learning,

S. Wang, Z. Wang, T. Zhou, H. Sun, X. Yin, D. Han, H. Zhang, X. Shi, and J. Yang, “Threatrace: Detecting and tracing host-based threats in node level through provenance graph learning,” IEEE Transactions on Information Forensics and Security, vol. 17, pp. 3972–3987, 2022

2022
[32]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” 2019. [Online]. Available: https://arxiv.org/abs/1908.10084

work page internal anchor Pith review Pith/arXiv arXiv 2019
[33]

Random forests,

L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001

2001
[34]

Xgboost: A scalable tree boosting system,

T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 785–794

2016
[35]

Lightgbm: a highly efficient gradient boosting decision tree,

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: a highly efficient gradient boosting decision tree,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 3149–3157

2017
[36]

From local explanations to global understanding with explainable ai for trees,

S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, “From local explanations to global understanding with explainable ai for trees,” Nature machine intelligence, vol. 2, no. 1, pp. 56–67, 2020

2020
[37]

Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs,

W. Meng, Y. Liu, Y. Zhu, S. Zhang, D. Pei, Y. Liu, Y. Chen, R. Zhang, S. Tao, P. Sun, and R. Zhou, “Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 4739–4745

2019
[38]

Logllm: Log-based anomaly detection using large language models,

W. Guan, J. Cao, S. Qian, J. Gao, and C. Ouyang, “Logllm: Log-based anomaly detection using large language models,” arXiv preprint arXiv:2411.08561, 2024

work page arXiv 2024
[39]

Deepcase: Semi-supervised contextual analysis of security events,

T. van Ede, H. Aghakhani, N. Spahn, R. Bortolameotti, M. Cova, A. Continella, M. van Steen, A. Peter, C. Kruegel, and G. Vigna, “Deepcase: Semi-supervised contextual analysis of security events,” in Proceedings of the IEEE Symposium on Security and Privacy (SP). IEEE, 2022

2022
[40]

Log clustering based problem identification for online service systems,

Q. Lin, H. Zhang, J.-G. Lou, Y. Zhang, and X. Chen, “Log clustering based problem identification for online service systems,” in Proceedings of the 38th International Conference on Software Engineering (ICSE). ACM, 2016, pp. 102–111

2016
[41]

Deepead: Explainable anomaly detection from system logs,

X. Wang, K. J. Kim, Y. Wang, T. Koike-Akino, and K. Parsons, “Deepead: Explainable anomaly detection from system logs,” in ICC 2023 - IEEE International Conference on Communications, 2023, pp. 771–776

2023
[42]

This looks like that: deep learning for interpretable image recognition,

C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su, “This looks like that: deep learning for interpretable image recognition,” Advances in neural information processing systems, vol. 32, 2019

2019
[43]

Understanding black-box predictions via influence functions,

P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 1885–1894

2017
[44]

A comprehensive study of machine learning techniques for log-based anomaly detection,

S. Ali, C. Boufaied, D. Bianculli, P. Branco, and L. Briand, “A comprehensive study of machine learning techniques for log-based anomaly detection,” Empirical Software Engineering, vol. 30, no. 5, p. 129, 2025.

Appendix A. Canonicalization and Template Determination

NLLog applies a fixed sequence of regular-expression substitutions, following prior log-ana...

2025

[1] [1]

The base-rate fallacy and the difficulty of intrusion detection,

S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,”ACM Transactions on Information and System Security, vol. 3, no. 3, pp. 186–205, 2000

2000

[2] [2]

Outside the closed world: On using ma- chine learning for network intrusion detection,

R. Sommer and V . Paxson, “Outside the closed world: On using ma- chine learning for network intrusion detection,” inIEEE Symposium on Security and Privacy. IEEE, 2010, pp. 305–316

2010

[3] [3]

A survey on intrusion detection systems and techniques,

S. Sharma, B. B. Gupta, and K. Ali, “A survey on intrusion detection systems and techniques,”Journal of Network and Computer Appli- cations, vol. 181, p. 103082, 2021

2021

[4] [4]

Drain: An online log parsing approach with fixed depth tree,

P. He, J. Zhu, Z. Zheng, and M. R. Lyu, “Drain: An online log parsing approach with fixed depth tree,” inIEEE International Conference on Web Services (ICWS). IEEE, 2017, pp. 33–40

2017

[5] [5]

Experience report: System log analysis for anomaly detection,

S. He, J. Zhu, P. He, and M. R. Lyu, “Experience report: System log analysis for anomaly detection,” inIEEE 27th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2016, pp. 207–218

2016

[6] [6]

Deeplog: Anomaly detection and diagnosis from system logs through deep learning,

M. Du, F. Li, G. Zheng, and V . Srikumar, “Deeplog: Anomaly detection and diagnosis from system logs through deep learning,” inProceedings of the ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 1285–1298

2017

[7] [7]

A survey on log anomaly detection,

J. Zhu, S. He, and J. Liu, “A survey on log anomaly detection,” Journal of Systems and Software, vol. 193, p. 111477, 2022

2022

[8] [8]

Mining invariants from console logs for system problem detection,

J.-G. Lou, Q. Fu, S. Yang, Y . Xu, and J. Li, “Mining invariants from console logs for system problem detection,” inProceedings of the USENIX Annual Technical Conference. USENIX Association, 2010, pp. 1–14

2010

[9] [9]

A survey on log analysis for anomaly detection,

W. Meng, Y . Liu, and Q. Zhu, “A survey on log analysis for anomaly detection,”Computers & Security, vol. 97, p. 101945, 2020

2020

[10] [10]

Robust log-based anomaly detection on unstable log data,

X. Zhang, Y . Xu, Q. Lin, B. Qiao, H. Zhang, Y . Dang, C. Xie, X. Yang, Q. Cheng, Z. Li, J. Chen, and D. He, “Robust log-based anomaly detection on unstable log data,” inProceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 2019, pp. 807–810

2019

[11] [11]

Self-supervised log parsing,

S. Nedelkoski, J. Bogatinovski, A. Acker, J. Cardoso, and O. Kao, “Self-supervised log parsing,”arXiv preprint arXiv:2003.07905, 2020

work page arXiv 2003

[12] [12]

Deep learning for anomaly detection: A review,

G. Pang, C. Shen, L. Cao, and A. v. d. Hengel, “Deep learning for anomaly detection: A review,”ACM Computing Surveys, vol. 54, no. 2, pp. 1–38, 2021

2021

[13] [13]

LogLLaMA: Transformer-based log anomaly detection with LLaMA,

Z. Yang and I. G. Harris, “Logllama: Transformer-based log anomaly detection with llama,” 2025. [Online]. Available: https://arxiv.org/abs/2503.14849

work page arXiv 2025

[14] [14]

Semi-supervised log-based anomaly detection via prob- abilistic label estimation,

L. Yang, J. Chen, Z. Wang, W. Wang, J. Jiang, X. Dong, and W. Zhang, “Semi-supervised log-based anomaly detection via prob- abilistic label estimation,” in2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021, pp. 1448– 1460

2021

[15] V.-H. Le and H. Zhang, “Log-based anomaly detection without log parsing,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 492–504

[16] Y. Liu, S. Tao, W. Meng, F. Yao, X. Zhao, and H. Yang, “LogPrompt: Prompt engineering towards zero-shot and interpretable log analysis,” in 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2024, pp. 364–365

[17] X. Han, S. Yuan, and M. Trabelsi, “LogGPT: Log anomaly detection via GPT,” in 2023 IEEE International Conference on Big Data (BigData), 2023, pp. 1117–1122

[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, 2019, pp. 4171–4186

[19] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou, “MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers,” in Advances in Neural Information Processing Systems (NeurIPS), 2020, pp. 5776–5788

[20] V.-H. Le and H. Zhang, “Log-based anomaly detection with deep learning: how far are we?” Journal of Systems and Software, vol. 188, p. 111300, 2022

[21] X. Yang, P. Chen, Z. He, Y. Gao, J. Liu, B. Qiao, Y. Dang, and Q. Lin, “Semi-supervised log anomaly detection through semantic context extraction,” in Proceedings of the International Conference on Software Engineering (ICSE). IEEE, 2021, pp. 1176–1187

[22] H. Guo, S. Yuan, and X. Wu, “LogBERT: Log anomaly detection via BERT,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8

[23] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988

[24] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 117–132. [Online]. Available: https://doi.org/10.1145/1629575.1629587

[25] J. Zhu, S. He, P. He, J. Liu, and M. R. Lyu, “Loghub: A large collection of system log datasets for AI-driven log analytics,” 2023. [Online]. Available: https://arxiv.org/abs/2008.06448

[26] A. Oliner and J. Stearley, “What supercomputers say: A study of five system logs,” in 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’07), 2007, pp. 575–584

[27] M. Landauer, F. Skopik, and M. Wurzenberger, “Introducing a new alert data set for multi-step attack analysis,” in Proceedings of the 17th Cyber Security Experimentation and Test Workshop, ser. CSET ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 41–53. [Online]. Available: https://doi.org/10.1145/3675741.3675748

[28] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “Maintainable log datasets for evaluation of intrusion detection systems,” IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466–3482, 2023

[29] B. Jiang, T. Bilot, N. El Madhoun, K. Al Agha, A. Zouaoui, S. Iqbal, X. Han, and T. Pasquier, “Orthrus: achieving high quality of attribution in provenance-based intrusion detection systems,” in Proceedings of the 34th USENIX Conference on Security Symposium, ser. SEC ’25. USA: USENIX Association, 2025

[30] F. Yang, J. Xu, C. Xiong, Z. Li, and K. Zhang, “PROGRAPHER: An anomaly detection system based on provenance graph embedding,” in Proceedings of the 32nd USENIX Conference on Security Symposium, ser. SEC ’23. USA: USENIX Association, 2023

[31] S. Wang, Z. Wang, T. Zhou, H. Sun, X. Yin, D. Han, H. Zhang, X. Shi, and J. Yang, “Threatrace: Detecting and tracing host-based threats in node level through provenance graph learning,” IEEE Transactions on Information Forensics and Security, vol. 17, pp. 3972–3987, 2022

[32] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using siamese BERT-networks,” 2019. [Online]. Available: https://arxiv.org/abs/1908.10084

[33] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001

[34] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 785–794

[35] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: a highly efficient gradient boosting decision tree,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 3149–3157

[36] S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, “From local explanations to global understanding with explainable AI for trees,” Nature Machine Intelligence, vol. 2, no. 1, pp. 56–67, 2020

[37] W. Meng, Y. Liu, Y. Zhu, S. Zhang, D. Pei, Y. Liu, Y. Chen, R. Zhang, S. Tao, P. Sun, and R. Zhou, “LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 4739–4745

[38] W. Guan, J. Cao, S. Qian, J. Gao, and C. Ouyang, “LogLLM: Log-based anomaly detection using large language models,” arXiv preprint arXiv:2411.08561, 2024

[39] T. van Ede, H. Aghakhani, N. Spahn, R. Bortolameotti, M. Cova, A. Continella, M. van Steen, A. Peter, C. Kruegel, and G. Vigna, “DeepCASE: Semi-supervised contextual analysis of security events,” in Proceedings of the IEEE Symposium on Security and Privacy (SP). IEEE, 2022

[40] Q. Lin, H. Zhang, J.-G. Lou, Y. Zhang, and X. Chen, “Log clustering based problem identification for online service systems,” in Proceedings of the 38th International Conference on Software Engineering (ICSE). ACM, 2016, pp. 102–111

[41] X. Wang, K. J. Kim, Y. Wang, T. Koike-Akino, and K. Parsons, “Deepead: Explainable anomaly detection from system logs,” in ICC 2023 - IEEE International Conference on Communications, 2023, pp. 771–776

[42] C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su, “This looks like that: deep learning for interpretable image recognition,” Advances in Neural Information Processing Systems, vol. 32, 2019

[43] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 1885–1894

[44] S. Ali, C. Boufaied, D. Bianculli, P. Branco, and L. Briand, “A comprehensive study of machine learning techniques for log-based anomaly detection,” Empirical Software Engineering, vol. 30, no. 5, p. 129, 2025

Appendix A. Canonicalization and Template Determination

NLLog applies a fixed sequence of regular-expression substitutions, following prior log-ana...