pith. sign in

arxiv: 2003.07905 · v1 · pith:HD3DW3N2new · submitted 2020-03-17 · 💻 cs.LG · cs.SE

Self-Supervised Log Parsing

classification 💻 cs.LG cs.SE
keywords parsingnulogdetectionresultsanalysisanomalyenablesexisting
0
0 comments X
read the original abstract

Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. These are often specialized in parsing certain log types, and thus, limit performance scores and generalization. We propose a novel parsing technique called NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows the coupling of the MLM as pre-training with a downstream anomaly detection task. We evaluate the parsing performance of NuLog on 10 real-world log datasets and compare the results with 12 parsing techniques. The results show that NuLog outperforms existing methods in parsing accuracy with an average of 99% and achieves the lowest edit distance to the ground truth templates. Additionally, two case studies are conducted to demonstrate the ability of the approach for log-based anomaly detection in both supervised and unsupervised scenario. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NLLog: Lightweight, Explainable SOC Anomaly Detection via Log-to-Language Rewriting

    cs.CR 2026-06 unverdicted novelty 4.0

    NLLog rewrites log templates into WHO-WHAT-SEVERITY sentences, applies TF-IDF pooling and tree-ensemble classification with TreeSHAP back-projection, and reports better performance than two reproduced baselines on HDF...