arxiv: 2604.13024 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.DB

Recognition: unknown

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

Benzhao Tang , Shiyu Yang

Authors on Pith no claims yet

Pith reviewed 2026-05-10 14:54 UTC · model grok-4.3

classification 💻 cs.LG cs.DB

keywords log anomaly detectioncompressed representationsbyte stream processingdeep learningdilated convolutiontransformer modelsclass imbalance handlingstreaming compression

0 comments

The pith

CLAD detects log anomalies directly from compressed byte streams by identifying disruptions in normal compression patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that log anomaly detection can bypass the usual steps of decompression and parsing by working straight on the compressed byte data. Normal logs produce consistent byte patterns under compression, but anomalies break those patterns in systematic ways. A custom neural architecture extracts the multi-scale signals from these opaque bytes and classifies them after a two-stage training process that first learns general byte structures and then focuses on the rare anomaly cases. If this holds, detection becomes faster and lighter while still reaching higher accuracy than methods that fully unpack the logs first.

Core claim

CLAD is the first framework to perform log anomaly detection directly on compressed byte streams. It rests on the observation that normal logs compress into regular byte patterns while anomalies produce detectable multi-scale deviations in those same bytes. The model uses a dilated convolutional encoder to read the raw bytes, a hybrid Transformer-mLSTM to model dependencies across the stream, and four-way aggregation pooling to combine features at different scales, trained first by masked pre-training on byte sequences and then by focal-contrastive fine-tuning to manage class imbalance.

What carries the argument

Dilated convolutional byte encoder combined with hybrid Transformer-mLSTM and four-way aggregation pooling that extracts multi-scale deviations directly from opaque compressed bytes.

If this is right

Detection eliminates all decompression and parsing overhead for streaming logs.
Average F1-score reaches 0.9909 across five datasets while outperforming the best prior method by 2.72 points.
The approach generalizes to structured streaming compressors without modification.
Two-stage training with masked pre-training and focal-contrastive fine-tuning handles severe class imbalance in log data.
Real-time processing becomes feasible on high-volume log streams that would otherwise require heavy pre-processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar byte-pattern disruption detection could extend to anomaly finding in other compressed streams such as network packets or time-series sensor data.
Removing decompression steps would lower both latency and energy use in continuous monitoring systems that handle terabytes of logs daily.
The architecture's focus on raw bytes might allow direct application to logs compressed by newer or custom algorithms not tested in the original evaluation.

Load-bearing premise

Anomalies in logs will reliably create byte-pattern disruptions in compressed streams that differ from normal logs in ways the model can learn without ever seeing the original text.

What would settle it

A dataset where anomalies compress to byte sequences indistinguishable from normal logs under the same compressor, causing the model's F1 score to fall below that of decompressing baselines.

read the original abstract

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer--mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CLAD shows a workable path to skip decompression in log anomaly detection by training on byte patterns, but the gains hinge on whether those patterns reliably flag anomalies across compressors.

read the letter

The main point for you is that this paper gets log anomaly detection running straight on compressed byte streams instead of decompressing and parsing first. That removes a real bottleneck for streaming systems, and they report an average F1 of 0.9909 on five datasets while beating the best baseline by 2.72 points. The architecture uses a dilated convolutional encoder on the bytes, a hybrid Transformer-mLSTM, and four-way pooling, plus masked pre-training and focal-contrastive fine-tuning for the imbalance problem.

Referee Report

2 major / 2 minor

Summary. The paper introduces CLAD, the first deep learning framework for log anomaly detection (LAD) performed directly on compressed byte streams without decompression or parsing. It rests on the insight that normal logs produce regular byte patterns under compression while anomalies create detectable multi-scale disruptions. The proposed architecture combines a dilated convolutional byte encoder, a hybrid Transformer-mLSTM, and four-way aggregation pooling, trained via masked pre-training followed by focal-contrastive fine-tuning to address class imbalance. Experiments on five datasets report a state-of-the-art average F1-score of 0.9909, outperforming the best baseline by 2.72 percentage points while eliminating pre-processing overhead and generalizing to structured streaming compressors.

Significance. If the central empirical claims hold, the work offers a practically significant advance by removing decompression and parsing costs in high-volume log processing pipelines. The two-stage training strategy and direct operation on opaque bytes are notable strengths, as is the explicit focus on efficiency. The result could influence future systems-oriented ML research on compressed or encoded data representations, provided the performance is shown to stem from the architecture rather than dataset-specific effects.

major comments (2)

[Abstract and §3] Abstract and §3 (Method): The load-bearing assumption that anomalies 'systematically disrupt' regular byte patterns in compressed streams (enabling reliable detection without decompression) is not supported by ablations on compressor type (e.g., adaptive/dictionary-based vs. fixed) or anomaly injection methods. This leaves open whether the reported F1 gains generalize or are tied to the five specific datasets and compressor used.
[§4 and §5] §4 (Architecture) and §5 (Experiments): No ablation results are presented that isolate the contribution of the dilated convolutional encoder, hybrid Transformer-mLSTM, or four-way pooling versus the masked pre-training and focal-contrastive fine-tuning. Without these, it is unclear whether the 2.72-point improvement is attributable to the novel components or to training choices.

minor comments (2)

[Table 1 or §5.1] Table 1 or §5.1: Dataset characteristics (log formats, compression ratios, anomaly rates) and baseline implementation details should be expanded to allow reproduction and assessment of whether post-hoc selection occurred.
[§5.2] §5.2: Statistical significance of the F1 scores (error bars, multiple random seeds, or paired tests) should be reported to substantiate the SOTA claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will incorporate the suggested analyses into the revised manuscript.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (Method): The load-bearing assumption that anomalies 'systematically disrupt' regular byte patterns in compressed streams (enabling reliable detection without decompression) is not supported by ablations on compressor type (e.g., adaptive/dictionary-based vs. fixed) or anomaly injection methods. This leaves open whether the reported F1 gains generalize or are tied to the five specific datasets and compressor used.

Authors: We agree that explicit ablations on compressor families and controlled anomaly injection would further substantiate the core insight. The current manuscript already reports results across five heterogeneous datasets and notes generalization to structured streaming compressors. In the revision we will add: (i) experiments with additional compressors (zlib, LZ4, Zstandard) to separate fixed vs. adaptive/dictionary behavior, and (ii) synthetic anomaly-injection studies that quantify byte-pattern disruption. These results will be placed in §3 and §5 and will directly test whether detection performance tracks the hypothesized multi-scale disruptions rather than dataset idiosyncrasies. revision: yes
Referee: [§4 and §5] §4 (Architecture) and §5 (Experiments): No ablation results are presented that isolate the contribution of the dilated convolutional encoder, hybrid Transformer-mLSTM, or four-way pooling versus the masked pre-training and focal-contrastive fine-tuning. Without these, it is unclear whether the 2.72-point improvement is attributable to the novel components or to training choices.

Authors: We concur that component-wise ablations are necessary to attribute gains. We have since run the requested studies: (a) replacing the dilated convolutional encoder with a standard convolution stack, (b) substituting the hybrid Transformer-mLSTM with a pure Transformer or mLSTM-only decoder, (c) removing the four-way aggregation pooling, and (d) comparing the two-stage (masked pre-training + focal-contrastive) regime against single-stage and standard cross-entropy fine-tuning. The new results, to be added as a dedicated subsection and table in §5, show that both the architectural modules and the training strategy contribute non-redundant improvements, with the full CLAD configuration required to reach the reported 0.9909 average F1. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ML framework with external dataset validation

full rationale

The paper introduces an empirical deep learning architecture (dilated conv encoder + hybrid Transformer-mLSTM + pooling) and two-stage training for log anomaly detection on compressed byte streams. All performance claims (SOTA F1=0.9909 on five datasets) rest on direct experimental evaluation rather than any derivation, equation, or self-referential reduction. No load-bearing steps reduce predictions to fitted inputs by construction, and no self-citations are invoked to justify core premises. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on a domain assumption about log compression behavior and the effectiveness of a custom neural architecture. Free parameters consist of architectural and training choices not detailed in the abstract. No invented entities are introduced.

free parameters (2)

dilation rates and kernel sizes in byte encoder
Chosen to capture multi-scale byte patterns; specific values not provided in abstract.
hyperparameters for masked pre-training and focal-contrastive fine-tuning
Tuned to handle class imbalance and learn from compressed streams; exact values not specified.

axioms (1)

domain assumption Normal logs compress into regular byte patterns while anomalies systematically disrupt them in detectable ways
Key insight stated in the abstract that underpins the entire approach.

pith-pipeline@v0.9.0 · 5470 in / 1486 out tokens · 60814 ms · 2026-05-10T14:54:29.944150+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 3 canonical work pages · 3 internal anchors

[1]

Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. 2024. xlstm: Extended long short-term memory.Advances in Neural Information Processing Systems37 (2024), 107547–107603

2024
[2]

Rui Chen, Shenglin Zhang, Dongwen Li, Yuzhe Zhang, Fangrui Guo, Weibin Meng, Dan Pei, Yuzhi Zhang, Xu Chen, and Yuqing Liu. 2020. Logtransfer: Cross-system log anomaly detection for software systems with transfer learning. In2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). IEEE, 37–47

2020
[3]

Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. InProceedings of the 2017 ACM SIGSAC conference on computer and communications security. 1285–1298

2017
[4]

Yangxin Fan, Haolai Che, and Yinghui Wu. 2025. Inference-Friendly Graph Compression for Graph Neural Networks.Proceedings of the VLDB Endowment 18, 9 (2025), 3203–3215

2025
[5]

Jiawei Guan, Feng Zhang, Siqi Ma, Kuangyu Chen, Yihua Hu, Yuxing Chen, Anqun Pan, and Xiaoyong Du. 2023. Homomorphic compression: Making text processing on compression unlimited.Proceedings of the ACM on Management of Data1, 4 (2023), 1–28

2023
[6]

Haixuan Guo, Shuhan Yuan, and Xintao Wu. 2021. Logbert: Log anomaly detec- tion via bert. In2021 international joint conference on neural networks (IJCNN). IEEE, 1–8

2021
[7]

Hao Hu, Qiyang Zheng, Xiangyu Zou, Lisha Qin, Chengwei Zhang, Wanchuan Zhang, Zhaoheng Jiang, Dingwen Tao, Hongpeng Wang, and Wen Xia. 2025. A cost-effective and decompression-transparent compressor for OLTP-oriented databases. In2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 405–418

2025
[8]

Peng Jia, Shaofeng Cai, Beng Chin Ooi, Pinghui Wang, and Yiyuan Xiong. 2023. Robust and transferable log-based anomaly detection.Proceedings of the ACM on Management of Data1, 1 (2023), 1–26

2023
[9]

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning.Advances in neural information processing systems33 (2020), 18661–18673

2020
[10]

Van-Hoang Le and Hongyu Zhang. 2021. Log-based anomaly detection with- out log parsing. In2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 492–504

2021
[11]

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. InProceedings of the IEEE international conference on computer vision. 2980–2988

2017
[12]

Jie Liu, Jiamou Liu, Kaiqi Zhao, Yanni Tang, and Wu Chen. 2024. Tp-gnn: Continuous dynamic graph neural network for graph classification. In2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 2848–2861

2024
[13]

Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, and Michael R Lyu. 2019. Logzip: Extracting hidden structures via iterative clustering for log compression. In2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 863–873

2019
[14]

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101(2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Siyang Lu, Xiang Wei, Yandong Li, and Liqiang Wang. 2018. Detecting anomaly in big data system logs using convolutional neural network. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Cong...

2018
[16]

Lei Ma, Lei Cao, Peter M VanNostrand, Dennis M Hofmann, Yao Su, and Elke A Rundensteiner. 2024. Pluto: Sample selection for robust anomaly detection on polluted log data.Proceedings of the ACM on Management of Data2, 4 (2024), 1–25

2024
[17]

Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, et al. 2019. Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs.. InIjcai, Vol. 19. 4739–4745

2019
[18]

Adam Oliner and Jon Stearley. 2007. What supercomputers say: A study of five system logs. In37th annual IEEE/IFIP international conference on dependable systems and networks (DSN’07). IEEE, 575–584

2007
[19]

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[20]

Jiaxing Qi, Shaohan Huang, Zhongzhi Luan, Shu Yang, Carol Fung, Hailong Yang, Depei Qian, Jing Shang, Zhiwen Xiao, and Zhihui Wu. 2023. Loggpt: Exploring chatgpt for log-based anomaly detection. In2023 IEEE international conference on high performance computing & communications, data science & systems, smart city & dependability in sensor, cloud & big dat...

2023
[21]

Kirk Rodrigues, Yu Luo, and Ding Yuan. 2021. CLP: Efficient and scalable search on compressed text logs. In15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 183–198

2021
[22]

Noam Shazeer. 2020. Glu variants improve transformer.arXiv preprint arXiv:2002.05202(2020)

work page internal anchor Pith review arXiv 2020
[23]

Yicheng Sui, Xiaotian Wang, Tianyu Cui, Tong Xiao, Chenghao He, Shenglin Zhang, Yuzhi Zhang, Xiao Yang, Yongqian Sun, and Dan Pei. 2025. Bridging the gap: Llm-powered transfer learning for log anomaly detection in new software systems. In2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 4414–4427

2025
[24]

Benzhao Tang, Shiyu Yang, Zhitao Shen, Wenjie Zhang, Xuemin Lin, and Zhihong Tian. 2025. LogLite: Lightweight Plug-and-Play Streaming Log Compression. Proceedings of the VLDB Endowment18, 11 (2025), 3757–3770

2025
[25]

Yanni Tang, Zhuoxing Zhang, Kaiqi Zhao, Lanting Fang, Zhenhua Li, and Wu Chen. 2024. Substructure-Aware Log Anomaly Detection.Proceedings of the VLDB Endowment18, 2 (2024), 213–225

2024
[26]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.Advances in neural information processing systems30 (2017)

2017
[27]

Yi Wan, Yilin Liu, Dong Wang, and Yujin Wen. 2021. Glad-paw: Graph-based log anomaly detection by position aware weighted graph attention network. In Pacific-asia conference on knowledge discovery and data mining. Springer, 66–77

2021
[28]

Rui Wang, Devin Gibson, Kirk Rodrigues, Yu Luo, Yun Zhang, Kaibo Wang, Yupeng Fu, Ting Chen, and Ding Yuan. 2024. 𝜇Slope: High Compression and Fast Search on Semi-Structured Logs. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 529–544

2024
[29]

Ziheng Wang, Junyu Wei, Alex Aiken, Guangyan Zhang, Jacob O Tørring, Rain Jiang, Chenyu Jiang, and Wei Xu. 2025. LogCIoud: Fast Search of Compressed Logs on Object Storage.Proceedings of the VLDB Endowment18, 8 (2025), 2362– 2370

2025
[30]

Junyu Wei, Guangyan Zhang, Junchao Chen, Yang Wang, Weimin Zheng, Tingtao Sun, Jiesheng Wu, and Jiangwei Jiang. 2023. Loggrep: Fast and cheap cloud log storage by exploiting both static and runtime patterns. InProceedings of the Eighteenth European Conference on Computer Systems. 452–468

2023
[31]

Junyu Wei, Guangyan Zhang, Yang Wang, Zhiwei Liu, Zhanyang Zhu, Junchao Chen, Tingtao Sun, and Qi Zhou. 2021. On the feasibility of parser-based log compression in Large-Scale cloud systems. In19th USENIX Conference on File and Storage Technologies (FAST 21). 249–262

2021
[32]

Yuxin Wu and Kaiming He. 2018. Group normalization. InProceedings of the European conference on computer vision (ECCV). 3–19

2018
[33]

Yongzheng Xie, Hongyu Zhang, and Muhammad Ali Babar. 2022. Loggd: Detect- ing anomalies from system logs with graph neural networks. In2022 IEEE 22nd International conference on software quality, reliability and security (QRS). IEEE, 299–310

2022
[34]

Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tieyan Liu. 2020. On layer normalization in the transformer architecture. InInternational conference on machine learning. PMLR, 10524–10533

2020
[35]

Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. 2009. Online system problem detection by mining patterns of console logs. In2009 ninth IEEE international conference on data mining. IEEE, 588–597

2009
[36]

Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, and Wenbin Zhang. 2021. Semi-supervised log-based anomaly detection via probabilistic label estimation. In2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1448–1460

2021
[37]

Guangba Yu, Pengfei Chen, Pairui Li, Tianjun Weng, Haibing Zheng, Yuetang Deng, and Zibin Zheng. 2023. Logreducer: Identify and reduce log hotspots in kernel on the fly. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1763–1775

2023
[38]

Biao Zhang and Rico Sennrich. 2019. Root mean square layer normalization. Advances in neural information processing systems32 (2019)

2019
[39]

Feng Zhang, Weitao Wan, Chenyang Zhang, Jidong Zhai, Yunpeng Chai, Haixi- ang Li, and Xiaoyong Du. 2022. CompressDB: Enabling efficient compressed data direct processing for various databases. InProceedings of the 2022 International Conference on Management of Data. 1655–1669

2022
[40]

Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, and Zhonghai Wu. 2024. Multivariate log-based anomaly detection for distributed database. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4256–4267

2024
[41]

Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, et al. 2019. Robust log-based anomaly detection on unstable log data. InProceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 807–817

2019
[42]

Yanliang Zhou, Feng Zhang, Tuo Lin, Yuanjie Huang, Saiqin Long, Jidong Zhai, and Xiaoyong Du. 2024. F-tadoc: Fpga-based text analytics directly on compres- sion with hls. In2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 3739–3752

2024
[43]

Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, and Michael R. Lyu. 2023. Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics. InIEEE International Symposium on Software Reliability Engineering (ISSRE)

2023
[44]

Xuhang Zhu, Xiu Tang, Sai Wu, Jichen Li, Haobo Wang, Chang Yao, Quanqing Xu, and Gang Chen. 2025. CoLA: Model Collaboration for Log-based Anomaly Detection.Proceedings of the VLDB Endowment18, 11 (2025), 3979–3987

2025