Unsupervised Anomaly Detection in Process-Complex Industrial Time Series: A Real-World Case Study

Jörg Hähner; Lukas Meitz; Michael Heider; Samineh Bagheri; Sergej Krasnikov; Thorsten Schöler

arXiv: 2604.13928 · v1 · submitted 2026-04-15 · 💻 cs.LG

Pith reviewed 2026-05-10 13:05 UTC · model grok-4.3

classification 💻 cs.LG

keywords: anomaly detection, industrial time series, autoencoders, isolation forest, unsupervised learning, temporal convolutional networks, process variability

The pith

Autoencoders outperform Isolation Forest on complex real-world industrial time series data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper empirically evaluates unsupervised anomaly detection techniques on time series collected from industrial machinery in live operation. Because the machinery runs multi-stage processes, the data are more complex than standard benchmarks. The study finds that Isolation Forest struggles with the non-periodic, multi-scale patterns in the data, while autoencoder-based approaches handle them more effectively. Temporal convolutional autoencoders in particular deliver robust results, and the study highlights why model choice matters in practical settings, where anomalies can indicate equipment issues or process faults.

Core claim

On a dataset from fully operational industrial machinery explicitly capturing pronounced process-induced variability, Isolation Forest proves insufficient for modeling the non-periodic, multi-scale dynamics, while autoencoders perform better overall. Temporal convolutional autoencoders achieve the most robust performance, whereas recurrent and variational variants require more careful tuning.

What carries the argument

Autoencoder models that learn to reconstruct normal time series patterns and flag anomalies by their reconstruction error, contrasted with Isolation Forest's isolation-based scoring.
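A minimal Python sketch of reconstruction-error scoring may make this mechanism concrete. It is not the authors' pipeline. The window width, the stride, the mean-squared-error score, and the `mean_window_reconstructor` helper are all assumptions for illustration. The helper is a hypothetical stand-in for a trained autoencoder that always returns the average normal window. A real model would learn `reconstruct` from data, but the scoring step would look the same.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Window = List[float]


def sliding_windows(series: Sequence[float], width: int, stride: int = 1) -> List[Window]:
    """Cut a 1-D series into fixed-width windows (overlapping if stride < width)."""
    return [list(series[i:i + width]) for i in range(0, len(series) - width + 1, stride)]


def reconstruction_scores(windows: List[Window],
                          reconstruct: Callable[[Window], Window]) -> List[float]:
    """Anomaly score per window = mean squared error between input and reconstruction."""
    scores = []
    for w in windows:
        r = reconstruct(w)
        scores.append(fmean((a - b) ** 2 for a, b in zip(w, r)))
    return scores


def mean_window_reconstructor(train_windows: List[Window]) -> Callable[[Window], Window]:
    """Hypothetical stand-in for a trained autoencoder: it ignores its input and
    returns the position-wise mean of the normal training windows."""
    template = [fmean(col) for col in zip(*train_windows)]
    return lambda _w: list(template)


# Toy data: a repeating normal pattern, then a test series whose last window has a spike.
normal = [0.0, 1.0, 0.0, -1.0] * 10
test = [0.0, 1.0, 0.0, -1.0] * 3 + [0.0, 5.0, 0.0, -1.0]

model = mean_window_reconstructor(sliding_windows(normal, width=4, stride=4))
scores = reconstruction_scores(sliding_windows(test, width=4, stride=4), model)
print(scores)
```

Windows the model cannot reproduce well receive high scores. Here only the window that contains the spike does.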

If this is right

Isolation Forest cannot sufficiently model the complexities of real industrial time series.
Autoencoders provide a more effective unsupervised method for anomaly detection in this domain.
Temporal convolutional autoencoders show the strongest and most stable performance among the tested models.
Recurrent and variational autoencoders may succeed but need extensive parameter tuning to match the robustness of convolutional variants.
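The variational objective suggests one plausible source of this tuning sensitivity. A VAE is trained on the evidence lower bound (ELBO), which the paper's reference list cites from Goodfellow et al. [15]. The β-weighted loss in the second line is a common variant. It is an assumption here, because this page does not describe the exact loss used in the paper.

```latex
\begin{aligned}
\mathcal{L}(\theta,\phi;x) &= \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right) \le \log p_\theta(x),\\
J_\beta(\theta,\phi;x) &= -\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] + \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right).
\end{aligned}
```

With β = 1, J_β is the negative ELBO. A larger β typically pulls the latent code toward the prior at the cost of reconstruction fidelity. A smaller β makes the model behave more like a plain autoencoder. Anomaly scores are usually derived from reconstruction quality, so β, alongside latent dimensionality, is one knob that can shift results.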

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar datasets from other manufacturing processes could be used to test whether temporal convolutional autoencoders maintain their advantage.
The emphasis on real-world data suggests that research should shift toward more representative industrial benchmarks to improve method applicability.
These findings could inform the design of monitoring systems in factories to prioritize architectures that capture multi-scale temporal features.
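The multi-scale argument for temporal convolutional architectures is usually made through the receptive field of stacked dilated convolutions. The sketch below assumes one causal convolution per layer and doubling dilations, a common TCN configuration. This page does not describe the architecture the paper actually uses, so the numbers are illustrative only.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of time steps (current one included) that can influence one output
    of a stack of dilated 1-D convolutions, assuming one convolution per dilation."""
    if kernel_size < 1 or any(d < 1 for d in dilations):
        raise ValueError("kernel_size and dilations must be positive")
    # Each layer with dilation d extends the view by (kernel_size - 1) * d steps.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Doubling dilations: the receptive field grows exponentially with depth,
# while each layer keeps the same number of weights.
for depth in range(1, 6):
    dilations = [2 ** i for i in range(depth)]
    print(depth, dilations, receptive_field(3, dilations))
```

With kernel size 3, depths 1 to 5 see 3, 7, 15, 31 and 63 steps. A single network can therefore cover both short transients and longer process phases. Many TCN implementations use two convolutions per residual block, which doubles the `(kernel_size - 1) * d` contribution of each level.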

Load-bearing premise

That the single real-world dataset sufficiently captures the full range of process-induced variability found in industrial settings, and that the observed differences in model performance generalize to other cases.

What would settle it

If another study applied the same models to a comparable industrial time series dataset and found that Isolation Forest performs as well as or better than the autoencoders, the claim that Isolation Forest is insufficient would be contradicted.

Figures

Figures reproduced from arXiv: 2604.13928 by Jörg Hähner, Lukas Meitz, Michael Heider, Samineh Bagheri, Sergej Krasnikov, Thorsten Schöler.

**Figure 1.** Plot of a process sample from the dataset, showing readings from four selected sensors captured during a single run.

**Figure 2.** Differences in anomaly detection performance across …

Original abstract

Industrial time-series data from real production environments exhibits substantially higher complexity than commonly used benchmark datasets, primarily due to heterogeneous, multi-stage operational processes. As a result, anomaly detection methods validated under simplified conditions often fail to generalize to industrial settings. This work presents an empirical study on a unique dataset collected from fully operational industrial machinery, explicitly capturing pronounced process-induced variability. We evaluate which model classes are capable of capturing this complexity, starting with a classical Isolation Forest baseline and extending to multiple autoencoder architectures. Experimental results show that Isolation Forest is insufficient for modeling the non-periodic, multi-scale dynamics present in the data, whereas autoencoders consistently perform better. Among them, temporal convolutional autoencoders achieve the most robust performance, while recurrent and variational variants require more careful tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a solid empirical case study on real industrial time series that shows autoencoders handling process variability better than Isolation Forest, but the results stay too high-level to pin down why.

The punchline is that temporal convolutional autoencoders come out most robust on this dataset while Isolation Forest falls short on the non-periodic, multi-scale patterns, and the authors use that to argue for more realistic benchmarks. The work is new mainly because it brings a fresh dataset from fully operational machinery that actually reflects multi-stage production variability, instead of the usual clean benchmarks. That part is useful: it gives practitioners a concrete example of where classical methods break and which off-the-shelf architectures are worth trying first, without needing to invent new ones. The comparison itself is straightforward and honest about the gap between lab data and the factory floor. Credit for shipping a real-world collection rather than another synthetic testbed.

The soft spots sit in the experimental reporting. The abstract and high-level claims do not spell out dataset size, the precise scoring functions, hyperparameter search details, run-to-run variance, or any statistical tests, so it is hard to judge whether the performance edge is stable or just a tuning artifact. A single-machine case study also limits how far the conclusions travel, even if the authors do not overclaim generalization. No circularity or invented math here, just an empirical head-to-head.

This paper is for applied researchers and engineers who need guidance on anomaly detection in messy process data, rather than theorists looking for new algorithms. A reader who wants practical model-selection hints will get something out of it, provided they treat the numbers as directional. I would send it to peer review once the authors add the missing metrics, ablations, and a clearer description of the evaluation protocol; the core idea is worth referee time even if the current write-up needs tightening.

Referee Report

2 major / 1 minor

Summary. This paper conducts an empirical evaluation of unsupervised anomaly detection techniques on a real-world dataset from industrial machinery exhibiting process-induced variability. It concludes that the Isolation Forest method is inadequate for handling non-periodic and multi-scale dynamics in the data, while various autoencoder models perform better, with temporal convolutional autoencoders showing the most robust results.

Significance. If substantiated with detailed metrics and statistical analysis, the findings would be significant: they demonstrate the limitations of classical methods in complex industrial settings and highlight the potential of deep learning approaches such as temporal convolutional network (TCN) autoencoders for such applications. This could inform model selection in real production environments where benchmark datasets fall short.

major comments (2)

[Abstract] The comparative performance claims, including the insufficiency of Isolation Forest and the robustness of temporal convolutional autoencoders, are made at a high level without reporting concrete evaluation metrics (e.g., AUC-ROC, F1-score), anomaly detection thresholds, or any statistical significance tests. This makes it challenging to assess the validity and magnitude of the reported differences.
[Experimental results] The manuscript provides no information on key experimental details such as the size of the dataset (number of samples or time series length), the method for labeling or identifying anomalies, the hyperparameter selection process for each model, or the specific anomaly scoring functions used. These omissions are critical because they could explain observed performance variations independently of the model architectures.
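Once per-window scores and ground-truth labels exist, the metrics the referee asks for are cheap to compute. The sketch below assumes binary labels (1 = anomaly) and that higher scores mean more anomalous. AUC-ROC uses the rank-based (Mann-Whitney) definition, with ties counted as one half. The F1 threshold is a free parameter, which is why the referee asks for it to be reported. The toy data are invented for illustration.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC-ROC: probability that a randomly chosen anomaly scores
    higher than a randomly chosen normal sample (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 score when every sample with score >= threshold is flagged as anomalous."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: five windows, three of them labeled anomalous.
scores = [0.10, 0.40, 0.35, 0.80, 0.90]
labels = [0, 0, 1, 1, 1]
print(auc_roc(scores, labels))  # 5 of the 6 anomaly/normal pairs are ranked correctly
print(f1_at_threshold(scores, labels, 0.5))
```

The double loop makes the definition explicit but costs O(P·N). For real score arrays, a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` is the practical choice.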

minor comments (1)

[Introduction] The paper could more explicitly discuss how the collected dataset captures 'pronounced process-induced variability' compared to standard benchmarks, perhaps with a table summarizing key dataset statistics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and will revise the paper to improve the presentation of results and experimental details.

Referee: [Abstract] The comparative performance claims, including the insufficiency of Isolation Forest and the robustness of temporal convolutional autoencoders, are made at a high level without reporting concrete evaluation metrics (e.g., AUC-ROC, F1-score), anomaly detection thresholds, or any statistical significance tests. This makes it challenging to assess the validity and magnitude of the reported differences.

Authors: We agree that the abstract presents the findings at a high level. In the revised version, we will incorporate specific quantitative metrics (AUC-ROC and F1-scores) for the key methods, along with a brief reference to the anomaly scoring approach and thresholds. As the work is a single real-world case study without repeated independent trials, we did not perform formal statistical significance tests; we will add a short discussion noting this limitation and the consistency of trends observed across data subsets. revision: yes
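
No labels are available at training time, so a common way to fix a threshold is to take a high quantile of the scores on held-out normal data. The sketch below shows that rule. The nearest-rank quantile, the value of `q`, and the validation scores are assumptions for illustration, not the procedure the authors describe.

```python
import math
from typing import List, Sequence


def quantile_threshold(normal_scores: Sequence[float], q: float = 0.99) -> float:
    """Nearest-rank q-quantile of anomaly scores computed on held-out normal data."""
    if not normal_scores:
        raise ValueError("need at least one normal score")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    ordered = sorted(normal_scores)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]


def flag(scores: Sequence[float], threshold: float) -> List[bool]:
    """Mark every score strictly above the threshold as anomalous."""
    return [s > threshold for s in scores]


# Hypothetical validation scores from windows believed to be normal.
validation = [0.20, 0.10, 0.40, 0.30, 0.25, 0.15, 0.35, 0.05, 0.45, 0.50]
threshold = quantile_threshold(validation, q=0.85)  # 9th of the 10 sorted values
print(threshold, flag([0.30, 0.46, 0.90], threshold))
```

A higher `q` trades recall for fewer false alarms. Reporting `q`, or the resulting threshold, alongside F1 is what makes such numbers comparable across models.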
Referee: [Experimental results] The manuscript provides no information on key experimental details such as the size of the dataset (number of samples or time series length), the method for labeling or identifying anomalies, the hyperparameter selection process for each model, or the specific anomaly scoring functions used. These omissions are critical because they could explain observed performance variations independently of the model architectures.

Authors: We acknowledge that these experimental details require more explicit description for reproducibility. In the revision, we will expand the relevant sections to state the dataset size (number of samples and time series lengths), clarify the method used to identify anomalies for evaluation purposes (domain-expert review of process logs), detail the hyperparameter selection procedure (including search strategy and validation approach for each model), and specify the anomaly scoring functions (reconstruction error for the autoencoders and the standard Isolation Forest anomaly score). These additions will help rule out alternative explanations for the performance differences. revision: yes
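For reference, the standard Isolation Forest anomaly score normalizes two quantities. The first is E[h(x)], the path length of a sample x averaged over the trees. The second is c(n), the average path length of an unsuccessful search in a binary search tree built from n samples. H denotes the harmonic number.

```latex
s(x, n) = 2^{-\mathbb{E}[h(x)] / c(n)}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Scores near 1 mark points that are isolated after few splits, and scores clearly below 0.5 indicate normal points. scikit-learn's `IsolationForest.score_samples` returns the negative of this score, so lower values are more anomalous there. On time series, the method treats each window or feature vector as an independent point and has no built-in temporal inductive bias.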

Circularity Check

0 steps flagged

No circularity: purely empirical comparison of existing methods

The paper is an empirical case study that evaluates standard anomaly detection algorithms (Isolation Forest and multiple autoencoder variants) on a collected industrial dataset. No derivations, equations, fitted parameters, or self-referential claims are present; performance claims rest on experimental comparisons rather than any chain that reduces to its own inputs by construction. Self-citations, if any, are not load-bearing for a derivation and do not create circularity under the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical case study applying standard machine learning techniques to a novel dataset; it introduces no free parameters, axioms, or invented entities beyond established anomaly detection methods.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation. Cambridge, MA, USA: MIT Press, 1986, pp. 318–362.

[2] Z. Zamanzadeh Darban, G. I. Webb, S. Pan, C. Aggarwal, and M. Salehi, “Deep learning for time series anomaly detection: A survey,” ACM Computing Surveys, vol. 57, no. 1, pp. 1–42, Oct. 2024. [Online]. Available: http://dx.doi.org/10.1145/3691338

[3] W. Yu, I. Kim, and C. Mechefske, “Analysis of different RNN autoencoder variants for time series classification and machine prognostics,” Mechanical Systems and Signal Processing, vol. 149, p. 107322, 2021.

[4] H. Gao, B. Qiu, R. J. D. Barroso, W. Hussain, Y. Xu, and X. Wang, “TSMAE: A novel anomaly detection approach for internet of things time series data using memory-augmented autoencoder,” IEEE Transactions on Network Science and Engineering, vol. 10, pp. 2978–2990, 2023.

[5] Y. Su, Y. Zhao, C. Niu, R. Liu, W. Sun, and D. Pei, “Robust anomaly detection for multivariate time series through stochastic recurrent neural network,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 2828–2837. doi:10.1145/3292500.3330672

[6] M. Thill, W. Konen, H. Wang, and T. Bäck, “Temporal convolutional autoencoder for unsupervised anomaly detection in time series,” Applied Soft Computing, vol. 112, p. 107751, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1568494621006724

[7] S. Asahi, C. Karadogan, S. Tamura, S. Hayamizu, and M. Liewald, “Process data based estimation of tool wear on punching machines using TCN-autoencoder from raw time-series information,” IOP Conference Series: Materials Science and Engineering, vol. 1157, 2021.

[8] J. Park, Y.-S. Park, and C.-I. Kim, “TCAE: Temporal convolutional autoencoders for time series anomaly detection,” 2022 Thirteenth International Conference on Ubiquitous and Future Networks (ICUFN), pp. 421–426, 2022.

[9] S. Gopali, F. Abri, S. Siami-Namini, and A. Siami Namin, “A comparative study of detecting anomalies in time series data using LSTM and TCN models,” arXiv preprint arXiv:2112.09293, 2021. [Online]. Available: https://arxiv.org/abs/2112.09293

[10] W. Todo, B. Laurent, J.-M. Loubes, and M. Selmani, “Dimension reduction for time series with variational autoencoders,” ArXiv, vol. abs/2204.11060, 2022.

[11] Z. Wu, L. Cao, Q. Zhang, J. Zhou, and H. Chen, “Weakly augmented variational autoencoder in time series anomaly detection,” ArXiv, vol. abs/2401.03341, 2024.

[12] S. Dodda, “Exploring variational autoencoders and generative latent time-series models for synthetic data generation and forecasting,” 2024 Control Instrumentation System Conference (CISCON), pp. 1–6, 2024.

[13] L. Meitz, J. Senge, T. Wagenhals, T. Schöler, J. Hähner, J. Edinger, and C. Krupitzer, “A literature review framework and open research challenges for predictive maintenance in industry 4.0,” Computers and Industrial Engineering, vol. 206, p. 111193, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0360835225003390

[14] L. Meitz, M. Heider, T. Schöler, and J. Hähner, “A taxonomy for complexity estimation of machine data in machine health applications,” in Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, INSTICC. SciTePress, 2024, pp. 341–350.

[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA: MIT Press, 2016, ELBO definition on p. 624.
