
arxiv: 1907.07480 · v1 · pith:ZUFQIVQEnew · submitted 2019-07-17 · 💻 cs.LG · stat.ML

Remaining Useful Lifetime Prediction via Deep Domain Adaptation

Pith reviewed 2026-05-24 20:23 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords remaining useful life · domain adaptation · LSTM · DANN · prognostics · distribution shift · time series · health management

The pith

Domain adversarial training on LSTM features produces RUL predictions that hold across different operating conditions without target labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Sensor data for remaining useful lifetime prediction often comes from mismatched distributions because operating conditions and fault modes vary. Standard models trained on one set of conditions degrade when applied to another, and they cannot simply be retrained because the new domain has no RUL labels. The paper trains an LSTM on time-windowed sequences from a labeled source domain while using adversarial training to make the extracted features indistinguishable from those in an unlabeled target domain. A regressor then predicts RUL from these shared features. Experiments on datasets with shifted operating conditions and fault modes show the adapted predictions are more reliable than those from a source-only model.
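
The time-window step can be illustrated with a minimal sketch. It assumes a single run-to-failure trace whose last row is the final cycle before failure, and it labels each window with the number of cycles remaining after the window's last step. The function name `make_windows` and the linear RUL labeling are illustrative assumptions, not the paper's exact preprocessing. For a target-domain trace the labels would simply be discarded.

```python
from typing import List, Sequence, Tuple


def make_windows(
    readings: Sequence[Sequence[float]],
    window_size: int,
) -> List[Tuple[List[List[float]], int]]:
    """Slice one run-to-failure trace into overlapping fixed-length windows.

    readings[t] is the sensor vector at cycle t. The last row is assumed to be
    the final cycle before failure, so its RUL is 0 (an assumption of this sketch).
    Returns (window, rul) pairs; rul is the RUL at the window's last step.
    """
    if window_size < 1:
        raise ValueError("window_size must be positive")
    n = len(readings)
    samples = []
    for end in range(window_size, n + 1):
        window = [list(row) for row in readings[end - window_size:end]]
        rul = n - end  # cycles remaining after the window's last step
        samples.append((window, rul))
    return samples


# Toy trace: 6 cycles, 2 sensors (values are made up for illustration).
trace = [[0.1 * t, 1.0 - 0.05 * t] for t in range(6)]
samples = make_windows(trace, window_size=3)
ruls = [rul for _, rul in samples]
```

Traces shorter than `window_size` produce no samples in this sketch; how such traces are handled in practice (padding, dropping) is a separate design choice.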

Core claim

A Domain Adversarial Neural Network built on an LSTM extracts features from time-windowed sensor data such that a domain classifier cannot distinguish source from target, while an RUL regressor still performs well on the source. The resulting features support direct RUL regression on a target domain that contains only sensor readings and no failure labels.
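
For readers who want the claim in symbols, the generic domain-adversarial objective from the DANN literature (reference [19]) can be written as below. The notation is an assumption of this note rather than the paper's own: $G_f$ is the LSTM feature extractor, $G_y$ the RUL regressor, $G_d$ the domain classifier, $\mathcal{L}_y$ a regression loss on the $n_S$ labeled source windows, $\mathcal{L}_d$ a domain classification loss over all $n_S + n_T$ windows with domain labels $d_j$, and $\lambda$ the adversarial weight.

```latex
\begin{aligned}
E(\theta_f,\theta_y,\theta_d)
  &= \frac{1}{n_S}\sum_{i=1}^{n_S}
     \mathcal{L}_y\!\big(G_y(G_f(x_i^S;\theta_f);\theta_y),\, y_i^S\big)
   \;-\; \lambda\,\frac{1}{n_S+n_T}\sum_{j=1}^{n_S+n_T}
     \mathcal{L}_d\!\big(G_d(G_f(x_j;\theta_f);\theta_d),\, d_j\big), \\
(\hat\theta_f,\hat\theta_y)
  &= \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat\theta_d),
  \qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f,\hat\theta_y,\theta_d).
\end{aligned}
```

The regressor term uses source windows only, which is why no target RUL labels are needed; the target data enter solely through the domain term.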

What carries the argument

Domain Adversarial Neural Network (DANN) that pits an LSTM feature extractor against a domain classifier while jointly training an RUL regressor on the same features.
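
A common way to implement this opposition is a gradient reversal layer: the domain classifier descends its own loss, while the gradient passed back to the feature extractor is multiplied by a negative factor. The toy below is a hedged, one-weight illustration in plain Python, where a scalar weight `w` stands in for the LSTM and `v` for the domain classifier; all names and numbers are hypothetical, and the source does not state how the paper implements the reversal.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def domain_loss(w: float, v: float, x: float, d: int) -> float:
    """Binary cross-entropy of the toy domain classifier.

    Feature z = w * x (stand-in for the LSTM embedding),
    domain probability p = sigmoid(v * z), domain label d in {0, 1}.
    """
    p = sigmoid(v * w * x)
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def domain_grads(w: float, v: float, x: float, d: int, lam: float):
    """Return (grad_v, grad_w); grad_w has passed through gradient reversal."""
    z = w * x
    dloss_dlogit = sigmoid(v * z) - d      # derivative of BCE w.r.t. the logit
    grad_v = dloss_dlogit * z              # classifier: ordinary gradient
    grad_w = -lam * dloss_dlogit * v * x   # extractor: sign flipped, scaled by lam
    return grad_v, grad_w


# One gradient-descent step for each player, starting from the same point.
w, v, x, d, lam, lr = 1.0, 1.0, 2.0, 1, 1.0, 0.1
before = domain_loss(w, v, x, d)
grad_v, grad_w = domain_grads(w, v, x, d, lam)
after_classifier_step = domain_loss(w, v - lr * grad_v, x, d)
after_extractor_step = domain_loss(w - lr * grad_w, v, x, d)
```

In this toy, the classifier's step lowers the domain loss and the extractor's step raises it, which is the adversarial dynamic in miniature. A full model would add the RUL regression gradient to the extractor's update.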

If this is right

  • RUL models can be deployed on equipment operating under new conditions without first collecting complete run-to-failure traces in those conditions.
  • The same source data can serve multiple target domains provided the sensor modalities remain comparable.
  • Temporal structure captured by the time-window LSTM is preserved while domain-specific statistics are suppressed.
  • Prediction reliability improves specifically when distribution shift arises from operating conditions or fault modes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adversarial setup could be tested on other time-series regression tasks such as predicting equipment degradation rates or energy consumption under shifting regimes.
  • If the learned features prove stable, the approach reduces the cost of labeling new failure events in industrial fleets.
  • Combining the method with modest amounts of target-domain pseudo-labels might close the remaining gap to fully supervised performance.
  • Limits would appear if the target domain introduces entirely new sensor noise characteristics not present in the source.

Load-bearing premise

Adversarial training on time-windowed sequences will produce features that remain useful for RUL regression even after the domain classifier is fooled.

What would settle it

On a target dataset with known RUL values, check whether the mean absolute error of the adapted model is lower than that of a source-only LSTM by a margin larger than the variability seen across multiple random seeds.
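
One way to make this check concrete is sketched below, assuming per-seed MAE values are already available for both models. Using the larger of the two across-seed standard deviations as the "variability" yardstick is an assumption of this sketch, as are the function names; the per-seed numbers are invented for illustration and are not results from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predicted and true RUL values."""
    return mean(abs(p - t) for p, t in zip(pred, true))


def adaptation_margin(
    source_only_maes: Sequence[float],
    adapted_maes: Sequence[float],
) -> Tuple[float, float]:
    """Return (gain, noise).

    gain  = mean MAE of the source-only model minus mean MAE of the adapted model.
    noise = the larger across-seed standard deviation of the two models
            (one possible yardstick; an assumption of this sketch).
    """
    gain = mean(source_only_maes) - mean(adapted_maes)
    noise = max(stdev(source_only_maes), stdev(adapted_maes))
    return gain, noise


# Hypothetical per-seed MAEs on a labeled target test set.
source_only = [30.0, 32.0, 31.0]
adapted = [24.0, 26.0, 25.0]
gain, noise = adaptation_margin(source_only, adapted)
claim_supported = gain > noise
```

A stricter variant would replace the simple comparison with a paired test over seeds; the sketch only captures the margin-versus-variability idea stated above.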

Figures

Figures reproduced from arXiv: 1907.07480 by Alp Akcay, Paulo R. de O. da Costa, Uzay Kaymak, Yingqian Zhang.

Figure 1: LSTM memory cell [51]. In our proposed model, we use LSTM layers to extract temporal features contained in the previous time windows of size Tw before a RUL prediction. In an LSTM, the memory cell (…
Figure 2: The proposed domain adaptation deep network architecture.
Figure 3: Normalised sensor values 100 time steps before a failure for each C-MAPSS dataset. The…
Figure 4: Performance metrics plot. The Scoring performance metric overpenalises positive errors
Figure 5: The training procedure of the LSTM-DANN. Next, the proposed LSTM-DANN architecture is defined including the number of LSTM and fully connected hidden layers, number of cells in each layer, learning rate and gradient update algorithm. We train using the Rectified Linear Unit (ReLU) as activation function.
Figure 6: RUL predictions of the TARGET-ONLY, SOURCE-ONLY and LSTM-DANN models
Figure 7: RUL predictions of the TARGET-ONLY, SOURCE-ONLY and LSTM-DANN-STD mod…
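
The asymmetry mentioned in the Figure 4 caption can be illustrated with the scoring function commonly used for C-MAPSS benchmarks, which penalises late predictions (predicted RUL above true RUL) more heavily than early ones. The sketch below assumes the usual form with constants 13 and 10 and the error defined as predicted minus true; whether the paper uses exactly this variant is not confirmed by the material on this page.

```python
import math
from typing import Sequence


def phm_score(
    pred: Sequence[float],
    true: Sequence[float],
    a_early: float = 13.0,  # assumed constant for early predictions (error < 0)
    a_late: float = 10.0,   # assumed constant for late predictions (error >= 0)
) -> float:
    """Asymmetric exponential score; lower is better, 0 is a perfect prediction."""
    total = 0.0
    for p, t in zip(pred, true):
        error = p - t  # positive means the model predicts failure too late
        if error < 0:
            total += math.exp(-error / a_early) - 1.0
        else:
            total += math.exp(error / a_late) - 1.0
    return total


late = phm_score([110.0], [100.0])   # overestimates RUL by 10 cycles
early = phm_score([90.0], [100.0])   # underestimates RUL by 10 cycles
perfect = phm_score([100.0], [100.0])
```

With these constants, an overestimate of 10 cycles scores worse than an underestimate of the same size, which matches the caption's remark that positive errors are overpenalised.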
Original abstract

In Prognostics and Health Management (PHM) sufficient prior observed degradation data is usually critical for Remaining Useful Lifetime (RUL) prediction. Most previous data-driven prediction methods assume that training (source) and testing (target) condition monitoring data have similar distributions. However, due to different operating conditions, fault modes, noise and equipment updates distribution shift exists across different data domains. This shift reduces the performance of predictive models previously built to specific conditions when no observed run-to-failure data is available for retraining. To address this issue, this paper proposes a new data-driven approach for domain adaptation in prognostics using Long Short-Term Neural Networks (LSTM). We use a time window approach to extract temporal information from time-series data in a source domain with observed RUL values and a target domain containing only sensor information. We propose a Domain Adversarial Neural Network (DANN) approach to learn domain-invariant features that can be used to predict the RUL in the target domain. The experimental results show that the proposed method can provide more reliable RUL predictions under datasets with different operating conditions and fault modes. These results suggest that the proposed method offers a promising approach to performing domain adaptation in practical PHM applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a domain-adversarial LSTM architecture (DANN) that extracts fixed-length time windows from source-domain sensor sequences with RUL labels and target-domain sequences without labels, trains an LSTM feature extractor adversarially against a domain classifier while regressing RUL only on source labels, and claims that the resulting domain-invariant features yield more reliable RUL predictions on C-MAPSS subsets that differ in operating conditions and fault modes.

Significance. If the central claim holds, the work would provide a practical route to label-free transfer of RUL models across heterogeneous PHM datasets, a setting where run-to-failure data are often unavailable in the target domain.

major comments (2)
  1. [Proposed DANN-LSTM method] The architecture description (time-window LSTM followed by domain classifier on embeddings and RUL regressor on source only) contains no mechanism that aligns degradation trajectories or rates across domains. Because the adversarial loss operates on static embeddings rather than on the mapping from trajectory to remaining life, a shift that alters how sensor evolution maps to RUL can remain after training; the reported gains therefore rest on an untested assumption that feature-level invariance is sufficient.
  2. [Abstract / Experimental results] The abstract asserts that experiments demonstrate improved reliability yet supplies no quantitative metrics, baseline comparisons, dataset sizes, or statistical tests. Without these details the support for the central claim cannot be evaluated.
minor comments (1)
  1. [Method] Notation for the time-window length, LSTM hidden size, and adversarial weighting hyper-parameter should be introduced once and used consistently.
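
The first major comment can be stated compactly. Writing $z = G_f(x)$ for the window embedding (notation assumed here, not taken from the paper), adversarial training pushes the marginal feature distributions of source and target together, but this does not by itself align the conditional distribution of RUL given the features:

```latex
p_S(z) = p_T(z)
\;\;\not\Longrightarrow\;\;
p_S(y \mid z) = p_T(y \mid z),
\qquad z = G_f(x).
```

As a hypothetical illustration, if target units degraded twice as fast as source units, the same embedding $z$ could correspond to roughly half the remaining life in the target, and a regressor trained only on source labels would overestimate RUL even with perfectly domain-invariant features.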

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, indicating where revisions will be made to strengthen the manuscript.

  1. Referee: [Proposed DANN-LSTM method] The architecture description (time-window LSTM followed by domain classifier on embeddings and RUL regressor on source only) contains no mechanism that aligns degradation trajectories or rates across domains. Because the adversarial loss operates on static embeddings rather than on the mapping from trajectory to remaining life, a shift that alters how sensor evolution maps to RUL can remain after training; the reported gains therefore rest on an untested assumption that feature-level invariance is sufficient.

    Authors: The referee correctly notes that the method performs feature-level alignment via adversarial training on LSTM embeddings extracted from fixed time windows, without an explicit term for aligning degradation rates or trajectory mappings. Our design follows the standard DANN framework, where the LSTM captures temporal dynamics within each window and the adversarial objective encourages domain-invariant representations for the subsequent RUL regressor. We acknowledge that this rests on the assumption that such invariance suffices for the distribution shifts present in the C-MAPSS subsets (different operating conditions and fault modes). The empirical improvements reported in the experiments provide support for this assumption on the evaluated data, but we agree that a more explicit trajectory-alignment mechanism could be investigated. In revision we will add a paragraph in the method and discussion sections clarifying this modeling choice and its limitations. revision: partial

  2. Referee: [Abstract / Experimental results] The abstract asserts that experiments demonstrate improved reliability yet supplies no quantitative metrics, baseline comparisons, dataset sizes, or statistical tests. Without these details the support for the central claim cannot be evaluated.

    Authors: We agree that the abstract should be more informative. The full manuscript contains quantitative results on the C-MAPSS datasets, including comparisons against non-adapted baselines and the proposed DANN-LSTM, but these details are not summarized in the abstract. We will revise the abstract to include key performance metrics (e.g., RMSE or score improvements), mention of the datasets and their sizes, and reference to the baselines used. revision: yes

Circularity Check

0 steps flagged

Standard DANN+LSTM domain adaptation pipeline with no self-referential reductions


The paper describes a conventional unsupervised domain-adaptation setup: time-windowed sensor sequences are processed by an LSTM, a domain classifier is trained adversarially on the embeddings, and the regressor is supervised solely by source-domain RUL labels. No equations, loss terms, or claimed derivations in the abstract or description reduce the target-domain RUL predictions to quantities fitted on target labels, self-defined by the alignment objective, or imported via self-citation chains. The method is presented as an application of existing DANN techniques rather than a derivation whose outputs are tautological with its inputs. The reported performance therefore rests on empirical generalization rather than construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond the standard assumption that sensor time series contain transferable temporal patterns and that adversarial feature alignment can bridge domain gaps.

pith-pipeline@v0.9.0 · 5751 in / 1008 out tokens · 28515 ms · 2026-05-24T20:23:18.597717+00:00 · methodology


Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 6 internal anchors

  [1] V. Atamuradov, K. Medjaher, P. Dersin, B. Lamoureux, N. Zerhouni, Prognostics and health management for maintenance practitioners-review, implementation and tools evaluation, International Journal of Prognostics and Health Management 8 (2017) 1–31
  [2] A. K. Jardine, D. Lin, D. Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance, Mechanical Systems and Signal Processing 20 (2006) 1483–1510
  [3] N. Papakostas, P. Papachatzakis, V. Xanthakis, D. Mourtzis, G. Chryssolouris, An approach to operational aircraft maintenance planning, Decision Support Systems 48 (2010) 604–612
  [4] A. Cubillo, S. Perinpanayagam, M. Esperon-Miguez, A review of physics-based models in prognostics: Application to gears and bearings of rotating machinery, 2016
  [5] Y. Lei, N. Li, L. Guo, N. Li, T. Yan, J. Lin, Machinery health prognostics: A systematic review from data acquisition to RUL prediction, 2018
  [6] X. S. Si, W. Wang, C. H. Hu, D. H. Zhou, Remaining useful life estimation - A review on the statistical data driven approaches, European Journal of Operational Research 213 (2011) 1–14
  [7] S. Dong, T. Luo, Bearing degradation process prediction based on the PCA and optimized LS-SVM model, Measurement: Journal of the International Measurement Confederation 46 (2013) 3143–3152
  [8] T. Benkedjouh, K. Medjaher, N. Zerhouni, S. Rechak, Remaining useful life estimation based on nonlinear feature reduction and support vector regression, Engineering Applications of Artificial Intelligence 26 (2013) 1751–1760
  [9] S. Faghih-Roohi, S. Hajizadeh, A. Nunez, R. Babuska, B. De Schutter, Deep convolutional neural networks for detection of rail surface defects, in: Proceedings of the International Joint Conference on Neural Networks, volume 2016-Octob, pp. 2584–2589
  [10] X. Li, Q. Ding, J. Q. Sun, Remaining useful life estimation in prognostics using deep convolution neural networks, Reliability Engineering and System Safety 172 (2018) 1–11
  [11] A. Listou Ellefsen, E. Bjørlykhaug, V. Æsøy, S. Ushakov, H. Zhang, Remaining useful life predictions for turbofan engine degradation using semi-supervised deep architecture, Reliability Engineering & System Safety 183 (2019) 240–251
  [12] S. Zheng, K. Ristovski, A. Farahat, C. Gupta, Long Short-Term Memory Network for Remaining Useful Life estimation, in: 2017 IEEE International Conference on Prognostics and Health Management, ICPHM 2017, pp. 88–95
  [13] Y. Wu, M. Yuan, S. Dong, L. Lin, Y. Liu, Remaining useful life estimation of engineered systems using vanilla LSTM neural networks, Neurocomputing 275 (2018) 167–179
  [14] Y. Hong, W. Q. Meeker, J. D. McCalley, et al., Prediction of remaining life of power transformers based on left truncated and right censored lifetime data, The Annals of Applied Statistics 3 (2009) 857–879
  [15] S. J. Pan, Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering 22 (2010) 1345–1359
  [16] J. Jiang, C. Zhai, Instance weighting for domain adaptation in nlp, in: Proceedings of the 45th annual meeting of the association of computational linguistics, pp. 264–271
  [17] B. Fernando, A. Habrard, M. Sebban, T. Tuytelaars, Unsupervised visual domain adaptation using subspace alignment, in: Proceedings of the IEEE international conference on computer vision, pp. 2960–2967
  [18] E. Tzeng, J. Hoffman, K. Saenko, T. Darrell, Adversarial discriminative domain adaptation, in: Computer Vision and Pattern Recognition (CVPR), volume 1, p. 4
  [19] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, V. Lempitsky, Domain-adversarial training of neural networks, The Journal of Machine Learning Research 17 (2016) 2096–2030
  [20] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation 9 (1997) 1735–1780
  [21] P. J. Werbos, et al., Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 (1990) 1550–1560
  [22] A. Saxena, K. Goebel, D. Simon, N. Eklund, Damage propagation modeling for aircraft engine run-to-failure simulation, in: 2008 International Conference on Prognostics and Health Management, PHM 2008, pp. 1–9
  [23] D. He, E. Bechhoefer, Development and validation of bearing diagnostic and prognostic tools using hums condition indicators, in: 2008 IEEE Aerospace Conference, pp. 1–8
  [24] E. Zio, F. Di Maio, A data-driven fuzzy approach for predicting the remaining useful life in dynamic failure scenarios of a nuclear system, Reliability Engineering & System Safety 95 (2010) 49–57
  [25] Z. Tian, An artificial neural network method for remaining useful life prediction of equipment subject to condition monitoring, Journal of Intelligent Manufacturing 23 (2012) 227–237
  [26] R. Huang, L. Xi, X. Li, C. R. Liu, H. Qiu, J. Lee, Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods, Mechanical systems and signal processing 21 (2007) 193–207
  [27] Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE transactions on neural networks 5 (1994) 157–166
  [28] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014)
  [29] M. Yuan, Y. Wu, L. Lin, Fault diagnosis and remaining useful life estimation of aero engine using lstm neural network, in: 2016 IEEE International Conference on Aircraft Utility Systems (AUS), pp. 135–140
  [30] Y. Wu, M. Yuan, S. Dong, L. Lin, Y. Liu, Remaining useful life estimation of engineered systems using vanilla lstm neural networks, Neurocomputing 275 (2018) 167–179
  [31] M. Z. Hossain, F. A. Sohel, M. F. Shiratuddin, H. Laga, A comprehensive survey of deep learning for image captioning, CoRR abs/1810.04020 (2018)
  [32] G. S. Babu, P. Zhao, X.-L. Li, Deep convolutional neural network based regression approach for estimation of remaining useful life, in: International conference on database systems for advanced applications, Springer, pp. 214–228
  [33] J. Huang, A. Gretton, K. Borgwardt, B. Schölkopf, A. J. Smola, Correcting sample selection bias by unlabeled data, in: Advances in neural information processing systems, pp. 601–608
  [34] S. J. Pan, I. W. Tsang, J. T. Kwok, Q. Yang, Domain adaptation via transfer component analysis, IEEE Transactions on Neural Networks 22 (2011) 199–210
  [35] B. Sun, J. Feng, K. Saenko, Return of frustratingly easy domain adaptation, in: AAAI, volume 6, p. 8
  [36] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, J. W. Vaughan, A theory of learning from different domains, Machine Learning 79 (2010) 151–175
  [37] M. Long, Y. Cao, J. Wang, M. I. Jordan, Learning transferable features with deep adaptation networks, arXiv preprint arXiv:1502.02791 (2015)
  [38] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, T. Darrell, Deep domain confusion: Maximizing for domain invariance, arXiv preprint arXiv:1412.3474 (2014)
  [39] B. Sun, K. Saenko, Deep coral: Correlation alignment for deep domain adaptation, in: European Conference on Computer Vision, Springer, pp. 443–450
  [40] H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, Domain-adversarial neural networks, arXiv preprint arXiv:1412.4446 (2014)
  [41] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, pp. 2672–2680
  [42] C. Cortes, M. Mohri, Domain adaptation and sample bias correction theory and algorithm for regression, Theoretical Computer Science 519 (2014) 103–126
  [43] D. Lopez-Paz, J. M. Hernández-Lobato, B. Schölkopf, Semi-supervised domain adaptation with non-parametric copulas, in: Advances in neural information processing systems, pp. 665–673
  [44] R. Nikzad-Langerodi, W. Zellinger, E. Lughofer, S. Saminger-Platz, Domain-invariant partial-least-squares regression, Analytical chemistry 90 (2018) 6693–6701
  [45] S. Purushotham, W. Carvalho, T. Nilanon, Y. Liu, Variational Adversarial Deep Domain Adaptation for Health Care Time Series Analysis, 29th Conference on Neural Information Processing Systems (2016)
  [46] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, T. Zhang, Deep model based domain adaptation for fault diagnosis, IEEE Transactions on Industrial Electronics 64 (2017) 2296–2305
  [47] W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals, Sensors 17 (2017) 425
  [48] X. Li, W. Zhang, Q. Ding, J.-Q. Sun, Multi-layer domain adaptation method for rolling bearing fault diagnosis, Signal Processing 157 (2019) 180–197
  [49] J. Xie, L. Zhang, L. Duan, J. Wang, On cross-domain feature fusion in gearbox fault diagnosis under various operating conditions based on transfer component analysis, in: 2016 IEEE International Conference on Prognostics and Health Management (ICPHM), pp. 1–6
  [50] X. Li, W. Zhang, Q. Ding, Cross-domain fault diagnosis of rolling element bearings using deep generative neural networks, IEEE Transactions on Industrial Electronics (2018)
  [51] C. Olah, Understanding lstm networks, 2015
  [52] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research 15 (2014) 1929–1958
  [53] F. O. Heimes, Recurrent neural networks for remaining useful life estimation, in: 2008 International Conference on Prognostics and Health Management, pp. 1–6
  [54] T. Tieleman, G. Hinton, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural networks for machine learning 4 (2012) 26–31
  [55] F. Chollet, et al., Keras, 2015
  [56] M. Abadi, et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org
  [57] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014)
  [58] C. Zhang, P. Lim, A. Qin, K. C. Tan, Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics, IEEE transactions on neural networks and learning systems 28 (2017) 2306–2318