Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization

Guijiang Li; Jing Li; Mingfei Lu; Yang Hu; Yong Si; Yueheng Song; Zhaokui Wang

arXiv: 2604.07171 · v1 · submitted 2026-04-08 · cs.LG

Pith reviewed 2026-05-10 17:49 UTC · model grok-4.3

keywords: hierarchical reinforcement learning · fleet maintenance optimization · prognostics and health management · logistics decision making · sparse rewards · discrete-event simulation · stochastic mission profiles

The pith

A two-tier hierarchical reinforcement learning system decomposes fleet maintenance into strategic and tactical levels to handle complexity and sparse rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a hierarchical reinforcement learning framework to optimize sequential maintenance and logistics decisions for large aircraft fleets under uncertainty. It splits the control task so a top-level commander sets fleet-wide availability and cost goals while lower-level commanders handle daily scheduling, resource use, and sortie generation. Layered rewards and planning-enhanced networks are added to manage delayed feedback. The approach is tested in a detailed simulation of aircraft operations and support logistics, where it trains faster and scales more reliably than single-level deep reinforcement learning or rule-based alternatives. If the results hold, this structure could make reinforcement learning usable for real-time decisions in high-dimensional, stochastic systems like military aviation support.

Core claim

By decomposing the complex fleet PHM control problem into a two-tier hierarchy, a strategic General Commander manages overall availability and cost objectives while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation; integrating layered reward shaping with planning-enhanced neural networks addresses sparse and delayed rewards, enabling the system to outperform monolithic deep reinforcement learning and rule-based baselines in training speed, scalability, and robustness within a high-fidelity discrete-event simulation.
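The decomposition in the core claim can be sketched as a control loop running on two clocks. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the `Goal` fields, the update interval `k`, the 10% failure probability, and the hand-written placeholder policies are all hypothetical, since the abstract does not specify the interfaces between the tiers.

```python
import random
from dataclasses import dataclass


@dataclass
class Goal:
    """Hypothetical strategic goal issued by the upper tier."""
    target_availability: float  # desired fraction of mission-capable aircraft
    cost_budget: float          # assumed spending cap until the next goal update


class GeneralCommander:
    """Upper tier: issues a fleet-level goal (placeholder for a learned policy)."""

    def select_goal(self, fleet):
        available = sum(fleet) / len(fleet)
        # Assumed heuristic: ask for slightly more availability than we have.
        return Goal(target_availability=min(1.0, available + 0.1), cost_budget=10.0)


class OperationCommander:
    """Lower tier: picks one tactical action per step, conditioned on the goal."""

    def select_action(self, fleet, goal):
        available = sum(fleet) / len(fleet)
        return "repair" if available < goal.target_availability else "fly"


def run_episode(n_aircraft=8, horizon=30, k=5, seed=0):
    rng = random.Random(seed)
    fleet = [1] * n_aircraft  # 1 = mission-capable, 0 = down
    general, operation = GeneralCommander(), OperationCommander()
    goal, log = None, []
    for t in range(horizon):
        if t % k == 0:  # the strategic tier decides on a slower clock
            goal = general.select_goal(fleet)
        action = operation.select_action(fleet, goal)
        if action == "repair" and 0 in fleet:
            fleet[fleet.index(0)] = 1  # restore one down aircraft
        elif action == "fly":
            # Each mission-capable aircraft fails with an assumed 10% probability.
            fleet = [0 if s == 1 and rng.random() < 0.1 else s for s in fleet]
        log.append((t, action, sum(fleet) / n_aircraft))
    return log


if __name__ == "__main__":
    for step, action, availability in run_episode()[:10]:
        print(step, action, round(availability, 3))
```

The point of the sketch is the structure: the upper tier only ever sees and sets fleet-level quantities, while the lower tier makes per-step choices against whatever goal is currently in force. In a learned system both placeholder policies would be replaced by trained networks.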

What carries the argument

The two-tier hierarchy with a General Commander overseeing fleet-level availability and costs and Operation Commanders managing specific maintenance and logistics actions, supported by layered reward shaping and planning-enhanced networks to process sparse feedback.
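The abstract does not define the layered reward, so the following is a stand-in rather than the paper's method. One widely used way to densify sparse rewards is potential-based shaping, which adds `gamma * phi(s') - phi(s)` to the environment reward. The potential used here (the negative gap between current availability and a goal set by the upper tier) and the function names are assumptions made for illustration.

```python
def tactical_potential(availability, goal_availability):
    """Assumed potential: the closer to the strategic goal, the higher the value."""
    return -abs(goal_availability - availability)


def shaped_reward(env_reward, phi_s, phi_next, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return env_reward + gamma * phi_next - phi_s


def shaped_rewards(availabilities, env_rewards, goal, gamma=0.99):
    """Shape a trajectory; `availabilities` has one more entry than `env_rewards`."""
    phis = [tactical_potential(a, goal) for a in availabilities]
    return [
        shaped_reward(r, phis[t], phis[t + 1], gamma)
        for t, r in enumerate(env_rewards)
    ]


if __name__ == "__main__":
    # Sparse case: the environment pays nothing along the way, yet every step
    # that moves availability toward the goal receives a positive signal.
    print(shaped_rewards([0.5, 0.6, 0.7], [0.0, 0.0], goal=0.8, gamma=1.0))
```

With `gamma = 1.0` the shaping terms telescope to `phi(s_T) - phi(s_0)`, which is why this form of shaping changes how quickly feedback arrives without changing which policies are optimal.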

If this is right

The hierarchy allows training time to stay manageable as fleet size grows rather than exploding with the full state space.
Robustness improves because tactical commanders can adapt locally even when unexpected failures occur at the fleet level.
Layered rewards and planning integration make it possible to learn effective policies despite long delays between actions and outcomes in logistics chains.
The framework produces policies that maintain higher aircraft availability at lower cost than flat methods under stochastic mission demands.
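The first consequence can be made concrete with a counting argument. The sketch assumes a particular factorization that the abstract does not confirm: a flat policy chooses one joint action for all aircraft at once, while the hierarchy chooses one goal and then one action per aircraft. The function names and the example sizes are hypothetical.

```python
def flat_joint_actions(n_aircraft, actions_per_aircraft):
    """Size of the joint action set when all aircraft are decided at once."""
    return actions_per_aircraft ** n_aircraft


def hierarchical_choices(n_aircraft, actions_per_aircraft, n_goals):
    """Choices under the assumed factorization: one goal, then per-aircraft actions."""
    return n_goals + n_aircraft * actions_per_aircraft


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, flat_joint_actions(n, 3), hierarchical_choices(n, 3, n_goals=5))
```

Under these assumptions the flat count grows exponentially in fleet size while the factored count grows linearly, which is the intuition behind the scalability claim. Whether the paper's hierarchy factorizes this cleanly is not something the abstract settles.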

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hierarchical decompositions could simplify reinforcement learning for other large-scale logistics problems such as supply-chain scheduling or power-grid maintenance.
The simulation results suggest that adding explicit planning layers may be a general way to reduce sample requirements in delayed-reward domains.
If the hierarchy generalizes, it could lower the barrier to applying reinforcement learning in safety-critical fleet settings where full monolithic training is impractical.

Load-bearing premise

The custom-built high-fidelity discrete-event simulation accurately captures the real dynamics of aircraft configuration, support logistics, stochastic mission profiles, and sparse feedback encountered in actual fleet operations.
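For readers unfamiliar with the simulation style this premise rests on, here is a minimal discrete-event sketch. It is far simpler than the environment the paper describes and shares no code with it: aircraft alternate between up and down states with exponentially distributed durations, repair capacity is assumed unlimited, and the parameter names `mtbf` and `mttr` are illustrative.

```python
import heapq
import random


def simulate_availability(n_aircraft=10, mtbf=50.0, mttr=10.0,
                          horizon=10_000.0, seed=0):
    """Return the time-averaged fraction of aircraft that are up."""
    rng = random.Random(seed)
    events = []  # priority queue of (event_time, aircraft_id, kind)
    for i in range(n_aircraft):
        heapq.heappush(events, (rng.expovariate(1.0 / mtbf), i, "fail"))

    up, now, up_time = n_aircraft, 0.0, 0.0
    while events:
        t, i, kind = heapq.heappop(events)
        if t > horizon:
            break  # every remaining event is also past the horizon
        up_time += up * (t - now)  # accumulate aircraft-hours spent up
        now = t
        if kind == "fail":
            up -= 1
            heapq.heappush(events, (now + rng.expovariate(1.0 / mttr), i, "repair"))
        else:
            up += 1
            heapq.heappush(events, (now + rng.expovariate(1.0 / mtbf), i, "fail"))

    up_time += up * (horizon - now)  # close out the final interval
    return up_time / (n_aircraft * horizon)


if __name__ == "__main__":
    print(simulate_availability())
```

For this toy model the long-run availability should approach `mtbf / (mtbf + mttr)`, which gives a closed-form check on the simulator. A high-fidelity environment has no such check available, which is why calibration against operational data matters for the premise above.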

What would settle it

Deploying the trained hierarchical policies on real fleet operational data or in a live test environment and measuring whether the reported gains in training time, availability, and robustness persist relative to monolithic and rule-based baselines.

Figures

Figures reproduced from arXiv: 2604.07171 by Guijiang Li, Jing Li, Mingfei Lu, Yang Hu, Yong Si, Yueheng Song, Zhaokui Wang.

**Figure 1.** Hierarchical decision-making architecture of the Smart Commander framework. The General Commander operates at the strategic level, …

**Figure 2.** Training dynamics under nominal conditions. The Smart Commander (HRL, purple) converges faster than DRL (orange) across all metrics.

**Figure 3.** Training rewards under nominal conditions. Top: General Commander reward (…

**Figure 4.** Scalability analysis under varying system complexity (…

**Figure 5.** Robustness analysis under varying failure intensities (…

**Figure 6.** Simulation model architecture with mission, fleet-health, and support/logistics modules.

**Figure 7.** Simulation flow of fleet operations in each DES cycle.

**Figure 8.** Rule-based policy: mission selection and fleet state evolution under nominal conditions.

**Figure 9.** Flat DRL policy: mission selection and fleet state evolution under nominal conditions.

**Figure 10.** HRL Smart Commander: mission selection and fleet state evolution under nominal conditions.

Original abstract

Decision-making in military aviation Prognostics and Health Management (PHM) faces significant challenges due to the "curse of dimensionality" in large-scale fleet operations, combined with sparse feedback and stochastic mission profiles. To address these issues, this paper proposes Smart Commander, a novel Hierarchical Reinforcement Learning (HRL) framework designed to optimize sequential maintenance and logistics decisions. The framework decomposes the complex control problem into a two-tier hierarchy: a strategic General Commander manages fleet-level availability and cost objectives, while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation. The proposed approach is validated within a custom-built, high-fidelity discrete-event simulation environment that captures the dynamics of aircraft configuration and support logistics. By integrating layered reward shaping with planning-enhanced neural networks, the method effectively addresses the difficulty of sparse and delayed rewards. Empirical evaluations demonstrate that Smart Commander significantly outperforms conventional monolithic Deep Reinforcement Learning (DRL) and rule-based baselines. Notably, it achieves a substantial reduction in training time while demonstrating superior scalability and robustness in failure-prone environments. These results highlight the potential of HRL as a reliable paradigm for next-generation intelligent fleet management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies a two-tier HRL structure to fleet PHM in simulation but the outperformance claims depend on an unvalidated custom environment.

This paper takes hierarchical RL and splits fleet maintenance decisions into a strategic General Commander for availability and cost goals plus tactical Operation Commanders for maintenance scheduling and resource moves. That decomposition is a reasonable way to manage scale and the sparse delayed rewards that come with aircraft operations. The planning-enhanced networks and layered reward shaping are straightforward additions to help the agents learn under uncertainty. Those pieces are the actual new elements here, as a domain-specific two-tier setup for PHM rather than a general algorithmic advance.

The custom discrete-event simulation tries to include aircraft configurations, sortie generation, and logistics, which at least moves past abstract toy problems. If the full text includes implementation details on how the hierarchy coordinates, that could be useful for others building similar systems.

The main weakness is that all the reported gains in training time, scalability, and robustness sit inside that one simulation. The abstract gives no numbers, no error bars, no ablation results, and no sign that the model was calibrated against real fleet data or tested for sensitivity on failure rates and mission profiles. Without those checks the improvements could easily be artifacts of the chosen environment rather than reliable algorithmic progress. The stress-test note on missing calibration is on target.

This is aimed at people working on RL for logistics or military maintenance optimization. A reader already familiar with HRL would see a concrete example of how to layer commanders for fleet-level problems and might borrow the reward-shaping approach. It does not claim new theory and the empirical side needs more support, but the problem framing and architecture show clear thinking. I would bring it to a reading group to discuss the simulation assumptions.

It is not ready to cite without further validation, but the work is coherent enough that a serious editor should send it to referees rather than desk-reject. Recommendation: send for peer review and ask specifically for calibration evidence and sensitivity results.

Referee Report

3 major / 2 minor

Summary. The paper proposes Smart Commander, a two-tier hierarchical reinforcement learning framework for fleet-level Prognostics and Health Management (PHM) decision optimization in military aviation. A strategic General Commander handles high-level availability and cost objectives while tactical Operation Commanders manage sortie generation, maintenance scheduling, and resource allocation. The method incorporates layered reward shaping and planning-enhanced networks to address sparse rewards and is evaluated in a custom high-fidelity discrete-event simulation, claiming substantial outperformance over monolithic DRL and rule-based baselines in training time, scalability, and robustness under failure-prone conditions.

Significance. If the simulation faithfully reproduces real fleet dynamics and the empirical gains are reproducible, the work could meaningfully advance scalable HRL applications to high-dimensional, stochastic PHM problems with delayed feedback, offering a practical path toward improved aircraft availability and reduced logistics costs in large-scale operations.

major comments (3)

[Abstract] Abstract: the central empirical claim of 'significant outperformance' and 'substantial reduction in training time' is stated without any quantitative metrics, confidence intervals, ablation results, or baseline implementation details, rendering the primary result unverifiable from the provided text.
[Validation / Experiments] The validation section (implied by the abstract's simulation description): the custom discrete-event simulation is presented as high-fidelity yet no calibration against historical fleet data, parameter sensitivity analysis on failure rates or mission profiles, or cross-validation with real operations is reported; this assumption is load-bearing for all applicability conclusions.
[Method] Method description: while the two-tier hierarchy is outlined, no explicit equations or pseudocode define the inter-level communication, reward decomposition, or how the planning-enhanced networks are trained, preventing assessment of whether the claimed robustness stems from the architecture or from simulation-specific tuning.

minor comments (2)

[Method] Notation for the two commander levels and reward components should be introduced with consistent symbols and a table of definitions to improve readability.
[Abstract] The abstract mentions 'failure-prone environments' without specifying the failure rate distribution or how it was sampled; a brief parameter table would clarify reproducibility.
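The first major comment asks for confidence intervals on the headline comparison. One way to supply them, sketched here with a percentile bootstrap over per-seed results, is shown below. The function name and signature are hypothetical, and the numbers in the usage example are placeholders invented for illustration. They are not results from the paper.

```python
import random
import statistics


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a) - mean(b), resampling each group."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Placeholder per-seed availabilities, NOT taken from the paper.
    hrl_runs = [0.81, 0.84, 0.79, 0.83, 0.82]
    flat_runs = [0.74, 0.78, 0.71, 0.77, 0.75]
    print(bootstrap_mean_diff_ci(hrl_runs, flat_runs))
```

If the resulting interval excludes zero across a reasonable number of seeds, the outperformance claim has statistical support within the simulation. It says nothing about whether the simulation itself is faithful.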

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major point below, indicating where revisions will be made to improve clarity, verifiability, and rigor.

Referee: [Abstract] Abstract: the central empirical claim of 'significant outperformance' and 'substantial reduction in training time' is stated without any quantitative metrics, confidence intervals, ablation results, or baseline implementation details, rendering the primary result unverifiable from the provided text.

Authors: We agree that the abstract would be strengthened by including key quantitative results. While the full manuscript provides these details (including metrics, confidence intervals, ablation studies, and baseline specifications) in the experimental evaluation, the abstract itself does not. We will revise the abstract to incorporate specific performance figures and references to the supporting tables and figures.

Revision: yes

Referee: [Validation / Experiments] The validation section (implied by the abstract's simulation description): the custom discrete-event simulation is presented as high-fidelity yet no calibration against historical fleet data, parameter sensitivity analysis on failure rates or mission profiles, or cross-validation with real operations is reported; this assumption is load-bearing for all applicability conclusions.

Authors: The simulation is constructed from domain-standard models of aircraft availability, failure processes, and logistics, but we acknowledge that direct calibration to classified historical data is not possible in this work. We will add a dedicated parameter sensitivity analysis on failure rates and mission profiles in the revised validation section, along with expanded discussion of how parameter choices align with published PHM literature. This addresses the concern about robustness without claiming direct real-world calibration.

Revision: partial

Referee: [Method] Method description: while the two-tier hierarchy is outlined, no explicit equations or pseudocode define the inter-level communication, reward decomposition, or how the planning-enhanced networks are trained, preventing assessment of whether the claimed robustness stems from the architecture or from simulation-specific tuning.

Authors: We agree that formal definitions are needed for reproducibility. We will add explicit equations for inter-level communication and reward decomposition, as well as pseudocode for the training procedure of the planning-enhanced networks, in the revised method section. These additions will clarify the architectural mechanisms independent of simulation details.

Revision: yes
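The sensitivity analysis promised in the second response could take many forms. The simplest is a one-at-a-time sweep, sketched here against a toy closed-form availability model rather than the paper's simulator. The helper names, the scaling factors, and the baseline values are assumptions for illustration only.

```python
def steady_state_availability(mtbf, mttr):
    """Toy model: long-run availability of an alternating up/down process."""
    return mtbf / (mtbf + mttr)


def one_at_a_time_sweep(metric, baseline, factors=(0.5, 1.0, 2.0)):
    """Scale each parameter in turn, holding the others at their baseline."""
    results = {}
    for name, value in baseline.items():
        results[name] = [
            metric(**{**baseline, name: value * factor}) for factor in factors
        ]
    return results


if __name__ == "__main__":
    sweep = one_at_a_time_sweep(
        steady_state_availability, {"mtbf": 50.0, "mttr": 10.0}
    )
    for name, values in sweep.items():
        print(name, [round(v, 3) for v in values])
```

In practice `metric` would wrap a full training-and-evaluation run per setting, so each point in the sweep is expensive. A one-at-a-time design also misses interactions between parameters, which a factorial or variance-based design would capture at higher cost.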

Circularity Check

0 steps flagged

No circularity detected; claims rest on independent empirical comparisons

The paper proposes a hierarchical RL framework (Smart Commander) that decomposes fleet PHM decisions into strategic and tactical levels, then reports empirical outperformance versus monolithic DRL and rule-based baselines inside a custom discrete-event simulation. No equations, derivations, or first-principles results are shown that reduce any claimed prediction or performance metric to fitted parameters or self-referential definitions by construction. The simulation functions as an external testbed for scalability and robustness rather than an input that is redefined as output. Any self-citations present do not carry the load of the central empirical claims, satisfying the criteria for a self-contained, non-circular derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unverified fidelity of a custom simulation and the effectiveness of the chosen hierarchy and reward shaping; no explicit free parameters, axioms, or invented entities are stated in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The framework decomposes the complex control problem into a two-tier hierarchy: a strategic General Commander manages fleet-level availability and cost objectives, while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "By integrating layered reward shaping with planning-enhanced neural networks, the method effectively addresses the difficulty of sparse and delayed rewards."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Prognostics and health management (phm): Where are we and where do we (need to) go in theory and practice.Reliability Engineering & System Safety, 218:108119, 2022

Enrico Zio. Prognostics and health management (phm): Where are we and where do we (need to) go in theory and practice.Reliability Engineering & System Safety, 218:108119, 2022

work page 2022

[2] [2]

A systematic literature review of predictive maintenance for defence fixed-wing aircraft sustainment and operations.Sensors, 22(18):7070, 2022

Michael J Scott, Wim JC Verhagen, Marie T Bieber, and Pier Marzocca. A systematic literature review of predictive maintenance for defence fixed-wing aircraft sustainment and operations.Sensors, 22(18):7070, 2022

work page 2022

[3]

Adolfo Crespo del Castillo, José Antonio Marcos, and Ajith Kumar Parlikad. Dynamic fleet maintenance management model applied to rolling stock. Reliability Engineering & System Safety, 240:109607, 2023.

[4]

Adolfo Crespo del Castillo and Ajith Kumar Parlikad. Dynamic fleet management: Integrating predictive and preventive maintenance with operation workload balance to minimise cost. Reliability Engineering & System Safety, 249:110243, 2024.

[5]

Robert Meissner, Antonia Rahn, and Kai Wicke. Developing prescriptive maintenance strategies in the aviation industry based on a discrete-event simulation framework for post-prognostics decision making. Reliability Engineering & System Safety, 214:107812, 2021.

[6]

Morteza Soleimani, Felician Campean, and Daniel Neagu. Diagnostics and prognostics for complex systems: A review of methods and challenges. Quality and Reliability Engineering International, 37(8):3746–3778, 2021.

[7]

Charalampos P Andriotis and Konstantinos G Papakonstantinou. Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliability Engineering & System Safety, 212:107551, 2021.

[8]

Qin Zhang, Yu Liu, Yisha Xiang, and Tangfan Xiahou. Reinforcement learning in reliability and maintenance optimization: A tutorial. Reliability Engineering & System Safety, 251:110401, 2024.

[9]

Pouria Razzaghi, Amin Tabrizian, Wei Guo, Shulu Chen, Abenezer Taye, Ellis Thompson, Alexis Bregeon, Ali Baheri, and Peng Wei. A survey on reinforcement learning in aviation applications. Engineering Applications of Artificial Intelligence, 136:108911, 2024.

[10]

Nima Yousefi, Sotirios Tsianikas, and David W Coit. Reinforcement learning for dynamic condition-based maintenance of a system with individually repairable components. Quality Engineering, 32(3):388–408, 2020.

[11]

Yunfei Zhao and Carol Smidts. Reinforcement learning for adaptive maintenance policy optimization under imperfect knowledge of the system degradation model and partial observability of system states. Reliability Engineering & System Safety, 224:108541, 2022.

[12]

Nailong Zhang and Wujun Si. Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks. Reliability Engineering & System Safety, 203:107094, 2020.

[13]

Iordanis Tseremoglou and Bruno F. Santos. Condition-based maintenance scheduling of an aircraft fleet under partial observability: A deep reinforcement learning approach. Reliability Engineering & System Safety, 241:109582, 2024.

[14]

Y. Zhang, B. Cai, C. Gao, Y. Zhao, X. Shao, and C. Yang. A system-centred predictive maintenance re-optimization method based on multi-agent deep reinforcement learning. Expert Systems with Applications, 274:127034, 2025.

[15]

Rajesh Siraskar, Satish Kumar, Shruti Patil, Arunkumar Bongale, and Ketan Kotecha. Reinforcement learning for predictive maintenance: A systematic technical review. Artificial Intelligence Review, 56(11):12885–12947, 2023.

[16]

Shubham Pateria, Budhitama Subagdja, Ah-hwee Tan, and Chai Quek. Hierarchical reinforcement learning: A comprehensive survey. ACM Computing Surveys (CSUR), 54(5):1–35, 2021.

[17]

Sven Gronauer and Klaus Diepold. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, 55(2):895–943, 2022.

[18]

Shariq Iqbal, Robby Costales, and Fei Sha. ALMA: Hierarchical learning for composite multi-agent tasks. In Advances in Neural Information Processing Systems, volume 35, pages 7155–7166, 2022.

[19]

Yang Hu, Xuewen Miao, Yong Si, Ershun Pan, and Enrico Zio. Prognostics and health management: A review from the perspectives of design, development and decision. Reliability Engineering & System Safety, 217:108063, 2022.

[20]

Kyojin Jang, Karl Ezra Salgado Pilario, Nayoung Lee, Il Moon, and Jonggeol Na. Explainable artificial intelligence for fault diagnosis of industrial processes. IEEE Transactions on Industrial Informatics, 21:4–11, 2025.

[21]

Liqiao Xia, Yongshi Liang, Jiewu Leng, and Pai Zheng. Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network. Reliability Engineering & System Safety, 232:109068, 2023.

[22]

Catarina Silva, Pedro Andrade, Bernardete Ribeiro, and Bruno F. Santos. Adaptive reinforcement learning for task scheduling in aircraft maintenance. Scientific Reports, 13(1):16605, 2023.

[23]

Meimei Zheng, Zhiyun Su, Dong Wang, and Ershun Pan. Joint maintenance and spare part ordering from multiple suppliers for multicomponent systems using a deep reinforcement learning algorithm. Reliability Engineering & System Safety, 241:109628, 2024.

[24]

Yifan Zhou, Kai Guo, Cheng Yu, and Zhisheng Zhang. Optimization of multi-echelon spare parts inventory systems using multi-agent deep reinforcement learning. Applied Mathematical Modelling, 125:827–844, 2024.

[25]

Yang Hu, Xuewen Miao, Jun Zhang, Jie Liu, and Ershun Pan. Reinforcement learning-driven maintenance strategy: A novel solution for long-term aircraft maintenance decision optimization. Computers & Industrial Engineering, 153:107056, 2021.

[26]

Lennart Lee and Mihaela Mitici. Deep reinforcement learning for predictive aircraft maintenance using probabilistic remaining-useful-life prognostics. Reliability Engineering & System Safety, 230:108908, 2023.

[27]

Ye, Cai, Yang, Si, and Zhou. Joint optimization of maintenance and quality inspection for manufacturing networks based on deep reinforcement learning. Reliability Engineering & System Safety, 245:109290, 2024.

[28]

Jian Zuo, Nadia Yousfi Steiner, Zhongliang Li, Catherine Cadet, Christophe Bérenguer, and Daniel Hissel. Reinforcement learning-based maintenance scheduling for a stochastic deteriorating fuel cell considering stack-to-stack heterogeneity. Reliability Engineering & System Safety, 247:110700, 2024.

[29]

Qilong Wu, Qiang Feng, Yi Ren, Quan Xia, Zhen Wang, and Bingqian Cai. An intelligent preventive maintenance method based on reinforcement learning for battery energy storage systems. IEEE Transactions on Industrial Informatics, 17(12):8254–8264, 2021.

[30]

Hui Liu, Zhenyu Liu, Weiqiang Jia, and Xianke Lin. Remaining useful life prediction using a novel feature-attention-based end-to-end approach. IEEE Transactions on Industrial Informatics, 17(2):1197–1207, 2021.

[31]

Pengcheng Xia, Yixiang Huang, Peng Li, Chengliang Liu, and Lun Shi. Fault knowledge transfer assisted ensemble method for remaining useful life prediction. IEEE Transactions on Industrial Informatics, 18(3):1758–1769, 2022.

[32]

Raymon van Dinter, Bedir Tekinerdogan, and Cagatay Catal. Predictive maintenance using digital twins: A systematic literature review. Information and Software Technology, 151:107008, 2022.
