arxiv: 2604.22777 · v1 · submitted 2026-04-03 · 💻 cs.AI · cs.LG


An Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement

Zhihuan Wei, Yang Hu, Xinhang Chen, Yiming Zhang, Jie Liu, Wei Wang


Pith reviewed 2026-05-13 20:13 UTC · model grok-4.3


keywords: fault diagnosis, digital twin, general aviation, FMEA, residual features, multi-fidelity simulation, aircraft engine, machine learning


The pith

Multi-fidelity digital twin with paired-mirror residuals achieves 96.2 percent Macro-F1 on 20-class aircraft engine fault diagnosis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a fault diagnosis system for general aviation aircraft engines that first creates synthetic training data through a high-fidelity flight simulator and systematic fault injection drawn from failure mode analysis. It then isolates fault effects by subtracting nominal mirror trajectories from observed signals and supplements this with a fast GRU-based predictor for real-time use. A convolutional classifier operates on the resulting residual features, while an LLM component turns the outputs into readable reports that include causal explanations. Experiments across many design variants show that the quality of these residual signals accounts for most of the diagnostic success, far outweighing changes to the classifier itself.

Core claim

A digital twin built on the JSBSim six-degree-of-freedom flight dynamics engine generates 23-channel sensor data for 19 engine fault types via FMEA-modeled injection. Paired-mirror residuals extract clean fault deviations by subtracting high-fidelity nominal trajectories, while a GRU surrogate provides low-fidelity residuals online. A 1D-CNN then classifies 20 fault classes, and an FMEA-augmented LLM produces natural-language reports. The paired-mirror scheme reaches 96.2 percent Macro-F1, the GRU surrogate delivers 4.3 times faster inference at a 0.6 percent performance cost, and residual feature quality contributes roughly five times more to performance than classifier architecture.

What carries the argument

Multi-fidelity residual computation framework that subtracts nominal mirror trajectories from observed signals to isolate fault deviations and uses a GRU surrogate for real-time approximation.
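The subtraction at the heart of the paired-mirror scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `paired_mirror_residual`, the toy random trajectory standing in for simulator output, and the step-bias fault on channel 3 are all assumptions made for the example.

```python
import numpy as np

def paired_mirror_residual(observed, nominal_mirror):
    """Return observed minus nominal mirror, shape (timesteps, channels)."""
    observed = np.asarray(observed, dtype=float)
    nominal_mirror = np.asarray(nominal_mirror, dtype=float)
    if observed.shape != nominal_mirror.shape:
        raise ValueError("trajectories must share shape (timesteps, channels)")
    return observed - nominal_mirror

# Toy stand-in for simulator output: 100 timesteps, 23 channels.
rng = np.random.default_rng(0)
nominal = rng.normal(size=(100, 23))

# Hypothetical fault: a step bias of 2.0 on channel 3 starting at t = 50.
faulty = nominal.copy()
faulty[50:, 3] += 2.0

residual = paired_mirror_residual(faulty, nominal)
```

Because the mirror run shares the faulty run's initial conditions, everything common to both cancels and only the injected deviation remains in `residual`.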

If this is right

Residual feature quality contributes approximately five times more to diagnostic performance than classifier architecture choice.
Paired-mirror residuals reach 96.2 percent Macro-F1 on the twenty-class fault task.
GRU surrogate residuals enable 4.3 times faster inference at a 0.6 percent performance cost.
FMEA knowledge integrated into an LLM produces interpretable natural-language diagnostic reports.
Synthetic data from physics-based simulation overcomes scarcity of real fault examples.
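For readers unfamiliar with the headline metric, Macro-F1 is the unweighted mean of per-class F1 scores, so each of the 20 classes counts equally regardless of how many samples it has. The sketch below is a plain NumPy version; the helper name `macro_f1`, the convention of scoring an absent class as 0, and the six-sample toy labels are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Toy 3-class example: one class-1 sample is misclassified as class 2.
score = macro_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], n_classes=3)
```

In the toy example the per-class F1 values are 1, 2/3, and 4/5, so a single error in a small class pulls the average down noticeably.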

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar residual-based digital-twin methods could be applied to other sensor-rich mechanical systems where real fault data are rare.
Prioritizing accurate physics simulation over classifier tuning may be a higher-leverage design choice in data-scarce domains.
Combining domain knowledge graphs such as FMEA with language models offers a route to auditable explanations in safety-critical monitoring.
Direct comparison of simulated versus flight-recorded residuals on the same aircraft would quantify how much simulation fidelity limits real-world transfer.

Load-bearing premise

The JSBSim flight dynamics engine plus semi-empirical sensor equations faithfully reproduce the real signatures and propagation of the modeled engine faults.

What would settle it

A large drop in diagnostic accuracy when the trained model is tested on sensor recordings from actual general aviation flights that contain documented engine faults.

Original abstract

Fault diagnosis of general aviation aircraft faces challenges including scarce real fault data, diverse fault types, and weak fault signatures. This paper proposes an intelligent fault diagnosis framework based on multi-fidelity digital twin, integrating four modules: high-fidelity flight dynamics simulation, FMEA-driven fault injection, multi-fidelity residual feature extraction, and large language model (LLM)-enhanced interpretable report generation. A digital twin is constructed using the JSBSim six-degree-of-freedom (6-DoF) flight dynamics engine, generating 23-channel engine health monitoring data via semi-empirical sensor synthesis equations. A three-layer fault injection engine based on failure mode and effects analysis (FMEA) models the physical causal propagation of 19 engine fault types. A multi-fidelity residual computation framework comprising paired-mirror residuals and GRU surrogate prediction residuals is proposed: the high-fidelity path obtains clean fault deviation signals using nominal mirror trajectories with identical initial conditions, while the low-fidelity path achieves online real-time residual computation through a multi-step prediction GRU surrogate model. A 1D-CNN classifier performs end-to-end diagnosis of 20 fault classes. An LLM diagnostic report engine enhanced with FMEA knowledge fuses classification results, residual evidence, and domain causal knowledge to generate interpretable natural language reports. Experiments show the paired-mirror residual scheme achieves a Macro-F1 of 96.2% on the 20-class task, while the GRU surrogate scheme achieves 4.3x inference acceleration at only 0.6% performance cost. Comparison across 24 schemes reveals that residual feature quality contributes approximately 5x more to diagnostic performance than classifier architecture, establishing the "residual quality first" design principle.
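The low-fidelity path in the abstract replaces the mirror simulation with a learned predictor and takes the residual against its forecast. The sketch below shows that idea with a hand-written GRU cell in NumPy. It is an assumption-laden illustration: the class `TinyGRUSurrogate`, the hidden size of 16, the random untrained weights, and the one-step (rather than multi-step) prediction are all choices made here, since the source does not give the surrogate's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRUSurrogate:
    """Untrained single-layer GRU with a linear readout (illustrative only)."""

    def __init__(self, n_channels, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Index 0: update gate, 1: reset gate, 2: candidate state.
        self.W = rng.normal(scale=0.1, size=(3, hidden, n_channels))
        self.U = rng.normal(scale=0.1, size=(3, hidden, hidden))
        self.b = np.zeros((3, hidden))
        self.W_out = rng.normal(scale=0.1, size=(n_channels, hidden))
        self.hidden = hidden

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        h_cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * h_cand

    def predict_one_step(self, signals):
        """Predict x[t+1] from x[0..t] for each t; returns shape (T-1, C)."""
        h = np.zeros(self.hidden)
        preds = []
        for x in signals[:-1]:
            h = self.step(x, h)
            preds.append(self.W_out @ h)
        return np.array(preds)

def surrogate_residual(model, signals):
    """Observed next sample minus the surrogate's forecast of it."""
    return signals[1:] - model.predict_one_step(signals)

# Toy 23-channel signal standing in for engine health monitoring data.
signals = np.random.default_rng(1).normal(size=(50, 23))
model = TinyGRUSurrogate(n_channels=23, hidden=16)
residual = surrogate_residual(model, signals)
```

In a real system the weights would be fitted on nominal trajectories so that the forecast tracks healthy behavior and the residual grows only when a fault is present.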

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a workable simulation pipeline for GA engine fault diagnosis, but its performance numbers rest entirely on uncalibrated JSBSim data.


The main takeaway is a concrete pipeline that builds a JSBSim-based digital twin, injects 19 fault types through an FMEA model, extracts residuals via paired-mirror trajectories or a GRU surrogate, classifies them with a 1D-CNN, and feeds the output plus causal knowledge into an LLM for readable reports. It reports 96.2% Macro-F1 on the 20-class task and shows the GRU version runs 4.3 times faster at a 0.6% performance cost. Across a comparison of 24 schemes, it finds that residual quality contributes roughly five times more than classifier architecture, which is a useful practical observation for anyone building similar systems under data scarcity.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a multi-fidelity digital twin framework for fault diagnosis of general aviation aircraft engines. It constructs a JSBSim 6-DoF simulator with semi-empirical sensor synthesis, injects 19 fault types via a three-layer FMEA engine, extracts residuals via paired-mirror high-fidelity paths and GRU surrogate low-fidelity paths, classifies 20 classes with a 1D-CNN, and generates LLM-based interpretable reports. On synthetic 23-channel trajectories the paired-mirror scheme reports 96.2% Macro-F1, while the GRU surrogate delivers a 4.3x inference speedup at a 0.6% performance cost; a comparison across 24 schemes concludes that residual feature quality contributes approximately 5x more to performance than classifier choice.

Significance. If the JSBSim-plus-semi-empirical model faithfully reproduces real fault signatures and sensor dynamics, the framework would offer a practical route to overcome scarce real fault data in general aviation. The explicit comparison of residual versus architecture contributions supplies a concrete design principle that could transfer to other simulation-augmented diagnostic tasks. The LLM report module adds interpretability that is often missing in black-box classifiers.

major comments (2)

[Abstract and Experimental Results] All reported metrics (96.2% Macro-F1, 4.3x acceleration, 5x residual-quality dominance) are obtained exclusively on trajectories generated by the same JSBSim 6-DoF engine and semi-empirical sensor equations used to create the training set. No comparison against real flight-recorder data, engine test-cell measurements, or held-out real-fault subsets is presented, leaving the central claim of effective diagnosis unsupported by external evidence.
[Multi-fidelity residual computation framework] The paired-mirror residuals are defined by subtracting nominal mirror trajectories that are themselves produced by the identical digital-twin model. Consequently the “clean fault deviation signals” are not independent measurements but quantities internal to the simulator; any mismatch between simulated and real sensor dynamics or noise statistics directly affects the reported performance and the “residual quality first” conclusion.
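The second major comment can be made concrete with a small numerical experiment. The sketch below is an illustration under stated assumptions, not a reproduction of the paper: the one-channel sine signal, the constant fault bias of 0.3, the noise level, and the idea that a mirror run may share the faulty run's noise realization are all hypothetical choices made here to show how much of the residual's cleanliness depends on the simulator being on both sides of the subtraction.

```python
import numpy as np

def simulate(fault_bias, noise_seed, n=2000, sigma=0.5):
    """Toy one-channel 'sensor': sine trend + constant fault bias + noise."""
    rng = np.random.default_rng(noise_seed)
    t = np.linspace(0.0, 1.0, n)
    clean = np.sin(2.0 * np.pi * t)
    return clean + fault_bias + rng.normal(scale=sigma, size=n)

faulty = simulate(fault_bias=0.3, noise_seed=1)

# Mirror from the same simulator with the same noise realization.
mirror_same_noise = simulate(fault_bias=0.0, noise_seed=1)
# Mirror with independent noise, closer to comparing against a real sensor.
mirror_independent = simulate(fault_bias=0.0, noise_seed=2)

res_same = faulty - mirror_same_noise
res_indep = faulty - mirror_independent

spread_same = res_same.std()    # essentially zero: noise cancels exactly
spread_indep = res_indep.std()  # about sigma * sqrt(2): noise adds up
```

With shared noise the residual is the fault bias and nothing else; with independent noise the same bias is buried under a spread larger than the bias itself, which is the situation a deployed system would face.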

minor comments (2)

[Methodology] The GRU surrogate hyperparameters (layer count, hidden size, prediction horizon) and the precise 1D-CNN architecture (kernel sizes, channel counts) are not tabulated, impeding exact reproduction of the 4.3x speedup result.
[LLM diagnostic report engine] No quantitative evaluation (e.g., human expert rating or factual-consistency metric) of the LLM-generated diagnostic reports is supplied, so the claim of “interpretable natural language reports” remains qualitative.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of simulation-based validation that we will address through targeted revisions to improve clarity and transparency. We respond point by point below.


Referee: [Abstract and Experimental Results] All reported metrics (96.2% Macro-F1, 4.3x acceleration, 5x residual-quality dominance) are obtained exclusively on trajectories generated by the same JSBSim 6-DoF engine and semi-empirical sensor equations used to create the training set. No comparison against real flight-recorder data, engine test-cell measurements, or held-out real-fault subsets is presented, leaving the central claim of effective diagnosis unsupported by external evidence.

Authors: We acknowledge that all quantitative results are derived from trajectories generated by the JSBSim 6-DoF simulator with semi-empirical sensor synthesis. This is a deliberate design choice driven by the well-documented scarcity of real fault data in general aviation, which is stated as the core motivation in the introduction. The framework is intended to enable diagnosis when real fault examples are unavailable. To address the concern, we will revise the Experimental Results section to include an expanded discussion of JSBSim fidelity, citing prior validation studies against known engine dynamics and sensor models. We will also add a dedicated limitations subsection that explicitly notes the absence of real-world data comparisons and outlines future work involving test-cell or flight-test data. These changes will clarify the scope of the current claims without overstating generalizability.

Revision: partial

Referee: [Multi-fidelity residual computation framework] The paired-mirror residuals are defined by subtracting nominal mirror trajectories that are themselves produced by the identical digital-twin model. Consequently the “clean fault deviation signals” are not independent measurements but quantities internal to the simulator; any mismatch between simulated and real sensor dynamics or noise statistics directly affects the reported performance and the “residual quality first” conclusion.

Authors: We agree that the paired-mirror residuals are generated internally by subtracting nominal trajectories produced by the same digital-twin model. This construction is intentional: identical initial conditions and dynamics allow isolation of fault-induced deviations without confounding effects from varying flight conditions or external disturbances. The ablation across 24 schemes, which supports the “residual quality first” principle, is performed entirely within this controlled environment. We will revise the Multi-fidelity residual computation framework section to explicitly state these modeling assumptions, discuss the simulator-reality gap, and suggest mitigation approaches such as domain randomization or future domain adaptation. This revision will improve transparency regarding the internal nature of the residuals.

Revision: partial

standing simulated objections not resolved

Direct quantitative validation on real flight-recorder data or engine test-cell measurements cannot be provided, as such labeled fault data is not publicly available and its acquisition lies outside the scope of the current study.

Circularity Check

1 step flagged

Residual features derived from same JSBSim digital twin that generates all training data

specific steps

self-definitional [Abstract (multi-fidelity residual computation framework)]
"the high-fidelity path obtains clean fault deviation signals using nominal mirror trajectories with identical initial conditions, while the low-fidelity path achieves online real-time residual computation through a multi-step prediction GRU surrogate model. A digital twin is constructed using the JSBSim six-degree-of-freedom (6-DoF) flight dynamics engine, generating 23-channel engine health monitoring data via semi-empirical sensor synthesis equations."

Deviation signals are computed as (faulty JSBSim trajectory minus nominal mirror JSBSim trajectory). Because the training set consists of the same faulty JSBSim trajectories, the residual features are algebraically defined relative to the identical model that supplies both the inputs and the 20-class labels, rendering the classification performance tautological within the simulator.

full rationale

The paper constructs both the training trajectories and the residual features from the identical JSBSim 6-DoF simulator plus semi-empirical sensor equations. High-fidelity residuals are defined as direct differences against nominal mirror runs from that same simulator, so the input features to the 1D-CNN are guaranteed to contain the exact fault-injection signatures used to label the data. This makes the reported 96.2% Macro-F1 and the 'residual quality first' ranking direct consequences of the closed simulation loop rather than an independent derivation. No external real-flight data or cross-validation against held-out physical measurements is cited to break the loop.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on the fidelity of the JSBSim simulation and the completeness of the FMEA fault models, both treated as domain assumptions without independent real-data grounding in the abstract.

free parameters (2)

GRU surrogate hyperparameters
Parameters of the multi-step prediction GRU model are fitted to simulation data to approximate high-fidelity residuals.
Fault injection magnitudes
Specific severity levels and propagation parameters for the 19 FMEA-modeled faults are chosen within the three-layer injection engine.
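To show what a chosen severity level means in practice, the sketch below injects a fault as a severity-scaled bias that ramps in on one channel. It is a deliberately simplified stand-in: the function `inject_fault`, the linear ramp, and the parameters `onset`, `severity`, and `ramp_steps` are hypothetical, and the paper's three-layer FMEA engine, which models causal propagation across channels, is not reproduced here.

```python
import numpy as np

def inject_fault(nominal, channel, onset, severity, ramp_steps=20):
    """Return a copy of `nominal` with a ramped bias added to one channel.

    The bias is 0 before `onset`, rises linearly over `ramp_steps` samples,
    and then holds at `severity`. The input array is left unmodified.
    """
    faulty = np.array(nominal, dtype=float, copy=True)
    n = faulty.shape[0]
    profile = np.clip((np.arange(n) - onset) / ramp_steps, 0.0, 1.0)
    faulty[:, channel] += severity * profile
    return faulty

# Toy nominal trajectory: 100 timesteps, 23 channels of zeros.
nominal = np.zeros((100, 23))
faulty = inject_fault(nominal, channel=5, onset=30, severity=1.5)
```

Every such magnitude is a free choice of the experimenter, which is why the ledger counts it as a free parameter: a larger `severity` makes the class easier to separate without any change to the classifier.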

axioms (2)

domain assumption: JSBSim 6-DoF engine produces accurate nominal trajectories and sensor readings for the target aircraft class.
Invoked to generate clean fault deviation signals via paired-mirror trajectories.
domain assumption: FMEA analysis enumerates all relevant engine fault types and their physical causal effects.
Used to drive the fault injection engine covering 19 fault types.


