Quantization Impact on the Accuracy and Communication Efficiency Trade-off in Federated Learning for Aerospace Predictive Maintenance

Abdelkarim Loukili

arxiv: 2604.08474 · v1 · submitted 2026-04-09 · 💻 cs.LG

Pith reviewed 2026-05-10 18:02 UTC · model grok-4.3

keywords: federated learning, quantization, predictive maintenance, aerospace, non-IID, communication efficiency, gradient quantization, C-MAPSS

The pith

INT4 quantization in federated learning preserves accuracy for aerospace predictive maintenance while reducing communication costs eightfold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the effect of lowering the bit precision of gradient updates in a federated learning setup for predicting when aircraft engines need maintenance. It demonstrates that 4-bit integer quantization produces predictions statistically equivalent to full 32-bit precision on the NASA C-MAPSS benchmarks, yet requires only one-eighth the data to transmit between devices and the central server each training round. A sympathetic reader would care because this trade-off directly addresses the bandwidth limits of onboard aircraft sensors and enables more practical deployment of privacy-preserving models across fleets. The evaluation also highlights that realistic non-uniform data distributions across clients expose instabilities in even lower precision that uniform test partitions conceal.

Core claim

Using a custom lightweight 1-D convolutional model called AeroConv1D with under 10,000 parameters, the work shows through multi-seed experiments that symmetric uniform 4-bit quantization yields mean absolute error and NASA scores on the FD001 and FD002 datasets that are statistically indistinguishable from 32-bit floating-point results, while reducing gradient communication volume by a factor of eight, from 37.88 KiB to 4.73 KiB per round. It further establishes that 2-bit quantization, although it sometimes lowers average error, produces highly variable NASA scores under Non-IID conditions, rendering it unreliable. The analysis includes direct comparisons showing that IID client splits mask these instabilities.

What carries the argument

Symmetric uniform quantization of gradients at varying bit widths applied during federated averaging on the AeroConv1D model under Non-IID partitioning of C-MAPSS data, which quantifies the accuracy-efficiency trade-off in the federated setting.
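
To make the mechanism concrete, the following is a minimal sketch of symmetric uniform quantization followed by federated averaging. It assumes a single per-tensor scale taken from the largest absolute value, round-to-nearest, and unweighted averaging; the summary here does not specify those details, and the names `quantize_symmetric`, `dequantize`, and `fedavg` are hypothetical rather than taken from the paper's codebase.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-q_max, q_max] using one shared scale."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    q_max = 2 ** (bits - 1) - 1           # 7 for INT4, 1 for INT2
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0] * len(values), 1.0     # all-zero tensor: any scale works
    scale = max_abs / q_max               # assumption: per-tensor max-abs scaling
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats (e.g. on the server) before averaging."""
    return [c * scale for c in codes]


def fedavg(client_updates: List[List[float]]) -> List[float]:
    """Unweighted element-wise mean of already-dequantized client updates."""
    n = len(client_updates)
    return [sum(column) / n for column in zip(*client_updates)]


# Illustrative gradient only; not data from the paper.
grad = [0.7, -0.3, 0.12, 0.0]
codes4, scale4 = quantize_symmetric(grad, bits=4)
codes2, scale2 = quantize_symmetric(grad, bits=2)
```

Under these assumptions the INT4 grid has 15 levels (-7 to 7) while the INT2 grid has only 3 (-1, 0, 1), so in the toy gradient above every component except the largest is rounded to zero at 2 bits.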

If this is right

INT4 enables deployment on bandwidth-limited IoT nodes in aerospace without compromising predictive performance.
The Non-IID evaluation protocol is required to accurately assess quantization stability in operational settings.
FPGA resource estimates indicate that INT4 supports full on-chip federated learning pipelines.
Lower precision training can be integrated into existing FL frameworks for similar maintenance tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending this quantization strategy to other sensor-based prediction problems in transportation or manufacturing could yield similar efficiency gains.
The reduced communication might permit increasing the number of participating clients per round, potentially enhancing model generalization across diverse fleet conditions.
Future work could test adaptive quantization levels that adjust based on detected data heterogeneity.

Load-bearing premise

The specific Non-IID partitioning of the C-MAPSS dataset and the chosen statistical significance tests represent the heterogeneity and variability found in actual aerospace fleet operations.

What would settle it

Conducting the same federated training experiments using real sensor data collected from a fleet of aircraft with documented variations in usage and maintenance history, checking if the p-values for equivalence remain above 0.05.

Figures

Figures reproduced from arXiv: 2604.08474 by Abdelkarim Loukili.

**Figure 2.** MAE by subset and quantization level. Hatching indicates FD002 (6 operating …

**Figure 3.** NASA score S convergence on C-MAPSS FD001. Lower is better. INT2 early-round values reach 109 and are off-scale; the y-axis is clipped for readability. Negative values arise when systematic under-prediction dominates; INT2 oscillates between extreme positive and negative scores, illustrating non-reproducibility. Verdict. INT2 is unsuitable for aerospace RUL regression not because of uniform accuracy degrad…

**Figure 4.** Gradient-distortion privacy proxy L_priv on FD001 (log scale) over 20 FL rounds. Higher values indicate greater gradient distortion and higher gradient-inversion attack cost [4, 20]. FP32 is omitted (L_priv = 0 by definition). L_priv is not a formal DP bound; see Section 3.5.

**Figure 5.** Accuracy–communication trade-off on FD001. Error bars: …

Original abstract

Federated learning (FL) enables privacy-preserving predictive maintenance across distributed aerospace fleets, but gradient communication overhead constrains deployment on bandwidth-limited IoT nodes. This paper investigates the impact of symmetric uniform quantization (b ∈ {32, 8, 4, 2} bits) on the accuracy-efficiency trade-off of a custom-designed lightweight 1-D convolutional model (AeroConv1D, 9,697 parameters) trained via FL on the NASA C-MAPSS benchmark under a realistic Non-IID client partition. Using a rigorous multi-seed evaluation (N = 10 seeds), we show that INT4 achieves accuracy statistically indistinguishable from FP32 on both FD001 (p = 0.341) and FD002 (p = 0.264 MAE, p = 0.534 NASA score) while delivering an 8× reduction in gradient communication cost (37.88 KiB → 4.73 KiB per round). A key methodological finding is that naïve IID client partitioning artificially suppresses variance; correct Non-IID evaluation reveals the true operational instability of extreme quantization, demonstrated via a direct empirical IID vs. Non-IID comparison. INT2 is empirically characterized as unsuitable: while it achieves lower MAE on FD002 through extreme quantization-induced over-regularization, this apparent gain is accompanied by catastrophic NASA score instability (CV = 45.8% vs. 22.3% for FP32), confirming non-reproducibility under heterogeneous operating conditions. Analytical FPGA resource projections on the Xilinx ZCU102 confirm that INT4 fits within hardware constraints (85.5% DSP utilization), potentially enabling a complete FL pipeline on a single SoC. The full simulation codebase and FPGA estimation scripts are publicly available at https://github.com/therealdeadbeef/aerospace-fl-quantization.
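
The payload sizes quoted in the abstract can be reproduced from the parameter count, assuming each gradient entry is sent at b bits and that scale factors and protocol headers are ignored (an assumption; the abstract does not itemize overhead). A small check, with `payload_kib` as a hypothetical helper:

```python
def payload_kib(num_params: int, bits: int) -> float:
    """Size of one gradient payload in KiB, ignoring scales and headers."""
    return num_params * bits / 8 / 1024


NUM_PARAMS = 9_697  # AeroConv1D parameter count quoted in the abstract

for bits in (32, 8, 4, 2):
    print(f"{bits:>2}-bit: {payload_kib(NUM_PARAMS, bits):6.2f} KiB")

reduction = payload_kib(NUM_PARAMS, 32) / payload_kib(NUM_PARAMS, 4)
```

For 32 and 4 bits this gives 37.88 KiB and 4.73 KiB, matching the abstract, and the reduction factor is exactly 32/4 = 8.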

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

INT4 cuts communication 8x in this aerospace FL setup with accuracy close to FP32 under Non-IID, but the indistinguishability claim rests on non-significant p-values rather than equivalence tests.

The key point here is that INT4 quantization delivers an 8x drop in communication cost for federated learning on the C-MAPSS dataset while keeping accuracy close to full precision, at least under their Non-IID setup. They also show INT2 causes too much instability in the NASA score. The work stands out for its direct IID versus Non-IID comparison, which demonstrates how standard IID splits can mask quantization issues. Running 10 seeds and reporting p-values adds some rigor, and making the code and FPGA scripts available is helpful for anyone wanting to reproduce or extend it. The lightweight model and hardware projections tie the results to actual deployment constraints on something like the ZCU102.

The soft spot is the interpretation of those p-values. Saying the accuracies are statistically indistinguishable because p=0.341 or 0.264 does not prove equivalence; it only shows the test did not detect a difference. With N=10, the power is limited for spotting small but meaningful shifts in MAE or NASA score. Equivalence testing or confidence intervals on the differences would strengthen that part. The Non-IID client construction is another area where more detail on the partitioning logic would let readers judge how well it represents real aerospace fleets.

This paper is aimed at researchers and engineers working on federated learning for predictive maintenance in bandwidth-limited settings, especially those dealing with heterogeneous data across machines. Anyone evaluating quantization for FL will find the empirical trade-off data and the methodological note on IID assumptions useful.

It is worth sending for peer review. The empirical focus and public code make it a solid candidate, even if the statistical claims need tightening.

Referee Report

2 major / 2 minor

Summary. The paper investigates the impact of symmetric uniform quantization (b in {32,8,4,2} bits) on accuracy-efficiency trade-offs for a lightweight 1-D convolutional model (AeroConv1D, 9697 parameters) in federated learning on NASA C-MAPSS data under Non-IID partitioning. Using N=10 seeds, it claims INT4 yields accuracy statistically indistinguishable from FP32 (p=0.341 on FD001; p=0.264 MAE and p=0.534 NASA score on FD002) with 8x lower gradient communication (37.88 KiB to 4.73 KiB per round), warns that IID partitioning suppresses variance, shows INT2 causes high instability (CV=45.8% on NASA score), and provides FPGA projections for Xilinx ZCU102.

Significance. If the statistical claims are strengthened, the work offers actionable guidance for bandwidth-constrained FL deployment in aerospace predictive maintenance, with strengths in multi-seed evaluation, explicit IID vs. Non-IID comparison, open codebase, and hardware feasibility estimates. The methodological warning on partitioning is a useful contribution for realistic FL benchmarking.

major comments (2)

[Abstract and Results] The central claim that INT4 accuracy is 'statistically indistinguishable' from FP32 rests on non-significant p-values from standard difference tests (p=0.341 on FD001; p=0.264/0.534 on FD002) with N=10 seeds. These demonstrate failure to reject the null but provide no equivalence bounds, TOST results, or effect-size confidence intervals. With modest power for detecting small MAE shifts (e.g., 1-2%), this does not securely support the indistinguishability assertion for operational aerospace use.
[Methods/Experimental Setup] The Non-IID client partitioning of C-MAPSS is presented as realistic, with a direct IID vs. Non-IID comparison showing suppressed variance under IID. However, the specific partitioning mechanism (e.g., how operating conditions or sensor distributions are assigned across clients) is not detailed enough to evaluate whether it captures real fleet heterogeneity, which is load-bearing for the methodological finding and generalizability.
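
The equivalence analysis requested in the first comment can be sketched as follows. This is an illustration under stated assumptions, not the paper's analysis: it assumes paired per-seed differences (INT4 minus FP32), a user-chosen equivalence margin, and uses the standard result that two one-sided t-tests at level alpha give the same decision as checking whether the (1 - 2·alpha) confidence interval lies inside the margin. The function name, the margin values, and the example differences are hypothetical placeholders.

```python
import math
import statistics
from typing import Sequence, Tuple


def tost_paired(diffs: Sequence[float], margin: float,
                t_crit: float) -> Tuple[float, float, bool]:
    """Equivalence decision via the (1 - 2*alpha) CI on the mean paired difference.

    Returns (ci_low, ci_high, equivalent). `t_crit` is the one-sided critical
    value of Student's t for len(diffs) - 1 degrees of freedom at level alpha.
    """
    n = len(diffs)
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    ci_low, ci_high = mean - t_crit * se, mean + t_crit * se
    return ci_low, ci_high, (-margin < ci_low and ci_high < margin)


# One-sided 95% critical value of Student's t with 9 degrees of freedom (N = 10).
T_CRIT_DF9 = 1.833

# Placeholder per-seed MAE differences; NOT values from the paper.
example_diffs = [0.10, -0.10, 0.20, -0.20, 0.00, 0.10, -0.10, 0.05, -0.05, 0.00]

wide = tost_paired(example_diffs, margin=0.5, t_crit=T_CRIT_DF9)
tight = tost_paired(example_diffs, margin=0.05, t_crit=T_CRIT_DF9)
```

The same data can support equivalence at a wide margin and fail at a tight one, which is why the margin needs to be justified on operational grounds before the test is run.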

minor comments (2)

[Abstract] Specify the exact statistical test (e.g., paired t-test) used to compute the reported p-values for reproducibility.
[Results] The paper should report full confidence intervals or standard deviations alongside means for all metrics to allow readers to assess practical significance beyond p-values.
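
A per-metric summary of the kind the second minor comment asks for could look like the sketch below. It assumes one scalar result per seed and a t-based interval; `summarize` is a hypothetical helper, the default critical value only applies to N = 10 seeds, and the example numbers are placeholders rather than results from the paper.

```python
import math
import statistics
from typing import Dict, Sequence

T_975_DF9 = 2.262  # two-sided 95% critical value of Student's t, 9 degrees of freedom


def summarize(per_seed: Sequence[float], t_crit: float = T_975_DF9) -> Dict[str, float]:
    """Mean, sample std, coefficient of variation (%), and a t-based 95% CI."""
    n = len(per_seed)
    mean = statistics.fmean(per_seed)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    std = statistics.stdev(per_seed)
    half_width = t_crit * std / math.sqrt(n)
    return {
        "mean": mean,
        "std": std,
        "cv_percent": 100.0 * std / abs(mean),
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Placeholder per-seed scores; NOT values from the paper.
stats = summarize([10, 12, 8, 11, 9, 10, 12, 8, 11, 9])
```

Reporting the coefficient of variation next to the interval keeps the stability comparison (such as the CV figures quoted for INT2 and FP32) on the same footing as the accuracy comparison.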

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to improve our manuscript. We address each major comment below and outline the revisions we will make.

Referee: [Abstract and Results] The central claim that INT4 accuracy is 'statistically indistinguishable' from FP32 rests on non-significant p-values from standard difference tests (p=0.341 on FD001; p=0.264/0.534 on FD002) with N=10 seeds. These demonstrate failure to reject the null but provide no equivalence bounds, TOST results, or effect-size confidence intervals. With modest power for detecting small MAE shifts (e.g., 1-2%), this does not securely support the indistinguishability assertion for operational aerospace use.

Authors: We agree that non-significant p-values alone do not establish equivalence and that our phrasing of 'statistically indistinguishable' requires stronger support for operational claims. In the revision we will add Two One-Sided Tests (TOST) for equivalence, report effect sizes with confidence intervals, and revise the language in the abstract and results sections to reflect the updated analysis. Revision: yes.

Referee: [Methods/Experimental Setup] The Non-IID client partitioning of C-MAPSS is presented as realistic, with a direct IID vs. Non-IID comparison showing suppressed variance under IID. However, the specific partitioning mechanism (e.g., how operating conditions or sensor distributions are assigned across clients) is not detailed enough to evaluate whether it captures real fleet heterogeneity, which is load-bearing for the methodological finding and generalizability.

Authors: We thank the referee for this observation. The revised manuscript will include an expanded description of the Non-IID partitioning algorithm, detailing how operating conditions and sensor distributions are assigned to clients to emulate fleet heterogeneity. This will improve reproducibility and allow readers to better assess the realism of the setup. Revision: yes.
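
For readers who want a reference point while the paper's own algorithm remains undescribed, one widely used way to build Non-IID clients is a Dirichlet split over a grouping key. The sketch below is that generic technique, not the authors' method: the grouping key (for example an operating-condition id), the concentration `alpha`, and the function name `dirichlet_partition` are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def dirichlet_partition(groups: Sequence[Hashable], num_clients: int,
                        alpha: float, seed: int = 0) -> Dict[int, List[int]]:
    """Assign sample indices to clients with Dirichlet(alpha) skew per group.

    Smaller alpha concentrates each group on fewer clients (more Non-IID);
    large alpha approaches an even, IID-like split.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    clients: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for group in sorted(by_group, key=str):
        indices = by_group[group]
        rng.shuffle(indices)
        # A Dirichlet draw is a set of normalised Gamma(alpha, 1) samples.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        start, cumulative = 0, 0.0
        for c, w in enumerate(weights):
            cumulative += w / total
            # The last client takes the remainder so every index is assigned.
            end = len(indices) if c == num_clients - 1 else round(cumulative * len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients


# Toy example: 120 samples spread over 6 hypothetical operating conditions.
toy_groups = [i % 6 for i in range(120)]
parts = dirichlet_partition(toy_groups, num_clients=4, alpha=0.5, seed=1)
```

Sweeping `alpha` from small to large values would give a controlled path from strongly skewed to nearly IID clients, which is one way to make an IID versus Non-IID comparison reproducible.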

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with direct measurements

The paper reports experimental results from training a 1-D CNN on NASA C-MAPSS data under Non-IID partitioning, comparing FP32/INT8/INT4/INT2 quantization via MAE, NASA score, communication volume, and p-values from N=10 seeds. All load-bearing claims (statistical indistinguishability, 8x cost reduction, INT2 instability) rest on these direct measurements and standard statistical tests. No derivation chain, fitted parameters, self-citations, or ansatzes are present in the provided text; the work is self-contained against the public C-MAPSS benchmark and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions of statistical hypothesis testing and the representativeness of the NASA C-MAPSS benchmark under the chosen partition; no new free parameters, axioms, or invented entities are introduced beyond the tested quantization bit widths.

axioms (1)

standard math: Assumptions required for the validity of the reported p-values (e.g., appropriate distribution for the test statistic). Invoked when claiming statistical indistinguishability between INT4 and FP32.

pith-pipeline@v0.9.0 · 5645 in / 1348 out tokens · 37148 ms · 2026-05-10T18:02:58.657742+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

INT4 achieves accuracy statistically indistinguishable from FP32 on both FD001 (p=0.341) and FD002 (p=0.264 MAE, p=0.534 NASA score) while delivering an 8× reduction in gradient communication cost

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic. QSGD: Communication-efficient SGD via gradient quantization and encoding. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings....

[2] J. Bernstein, Y.-X. Wang, K. Azizzadenesheli, and A. Anandkumar. signSGD: compressed optimisation for non-convex problems. In International Conference on Machine Learning,

[3] URL https://api.semanticscholar.org/CorpusID:7763588

[4] F. Fahim et al. hls4ml: An open-source codesign workflow to empower scientific low-power machine learning devices. IEEE Transactions on Nuclear Science, 68(8):1885–1896, 2021.

[5] J. Geiping, H. Bau, F. Droste, and M. Moeller. Inverting gradients – how easy is it to break privacy in federated learning? In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 16937–16947, 2020.

[6] Z. He et al. FedDT: A communication-efficient federated learning via knowledge distillation and ternary compression. Electronics, 14(11):2183, 2025.

[7] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, 1985.

[8] K. Khalil et al. A federated learning model based on hardware acceleration for the early detection of Alzheimer's disease. Sensors, 23(19):8272, 2023.

[9] D. Landau, I. de Pater, M. Mitici, and N. Saurabh. Federated learning framework for collaborative remaining useful life prognostics: an aircraft engine case study, 2025. URL https://arxiv.org/abs/2506.00499

[10] A. Laouiti et al. Hardware acceleration of fully homomorphic encryption for edge federated learning. IEEE Internet of Things Journal, 2025.

[11] S. Lee et al. BiPruneFL: Computation and communication efficient federated learning with binary quantization and pruning. IEEE Access, 2025.

[12] F. Li, B. Liu, X. Wang, B. Zhang, and J. Yan. Ternary weight networks, 2022. URL https://arxiv.org/abs/1605.04711

[13] X. Ma, J. Zhu, Z. Lin, Y. Qin, and S. Chen. A state-of-the-art survey on solving non-IID data in federated learning. Future Generation Computer Systems, 135, 05 2022. doi: 10.1016/j.future.2022.05.003

[14] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In A. Singh and J. Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 1273–1282. PMLR, 20–22 Apr 2017....

[15] T. D. D. Nguyen, J. Kim, and H. Lee. CKKS-based homomorphic encryption architecture using parallel NTT multiplier. In 2023 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2023.

[16] A. A. Purkayastha et al. Federated learning for predictive maintenance: A survey of methods, applications, and challenges. In 2024 IEEE 67th International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2024.

[17] A. Saxena, K. Goebel, D. Simon, and N. Eklund. Damage propagation modeling for aircraft engine run-to-failure simulation. International Conference on Prognostics and Health Management, 10 2008. doi: 10.1109/PHM.2008.4711414.

[18] C. Wang and M. Gao. SAM: A scalable accelerator for number theoretic transform using multi-dimensional decomposition. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023.

[19] Z. Ye and M. Ikeda. Implementing homomorphic encryption-based logic locking in SoC designs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 33(7), 2025.

[20] Z. Zheng, Z. Wang, X. Cui, M. Li, J. Chen, Yun, Liang, A. Li, and X. Chen. FedHQ: Hybrid runtime quantization for federated learning, 2025. URL https://arxiv.org/abs/2505.11982

[21] L. Zhu, Z. Liu, and S. Han. Deep leakage from gradients. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
