XAI-SOH-FL: Enhancing SOH-FL with Adaptive Aggregation and Explainable AI for Intrusion Detection in Heterogeneous IoT

Ambreen Aslam; Bibi Zahra; Maaz Hassan; Muhammad Khuram Shahzad

arXiv: 2606.00134 · v1 · pith:TGYPFLKTnew · submitted 2026-05-28 · 💻 cs.CR · cs.AI · cs.LG


Pith reviewed 2026-06-29 06:25 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.LG

keywords federated learning · intrusion detection system · explainable AI · IoT · adaptive aggregation · SHAP · Bayesian optimization · heterogeneous data


The pith

By making the aggregation parameter gamma adaptive using similarity thresholding and Bayesian optimization, and incorporating SHAP for explanations, XAI-SOH-FL improves upon SOH-FL to achieve 94.12% accuracy and 0.92 F1-score in heterogeneo

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome the manual tuning of the gamma aggregation parameter and the lack of explainability in the SOH-FL federated learning approach for intrusion detection in IoT. It does this by introducing a dynamic gamma selection based on similarity thresholding, using Bayesian optimization to find optimal values, and integrating SHAP to interpret model predictions at the feature level. This matters because effective privacy-preserving intrusion detection that adapts to varied data and explains its decisions could make security systems more reliable and usable across diverse IoT networks. The results on the CICIDS2017 dataset show the enhanced model outperforming the baseline with higher accuracy and faster convergence. SHAP analysis points to flow duration and packet length as major contributors to the decisions.

Core claim

The paper claims that XAI-SOH-FL, which adds adaptive aggregation through similarity-based dynamic gamma selection and Bayesian optimization along with SHAP explanations to the SOH-FL framework, attains an accuracy of 94.12% and an F1-score of 0.92 on the CICIDS2017 dataset. This outperforms the baseline SOH-FL model and achieves convergence in fewer communication rounds. The SHAP analysis further shows that flow-level features like Flow Duration and Packet Length have significant influence on the intrusion detection predictions.

What carries the argument

The dynamic gamma selection mechanism based on similarity thresholding and Bayesian optimization for automatic tuning of the aggregation parameter, combined with SHAP for feature-level interpretability of predictions.
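
A minimal sketch of how similarity-driven peer selection could look, in plain Python. It is not taken from the paper: the abstract names neither the similarity metric nor the exact role of γ, so cosine similarity, the reading of γ as the fraction of most-similar peers a client aggregates with, and the names `cosine`, `select_peers`, and `aggregate` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_peers(own, peers, gamma):
    """Indices of the peers most similar to `own`, best match first.

    ASSUMPTION: gamma in (0, 1] is the fraction of peers to keep,
    so a low gamma keeps only the closest few.
    """
    k = max(1, math.ceil(gamma * len(peers)))
    ranked = sorted(range(len(peers)),
                    key=lambda i: cosine(own, peers[i]),
                    reverse=True)
    return ranked[:k]

def aggregate(own, peers, gamma):
    """Unweighted average of the client's weights and its selected peers."""
    group = [own] + [peers[i] for i in select_peers(own, peers, gamma)]
    return [sum(column) / len(group) for column in zip(*group)]

# Hypothetical two-dimensional "model weights" for one client and four peers.
own = [1.0, 0.0]
peers = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.8, 0.2]]
narrow = select_peers(own, peers, gamma=0.25)   # one closest peer
wide = select_peers(own, peers, gamma=1.0)      # every peer, ranked
```

Under this reading, a small γ restricts aggregation to the nearest peer, and raising γ toward 1 widens the group until every peer contributes.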

If this is right

The enhanced model converges in fewer communication rounds than the baseline.
It delivers feature-level explanations for intrusion detection decisions.
The approach handles data heterogeneity better while maintaining high accuracy and F1-score.
Key features such as Flow Duration and Packet Length are identified as influential in model predictions.
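
To make the idea of feature-level attribution concrete, here is a small sketch that computes exact Shapley values by enumerating feature orderings. It does not use the `shap` package and is not the paper's explainer. The scoring function `toy_score`, its coefficients, the input values, and the `syn_flags` feature are hypothetical; only Flow Duration and Packet Length are named in the source.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet added keep their baseline value. The cost grows
    factorially with the number of features, so this suits toy cases only.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_score(f):
    """Hypothetical intrusion score with one interaction term."""
    return (0.6 * f["flow_duration"]
            + 0.3 * f["packet_length"]
            + 0.1 * f["flow_duration"] * f["syn_flags"])

x = {"flow_duration": 2.0, "packet_length": 1.0, "syn_flags": 1.0}
baseline = {"flow_duration": 0.0, "packet_length": 0.0, "syn_flags": 0.0}
phi = shapley_values(toy_score, x, baseline)
```

The attributions add up to the difference between the score at `x` and the score at `baseline`, which is the additivity property that SHAP-style explanations rely on.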

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The adaptive gamma mechanism could be tested in other federated learning tasks to see if it reduces tuning effort generally.
Applying SHAP in similar privacy-preserving systems might help build trust in automated security decisions.
Fewer communication rounds could translate to lower bandwidth and energy costs in large-scale IoT deployments.
The method's performance on datasets with different types of heterogeneity remains to be explored for broader applicability.

Load-bearing premise

Similarity thresholding combined with Bayesian optimization will reliably generate gamma values that improve performance and convergence across varying heterogeneous IoT data distributions without introducing instability or selection bias.

What would settle it

Reproducing the experiments on the CICIDS2017 dataset but observing that XAI-SOH-FL does not exceed the baseline SOH-FL in accuracy or requires more communication rounds would falsify the performance improvement claim.
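
A reproduction along these lines could be scored with a check like the sketch below. The accuracy curves are placeholder values chosen for illustration, not results from the paper, and the helper names `rounds_to_target` and `claim_survives` are hypothetical. The comparison rule (higher mean final accuracy, and no more rounds on average to reach a target) is one reasonable reading of the falsification test, not a protocol the authors specify.

```python
import math

def rounds_to_target(curve, target):
    """1-based round at which accuracy first reaches `target`, else infinity."""
    for rnd, acc in enumerate(curve, start=1):
        if acc >= target:
            return rnd
    return math.inf

def average(values):
    """Arithmetic mean of a non-empty list (infinity propagates)."""
    return sum(values) / len(values)

def claim_survives(baseline_runs, enhanced_runs, target):
    """True if the enhanced model is more accurate and needs no more rounds."""
    base_acc = average([run[-1] for run in baseline_runs])
    enh_acc = average([run[-1] for run in enhanced_runs])
    base_rounds = average([rounds_to_target(r, target) for r in baseline_runs])
    enh_rounds = average([rounds_to_target(r, target) for r in enhanced_runs])
    return enh_acc > base_acc and enh_rounds <= base_rounds

# Placeholder per-round accuracy curves, two seeds per method (NOT paper data).
baseline_runs = [[0.50, 0.60, 0.70, 0.75], [0.52, 0.61, 0.69, 0.74]]
enhanced_runs = [[0.55, 0.70, 0.76, 0.78], [0.54, 0.71, 0.75, 0.77]]
verdict = claim_survives(baseline_runs, enhanced_runs, target=0.70)
```

Running several seeds per method, as here, also gives the variance across runs that the referee report asks for.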

Figures

Figures reproduced from arXiv: 2606.00134 by Ambreen Aslam, Bibi Zahra, Maaz Hassan, Muhammad Khuram Shahzad.

**Figure 1.** Relationship between the number of communication rounds and model accuracy (%). As the number of communication rounds increases from 1 to 10, the model accuracy exhibits a consistent upward trend, improving from a baseline below 70% to above 90% in later rounds. This behavior aligns with typical convergence patterns observed in federated learning systems …

**Figure 2.** Real Time Accuracy Convergence

**Figure 3.** Real Time Hyperparameter 𝛾 Adjustment

**Figure 4.** Effect of Adaptive 𝛾

6.4. Effect of Adaptive 𝛾. The adaptive 𝛾 mechanism yields a modest but consistent accuracy improvement of 0.3–0.5% over fixed-𝛾 configurations across all tested rounds (Figure 4). While this gain may appear small in absolute terms, its significance lies in what it reveals about aggregation dynamics in non-IID federated settings. In rounds 1–3, the adaptive 𝛾 selects fewer peer…

**Figure 5.** SHAP Feature Importance

6.6. Self-Tuning Personalization. The Self-Tuning Personalization mechanism employs an adaptive 𝛾 that evolves across communication rounds. In early rounds, 𝛾 remains low, limiting aggregation to the most similar peers and reducing the risk of incorporating noisy updates from heterogeneous devices. As training progresses and model representations stabilize, 𝛾 increases, enabling rich…

**Figure 6.** Self-Tuning Personalization

7. Conclusion. This paper addresses the limitations of existing similarity-based federated learning approaches, particularly SOH-FL, which relies on a fixed aggregation parameter and lacks interpretability in decision-making. These constraints reduce adaptability and limit performance in dynamic and heterogeneous IoT environments. To overcome these challenges, we proposed XAI-SO…

Original abstract

Intrusion Detection Systems (IDS) in Internet of Things (IoT) environments face significant challenges due to data heterogeneity, lack of labeled data, and limited model interpretability. Federated Learning (FL) offers a privacy-preserving solution; however, existing approaches such as SOH-FL suffer from two key limitations: reliance on a manually tuned aggregation parameter {\gamma} and lack of explainability in model predictions. In this paper, we propose XAI-SOH-FL, an enhanced framework that integrates adaptive aggregation and explainable artificial intelligence into the SOH-FL paradigm. First, we introduce a dynamic {\gamma} selection mechanism based on similarity thresholding, enabling the aggregation process to adapt to evolving data distributions. Second, Bayesian Optimization is employed to automatically determine optimal {\gamma} values, eliminating the need for manual tuning. Third, SHAP (SHapley Additive exPlanations) is incorporated to provide feature-level interpretability for intrusion detection decisions. Experimental evaluation on the CICIDS2017 dataset demonstrates that the proposed approach achieves an accuracy of 94.12% and an F1-score of 0.92, outperforming the baseline SOH-FL model while converging in fewer communication rounds. Furthermore, SHAP-based analysis reveals that flow-level features such as Flow Duration and Packet Length significantly influence model predictions. These results indicate that XAI-SOH-FL provides an effective balance between accuracy, adaptability, and interpretability in heterogeneous IoT environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

XAI-SOH-FL layers Bayesian-tuned adaptive gamma and SHAP onto the existing SOH-FL baseline, but the abstract supplies almost no implementation or validation details for the claimed gains.


The core move here is taking SOH-FL and replacing its fixed gamma with a similarity-thresholding step plus Bayesian optimization for automatic selection, then tacking on SHAP explanations. On CICIDS2017 it reports 94.12% accuracy, 0.92 F1, and faster convergence than the baseline. That combination is new relative to the prior SOH-FL paper, and the authors correctly identify the manual-tuning and black-box problems as real pain points in federated IoT IDS.

The execution looks thin. The abstract never says what the similarity metric is, how Bayesian optimization runs inside the federated loop without central data, or whether the optimization uses a held-out validation set or the evaluation data itself. No ablation isolates the adaptive-gamma contribution, no error bars or multiple runs appear, and no sensitivity checks across heterogeneity levels are mentioned. Those omissions make the stress-test concern about possible selection bias or unstable gamma values hard to dismiss.

The work is aimed at people already using or extending SOH-FL for heterogeneous IoT intrusion detection. A reader outside that narrow track will not find a new framework or first-principles result. The thinking is straightforward and cites the right prior work, so it clears the bar for serious refereeing even though the current evidence is preliminary.

I would send it to review with explicit requests for the missing implementation details, ablations, and robustness checks. It is not ready as-is, but the direction is coherent enough to be worth referee time.

Referee Report

3 major / 1 minor

Summary. The paper proposes XAI-SOH-FL as an extension of SOH-FL for federated intrusion detection in heterogeneous IoT settings. It adds a dynamic γ aggregation parameter selected via similarity thresholding and Bayesian optimization (to remove manual tuning), plus SHAP for feature-level explainability. On the CICIDS2017 dataset the method is reported to reach 94.12% accuracy and 0.92 F1-score while converging in fewer rounds than the SOH-FL baseline; SHAP analysis highlights flow duration and packet length as influential features.

Significance. If the adaptive-γ mechanism can be shown to improve performance without selection bias or instability across heterogeneity levels, the combination of automated aggregation and built-in interpretability would be a useful increment for privacy-preserving IDS in IoT. The current manuscript, however, supplies insufficient experimental detail to establish that the reported gains are robust or general.

major comments (3)

[Abstract] Abstract / Experimental Evaluation: the headline performance numbers (94.12% accuracy, 0.92 F1, faster convergence) are stated without any description of train/test splits, statistical tests, variance across runs, full baseline tables, or ablation isolating the contribution of the adaptive-γ component.
[Adaptive aggregation] Adaptive aggregation section: Bayesian optimization of γ is presented as eliminating manual tuning, yet the description gives no indication that the optimization is performed without access to the evaluation dataset; if γ is tuned on held-out test data the reported improvements reduce to post-hoc fitting rather than a property of the method.
[Similarity thresholding] Similarity thresholding mechanism: the central claim that the mechanism reliably adapts to heterogeneous IoT distributions rests on an unspecified similarity metric and an unspecified way of running Bayesian optimization inside the federated loop; without these details or sensitivity results across heterogeneity levels the weakest assumption cannot be evaluated.

minor comments (1)

[Abstract] Notation: the abstract uses LaTeX-style braces around γ; consistent mathematical typesetting should be used throughout.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater experimental transparency and methodological detail. We address each major comment below and will revise the manuscript accordingly.


Referee: [Abstract] Abstract / Experimental Evaluation: the headline performance numbers (94.12% accuracy, 0.92 F1, faster convergence) are stated without any description of train/test splits, statistical tests, variance across runs, full baseline tables, or ablation isolating the contribution of the adaptive-γ component.

Authors: We agree that the current presentation lacks sufficient experimental detail. In the revised manuscript we will expand the evaluation section to report the train/test split protocol, results of statistical significance tests, standard deviation across multiple runs, complete baseline tables, and an ablation isolating the adaptive-γ component.
revision: yes

Referee: [Adaptive aggregation] Adaptive aggregation section: Bayesian optimization of γ is presented as eliminating manual tuning, yet the description gives no indication that the optimization is performed without access to the evaluation dataset; if γ is tuned on held-out test data the reported improvements reduce to post-hoc fitting rather than a property of the method.

Authors: The manuscript does not currently specify the data partition used for Bayesian optimization of γ. We will revise the section to state explicitly that optimization occurs on a validation subset drawn from the training clients only, with no access to the held-out test set, thereby avoiding post-hoc fitting.
revision: yes

Referee: [Similarity thresholding] Similarity thresholding mechanism: the central claim that the mechanism reliably adapts to heterogeneous IoT distributions rests on an unspecified similarity metric and an unspecified way of running Bayesian optimization inside the federated loop; without these details or sensitivity results across heterogeneity levels the weakest assumption cannot be evaluated.

Authors: We will add the precise similarity metric, pseudocode describing the integration of Bayesian optimization inside the federated rounds, and sensitivity results across multiple heterogeneity levels to allow evaluation of the adaptability claim.
revision: yes

Circularity Check

1 step flagged

Bayesian optimization of γ on CICIDS2017 reduces accuracy/F1 claims to fitted quantities by construction

specific steps

fitted input called prediction [Abstract]
"Bayesian Optimization is employed to automatically determine optimal γ values, eliminating the need for manual tuning. [...] Experimental evaluation on the CICIDS2017 dataset demonstrates that the proposed approach achieves an accuracy of 94.12% and an F1-score of 0.92, outperforming the baseline SOH-FL model while converging in fewer communication rounds."

The abstract does not state which data partition Bayesian optimization uses to select γ. If the selection runs on the same CICIDS2017 data used for evaluation, the accuracy, F1, and convergence numbers are the direct result of that fitting step rather than an independent prediction or derivation of the adaptive mechanism.

full rationale

The paper's central empirical claim (94.12% accuracy, 0.92 F1, faster convergence) depends on γ values tuned via Bayesian optimization, and the abstract does not separate the tuning data from the CICIDS2017 data used for final evaluation. If the two coincide, this matches the fitted-input-called-prediction pattern exactly: the reported gains are the direct output of the optimization step rather than an independent test of the mechanism. No ablation isolating the contribution of adaptive γ, no description of how Bayesian optimization runs without central data access, and no external validation are provided, so the performance numbers cannot yet be distinguished from the fit itself.
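
One way to keep γ tuning from leaking into the reported numbers is to freeze γ on a validation split before the test split is scored. The sketch below illustrates that protocol only. A plain search over a candidate list stands in for Bayesian optimization, the scoring callback `score_fn(gamma, train, eval_split)` is a caller-supplied placeholder, and the function names and split fractions are assumptions, not details from the paper.

```python
import random

def split_three_way(samples, seed=0, val_frac=0.2, test_frac=0.2):
    """Shuffle once, then cut disjoint train / validation / test splits."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def tune_gamma(candidates, score_fn, train, val):
    """Pick gamma using validation scores only (search stands in for BO)."""
    return max(candidates, key=lambda g: score_fn(g, train, val))

def run_protocol(samples, candidates, score_fn, seed=0):
    """Tune on validation, then score the frozen gamma on test exactly once."""
    train, val, test = split_three_way(samples, seed)
    gamma = tune_gamma(candidates, score_fn, train, val)
    return gamma, score_fn(gamma, train, test)
```

In a federated setting the validation split would be drawn from the training clients, as the simulated rebuttal proposes; the single-list split here is a simplification.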

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review performed on abstract only; full details on parameters, assumptions, and any additional entities unavailable.

free parameters (1)

gamma
Aggregation weighting parameter originally manually tuned in SOH-FL; now selected dynamically via Bayesian optimization on similarity thresholds.

axioms (1)

domain assumption: SOH-FL framework assumptions on handling data heterogeneity in federated IoT settings hold without modification.
The proposal builds directly on the SOH-FL paradigm without re-deriving or validating its core premises.



Reference graph

Works this paper leans on

20 extracted references · 2 canonical work pages · 1 internal anchor

[1] K. Amarasinghe et al. Interpretable ids using lime and shap. Future Generation Computer Systems, 2021.
[2] S. R. Arshad and M. K. Shahzad. Deep learning based fabric defect detection. Research Reports on Computer Science, pages 1–11, 2024.
[3] M. F. Ashfaq, M. Malik, U. Fatima, and M. K. Shahzad. Classification of iot based ddos attack using machine learning techniques. In Proceedings of the 16th International Conference on Ubiquitous Information Management and Communication (IMCOM), pages 1–6. IEEE, 2022.
[4] A. L. Buczak and E. Guven. A survey of data mining and machine learning methods for cybersecurity intrusion detection. IEEE Communications Surveys & Tutorials, 2016.
[5] A. Fallah, A. Mokhtari, and A. E. Ozdaglar. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 3557–3568, 2020.
[6] J. Hu et al. Federated meta-learning for apt detection in resource-constrained environments. 2024.
[7] P. Kairouz et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 2021.
[8] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith. Federated optimization in heterogeneous networks (fedprox). arXiv preprint arXiv:1812.06127, 2019.
[9] X. Li et al. Deepfed: A deep federated learning framework for intrusion detection. IEEE IoT Journal, 2021.
[10] K. Lu et al. Soh-fl: Self-organizing heterogeneous federated learning for iot intrusion detection. 2025.
[11] S. M. Lundberg and S. I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[12] D. Marino et al. Explainable intrusion detection using shap values. IEEE Access, 2020.
[13] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of AISTATS, 2017.
[14] Nokia. Threat intelligence report. Technical report, 2023.
[15] J. Snoek, H. Larochelle, and R. P. Adams. Practical bayesian optimization of machine learning algorithms. In NeurIPS, 2012.
[16] Statista. Internet of things (iot) market size worldwide 2017–2032, 2024.
[17] Y. Wang et al. Federated deep learning for anomaly detection in iot networks. IEEE Access, 2023.
[18] Z. Yang et al. Group-level meta-learning for federated learning with non-iid data. IEEE Transactions on Neural Networks and Learning Systems, 2023.
[19] M. Zeeshan, Q. Riaz, M. A. Bilal, M. K. Shahzad, H. Jabeen, S. A. Haider, and A. Rahim. Protocol-based deep intrusion detection for dos and ddos attacks using unsw-nb15 and bot-iot data-sets. IEEE Access, 10:2269–2283, 2021.
[20] H. Zhao et al. Federated learning with non-iid data. arXiv preprint arXiv:1806.00582, 2018.
