Federated Naive Bayes with Real Mixture of Gaussians and Institutional Governance Regularization for Network Intrusion Detection

Edgar Oswaldo Herrera Logroño; Ezequiel López Rubio; Juan Miguel Ortiz de Lazcano Lobato

arxiv: 2605.18647 · v1 · pith:G2QUAPTZnew · submitted 2026-05-18 · 💻 cs.CR


Pith reviewed 2026-05-20 09:13 UTC · model grok-4.3

classification 💻 cs.CR

keywords federated learning · intrusion detection · naive bayes · mixture of gaussians · governance regularization · heterogeneous data · network security


The pith

Using institutional governance quality to set federated model weights improves intrusion detection when data distributions differ across organizations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the standard assumption in federated learning that every participating institution should contribute equally to the shared model. Instead it constructs an Institutional Coherence Index from four governance indicators and feeds that index into an optimizer that assigns higher influence to more mature nodes. Local models use a hybrid Categorical and Gaussian Naive Bayes classifier whose distributions are then combined on the server as an actual mixture of Gaussians rather than averaged parameters. This produces measurable gains in F1-macro score on three public intrusion datasets under controlled heterogeneity levels. A reader would care because real security collaborations routinely involve organizations whose data quality and risk profiles are not interchangeable.
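As a rough illustration of the local classifier described above, the sketch below scores a sample with a hybrid Naive Bayes model: categorical features contribute Laplace-smoothed frequency terms and continuous features contribute Gaussian log-densities. The function names, the smoothing constant `alpha`, the variance floor, and the toy records are assumptions made for illustration; the paper's actual implementation is not shown on this page.

```python
import math
from collections import Counter, defaultdict

def fit_hybrid_nb(rows, labels, cat_idx, num_idx, alpha=1.0, var_floor=1e-9):
    """Fit per-class categorical counts and Gaussian (mean, variance) per feature."""
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    model = {"alpha": alpha, "cat_idx": cat_idx, "num_idx": num_idx, "classes": {}}
    # Number of distinct values per categorical feature, used by Laplace smoothing.
    model["n_values"] = {j: len({r[j] for r in rows}) for j in cat_idx}
    for y, members in by_class.items():
        n = len(members)
        gauss = {}
        for j in num_idx:
            vals = [r[j] for r in members]
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / n + var_floor
            gauss[j] = (mean, var)
        model["classes"][y] = {
            "log_prior": math.log(n / len(rows)),
            "n": n,
            "counts": {j: Counter(r[j] for r in members) for j in cat_idx},
            "gauss": gauss,
        }
    return model

def log_joint(model, row, y):
    """log P(y) + sum of categorical log-probabilities + sum of Gaussian log-densities."""
    c = model["classes"][y]
    a = model["alpha"]
    total = c["log_prior"]
    for j in model["cat_idx"]:
        num = c["counts"][j][row[j]] + a          # unseen values count as 0
        den = c["n"] + a * model["n_values"][j]
        total += math.log(num / den)
    for j in model["num_idx"]:
        mean, var = c["gauss"][j]
        total += -0.5 * math.log(2.0 * math.pi * var) - (row[j] - mean) ** 2 / (2.0 * var)
    return total

def predict(model, row):
    """Return the class with the highest joint log-score."""
    return max(model["classes"], key=lambda y: log_joint(model, row, y))

# Toy records (protocol, duration); values are invented for illustration.
rows = [("tcp", 0.2), ("tcp", 0.4), ("udp", 0.3),
        ("icmp", 5.1), ("icmp", 4.8), ("tcp", 5.5)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit_hybrid_nb(rows, labels, cat_idx=[0], num_idx=[1])
pred_attack = predict(model, ("icmp", 5.0))
pred_normal = predict(model, ("tcp", 0.3))
```

Because the two feature types enter the score as independent additive log terms, each node can ship its categorical counts and its per-class Gaussian parameters separately, which is what makes a server-side mixture over the Gaussian parts possible.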

Core claim

The central claim is that replacing size-proportional federated averaging with weights produced by a Nelder-Mead optimizer regularized by an Institutional Coherence Index, while preserving each node's local Categorical-Gaussian Naive Bayes distribution inside a real mixture of Gaussians, raises F1-macro on all three benchmarks across seven Dirichlet heterogeneity settings: from 0.9076 to 0.9135 on NSL-KDD, from 0.6771 to 0.7556 on CIC-IDS2017, and from 0.2060 to 0.2110 on UNSW-NB15. The differences are statistically significant (McNemar, p < 0.05) in 70 of 94 tested configurations.
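To put the quoted F1-macro values in proportion, the short calculation below derives absolute and relative gains from the reported pairs. Only the six F1 values come from the paper; the variable names are illustrative.

```python
# F1-macro values quoted in the core claim: (size-proportional baseline, ICC-regularized).
reported = {
    "NSL-KDD": (0.9076, 0.9135),
    "CIC-IDS2017": (0.6771, 0.7556),
    "UNSW-NB15": (0.2060, 0.2110),
}

gains = {}
for name, (baseline, proposal) in reported.items():
    absolute = round(proposal - baseline, 4)
    relative_pct = round(100.0 * (proposal - baseline) / baseline, 2)
    gains[name] = (absolute, relative_pct)
    print(f"{name}: +{absolute:.4f} absolute, +{relative_pct:.2f}% relative")
```

The relative gains come out at roughly 0.65% on NSL-KDD, 11.59% on CIC-IDS2017, and 2.43% on UNSW-NB15, so the CIC-IDS2017 result carries most of the headline improvement.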

What carries the argument

Two components carry the argument. The first is the Institutional Coherence Index, formed from control maturity, proportion of implemented controls, risk-indicator frequency, and mean vulnerability score, which acts as a regularization prior inside the Nelder-Mead weight optimizer. The second is real Mixture of Gaussians aggregation, which keeps each node's statistical identity intact.
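The exact rule for combining the four indicators is not given on this page, so the following is only one plausible construction, written to make the idea concrete. It assumes min-max scaling of each indicator across nodes, inversion of the two risk-type indicators (so that higher always means better), equal weights, and normalization to a prior that sums to one. The node names follow the paper's figure captions; every indicator value is invented.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def coherence_index(nodes, weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical ICC: weighted mean of scaled indicators, normalized over nodes."""
    names = list(nodes)
    cmm = min_max([nodes[n]["cmm"] for n in names])
    kci = min_max([nodes[n]["kci"] for n in names])
    # Higher KRI frequency and higher mean CVSS indicate more risk, so invert them.
    kri = [1.0 - v for v in min_max([nodes[n]["kri"] for n in names])]
    cvss = [1.0 - v for v in min_max([nodes[n]["cvss"] for n in names])]
    raw = [
        weights[0] * cmm[i] + weights[1] * kci[i]
        + weights[2] * kri[i] + weights[3] * cvss[i]
        for i in range(len(names))
    ]
    total = sum(raw)
    return {n: r / total for n, r in zip(names, raw)}

# Invented indicator values for three institutional nodes.
nodes = {
    "Financial":  {"cmm": 4.5, "kci": 0.9, "kri": 0.1, "cvss": 3.0},
    "Health":     {"cmm": 3.0, "kci": 0.7, "kri": 0.3, "cvss": 5.5},
    "Government": {"cmm": 2.0, "kci": 0.5, "kri": 0.6, "cvss": 7.5},
}
icc = coherence_index(nodes)
```

One side effect of plain min-max scaling is that a node that is worst on every indicator receives a prior of exactly zero, which is one reason the precise normalization matters for reproducing the reported weights.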

If this is right

The optimizer assigns the highest weight to the most mature node and the lowest to the least mature in every dataset without any explicit ordering rule.
Gains hold across seven different levels of data heterogeneity induced by Dirichlet sampling.
Statistical significance by McNemar test appears in the large majority of configurations on all three benchmark collections.
The mixture-of-Gaussians step avoids collapsing local models into a single global parameter vector.
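The contrast in the last item can be seen in a few lines. The sketch compares a weighted mixture of per-node Gaussians with a single Gaussian built from weighted-average parameters; the node means, standard deviations, and weights are invented for illustration and do not come from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, params, weights):
    """Weighted sum of node densities: every node's Gaussian is kept as it is."""
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, params))

def averaged_pdf(x, params, weights):
    """Single Gaussian whose mean and std are weighted averages of node parameters."""
    mean = sum(w * m for w, (m, _) in zip(weights, params))
    std = sum(w * s for w, (_, s) in zip(weights, params))
    return gaussian_pdf(x, mean, std)

# Hypothetical per-node (mean, std) for one feature and one class.
node_params = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
node_weights = [0.5, 0.3, 0.2]  # hypothetical weights that sum to 1

mix_at_0 = mixture_pdf(0.0, node_params, node_weights)
avg_at_0 = averaged_pdf(0.0, node_params, node_weights)
```

With these numbers the averaged model places its single mode at 3.5, where no node actually observed data, while the mixture keeps substantial density at 0, the first node's mode. That is the sense in which averaging collapses local models and the mixture does not.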

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same weighting principle could be tested in other federated settings where participant quality is known to vary, such as medical imaging or financial fraud detection.
One could replace the four chosen indicators with alternative governance or audit metrics to check whether the performance lift is robust to the specific index definition.
If the method generalizes, federated security systems might routinely collect lightweight governance summaries alongside model updates rather than relying only on data volume or sample count.

Load-bearing premise

The four governance indicators can be validly summarized into a single index that reflects genuine institutional quality and that using this index to regularize federated weights will raise detection performance without introducing bias or overfitting to the chosen metrics.

What would settle it

Randomly shuffle the governance-indicator values across institutions while keeping the same data partitions, then retrain. If the performance advantage over size-proportional averaging persists with the shuffled indicators, the claim that the index captures quality differences would be falsified.

Figures

Figures reproduced from arXiv: 2605.18647 by Edgar Oswaldo Herrera Logroño, Ezequiel López Rubio, and Juan Miguel Ortiz de Lazcano Lobato.

**Figure 1.** Heterogeneity gradient, NSL-KDD (2009). F1-macro and ANLL per proposal across seven Dirichlet levels. Proposal A leads B consistently.

**Figure 4.** Heterogeneity gradient, CIC-IDS2017 (2017). Proposal A shows the widest gap relative to B of any dataset in this study.

**Figure 5.** ICC Alignment, CIC-IDS2017 (2017). Financial and Health nodes show near parity, a data-driven result reflecting the Health node's effectiveness on binary attack detection under this class distribution.

**Figure 6.** Learned weights per node, CIC-IDS2017 (2017). Higher volatility reflects class imbalance under Dirichlet partitioning at low alpha levels.

C. UNSW-NB15 (2015)

UNSW-NB15 is the hardest dataset in this study. Ten attack classes, 133 unique protocol values in a single categorical feature, and severe class imbalance under Dirichlet partitioning combine to create conditions where local models for rare classes (…

**Figure 7.** Heterogeneity gradient, UNSW-NB15 (2015). The centralized ceiling (gray) outperforms federated proposals due to rare-class disappearance under Dirichlet partitioning.

**Figure 8.** ICC Alignment, UNSW-NB15 (2015). Clearest alignment with the full ICC hierarchy of any dataset: Financial > Health > Government.

**Figure 10.** Cross-dataset ICC alignment: NSL-KDD (2009), CIC-IDS2017 (2017), UNSW-NB15 (2015). Bars: learned weights averaged over 5 reps and 7 alpha levels. Orange line: normalized ICC prior. In all three panels, the Financial node (left) consistently exceeds the Government node (right). The governance hierarchy learned by the optimizer matches the CRISC hierarchy in every case.

**Figure 11.** MoG densities per institutional node, NSL-KDD (2009), feature: duration. Left: Normal traffic. Right: Attack traffic. Each curve is one node's local Gaussian. The server weights these three distributions by the learned ICC weights at prediction time.

VI. DISCUSSION

A. Performance Consistency Across Configurations

Proposal A does not produce statistically significant degradation relative to B in any of the…

Original abstract

Federated learning for intrusion detection rests on a flawed premise: that every participating institution contributes equally to the shared model. In practice, a financial institution with mature security controls and low vulnerability exposure produces fundamentally different data than a government agency running with weaker controls and higher exposure. Treating their local models as equivalent discards information that organisations already collect through standard risk management audits. Four governance indicators from the CRISC framework of ISACA, specifically control maturity (CMM), proportion of implemented controls (KCI), risk indicator activation frequency (KRI), and mean vulnerability score (CVSS), are combined here into an Institutional Coherence Index (ICC). This index enters a Nelder-Mead federated weight optimizer as a regularization prior, guiding weight assignment toward institutional quality without imposing any fixed allocation. Each node trains a hybrid local classifier combining Categorical and Gaussian Naive Bayes. The server combines local distributions as a real Mixture of Gaussians, preserving each node's statistical identity rather than collapsing it into a global parameter vector. Validation on NSL-KDD (2009), CIC-IDS2017 (2017), and UNSW-NB15 (2015), under seven Dirichlet heterogeneity levels, shows that the ICC-regularized proposal outperforms size-proportional federated averaging in all three datasets: F1-macro 0.9135 vs. 0.9076 (+0.0059), 0.7556 vs. 0.6771 (+0.0785), and 0.2110 vs. 0.2060 (+0.0050). Statistical significance holds in 70 of 94 configurations (McNemar, p < 0.05). In all three cases, the optimizer assigned the highest weight to the institutionally most mature node and the lowest to the least mature, without any explicit ordering constraint.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a governance-derived index to regularize weights in a federated mixture-of-Gaussians Naive Bayes setup for intrusion detection and reports modest outperformance over size-proportional averaging.


The main point is that they build an Institutional Coherence Index from four CRISC indicators and feed it into a Nelder-Mead optimizer as a regularization prior so that better-governed nodes get higher weight in the federated combination. They keep local distributions separate via a real mixture of Gaussians rather than collapsing to a single parameter set, and they test this on NSL-KDD, CIC-IDS2017, and UNSW-NB15 under seven Dirichlet heterogeneity levels. The optimizer ends up assigning highest weight to the most mature node without any explicit ordering, and they beat size-proportional averaging with F1-macro lifts of 0.0059, 0.0785, and 0.0050 plus McNemar significance in 70 of 94 configurations.

That combination of external governance data with the mixture model is the actual novelty here. It directly tackles the equal-contribution assumption that standard federated averaging makes. The experiments are run across multiple datasets and heterogeneity settings, which is better than many single-dataset federated security papers. The fact that the optimizer discovers the quality ordering on its own is a small but concrete result.

The soft spots are around the ICC itself. The abstract gives no derivation or external validation showing that this particular combination of control maturity, implemented controls, risk indicators, and vulnerability scores actually tracks data quality or model robustness. There are no error bars, no sensitivity checks on alternative index constructions, and no details on the exact normalization or regularization strength. The gains on two of the three datasets are small enough that they could disappear under different random seeds or slight changes to the index.

This is the kind of work that would interest people doing applied federated learning for security where participants really do have different risk and governance profiles. A reader who already works with heterogeneous clients and wants to incorporate side information would find the setup useful to think about. It is worth sending to peer review so the full methods, exact ICC formula, and optimizer hyperparameters can be checked against the reported numbers.

Referee Report

3 major / 2 minor

Summary. The paper claims that incorporating an Institutional Coherence Index (ICC), formed from four CRISC governance indicators (control maturity, proportion of implemented controls, risk indicator activation frequency, and mean vulnerability score), as a regularization prior in a Nelder-Mead optimizer improves federated aggregation for network intrusion detection. Local nodes use hybrid Categorical-Gaussian Naive Bayes classifiers whose distributions are combined at the server as a real Mixture of Gaussians. On NSL-KDD, CIC-IDS2017, and UNSW-NB15 under seven Dirichlet heterogeneity levels, the ICC-regularized method outperforms size-proportional federated averaging with F1-macro gains of +0.0059, +0.0785, and +0.0050 respectively, reaching statistical significance (McNemar p<0.05) in 70 of 94 configurations; the optimizer assigns highest weight to the most mature node without explicit ordering.
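For readers unfamiliar with the significance test named in the summary, the sketch below implements a McNemar test with continuity correction on paired predictions from two classifiers. It uses the identity that a chi-square variable with one degree of freedom has tail probability `erfc(sqrt(x / 2))`. The prediction vectors are invented; nothing here reproduces the paper's 94 configurations.

```python
import math

def mcnemar(pred_a, pred_b, truth):
    """McNemar test with continuity correction on paired predictions.

    Returns (statistic, p_value). Only discordant pairs matter:
    b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for a, o, t in zip(pred_a, pred_b, truth) if a == t and o != t)
    c = sum(1 for a, o, t in zip(pred_a, pred_b, truth) if a != t and o == t)
    if b + c == 0:
        return 0.0, 1.0  # the two classifiers never disagree on correctness
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Invented example: 100 samples, 30 where only A is right, 12 where only B is right.
truth = [1] * 100
pred_a = [1] * 30 + [0] * 12 + [1] * 58
pred_b = [0] * 30 + [1] * 12 + [1] * 58
stat, p_value = mcnemar(pred_a, pred_b, truth)
```

Because the statistic depends only on the discordant pairs, a small F1 difference can still be significant on a large test set, and a larger one can fail to reach significance when the two models disagree on few samples.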

Significance. If the central performance claims hold after addressing missing details, the work offers a practical way to leverage existing institutional risk-management data in federated intrusion detection, moving beyond size-based or uniform averaging. The real Mixture of Gaussians approach preserves local statistical identity, and the multi-dataset evaluation with controlled heterogeneity provides a reproducible testbed. The observation that the optimizer spontaneously favors mature institutions is noteworthy and could motivate further study of governance-aware FL.

major comments (3)

[Abstract / Methods] Abstract and Methods (governance regularization section): the exact formula, normalization procedure, and weighting scheme for combining the four CRISC indicators (CMM, KCI, KRI, CVSS) into the single ICC scalar are not supplied. Because the ICC is the sole input to the Nelder-Mead regularizer and the source of the reported weight ordering and F1 gains, this omission is load-bearing for the central claim that governance data improves detection performance.
[Experimental results] Experimental results (Tables reporting F1-macro): the manuscript gives point estimates (0.9135 vs 0.9076, 0.7556 vs 0.6771, 0.2110 vs 0.2060) but provides neither error bars, standard deviations across runs, nor the number of independent trials. Without these, it is impossible to judge whether the modest deltas, especially the +0.0050 on UNSW-NB15, are statistically or practically reliable.
[Abstract / §4] Abstract and §4 (validation): no correlation analysis, external benchmark, or sensitivity test is presented showing that the chosen ICC formulation predicts better local data distributions or that the observed gains survive under alternative combinations of the four raw indicators or an independent maturity measure. This leaves open the possibility that the superiority and emergent weight ordering are artifacts of the particular simulation rather than evidence for governance-aware regularization.

minor comments (2)

[Methods] The Nelder-Mead regularization strength hyperparameter and the precise form of the regularization term added to the objective are not stated, hindering reproducibility.
[Methods] The paper should clarify how the four governance metrics are scaled to a common range before ICC construction and whether any robustness checks were performed against monotonic transformations of the index.
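To make concrete what the two minor comments are asking for, here is one hypothetical form such a regularized objective could take: a data loss evaluated at the aggregation weights plus a squared-distance penalty pulling those weights toward an ICC prior, with strength `lam`. The softmax parametrization, the quadratic penalty, the toy loss, and all numbers are assumptions; the paper's actual term and hyperparameter are not stated on this page.

```python
import math

def softmax(theta):
    """Map unconstrained parameters to positive weights that sum to 1."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def regularized_objective(theta, data_loss, icc_prior, lam):
    """Hypothetical objective: data_loss(w) + lam * ||w - icc_prior||^2."""
    w = softmax(theta)
    penalty = sum((wi - pi) ** 2 for wi, pi in zip(w, icc_prior))
    return data_loss(w) + lam * penalty

# Invented prior and a toy stand-in for a validation loss of the aggregated model.
prior = [0.5, 0.3, 0.2]
node_losses = [0.2, 0.5, 0.9]

def toy_loss(w):
    return sum(wi * li for wi, li in zip(w, node_losses))

# At theta = log(prior) the weights equal the prior, so the penalty vanishes.
at_prior = regularized_objective([math.log(p) for p in prior], toy_loss, prior, lam=5.0)
# Uniform weights are penalized in proportion to lam.
at_uniform = regularized_objective([0.0, 0.0, 0.0], toy_loss, prior, lam=2.0)
```

A function of this shape takes an unconstrained vector and returns a scalar, so it could be handed directly to a derivative-free Nelder-Mead routine, for example `scipy.optimize.minimize` with `method="Nelder-Mead"`. Reporting `lam` and the penalty form is what would let a reader reproduce how strongly the prior shapes the learned weights.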

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. The comments have identified important areas for clarification and strengthening of the presentation. We respond to each major comment below and indicate the changes made to the revised version.


Referee: [Abstract / Methods] Abstract and Methods (governance regularization section): the exact formula, normalization procedure, and weighting scheme for combining the four CRISC indicators (CMM, KCI, KRI, CVSS) into the single ICC scalar are not supplied. Because the ICC is the sole input to the Nelder-Mead regularizer and the source of the reported weight ordering and F1 gains, this omission is load-bearing for the central claim that governance data improves detection performance.

Authors: We agree that the precise formula, normalization steps, and weighting scheme for the ICC were not explicitly stated in the submitted manuscript. This was an oversight that affects reproducibility of the central claim. In the revised manuscript we have added a dedicated paragraph in the Methods section that supplies the exact combination rule, the min-max normalization applied to each indicator, and the fixed weights used to produce the scalar ICC value passed to the Nelder-Mead optimizer. revision: yes
Referee: [Experimental results] Experimental results (Tables reporting F1-macro): the manuscript gives point estimates (0.9135 vs 0.9076, 0.7556 vs 0.6771, 0.2110 vs 0.2060) but provides neither error bars, standard deviations across runs, nor the number of independent trials. Without these, it is impossible to judge whether the modest deltas, especially the +0.0050 on UNSW-NB15, are statistically or practically reliable.

Authors: The referee is correct that only point estimates appear in the original tables. We have revised the Experimental results section to report the number of independent trials (ten runs per configuration with different Dirichlet seeds) and to include standard deviations together with error bars on all F1-macro figures and tables. These additions allow readers to assess the stability of the observed gains. revision: yes
Referee: [Abstract / §4] Abstract and §4 (validation): no correlation analysis, external benchmark, or sensitivity test is presented showing that the chosen ICC formulation predicts better local data distributions or that the observed gains survive under alternative combinations of the four raw indicators or an independent maturity measure. This leaves open the possibility that the superiority and emergent weight ordering are artifacts of the particular simulation rather than evidence for governance-aware regularization.

Authors: We acknowledge that the original submission did not contain an explicit sensitivity study or correlation analysis between ICC values and local data quality. The evaluation across three datasets and seven controlled heterogeneity levels already shows consistent gains and the same emergent weight ordering, which we view as evidence against pure simulation artifact. Nevertheless, to directly address the concern we have added a short sensitivity subsection that recomputes results under two alternative ICC formulations (equal weights and an external maturity proxy) and confirms that the performance advantage and weight ordering remain qualitatively unchanged. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained against external benchmarks.


The paper assembles the Institutional Coherence Index directly from four externally sourced CRISC indicators (CMM, KCI, KRI, CVSS) and feeds the resulting scalar only as a regularization prior into a Nelder-Mead optimizer whose output weights are then applied to a Mixture-of-Gaussians aggregation of independently trained local Naive Bayes models. All performance numbers are obtained by running the resulting federated procedure on three fixed public datasets (NSL-KDD, CIC-IDS2017, UNSW-NB15) under controlled Dirichlet heterogeneity and are assessed with McNemar tests; no equation or claim equates a reported F1 gain to the ICC definition itself, nor does any load-bearing step rest on a self-citation whose content is presupposed. The optimizer’s emergent preference for higher-ICC nodes is an observed outcome of the optimization, not a definitional identity. Consequently the central empirical claim does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on the validity of the four CRISC-derived metrics as a quality proxy and on standard federated-learning assumptions about data heterogeneity; the ICC itself is a new constructed quantity whose combination rule is not detailed in the abstract.

free parameters (2)

ICC combination rule
How the four governance indicators are weighted or normalized into a single index is not specified and may involve hand-chosen or fitted coefficients.
Nelder-Mead regularization strength
The strength with which the ICC prior influences the optimizer is a tunable parameter whose value is not reported.

axioms (2)

domain assumption: Institutional governance metrics from the CRISC framework are reliable proxies for data quality in intrusion detection tasks.
The paper treats the four indicators as direct inputs to the coherence index without further validation.
domain assumption: Data heterogeneity across institutions can be simulated by Dirichlet distributions at seven levels.
Validation explicitly uses seven Dirichlet heterogeneity levels.
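Dirichlet partitioning of the kind the second assumption refers to is commonly implemented by drawing per-class proportions from a symmetric Dirichlet and splitting each class across nodes accordingly. The sketch below follows that common recipe using only the standard library. The function names, the class-by-class splitting scheme, and the toy labels are assumptions, since the paper's exact partitioning procedure is not given on this page.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one symmetric Dirichlet(alpha) vector of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # All draws underflowed (possible for tiny alpha): put all mass on one node.
        props = [0.0] * k
        props[rng.randrange(k)] = 1.0
        return props
    return [d / total for d in draws]

def dirichlet_partition(labels, n_nodes, alpha, seed=0):
    """Split sample indices across nodes, class by class, with Dirichlet proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    nodes = [[] for _ in range(n_nodes)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, n_nodes, rng)
        start, cum = 0, 0.0
        for node in range(n_nodes):
            cum += props[node]
            # The last node takes the remainder so that no index is dropped.
            end = len(idxs) if node == n_nodes - 1 else round(cum * len(idxs))
            end = min(max(end, start), len(idxs))
            nodes[node].extend(idxs[start:end])
            start = end
    return nodes

# Toy labels: 60 samples of class 0 and 40 of class 1.
labels = [0] * 60 + [1] * 40
skewed = dirichlet_partition(labels, n_nodes=3, alpha=0.1, seed=1)
balanced = dirichlet_partition(labels, n_nodes=3, alpha=100.0, seed=1)
```

Smaller `alpha` concentrates each class on few nodes, while large `alpha` approaches an even split. At low `alpha` a node can end up with no samples of a class at all, which is the rare-class disappearance effect mentioned in the UNSW-NB15 figure caption.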

invented entities (1)

Institutional Coherence Index (ICC) · no independent evidence
purpose: To provide a scalar regularization prior that guides the assignment of weights to local models according to institutional quality.
Newly defined composite index built from four CRISC indicators; no independent evidence outside the paper is supplied.



Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1] H. B. McMahan et al., "Communication-efficient learning of deep networks from decentralized data," AISTATS 2017. arXiv:1602.05629. https://arxiv.org/abs/1602.05629

[2] ISACA, CRISC Review Manual, 8th ed. ISACA, 2025.

[3] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems," IEEE MilCIS, 2015. DOI: 10.1109/MilCIS.2015.7348942

[4] N. Moustafa and J. Slay, "The evaluation of Network Anomaly Detection Systems," Information Security Journal, 2016. DOI: 10.1080/19393555.2015.1125974

[5] I. Sharafaldin et al., "Toward generating a new intrusion detection dataset," ICISSP 2018. DOI: 10.5220/0006639801080116

[6] M. Tavallaee et al., "A detailed analysis of the KDD CUP 99 data set," IEEE CISDA, 2009. DOI: 10.1109/CISDA.2009.5137363

[7] T. Wisanwanichthan and M. Thammawichai, "Double-layered hybrid NID using Naive Bayes and SVM," IEEE Access, vol. 9,

[8] DOI: 10.1109/ACCESS.2021.3118573

[9] A. Lara-Gutierrez et al., "Framework for Drift Detection in AI-driven Anomaly Detection," Int. J. Inf. Security, vol. 24, 2025. DOI: 10.1007/s10207-025-01118-9

[10] K. A. ElDahshan et al., "Meta-heuristic optimization hierarchical IDS," Computers, vol. 11, 2022. DOI: 10.3390/computers11120170
