arxiv: 2605.11571 · v1 · submitted 2026-05-12 · 💻 cs.LG

FedOUI: OUI-Guided Client Weighting for Federated Aggregation

Alberto Fernández-Hernández, Jose I. Mestre, Cristian Pérez-Corral, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

Pith reviewed 2026-05-13 01:49 UTC · model grok-4.3

keywords: federated learning, client weighting, overfitting-underfitting indicator, non-IID data, aggregation rule, activation metrics, CIFAR-10, heterogeneity

The pith

OUI-based weighting downweights clients with atypical internal activations to improve federated aggregation under strong data heterogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FedOUI, an aggregation rule that has each client compute and send an Overfitting-Underfitting Indicator (OUI) value derived from its activations on a fixed probe batch. The server estimates the round-wise OUI distribution and applies smooth reweighting to reduce the influence of structurally atypical clients. On CIFAR-10 with strong non-IID partitions and noisy clients, this yields clearer gains than FedAvg, FedProx, or gradient-alignment baselines while adding little overhead and preserving interpretability. A reader would care because conventional aggregation relies only on dataset size or gradient geometry and misses signals about how each local model organizes its input space.

Core claim

FedOUI is a simple aggregation rule based on the Overfitting-Underfitting Indicator (OUI), an activation-based and label-free metric. Each participating client sends its local update together with an OUI value computed on a fixed probe batch, and the server estimates the round-wise OUI distribution to assign lower weights to structurally atypical clients through a smooth reweighting rule, improving aggregation quality under strong heterogeneity on CIFAR-10.
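
The server-side step described above can be sketched as follows, using the formulas quoted further down this page in the Lean-theorem passage: a typicality score s = 2 min(F(o), 1 - F(o)) under a fitted Beta CDF F, and a weight proportional to n_k (ε + s). This is a minimal sketch under stated assumptions, not the authors' implementation. The method-of-moments Beta fit, the midpoint-rule CDF, the default `eps=0.1`, and all function names are illustrative choices that the source does not confirm.

```python
import math
from statistics import fmean, pvariance


def fit_beta_moments(values):
    """Method-of-moments Beta fit (assumed estimator; the source does not name one)."""
    m = fmean(values)
    v = pvariance(values, mu=m)
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("need values in (0, 1) with 0 < variance < m(1 - m)")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def beta_cdf(x, a, b, steps=20000):
    """Beta CDF via midpoint-rule integration of the density (stdlib only)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(
            log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log1p(-t)
        )
    return min(1.0, total * h)


def fedoui_weights(oui_values, client_sizes, eps=0.1):
    """Normalized weights w_k proportional to n_k * (eps + s_k).

    s_k = 2 * min(F(o_k), 1 - F(o_k)) is close to 1 near the median of the
    fitted round-wise distribution and close to 0 in either tail.
    eps is a hypothetical default; the source gives no value.
    """
    a, b = fit_beta_moments(oui_values)
    raw = []
    for o, n in zip(oui_values, client_sizes):
        f = beta_cdf(o, a, b)
        s = 2.0 * min(f, 1.0 - f)
        raw.append(n * (eps + s))
    total = sum(raw)
    return [r / total for r in raw]
```

Calling `fedoui_weights([0.40, 0.45, 0.50, 0.55, 0.90], [100] * 5)` gives the smallest weight to the client whose OUI sits far in the upper tail. Because `eps` is positive, no client is ever dropped outright, which is what makes the rule a soft reweighting rather than a filter.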

What carries the argument

The Overfitting-Underfitting Indicator (OUI), an activation-based label-free metric computed on a fixed probe batch that quantifies how a client's model organizes its input space and enables detection of atypical clients for downweighting.
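
To make the metric concrete, here is a hedged sketch of the client-side computation, following the formula quoted further down this page: o = (1/d) Σ u_j / ⌊B/2⌋ with u_j = min(s_j, B - s_j). The quoted passage does not define its symbols, so the reading used here is an assumption: B is the probe-batch size, d is the number of units in one layer, and s_j counts the probe samples on which unit j is active. The function name and the `threshold` parameter are hypothetical.

```python
def oui_from_activations(activations, threshold=0.0):
    """OUI for one layer on a probe batch, under the assumed reading above.

    activations: list of B rows, each holding d activation values.
    A unit counts as active on a sample when its value exceeds threshold.
    """
    batch = len(activations)
    if batch < 2:
        raise ValueError("need at least two probe samples")
    width = len(activations[0])
    half = batch // 2
    total = 0.0
    for j in range(width):
        # s_j: number of probe samples that activate unit j (assumed meaning)
        s_j = sum(1 for row in activations if row[j] > threshold)
        # u_j is largest when the unit splits the batch evenly
        u_j = min(s_j, batch - s_j)
        total += u_j / half
    return total / width
```

Under this reading, a unit that fires on every probe sample (or on none) contributes 0, and a unit that splits the batch in half contributes 1, so the value always lies in [0, 1]. How the paper combines layers is not stated in the material shown here.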

If this is right

Aggregation quality improves under strong non-IID partitioning without extra labels or heavy computation.
The method provides interpretability by linking client weights directly to internal activation patterns.
It remains effective under noisy-client conditions by downweighting outliers in the OUI distribution.
It can be combined with existing federated algorithms as a lightweight add-on.
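
The "lightweight add-on" point can be illustrated with a generic weighted aggregation step in which only the weight vector changes between FedAvg-style and OUI-adjusted aggregation. This is a sketch under assumptions: parameters are flat lists of floats, clients send update deltas, and `aggregate` is a hypothetical helper rather than an API from the paper.

```python
def aggregate(global_params, client_updates, weights):
    """Return global_params plus the weighted mean of client update deltas.

    weights may be raw dataset sizes (FedAvg-style) or OUI-adjusted values;
    they are normalized to sum to 1 before use.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalized = [w / total for w in weights]
    new_params = list(global_params)
    for w, update in zip(normalized, client_updates):
        for i, delta in enumerate(update):
            new_params[i] += w * delta
    return new_params
```

Swapping the weighting rule then requires no change to local training or to the update format, only one extra scalar per client.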

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same probe-batch OUI signal might help detect poisoned or backdoored clients if their activation structure deviates sharply.
Extending OUI-style metrics to other modalities could offer analogous internal-structure signals for aggregation.
The approach might reduce the need for explicit client clustering or personalization by implicitly handling distribution shifts through reweighting.

Load-bearing premise

That an atypical OUI value on a fixed probe batch reliably identifies clients whose updates will harm the global model, so that downweighting them improves performance without discarding useful diversity.

What would settle it

Running the same strong non-IID CIFAR-10 partitions and observing that OUI-weighted aggregation produces equal or lower test accuracy than unweighted FedAvg would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.11571 by Alberto Fernández-Hernández, Cristian Pérez-Corral, Enrique S. Quintana-Ortí, Jose Duato, Jose I. Mestre, Manuel F. Dolz.

**Figure 1.** Representative round-wise OUI distribution in the strong non-IID setting, together with the fitted Beta density. The fit is stable and defines a meaningful central region for structural weighting.

Original abstract

Federated learning usually aggregates client updates using dataset size or gradient-level criteria, while overlooking internal signals about how each client model is organizing its input space during training. We introduce FedOUI, a simple aggregation rule based on the Overfitting-Underfitting Indicator (OUI), an activation-based and label-free metric. Each participating client sends its local update together with an OUI value computed on a fixed probe batch, and the server estimates the round-wise OUI distribution to assign lower weights to structurally atypical clients through a smooth reweighting rule. We evaluate FedOUI on CIFAR-10 under strong non-IID partitioning and noisy-client conditions, comparing it with FedAvg, FedProx, and a gradient-alignment baseline. The clearest gains appear under strong heterogeneity, where OUI-based weighting improves aggregation quality while remaining lightweight and interpretable. These results show that internal activation structure can provide useful information for federated aggregation beyond client size and gradient geometry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

FedOUI introduces a simple activation-based OUI metric for client reweighting in FL that shows gains on heterogeneous CIFAR-10, but the signal may track data shift more than update quality.

The paper's core idea is to compute an Overfitting-Underfitting Indicator from activations on a fixed probe batch, then use the round-wise distribution of those values to smoothly downweight atypical clients during aggregation. This is positioned as a lightweight addition to FedAvg-style methods that avoids labels and heavy gradient comparisons. It reports the clearest improvements under strong non-IID partitions and noisy clients on CIFAR-10, while staying interpretable and low-overhead. That is the main thing to take away: a practical heuristic that tries to use internal model structure for weighting where size or gradient alignment falls short.

The approach is new in its specific metric and its distribution-aware rule, and the authors give credit to the baselines they compare against. The implementation looks straightforward enough that someone could reproduce the weighting step without much trouble. The evaluation stays focused on the heterogeneity regime where the method is meant to help, which is reasonable.

The soft spot is the causal link the method assumes. An atypical OUI on a shared probe batch can arise simply because a client's local data distribution differs from the others, not because its update is net harmful to the global model. Downweighting on that basis risks reducing useful diversity from tail distributions rather than filtering bad updates. The abstract gives no quantitative numbers, error bars, or ablation on the probe batch choice, so it is difficult to judge how much this confound affects the reported gains. If the full paper includes controls that separate distribution shift from update harm, that would strengthen the case; otherwise the improvement remains an empirical observation without a clear mechanism.

This work is aimed at practitioners building federated systems that must handle real-world client heterogeneity without adding much communication or compute. A reader already familiar with FedProx or gradient-alignment baselines would see the incremental value quickly. It is solid enough on the idea and the problem relevance to warrant a serious referee, even though the current evidence is mostly empirical and the central assumption needs tighter validation. I would send it to review with a request for the exact reweighting formula, statistical details, and a check on whether the OUI signal is robust to probe batch choice.

Referee Report

2 major / 2 minor

Summary. The paper introduces FedOUI, a federated learning aggregation rule that augments standard methods by incorporating an Overfitting-Underfitting Indicator (OUI), an activation-based and label-free metric computed by each client on a fixed probe batch. Clients transmit their local update together with the OUI value; the server estimates the round-wise OUI distribution and applies a smooth reweighting function that downweights structurally atypical clients. Experiments compare FedOUI against FedAvg, FedProx, and a gradient-alignment baseline on CIFAR-10 under strong non-IID partitioning and noisy-client scenarios, with the largest reported improvements occurring under high heterogeneity.

Significance. If the OUI signal can be shown to isolate updates that are net harmful rather than merely distributionally atypical, the method supplies a lightweight, interpretable auxiliary signal for client weighting that operates without extra communication beyond a scalar per client. This could complement existing size- and gradient-based heuristics in heterogeneous federated settings and encourage further exploration of internal activation statistics for aggregation decisions.

major comments (2)

[§3.2] §3.2 (OUI definition and probe-batch protocol): The central claim that atypical OUI values reliably identify clients whose updates degrade global performance rests on an untested causal link. In strong non-IID CIFAR-10 partitions, activation statistics on any fixed probe batch will deviate simply because of covariate shift across clients' local data distributions; the manuscript provides no ablation or controlled experiment that isolates update harmfulness from distributional atypicality, leaving open the possibility that downweighting discards useful tail-class diversity.
[§4] §4 (Experimental evaluation): The abstract and evaluation summary assert “clearest gains appear under strong heterogeneity” yet supply no quantitative numbers, error bars, number of random seeds, or statistical significance tests. Without tables reporting mean test accuracy ± std, ablation on probe-batch size/composition, or direct comparison of OUI-based weights versus random or size-based weights under identical partitions, the load-bearing claim that OUI weighting improves aggregation quality cannot be assessed for robustness or reproducibility.

minor comments (2)

[§3.3] The reweighting formula (presumably in §3.3) should be stated explicitly with any hyperparameters or distribution-estimation details so that the method is fully reproducible from the text alone.
[§4] Figure captions and axis labels in the experimental section would benefit from explicit mention of the number of clients, participation rate, and exact non-IID partitioning method (e.g., Dirichlet α value) to allow direct replication.
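
For readers unfamiliar with the partitioning scheme the referee mentions as an example, a label-skew split driven by a Dirichlet concentration α can be sketched as below. Whether the paper uses this scheme, and with what α, is not stated in the material shown here; the function name, the seed handling, and the rounding of cut points are assumptions made for illustration.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) shares.

    Smaller alpha gives more skewed (more strongly non-IID) client label mixes.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Normalized Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0
        start, cumulative = 0, 0.0
        for k in range(num_clients):
            cumulative += gammas[k] / total
            # The last client takes the remainder so every index is assigned.
            if k == num_clients - 1:
                end = len(idxs)
            else:
                end = round(cumulative * len(idxs))
            clients[k].extend(idxs[start:end])
            start = end
    return clients
```

Reporting `num_clients`, `alpha`, and the seed alongside each figure would be enough to regenerate a split of this kind exactly.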

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, acknowledging where revisions are needed to strengthen the claims and improve reproducibility.

Referee: [§3.2] §3.2 (OUI definition and probe-batch protocol): The central claim that atypical OUI values reliably identify clients whose updates degrade global performance rests on an untested causal link. In strong non-IID CIFAR-10 partitions, activation statistics on any fixed probe batch will deviate simply because of covariate shift across clients' local data distributions; the manuscript provides no ablation or controlled experiment that isolates update harmfulness from distributional atypicality, leaving open the possibility that downweighting discards useful tail-class diversity.

Authors: We agree that the manuscript does not include a controlled ablation isolating whether atypical OUI values reflect harmful updates versus merely distributionally atypical but potentially useful ones. While our results show performance gains from OUI-based downweighting under strong heterogeneity, this leaves open the possibility raised. We will add a new ablation in the revision: synthetic clients with controlled covariate shifts (via label-preserving augmentations) that produce atypical OUI but non-degrading updates, to test whether OUI weighting preserves or discards such diversity. This will clarify the metric's specificity. revision: yes

Referee: [§4] §4 (Experimental evaluation): The abstract and evaluation summary assert “clearest gains appear under strong heterogeneity” yet supply no quantitative numbers, error bars, number of random seeds, or statistical significance tests. Without tables reporting mean test accuracy ± std, ablation on probe-batch size/composition, or direct comparison of OUI-based weights versus random or size-based weights under identical partitions, the load-bearing claim that OUI weighting improves aggregation quality cannot be assessed for robustness or reproducibility.

Authors: We concur that the current experimental reporting is insufficient for full reproducibility and robustness assessment. The revised manuscript will include: tables of mean test accuracy ± std over 5 random seeds with error bars on figures; statistical significance tests (e.g., t-tests) on improvements; ablations on probe-batch size and composition; and explicit comparisons of OUI weighting against random weighting and size-based weighting on identical partitions. These additions will directly support the heterogeneity gains claim. revision: yes
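
The reporting promised here (mean ± std over seeds plus a significance test) could be computed along the following lines. This is a sketch, not the authors' analysis: the rebuttal says only "t-tests", and Welch's unequal-variance variant is swapped in here as one reasonable choice. The function names are hypothetical and no accuracy values from the paper are used.

```python
import math
from statistics import fmean, stdev


def summarize(per_seed_accuracy):
    """Return (mean, sample std) of one method's per-seed test accuracies."""
    return fmean(per_seed_accuracy), stdev(per_seed_accuracy)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two sets of per-seed scores."""
    mean_a, mean_b = fmean(a), fmean(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df
```

With only 5 seeds per method the test has limited power, so the per-seed values are worth reporting alongside the summary statistics.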

Circularity Check

0 steps flagged

No circularity: empirical heuristic with no self-referential derivations or fitted predictions

The paper presents FedOUI as a lightweight, interpretable aggregation heuristic that computes an activation-based OUI on a fixed probe batch and applies smooth reweighting based on deviation from the round-wise OUI distribution. No equations, uniqueness theorems, or derivation steps are described that reduce the weighting rule to a fitted parameter, self-citation chain, or input by construction. Evaluation is purely empirical (CIFAR-10 non-IID and noisy-client settings vs. FedAvg/FedProx baselines), with gains attributed to observed performance rather than any closed-form identity. This matches the reader's assessment that the method contains no load-bearing predictions or self-definitional steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on the unproven premise that OUI captures client quality independently of gradient geometry and data size; no free parameters are named in the abstract, but the probe batch and reweighting smoothness are implicit choices.

axioms (1)

domain assumption: OUI computed on a fixed probe batch is a stable and informative indicator of structural atypicality
Invoked when the server uses the round-wise OUI distribution to assign weights; no justification or sensitivity analysis is provided in the abstract.

invented entities (1)

Overfitting-Underfitting Indicator (OUI) · no independent evidence
purpose: Activation-based, label-free scalar summarizing how a client model organizes its input space
Newly defined metric whose computation and interpretation are not detailed beyond the abstract description.

pith-pipeline@v0.9.0 · 5495 in / 1408 out tokens · 35656 ms · 2026-05-13T01:49:38.408040+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

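$o_k^t = \frac{1}{d} \sum_j \frac{u_j^{(t)}}{\lfloor B/2 \rfloor}$ with $u_j = \min(s_j, B - s_j)$ … $\mathrm{Beta}(\alpha_t, \beta_t)$ … $s_k^t = 2 \min\big(F_t(o_k^t),\, 1 - F_t(o_k^t)\big)$ … $w_k^t \propto n_k (\varepsilon + s_k^t)$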
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

OUI-based weighting improves aggregation quality … under strong heterogeneity on CIFAR-10

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
