FedReLa: Imbalanced Federated Learning via Re-Labeling
Pith reviewed 2026-06-25 19:21 UTC · model grok-4.3
The pith
FedReLa corrects biased global decision boundaries in federated learning by re-labeling samples with a feature-dependent allocator that uses only local data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By re-labeling samples with a feature-dependent label re-allocator, FedReLa corrects biased global decision boundaries without requiring knowledge of the global class distribution. This modular, model-agnostic approach can be integrated with algorithmic methods to deliver consistent improvements without additional communication overhead. Through extensive experiments, our method significantly improves the accuracy of minority classes and the overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming the previous state of the art.
What carries the argument
The feature-dependent label re-allocator, which re-labels local samples to reduce mismatch between local and global class imbalances.
If this is right
- Minority-class accuracy rises on both stepwise-imbalanced and long-tailed federated datasets.
- Overall model accuracy improves while preserving privacy constraints.
- No extra communication rounds are needed beyond standard federated aggregation.
- The method combines directly with existing algorithmic imbalance techniques.
- Performance holds under extreme client-level class absence.
Where Pith is reading between the lines
- Local feature statistics alone may suffice to infer corrective label shifts in other privacy-sensitive distributed settings.
- The same re-labeling logic could be tested on tabular or sequential data to check whether the feature-dependence assumption generalizes.
- Integration with server-side re-weighting schemes might further reduce the need for client-side data augmentation.
- If the re-allocator proves stable, it could lower the barrier to deploying federated models in domains where class ratios differ sharply across sites.
Load-bearing premise
The feature-dependent label re-allocator can reliably identify and correct the mismatch between local and global class imbalances using only local data and without any global distribution information or extra communication.
What would settle it
A controlled experiment in which the re-labeling step produces no gain in minority-class accuracy or requires access to global class counts to function.
Figures
read the original abstract
Federated learning has emerged as the foremost approach for decentralized model training with privacy preservation. The global class imbalance and cross-client data heterogeneity naturally coexist, and the mismatch between local and global imbalances exacerbates the performance degradation of the aggregated model. The agnosticism of global class distribution poses significant challenges for data-level methods, especially under extreme conditions with severe class absence across clients. In this paper, we propose FedReLa, a novel data-level approach that tackles the coexistence of data heterogeneity and class imbalance in federated learning. By re-labeling samples with a feature-dependent label re-allocator, FedReLa corrects biased global decision boundaries without requiring knowledge of the global class distribution. This modular, model-agnostic approach can be integrated with algorithmic methods to deliver consistent improvements without additional communication overhead. Through extensive experiments, our method significantly improves the accuracy of minority classes and the overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming the previous state of the art.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FedReLa, a data-level approach for federated learning under the joint challenges of data heterogeneity and class imbalance. It introduces a feature-dependent label re-allocator that re-labels local samples to correct biased global decision boundaries without access to global class distribution information or extra communication rounds. The method is presented as modular and model-agnostic, integrable with existing algorithmic techniques. Experiments are claimed to demonstrate consistent gains in minority-class and overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming prior state-of-the-art methods.
Significance. If the central claim holds, the contribution would be significant: it offers a practical, communication-free data-level intervention for a pervasive FL setting where local and global imbalances mismatch, without requiring global statistics that are often unavailable under privacy constraints. The modular design would also allow straightforward combination with existing FL algorithms.
major comments (2)
- [Method description] The central claim—that a purely local, feature-dependent re-allocator can reliably detect and correct mismatches between local and global class distributions—rests on the unverified assumption that sufficient signal exists even under extreme local class absence. The manuscript must demonstrate (via algorithm details, pseudocode, or equations in the method section) how the re-allocator produces labels that systematically improve the aggregated boundary rather than local artifacts when one or more classes are entirely missing from a client.
- [Experiments] The abstract asserts consistent outperformance and improvements on minority classes, yet supplies no equations, baseline tables, or error analysis. The full paper must include quantitative comparisons (with specific baselines, metrics, and statistical tests) and ablation studies isolating the re-allocator's contribution; without these, the empirical support for the claim cannot be evaluated.
minor comments (2)
- [Method] Clarify the exact form of the feature-dependent re-allocator (e.g., is it a learned model, a heuristic, or a clustering step?) and any hyperparameters it introduces.
- [Introduction] The abstract states the approach works 'without requiring knowledge of the global class distribution'; the paper should explicitly contrast this with any implicit assumptions about feature space overlap across clients.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments identify areas where additional clarity and detail will strengthen the presentation. We address each major comment below and commit to revisions that directly respond to the concerns raised.
read point-by-point responses
-
Referee: [Method description] The central claim—that a purely local, feature-dependent re-allocator can reliably detect and correct mismatches between local and global class distributions—rests on the unverified assumption that sufficient signal exists even under extreme local class absence. The manuscript must demonstrate (via algorithm details, pseudocode, or equations in the method section) how the re-allocator produces labels that systematically improve the aggregated boundary rather than local artifacts when one or more classes are entirely missing from a client.
Authors: We agree that explicit demonstration is needed for the extreme case of class absence. The re-allocator computes pairwise feature similarities using the current global model embeddings and reassigns labels to local samples whose features are closer to other present classes; this is guided by the aggregated model rather than purely local statistics. In the revised manuscript we will add pseudocode and a dedicated subsection with equations showing the similarity-based reassignment rule and the mechanism by which local corrections propagate through aggregation to adjust the global decision boundary. This addition will clarify why the approach does not produce mere local artifacts. revision: yes
-
Referee: [Experiments] The abstract asserts consistent outperformance and improvements on minority classes, yet supplies no equations, baseline tables, or error analysis. The full paper must include quantitative comparisons (with specific baselines, metrics, and statistical tests) and ablation studies isolating the re-allocator's contribution; without these, the empirical support for the claim cannot be evaluated.
Authors: The full manuscript already contains tables comparing FedReLa against FedAvg, FedProx, and several imbalance-aware baselines on both stepwise-imbalanced and long-tailed datasets, reporting per-class and overall accuracy. To address the referee's concern we will expand the experimental section with (i) explicit ablation tables isolating the re-allocator, (ii) statistical significance tests (paired t-tests across random seeds), and (iii) error-bar plots. These additions will be placed in the revised version. revision: yes
Circularity Check
No circularity: procedural method with no self-referential derivation
full rationale
The paper presents FedReLa as a modular, model-agnostic data-level method that re-labels samples via a feature-dependent label re-allocator to address local-global imbalance mismatch without global distribution knowledge or extra communication. No equations, parameter-fitting steps, or derivation chains appear in the abstract or described claims. The approach is described procedurally rather than as a quantity derived from itself, with no self-citations invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on empirical integration and experiments rather than reducing to fitted inputs or self-definitional loops, making the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
PMLR, 2021. Frénay, B. and Verleysen, M. Classification in the presence of label noise: a survey.IEEE transactions on neural networks and learning systems, 25(5):845–869, 2013. Goldberger, J. and Ben-Reuven, E. Training deep neural- networks using a noise adaptation layer.International conference on learning representations, 2022. Han, H., Wang, W.-Y ., a...
arXiv 2021
-
[2]
10 FedReLa: Imbalanced Federated Learning via Re-Labeling Li, Q., He, B., and Song, D
URL https://www.cs.toronto.edu/ ~kriz/learning-features-2009-TR.pdf. 10 FedReLa: Imbalanced Federated Learning via Re-Labeling Li, Q., He, B., and Song, D. Model-contrastive federated learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10713– 10722, 2021. Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalka...
-
[3]
For the parameter-aggregated global model: f x, KX k=1 wkθ(k) ! ≈f(x, θ 0) + ∂f(x, θ) ∂θ θ=θ0 KX k=1 wk(θ(k) −θ 0) =f(x, θ 0) + KX k=1 wk ∂f(x, θ) ∂θ θ=θ0 (θ(k) −θ 0)
-
[4]
For the posterior-aggregated global model (noting PK k=1 wk = 1): KX k=1 wkf(x, θ (k))≈f(x, θ 0) + ∂f(x, θ) ∂θ θ=θ0 KX k=1 wkθ(k) −θ 0 ! =f(x, θ 0) + KX k=1 wk ∂f(x, θ) ∂θ θ=θ0 (θ(k) −θ 0) The two expansions are identical, confirming that the parameter-aggregated and posterior-aggregated global models are first-order equivalentwhen local parameters are su...
2000
-
[5]
Local training overhead involvesgradient updates for new parameters or module. For example, methods that introduce new optimizable parameters (e.g., CLIMB, FedLOGE, etc.) require extra per-round local training overhead to update the gradients of these parameters
-
[6]
without extra local training
ONE-TIME Model Inference: FedReLa only performsone-time model inference during a single roundto obtain posterior probabilities, without updating the model or gradients. Therefore, we describe FedReLa as operating "without extra local training." Approximate one-time computation cost of FedReLa.The strength of FedReLa as a data-level method lies in its requ...
-
[7]
FedETF” and “+FedReLa (one-shot)
and FedLOGE (Xiao et al., 2024), are long-tail-oriented approaches, we conducted additional experiments on CIFAR-10 with step-wise imbalance. The results in Table 6 demonstrate that FedReLa still achieves SOTA performance on step-wise imbalance. FedReLa brings significant improvements, especially under higher imbalance ratios and more heterogeneous data. ...
arXiv 2024
-
[8]
Datasets & Sets DGlobal dataset (union of all local datasets) D(k) Local dataset of clientk eD(k) Re-labeled local dataset of clientk(by FedReLa) XFeature space (x∈ X ⊆R d) YLabel space (Y={1,2, ..., C},C: number of classes) Y (k) Original label set of clientk eY (k) Re-labeled label set of clientk I (k) i Index set of samples inD (k) with the same label ...
-
[9]
Model & Parameters θModel parameter vector θglobal t Global model parameter at communication roundt θ(k) Local model parameter of clientk f(θ;x)Global model (maps featurexto posterior probabilities) Trelabel Communication round for FedReLa’s one-shot re-labeling
-
[10]
Probability & Distribution πj Global prior probability of classj(π j = Pr(Y=j)) π(k) j Local prior probability of classjon clientk π[w] j Weighted aggregated prior of classj(server-side) ηj(x)Global posterior probability of classjgivenx η(k) j (x)Local posterior probability of classjgivenxon clientk eη(k) j (x)Posterior probability of classjon eD(k) eη[w]...
-
[11]
FedReLa Core Parameters ρ(k) ℓ→j(x)Re-labeling probability from local majority classℓto local minority classjon clientk ρ(k) j→ℓ(x)Re-labeling probability from classjtoℓon clientk(set to 0) Q(k) Posterior probability matrix ofD (k) (|D(k)| ×C) z(k) i Class-wise z-score vector of samplex (k) i on clientk µ(k) i Class-wise mean of posterior probabilities (f...
-
[12]
Imbalance & Heterogeneity IR(D)Global imbalance ratio (max j πj/min j πj) IF Imbalance factor (for long-tailed datasets) αHeterogeneity control parameter (Latent Dirichlet Sampling) KNumber of clients in the federation wk Aggregation weight of clientk(FedAvg:w k =|D (k)|/|D|)
-
[13]
Decision Boundaries Sj,ℓ Optimal Bayesian decision boundary between classesjandℓ eS(k) Decision boundary of clientkon re-labeled local dataset eD(k) Table 5.Notation Table: Key Symbols and Definitions. 22
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.