FedReLa: Imbalanced Federated Learning via Re-Labeling

Feng Liu; Guanghui Wang; Guangzheng Hu; Liuhua Peng; Mingming Gong; Patricia Men\'endez

arxiv: 2606.26037 · v1 · pith:OQOQCHIBnew · submitted 2026-06-24 · 📊 stat.ML · cs.CV· cs.LG

FedReLa: Imbalanced Federated Learning via Re-Labeling

Guangzheng Hu , Patricia Men\'endez , Feng Liu , Mingming Gong , Guanghui Wang , Liuhua Peng This is my paper

Pith reviewed 2026-06-25 19:21 UTC · model grok-4.3

classification 📊 stat.ML cs.CVcs.LG

keywords federated learningclass imbalancere-labelingdata heterogeneityimbalanced learningprivacy preservationmodel-agnostic method

0 comments

The pith

FedReLa corrects biased global decision boundaries in federated learning by re-labeling samples with a feature-dependent allocator that uses only local data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FedReLa to address the simultaneous problems of data heterogeneity across clients and global class imbalance in federated learning. It does so by re-labeling training samples locally according to a feature-dependent label re-allocator that realigns local labels with the global decision boundary. The method requires no knowledge of the global class distribution and adds no communication cost. A sympathetic reader would care because existing data-level fixes break when the global distribution is unknown and when some classes are entirely missing from individual clients. The approach is presented as modular so it can be combined with existing algorithmic imbalance corrections.

Core claim

By re-labeling samples with a feature-dependent label re-allocator, FedReLa corrects biased global decision boundaries without requiring knowledge of the global class distribution. This modular, model-agnostic approach can be integrated with algorithmic methods to deliver consistent improvements without additional communication overhead. Through extensive experiments, our method significantly improves the accuracy of minority classes and the overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming the previous state of the art.

What carries the argument

The feature-dependent label re-allocator, which re-labels local samples to reduce mismatch between local and global class imbalances.

If this is right

Minority-class accuracy rises on both stepwise-imbalanced and long-tailed federated datasets.
Overall model accuracy improves while preserving privacy constraints.
No extra communication rounds are needed beyond standard federated aggregation.
The method combines directly with existing algorithmic imbalance techniques.
Performance holds under extreme client-level class absence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Local feature statistics alone may suffice to infer corrective label shifts in other privacy-sensitive distributed settings.
The same re-labeling logic could be tested on tabular or sequential data to check whether the feature-dependence assumption generalizes.
Integration with server-side re-weighting schemes might further reduce the need for client-side data augmentation.
If the re-allocator proves stable, it could lower the barrier to deploying federated models in domains where class ratios differ sharply across sites.

Load-bearing premise

The feature-dependent label re-allocator can reliably identify and correct the mismatch between local and global class imbalances using only local data and without any global distribution information or extra communication.

What would settle it

A controlled experiment in which the re-labeling step produces no gain in minority-class accuracy or requires access to global class counts to function.

Figures

Figures reproduced from arXiv: 2606.26037 by Feng Liu, Guanghui Wang, Guangzheng Hu, Liuhua Peng, Mingming Gong, Patricia Men\'endez.

**Figure 1.** Figure 1: FedReLa Framework. At round t = Trelabel, FedReLa re-labels the local dataset with the label re-allocator based on the global model before the local training starts. in parallel. Specifically, before client k starts to train the global model f(θ; x) with parameter θ = θ global t received at round t = Trelabel, FedReLa re-labels its local dataset D(k) = {(x (k) i , y (k) i )} nk i=1 using a client-specific… view at source ↗

**Figure 2.** Figure 2: FedReLa re-labeling examples on CIFAR-10-LT. Top red: original head classes; bottom blue: re-labeled tail classes. “Posterior”: underestimated tail-class posterior P(ytail|x); “Relabel Prob”: assigned re-labeling probabilities. 5. Experiments Datasets: To provide a comprehensive evaluation, we conduct experiments under both step-wise and long-tailed global imbalance settings on Fashion-MNIST (F-MNIST) (X… view at source ↗

**Figure 3.** Figure 3: Sensitive analysis respect to τ , which controls the re-labeling strength. The threshold-tuning capability allows FedReLa to deliver customized class-wise enhancement, prioritizing tail-class gains (τ = 5) while preserving overall performance. This strategic trade-off (suppressing overprivileged head classes to boost 18 [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

read the original abstract

Federated learning has emerged as the foremost approach for decentralized model training with privacy preservation. The global class imbalance and cross-client data heterogeneity naturally coexist, and the mismatch between local and global imbalances exacerbates the performance degradation of the aggregated model. The agnosticism of global class distribution poses significant challenges for data-level methods, especially under extreme conditions with severe class absence across clients. In this paper, we propose FedReLa, a novel data-level approach that tackles the coexistence of data heterogeneity and class imbalance in federated learning. By re-labeling samples with a feature-dependent label re-allocator, FedReLa corrects biased global decision boundaries without requiring knowledge of the global class distribution. This modular, model-agnostic approach can be integrated with algorithmic methods to deliver consistent improvements without additional communication overhead. Through extensive experiments, our method significantly improves the accuracy of minority classes and the overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming the previous state of the art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FedReLa proposes a local feature-based re-labeling step to handle class imbalance in federated learning without global distribution knowledge, but the abstract leaves the reliability of that step under extreme client heterogeneity unclear.

read the letter

The core idea is a data-level fix: a feature-dependent label re-allocator that adjusts sample labels on each client so the aggregated model moves toward better global boundaries. This is presented as new because it avoids any need for global class counts or extra rounds, and it is designed to plug into existing algorithmic methods without added communication.

What stands out as useful is the modularity and the focus on the practical mismatch between local and global imbalance. The claim of consistent gains on stepwise-imbalanced and long-tailed data, plus outperformance of prior work, is the kind of result that would interest people running real federated systems where minority classes matter.

The soft spot is exactly the one the stress-test flags. When a client has no examples of certain classes, its local feature space gives no direct signal about those classes' global prevalence or boundary location. The abstract does not show how the re-allocator still produces corrections that systematically help the global model rather than introducing new local artifacts. Without equations, pseudocode, or ablation on class-absence cases, it is difficult to tell whether the method actually meets its own strongest claim.

This paper is aimed at applied federated-learning practitioners who already deal with imbalance and want a lightweight data-level option. It is not trying to rewrite theory. A serious referee should see it if the full version supplies a clear description of the allocator, proper baseline comparisons, and targeted checks on the missing-class regime; otherwise the central promise stays untested.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FedReLa, a data-level approach for federated learning under the joint challenges of data heterogeneity and class imbalance. It introduces a feature-dependent label re-allocator that re-labels local samples to correct biased global decision boundaries without access to global class distribution information or extra communication rounds. The method is presented as modular and model-agnostic, integrable with existing algorithmic techniques. Experiments are claimed to demonstrate consistent gains in minority-class and overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming prior state-of-the-art methods.

Significance. If the central claim holds, the contribution would be significant: it offers a practical, communication-free data-level intervention for a pervasive FL setting where local and global imbalances mismatch, without requiring global statistics that are often unavailable under privacy constraints. The modular design would also allow straightforward combination with existing FL algorithms.

major comments (2)

[Method description] The central claim—that a purely local, feature-dependent re-allocator can reliably detect and correct mismatches between local and global class distributions—rests on the unverified assumption that sufficient signal exists even under extreme local class absence. The manuscript must demonstrate (via algorithm details, pseudocode, or equations in the method section) how the re-allocator produces labels that systematically improve the aggregated boundary rather than local artifacts when one or more classes are entirely missing from a client.
[Experiments] The abstract asserts consistent outperformance and improvements on minority classes, yet supplies no equations, baseline tables, or error analysis. The full paper must include quantitative comparisons (with specific baselines, metrics, and statistical tests) and ablation studies isolating the re-allocator's contribution; without these, the empirical support for the claim cannot be evaluated.

minor comments (2)

[Method] Clarify the exact form of the feature-dependent re-allocator (e.g., is it a learned model, a heuristic, or a clustering step?) and any hyperparameters it introduces.
[Introduction] The abstract states the approach works 'without requiring knowledge of the global class distribution'; the paper should explicitly contrast this with any implicit assumptions about feature space overlap across clients.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify areas where additional clarity and detail will strengthen the presentation. We address each major comment below and commit to revisions that directly respond to the concerns raised.

read point-by-point responses

Referee: [Method description] The central claim—that a purely local, feature-dependent re-allocator can reliably detect and correct mismatches between local and global class distributions—rests on the unverified assumption that sufficient signal exists even under extreme local class absence. The manuscript must demonstrate (via algorithm details, pseudocode, or equations in the method section) how the re-allocator produces labels that systematically improve the aggregated boundary rather than local artifacts when one or more classes are entirely missing from a client.

Authors: We agree that explicit demonstration is needed for the extreme case of class absence. The re-allocator computes pairwise feature similarities using the current global model embeddings and reassigns labels to local samples whose features are closer to other present classes; this is guided by the aggregated model rather than purely local statistics. In the revised manuscript we will add pseudocode and a dedicated subsection with equations showing the similarity-based reassignment rule and the mechanism by which local corrections propagate through aggregation to adjust the global decision boundary. This addition will clarify why the approach does not produce mere local artifacts. revision: yes
Referee: [Experiments] The abstract asserts consistent outperformance and improvements on minority classes, yet supplies no equations, baseline tables, or error analysis. The full paper must include quantitative comparisons (with specific baselines, metrics, and statistical tests) and ablation studies isolating the re-allocator's contribution; without these, the empirical support for the claim cannot be evaluated.

Authors: The full manuscript already contains tables comparing FedReLa against FedAvg, FedProx, and several imbalance-aware baselines on both stepwise-imbalanced and long-tailed datasets, reporting per-class and overall accuracy. To address the referee's concern we will expand the experimental section with (i) explicit ablation tables isolating the re-allocator, (ii) statistical significance tests (paired t-tests across random seeds), and (iii) error-bar plots. These additions will be placed in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: procedural method with no self-referential derivation

full rationale

The paper presents FedReLa as a modular, model-agnostic data-level method that re-labels samples via a feature-dependent label re-allocator to address local-global imbalance mismatch without global distribution knowledge or extra communication. No equations, parameter-fitting steps, or derivation chains appear in the abstract or described claims. The approach is described procedurally rather than as a quantity derived from itself, with no self-citations invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on empirical integration and experiments rather than reducing to fitted inputs or self-definitional loops, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are identifiable in the provided text.

pith-pipeline@v0.9.1-grok · 5713 in / 1059 out tokens · 25099 ms · 2026-06-25T19:21:11.188034+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 1 canonical work pages

[1]

Frénay, B

PMLR, 2021. Frénay, B. and Verleysen, M. Classification in the presence of label noise: a survey.IEEE transactions on neural networks and learning systems, 25(5):845–869, 2013. Goldberger, J. and Ben-Reuven, E. Training deep neural- networks using a noise adaptation layer.International conference on learning representations, 2022. Han, H., Wang, W.-Y ., a...

arXiv 2021
[2]

10 FedReLa: Imbalanced Federated Learning via Re-Labeling Li, Q., He, B., and Song, D

URL https://www.cs.toronto.edu/ ~kriz/learning-features-2009-TR.pdf. 10 FedReLa: Imbalanced Federated Learning via Re-Labeling Li, Q., He, B., and Song, D. Model-contrastive federated learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10713– 10722, 2021. Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalka...

work page doi:10.1109/icdmw51313.2020.00057 2009
[3]

For the parameter-aggregated global model: f x, KX k=1 wkθ(k) ! ≈f(x, θ 0) + ∂f(x, θ) ∂θ θ=θ0 KX k=1 wk(θ(k) −θ 0) =f(x, θ 0) + KX k=1 wk ∂f(x, θ) ∂θ θ=θ0 (θ(k) −θ 0)
[4]

For the posterior-aggregated global model (noting PK k=1 wk = 1): KX k=1 wkf(x, θ (k))≈f(x, θ 0) + ∂f(x, θ) ∂θ θ=θ0 KX k=1 wkθ(k) −θ 0 ! =f(x, θ 0) + KX k=1 wk ∂f(x, θ) ∂θ θ=θ0 (θ(k) −θ 0) The two expansions are identical, confirming that the parameter-aggregated and posterior-aggregated global models are first-order equivalentwhen local parameters are su...

2000
[5]

Local training overhead involvesgradient updates for new parameters or module. For example, methods that introduce new optimizable parameters (e.g., CLIMB, FedLOGE, etc.) require extra per-round local training overhead to update the gradients of these parameters
[6]

without extra local training

ONE-TIME Model Inference: FedReLa only performsone-time model inference during a single roundto obtain posterior probabilities, without updating the model or gradients. Therefore, we describe FedReLa as operating "without extra local training." Approximate one-time computation cost of FedReLa.The strength of FedReLa as a data-level method lies in its requ...
[7]

FedETF” and “+FedReLa (one-shot)

and FedLOGE (Xiao et al., 2024), are long-tail-oriented approaches, we conducted additional experiments on CIFAR-10 with step-wise imbalance. The results in Table 6 demonstrate that FedReLa still achieves SOTA performance on step-wise imbalance. FedReLa brings significant improvements, especially under higher imbalance ratios and more heterogeneous data. ...

arXiv 2024
[8]

Datasets & Sets DGlobal dataset (union of all local datasets) D(k) Local dataset of clientk eD(k) Re-labeled local dataset of clientk(by FedReLa) XFeature space (x∈ X ⊆R d) YLabel space (Y={1,2, ..., C},C: number of classes) Y (k) Original label set of clientk eY (k) Re-labeled label set of clientk I (k) i Index set of samples inD (k) with the same label ...
[9]

Model & Parameters θModel parameter vector θglobal t Global model parameter at communication roundt θ(k) Local model parameter of clientk f(θ;x)Global model (maps featurexto posterior probabilities) Trelabel Communication round for FedReLa’s one-shot re-labeling
[10]

Probability & Distribution πj Global prior probability of classj(π j = Pr(Y=j)) π(k) j Local prior probability of classjon clientk π[w] j Weighted aggregated prior of classj(server-side) ηj(x)Global posterior probability of classjgivenx η(k) j (x)Local posterior probability of classjgivenxon clientk eη(k) j (x)Posterior probability of classjon eD(k) eη[w]...
[11]

FedReLa Core Parameters ρ(k) ℓ→j(x)Re-labeling probability from local majority classℓto local minority classjon clientk ρ(k) j→ℓ(x)Re-labeling probability from classjtoℓon clientk(set to 0) Q(k) Posterior probability matrix ofD (k) (|D(k)| ×C) z(k) i Class-wise z-score vector of samplex (k) i on clientk µ(k) i Class-wise mean of posterior probabilities (f...
[12]

Imbalance & Heterogeneity IR(D)Global imbalance ratio (max j πj/min j πj) IF Imbalance factor (for long-tailed datasets) αHeterogeneity control parameter (Latent Dirichlet Sampling) KNumber of clients in the federation wk Aggregation weight of clientk(FedAvg:w k =|D (k)|/|D|)
[13]

Decision Boundaries Sj,ℓ Optimal Bayesian decision boundary between classesjandℓ eS(k) Decision boundary of clientkon re-labeled local dataset eD(k) Table 5.Notation Table: Key Symbols and Definitions. 22

[1] [1]

Frénay, B

PMLR, 2021. Frénay, B. and Verleysen, M. Classification in the presence of label noise: a survey.IEEE transactions on neural networks and learning systems, 25(5):845–869, 2013. Goldberger, J. and Ben-Reuven, E. Training deep neural- networks using a noise adaptation layer.International conference on learning representations, 2022. Han, H., Wang, W.-Y ., a...

arXiv 2021

[2] [2]

10 FedReLa: Imbalanced Federated Learning via Re-Labeling Li, Q., He, B., and Song, D

URL https://www.cs.toronto.edu/ ~kriz/learning-features-2009-TR.pdf. 10 FedReLa: Imbalanced Federated Learning via Re-Labeling Li, Q., He, B., and Song, D. Model-contrastive federated learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10713– 10722, 2021. Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalka...

work page doi:10.1109/icdmw51313.2020.00057 2009

[3] [3]

For the parameter-aggregated global model: f x, KX k=1 wkθ(k) ! ≈f(x, θ 0) + ∂f(x, θ) ∂θ θ=θ0 KX k=1 wk(θ(k) −θ 0) =f(x, θ 0) + KX k=1 wk ∂f(x, θ) ∂θ θ=θ0 (θ(k) −θ 0)

[4] [4]

For the posterior-aggregated global model (noting PK k=1 wk = 1): KX k=1 wkf(x, θ (k))≈f(x, θ 0) + ∂f(x, θ) ∂θ θ=θ0 KX k=1 wkθ(k) −θ 0 ! =f(x, θ 0) + KX k=1 wk ∂f(x, θ) ∂θ θ=θ0 (θ(k) −θ 0) The two expansions are identical, confirming that the parameter-aggregated and posterior-aggregated global models are first-order equivalentwhen local parameters are su...

2000

[5] [5]

Local training overhead involvesgradient updates for new parameters or module. For example, methods that introduce new optimizable parameters (e.g., CLIMB, FedLOGE, etc.) require extra per-round local training overhead to update the gradients of these parameters

[6] [6]

without extra local training

ONE-TIME Model Inference: FedReLa only performsone-time model inference during a single roundto obtain posterior probabilities, without updating the model or gradients. Therefore, we describe FedReLa as operating "without extra local training." Approximate one-time computation cost of FedReLa.The strength of FedReLa as a data-level method lies in its requ...

[7] [7]

FedETF” and “+FedReLa (one-shot)

and FedLOGE (Xiao et al., 2024), are long-tail-oriented approaches, we conducted additional experiments on CIFAR-10 with step-wise imbalance. The results in Table 6 demonstrate that FedReLa still achieves SOTA performance on step-wise imbalance. FedReLa brings significant improvements, especially under higher imbalance ratios and more heterogeneous data. ...

arXiv 2024

[8] [8]

Datasets & Sets DGlobal dataset (union of all local datasets) D(k) Local dataset of clientk eD(k) Re-labeled local dataset of clientk(by FedReLa) XFeature space (x∈ X ⊆R d) YLabel space (Y={1,2, ..., C},C: number of classes) Y (k) Original label set of clientk eY (k) Re-labeled label set of clientk I (k) i Index set of samples inD (k) with the same label ...

[9] [9]

Model & Parameters θModel parameter vector θglobal t Global model parameter at communication roundt θ(k) Local model parameter of clientk f(θ;x)Global model (maps featurexto posterior probabilities) Trelabel Communication round for FedReLa’s one-shot re-labeling

[10] [10]

Probability & Distribution πj Global prior probability of classj(π j = Pr(Y=j)) π(k) j Local prior probability of classjon clientk π[w] j Weighted aggregated prior of classj(server-side) ηj(x)Global posterior probability of classjgivenx η(k) j (x)Local posterior probability of classjgivenxon clientk eη(k) j (x)Posterior probability of classjon eD(k) eη[w]...

[11] [11]

FedReLa Core Parameters ρ(k) ℓ→j(x)Re-labeling probability from local majority classℓto local minority classjon clientk ρ(k) j→ℓ(x)Re-labeling probability from classjtoℓon clientk(set to 0) Q(k) Posterior probability matrix ofD (k) (|D(k)| ×C) z(k) i Class-wise z-score vector of samplex (k) i on clientk µ(k) i Class-wise mean of posterior probabilities (f...

[12] [12]

Imbalance & Heterogeneity IR(D)Global imbalance ratio (max j πj/min j πj) IF Imbalance factor (for long-tailed datasets) αHeterogeneity control parameter (Latent Dirichlet Sampling) KNumber of clients in the federation wk Aggregation weight of clientk(FedAvg:w k =|D (k)|/|D|)

[13] [13]

Decision Boundaries Sj,ℓ Optimal Bayesian decision boundary between classesjandℓ eS(k) Decision boundary of clientkon re-labeled local dataset eD(k) Table 5.Notation Table: Key Symbols and Definitions. 22