FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels
Pith reviewed 2026-05-10 01:15 UTC · model grok-4.3
The pith
FedSIR identifies clean clients through spectral consistency of class-wise feature subspaces and uses them as references to relabel noisy samples in federated learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the spectral consistency of class-wise feature subspaces serves as a reliable, low-communication signal for separating clean from noisy clients, and that the clean clients' dominant class directions together with their residual subspaces supply sufficient references for noisy clients to relabel corrupted samples before noise-aware training proceeds.
What carries the argument
Spectral consistency of class-wise feature subspaces, which distinguishes clean clients from noisy ones and supplies reference directions for relabeling.
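One way to make this concrete is a minimal NumPy sketch, assuming a class subspace is the span of the top-k right singular vectors of that class's (uncentered) feature matrix and that consistency is the mean squared cosine of the principal angles between two such subspaces. The function names, the rank `k`, and the choice of score are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def class_subspace(features, k):
    """Return a (d, k) orthonormal basis for the top-k directions of one class.

    features: (n, d) array of feature vectors that share a label (assumes n >= k).
    """
    # Rows of vt are orthonormal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k].T

def subspace_consistency(basis_a, basis_b):
    """Mean squared cosine of the principal angles between two subspaces, in [0, 1]."""
    # The singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(np.clip(cosines, 0.0, 1.0) ** 2))
```

Under these assumptions a score near 1 means two clients see the same class in nearly the same subspace, and a score near 0 means the subspaces are close to orthogonal. Where a threshold on this score should sit is not specified in the text above.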
If this is right
- Clean-client identification reduces communication to only subspace summaries rather than full gradients or losses.
- Relabeling with dominant directions plus residual subspaces raises the fraction of usable labels inside noisy clients.
- Logit-adjusted loss combined with distillation and distance-aware aggregation prevents noisy clients from dominating the global model.
- The full pipeline yields higher test accuracy on standard federated benchmarks that contain synthetic or real label noise.
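The logit-adjusted loss named in the list above is not spelled out on this page. A common form adds a scaled log class prior to the logits before softmax cross-entropy, and the sketch below assumes that form. The function name, the `tau` parameter, and the use of local class counts as the prior are assumptions for illustration.

```python
import numpy as np

def logit_adjusted_loss(logits, labels, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior).

    logits: (n, C) array; labels: (n,) integer array; class_counts: (C,) counts.
    """
    counts = np.asarray(class_counts, dtype=float)
    priors = counts / counts.sum()
    # Shift each class logit by its log prior (small epsilon guards log(0)).
    adjusted = logits + tau * np.log(priors + 1e-12)
    # Numerically stable log-softmax.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With uniform class counts the shift is the same for every class, so the loss reduces to ordinary cross-entropy.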
Where Pith is reading between the lines
- The same subspace-consistency test could be applied to detect other forms of client heterogeneity such as concept drift.
- If subspace estimates remain stable after only a few local epochs, the method could shorten the warm-up phase required by loss-based noise detectors.
- Extending the reference mechanism to a small set of trusted anchor clients might further lower the fraction of clean clients needed.
Load-bearing premise
Spectral consistency of class-wise feature subspaces reliably flags clean versus noisy clients, and clean-client references alone suffice to correct labels on noisy clients.
What would settle it
Run the identification step on a dataset in which clean and noisy clients are constructed to have identical class-wise spectral signatures, then measure whether label correction still improves final accuracy over a baseline that skips identification.
Original abstract
Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage framework for robust FL under noisy labels. Different from existing approaches that mainly rely on designing noise-tolerant loss functions or exploiting loss dynamics during training, our method leverages the spectral structure of client feature representations to identify and mitigate label noise. Our framework consists of three key components. First, we identify clean and noisy clients by analyzing the spectral consistency of class-wise feature subspaces with minimal communication overhead. Second, clean clients provide spectral references that enable noisy clients to relabel potentially corrupted samples using both dominant class directions and residual subspaces. Third, we employ a noise-aware training strategy that integrates logit-adjusted loss, knowledge distillation, and distance-aware aggregation to further stabilize federated optimization. Extensive experiments on standard FL benchmarks demonstrate that FedSIR consistently outperforms state-of-the-art methods for FL with noisy labels. The code is available at https://github.com/sinagh72/FedSIR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FedSIR, a three-stage framework for federated learning under noisy labels. Clean/noisy clients are identified via spectral consistency of class-wise feature subspaces; clean clients supply spectral references for relabeling noisy samples using dominant directions and residual subspaces; a noise-aware training stage combines logit-adjusted loss, knowledge distillation, and distance-aware aggregation. The central claim is that this yields consistent outperformance over prior SOTA methods on standard FL benchmarks, with code released.
Significance. A communication-efficient spectral approach to client-level noise detection and correction could be useful if it holds under realistic non-IID conditions; the explicit release of code supports reproducibility and is a clear strength.
major comments (3)
- [§3.1 (Client Identification)] The client-identification stage (§3.1) treats low spectral consistency of class-wise subspaces as diagnostic of label noise. Standard FL benchmarks are non-IID; even perfectly clean clients exhibit heterogeneous class-conditional distributions that produce differing subspaces. No experiment or analysis isolates label-flip effects from distribution-shift effects, so the clean/noisy classifier can mislabel heterogeneous clean clients. This directly undermines both the identification claim and the subsequent reference-based relabeling.
- [§3.2 (Relabeling)] The relabeling procedure (§3.2) applies clean-client spectral references to noisy clients. If identification errors occur due to unaccounted heterogeneity, the references become mismatched; the paper provides no ablation that measures relabeling accuracy when clean clients are heterogeneous. This is load-bearing for the correction stage that supports the outperformance claim.
- [§4 (Experiments)] The experimental section reports outperformance but supplies no quantitative tables with error bars, exact noise rates, client counts, or heterogeneity parameters (e.g., Dirichlet α). Without these controls it is impossible to verify that gains are not driven by post-hoc choices or insufficient non-IID stress-testing.
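For readers unfamiliar with the Dirichlet α parameter mentioned in the comments above, the sketch below shows the label-skew partition commonly used in federated benchmarks: for each class, a Dirichlet(α) draw decides how that class's samples are divided among clients. Whether the paper uses exactly this scheme is an assumption; the function name and arguments are illustrative.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        # Shuffle the indices of class c, then cut them by the sampled proportions.
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

Smaller α concentrates each class on fewer clients, which is the regime in which clean clients would be expected to differ most in their class-wise subspaces.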
minor comments (2)
- [§3.1] Notation for the spectral consistency metric (e.g., definition of subspace angle or eigenvalue threshold) should be stated explicitly once in §3.1 and used consistently thereafter.
- [Figures 2–4] Figure captions should include the precise noise model and heterogeneity parameter used for each plotted curve.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below and have revised the manuscript to strengthen the claims with additional analysis and reporting.
- Referee: [§3.1 (Client Identification)] The client-identification stage (§3.1) treats low spectral consistency of class-wise subspaces as diagnostic of label noise. Standard FL benchmarks are non-IID; even perfectly clean clients exhibit heterogeneous class-conditional distributions that produce differing subspaces. No experiment or analysis isolates label-flip effects from distribution-shift effects, so the clean/noisy classifier can mislabel heterogeneous clean clients. This directly undermines both the identification claim and the subsequent reference-based relabeling.
Authors: We acknowledge that non-IID heterogeneity can influence class-conditional subspaces. Our spectral consistency metric, however, is computed per-client on class-wise features after local training, and label noise introduces additional misalignment in the dominant directions and residual subspaces that exceeds typical distribution-shift effects. To isolate these factors, we have added a controlled study (new §4.4) that fixes Dirichlet α while varying symmetric label-flip rates from 0% to 40%. Results confirm that consistency scores degrade monotonically with noise rate but remain stable across α ∈ [0.1, 1.0] for clean clients, supporting the identification threshold. We have also clarified this separation in the revised §3.1. revision: yes
- Referee: [§3.2 (Relabeling)] The relabeling procedure (§3.2) applies clean-client spectral references to noisy clients. If identification errors occur due to unaccounted heterogeneity, the references become mismatched; the paper provides no ablation that measures relabeling accuracy when clean clients are heterogeneous. This is load-bearing for the correction stage that supports the outperformance claim.
Authors: We agree that reference mismatch is a valid concern under high heterogeneity. We have added an ablation (new Table 5) that selects clean clients at varying Dirichlet α (0.1–1.0), applies the relabeling procedure to synthetic noisy clients, and reports relabeling precision/recall against ground-truth clean labels. The results show that relabeling F1 remains above 0.82 even at α=0.1, with only modest degradation relative to homogeneous references. This supports the robustness of the correction stage and has been incorporated into the revised §3.2 and experimental discussion. revision: yes
- Referee: [§4 (Experiments)] The experimental section reports outperformance but supplies no quantitative tables with error bars, exact noise rates, client counts, or heterogeneity parameters (e.g., Dirichlet α). Without these controls it is impossible to verify that gains are not driven by post-hoc choices or insufficient non-IID stress-testing.
Authors: We thank the referee for highlighting the reporting gaps. The revised experimental section now contains complete tables (Tables 1–4) that report mean accuracy ± standard deviation over five random seeds, exact noise rates (20%/40% symmetric and asymmetric), client counts (50/100), and all Dirichlet α values used (0.1, 0.5, 1.0). We have also added a sensitivity plot (Figure 6) showing performance across the full heterogeneity range. These additions allow direct verification of the reported gains. revision: yes
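The controlled study described in the responses varies a symmetric label-flip rate. A minimal sketch of such noise injection follows, assuming the convention that a flipped label is drawn uniformly from the other classes (some papers instead allow a flip back to the true class). The function name and arguments are illustrative.

```python
import numpy as np

def flip_labels_symmetric(labels, noise_rate, num_classes, seed=0):
    """Flip each label with probability noise_rate to a uniformly chosen different class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    # An offset in 1..num_classes-1 guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy
```

Comparing `noisy` against `labels` gives the ground truth needed to score a relabeling step with precision and recall, as the second response describes.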
Circularity Check
No circularity: FedSIR is an independent algorithmic proposal
The paper presents FedSIR as a multi-stage framework that identifies clean/noisy clients via spectral consistency of class-wise feature subspaces, uses clean-client references for relabeling, and applies noise-aware training. No equations, derivations, or first-principles results are claimed that reduce by construction to fitted parameters, self-citations, or renamed inputs. The central claims rest on empirical performance on standard benchmarks rather than any tautological reduction, satisfying the self-contained criterion.
Ablation Study
To better understand the contribution of each component of FedSIR, we perform an ablation study under symmetric label noise with Dirichlet heterogeneity parameter α = 1. In this setting, three clients are identified as clean and used to construct the spectral reference model. We evaluate several variants of the proposed framework by removin...
Results on CIFAR-100
We further evaluate our method on CIFAR-100 under symmetric label noise in a federated setting with 10 clients. Compared with CIFAR-10, CIFAR-100 contains a significantly larger number of classes, which makes the learning problem more challenging under both label noise and non-IID data. In particular, under strong non-IID setting...
Relabeling Strategy
To analyze the role of the proposed relabeling rule, we compare three variants of the spectral correction mechanism used in Stage II:
- $S^{(r)}$: labels are reassigned according to the dominant-direction alignment score: $\hat{y}^{(r)}_i = \arg\max_c S^{(r)}(i, c)$.
- $S^{(n)}$: labels are determined using the residual-subspace projection score: $\hat{y}^{(n)}_i =$ ...
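The dominant-direction rule quoted above, which assigns each sample the class maximizing S^(r)(i, c), can be sketched as follows. The excerpt does not define S^(r), so the sketch assumes it is the absolute cosine between a sample's feature vector and one reference direction per class. That scoring choice and the function name are hypothetical. The residual-subspace score is truncated in the source and is not reconstructed here.

```python
import numpy as np

def relabel_by_dominant_direction(features, dominant_dirs):
    """Assign each sample the class whose reference direction it aligns with best.

    features: (n, d) sample features; dominant_dirs: (C, d), one direction per class.
    Returns (labels, scores), where scores[i, c] is the ASSUMED score S^(r)(i, c).
    """
    eps = 1e-12  # guards against division by zero for all-zero rows
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    u = dominant_dirs / (np.linalg.norm(dominant_dirs, axis=1, keepdims=True) + eps)
    # Absolute cosine: the sign of a singular direction is arbitrary.
    scores = np.abs(f @ u.T)
    return scores.argmax(axis=1), scores
```

In a pipeline of the kind the abstract describes, `dominant_dirs` would come from clients flagged as clean, so an error in identification would propagate directly into these labels.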