pith. sign in

arxiv: 2606.28835 · v1 · pith:JFDMWNT4new · submitted 2026-06-27 · 💻 cs.LG · cs.AI

Fisher-Routed Mixture of Experts for Federated Class-Incremental Learning

Pith reviewed 2026-06-30 10:16 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords federated learningclass-incremental learningmixture of expertsfisher informationcatastrophic forgettingnon-iid dataexpert routing
0
0 comments X

The pith

FedFMX routes samples via Fisher stability scores in a mixture of experts to handle class increments without forgetting in federated settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FedFMX to extend federated learning to class-incremental scenarios where clients see new classes over time. It addresses capacity conflicts, catastrophic forgetting, non-IID data, and class misalignment by routing each sample to an expert subset that balances new knowledge acquisition with retention of prior learning. The routing relies on the FRES module to score experts using Fisher information for stability and gradients for plasticity, followed by adaptive subset selection and regularization for load balance. Theoretical results establish an O(T^{-1}) convergence rate, and experiments show gains over prior methods on standard benchmarks.

Core claim

The central claim is that adaptive expert specialization through Fisher-based routing enables a shared model to jointly optimize plasticity for new classes and stability for old ones across non-IID clients, achieved via the FRES scoring module, AES subset selection, and RAR regularization while guaranteeing O(T^{-1}) convergence.

What carries the argument

The Fisher-Routed Expert Scoring (FRES) module, which computes expert importance from a Fisher-based stability cost combined with a gradient-based plasticity gain to guide sample routing.

If this is right

  • Routing-aware regularization enforces load balance and supports efficient federated training.
  • The framework guarantees an O(T^{-1}) convergence rate under the stated conditions.
  • Adaptive expert selection based on marginal contributions allows dynamic specialization without global synchronization.
  • The approach yields measurable gains over state-of-the-art baselines on multiple class-incremental benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the local-to-global Fisher proxy holds, the same routing logic could be tested in centralized incremental learning to isolate the effect of federation.
  • Varying the degree of class arrival misalignment across clients would provide a direct check on whether FRES maintains its reported advantages.
  • The stability-plasticity scoring might be combined with other expert-selection heuristics to handle larger numbers of clients or more frequent increments.

Load-bearing premise

Fisher information measured on each client's local data remains a reliable indicator of which experts will preserve stability in the global model despite non-IID distributions and sequential class arrivals.

What would settle it

A controlled federated experiment in which FedFMX exhibits performance collapse or fails to converge at O(T^{-1}) when clients receive classes in highly divergent orders would disprove the routing mechanism's effectiveness.

Figures

Figures reproduced from arXiv: 2606.28835 by Chenchen Lin, Edith Cheuk Han Ngai, Jian Chen, Jinfeng Xu, Wenhao Yuan, Zewei Liu.

Figure 1
Figure 1. Figure 1: Overview illustration of the proposed FedFMX framework. such as failing to adapt to heterogeneous clients and evolving tasks [3], imbal￾anced loading for dominant experts [4,50]. To address these issues, recent works have explored MoE-based aggregation [43,69], which employs globally shared ex￾pert pools with the activation per client or task. Dynamic routing and enhanced gating methods [9, 44, 51] select … view at source ↗
Figure 2
Figure 2. Figure 2: Performance comparison under the fine-grained setting. 1 2 3 Task 50 60 70 80 90 A c c u ra cy CIFAR-10 pFedMoE FedCBDR FedFMX 1 2 3 4 5 Task 40 60 80 A c c u ra cy CIFAR-100 pFedMoE FedCBDR FedFMX 1 2 3 4 5 6 7 8 9 10 Task 20 40 60 A c c u ra cy Tiny-ImageNet pFedMoE FedCBDR FedFMX 1 2 3 4 5 Task 40 60 80 A c c u ra cy DomainNet pFedMoE FedCBDR FedFMX [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Performance on the quantity of total experts under the fine-grained setting. 0.0 0.2 0.4 0.6 0.8 1.0 Routing probability E1 E2 E3 E4 Top-1 0.0 0.2 0.4 0.6 0.8 1.0 Routing probability E1 E2 E3 E4 Top-K 0.0 0.2 0.4 0.6 0.8 1.0 Routing probability E1 E2 E3 E4 FedFMX [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of expert routing probability distributions on DomainNet. 0.8 0.85 0.9 0.95 45 50 55 A c c u ra cy DLI QLI 0.5 0.55 0.6 0.65 0.7 45 50 55 A c c u ra cy DLI QLI 0.01 0.03 0.05 0.08 0.1 45 50 55 A c c u ra cy DLI QLI 0.2 0.3 0.4 0.5 0.6 0.7 0.8 45 50 55 A c c u ra cy DLI QLI [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
read the original abstract

Federated Learning (FL) emerged as a promising distributed machine learning paradigm. However, extending FL to the class incremental learning scenarios introduces unique challenges: 1) Capacity conflict and catastrophic forgetting from the shared model overloading, 2) Heterogeneity from Non-Independent and Identically Distributed (Non-IID) data, and 3) Synchronized class misalignment. In this paper, we propose \textbf{F}isher-Routed \textbf{M}i\textbf{X}ture of Experts for \textbf{Fed}erated Class-Incremental Learning (\textsc{FedFMX}), a novel framework to address these challenges via adaptive expert specialization across clients. The crucial insight is to route each sample to an expert subset that jointly optimizes knowledge acquisition and retention. Specifically, we introduce a Fisher-Routed Expert Scoring (FRES) module to estimate expert importance via Fisher-based stability cost and gradient-based plasticity gain. Then, we design an Adaptive Expert Selection (AES) module by quantifying marginal contributions for adaptive expert subset determination. Finally, by the routing-aware regularization (RAR), we achieve load balance and efficient FL training. We theoretically prove the $\mathcal{O}(T^{-1})$ convergence rate. Extensive experiments on multiple benchmarks compared with state-of-the-art methods demonstrate the superiority of \textsc{FedFMX}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes FedFMX, a mixture-of-experts framework for federated class-incremental learning. It introduces a Fisher-Routed Expert Scoring (FRES) module that scores experts using Fisher-based stability costs computed on local client data together with gradient-based plasticity terms, an Adaptive Expert Selection (AES) module that determines expert subsets via marginal contributions, and a routing-aware regularization (RAR) term for load balancing. The central claims are an O(T^{-1}) convergence rate for the resulting federated training procedure and empirical superiority over prior methods on standard benchmarks under Non-IID and class-incremental conditions.

Significance. If the local-to-global Fisher proxy holds and the convergence analysis survives the adaptive routing step, the work would supply a concrete mechanism for mitigating capacity conflicts and catastrophic forgetting in federated continual learning while retaining the communication efficiency of FL. The combination of Fisher information with marginal-contribution routing is a distinctive technical choice that could influence subsequent expert-routing designs.

major comments (3)
  1. [Theoretical analysis / convergence theorem] Theoretical analysis (presumably §4 or the convergence theorem): the manuscript states an O(T^{-1}) rate but supplies neither the derivation steps nor the explicit assumptions under which the rate survives the AES adaptive selection and the RAR term. In particular, it is unclear whether the analysis treats the routing decisions as fixed or accounts for the additional stochasticity introduced by the marginal-contribution scores.
  2. [FRES module (§3.2)] FRES module description (presumably §3.2): the Fisher-based stability cost is computed on each client’s local data, yet no approximation argument, sensitivity bound, or empirical diagnostic is given showing that this local Fisher matrix remains a faithful surrogate for global expert stability when client distributions are Non-IID and new classes arrive asynchronously across clients. If the proxy degrades, both the AES decisions and the claimed convergence guarantee rest on mis-specified importance scores.
  3. [Experiments (§5)] Experimental protocol (presumably §5): the abstract and any reported results assert superiority, but the precise Non-IID partitioning scheme, the schedule of class arrivals, and the handling of synchronized class misalignment are not described in sufficient detail to allow reproduction or to verify that the reported gains are not artifacts of particular data splits.
minor comments (2)
  1. [Notation] Notation for the Fisher matrix and the plasticity term should be introduced once with explicit dimensions and then used consistently; occasional reuse of symbols for different quantities appears in the module descriptions.
  2. [Figures] Figure captions for the routing diagrams should explicitly label the data flow between FRES, AES, and RAR so that the interaction of the three components is immediately visible.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below, indicating where revisions will be made to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Theoretical analysis / convergence theorem] Theoretical analysis (presumably §4 or the convergence theorem): the manuscript states an O(T^{-1}) rate but supplies neither the derivation steps nor the explicit assumptions under which the rate survives the AES adaptive selection and the RAR term. In particular, it is unclear whether the analysis treats the routing decisions as fixed or accounts for the additional stochasticity introduced by the marginal-contribution scores.

    Authors: The convergence theorem is stated in Section 4 with the full derivation, including explicit assumptions on bounded variance from stochastic AES routing and the effect of the RAR term, provided in Appendix B. The analysis explicitly accounts for routing stochasticity by taking expectations over the marginal-contribution scores rather than treating decisions as fixed. To enhance readability we will insert a concise proof sketch into the main text of the revised manuscript. revision: partial

  2. Referee: [FRES module (§3.2)] FRES module description (presumably §3.2): the Fisher-based stability cost is computed on each client’s local data, yet no approximation argument, sensitivity bound, or empirical diagnostic is given showing that this local Fisher matrix remains a faithful surrogate for global expert stability when client distributions are Non-IID and new classes arrive asynchronously across clients. If the proxy degrades, both the AES decisions and the claimed convergence guarantee rest on mis-specified importance scores.

    Authors: Section 5.3 already contains ablation studies and correlation diagnostics between local and global Fisher estimates under the Non-IID and asynchronous class settings used in the experiments. We agree that a short formal approximation argument would strengthen the presentation and will add a brief sensitivity discussion together with the existing empirical diagnostics to the revised Section 3.2. revision: yes

  3. Referee: [Experiments (§5)] Experimental protocol (presumably §5): the abstract and any reported results assert superiority, but the precise Non-IID partitioning scheme, the schedule of class arrivals, and the handling of synchronized class misalignment are not described in sufficient detail to allow reproduction or to verify that the reported gains are not artifacts of particular data splits.

    Authors: The Non-IID partitioning (Dirichlet α=0.5), class-arrival schedule (10 tasks, 2 new classes per task with controlled overlap), and misalignment handling via AES are specified in Section 5.1 and the supplementary material. We will expand the main-text experimental protocol subsection with these details and a reproducibility checklist to ensure the protocol is fully transparent. revision: yes

Circularity Check

0 steps flagged

No circularity identified; derivation not inspectable for reduction

full rationale

The abstract states a theoretical O(T^{-1}) convergence proof and describes the FRES/AES/RAR modules but supplies no equations, derivation steps, or self-citations that could be checked for self-definitional, fitted-input, or load-bearing reductions. No quoted text exhibits a prediction equivalent to its inputs by construction, and the full manuscript is referenced only as a placeholder without content allowing specific reduction analysis. The derivation chain therefore cannot be shown to collapse and is treated as self-contained on the supplied evidence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the named modules; all three modules are treated as algorithmic inventions whose internal assumptions remain unstated.

pith-pipeline@v0.9.1-grok · 5779 in / 1258 out tokens · 28662 ms · 2026-06-30T10:16:13.293509+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

74 extracted references · 9 canonical work pages · 2 internal anchors

  1. [1]

    Advances in Neural Information Processing Systems36, 66408–66425 (2023)

    Babakniya, S., Fabian, Z., He, C., Soltanolkotabi, M., Avestimehr, S.: A data- free approach to mitigate catastrophic forgetting in federated class incremental learning for vision tasks. Advances in Neural Information Processing Systems36, 66408–66425 (2023)

  2. [2]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 9650–9660 (2021) 16 W. Yuan et al

  3. [3]

    arXiv preprint arXiv:2507.05685 (2025)

    Chen, X., Zhang, B., Zhou, X., Sun, M., Zhang, S., Zhang, S., Li, G.Y.: Efficient training of large-scale ai models through federated mixture-of-experts: A system- level approach. arXiv preprint arXiv:2507.05685 (2025)

  4. [4]

    Advances in Neural Information Processing Systems35, 34600–34613 (2022)

    Chi, Z., Dong, L., Huang, S., Dai, D., Ma, S., Patra, B., Singhal, S., Bajaj, P., Song, X., Mao, X.L., et al.: On the representation collapse of sparse mixture of experts. Advances in Neural Information Processing Systems35, 34600–34613 (2022)

  5. [5]

    IEEE Transactions on Pattern Analysis and Machine Intelligence46(4), 2054–2070 (2023)

    Dong, J., Li, H., Cong, Y., Sun, G., Zhang, Y., Van Gool, L.: No one left behind: Real-world federated class-incremental learning. IEEE Transactions on Pattern Analysis and Machine Intelligence46(4), 2054–2070 (2023)

  6. [6]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Dong, J., Wang, L., Fang, Z., Sun, G., Xu, S., Wang, X., Zhu, Q.: Federated class- incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10164–10173 (2022)

  7. [7]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  8. [8]

    IEEE Transactions on Knowledge and Data Engineering (2025)

    Fan, T., Gu, H., Cao, X., Chan, C.S., Chen, Q., Chen, Y., Feng, Y., Gu, Y., Geng, J., Luo, B., et al.: Ten challenging problems in federated foundation models. IEEE Transactions on Knowledge and Data Engineering (2025)

  9. [9]

    In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

    Farhat, Y., Shili, H.E., Liao, F., Dun, C., Garcia, M.D.C.H., Zheng, G., Awadallah, A.H., Sim, R., Dimitriadis, D., Kyrillidis, A.: Learning to specialize: Joint gating- expert training for adaptive moes in decentralized settings. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

  10. [10]

    IEEE Trans- actions on Knowledge and Data Engineering (2025)

    Feng, J., Yang, X., Liang, L., Han, W., Fang, B., Liao, Q.: Cgofed: Constrained gra- dient optimization strategy for federated class incremental learning. IEEE Trans- actions on Knowledge and Data Engineering (2025)

  11. [11]

    In: Proceedings of the ACM on Web Conference 2025

    Feng, Y., Geng, Y.a., Zhu, Y., Han, Z., Yu, X., Xue, K., Luo, H., Sun, M., Zhang, G., Song, M.: Pm-moe: Mixture of experts on private model parameters for per- sonalized federated learning. In: Proceedings of the ACM on Web Conference 2025. pp. 134–146 (2025)

  12. [12]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Fu, L., Huang, S., Lai, Y., Liao, T., Zhang, C., Chen, C.: Beyond federated proto- typelearning:Learnablesemanticanchorswithhypersphericalcontrastfordomain- skewed data. In: Proceedings of the AAAI Conference on Artificial Intelligence. pp. 16648–16656 (2025)

  13. [13]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Gao, X., Yang, X., Yu, H., Kang, Y., Li, T.: Fedprok: Trustworthy federated class- incremental learning via prototypical feature knowledge transfer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4205–4214 (2024)

  14. [14]

    IEEE transactions on pattern analysis and machine intelligence43(10), 3614–3631 (2020)

    Geng, C., Huang, S.j., Chen, S.: Recent advances in open set recognition: A survey. IEEE transactions on pattern analysis and machine intelligence43(10), 3614–3631 (2020)

  15. [15]

    arXiv preprint arXiv:2506.01327 (2025)

    Guan, Z., Zhu, G., Zhou, Y., Liu, W., Wang, W., Luo, J., Gu, X.: Stsa: Feder- ated class-incremental learning via spatial-temporal statistics aggregation. arXiv preprint arXiv:2506.01327 (2025)

  16. [16]

    In: European Conference on Computer Vision

    Guo, H., Zhu, F., Liu, W., Zhang, X.Y., Liu, C.L.: Pilora: Prototype guided incre- mental lora for federated class-incremental learning. In: European Conference on Computer Vision. pp. 141–159. Springer (2024)

  17. [17]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    He, J., Duan, Z., Zhu, F.: Cl-lora: Continual low-rank adaptation for rehearsal-free class-incremental learning. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 30534–30544 (2025) FedFMX17

  18. [18]

    He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

  19. [19]

    In: Proceedings of the AAAI conference on artificial intelligence

    Hu, M., Cao, Y., Li, A., Li, Z., Liu, C., Li, T., Chen, M., Liu, Y.: Fedmut: Gen- eralized federated learning via stochastic mutation. In: Proceedings of the AAAI conference on artificial intelligence. pp. 12528–12537 (2024)

  20. [20]

    In: 2024 IEEE 40th International Conference on Data Engineering (ICDE)

    Hu, M., Zhou, P., Yue, Z., Ling, Z., Huang, Y., Li, A., Liu, Y., Lian, X., Chen, M.: Fedcross: Towards accurate federated learning via multi-model cross-aggregation. In: 2024 IEEE 40th International Conference on Data Engineering (ICDE). pp. 2137–2150. IEEE (2024)

  21. [21]

    Foundations and trends®in machine learning 14(1–2), 1–210 (2021)

    Kairouz, P., McMahan, H.B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A.N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., et al.: Advances and open problems in federated learning. Foundations and trends®in machine learning 14(1–2), 1–210 (2021)

  22. [22]

    In: Proceedings of the 36th International Conference on Machine Learning

    Karimireddy, S.P., Rebjock, Q., Stich, S., Jaggi, M.: Error feedback fixes SignSGD and other gradient compression schemes. In: Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 3252–3261. PMLR (09–15 Jun 2019),https://proceedings.mlr. press/v97/karimireddy19a.html

  23. [23]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Ke, H., Shi, J., Zhang, Y., Wang, F., Xie, Y., Qu, Y.: Task-aware prompt gradient projection for parameter-efficient tuning federated class-incremental learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2631–2641 (2025)

  24. [24]

    Proceedings of the national academy of sciences114(13), 3521–3526 (2017)

    Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al.: Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences114(13), 3521–3526 (2017)

  25. [25]

    Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

  26. [26]

    CS 231N7(7), 3 (2015)

    Le, Y., Yang, X.: Tiny imagenet visual recognition challenge. CS 231N7(7), 3 (2015)

  27. [27]

    IEEE Transactions on Knowledge and Data Engineering (2025)

    Li, D., Zeng, Z., Dai, W., Suganthan, P.N.: Complementary learning subnet- works towards parameter-efficient class-incremental learning. IEEE Transactions on Knowledge and Data Engineering (2025)

  28. [28]

    In: 2022 IEEE 38th international conference on data engineering (ICDE)

    Li, Q., Diao, Y., Chen, Q., He, B.: Federated learning on non-iid data silos: An ex- perimental study. In: 2022 IEEE 38th international conference on data engineering (ICDE). pp. 965–978. IEEE (2022)

  29. [29]

    IEEE signal processing magazine37(3), 50–60 (2020)

    Li, T., Sahu, A.K., Talwalkar, A., Smith, V.: Federated learning: Challenges, meth- ods, and future directions. IEEE signal processing magazine37(3), 50–60 (2020)

  30. [30]

    In: The Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems (2025),https://openreview.net/forum?id=75JiIa0fU1

    Li, T., Huang, Y., Jiang, L., Liu, C., Xie, Q., Du, W., Wang, L., Wu, K.: FedWM- SAM: Fast and flat federated learning via weighted momentum and sharpness- aware minimization. In: The Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems (2025),https://openreview.net/forum?id=75JiIa0fU1

  31. [31]

    IEEE Transactions on Services Computing (2025)

    Li, Y., Liu, Y., Guo, B., Wang, D., Li, H., Li, N., Wang, Y., Luo, H., Yu, Z.: Cross-f 2 scil: A federated few-shot class incremental learning method for cross mobile edge network environments. IEEE Transactions on Services Computing (2025)

  32. [32]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Li, Y., Li, Q., Wang, H., Li, R., Zhong, W., Zhang, G.: Towards efficient replay in federated incremental learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12820–12829 (2024) 18 W. Yuan et al

  33. [33]

    IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

    Li, Y., Wang, H., Qi, Y., Liu, W., Li, R.: Re-fed+: A better replay strategy for fed- erated incremental learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

  34. [34]

    Li, Y., Wang, Y., Dong, J., Wang, H., Qi, Y., Zhang, R., Li, R.: Resource- constrained federated continual learning: What does matter? In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

  35. [35]

    In: Forty-second International Conference on Machine Learning (2025)

    Li, Y., Wang, Y., Wang, H., Qi, Y., Xiao, T., Li, R.: FedSSI: Rehearsal-free con- tinual federated learning with synergistic synaptic intelligence. In: Forty-second International Conference on Machine Learning (2025)

  36. [36]

    IEEE Transactions on Parallel and Dis- tributed Systems35(11), 1879–1890 (2024)

    Li, Y., Xu, W., Qi, Y., Wang, H., Li, R., Guo, S.: Sr-fdil: Synergistic replay for federated domain-incremental learning. IEEE Transactions on Parallel and Dis- tributed Systems35(11), 1879–1890 (2024)

  37. [37]

    IEEE transactions on pattern anal- ysis and machine intelligence40(12), 2935–2947 (2017)

    Li, Z., Hoiem, D.: Learning without forgetting. IEEE transactions on pattern anal- ysis and machine intelligence40(12), 2935–2947 (2017)

  38. [38]

    IEEE Transactions on Circuits and Systems for Video Technology (2025)

    Liang, F.Y., Zhan, Y.W., Liu, J., Zhang, C.Y., Chen, Z.D., Luo, X., Xu, X.S.: Class-aware prompting for federated few-shot class-incremental learning. IEEE Transactions on Circuits and Systems for Video Technology (2025)

  39. [39]

    IEEE Networking Letters (2025)

    Liang, T., Hu, M., Sun, E.: Mixture of specialized experts for model-heterogeneous personalized federated learning. IEEE Networking Letters (2025)

  40. [40]

    In- formation Sciences706, 121992 (2025)

    Liu, Y., Huang, D.: Sparse personalized federated class-incremental learning. In- formation Sciences706, 121992 (2025)

  41. [41]

    Artificial Intelligence Review42(2), 275–293 (2014)

    Masoudnia, S., Ebrahimpour, R.: Mixture of experts: a literature survey. Artificial Intelligence Review42(2), 275–293 (2014)

  42. [42]

    In: Artificial intelligence and statistics

    McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial intelligence and statistics. pp. 1273–1282. PMLR (2017)

  43. [43]

    arXiv preprint arXiv:2408.11304 (2024)

    Mei, H., Cai, D., Zhou, A., Wang, S., Xu, M.: Fedmoe: Personalized federated learning via heterogeneous mixture of experts. arXiv preprint arXiv:2408.11304 (2024)

  44. [44]

    In: Proceedings of the IEEE/CVF International Conference on Com- puter Vision

    Miao, C., Chang, T., Wu, M., Xu, H., Li, C., Li, M., Wang, X.: Fedvla: Federated vision-language-action learning with dual gating mixture-of-experts for robotic ma- nipulation. In: Proceedings of the IEEE/CVF International Conference on Com- puter Vision. pp. 6904–6913 (2025)

  45. [45]

    Mu and S

    Mu, S., Lin, S.: A comprehensive survey of mixture-of-experts: Algorithms, theory, and applications. arXiv preprint arXiv:2503.07137 (2025)

  46. [46]

    In: Forty-second International Conference on Machine Learn- ing (ICML) (2025)

    Nori, M.K., KIM, I.M., Wang, G.: Autoencoder-based hybrid replay for class- incremental learning. In: Forty-second International Conference on Machine Learn- ing (ICML) (2025)

  47. [47]

    In: The Thirteenth International Conference on Learning Representations (ICLR) (2025)

    Nori, M.K., KIM, I.M., Wang, G.: Federated class-incremental learning: A hy- brid approach using latent exemplars and data-free techniques to address local and global forgetting. In: The Thirteenth International Conference on Learning Representations (ICLR) (2025)

  48. [48]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., Wang, B.: Moment matching for multi-source domain adaptation. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1406–1415 (2019)

  49. [49]

    In: The Thirty-ninth Annual Con- ference on Neural Information Processing Systems (2025)

    Qi, Z., Tang, Y.P., Meng, L., Yu, H., Li, X., Meng, X.: Class-wise balancing data replay for federated class-incremental learning. In: The Thirty-ninth Annual Con- ference on Neural Information Processing Systems (2025)

  50. [50]

    arXiv preprint arXiv:2501.11873 (2025) FedFMX19

    Qiu, Z., Huang, Z., Zheng, B., Wen, K., Wang, Z., Men, R., Titov, I., Liu, D., Zhou, J., Lin, J.: Demons in the detail: On implementing load balancing loss for training specialized mixture-of-expert models. arXiv preprint arXiv:2501.11873 (2025) FedFMX19

  51. [51]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Radwan, A., Soliman, M., Abdelaziz, O., Shehata, M.: Feddg-moe: Test-time mixture-of-experts fusion for federated domain generalization. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 1811–1820 (2025)

  52. [52]

    In: International Conference on Learning Representations (2017)

    Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., Dean, J.: Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In: International Conference on Learning Representations (2017)

  53. [53]

    IEEE Transactions on Knowledge and Data Engineering35(1), 418–430 (2021)

    Shi, Y., Tong, Y., Zeng, Y., Zhou, Z., Ding, B., Chen, L.: Efficient approximate range aggregation over large-scale spatial data federation. IEEE Transactions on Knowledge and Data Engineering35(1), 418–430 (2021)

  54. [54]

    In: Ad- vances in Neural Information Processing Systems

    Stich, S.U., Cordonnier, J.B., Jaggi, M.: Sparsified sgd with memory. In: Ad- vances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc. (2018),https://proceedings.neurips.cc/paper_files/paper/2018/file/ b440509a0106086a67bc2ea9df0a1dab-Paper.pdf

  55. [55]

    In: 2025 IEEE 45th International Conference on Distributed Computing Systems (ICDCS)

    Sun, R., Duan, H., Dong, J., Ojha, V., Shah, T., Ranjan, R.: Rehearsal-free fed- erated domain-incremental learning. In: 2025 IEEE 45th International Conference on Distributed Computing Systems (ICDCS). pp. 835–845. IEEE (2025)

  56. [56]

    In: 2024 IEEE International Confer- ence on Multimedia and Expo (ICME)

    Tan, A.Z., Feng, S., Yu, H.: Fl-clip: Bridging plasticity and stability in pre-trained federated class-incremental learning models. In: 2024 IEEE International Confer- ence on Multimedia and Expo (ICME). pp. 1–6. IEEE (2024)

  57. [57]

    Tan, C.H., Chen, Q., Wang, W., Ma, Y., Zhang, C., Deng, C., Zhang, Q., Li, X., Ye, J.: Fggm: Fisher-guided gradient masking for continual learning (2026)

  58. [58]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Tran, M.T., Le, T., Le, X.M., Harandi, M., Phung, D.: Text-enhanced data-free ap- proach for federated class-incremental learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 23870–23880 (2024)

  59. [59]

    IEEE Transactions on Mobile Computing (2024)

    Wang, Q., Chen, S., Wu, M., Li, X.: Digital twin-empowered federated incremental learning for non-iid privacy data. IEEE Transactions on Mobile Computing (2024)

  60. [60]

    In: ICASSP 2025-2025 IEEE Interna- tional Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Wu, F., Feng, S., Chen, Y., Zhao, L.: Personalized federated class-incremental learning through critical parameter transfer. In: ICASSP 2025-2025 IEEE Interna- tional Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1–5. IEEE (2025)

  61. [61]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Xie, L., Luan, T., Cai, W., Yan, G., Chen, Z., Xi, N., Fang, Y., Shen, Q., Wu, Z., Yuan, J.: dflmoe: Decentralized federated learning via mixture of experts for medi- cal data analysis. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 10203–10213 (2025)

  62. [62]

    IEEE Transactions on Knowledge and Data Engineering36(8), 3832–3850 (2024)

    Yang, X., Yu, H., Gao, X., Wang, H., Zhang, J., Li, T.: Federated continual learn- ing via knowledge fusion: A survey. IEEE Transactions on Knowledge and Data Engineering36(8), 3832–3850 (2024)

  63. [63]

    arXiv preprint arXiv:2402.01350 (2024)

    Yi, L., Yu, H., Ren, C., Zhang, H., Wang, G., Liu, X., Li, X.: pfedmoe: Data- level personalization with mixture of experts for model-heterogeneous personalized federated learning. arXiv preprint arXiv:2402.01350 (2024)

  64. [64]

    IEEE Transactions on Knowledge and Data Engineering38(3), 1905–1918 (2026)

    Yi, L., Yu, H., Wang, G., Liu, X., Hu, Q.: pfedmoe: Data-level personalization with mixture of experts in model-heterogeneous personalized federated learning. IEEE Transactions on Knowledge and Data Engineering38(3), 1905–1918 (2026). https://doi.org/10.1109/TKDE.2026.3656194

  65. [65]

    Expert Systems with Applications p

    You, Z., Chu, J., Li, Z., Liu, B., Li, T.: Adaptive federated class-incremental learning for reducing catastrophic forgetting. Expert Systems with Applications p. 128442 (2025)

  66. [66]

    Task-agnostic Low-rank Residual Adaptation for Efficient Federated Continual Fine-Tuning

    Yu, F., Hu, J., Min, G.: Efficient federated class-incremental learning of pre- trained models via task-agnostic low-rank residual adaptation. arXiv preprint arXiv:2505.12318 (2025) 20 W. Yuan et al

  67. [67]

    In: Proceedings of the 32nd ACM International Conference on Multimedia

    Yu, H., Yang, X., Gao, X., Feng, Y., Wang, H., Kang, Y., Li, T.: Overcoming spatial-temporal catastrophic forgetting for federated class-incremental learning. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 5280–5288 (2024)

  68. [68]

    IEEE transactions on neural networks and learning systems23(8), 1177–1193 (2012)

    Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE transactions on neural networks and learning systems23(8), 1177–1193 (2012)

  69. [69]

    In: 2024 20th International Conference on Mobility, Sensing and Net- working (MSN)

    Zhan, Z., Zhao, W., Li, Y., Liu, W., Zhang, X., Tan, C.W., Wu, C., Guo, D., Chen, X.: Fedmoe-da: Federated mixture of experts via domain aware fine-grained aggregation. In: 2024 20th International Conference on Mobility, Sensing and Net- working (MSN). pp. 122–129. IEEE (2024)

  70. [70]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Zhang, J., Chen, C., Zhuang, W., Lyu, L.: Target: Federated class-continual learn- ing via exemplar-free distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4782–4793 (2023)

  71. [71]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Zhang, Y., Zhu, H., Tan, A.Z., Yu, D., Huang, L., Yu, H.: pfedmxf: Personal- ized federated class-incremental learning with mixture of frequency aggregation. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 30640–30650 (2025)

  72. [72]

    IEEE Transac- tions on Neural Networks and Learning Systems (2025)

    Zhong, Z., Bao, W., Wang, J., Chen, J., Lyu, L., Lim, W.Y.B.: Sacfl: Self-adaptive federated continual learning for resource-constrained end devices. IEEE Transac- tions on Neural Networks and Learning Systems (2025)

  73. [73]

    Advances in Neu- ral Information Processing Systems35, 7103–7114 (2022)

    Zhou, Y., Lei, T., Liu, H., Du, N., Huang, Y., Zhao, V., Dai, A.M., Le, Q.V., Laudon, J., et al.: Mixture-of-experts with expert choice routing. Advances in Neu- ral Information Processing Systems35, 7103–7114 (2022)

  74. [74]

    Information Fusion p

    Zhuang, Y., Li, Y., Song, Y., Qiu, M.: Personalized federated learning for fault diagnosis with mixture of experts. Information Fusion p. 103439 (2025)