From Coordinate Matching to Structural Alignment: Rethinking Prototype Alignment in Heterogeneous Federated Learning
Pith reviewed 2026-05-08 10:51 UTC · model grok-4.3
The pith
In heterogeneous federated learning, prototype alignment succeeds when it matches inter-class relations rather than exact coordinates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Coordinate alignment couples two objectives that should be separate: matching inter-class semantic structure, which aids classification, and forcing a shared feature basis, which is harmful under model heterogeneity. Structural alignment removes the second objective by matching relational properties such as inter-class similarities or distances instead of absolute positions, allowing each client's feature extractor to remain distinct while still benefiting from global class relations.
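The distinction can be made concrete with a minimal sketch (illustrative only; the paper's exact FedSAF loss is not reproduced here): a coordinate loss matches prototypes element-wise, while a structural loss matches only the matrix of pairwise inter-class cosine similarities.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def relation_matrix(prototypes):
    """Pairwise inter-class cosine similarities for a set of class prototypes."""
    return [[cosine(p, q) for q in prototypes] for p in prototypes]

def coordinate_loss(client_protos, global_protos):
    """MSE between absolute prototype coordinates (the criticized objective)."""
    return sum(
        (a - b) ** 2
        for c, g in zip(client_protos, global_protos)
        for a, b in zip(c, g)
    ) / sum(len(c) for c in client_protos)

def structural_loss(client_protos, global_protos):
    """MSE between inter-class relation matrices (coordinate-free objective)."""
    rc = relation_matrix(client_protos)
    rg = relation_matrix(global_protos)
    n = len(rc)
    return sum(
        (rc[i][j] - rg[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)

# A client whose prototypes are a 90-degree rotation of the global ones:
g = [(1.0, 0.0), (0.0, 1.0)]
c = [(0.0, 1.0), (-1.0, 0.0)]
print(structural_loss(c, g))  # 0.0: inter-class angles are preserved
print(coordinate_loss(c, g))  # 1.0: coordinates penalized despite intact structure
```

The rotated client pays no structural penalty but a full coordinate penalty, which is precisely the coupling the claim says coordinate alignment introduces.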
What carries the argument
Structural alignment objective that matches inter-class relational structure across clients instead of absolute coordinate positions in the embedding space.
Load-bearing premise
Inter-class relational structure can be aligned across clients without any shared coordinate basis, and doing so is always more useful than coordinate matching when feature extractors differ.
What would settle it
A controlled experiment on the same heterogeneous benchmarks in which structural alignment yields accuracy equal to or lower than coordinate alignment.
read the original abstract
Heterogeneous federated learning (HtFL) aims to enable collaboration among clients that differ in both data distributions and model architectures. Prototype-based methods, which communicate class-level feature centers (prototypes) instead of full model parameters, have recently shown strong potential for HtFL. Existing prototype-based HtFL methods typically reuse the MSE-based or cosine-based alignment mechanism developed for homogeneous FL when aligning client-specific representations with global prototypes. These approaches are essentially coordinate alignment, where representations of clients are forced to match the global prototypes in the embedding space in an element-wise manner. Such alignment implicitly assumes that all clients should map their representations into the feature subspace defined by the global prototypes. This assumption is reasonable in homogeneous FL, where all clients share the same feature extractor. However, it becomes problematic in HtFL, since heterogeneous feature extractors naturally induce client-specific feature subspaces, and forcing all clients to optimize within a single global subspace unnecessarily suppresses their learning capacity. We observe that coordinate alignment implicitly couples two distinct objectives: aligning inter-class semantic structure, which is directly beneficial for classification, and enforcing a shared feature basis, which is unnecessary and even harmful under model heterogeneity. Building on this insight, we design FedSAF, which shifts the alignment objective from absolute coordinates to inter-class relational structure. We demonstrate that structural alignment consistently outperforms coordinate alignment in heterogeneous settings. Experiments on multiple benchmarks show that our structural alignment outperforms state-of-the-art prototype-based HtFL methods by up to 3.52%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that existing prototype-based methods in heterogeneous federated learning (HtFL) rely on coordinate alignment (MSE or cosine similarity) of client prototypes to global ones, which implicitly enforces a shared feature basis unsuitable for heterogeneous model architectures. It proposes FedSAF to instead align inter-class relational structures, claiming this decouples beneficial semantic alignment from harmful coordinate constraints and yields consistent gains, outperforming state-of-the-art prototype-based HtFL methods by up to 3.52% across benchmarks.
Significance. If the performance improvements are shown to arise specifically from the structural alignment objective, the work offers a conceptually useful reframing of prototype alignment in HtFL that could guide future methods toward preserving client-specific feature subspaces while still transferring class relations. The distinction between coordinate and structural objectives is a clear contribution, though its significance hinges on whether experiments isolate this factor from other design elements.
major comments (2)
- [Experiments] Experiments section: the reported gains of up to 3.52% are not supported by an ablation that fixes all other FedSAF components (prototype computation, optimization schedule, regularization) and reverts only the alignment loss to standard MSE/cosine coordinate matching. Without this control, it is impossible to attribute improvements to the claimed shift from coordinate to structural alignment rather than confounding factors.
- [Method] Method section: no explicit loss formulation or derivation is provided for the structural alignment objective (e.g., how inter-class relations are quantified and optimized independently of absolute coordinates), which is load-bearing for the central claim that coordinate alignment couples two distinct objectives.
minor comments (1)
- [Abstract] Abstract: the claim of 'multiple benchmarks' is not accompanied by any enumeration of datasets, model architectures, or heterogeneity settings, which hinders immediate assessment of the scope of the empirical results.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the experimental isolation of our core contribution and to improve the explicitness of the method. We address each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Experiments] Experiments section: the reported gains of up to 3.52% are not supported by an ablation that fixes all other FedSAF components (prototype computation, optimization schedule, regularization) and reverts only the alignment loss to standard MSE/cosine coordinate matching. Without this control, it is impossible to attribute improvements to the claimed shift from coordinate to structural alignment rather than confounding factors.
Authors: We agree that the current experiments do not include a controlled ablation that holds every other FedSAF component fixed while swapping only the alignment loss back to coordinate matching. Such an ablation is required to isolate the effect of the structural objective. In the revised manuscript we will add this experiment on the primary benchmarks, reporting accuracy deltas when the structural loss is replaced by MSE and by cosine similarity under identical prototype computation, optimization schedule, and regularization settings. revision: yes
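The requested control can be sketched as a small harness (all names hypothetical): every pipeline component is pinned in one shared config, and runs differ in the alignment-loss entry alone, so any accuracy delta is attributable to the objective.

```python
# Hypothetical sketch of the controlled ablation the referee asks for.
# Component values are illustrative stand-ins, not the paper's settings.

FIXED = {
    "prototype_agg": "class-mean",
    "optimizer": "sgd(lr=0.01)",
    "regularization": "weight_decay=5e-4",
}

def make_run(alignment_loss_name):
    """Return a run config that differs from the baseline only in the loss."""
    cfg = dict(FIXED)
    cfg["alignment_loss"] = alignment_loss_name
    return cfg

runs = [make_run(name) for name in ("mse", "cosine", "structural")]

def diff_keys(a, b):
    """Keys on which two run configs disagree."""
    return [k for k in a if a[k] != b[k]]

# Every pair of runs differs in exactly one key, so an accuracy gap between
# them isolates the coordinate-vs-structural choice from confounders.
print(diff_keys(runs[0], runs[2]))  # ['alignment_loss']
```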
-
Referee: [Method] Method section: no explicit loss formulation or derivation is provided for the structural alignment objective (e.g., how inter-class relations are quantified and optimized independently of absolute coordinates), which is load-bearing for the central claim that coordinate alignment couples two distinct objectives.
Authors: We accept that the manuscript would benefit from a more self-contained mathematical presentation. The structural alignment objective is described in Section 3 as alignment of inter-class relation matrices (pairwise cosine similarities among prototypes), but the explicit loss expression and its derivation are not written out. In the revision we will insert a dedicated subsection containing (i) the precise loss formula, (ii) the derivation showing how the objective depends only on relative angles and is invariant to client-specific linear transformations of the feature space, and (iii) a short argument clarifying the separation from coordinate constraints. revision: yes
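One piece of the promised derivation in (ii) can be checked directly, with a caveat: cosine-based relation matrices are exactly invariant under orthogonal transformations (e.g., rotations) and per-prototype positive rescaling, but not under arbitrary linear maps such as shears, so the revised derivation should scope the invariance claim accordingly. A small sketch with hypothetical 2-D prototypes:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def relations(protos):
    """Matrix of pairwise inter-class cosine similarities."""
    return [[cosine(p, q) for q in protos] for p in protos]

def linmap(m, protos):
    """Apply a 2x2 linear map m to each 2-D prototype."""
    return [(m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)
            for x, y in protos]

protos = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Rotation by 30 degrees: orthogonal, preserves all pairwise angles.
t = math.radians(30)
rot = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

# Shear: non-orthogonal linear map, distorts angles.
shear = [[1.0, 0.8], [0.0, 1.0]]

base = relations(protos)
rotated = relations(linmap(rot, protos))
sheared = relations(linmap(shear, protos))
```

Under the rotation, `rotated` equals `base` entry-by-entry (up to float error); under the shear, the off-diagonal entries change, so the structural objective is not blind to all client-specific linear transformations.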
Circularity Check
No circularity: new structural alignment objective introduced independently of prior fitted results
full rationale
The paper proposes FedSAF by shifting the alignment loss from coordinate matching (MSE/cosine on prototypes) to inter-class relational structure. This is presented as a design choice motivated by an observation about heterogeneous feature subspaces, not as a mathematical derivation or re-expression of any fitted quantity. No equations reduce a prediction to an input by construction, no parameters are fitted on a subset and then called a prediction, and no self-citation chain is invoked to justify uniqueness or force the method. The central claim rests on experimental comparisons rather than tautological re-labeling of existing results. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Inter-class relational structure is directly beneficial for classification and can be aligned independently of client-specific feature bases.
discussion (0)