Semantic-based Distributed Learning for Diverse and Discriminative Representations

Chaouki Ben Issaid; Mehdi Bennis; Zhuojun Tian

arXiv: 2604.18237 · v1 · submitted 2026-04-20 · cs.LG · cs.AI


Pith reviewed 2026-05-10 05:02 UTC · model grok-4.3


keywords distributed learning · diverse representations · discriminative embeddings · semantic sharing · non-i.i.d. optimization · primal-dual method · block coordinate descent · image classification


The pith

A distributed learning framework uses variance constraints and node clustering to produce both diverse and discriminative representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to prevent collapsed variability in distributed settings by learning representations that remain diverse within each class yet stay useful for discrimination in tasks such as classification. When data are i.i.d. across nodes, it decouples the global objective by adding explicit variance constraints and solves the resulting problem with a primal-dual method. When data are non-i.i.d., it groups nodes into clusters, replicates them virtually, and optimizes each cluster separately with block coordinate descent. Theoretical arguments establish that the solutions obtained keep both properties intact in both settings and that the iterates converge when data are i.i.d., while semantic exchange reduces the need for every node to run the same neural network. Experiments on standard image benchmarks illustrate that the approach recovers global structure more effectively than conventional task-specific methods.
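To make the primal-dual idea concrete, here is a minimal sketch on a toy problem. It is not the paper's update rule. The scalar `x` stands in for a variance-like quantity, and `v_min`, `primal_step`, `dual_step`, and `iters` are hypothetical names and values chosen for illustration. The objective `x**2` pulls toward collapse, while the constraint `x >= v_min` keeps the variable away from zero.

```python
def primal_dual_toy(v_min=1.0, primal_step=0.05, dual_step=0.05, iters=5000):
    """Toy primal-dual iteration for: minimize x**2 subject to x >= v_min.

    Lagrangian: L(x, lam) = x**2 + lam * (v_min - x), with lam >= 0.
    All names and values here are illustrative assumptions.
    """
    x, lam = 0.0, 0.0
    for _ in range(iters):
        # Primal descent on the Lagrangian with respect to x.
        x -= primal_step * (2.0 * x - lam)
        # Dual ascent on the constraint violation, projected onto lam >= 0.
        lam = max(0.0, lam + dual_step * (v_min - x))
    return x, lam


x_star, lam_star = primal_dual_toy()
```

The KKT point of this toy problem is `x = v_min` with multiplier `lam = 2 * v_min`, so the iterates should settle there. The multiplier grows while the constraint is violated, which is the mechanism that keeps the primal variable from sliding to the trivial zero solution.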

Core claim

We propose a novel distributed learning framework that ensures both diverse and discriminative representations. For i.i.d. data, we reformulate and decouple the global optimization function by introducing constraints on representation variance. The update rules are then derived and simplified using a primal-dual approach. For non-i.i.d. data distributions, we tackle the problem by clustering and virtually replicating nodes, allowing model updates within each cluster using block coordinate descent. In both cases, the resulting optimal solutions are theoretically proven to maintain discriminative and diverse properties, with a guaranteed convergence for i.i.d. conditions. Additionally, the use

What carries the argument

Reformulation of the global objective with explicit representation-variance constraints solved by primal-dual updates for i.i.d. data, combined with node clustering and virtual replication solved by block coordinate descent for non-i.i.d. data.
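Block coordinate descent can be illustrated with a small least-squares example. This is a sketch under stated assumptions, not the paper's algorithm: the blocks `A1` and `A2` loosely stand in for per-cluster variables, and the function name `block_coordinate_descent` and the sweep count are hypothetical. Each step solves exactly for one block while the others stay fixed.

```python
import numpy as np


def block_coordinate_descent(A_blocks, b, sweeps=200):
    """Exact block coordinate descent for least squares.

    Minimizes ||sum_k A_k w_k - b||^2 by solving for one block w_k at a
    time while the other blocks are held fixed (illustrative assumption:
    one block per cluster).
    """
    weights = [np.zeros(A.shape[1]) for A in A_blocks]
    for _ in range(sweeps):
        for k, A_k in enumerate(A_blocks):
            # Contribution of all blocks except block k.
            others = sum(
                A @ w
                for j, (A, w) in enumerate(zip(A_blocks, weights))
                if j != k
            )
            # Exact minimization over block k given the residual.
            weights[k] = np.linalg.lstsq(A_k, b - others, rcond=None)[0]
    return weights


rng = np.random.default_rng(0)
A1, A2 = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))
b = rng.normal(size=50)
w1, w2 = block_coordinate_descent([A1, A2], b)
# Reference: solve for both blocks jointly.
w_joint = np.linalg.lstsq(np.hstack([A1, A2]), b, rcond=None)[0]
```

On this convex toy problem the block-wise iterates approach the joint solution `w_joint`. For the non-convex objectives used in representation learning no such agreement should be assumed.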

If this is right

Optimal solutions preserve both discriminative power and diversity of representations.
Convergence is guaranteed when data across nodes are i.i.d.
Semantic sharing among nodes removes the requirement that every node use the same neural-network architecture.
The method recovers global structural representations on MNIST, CIFAR-10, and CIFAR-100.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Heterogeneous devices could collaborate without first agreeing on identical model architectures.
Communication volume may drop because only compact semantic summaries are exchanged rather than full model parameters.
The same variance-plus-clustering construction might extend to regression or reinforcement-learning tasks where structural preservation is also desirable.

Load-bearing premise

That adding variance constraints and virtually replicating nodes will produce stable optimal solutions that keep diversity and discriminativeness without creating new instabilities or needing extra tuning that removes the guarantees.

What would settle it

Running the derived primal-dual updates on i.i.d. data and checking whether intra-class representation variance remains above a positive threshold while classification accuracy stays high and the iterates converge.
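The check described above could be scripted roughly as follows. This is a hedged sketch: the helper names `intra_class_variance` and `diversity_preserved` and the `threshold` value are hypothetical, and the representations here are synthetic arrays rather than outputs of the paper's method.

```python
import numpy as np


def intra_class_variance(Z, y):
    """Total (population) variance of the representations in each class.

    Z: array of shape (num_samples, dim); y: integer labels (num_samples,).
    Returns a dict mapping class label to the summed per-dimension variance.
    """
    return {int(c): float(Z[y == c].var(axis=0).sum()) for c in np.unique(y)}


def diversity_preserved(Z, y, threshold):
    """True if every class keeps intra-class variance above `threshold`."""
    return all(v > threshold for v in intra_class_variance(Z, y).values())


rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
Z_diverse = rng.normal(size=(60, 8))
# Collapsed case: every sample of a class maps to the same point.
Z_collapsed = np.eye(8)[y]
```

Here `diversity_preserved(Z_collapsed, y, 1e-6)` is false because each class has zero variance, while the randomly drawn `Z_diverse` passes. A full test would also track classification accuracy and the convergence of the iterates.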

Figures

Figures reproduced from arXiv: 2604.18237 by Chaouki Ben Issaid, Mehdi Bennis, Zhuojun Tian.

**Figure 2.** Illustration of the learned subspaces of MCR

**Figure 3.** System model: The distributed nodes collaboratively learn the global representations by transmitting the semantic information. The

**Figure 4.** Illustration of the global representation learning algorithm under non-i.i.d. conditions: First,

**Figure 5.** Convergence curves of the averaged loss.

**Figure 6.** Cosine similarity between learned representations for MNIST under i.i.d. data distribution by using different algorithms.

**Figure 7.** Singular value comparison for i.i.d. MNIST dataset: (red)

**Figure 8.** Cosine similarity between learned representations for CIFAR10 under i.i.d. data distribution using different algorithms.

**Figure 9.** Convergence curves of the averaged loss.

**Figure 10.** Cosine similarity between learned representations for MNIST

**Figure 11.** Cosine similarity between learned representations for CI

**Figure 12.** Cosine similarity of learned representations for MNIST and
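Several of the captions above refer to cosine-similarity maps and singular-value comparisons of learned representations. The sketch below shows one plausible way to compute such diagnostics. The function names are hypothetical, the data are synthetic, and the paper's exact plotting procedure is not known from this page.

```python
import numpy as np


def cosine_similarity_matrix(Z, eps=1e-12):
    """Pairwise cosine similarity between the rows of Z (num_samples, dim)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z_unit = Z / np.maximum(norms, eps)  # guard against zero-norm rows
    return Z_unit @ Z_unit.T


def per_class_singular_values(Z, y):
    """Singular values of each class's representation matrix, largest first."""
    return {
        int(c): np.linalg.svd(Z[y == c], compute_uv=False)
        for c in np.unique(y)
    }


# Toy data: two classes living in orthogonal 2-D subspaces of R^4.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
Z = np.zeros((20, 4))
Z[:10, :2] = rng.normal(size=(10, 2))
Z[10:, 2:] = rng.normal(size=(10, 2))
S = cosine_similarity_matrix(Z)
sv = per_class_singular_values(Z, y)
```

In this toy setup the cross-class block `S[:10, 10:]` is zero, which is the discriminative pattern, and each class has two non-negligible singular values, which indicates that variability within the class has not collapsed to a single direction.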

Original abstract

In large-scale distributed scenarios, increasingly complex tasks demand more intelligent collaboration across networks, requiring the joint extraction of structural representations from data samples. However, conventional task-specific approaches often result in nonstructural embeddings, leading to collapsed variability among data samples within the same class, particularly in classification tasks. To address this issue and fully leverage the intrinsic structure of data for downstream applications, we propose a novel distributed learning framework that ensures both diverse and discriminative representations. For independent and identically distributed (i.i.d.) data, we reformulate and decouple the global optimization function by introducing constraints on representation variance. The update rules are then derived and simplified using a primal-dual approach. For non-i.i.d. data distributions, we tackle the problem by clustering and virtually replicating nodes, allowing model updates within each cluster using block coordinate descent. In both cases, the resulting optimal solutions are theoretically proven to maintain discriminative and diverse properties, with a guaranteed convergence for i.i.d. conditions. Additionally, semantic information from representations is shared among nodes, reducing the need for common neural network architectures. Finally, extensive simulations on MNIST, CIFAR-10 and CIFAR-100 confirm the effectiveness of the proposed algorithms in capturing global structural representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper adds variance constraints to a primal-dual objective for i.i.d. distributed learning and uses clustering with virtual replication for non-i.i.d. cases to avoid representation collapse, backed by claimed proofs and standard dataset experiments.


The main point of this paper is a framework for distributed learning that enforces diverse and discriminative representations by adding variance constraints to the global objective in the i.i.d. case, then using primal-dual methods to derive updates. For non-i.i.d. data, it clusters nodes with virtual replication and applies block coordinate descent. The authors claim the optimal solutions preserve the desired properties, with convergence guaranteed only under i.i.d. conditions.

What is actually new here is the specific combination of variance constraints in a decoupled primal-dual formulation and the clustering-based handling for non-i.i.d. scenarios. It extends common optimization techniques to address representation collapse in distributed classification without requiring shared network architectures across nodes. The simulations on MNIST, CIFAR-10, and CIFAR-100 show it can capture global structural representations better than conventional approaches, which is a practical check.

The paper does well in laying out a clear problem and a structured solution that separates the i.i.d. and non-i.i.d. cases. Sharing semantic information among nodes is a nice touch for heterogeneous environments.

The soft spots are in the theoretical backing and completeness. The abstract mentions that update rules are derived and solutions proven to maintain properties, but without the actual equations or proof sketches visible, it's difficult to assess if there are gaps in the error analysis or regularity conditions. Convergence is only guaranteed for i.i.d., leaving the non-i.i.d. case without that assurance, which could be an issue since non-i.i.d. is often the harder and more relevant setting. The variance constraint strength appears as a tunable parameter that might require careful selection, potentially affecting the claimed guarantees in practice.

This work is for people in distributed and federated learning who focus on representation quality for downstream tasks like classification. A reader looking for incremental improvements in optimization for non-i.i.d. data would get some value from the clustering method and the experimental results. It deserves a serious referee because it has a defined contribution with theory and experiments, even if the proofs need close reading. I would recommend sending it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a semantic-based distributed learning framework to obtain both diverse and discriminative representations across networked nodes. For i.i.d. data, the global objective is reformulated by adding representation variance constraints, then decoupled and solved via primal-dual updates. For non-i.i.d. data, nodes are clustered with virtual replication and updated via block coordinate descent. The resulting solutions are claimed to provably preserve discriminative and diverse properties (with convergence guaranteed only under i.i.d. conditions). Semantic information is exchanged to permit heterogeneous architectures. Experiments on MNIST, CIFAR-10, and CIFAR-100 are reported to confirm effectiveness.

Significance. If the derivations and proofs hold, the work would offer a principled approach to mitigating representation collapse in distributed settings while supporting heterogeneous models through semantic sharing. This could meaningfully advance federated and collaborative learning by providing theoretical guarantees on structural properties of representations, particularly valuable for large-scale networks with non-i.i.d. distributions.

major comments (3)

[non-i.i.d. analysis and theoretical proofs] The abstract and theoretical sections assert that optimal solutions maintain discriminative and diverse properties for non-i.i.d. data via clustering and virtual replication, yet convergence is guaranteed only for i.i.d. conditions. The non-i.i.d. analysis must explicitly delineate which properties are rigorously proven versus asserted, including any additional assumptions required for the block-coordinate updates to preserve the variance and clustering objectives.
[i.i.d. reformulation and primal-dual derivation] The representation variance constraint is introduced as a key mechanism for i.i.d. decoupling, but its strength appears as a tunable parameter. The proofs should demonstrate that the claimed properties hold independently of this parameter (or specify the range where they remain valid), as any post-hoc selection risks undermining the 'proven' guarantees.
[non-i.i.d. clustering and virtual replication] The weakest assumption—that variance constraints plus virtual replication produce stable optimal solutions without introducing new instabilities—is load-bearing for the central claim. The manuscript should include a sensitivity analysis or counter-example showing that the derived updates do not collapse diversity or discriminativeness under realistic non-i.i.d. shifts.

minor comments (2)

[Experiments] The experimental section should report the specific value (or selection procedure) used for the variance constraint strength on each dataset, along with ablation results showing sensitivity.
[Preliminaries and method] Notation for the primal-dual variables and the semantic sharing mechanism should be introduced earlier and used consistently to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, clarifying the scope of our theoretical results and outlining revisions to improve the manuscript's rigor and transparency.


Referee: [non-i.i.d. analysis and theoretical proofs] The abstract and theoretical sections assert that optimal solutions maintain discriminative and diverse properties for non-i.i.d. data via clustering and virtual replication, yet convergence is guaranteed only for i.i.d. conditions. The non-i.i.d. analysis must explicitly delineate which properties are rigorously proven versus asserted, including any additional assumptions required for the block-coordinate updates to preserve the variance and clustering objectives.

Authors: We agree that the distinction between rigorously proven results and those that follow from the formulation requires explicit delineation. In the revised manuscript we will insert a new subsection (e.g., Section 4.3) that states: (i) the maintenance of discriminative and diverse properties for non-i.i.d. data is proven by showing that block-coordinate descent on the clustered, virtually replicated objective preserves the variance constraints and cluster assignments at optimality; (ii) convergence of the iterates is proven only under the i.i.d. primal-dual setting; and (iii) the additional assumptions required for the non-i.i.d. case are that the clustering step produces stable partitions and that virtual replication faithfully reproduces intra-cluster statistics. These clarifications will be cross-referenced in the abstract and introduction.
revision: yes

Referee: [i.i.d. reformulation and primal-dual derivation] The representation variance constraint is introduced as a key mechanism for i.i.d. decoupling, but its strength appears as a tunable parameter. The proofs should demonstrate that the claimed properties hold independently of this parameter (or specify the range where they remain valid), as any post-hoc selection risks undermining the 'proven' guarantees.

Authors: The variance constraint is enforced via a positive Lagrange multiplier λ in the primal-dual updates. At optimality the constraint is satisfied for any λ > 0, which directly yields the diversity property independently of the specific positive value; the discriminative property follows from the original supervised loss. We will add a remark immediately after the statement of Theorem 1 (or the corresponding i.i.d. theorem) that explicitly notes the guarantees hold for all λ > 0 and that the dual ascent step prevents the trivial zero-variance solution. This removes any ambiguity about post-hoc parameter selection.
revision: yes

Referee: [non-i.i.d. clustering and virtual replication] The weakest assumption—that variance constraints plus virtual replication produce stable optimal solutions without introducing new instabilities—is load-bearing for the central claim. The manuscript should include a sensitivity analysis or counter-example showing that the derived updates do not collapse diversity or discriminativeness under realistic non-i.i.d. shifts.

Authors: We acknowledge that empirical validation of stability under non-i.i.d. shifts strengthens the central claim. In the revision we will add a sensitivity study in Section 5 (Experiments) that varies the degree of non-i.i.d. partitioning (Dirichlet concentration parameter) and reports the resulting representation variance and class-separation metrics before and after the block-coordinate updates. If any regime exhibits collapse, we will state the corresponding conditions under which the method remains reliable. This provides the requested empirical support without altering the theoretical assumptions.
revision: yes
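The simulated rebuttal mentions varying a Dirichlet concentration parameter to control the degree of non-i.i.d. partitioning. A common recipe for this in federated-learning experiments is sketched below. Whether the authors use exactly this procedure is an assumption, and the function name `dirichlet_partition` and its arguments are hypothetical.

```python
import numpy as np


def dirichlet_partition(labels, num_nodes, alpha, seed=0):
    """Split sample indices across nodes with Dirichlet-distributed class shares.

    Small `alpha` gives each node a few dominant classes (strongly non-i.i.d.);
    large `alpha` approaches an even, i.i.d.-like split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    node_indices = [[] for _ in range(num_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Share of class c assigned to each node.
        proportions = rng.dirichlet(alpha * np.ones(num_nodes))
        # Convert shares into split points within this class's samples.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, cuts)):
            node_indices[node].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in node_indices]


labels = np.repeat(np.arange(10), 100)  # synthetic labels: 10 classes, 100 each
parts = dirichlet_partition(labels, num_nodes=5, alpha=0.1)
```

Sweeping `alpha` over several orders of magnitude and recording representation variance and class separation on each node would give the kind of sensitivity study the rebuttal describes.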

Circularity Check

0 steps flagged

No significant circularity; derivations apply standard methods to novel objective


The paper starts from a global optimization objective, introduces variance constraints to decouple it for i.i.d. data, derives primal-dual update rules, and for non-i.i.d. data applies clustering with virtual replication plus block coordinate descent. The subsequent proofs establish that the resulting solutions preserve discriminative and diverse properties under the stated conditions. These steps rely on standard convex optimization techniques and do not reduce any claimed result to a fitted parameter, self-citation chain, or input by construction. No equations or claims in the provided description equate a prediction or theorem to its own inputs; the framework is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard convex optimization techniques plus the domain assumption that representation variance can be directly constrained to enforce diversity without destroying discriminability.

free parameters (1)

representation variance constraint strength
A tunable parameter introduced to control diversity; its value or selection method is not specified in the abstract.

axioms (2)

domain assumption: The global optimization function can be reformulated and decoupled by adding representation variance constraints.
Invoked to derive update rules for the i.i.d. case via the primal-dual approach.
domain assumption: Clustering and virtual node replication allow block coordinate descent to preserve properties in non-i.i.d. settings.
Used to handle heterogeneous data distributions.


Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1]

Distributed learning in wireless networks: Recent progress and future challenges,

M. Chen, D. G ¨und¨uz, K. Huang, W. Saad, M. Bennis, A. V . Fel- jan, and H. V . Poor, “Distributed learning in wireless networks: Recent progress and future challenges,”IEEE Journal on Selected Areas in Commun., vol. 39, no. 12, pp. 3579–3605, 2021

work page 2021
[2]

On the principles of parsimony and self-consistency for the emergence of intelligence,

Y . Ma, D. Tsao, and H.-Y . Shum, “On the principles of parsimony and self-consistency for the emergence of intelligence,”Frontiers of Information Technology & Electronic Engineering, vol. 23, no. 9, pp. 1298–1323, 2022

work page 2022
[3]

A geometric analysis of neural collapse with unconstrained features,

Z. Zhu, T. Ding, J. Zhou, X. Li, C. You, J. Sulam, and Q. Qu, “A geometric analysis of neural collapse with unconstrained features,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 34, pp. 29 820–29 834, 2021

work page 2021
[4]

Neural collapse with normalized features: A geometric analysis over the riemannian manifold,

C. Yaras, P. Wang, Z. Zhu, L. Balzano, and Q. Qu, “Neural collapse with normalized features: A geometric analysis over the riemannian manifold,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 35, pp. 11 547–11 560, 2022

work page 2022
[5]

arXiv preprint arXiv:2410.14817 , year=

E. Elmoznino, T. Jiralerspong, Y . Bengio, and G. La- joie, “A complexity-based theory of compositionality,”arXiv: 2410.14817, 2024

work page arXiv 2024
[6]

Federated learn- ing: Challenges, methods, and future directions,

T. Li, A. K. Sahu, A. Talwalkar, and V . Smith, “Federated learn- ing: Challenges, methods, and future directions,”IEEE signal process. magazine, vol. 37, no. 3, pp. 50–60, 2020

work page 2020
[7]

Can decentralized algorithms out- perform centralized algorithms? a case study for decentralized parallel stochastic gradient descent,

X. Lian, C. Zhang, et. al., “Can decentralized algorithms out- perform centralized algorithms? a case study for decentralized parallel stochastic gradient descent,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 30, 2017

work page 2017
[8]

Federated multi-task learning,

V . Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 30, 2017

work page 2017
[9]

Distributed stochastic gradient tracking methods,

S. Pu and A. Nedi ´c, “Distributed stochastic gradient tracking methods,”Mathematical Programming, vol. 187, no. 1, pp. 409– 457, 2021. 16

work page 2021
[10]

Distributed learning over networks with graph-attention-based personaliza- tion,

Z. Tian, Z. Zhang, Z. Yang, R. Jin, and H. Dai, “Distributed learning over networks with graph-attention-based personaliza- tion,”IEEE Trans. Signal Process., vol. 71, pp. 2071–2086, 2023

work page 2071
[11]

Robust and communication-efficient federated learning from non-iid data,

F. Sattler, S. Wiedemann, K.-R. M ¨uller, and W. Samek, “Robust and communication-efficient federated learning from non-iid data,”IEEE Trans. Neural Net. Learn. Syst., vol. 31, no. 9, pp. 3400–3413, 2019

work page 2019
[12]

Ex- ploiting shared representations for personalized federated learn- ing,

L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, “Ex- ploiting shared representations for personalized federated learn- ing,” inInt. Conf. Mach. Learning, 2021, pp. 2089–2099

work page 2021
[13]

Distributed compressed sensing with personalized variational auto-encoders,

Z. Tian, Z. Zhang, R. Jin, L. Liu, and Z. Yang, “Distributed compressed sensing with personalized variational auto-encoders,” inIEEE 33rd Inter. Workshop on Machine Learning for Signal Processing (MLSP), 2023, pp. 1–6

work page 2023
[14]

One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis,

G. Zhu, Y . Du, D. G ¨und¨uz, and K. Huang, “One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis,”IEEE Trans. Wireless Com- mun., vol. 20, no. 3, pp. 2120–2135, 2020

work page 2020
[15]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inArtificial Intell. and Statis.. PMLR, 2017, pp. 1273–1282

work page 2017
[16]

Communication-efficient federated learning based on compressed sensing,

C. Li, G. Li, and P. K. Varshney, “Communication-efficient federated learning based on compressed sensing,”IEEE Internet of Things Journal, vol. 8, no. 20, pp. 15 531–15 541, 2021

work page 2021
[17]

Fed- mask: Joint computation and communication-efficient personal- ized federated learning via heterogeneous masking,

A. Li, J. Sun, X. Zeng, M. Zhang, H. Li, and Y . Chen, “Fed- mask: Joint computation and communication-efficient personal- ized federated learning via heterogeneous masking,” inProc. 19th ACM Conf. on Embed. Networked Sensor Syst., 2021, pp. 42–55

work page 2021
[18]

Group knowledge transfer: Federated learning of large cnns at the edge,

C. He, M. Annavaram, and S. Avestimehr, “Group knowledge transfer: Federated learning of large cnns at the edge,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 33, pp. 14 068–14 080, 2020

work page 2020
[19]

Distributed learning of deep neural network over multiple nodes,

O. Gupta and R. Raskar, “Distributed learning of deep neural network over multiple nodes,”Journal of Network and Computer Applications, vol. 116, pp. 1–8, 2018

work page 2018
[20]

Heterofl: Computation and communication efficient federated learning for heterogeneous clients,

E. Diao, J. Ding, and V . Tarokh, “Heterofl: Computation and communication efficient federated learning for heterogeneous clients,”arXiv: 2010.01264, 2020

work page arXiv 2010
[21]

Fjord: Fair and accurate federated learning under heterogeneous targets with ordered dropout,

S. Horvath, S. Laskaridis, M. Almeida, I. Leontiadis, S. Ve- nieris, and N. Lane, “Fjord: Fair and accurate federated learning under heterogeneous targets with ordered dropout,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 34, 2021

work page 2021
[22]

Tailorfl: Dual-personalized federated learning under system and data heterogeneity,

Y . Deng, W. Chen, J. Ren, F. Lyu, Y . Liu, Y . Liu, and Y . Zhang, “Tailorfl: Dual-personalized federated learning under system and data heterogeneity,” inProc. 20th ACM Conf. Embedded Net- worked Sensor Systems, 2022, pp. 592–606

work page 2022
[23]

Model pruning enables efficient federated learning on edge devices,

Y . Jiang, S. Wang, V . Valls, B. J. Ko, W.-H. Lee, K. K. Leung, and L. Tassiulas, “Model pruning enables efficient federated learning on edge devices,”IEEE Trans. Neural Net. Learn. Syst., vol. 34, no. 12, pp. 10 374–10 386, 2022

work page 2022
[24]

Communication- Efficient Personalized Distributed Learning with Data and Node Heterogeneity,

Z. Tian, Z. Zhang, Y . Li, and M. Bennis, “Communication- Efficient Personalized Distributed Learning with Data and Node Heterogeneity,”IEEE Transactions on Cognitive Communica- tions and Networking, 2025

work page 2025
[25]

Fedhm: Efficient federated learning for heterogeneous models via low-rank factorization,

D. Yao, W. Pan, M. J. O’Neill, Y . Dai, Y . Wan, H. Jin, and L. Sun, “Fedhm: Efficient federated learning for heterogeneous models via low-rank factorization,”arXiv: 2111.14655, 2021

work page arXiv 2021
[26]

Resource-adaptive federated learning with all-in-one neural composition,

Y . Mei, P. Guo, M. Zhou, and V . Patel, “Resource-adaptive federated learning with all-in-one neural composition,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 35, pp. 4270–4284, 2022

work page 2022
[27]

Deep representation learning: Funda- mentals, technologies, applications, and open challenges,

A. Payandeh, K. T. Baghaei, P. Fayyazsanavi, S. B. Ramezani, Z. Chen, and S. Rahimi, “Deep representation learning: Funda- mentals, technologies, applications, and open challenges,”IEEE Access, vol. 11, pp. 137 621–137 659, 2023

work page 2023
[28]

A survey of multi-view represen- tation learning,

Y . Li, M. Yang, and Z. Zhang, “A survey of multi-view represen- tation learning,”IEEE Trans. Knowledge and Data Engineering, vol. 31, no. 10, pp. 1863–1883, 2018

work page 2018
[29]

Representation learn- ing: A review and new perspectives,

Y . Bengio, A. Courville, and P. Vincent, “Representation learn- ing: A review and new perspectives,”IEEE Trans. Pattern Analysis Machine Intell., vol. 35, no. 8, pp. 1798–1828, 2013

work page 2013
[30]

Distributed representation learning via node2vec for implicit feedback rec- ommendation,

Y . Liu, Z. Tian, J. Sun, Y . Jiang, and X. Zhang, “Distributed representation learning via node2vec for implicit feedback rec- ommendation,”Neural Computing and Applications, vol. 32, no. 9, pp. 4335–4345, 2020

work page 2020
[31]

Distributed variational represen- tation learning,

I. E. Aguerri and A. Zaidi, “Distributed variational represen- tation learning,”IEEE Trans. Pattern Analysis Machine Intell., vol. 43, no. 1, pp. 120–138, 2019

work page 2019
[32]

Collaborative unsupervised visual representation learning from decentralized data,

W. Zhuang, X. Gan, Y . Wen, S. Zhang, and S. Yi, “Collaborative unsupervised visual representation learning from decentralized data,” inProc. IEEE/CVF Int. Conf. Computer Vision, 2021, pp. 4912–4921

work page 2021
[33]

Orchestra: Unsupervised federated learning via globally consistent clustering

E. S. Lubana, C. I. Tang, F. Kawsar, R. P. Dick, and A. Mathur, “Orchestra: Unsupervised federated learning via globally consis- tent clustering,”arXiv: 2205.11506, 2022

work page arXiv 2022
[34]

Rethinking the representation in federated unsupervised learning with non-iid data,

X. Liao, W. Liu, C. Chen,et al., “Rethinking the representation in federated unsupervised learning with non-iid data,” inProc. IEEE/CVF Conf. Computer Vision Pattern Recognition, 2024, pp. 22841–22850

work page 2024
[35]

Federated unsupervised representation learning,

F. Zhang, K. Kuang, L. Chen,et al., “Federated unsupervised representation learning,”Frontiers of Information Technology & Electronic Engineering, vol. 24, no. 8, pp. 1181–1193, 2023

work page 2023
[36]

Simclr: A simple framework for contrastive learning of visual representa- tions,

T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “Simclr: A simple framework for contrastive learning of visual representa- tions,” inInt. Conf. Learn. Represen., vol. 2, no. 4, 2020

work page 2020
[37]

SheafAlign: A Sheaf-theoretic Framework for Decentralized Multimodal Alignment,

A. Ghalkha, Z. Tian, C. B. Issaid, and M. Bennis, “SheafAlign: A Sheaf-theoretic Framework for Decentralized Multimodal Alignment,”arXiv: 2510.20540, 2025

work page arXiv 2025
[38]

Learning diverse and discriminative representations via the principle of maximal coding rate reduction,

Y . Yu, K. H. R. Chan, C. You, C. Song, and Y . Ma, “Learning diverse and discriminative representations via the principle of maximal coding rate reduction,”Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 33, pp. 9422–9434, 2020

work page 2020
[39]

Closed-loop data transcription to an ldr via minimaxing rate reduction,

X. Dai, S. Tong, M. Li, Z. Wu, M. Psenka, K. H. R. Chan, P. Zhai, Y . Yu, X. Yuan, H. Y . Shumet al., “Closed-loop data transcription to an ldr via minimaxing rate reduction,”arXiv: 2111.06636, 2021

work page arXiv 2021
[40]

Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective,

Z. Tian and B. Mehdi, “Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective,”IEEE Signal Process. Letters, vol. 32, pp. 4409– 4413, 2025

work page 2025
[41]

Segmentation of multivariate mixed data via lossy data coding and compression,

Y . Ma, H. Derksen, W. Hong, and J. Wright, “Segmentation of multivariate mixed data via lossy data coding and compression,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1546–1562, 2007

work page 2007
[42]

Distributed admm with synergetic communication and compu- tation,

Z. Tian, Z. Zhang, J. Wang, X. Chen, W. Wang, and H. Dai, “Distributed admm with synergetic communication and compu- tation,”IEEE Trans. Commun., vol. 69, no. 1, pp. 501–517, 2020

work page 2020
[43] J. Matamoros, S. M. Fosson, E. Magli, and C. Antón-Haro, “Distributed admm for in-network reconstruction of sparse signals with innovations,” IEEE Trans. Signal and Information Process. over Networks, vol. 1, no. 4, pp. 225–234, 2015
[44] M. Hong and Z.-Q. Luo, “On the linear convergence of the alternating direction method of multipliers,” Mathematical Programming, vol. 162, no. 1, pp. 165–199, 2017
[45] Z. Tian, Z. Zhang, and L. Hanzo, “Distributed multi-view sparse vector recovery,” IEEE Trans. Signal Process., vol. 71, pp. 1448–1463, 2023
[46] A. Beck and L. Tetruashvili, “On the convergence of block coordinate descent type methods,” SIAM Journal on Optimization, vol. 23, no. 4, pp. 2037–2060, 2013
[47] M. Hong, X. Wang, M. Razaviyayn, and Z.-Q. Luo, “Iteration complexity analysis of block coordinate descent methods,” Mathematical Programming, vol. 163, pp. 85–114, 2017
