pith. machine review for the scientific record. sign in

arxiv: 2605.05718 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: unknown

Enabling Federated Inference via Unsupervised Consensus Embedding

Authors on Pith no claims yet

Pith reviewed 2026-05-08 15:02 UTC · model grok-4.3

classification 💻 cs.LG
keywords federated inferenceconsensus embeddingunsupervised alignmentprivacy-preserving MLcooperative inferencenon-IID datamodel cooperation
0
0 comments X

The pith

Pretrained models can cooperate on predictions using only shared unlabeled data without sharing parameters or inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that independently trained machine learning models can improve inference accuracy by aligning their intermediate representations into a shared space learned solely from unlabeled data. This approach matters because it removes requirements for data sharing, parameter exchange, or common model architectures that restrict cooperative inference in privacy-sensitive or cross-organizational settings. Experiments demonstrate consistent gains over using any single model alone on image classification benchmarks with non-identical data distributions across clients. The method further extends to text and time-series tasks, though gains vary with how outputs are combined. Representation alignment emerges as the key factor limiting further improvements.

Core claim

CE-FI enables pretrained models to cooperate at inference time without sharing model parameters or raw inputs and without assuming a common encoder. It introduces a Consensus Embedding layer that maps heterogeneous intermediate representations into a common embedding space and a Cooperative Output layer that produces predictions from these embeddings. Both layers are trained using shared unlabeled data only, so the cooperative stage does not require additional labeled data. Experiments on image classification benchmarks under diverse non-IID conditions show that CE-FI consistently outperforms solo inference and performs comparably to conventional methods that require stronger sharing.

What carries the argument

The Consensus Embedding layer, which maps heterogeneous intermediate representations from independently trained models into a single common space trained unsupervised on shared unlabeled data, paired with a Cooperative Output layer for final predictions.

If this is right

  • Independent models achieve higher accuracy than any one alone by combining aligned features without exchanging data or parameters.
  • The framework matches performance of methods that assume stronger sharing, such as common encoders or labeled data exchange.
  • Representation alignment is the primary bottleneck, so improving it directly raises cooperative gains.
  • The approach applies beyond images to text and time-series tasks depending on the output combination strategy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Organizations could pool predictions across separately developed models while keeping training data and architectures fully private.
  • If alignment techniques advance, the performance gap to fully centralized ensembles would shrink further.
  • The method suggests testing whether the same unlabeled alignment works when models differ more radically in architecture or task.

Load-bearing premise

Enough shared unlabeled data is available and the models' intermediate features can be aligned into a useful common space without any labeled examples or shared architecture components.

What would settle it

No accuracy improvement over individual model predictions when the consensus embedding is applied to models whose intermediate representations have completely disjoint feature spaces on a standard benchmark.

Figures

Figures reproduced from arXiv: 2605.05718 by Takahito Tanimura, Takayuki Nishio, Yuichi Kitagawa, Yui Hashimoto.

Figure 1
Figure 1. Figure 1: Cooperative inference with CE-FI. Each device maps the intermediate view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the CO layer training procedure. Each device generates view at source ↗
Figure 4
Figure 4. Figure 4: Distribution of labels in the 3-device setting: (a)Mild (b)Moderate view at source ↗
Figure 6
Figure 6. Figure 6: t-SNE visualization of feature representations across devices in view at source ↗
Figure 5
Figure 5. Figure 5: Accuracy across label partition strategies under different datasets view at source ↗
Figure 7
Figure 7. Figure 7: Accuracy across label partition strategies on different modalities (3 view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of accuracy between Solo Inference and CE-FI under view at source ↗
Figure 9
Figure 9. Figure 9: Accuracy under various label partition settings with 3 devices: (a) view at source ↗
Figure 12
Figure 12. Figure 12: t-SNE visualization of individual feature spaces from one pretrained view at source ↗
Figure 13
Figure 13. Figure 13: Accuracy on CIFAR-10 under domain shift in the shared unlabeled view at source ↗
Figure 14
Figure 14. Figure 14: Accuracy difference from baseline CE-FI under idealized consensus view at source ↗
Figure 15
Figure 15. Figure 15: Reconstruction attack results on CIFAR-10. For visualization, eight view at source ↗
read the original abstract

Cooperative inference across independently deployed machine learning models is increasingly desirable in distributed environments, as there is a growing need to leverage multiple models while keeping their data and model parameters private. However, existing cooperative frameworks typically rely on sharing input data, model parameters, or a common encoder, which limits their applicability in privacy-sensitive or cross-organizational settings. To address this challenge, we propose Consensus Embedding-based Federated Inference (CE-FI), a framework that enables pretrained models to cooperate at inference time without sharing model parameters or raw inputs and without assuming a common encoder. CE-FI introduces two components: a Consensus Embedding (CE) layer that maps heterogeneous intermediate representations into a common embedding space, and a Cooperative Output (CO) layer that produces predictions from these embeddings. Both layers are trained using shared unlabeled data only, so the cooperative stage does not require additional labeled data. Experiments on image classification benchmarks -- CIFAR-10 and CIFAR-100 -- under diverse non-IID conditions show that CE-FI consistently outperforms solo inference and performs comparably to conventional methods that require stronger sharing assumptions. Additional evaluations on text and time-series tasks indicate applicability beyond image classification, although performance depends on the ensemble strategy. Further analysis identifies representation alignment as the primary bottleneck.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Consensus Embedding-based Federated Inference (CE-FI) to enable cooperation among independently trained models at inference time without sharing parameters, raw inputs, or assuming a common encoder. It introduces a Consensus Embedding (CE) layer to map heterogeneous intermediate representations into a shared space and a Cooperative Output (CO) layer for predictions, both trained solely on shared unlabeled data. Experiments on CIFAR-10 and CIFAR-100 under non-IID partitions report consistent gains over solo inference and performance comparable to stronger-sharing baselines; additional results on text and time-series tasks are presented, with representation alignment identified as the primary bottleneck.

Significance. If the unsupervised alignment step reliably preserves class-discriminative information across incompatible representation spaces, CE-FI would provide a practical route to privacy-preserving cooperative inference in cross-organizational settings. The empirical results on standard image benchmarks under non-IID conditions are a positive signal, and the extension to non-image modalities broadens potential impact. However, the absence of theoretical guarantees on alignment quality, the explicit dependence on external shared unlabeled data, and the authors' own identification of alignment as the dominant failure mode limit the strength of the contribution relative to existing federated or ensemble methods.

major comments (3)
  1. [Abstract, §5] Abstract and §5 (Experiments): the central claim of 'consistent outperformance over solo inference' is immediately qualified by the statements that 'performance depends on the ensemble strategy' and that 'representation alignment [is] the primary bottleneck.' No quantitative metric of alignment quality (e.g., class-separability in the CE space or correlation with downstream accuracy) is reported, so it is unclear whether the observed gains are robust or merely an artifact of favorable ensemble choices on the tested partitions.
  2. [§3] §3 (Method), definition of the CE layer: the unsupervised training objective on shared unlabeled data is presented without any analysis or ablation of how the amount, distribution, or domain shift of that unlabeled data affects alignment quality. Because the entire cooperative stage rests on this step, the lack of sensitivity analysis makes the practical applicability of CE-FI difficult to assess.
  3. [§4] §4 (Theoretical analysis) or equivalent: no bounds, convergence arguments, or even empirical verification are given that the unsupervised consistency/reconstruction losses preserve label-discriminative structure when the input representations come from independently trained models with incompatible architectures. The skeptic concern that alignment may fail without labels or a shared encoder therefore remains unaddressed at the load-bearing point of the argument.
minor comments (2)
  1. [§3] Notation for the CE and CO layers is introduced without a clear diagram or pseudocode listing the forward and training passes; a figure would improve readability.
  2. [§5] The non-IID partitioning details (Dirichlet parameter, number of clients, etc.) are described only at a high level; exact reproduction would require additional specification.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight key areas where additional analysis and clarification can strengthen the manuscript. We address each major comment point-by-point below, with planned revisions to improve robustness and transparency.

read point-by-point responses
  1. Referee: [Abstract, §5] Abstract and §5 (Experiments): the central claim of 'consistent outperformance over solo inference' is immediately qualified by the statements that 'performance depends on the ensemble strategy' and that 'representation alignment [is] the primary bottleneck.' No quantitative metric of alignment quality (e.g., class-separability in the CE space or correlation with downstream accuracy) is reported, so it is unclear whether the observed gains are robust or merely an artifact of favorable ensemble choices on the tested partitions.

    Authors: We agree that quantitative metrics of alignment quality would better substantiate the claims and clarify the relationship between alignment success and performance gains. In the revised manuscript, we will add such metrics in §5, including class separability (silhouette score) and intra-class vs. inter-class distances in the CE space, as well as their correlation with downstream accuracy across different ensemble strategies. We will also explicitly state the conditions (e.g., when alignment quality exceeds a threshold) under which consistent outperformance is observed, rather than leaving it qualified only in the text. revision: yes

  2. Referee: [§3] §3 (Method), definition of the CE layer: the unsupervised training objective on shared unlabeled data is presented without any analysis or ablation of how the amount, distribution, or domain shift of that unlabeled data affects alignment quality. Because the entire cooperative stage rests on this step, the lack of sensitivity analysis makes the practical applicability of CE-FI difficult to assess.

    Authors: We acknowledge that sensitivity analysis on the shared unlabeled data is essential for assessing practical applicability. In the revision, we will include new ablations in §5 examining the effects of varying the amount of unlabeled data (e.g., 10%, 50%, 100% of available samples), different distributions (IID vs. non-IID partitions of the unlabeled set), and moderate domain shifts (e.g., using unlabeled data from a related but distinct source). These results will be presented alongside the main experiments to demonstrate robustness. revision: yes

  3. Referee: [§4] §4 (Theoretical analysis) or equivalent: no bounds, convergence arguments, or even empirical verification are given that the unsupervised consistency/reconstruction losses preserve label-discriminative structure when the input representations come from independently trained models with incompatible architectures. The skeptic concern that alignment may fail without labels or a shared encoder therefore remains unaddressed at the load-bearing point of the argument.

    Authors: The manuscript provides empirical verification through consistent performance gains on CIFAR-10/100 and extensions to text and time-series tasks, where the unsupervised losses enable cooperation without labels or shared encoders. However, we agree that more targeted empirical analysis of structure preservation would address the concern directly. In the revision, we will add in §5 visualizations (t-SNE of pre- and post-alignment representations) and quantitative measures (e.g., mutual information between CE embeddings and ground-truth labels) to show preservation of discriminative structure. Regarding theoretical bounds or convergence arguments, general guarantees for arbitrary heterogeneous architectures under purely unsupervised objectives are challenging to derive without strong assumptions and lie beyond the scope of this work; we will add an explicit discussion of this limitation in a new Limitations section. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical method relies on external shared unlabeled data and independent training

full rationale

The paper presents an empirical framework (CE-FI) whose central claims are performance gains on CIFAR-10/100 and other benchmarks under non-IID conditions. These gains are measured against solo inference baselines using held-out test data and do not reduce, by any equation or self-citation in the provided text, to quantities defined solely in terms of the method's own fitted parameters. The CE and CO layers are trained on external shared unlabeled data; the alignment step is acknowledged as a potential bottleneck rather than derived as a theorem. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is present in the abstract or reader summary. The derivation chain is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The framework assumes existence of shared unlabeled data representative enough to align representations and that intermediate features from heterogeneous models are alignable without supervision. No free parameters are explicitly fitted in the abstract description beyond standard training hyperparameters.

axioms (1)
  • domain assumption Existence of shared unlabeled data across all participating models that is sufficient for representation alignment
    Used to train the CE and CO layers without requiring labeled data or parameter sharing.
invented entities (2)
  • Consensus Embedding (CE) layer no independent evidence
    purpose: Maps heterogeneous intermediate representations from different models into a common embedding space
    New component introduced to enable cooperation without a common encoder.
  • Cooperative Output (CO) layer no independent evidence
    purpose: Generates final predictions from the aligned common embeddings
    New component for producing cooperative outputs.

pith-pipeline@v0.9.0 · 5525 in / 1332 out tokens · 36567 ms · 2026-05-08T15:02:56.894839+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 6 canonical work pages · 3 internal anchors

  1. [1]

    Evolutionary optimization of model merging recipes,

    T. Akiba, M. Shing, Y . Tang, Q. Sun, and D. Ha, “Evolutionary optimization of model merging recipes,”Nature Mach. Intell., vol. 7, no. 2, pp. 195–204, Feb. 2025

  2. [2]

    Adaptive mixtures of local experts,

    R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,”Neural Comput., vol. 3, no. 1, pp. 79–87, Mar. 1991

  3. [3]

    Model inversion attacks that exploit confidence information and basic countermeasures,

    M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” inProc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur., 2015, pp. 1322– 1333

  4. [4]

    Towards federated inference: An online model ensemble framework for cooper- ative edge ai,

    Z. Zhou, J. Xie, M. Huang, T. Ouyang, F. Liu, and X. Chen, “Towards federated inference: An online model ensemble framework for cooper- ative edge ai,” inProc. IEEE INFOCOM, 2025, pp. 1–10

  5. [5]

    Fedserving: A federated prediction serving framework based on incentive mechanism,

    J. Weng, J. Weng, H. Huang, C. Cai, and C. Wang, “Fedserving: A federated prediction serving framework based on incentive mechanism,” inProc. IEEE INFOCOM, 2021, pp. 1–10

  6. [6]

    Cooperative edge inferences with online learning,

    M. Li, R. Venkatesha Prasad, and G. Iosifidis, “Cooperative edge inferences with online learning,”IEEE Internet Things J., vol. 12, no. 22, pp. 46 611–46 625, 2025

  7. [7]

    Decentralized low-latency collaborative inference via ensembles on the edge,

    M. Malka, E. Farhan, H. Morgenstern, and N. Shlezinger, “Decentralized low-latency collaborative inference via ensembles on the edge,”IEEE Trans. Wireless Commun., vol. 24, no. 1, pp. 598–614, Jan. 2025

  8. [8]

    Svcca: singular vector canonical correlation analysis for deep learning dynamics and interpretability,

    M. Raghu, J. Gilmer, J. Yosinski, and J. Sohl-Dickstein, “Svcca: singular vector canonical correlation analysis for deep learning dynamics and interpretability,” inProc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 6078–6087. 18

  9. [9]

    Convergent Learning: Do different neural networks learn the same representations?

    Y . Li, J. Yosinski, J. Clune, H. Lipson, and J. Hopcroft, “Convergent learning: Do different neural networks learn the same representations?” arXiv:1511.07543, 2015

  10. [10]

    Neural network ensembles,

    L. K. Hansen and P. Salamon, “Neural network ensembles,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 10, pp. 993–1001, Oct. 1990

  11. [11]

    Ensemble methods in machine learning,

    T. G. Dietterich, “Ensemble methods in machine learning,” inProc. Int. Workshop Multiple Classifier Syst., Dec. 2000, pp. 1–15

  12. [12]

    Communication-Efficient Learning of Deep Networks from Decentralized Data,

    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” inProc. 20th Int. Conf. Artif. Intell. Statist., Apr. 2017, pp. 1273–1282

  13. [13]

    Federated machine learning: Concept and applications,

    Q. Yang, Y . Liu, T. Chen, and Y . Tong, “Federated machine learning: Concept and applications,”ACM Trans. Intell. Sys. Technol., vol. 10, no. 2, pp. 1–19, 2019

  14. [14]

    Federated optimization in heterogeneous networks,

    T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,” inProc. Mach. Learn. Syst., vol. 2, 2020, pp. 429–450

  15. [15]

    Model-contrastive federated learning,

    Q. Li, B. He, and D. Song, “Model-contrastive federated learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2021, pp. 10 713–10 722

  16. [16]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,”arXiv:1503.02531, 2015

  17. [17]

    arXiv preprint arXiv:1804.03235 , year=

    R. Anil, G. Pereyra, A. Passos, R. Ormandi, G. E. Dahl, and G. E. Hinton, “Large scale distributed neural network training through online distillation,”arXiv:1804.03235, 2018

  18. [18]

    Communication-efficient federated learning via knowledge distillation,

    C. Wu, F. Wu, L. Lyu, Y . Huang, and X. Xie, “Communication-efficient federated learning via knowledge distillation,”Nature Commun., vol. 13, no. 1, p. 2032, Apr. 2022

  19. [19]

    Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-iid private data,

    S. Itahara, T. Nishio, Y . Koda, M. Morikura, and K. Ya- mamoto, “Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-iid private data,” IEEE Trans. Mobile Comput., vol. 22, no. 1, pp. 191–205, Jan. 2023

  20. [20]

    A survey on transfer learning,

    S. J. Pan and Q. Yang, “A survey on transfer learning,”IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010

  21. [21]

    Cnn features off-the-shelf: an astounding baseline for recognition,

    A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “Cnn features off-the-shelf: an astounding baseline for recognition,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2014, pp. 806–813

  22. [22]

    One-shot federated learning,

    N. Guha, A. Talwalkar, and V . Smith, “One-shot federated learning,” arXiv:1902.11175, 2019

  23. [23]

    Towards one-shot federated learning: Advances, chal- lenges, and future directions,

    F. Amato, L. Qiu, M. Tanveer, S. Cuomo, D. Annunziata, F. Giampaolo, and F. Piccialli, “Towards one-shot federated learning: Advances, chal- lenges, and future directions,”Neurocomputing, vol. 664, p. 132088, Feb. 2026

  24. [24]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    D. Alexey, B. Lucas, K. Alexander, W. Dirk, Z. Xiaohua, U. Thomas, D. Mostafa, M. Matthias, H. Georg, G. Sylvain, U. Jakob, and H. Neil, “An image is worth 16x16 words: Transformers for image recognition at scale,” inProc. Int. Conf. Learn. Represent., Jan. 2021

  25. [25]

    Distributed learning of deep neural network over multiple agents,

    O. Gupta and R. Raskar, “Distributed learning of deep neural network over multiple agents,”J. Netw. Comput. Appl., vol. 116, pp. 1–8, Aug. 2018

  26. [26]

    Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,

    Y . Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,”ACM SIGARCH Comput. Archit. News, pp. 615–629, Apr. 2017

  27. [27]

    Hybrid ensemble of classifiers using voting,

    I. Gandhi and M. Pandey, “Hybrid ensemble of classifiers using voting,” inProc. Int. Conf. Green Comput. Internet Things, 2015, pp. 399–404

  28. [28]

    S. W. A. Sherazi, J.-W. Bae, and J. Y . Lee, “A soft voting ensemble classifier for early prediction and diagnosis of occurrences of major adverse cardiovascular events for stemi and nstemi during 2-year follow- up in patients with acute coronary syndrome,”PLOS ONE, vol. 16, no. 6, pp. 1–20, Jun. 2021

  29. [29]

    A simple framework for contrastive learning of visual representations,

    T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” inProc. 37th Int. Conf. Mach. Learn., Jul. 2020, pp. 1597–1607

  30. [30]

    Energy-based out-of-distribution detection,

    W. Liu, X. Wang, J. Owens, and Y . Li, “Energy-based out-of-distribution detection,” inProc. 34th Int. Conf. Neural Inf. Process. Syst., 2020, pp. 21 464–21 475

  31. [31]

    Learning word vectors for sentiment analysis,

    A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y . Ng, and C. Potts, “Learning word vectors for sentiment analysis,” inProc. 49th Annu. Meeting Assoc. Comput. Linguistics: Hum. Lang. Technol., Jun. 2011, pp. 142–150

  32. [32]

    Genre classification dataset (imdb),

    hijest, “Genre classification dataset (imdb),” Kaggle, 2020, accessed: Feb. 10, 2026. [Online]. Available: https://www.kaggle.com/datasets/ hijest/genre-classification-dataset-imdb

  33. [33]

    ImageNet Large Scale Visual Recognition Challenge,

    R. Olga, D. Jia, S. Hao, K. Jonathan, S. Sanjeev, M. Sean, H. Zhiheng, K. Andrej, K. Aditya, B. Michael, C. B. Alexander, and F.-F. Li, “ImageNet Large Scale Visual Recognition Challenge,”Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Apr. 2015

  34. [34]

    Learning transferable visual models from natural language supervi- sion,

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervi- sion,” inProc. 38th Int. Conf. Mach. Learn., vol. 139, Jul. 2021, pp. 8748–8763

  35. [35]

    DINOv3

    O. Sim ´eoni, H. V . V o, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V . Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa, F. Massa, D. Haz- iza, L. Wehrstedt, J. Wang, T. Darcet, T. Moutakanni, L. Sentana, C. Roberts, A. Vedaldi, J. Tolan, J. Brandt, C. Couprie, J. Mairal, H. J ´egou, P. Labatut, and P. Bojanowski, “Dinov3,”arXiv:2508.10104, 2025

  36. [36]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., Dec. 2016, pp. 770–778

  37. [37]

    Convnext v2: Co-designing and scaling convnets with masked autoencoders,

    S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie, “Convnext v2: Co-designing and scaling convnets with masked autoencoders,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 16 133–16 142

  38. [38]

    Simple CLIP,

    M. M. Shariatnia, “Simple CLIP,” gitHub repository, 2021, Accessed: Jul. 31, 2025. [Online]. Available: https://github.com/moein-shariatnia/ OpenAI-CLIP

  39. [39]

    Visualizing data using t-sne,

    L. van der Maaten and G. Hinton, “Visualizing data using t-sne,”J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, Nov. 2008

  40. [40]

    Unsplit: Data-oblivious model inversion, model stealing, and label inference attacks against split learning,

    E. Erdo ˘gan, A. K ¨upc ¸¨u, and A. E. C ¸ ic ¸ek, “Unsplit: Data-oblivious model inversion, model stealing, and label inference attacks against split learning,” inProc. 21st ACM Workshop Privacy Electron. Soc., 2022, pp. 115–124

  41. [41]

    Unleashing the tiger: Inference attacks on split learning,

    D. Pasquini, G. Ateniese, and M. Bernaschi, “Unleashing the tiger: Inference attacks on split learning,” inProc. ACM SIGSAC Conf. Comput. Commun. Sec., 2021, pp. 2113–2129

  42. [42]

    A stealthy wrongdoer: Feature-oriented reconstruction attack against split learning,

    X. Xu, M. Yang, W. Yi, Z. Li, J. Wang, H. Hu, Y . Zhuang, and Y . Liu, “A stealthy wrongdoer: Feature-oriented reconstruction attack against split learning,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 12 130–12 139

  43. [43]

    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv:1511.06434, 2015