pith.

arxiv: 2605.15640 · v1 · pith:JUJ3EJG4 · new · submitted 2026-05-15 · 💻 cs.CV

Learning Disentangled Representations for Generalized Multi-view Clustering

Pith reviewed 2026-05-20 18:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords multi-view clustering · disentangled representations · autoencoders · adversarial learning · mutual information · incomplete views · clustering performance

The pith

Dual-path autoencoders separate view-specific and shared features to improve multi-view clustering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Generalized Multi-view Auto-Encoder (GMAE) that learns disentangled representations by routing source features through dual paths, one for view-specific details and one for view-common structure. Adversarial discriminators push the view-specific paths toward more discriminative features, while mutual information modulation keeps the common path aligned across views and prevents it from collapsing. The method is evaluated in both complete and incomplete (partial-view) settings on 13 benchmark datasets, where the authors report tighter clusters than prior fusion approaches.

Core claim

GMAE decouples source features into view-specific and view-common embeddings through dual-path autoencoders. Cross-view adversarial discriminators guide the specific encoders toward more discriminative features, while mutual information modulation aligns distributions across views and avoids trivial solutions, yielding robust embeddings that support higher-quality clustering even when some views are missing.
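The dual-path idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' GMAE code: each view gets a one-layer view-specific encoder and a one-layer view-common encoder, the common embeddings are fused by simple averaging, and each view is reconstructed from the concatenation of its two embeddings. The layer sizes, the averaging fusion, and the names `DualPathView` and `forward` are hypothetical. The letters z, h and c loosely echo the Z, H and C embeddings named in the figure captions, but the roles assigned here to h and c are our assumption.

```python
import random

def make_layer(in_dim, out_dim, rng):
    """Randomly initialised affine layer (weights as a list of rows, zero bias)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

def affine(layer, x):
    """y = W x + b for a layer produced by make_layer."""
    w, b = layer
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, t) for t in v]

class DualPathView:
    """Hypothetical per-view module: a specific path, a common path, and a decoder."""

    def __init__(self, in_dim, d, rng):
        self.specific = make_layer(in_dim, d, rng)     # view-specific encoder
        self.common = make_layer(in_dim, d, rng)       # view-common encoder
        self.decoder = make_layer(2 * d, in_dim, rng)  # decodes [z_v ; h_v]

    def encode(self, x):
        z = relu(affine(self.specific, x))  # view-specific embedding z_v
        h = relu(affine(self.common, x))    # view-common embedding h_v
        return z, h

    def decode(self, z, h):
        return affine(self.decoder, z + h)  # `z + h` concatenates the two lists

def forward(views, xs):
    """Encode all views, average the common embeddings, reconstruct every view."""
    zs, hs = zip(*(view.encode(x) for view, x in zip(views, xs)))
    d = len(hs[0])
    c = [sum(h[k] for h in hs) / len(hs) for k in range(d)]  # fused common embedding
    recons = [view.decode(z, h) for view, z, h in zip(views, zs, hs)]
    return list(zs), c, recons

# Two views of one sample with different input widths (6 and 4), embedding size d = 3.
rng = random.Random(0)
views = [DualPathView(6, 3, rng), DualPathView(4, 3, rng)]
xs = [[rng.random() for _ in range(6)], [rng.random() for _ in range(4)]]
zs, c, recons = forward(views, xs)
mse = [sum((a - b) ** 2 for a, b in zip(r, x)) / len(x) for r, x in zip(recons, xs)]
```

In a trained model the per-view reconstruction error `mse` would be one term of the objective; here the weights are random, so the values only confirm that shapes line up.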

What carries the argument

Dual-path autoencoders that split features into view-specific and view-common embeddings, steered by adversarial discriminators and mutual information modulation.
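The page does not give the losses behind the cross-view adversarial discriminators. As a hedged illustration only, the sketch below uses the standard GAN objectives of Goodfellow et al. ([44] in the reference list): a logistic discriminator D scores embeddings, the discriminator minimises -[log D(real) + log(1 - D(fake))], and the encoder being guided minimises the non-saturating loss -log D(fake). Which embeddings play "real" and "fake" in GMAE is not stated here; pairing view-specific embeddings from two different views, as in the example, is an assumption.

```python
import math

def discriminator(w, b, e):
    """Logistic discriminator: probability that embedding e is 'real'."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * ei for wi, ei in zip(w, e)) + b)))

def discriminator_loss(w, b, real, fake, eps=1e-12):
    """-E[log D(real)] - E[log(1 - D(fake))]; the discriminator minimises this."""
    l_real = -sum(math.log(discriminator(w, b, r) + eps) for r in real) / len(real)
    l_fake = -sum(math.log(1.0 - discriminator(w, b, f) + eps) for f in fake) / len(fake)
    return l_real + l_fake

def encoder_adv_loss(w, b, fake, eps=1e-12):
    """Non-saturating loss -E[log D(fake)]; the guided encoder minimises this."""
    return -sum(math.log(discriminator(w, b, f) + eps) for f in fake) / len(fake)

# Hypothetical pairing: view-1 specific embeddings as 'real', view-2 as 'fake'.
real = [[1.0, 0.2], [0.9, 0.1]]
fake = [[0.1, 0.8], [0.2, 1.0]]
blind = ([0.0, 0.0], 0.0)   # D outputs 0.5 everywhere
sharp = ([4.0, -4.0], 0.0)  # D separates the two groups well
d_blind = discriminator_loss(*blind, real, fake)
d_sharp = discriminator_loss(*sharp, real, fake)
g_blind = encoder_adv_loss(*blind, fake)
g_sharp = encoder_adv_loss(*sharp, fake)
```

A blind discriminator gives 2 log 2 for its own loss and log 2 for the encoder; a sharper discriminator lowers the first and raises the second, which is the pressure that moves the encoder. For very large negative logits `math.exp` can overflow, so a production version would use a numerically stable log-sigmoid.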

Load-bearing premise

That separating view-specific and view-common information through dual autoencoder paths will keep complementary details intact while reducing entanglement during fusion.

What would settle it

The claim would be undermined if clustering accuracy (ACC) or normalized mutual information (NMI) failed to drop, or even rose, when the dual-path split or the mutual information term is removed from the model, evaluated on the same 13 benchmark datasets.
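Both metrics in this test are easy to compute. The sketch below uses the definitions common in the clustering literature: ACC is accuracy under the best one-to-one mapping of cluster ids to class labels, and NMI is mutual information divided by the arithmetic mean of the two entropies (one of several normalisations in use; the paper's exact convention is not stated on this page). The brute-force permutation search is only practical for a handful of clusters; the Hungarian algorithm (for example SciPy's `linear_sum_assignment`) is the usual choice for larger k.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_acc(y_true, y_pred):
    """ACC: accuracy under the best one-to-one mapping of cluster ids to labels.

    Brute force over permutations, so only practical for a handful of clusters.
    """
    labels, clusters = sorted(set(y_true)), sorted(set(y_pred))
    k = max(len(labels), len(clusters))
    labels += [None] * (k - len(labels))      # padding entries never match anything
    clusters += [None] * (k - len(clusters))
    pairs = Counter(zip(y_pred, y_true))      # co-occurrence counts (cluster, label)
    best = max(sum(pairs[(c, lab)] for c, lab in zip(clusters, perm))
               for perm in permutations(labels))
    return best / len(y_true)

def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((m / n) * math.log(m * n / (ca[x] * cb[y])) for (x, y), m in cab.items())

def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalisation: 2 I(U;V) / (H(U) + H(V))."""
    denom = entropy(y_true) + entropy(y_pred)
    if denom == 0.0:  # both labelings are constant, hence identical partitions
        return 1.0
    return 2.0 * mutual_information(y_true, y_pred) / denom

y_true = [0, 0, 1, 1, 2, 2]
y_perm = [2, 2, 0, 0, 1, 1]   # same partition, different cluster ids
y_merge = [0, 0, 0, 1, 1, 1]  # two clusters instead of three
acc_perm, nmi_perm = clustering_acc(y_true, y_perm), nmi(y_true, y_perm)
acc_merge, nmi_merge = clustering_acc(y_true, y_merge), nmi(y_true, y_merge)
```

A relabelled but identical partition scores 1.0 on both metrics, while merging two classes gives ACC = 4/6 here. An ablation would compare these numbers for the full model and for variants with a component removed.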

Figures

Figures reproduced from arXiv: 2605.15640 by Chang Tang, Kunlun He, Ruimeng Liu, Wanqing Li, Xinwang Liu, Xin Zou, Zhenglai Li.

Figure 1: An illustrative example of our motivation. As cluster …
Figure 2: The t-SNE visualization [37] results of feature representations on the STL-10 dataset for different SOTA methods.
Figure 3: The flowchart of our proposed GMAE. Specifically, given the multi-view feature matrix, GMAE first employs …
Figure 4: (a-d) An analysis of the training process, including complete and incomplete multi-view datasets, respectively. (a) Dermatology (ACC) (b) Dermatology (NMI) (c) MSRCV1 (ACC) (d) MSRCV1 (NMI)
Figure 5: Effect of different missing ratios on the ACC/NMI evaluation metrics on the Dermatology and MSRCV1 datasets.
Figure 6: The visualization results of feature representations on …
Figure 7: Visualization of embedded features Q.
Accompanying text from §4.3 (Ablation Study): The Z, H, and C embeddings play distinct roles at different stages in GMAE, reflecting the progressive refinement of feature learning. The t-SNE visualizations in …
Figure 8: Hyperparameter sensitivity analysis. The clustering results (ACC/NMI) vary with different values of α and β. (a) BRCA (ACC) (b) BRCA (NMI) (c) LGG (PUR) (d) LGG (NMI)
Figure 9: Hyperparameter sensitivity analysis on the omics-based MVC datasets across different values of α and β.
Accompanying text: … deeper insights into the evolution and function of these features. The view-specific representations Z are independently generated by the encoders for each view, capturing the unique characteristics of individual view inputs. While these features effectively capture view-specific content, the…
Figure 10: (a-d) Effect of embedding dimension d on a range of evaluation metrics across different MVC datasets.
Accompanying text: … dataset, slight overlaps appear between certain categories, which can be inferred to result from the similarity of cluster features across multiple views within the original dataset. In summary, the integrated features extracted by our proposed GMAE successfully produce dense clusters with well-defined bo…
Original abstract

Multi-View Clustering (MVC) has gained significant attention for its ability to leverage complementary information across diverse views. However, existing deep MVC methods often struggle with view-distribution entanglement during cross-view fusion, which hampers the quality of the shared latent space and leads to suboptimal Figures. To address this issue, we propose the Generalized Multi-view Auto-Encoder (GMAE), a framework designed to preserve cross-view complementarity through disentangled representation learning. Specifically, GMAE employs dual-path autoencoders to decouple source features into view-specific and view-common embeddings, facilitating the discovery of clearer clustering structures. We further construct cross-view adversarial discriminators to guide view-specific encoders in capturing more discriminative features. By strategically modulating mutual information, GMAE effectively aligns distributions and prevents representation collapse, ensuring the generation of robust, non-trivial embeddings. Comprehensive experiments on 13 benchmark datasets demonstrate that GMAE consistently outperforms state-of-the-art methods in both complete and incomplete MVC tasks. Our code implementation is available at the repository: https://github.com/obananas/GMAE.
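The abstract says GMAE modulates mutual information to align distributions and prevent collapse, but does not give the estimator. One widely used formulation with exactly that pair of properties is the Invariant Information Clustering (IIC) objective, sketched here as an assumption and not as GMAE's loss. With soft cluster assignments c and c' from two views, I(c; c') = H(c) - H(c | c'): maximising it rewards cross-view agreement, while the marginal entropy H(c) drops to zero if every sample is pushed into one cluster, so a collapsed solution scores zero.

```python
import math

def iic_mutual_information(p1, p2, eps=1e-12):
    """Mutual information I(c; c') between paired soft cluster assignments.

    p1[n], p2[n]: probability vectors over k clusters for sample n in view 1 / view 2.
    """
    n, k = len(p1), len(p1[0])
    # Joint distribution over cluster pairs, averaged over samples.
    joint = [[sum(a[i] * b[j] for a, b in zip(p1, p2)) / n for j in range(k)]
             for i in range(k)]
    # Symmetrise so the value does not depend on which view comes first.
    joint = [[(joint[i][j] + joint[j][i]) / 2 for j in range(k)] for i in range(k)]
    row = [sum(joint[i]) for i in range(k)]                       # marginal of c
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]  # marginal of c'
    return sum(joint[i][j] * math.log((joint[i][j] + eps) / (row[i] * col[j] + eps))
               for i in range(k) for j in range(k) if joint[i][j] > 0)

k = 3
# Balanced, confident, and identical across views: the best case, I = log k.
agree = [[1.0 if j == i % k else 0.0 for j in range(k)] for i in range(30)]
# Every sample in cluster 0 in both views: perfectly "aligned" but collapsed.
collapsed = [[1.0, 0.0, 0.0] for _ in range(30)]
mi_agree = iic_mutual_information(agree, agree)
mi_collapsed = iic_mutual_information(collapsed, collapsed)
```

The collapsed case agrees perfectly across views yet scores zero, which illustrates why an MI term, unlike a plain alignment loss, can discourage trivial solutions.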

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Generalized Multi-view Auto-Encoder (GMAE) framework for multi-view clustering (MVC). It decouples source features into view-specific and view-common embeddings via dual-path autoencoders, employs cross-view adversarial discriminators to enhance discriminativeness, and uses mutual information modulation to align distributions and prevent representation collapse. The central claim is that this disentanglement preserves cross-view complementarity and yields clearer clustering structures, with consistent outperformance over state-of-the-art methods on 13 benchmark datasets in both complete and incomplete MVC settings. Code is released at a public repository.

Significance. If the disentanglement mechanism and MI modulation are shown to function as described without inadvertently discarding discriminative information, the work could advance deep MVC by providing a constructive way to handle view-distribution entanglement while retaining complementarity. The release of code supports reproducibility and allows direct verification of the reported gains on the 13 datasets.

major comments (2)
  1. [Method (dual-path autoencoders and MI modulation)] The central performance claims on incomplete MVC tasks rest on the effectiveness of mutual information modulation in preserving complementarity without collapse or loss of discriminative information. However, the method description provides no quantitative verification such as measured MI values between view-specific and view-common embeddings or ablation studies removing the modulation term; without these, it remains possible that reported gains arise from increased model capacity rather than the claimed disentanglement.
  2. [Experiments] §4 (Experiments): The abstract states consistent outperformance on 13 datasets for both complete and incomplete settings, yet the evaluation protocols, hyperparameter sensitivity analysis, and controls for post-hoc choices (e.g., clustering algorithm parameters or view selection in incomplete cases) are not detailed. This undermines confidence in the robustness of the cross-dataset superiority claim.
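Major comment 1 asks for measured MI between view-specific and view-common embeddings. To illustrate what such a measurement involves, the sketch below is a plug-in histogram estimator for two scalar variables, for instance one coordinate of a view-specific embedding and one of a view-common embedding across samples. It is a simplification: equal-width binning is biased upward for small samples, and whole embedding vectors would need a multivariate estimator such as the k-nearest-neighbour (KSG) estimator or a neural estimator such as MINE. The Gaussian data below are synthetic stand-ins, not GMAE embeddings.

```python
import math
import random
from collections import Counter

def binned_mi(x, y, bins=8):
    """Plug-in mutual information (nats) between two scalar samples, equal-width bins."""
    def digitize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # guard against a constant variable
        return [min(int((t - lo) / width), bins - 1) for t in v]

    bx, by = digitize(x), digitize(y)
    n = len(x)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum((m / n) * math.log(m * n / (cx[a] * cy[b])) for (a, b), m in cxy.items())

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]        # stand-in specific coordinate
h_indep = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # unrelated common coordinate
h_dep = [t + 0.1 * rng.gauss(0.0, 1.0) for t in z]    # strongly entangled coordinate
mi_indep, mi_dep = binned_mi(z, h_indep), binned_mi(z, h_dep)
```

For well-disentangled embeddings one would hope to see values near the independent baseline `mi_indep` (which is small but not exactly zero because of estimator bias) rather than near the entangled case `mi_dep`.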
minor comments (2)
  1. [Abstract] Abstract: 'suboptimal Figures' appears to be a typo and should be clarified (likely intended as 'results' or 'performance').
  2. [Method] The handling of incomplete views is mentioned but lacks explicit description of how the dual-path architecture and discriminators are adapted when views are missing; a dedicated subsection or figure would improve clarity.
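This page does not describe GMAE's protocol for incomplete views, but incomplete-MVC experiments commonly simulate a given missing ratio (as varied in Figure 5) by marking a fraction of samples as incomplete and dropping a random subset of their views while keeping at least one. The sketch below follows that convention. The function name, and the reading of "missing ratio" as the fraction of incomplete samples, are assumptions; some papers instead count missing sample-view entries.

```python
import random

def missing_view_mask(n_samples, n_views, missing_ratio, seed=0):
    """Build an n_samples x n_views 0/1 availability mask (1 = view observed).

    A fraction `missing_ratio` of samples becomes incomplete; each incomplete
    sample loses a random, non-empty proper subset of its views, so every
    sample keeps at least one view.
    """
    if n_views < 2:
        raise ValueError("need at least two views to drop any")
    rng = random.Random(seed)
    mask = [[1] * n_views for _ in range(n_samples)]
    n_incomplete = round(missing_ratio * n_samples)
    for i in rng.sample(range(n_samples), n_incomplete):
        n_drop = rng.randint(1, n_views - 1)  # drop between 1 and n_views - 1 views
        for v in rng.sample(range(n_views), n_drop):
            mask[i][v] = 0
    return mask

mask = missing_view_mask(n_samples=100, n_views=3, missing_ratio=0.3)
```

A dual-path model would then skip the encoders of masked views for that sample and fuse only the common embeddings that are present.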

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to improve our manuscript. We address each of the major comments point by point below, indicating the revisions we intend to make in the next version.

Point-by-point responses
  1. Referee: [Method (dual-path autoencoders and MI modulation)] The central performance claims on incomplete MVC tasks rest on the effectiveness of mutual information modulation in preserving complementarity without collapse or loss of discriminative information. However, the method description provides no quantitative verification such as measured MI values between view-specific and view-common embeddings or ablation studies removing the modulation term; without these, it remains possible that reported gains arise from increased model capacity rather than the claimed disentanglement.

    Authors: We acknowledge the importance of providing quantitative evidence for the mutual information modulation. In the revised manuscript, we will add ablation studies that isolate the effect of the MI modulation term by removing it and comparing performance. Additionally, we will include measurements of mutual information between the view-specific and view-common embeddings to verify the disentanglement and show that discriminative information is preserved. These additions will help demonstrate that the performance gains stem from the proposed disentanglement mechanism. revision: yes

  2. Referee: [Experiments] §4 (Experiments): The abstract states consistent outperformance on 13 datasets for both complete and incomplete settings, yet the evaluation protocols, hyperparameter sensitivity analysis, and controls for post-hoc choices (e.g., clustering algorithm parameters or view selection in incomplete cases) are not detailed. This undermines confidence in the robustness of the cross-dataset superiority claim.

    Authors: We agree that providing more details on the experimental setup is necessary to support the robustness of our claims. In the revision, we will expand the Experiments section to include a thorough description of the evaluation protocols used across the 13 datasets, a hyperparameter sensitivity analysis, and explicit controls for post-hoc choices including clustering algorithm parameters and view selection procedures in incomplete MVC settings. This will enhance the reproducibility and confidence in the reported results. revision: yes
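Response 2 promises a hyperparameter sensitivity analysis, and Figures 8 and 9 already plot clustering scores over α and β. A minimal harness for such a sweep is sketched below. `evaluate` is a hypothetical caller-supplied function that would train the model with the given weights and return a score; the toy stand-in in the example produces made-up numbers purely to exercise the code, and the search range is likewise hypothetical. Reporting the whole grid, not only the best cell, is what lets a reader judge robustness; choosing the best cell with a label-based score is itself the kind of post-hoc choice the referee asks to be controlled.

```python
from itertools import product

def sensitivity_grid(evaluate, alphas, betas):
    """Score every (alpha, beta) pair; return the full grid and the best pair."""
    grid = {(a, b): evaluate(a, b) for a, b in product(alphas, betas)}
    best = max(grid, key=grid.get)
    return grid, best

def spread(grid):
    """Max minus min score: a crude summary of how sensitive the model is."""
    scores = list(grid.values())
    return max(scores) - min(scores)

# Toy stand-in for "train with (alpha, beta) and return ACC"; NOT real results.
def toy_evaluate(alpha, beta):
    return 0.8 - 0.01 * (alpha - 1.0) ** 2 - 0.5 * (beta - 0.1) ** 2

values = [0.01, 0.1, 1.0, 10.0]  # hypothetical log-spaced search range
grid, best = sensitivity_grid(toy_evaluate, values, values)
```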

Circularity Check

0 steps flagged

No circularity: GMAE is a constructive empirical framework with no derivation chain reducing to self-defined inputs

Full rationale

The paper introduces GMAE as a new architecture employing dual-path autoencoders for disentangling view-specific and view-common embeddings, cross-view adversarial discriminators, and mutual information modulation to address entanglement in multi-view clustering. All claims rest on experimental validation across 13 datasets rather than any first-principles derivation, uniqueness theorem, or parameter fit that is then relabeled as a prediction. No equations or steps in the provided description reduce by construction to quantities defined from the method's own outputs or prior self-citations; the approach is presented as an independent constructive solution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract invokes standard deep-learning assumptions about the benefits of disentanglement and adversarial alignment without introducing new free parameters, axioms, or invented entities beyond the proposed architecture itself.

pith-pipeline@v0.9.0 · 5724 in / 1094 out tokens · 48149 ms · 2026-05-20T18:46:34.422051+00:00 · methodology



Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · 5 internal anchors

[1] M. Kan, S. Shan, H. Zhang, S. Lao, and X. Chen, “Multi-view discriminant analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 188–194, 2015.

[2] K. Sridharan and S. M. Kakade, “An information theoretic framework for multi-view learning,” in COLT, no. 114, 2008, pp. 403–414.

[3] U. Fang, M. Li, J. Li, L. Gao, T. Jia, and Y. Zhang, “A comprehensive survey on multi-view clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12350–12368, 2023.

[4] T. Ding, W. K. Bickel, and S. Pan, “Multi-view unsupervised user feature embedding for social media-based substance use prediction,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2275–2284.

[5] J. Boekel, J. M. Chilton, I. R. Cooke, P. L. Horvatovich, P. D. Jagtap, L. Käll, J. Lehtiö, P. Lukasse, P. D. Moerland, and T. J. Griffin, “Multi-omic data analysis using galaxy,” Nature Biotechnology, vol. 33, no. 2, pp. 137–139, 2015.

[6] X. Zou, C. Tang, W. Zhang, K. Sun, and L. Jiang, “Hierarchical attention learning for multimodal classification,” in 2023 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2023, pp. 936–941.

[7] X. Zou, C. Tang, X. Zheng, Z. Li, X. He, S. An, and X. Liu, “Dpnet: Dynamic poly-attention network for trustworthy multi-modal classification,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 3550–3559.

[8] X. Zou, X. He, X. Zheng, W. Zhang, J. Chen, and C. Tang, “Dai-net: Dual adaptive interaction network for coordinated medication recommendation,” IEEE Journal of Biomedical and Health Informatics, vol. 28, pp. 6201–6211, 2024.

[9] Y. Zhang, J. Yang, J. Tian, Z. Shi, C. Zhong, Y. Zhang, and Z. He, “Modality-aware mutual learning for multi-modal medical image segmentation,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. Springer, 2021, pp. 589–599.

[10] D. J. Trosten, S. Lokse, R. Jenssen, and M. Kampffmeyer, “Reconsidering representation alignment for multi-view clustering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1255–1265.

[11] Z. Li, C. Tang, X. Liu, X. Zheng, W. Zhang, and E. Zhu, “Consensus graph learning for multi-view clustering,” IEEE Transactions on Multimedia, vol. 24, pp. 2461–2472, 2021.

[12] J. Xu, C. Li, L. Peng, Y. Ren, X. Shi, H. T. Shen, and X. Zhu, “Adaptive feature projection with distribution alignment for deep incomplete multi-view clustering,” IEEE Transactions on Image Processing, vol. 32, pp. 1354–1366, 2023.

[13] K. Liang, L. Meng, H. Li, J. Wang, L. Lan, M. Li, X. Liu, and H. Wang, “From concrete to abstract: Multi-view clustering on relational knowledge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–18, 2025.

[14] J. Liu, X. Liu, Y. Yang, L. Liu, S. Wang, W. Liang, and J. Shi, “One-pass multi-view clustering for large-scale data,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 12344–12353.

[15] J. Li, Q. Gao, Q. Wang, M. Yang, and W. Xia, “Orthogonal non-negative tensor factorization based multi-view clustering,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[16] K. Liang, L. Meng, M. Liu, Y. Liu, W. Tu, S. Wang, S. Zhou, X. Liu, F. Sun, and K. He, “A survey of knowledge graph reasoning on graph types: Static, dynamic, and multi-modal,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 9456–9478, 2024.

[17] H. Wang, Y. Yang, and B. Liu, “Gmc: Graph-based multi-view clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 6, pp. 1116–1129, 2019.

[18] X. Zou, C. Tang, X. Zheng, K. Sun, W. Zhang, and D. Ding, “Inclusivity induced adaptive graph learning for multi-view clustering,” Knowledge-Based Systems, vol. 267, p. 110424, 2023.

[19] E. Pan and Z. Kang, “Multi-view contrastive graph clustering,” Advances in Neural Information Processing Systems, vol. 34, pp. 2148–2159, 2021.

[20] C. Tang, Z. Li, J. Wang, X. Liu, W. Zhang, and E. Zhu, “Unified one-step multi-view spectral clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 6, pp. 6449–6460, 2022.

[21] X. Cao, C. Zhang, H. Fu, S. Liu, and H. Zhang, “Diversity-induced multi-view subspace clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 586–594.

[22] H. Gao, F. Nie, X. Li, and H. Huang, “Multi-view subspace clustering,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4238–4246.

[23] C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao, and D. Xu, “Generalized latent multi-view subspace clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 1, pp. 86–99, 2018.

[24] G. Chao, S. Sun, and J. Bi, “A survey on multiview clustering,” IEEE Transactions on Artificial Intelligence, vol. 2, no. 2, pp. 146–168, 2021.

[25] Z. Li, Q. Wang, Z. Tao, Q. Gao, Z. Yang et al., “Deep adversarial multi-view clustering network,” in IJCAI, vol. 2, no. 3, 2019, p. 4.

[26] H. Tang and Y. Liu, “Deep safe incomplete multi-view clustering: Theorem and algorithm,” in International Conference on Machine Learning. PMLR, 2022, pp. 21090–21110.

[27] Y. Xiao, D. Yang, J. Li, X. Zou, H. Zhou, and C. Tang, “Dual alignment feature embedding network for multi-omics data clustering,” Knowledge-Based Systems, vol. 309, p. 112774, 2025.

[28] D. J. Trosten, S. Løkse, R. Jenssen, and M. C. Kampffmeyer, “On the effects of self-supervision and contrastive alignment in deep multi-view clustering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 23976–23985.

[29] C. Cui, Y. Ren, J. Pu, J. Li, X. Pu, T. Wu, Y. Shi, and L. He, “A novel approach for effective multi-view clustering with information-theoretic perspective,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[30] J. Zhu, X. Zou, L. Liu, Z. Huang, Y. Zhang, C. Tang, and L.-R. Dai, “Trusted mamba contrastive network for multi-view clustering,” arXiv preprint arXiv:2412.16487, 2024.

[31] G. Ke, B. Wang, X. Wang, and S. He, “Rethinking multi-view representation learning via distilled disentangling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 26774–26783.

[33] Y. Lin, Y. Gou, Z. Liu, B. Li, J. Lv, and X. Peng, “Completer: Incomplete multi-view clustering via contrastive prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11174–11183.

[34] W. Yan, Y. Zhang, C. Lv, C. Tang, G. Yue, L. Liao, and W. Lin, “Gcfagg: Global and cross-view feature aggregation for multi-view clustering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19863–19872.

[35] S. Wu, Y. Zheng, Y. Ren, J. He, X. Pu, S. Huang, Z. Hao, and L. He, “Self-weighted contrastive fusion for deep multi-view clustering,” IEEE Transactions on Multimedia, 2024.

[36] J. Xu, Y. Ren, X. Wang, L. Feng, Z. Zhang, G. Niu, and X. Zhu, “Investigating and mitigating the side effects of noisy views for self-supervised clustering algorithms in practical multi-view scenarios,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22957–22966.

[37] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.

[38] G. E. Hinton and R. Zemel, “Autoencoders, minimum description length and helmholtz free energy,” Advances in Neural Information Processing Systems, vol. 6, 1993.

[39] J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” in International Conference on Machine Learning. PMLR, 2016, pp. 478–487.

[40] X. Guo, L. Gao, X. Liu, and J. Yin, “Improved deep embedded clustering with local structure preservation,” in IJCAI, vol. 17, 2017, pp. 1753–1759.

[41] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, “Variational deep embedding: An unsupervised and generative approach to clustering,” arXiv preprint arXiv:1611.05148, 2016.

[42] N. Dilokthanakul, P. A. Mediano, M. Garnelo, M. C. Lee, H. Salimbeni, K. Arulkumaran, and M. Shanahan, “Deep unsupervised clustering with gaussian mixture variational autoencoders,” arXiv preprint arXiv:1611.02648, 2016.

[43] J. Xu, Y. Ren, H. Tang, X. Pu, X. Zhu, M. Zeng, and L. He, “Multi-vae: Learning disentangled view-common and view-peculiar visual representations for multi-view clustering,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9234–9243.

[44] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.

[45] ——, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.

[46] R. Corvi, D. Cozzolino, G. Poggi, K. Nagano, and L. Verdoliva, “Intriguing properties of synthetic images: from generative adversarial networks to diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 973–982.

[47] P. Ge, C.-X. Ren, D.-Q. Dai, J. Feng, and S. Yan, “Dual adversarial autoencoders for clustering,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 4, pp. 1417–1424, 2019.

[48] R. Liu, X. Zou, C. Tang, X. Zheng, X. Hu, K. Sun, and X. Liu, “Sparsemvc: Probing cross-view sparsity variations for multi-view clustering,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[49] M.-S. Chen, J.-Q. Lin, X.-L. Li, B.-Y. Liu, C.-D. Wang, D. Huang, and J.-H. Lai, “Representation learning in multi-view clustering: A literature review,” Data Science and Engineering, vol. 7, no. 3, pp. 225–241, 2022.

[50] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.

[51] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio, “Learning deep representations by mutual information estimation and maximization,” arXiv preprint arXiv:1808.06670, 2018.

[52] P. Bachman, R. D. Hjelm, and W. Buchwalter, “Learning representations by maximizing mutual information across views,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[53] Y. Mao, X. Yan, Q. Guo, and Y. Ye, “Deep mutual information maximin for cross-modal clustering,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, 2021, pp. 8893–8901.

[54] C. Zhang, Z. Lou, Q. Zhou, and S. Hu, “Multi-view clustering via triplex information maximization,” IEEE Transactions on Image Processing, 2023.

[55] Y. Lu, Y. Lin, M. Yang, D. Peng, P. Hu, and X. Peng, “Decoupled contrastive multi-view clustering with high-order random walks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 13, 2024, pp. 14193–14201.

[56] Y. Zhou, Q. Zheng, Y. Wang, W. Yan, P. Shi, and J. Zhu, “Mcoco: Multi-level consistency collaborative multi-view clustering,” Expert Systems with Applications, vol. 238, p. 121976, 2024.

[57] I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner, “beta-VAE: Learning basic visual concepts with a constrained variational framework,” ICLR (Poster), vol. 3, 2017.

[58] C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner, “Understanding disentangling in β-VAE,” arXiv preprint arXiv:1804.03599, 2018.

[59] F. Locatello, S. Bauer, M. Lucic, G. Raetsch, S. Gelly, B. Schölkopf, and O. Bachem, “Challenging common assumptions in the unsupervised learning of disentangled representations,” in International Conference on Machine Learning. PMLR, 2019, pp. 4114–4124.

[60] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 29, 2016.

[61] M. Yang, F. Liu, Z. Chen, X. Shen, J. Hao, and J. Wang, “Causalvae: Disentangled representation learning via neural structural causal models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9593–9602.

[62] W. Lee, D. Kim, S. Hong, and H. Lee, “High-fidelity synthesis with disentangled representation,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16. Springer, 2020, pp. 157–174.

  62. [63]

    VMI-VAE: Variational mutual information maximization framework for VAE with discrete and continuous priors

    A. Serdega and D.-S. Kim, “VMI-VAE: Variational mutual information maximization framework for VAE with discrete and continuous priors,” arXiv preprint arXiv:2005.13953, 2020

  63. [64]

    Debiasing graph neural networks via learning disentangled causal substructure

    S. Fan, X. Wang, Y. Mo, C. Shi, and J. Tang, “Debiasing graph neural networks via learning disentangled causal substructure,” Advances in Neural Information Processing Systems, vol. 35, pp. 24934–24946, 2022

  64. [65]

    Multi-level feature learning for contrastive multi-view clustering

    J. Xu, H. Tang, Y. Ren, L. Peng, X. Zhu, and L. He, “Multi-level feature learning for contrastive multi-view clustering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16051–16060

  65. [66]

    Reducing the dimensionality of data with neural networks

    G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006

  66. [67]

    Deep spectral clustering using dual autoencoder network

    X. Yang, C. Deng, F. Zheng, J. Yan, and W. Liu, “Deep spectral clustering using dual autoencoder network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4066–4075

  67. [68]

    Adaptive graph auto-encoder for general data clustering

    X. Li, H. Zhang, and R. Zhang, “Adaptive graph auto-encoder for general data clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 12, pp. 9725–9732, 2021

  68. [69]

    Trustworthy multi-view clustering via alternating generative adversarial representation learning and fusion

    W. Yang, M. Wang, C. Tang, X. Zheng, X. Liu, and K. He, “Trustworthy multi-view clustering via alternating generative adversarial representation learning and fusion,” Information Fusion, vol. 107, p. 102323, 2024

  69. [70]

    ZeroNAS: Differentiable generative adversarial networks search for zero-shot learning

    C. Yan, X. Chang, Z. Li, W. Guan, Z. Ge, L. Zhu, and Q. Zheng, “ZeroNAS: Differentiable generative adversarial networks search for zero-shot learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 12, pp. 9733–9740, 2021

  70. [71]

    Representation learning with contrastive predictive coding

    A. v. d. Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018

  71. [72]

    Mutual information-driven multi-view clustering

    L. Zhang, L. Fu, T. Wang, C. Chen, and C. Zhang, “Mutual information-driven multi-view clustering,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 3268–3277

  72. [73]

    Disentangled multiplex graph representation learning

    Y. Mo, Y. Lei, J. Shen, X. Shi, H. T. Shen, and X. Zhu, “Disentangled multiplex graph representation learning,” in International Conference on Machine Learning. PMLR, 2023, pp. 24983–25005

  73. [74]

    Dual contrastive prediction for incomplete multi-view representation learning

    Y. Lin, Y. Gou, X. Liu, J. Bai, J. Lv, and X. Peng, “Dual contrastive prediction for incomplete multi-view representation learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4447–4461, 2022

  74. [75]

    Deep safe multi-view clustering: Reducing the risk of clustering performance degradation caused by view increase

    H. Tang and Y. Liu, “Deep safe multi-view clustering: Reducing the risk of clustering performance degradation caused by view increase,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 202–211

  75. [76]

    Robust multi-view clustering with incomplete information

    M. Yang, Y. Li, P. Hu, J. Bai, J. Lv, and X. Peng, “Robust multi-view clustering with incomplete information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 1055–1069, 2022

  76. [77]

    DealMVC: Dual contrastive calibration for multi-view clustering

    X. Yang, J. Jiaqi, S. Wang, K. Liang, Y. Liu, Y. Wen, S. Liu, S. Zhou, X. Liu, and E. Zhu, “DealMVC: Dual contrastive calibration for multi-view clustering,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 337–346

  77. [78]

    Deep incomplete multi-view clustering with cross-view partial sample and prototype alignment

    J. Jin, S. Wang, Z. Dong, X. Liu, and E. Zhu, “Deep incomplete multi-view clustering with cross-view partial sample and prototype alignment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11600–11609

  78. [79]

    Self-supervised discriminative feature learning for deep multi-view clustering

    J. Xu, Y. Ren, H. Tang, Z. Yang, L. Pan, Y. Yang, X. Pu, S. Y. Philip, and L. He, “Self-supervised discriminative feature learning for deep multi-view clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 7, pp. 7470–7482, 2023

  79. [80]

    Deep multi-view clustering by contrasting cluster assignments

    J. Chen, H. Mao, W. L. Woo, and X. Peng, “Deep multi-view clustering by contrasting cluster assignments,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 16752–16761

  80. [81]

    Incomplete multi-view clustering via diffusion contrastive generation

    Y. Zhang, Y. Lin, W. Yan, L. Yao, X. Wan, G. Li, C. Zhang, G. Ke, and J. Xu, “Incomplete multi-view clustering via diffusion contrastive generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 21, 2025, pp. 22650–22658
