Double Self-weighted Multi-view Clustering via Adaptive View Fusion
Multi-view clustering is used in many real-world applications where the original data are often noisy. Several graph-based multi-view clustering methods have been proposed to reduce the negative influence of noise. However, previous graph-based multi-view clustering methods treat all features equally even when some features are redundant or noisy, which is unreasonable. In this paper, we propose a novel multi-view clustering framework, Double Self-weighted Multi-view Clustering (DSMC), to overcome this deficiency. DSMC performs two self-weighting operations to remove redundant features and noise from each graph, thereby obtaining robust graphs. The first self-weighting operation assigns different weights to different features through an adaptive weight matrix, which reinforces the role of the important features in the joint representation and makes each graph robust. The second self-weighting operation weights the graphs themselves through an adaptive weight factor, which assigns larger weights to more robust graphs. Furthermore, an adaptive multiple-graph fusion step fuses the features of the different graphs and integrates these graphs for clustering. Experiments on six real-world datasets demonstrate its advantages over other state-of-the-art multi-view clustering methods.
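The abstract gives no formulas, so the following is only a rough sketch of the two weighting levels and the fusion step, not the DSMC objective itself. Everything in it is an assumption made for illustration: the function names, the variance-share heuristic standing in for the adaptive feature weight matrix, the Gaussian affinity, and the inverse-distance rule used for the graph weights.

```python
import numpy as np

def feature_weights(X):
    # Hypothetical stand-in for the adaptive feature weight matrix:
    # each feature is weighted by its share of the total variance.
    var = X.var(axis=0)
    return var / var.sum()

def affinity(X, w, sigma=1.0):
    # Row-normalized Gaussian affinity graph built on feature-weighted data.
    Xw = X * np.sqrt(w)
    sq = ((Xw[:, None, :] - Xw[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A / A.sum(axis=1, keepdims=True)

def fuse(graphs, n_iter=10, eps=1e-12):
    # Assumed graph-weighting rule: a graph closer to the current fused
    # graph S receives a larger weight, alpha_v ~ 1 / (2 * ||S - A_v||_F).
    alpha = np.full(len(graphs), 1.0 / len(graphs))
    for _ in range(n_iter):
        S = sum(a * G for a, G in zip(alpha, graphs))
        dist = np.array([np.linalg.norm(S - G) for G in graphs])
        alpha = 1.0 / (2.0 * dist + eps)
        alpha /= alpha.sum()
    S = sum(a * G for a, G in zip(alpha, graphs))
    return S, alpha

# Toy usage: three random views of the same 12 samples.
rng = np.random.default_rng(0)
views = [rng.normal(size=(12, d)) for d in (3, 5, 4)]
graphs = [affinity(X, feature_weights(X)) for X in views]
S, alpha = fuse(graphs)
```

The fused graph S could then be handed to any graph-based clustering routine, such as spectral clustering. In the paper the feature weights, graph weights, and fused graph are presumably learned jointly, whereas this sketch computes them in separate passes.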
Forward citations
Cited by 9 Pith papers
- Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval Using Language
  OpenVMR uses normalizing flow to detect out-of-distribution queries and performs moment retrieval only on in-distribution queries.
- Annotations Are Not All You Need: A Cross-modal Knowledge Transfer Network for Unsupervised Temporal Sentence Grounding
  A cross-modal knowledge transfer network performs unsupervised temporal sentence grounding by adapting appearance knowledge from Image-Noun pairs and action knowledge from Video-Verb pairs using a copy-paste refinement.
- Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval
  Reaction-Diffusion Multimodal Fusion (RDMF) applies the Gray-Scott model to video-text alignment for language-guided moment retrieval, claiming better adaptive modeling than static attention.
- Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs
  Proposes the first unified incomplete video-language model that processes missing modalities and serves as a plug-and-play module to boost existing VLMs on multi-modal tasks.
- Rethinking Video-Language Model from the Language Input Perspective
  Introduces a plug-and-play framework that generates varied texts and uses attribute reasoning plus video-guided loss to improve state-of-the-art Video-Language Models.
- SLAP: The Semantic Least Action Principle for Variational Video-Language Modeling
  SLAP reframes video interpolation as a variational mechanics boundary value problem on a semantic manifold to enforce object persistence without pixel rendering.
- Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness
  Immuno-VLM generates semantic antibodies via LLMs to bound VLM decision spaces for open-world trustworthiness and reports new SOTA results on ImageNet-1K plus four OOD benchmarks.
- Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security
  APD framework disentangles adversarial prompts via mutual information decomposition, spectral graph analysis, and a trained classifier to cut harmful LLM outputs by over 85%.
- CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning
  CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claimin...