On the Provable Importance of Gradients for Language-Assisted Image Clustering

· 2025 · cs.CV · arXiv 2510.16335

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

This paper investigates the recently emerged problem of Language-assisted Image Clustering (LaIC), where textual semantics are leveraged to improve the discriminability of visual representations and thereby facilitate image clustering. Because true class names are unavailable, one of the core challenges of LaIC is how to filter positive nouns, i.e., those semantically close to the images of interest, from unlabeled wild corpus data. Existing filtering strategies are predominantly based on the off-the-shelf feature space learned by CLIP; however, despite being intuitive, these strategies lack a rigorous theoretical foundation. To fill this gap, we propose a novel gradient-based framework, termed GradNorm, which is theoretically guaranteed and shows strong empirical performance. In particular, we measure the positiveness of each noun based on the magnitude of gradients back-propagated from the cross-entropy between the predicted target distribution and the softmax output. Theoretically, we provide a rigorous error bound to quantify the separability of positive nouns by GradNorm and prove that GradNorm naturally subsumes existing filtering strategies as extremely special cases. Empirically, extensive experiments show that GradNorm achieves state-of-the-art clustering performance on various benchmarks. Code is publicly available at https://github.com/60pen9/On-the-Provable-Importance-of-Gradients-for-Language-Assisted-Image-Clustering.
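The gradient-based scoring idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how the target distribution is built or which parameters the gradient is taken with respect to. The sketch assumes (hypothetically) that the target is a sharpened copy of the softmax prediction held constant, and that each noun is scored by the norm of the cross-entropy gradient with respect to its own normalized text embedding. The function name `noun_gradient_scores` and the parameters `temperature` and `sharpen` are illustrative choices, not names from the paper.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def noun_gradient_scores(image_feats, noun_feats, temperature=0.5, sharpen=0.5):
    """Score each candidate noun by a gradient magnitude (illustrative only).

    Assumptions (not taken from the paper):
      * logits are cosine similarities divided by `temperature`;
      * the target q is the prediction p raised to 1/sharpen and renormalized,
        treated as a constant when differentiating;
      * the score of noun j is the L2 norm of d(loss)/d(noun embedding j),
        with the loss summed over all images.
    """
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = noun_feats / np.linalg.norm(noun_feats, axis=1, keepdims=True)

    logits = img @ txt.T / temperature           # shape (n_images, n_nouns)
    p = softmax(logits, axis=1)                  # softmax output

    q = p ** (1.0 / sharpen)                     # assumed sharpened target
    q = q / q.sum(axis=1, keepdims=True)

    # For cross-entropy -sum_j q_j log p_j with q fixed, d(loss)/d(logits) = p - q.
    grad_logits = p - q

    # Chain rule through logits[i, j] = img[i] . txt[j] / temperature.
    grad_txt = grad_logits.T @ img / temperature  # shape (n_nouns, dim)
    return np.linalg.norm(grad_txt, axis=1)


# Toy data: two groups of images and four candidate nouns.
images = np.array([[1.0, 0.1, 0.0], [1.0, -0.1, 0.0],
                   [0.1, 1.0, 0.0], [-0.1, 1.0, 0.0]])
nouns = np.array([[1.0, 0.0, 0.0],    # aligned with the first image group
                  [0.0, 1.0, 0.0],    # aligned with the second image group
                  [0.0, 0.0, 1.0],    # orthogonal to all images
                  [-1.0, -1.0, 0.0]]) # pointing away from all images

scores = noun_gradient_scores(images, nouns)
print(scores)
```

On this toy data the two nouns aligned with the image groups receive the largest scores, which is the qualitative behavior the abstract describes. Any threshold or top-k rule for turning scores into a filtered noun set would be a further assumption.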

representative citing papers

Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Debiased negative mining via Monte-Carlo sampling from ID labels and unlabeled wild data improves OOD detection with VLMs and achieves new state-of-the-art results.

citing papers explorer

Showing 1 of 1 citing paper.

Debiased Negative Mining Improves Out-of-distribution Detection with Pre-trained Vision-Language Models

cs.LG · 2026-05-22 · unverdicted · none · ref 48 · internal anchor

Debiased negative mining via Monte-Carlo sampling from ID labels and unlabeled wild data improves OOD detection with VLMs and achieves new state-of-the-art results.
