pith. machine review for the scientific record.

arxiv: 1706.02582 · v1 · submitted 2017-06-08 · 💻 cs.LG · stat.ML

Recognition: unknown

Clustering with t-SNE, provably

classification 💻 cs.LG stat.ML
keywords t-sne, clustering, maaten, embedding, exaggeration, hinton, proposed, prove

t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the 'early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter $\alpha$ and step size $h$. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.
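
To make the roles of the exaggeration parameter $\alpha$ and the step size $h$ concrete, the following is a minimal sketch of one gradient-descent step of t-SNE with exaggerated affinities, written in plain Python. It assumes the standard t-SNE gradient with every input affinity $p_{ij}$ multiplied by $\alpha$. The function names, the toy block-structured affinity matrix (used here instead of perplexity-based affinities), and the values `alpha=4.0` and `h=0.1` are illustrative assumptions, not the parameter rules derived in the paper.

```python
import math
import random


def block_affinities(labels):
    """Toy affinities (an assumption): uniform within a cluster, zero across."""
    n = len(labels)
    same = [[i != j and labels[i] == labels[j] for j in range(n)] for i in range(n)]
    pairs = sum(map(sum, same))
    return [[1.0 / pairs if same[i][j] else 0.0 for j in range(n)] for i in range(n)]


def tsne_exaggerated_step(Y, P, alpha, h):
    """One gradient-descent step on the t-SNE objective with P scaled by alpha."""
    n, dim = len(Y), len(Y[0])
    # Student-t kernel weights w_ij = 1 / (1 + |y_i - y_j|^2), zero on the diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                W[i][j] = 1.0 / (1.0 + d2)
    Z = sum(map(sum, W))  # normaliser, so that q_ij = w_ij / Z
    new_Y = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            # Attraction (alpha * p_ij) minus repulsion (q_ij), both weighted by w_ij.
            coeff = 4.0 * (alpha * P[i][j] - W[i][j] / Z) * W[i][j]
            for k in range(dim):
                grad[k] += coeff * (Y[i][k] - Y[j][k])
        new_Y.append([Y[i][k] - h * grad[k] for k in range(dim)])
    return new_Y


def mean_within_cluster_distance(Y, labels):
    """Average embedded distance between points that share a label."""
    n = len(Y)
    dists = [math.dist(Y[i], Y[j])
             for i in range(n) for j in range(i + 1, n)
             if labels[i] == labels[j]]
    return sum(dists) / len(dists)


random.seed(0)
labels = [0, 0, 0, 1, 1, 1]  # two toy clusters of three points each
P = block_affinities(labels)
Y = [[random.uniform(-0.01, 0.01) for _ in range(2)] for _ in labels]

spread_before = mean_within_cluster_distance(Y, labels)
for _ in range(50):
    Y = tsne_exaggerated_step(Y, P, alpha=4.0, h=0.1)
spread_after = mean_within_cluster_distance(Y, labels)

print("within-cluster spread before:", spread_before)
print("within-cluster spread after: ", spread_after)
```

In this toy setting the attractive term pulls each point toward the other points of its cluster at a rate governed by the product of `alpha` and `h`, which is why the two parameters are naturally discussed together. The sketch only illustrates the mechanics of the update and makes no claim about the guarantees proved in the paper.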



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DR-SNE: Density-Regularized Stochastic Neighbor Embedding

    cs.LG 2026-05 unverdicted novelty 6.0

    DR-SNE augments the SNE objective with a density regularization term from normalized log-density estimates to preserve relative densities while retaining neighborhood structure.