On the continuum limit of t-SNE for data visualization
Pith reviewed 2026-05-10 15:05 UTC · model grok-4.3
The pith
As the dataset size tends to infinity, t-SNE's Kullback-Leibler objective converges, after a natural rescaling, to a continuum variational problem with a non-convex gradient-regularization term and a density penalty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that as the number of data points n → ∞, after a natural rescaling and in applicable parameter regimes, the Kullback-Leibler divergence is consistent with a continuum variational problem involving a non-convex gradient regularization term and a penalty on the magnitude of the probability density in the visualization space. These two terms are the continuum limits of the attraction and repulsion forces in the t-SNE algorithm. Because the problem is non-convex, well-posedness is only partially resolved: when both the data dimension and the embedding dimension are one, the problem admits a unique smooth minimizer along with infinitely many discontinuous minimizers, interpreted in a relaxed sense.
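Concretely, the discrete objective is the pairwise KL divergence between similarity matrices, and the claimed limit pairs a gradient term against a density penalty. The following is a schematic in our own placeholder notation (T, Φ, ψ, σ are illustrative; the paper's precise functional may differ):

```latex
% Discrete t-SNE objective: KL divergence between similarity matrices
C(y_1,\dots,y_n) \;=\; \mathrm{KL}(P \,\|\, Q)
  \;=\; \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} \;=\; \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}.

% Schematic continuum limit (after rescaling): an embedding map T with a
% non-convex gradient regularizer (attraction) plus a penalty on the
% pushforward density \sigma = T_{\#}\rho in the visualization space (repulsion)
E[T] \;=\; \underbrace{\int \Phi\big(\nabla T(x)\big)\,\rho(x)\,dx}_{\text{attraction}}
  \;+\; \underbrace{\int \psi\big(\sigma(y)\big)\,dy}_{\text{repulsion}},
\qquad \Phi \text{ non-convex}.
```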
What carries the argument
The continuum variational problem whose integrand combines a non-convex gradient regularization term with a penalty on the magnitude of the probability density function in the visualization space.
If this is right
- The discrete attraction and repulsion forces of t-SNE correspond exactly to the gradient-regularization and density-penalty terms in the continuum limit.
- In one dimension there exists a unique smooth minimizer and infinitely many discontinuous minimizers, matching the empirically observed ability of t-SNE to separate data in seemingly arbitrary ways.
- The limiting energy is closely related to the Perona-Malik equation, suggesting shared analytical and numerical features with image-denoising problems.
- Numerical experiments with increasing data size confirm that the discrete t-SNE objective approaches the predicted continuum energy.
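The attraction/repulsion decomposition in the first bullet can be made concrete at the discrete level: the exact t-SNE gradient (the standard formula from van der Maaten and Hinton, 2008) splits into an attractive and a repulsive force. A minimal NumPy sketch, not the paper's code:

```python
import numpy as np

def low_dim_similarities(Y):
    """Student-t kernel weights W and normalized similarities Q for an embedding Y (n, d)."""
    diff = Y[:, None, :] - Y[None, :, :]          # pairwise differences y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=2))   # (1 + |y_i - y_j|^2)^{-1}
    np.fill_diagonal(W, 0.0)
    return diff, W, W / W.sum()

def kl_objective(P, Y):
    """Discrete t-SNE objective KL(P || Q), summed over off-diagonal pairs with p_ij > 0."""
    _, _, Q = low_dim_similarities(Y)
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def tsne_forces(P, Y):
    """Exact gradient split: grad of kl_objective w.r.t. Y equals attraction - repulsion."""
    diff, W, Q = low_dim_similarities(Y)
    attraction = 4.0 * np.einsum('ij,ijk->ik', P * W, diff)  # pulls similar points together
    repulsion = 4.0 * np.einsum('ij,ijk->ik', Q * W, diff)   # pushes all points apart
    return attraction, repulsion
```

Here P is the symmetric, globally normalized input-similarity matrix (zero diagonal, entries summing to one); since both forces are internal pairwise interactions, each sums to zero over all points.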
Where Pith is reading between the lines
- The non-convexity of the limit may explain why t-SNE embeddings can depend strongly on initialization and hyper-parameter choices in practice.
- In dimensions greater than one, studying relaxed or measure-valued minimizers could clarify how the method produces distinct cluster separations.
- Numerical schemes developed for the Perona-Malik equation might be adapted to compute or regularize t-SNE embeddings at large scale.
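For reference, the Perona-Malik model invoked above, in its standard image-denoising form (the paper's limiting energy is related to, but not claimed here to be identical with, this functional):

```latex
% Perona-Malik diffusion: the gradient flow of a non-convex energy
\partial_t u \;=\; \operatorname{div}\!\big( g(|\nabla u|^2)\,\nabla u \big),
\qquad g(s) \;=\; \frac{1}{1 + s/\lambda^2},

E(u) \;=\; \frac{\lambda^2}{2} \int_\Omega \log\!\Big( 1 + \frac{|\nabla u|^2}{\lambda^2} \Big)\,dx .
% The integrand \phi(p) = \tfrac{\lambda^2}{2}\log(1 + |p|^2/\lambda^2) is convex for
% |p| < \lambda and concave for |p| > \lambda; this change of convexity is the
% classical source of ill-posedness (backward diffusion across edges).
```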
Load-bearing premise
After natural rescaling the similarity graph remains sparse and the continuum limit of the discrete objective exists in the applicable parameter regimes.
What would settle it
Numerical simulations in which the t-SNE embeddings for successively larger point sets fail to approach the minimizers of the stated continuum energy would falsify the convergence claim.
read the original abstract
This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high dimensional data and its low dimensional representation. We prove that as the number of data points $n \to \infty$, after a natural rescaling and in applicable parameter regimes where the similarity graph remains sparse, the Kullback-Leibler divergence is consistent with a continuum variational problem that involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density function in the visualization space. These two terms represent the continuum limits of the attraction and repulsion forces in the t-SNE algorithm. Due to the lack of convexity in the continuum variational problem, the question of well-posedness is only partially resolved. We show that when both dimensions are $1$, the problem admits a unique smooth minimizer, along with an infinite number of discontinuous minimizers (interpreted in a relaxed sense). This aligns well with the empirically observed ability of t-SNE to separate data in seemingly arbitrary ways in the visualization. The energy is also very closely related to the famously ill-posed Perona-Malik equation, which is used for denoising and simplifying images. We present numerical results validating the continuum limit, provide some preliminary results about the delicate nature of the limiting energetic problem in higher dimensions, and highlight several problems for future work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes the continuum limit of t-SNE by proving that, as n→∞ after natural rescaling and in regimes where the similarity graph remains sparse, the discrete KL divergence between high- and low-dimensional similarities converges to a continuum variational energy consisting of a non-convex gradient regularization term (from attraction) and a density penalty (from repulsion). It establishes that the 1D limit problem admits a unique smooth minimizer together with infinitely many discontinuous minimizers (in a relaxed sense), links the energy to the Perona-Malik equation, supplies numerical validation of the limit, and offers preliminary observations on the higher-dimensional case.
Significance. If the consistency result is fully established, the work supplies a rigorous asymptotic foundation for t-SNE that explains its empirical clustering behavior and connects a widely used visualization method to a known ill-posed variational problem. The explicit 1D uniqueness statement and the numerical checks are concrete strengths; the partial higher-D analysis correctly identifies open questions rather than overclaiming.
major comments (3)
- [Abstract, §1] The consistency statement is conditioned on 'applicable parameter regimes' in which the rescaled similarity graph remains sparse, yet no explicit bounds on perplexity, bandwidth scaling, or embedding dimension are supplied to delineate these regimes. The assumption is load-bearing both for the limit derivation and for applicability to standard fixed-perplexity t-SNE.
- [§4] 1D well-posedness: while uniqueness of the smooth minimizer is shown, the relaxed formulation for discontinuous minimizers is only sketched; a precise statement of the relaxation (which space, which notion of convergence) is needed to confirm that the infinite family of minimizers is actually attained by the discrete t-SNE dynamics.
- [§5] Higher dimensions: the preliminary numerical observations on non-uniqueness and ill-posedness are useful, but the absence of even a partial well-posedness result, or of a clear statement of the technical obstacles beyond non-convexity, leaves the continuum-limit picture incomplete for d > 1.
minor comments (2)
- Notation for the rescaled bandwidth and embedding coordinates should be introduced once and used consistently; several passages reuse symbols without redefinition.
- The numerical section would benefit from a table or plot showing the rate at which the discrete KL approaches the continuum energy as n increases, rather than qualitative agreement only.
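The requested table or plot amounts to an empirical convergence-rate estimate: fit the slope of log|E_n - E_limit| against log n. A generic sketch (the energy values and the reference limit are whatever the experiments produce; nothing here is taken from the paper):

```python
import numpy as np

def empirical_rate(ns, energies, e_limit):
    """Least-squares estimate of p in |E_n - E_limit| ~ C * n**(-p)."""
    ns = np.asarray(ns, dtype=float)
    errs = np.abs(np.asarray(energies, dtype=float) - e_limit)
    slope, _ = np.polyfit(np.log(ns), np.log(errs), 1)  # slope of the log-log fit
    return -slope

def rate_table(ns, energies, e_limit):
    """Rows (n, |E_n - E_limit|) for the suggested table or log-log plot."""
    return [(n, abs(e - e_limit)) for n, e in zip(ns, energies)]
```

On a synthetic sequence with known first-order error, E_n = E_limit + C/n, the estimator recovers rate 1.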
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major point below and indicate the revisions we plan to incorporate.
read point-by-point responses
- Referee ([Abstract, §1]): the consistency statement is conditioned on 'applicable parameter regimes' in which the rescaled similarity graph remains sparse, yet no explicit bounds on perplexity, bandwidth scaling, or embedding dimension are supplied to delineate these regimes. This assumption is load-bearing for the limit derivation and for applicability to standard fixed-perplexity t-SNE.
  Authors: We agree that the manuscript would benefit from a clearer delineation of the regimes. In the revision we will add explicit asymptotic conditions in the abstract and §1 (e.g., perplexity scaling slower than n and bandwidth chosen so that the rescaled graph remains locally finite) under which the sparsity assumption holds for typical data distributions. Deriving fully sharp, distribution-independent bounds is technically demanding and lies beyond the present scope; we will flag this as future work. revision: partial
- Referee ([§4], 1D well-posedness): while uniqueness of the smooth minimizer is shown, the relaxed formulation for discontinuous minimizers is only sketched; a precise statement of the relaxation (e.g., which space and which notion of convergence) is needed to confirm that the infinite family of minimizers is indeed attained by the discrete t-SNE dynamics.
  Authors: The referee correctly identifies that the relaxed formulation is only sketched. We will revise §4 to give a precise statement: the energy is extended to the space of probability measures equipped with weak-* convergence, and we will prove that the relaxed functional is the Gamma-limit of the discrete energies. This makes the connection to the discrete t-SNE dynamics rigorous. revision: yes
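The notion the authors promise is the standard one: a sequence of functionals $E_n$ Gamma-converges to $E$ with respect to a given convergence (here weak-* convergence of probability measures, per the rebuttal) when the following textbook conditions hold:

```latex
% Liminf inequality: E is an asymptotic lower bound
\mu_n \overset{*}{\rightharpoonup} \mu
\;\Longrightarrow\;
\liminf_{n\to\infty} E_n(\mu_n) \;\ge\; E(\mu).

% Recovery sequence: the lower bound is attained
\forall \mu \;\; \exists\,(\mu_n) \text{ with } \mu_n \overset{*}{\rightharpoonup} \mu
\;\text{ and }\;
\limsup_{n\to\infty} E_n(\mu_n) \;\le\; E(\mu).
```

Together with equi-coercivity, Gamma-convergence yields convergence of minimum values and of minimizing sequences, which is exactly what would tie the discrete t-SNE minimizers to the relaxed limit.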
- Referee ([§5], higher dimensions): the preliminary numerical observations on non-uniqueness and ill-posedness are useful, but the absence of even a partial well-posedness result or a clear statement of the technical obstacles (beyond non-convexity) leaves the central claim about the continuum limit incomplete for d>1.
  Authors: We acknowledge that the higher-dimensional analysis remains preliminary. In the revision we will expand §5 with an explicit list of obstacles (lack of a maximum principle, possible mass concentration, and failure of standard compactness arguments) and will state that no partial well-posedness result is currently available for d>1. The continuum-limit derivation itself is independent of well-posedness and holds under the same sparsity assumption. revision: partial
Circularity Check
No circularity: direct asymptotic analysis of discrete KL to continuum energy
full rationale
The paper derives the continuum limit of the t-SNE objective via standard asymptotic analysis of the discrete Kullback-Leibler divergence as n→∞ under rescaling, showing convergence to a variational problem with gradient regularization and density penalty terms. This proceeds from the finite-n graph-based objective to integral functionals without any parameter fitting, self-referential definitions, or load-bearing self-citations; the 1D well-posedness is established directly from the limiting energy, and higher-dimensional issues are left open. The sparsity assumption defines the regime of applicability but does not reduce the claimed consistency result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: data points are sampled from a continuous probability density, so that the continuum limit of the similarity graph is well defined.
- Domain assumption: the similarity-kernel bandwidth and perplexity parameters are chosen so that the graph remains sparse in the limit.
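The second assumption concerns how the kernel parameters are set. In standard t-SNE each point's Gaussian bandwidth is calibrated by bisection so that its conditional similarity row has a prescribed perplexity; the sparsity regime then depends on how that perplexity scales with n. A sketch of the standard calibration (not the paper's code):

```python
import numpy as np

def perplexity_calibration(D2, perplexity=30.0, tol=1e-5, max_iter=64):
    """Per-point precisions beta_i chosen by bisection so each conditional row
    p_{j|i} = exp(-beta_i * d_ij) / Z_i has entropy H_i = log(perplexity).
    D2 is the (n, n) matrix of squared distances."""
    n = D2.shape[0]
    target = np.log(perplexity)              # target Shannon entropy (nats)
    P = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i
        d = D2[i, idx]
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            p = np.exp(-beta * (d - d.min()))   # shift for numerical stability
            p /= p.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(H - target) < tol:
                break
            if H > target:                   # too spread out: raise the precision
                lo, beta = beta, beta * 2.0 if hi == np.inf else 0.5 * (beta + hi)
            else:                            # too concentrated: lower the precision
                hi, beta = beta, 0.5 * (lo + beta)
        P[i, idx] = p
    return P
```

Entropy decreases monotonically in beta, so the bisection converges; each row of the returned matrix is a probability distribution whose exponentiated entropy matches the target perplexity.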
Reference graph
Works this paper leans on
- [1] S. Arora, W. Hu, and P. K. Kothari. An analysis of the t-SNE algorithm for data visualization. In Conference on Learning Theory, pages 1455–1462. PMLR, 2018.
- [2] A. Auffinger and D. Fletcher. Equilibrium distributions for t-distributed stochastic neighbour embedding. arXiv preprint arXiv:2304.03727, 2023.
- [3] M. Bardi, I. C. Dolcetta, et al. Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, volume 12. Springer, 1997.
- [4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
- [5]
- [6] M. Bocker, M. G. Grushko, and K. E. Arline. Toward improved cancer classification using PCA + tSNE dimensionality reduction on bulk RNA-seq data. Cancer Research, 82(12_Supplement):2708–2708, 2022.
- [7] S. Boucheron, G. Lugosi, and O. Bousquet. Concentration inequalities. In Summer School on Machine Learning, pages 208–240. Springer, 2003.
- [8] L. Bungert, J. Calder, M. Mihailescu, K. Houssou, and A. Yuan. Convergence rates for Poisson learning to a Poisson equation with measure data. arXiv preprint arXiv:2407.06783, 2024.
- [9] A. Burchard. A short course on rearrangement inequalities. Lecture notes, IMDEA Winter School, Madrid, 2009.
- [10] T. T. Cai and R. Ma. Theoretical foundations of t-SNE for visualizing high-dimensional clustered data. Journal of Machine Learning Research, 23(301):1–54, 2022.
- [11] J. Calder. Lecture Notes on Viscosity Solutions. Online lecture notes, 2018.
- [12] J. Calder. The Calculus of Variations. Online lecture notes, 2020.
- [13] J. Calder and A. Mansouri. Anisotropic image sharpening via well-posed Sobolev gradient flows. SIAM Journal on Mathematical Analysis, 43(4):1536–1556, 2011.
- [14] J. Calder and P. J. Olver. Linear Algebra, Data Science, and Machine Learning. Springer, 2025.
- [15] F. Catté, P.-L. Lions, J.-M. Morel, and T. Coll. Image selective smoothing and edge detection by nonlinear diffusion. SIAM Journal on Numerical Analysis, 29(1):182–193, 1992.
- [16] M. C. Cieslak, A. M. Castelfranco, V. Roncalli, P. H. Lenz, and D. K. Hartline. t-Distributed Stochastic Neighbor Embedding (t-SNE): a tool for eco-physiological transcriptomic analysis. Marine Genomics, 51:100723, 2020.
- [17] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
- [18] B. Dacorogna. Direct Methods in the Calculus of Variations. Springer, 2008.
- [19] S. Damrich and F. A. Hamprecht. On UMAP's true loss function. Advances in Neural Information Processing Systems, 34:5798–5809, 2021.
- [20] S. Esedoglu. Stability properties of the Perona–Malik scheme. SIAM Journal on Numerical Analysis, 44(3):1297–1313, 2006.
- [21] S. Esedoglu. An analysis of the Perona–Malik scheme. Communications on Pure and Applied Mathematics, 54(12):1442–1487, 2001.
- [22] L. C. Evans. Measure Theory and Fine Properties of Functions. Chapman and Hall/CRC, 2nd edition, 2025.
- [23] P. Guidotti. A backward–forward regularization of the Perona–Malik equation. Journal of Differential Equations, 252(4):3226–3244, 2012.
- [24] G. E. Hinton and S. Roweis. Stochastic neighbor embedding. Advances in Neural Information Processing Systems, 15, 2002.
- [25] S. Jeong and H.-T. Wu. Convergence analysis of t-SNE as a gradient flow for point cloud on a manifold. arXiv preprint arXiv:2401.17675, 2024.
- [26] S. Kichenassamy. The Perona–Malik paradox. SIAM Journal on Applied Mathematics, 57(5):1328–1342, 1997.
- [27] S. Kim and B. Yan. Convex integration and infinitely many weak solutions to the Perona–Malik equation in all dimensions. SIAM Journal on Mathematical Analysis, 47(4):2770–2794, 2015.
- [28] D. Kobak and P. Berens. The art of using t-SNE for single-cell transcriptomics. Nature Communications, 10(1):5416, 2019.
- [29] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- [30] G. C. Linderman, M. Rachh, J. G. Hoskins, S. Steinerberger, and Y. Kluger. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods, 16(3):243–245, 2019.
- [31] G. C. Linderman and S. Steinerberger. Clustering with t-SNE, provably. SIAM Journal on Mathematics of Data Science, 1(2):313–332, 2019.
- [32] G. C. Linderman and S. Steinerberger. Dimensionality reduction via dynamical systems: the case of t-SNE. SIAM Review, 64(1):153–178, 2022.
- [33] J. Lu and J. Calder. Attraction-repulsion swarming: a generalized framework of t-SNE via force normalization and tunable interactions. Philosophical Transactions of the Royal Society A, special issue on Partial Differential Equations in Data Science, 2025.
- [34] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
- [35] J. Mandel, R. Avula, and E. V. Prochownik. Sequential analysis of transcript expression patterns improves survival prediction in multiple cancers. BMC Cancer, 20:1–14, 2020.
- [36] G. Marino and S. Mosconi. Lipschitz regularity for solutions of a general class of elliptic equations. Calculus of Variations and Partial Differential Equations, 63(1):25, 2024.
- [37] L. McInnes, J. Healy, and J. Melville. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
- [38] A. Miyamoto, M. V. Bendarkar, and D. N. Mavris. Natural language processing of aviation safety reports to identify inefficient operational patterns. Aerospace, 9(8):450, 2022.
- [39] R. Murray and A. Pickarski. Large data limits and scaling laws for tSNE. arXiv preprint arXiv:2410.13063, 2024.
- [40] F. Otto. The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations, 26(1-2):101–174, 2001.
- [41] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629–639, 1990.
- [42] N. Sharma, M. Mangla, S. N. Mohanty, and S. Satpaty. A stochastic neighbor embedding approach for cancer prediction. In 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), pages 599–603. IEEE, 2021.
- [43] N. Sharma and S. Sharma. Optimization of t-SNE by tuning perplexity for dimensionality reduction in NLP. In International Conference on Communication and Computational Technologies, pages 519–528. Springer, 2023.
- [44] E. M. Stein. Singular Integrals and Differentiability Properties of Functions. Number 30. Princeton University Press, 1970.
- [45] S. Steinerberger and Y. Zhang. t-SNE, forceful colorings, and mean field limits. Research in the Mathematical Sciences, 9(3):42, 2022.
- [46] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
- [47] M. Wattenberg, F. Viégas, and I. Johnson. How to use t-SNE effectively. Distill, 1(10):e2, 2016.