The Shape of Attraction in UMAP: Exploring the Embedding Forces in Dimensionality Reduction
Pith reviewed 2026-05-23 00:23 UTC · model grok-4.3
The pith
Modifying attractive forces in UMAP improves consistency of cluster formation under random initialization by resolving their dual effects in the embedding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Repulsion emphasizes differences and controls cluster boundaries and inter-cluster distances in the embedding. Attraction is subtler because attractive tension between points can manifest simultaneously as attraction and repulsion in the lower-dimensional mapping. This dual effect explains the need for learning rate annealing and different treatments of the terms. Modifying the attraction term improves the consistency of cluster formation under random initialization.
What carries the argument
The attractive and repulsive forces applied to sampled point pairs in the low-dimensional space, where attractive tension can show up as both attraction and repulsion.
If this is right
- Repulsion controls cluster boundaries and inter-cluster distances.
- Attraction's dual effects explain why learning rate annealing is needed and why attraction is treated differently from repulsion.
- Modifying attraction increases consistency of clusters across random initializations.
- Force analysis gives a mechanistic understanding of UMAP and similar methods.
Where Pith is reading between the lines
- Similar force modifications could be tested in other neighbor embedding methods like t-SNE to check for consistency gains.
- The dual nature of attraction might affect convergence speed in large datasets.
- Users of UMAP could benefit from the modified attraction for more reliable visualizations in exploratory analysis.
Load-bearing premise
The dual effects of attraction can be isolated and changed independently without causing unwanted changes to embedding quality or how fast it converges.
What would settle it
Run multiple UMAP embeddings with the modified attraction on the same dataset but different random starts, and check if cluster assignments or shapes become more similar than with standard UMAP.
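One way to make this check concrete is to align pairs of embeddings and measure what is left over. The sketch below is a minimal, standard-library-only illustration for 2-D embeddings. It assumes a Procrustes-style residual (alignment by translation, uniform scaling, and rotation, without reflection) as the similarity measure, and it assumes that row k of every embedding refers to the same data point. The names `procrustes_distance_2d` and `mean_pairwise_distance` are hypothetical and are not taken from the paper or from any UMAP library.

```python
from itertools import combinations


def procrustes_distance_2d(xs, ys):
    """Normalized residual after aligning ys to xs by translation,
    uniform scaling, and rotation (no reflection). 0 means same shape."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("embeddings must be non-empty and the same length")
    # Represent each 2-D point as a complex number so that rotation plus
    # scaling becomes multiplication by a single complex coefficient.
    z = [complex(a, b) for a, b in xs]
    w = [complex(a, b) for a, b in ys]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    z = [p - z_mean for p in z]  # centering removes translation
    w = [p - w_mean for p in w]
    zz = sum(p.real ** 2 + p.imag ** 2 for p in z)
    ww = sum(p.real ** 2 + p.imag ** 2 for p in w)
    if zz == 0.0 or ww == 0.0:
        raise ValueError("an embedding has collapsed to a single point")
    cross = sum(p * q.conjugate() for p, q in zip(z, w))
    # Least-squares optimum of sum |z - c*w|^2 over complex c, divided by zz.
    aligned = (cross.real ** 2 + cross.imag ** 2) / (zz * ww)
    return max(0.0, 1.0 - aligned)


def mean_pairwise_distance(embeddings):
    """Average distance over all pairs of runs (lower = more consistent)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(procrustes_distance_2d(a, b) for a, b in pairs) / len(pairs)
```

In practice each element of `embeddings` would be the output of one run with a different random start, collected once with the standard attraction and once with the modified one. A lower mean for the modified variant would support the claim.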
Original abstract
Uniform manifold approximation and projection (UMAP) is among the most popular neighbor embedding methods. The method samples pairs of point indices according to similarities in the high-dimensional space, and applies attractive and repulsive forces to their coordinates in the low-dimensional embedding. In this paper, we analyze the forces to reveal their effects on cluster formations and visualization, and compare UMAP to its contemporaries. Repulsion emphasizes differences, controlling cluster boundaries and inter-cluster distance. Attraction is more subtle, as attractive tension between points can manifest simultaneously as attraction and repulsion in the lower-dimensional mapping. This explains the need for learning rate annealing and motivates the different treatments between attractive and repulsive terms. Moreover, by modifying attraction, we improve the consistency of cluster formation under random initialization. Overall, our analysis provides a mechanistic understanding of UMAP and related embedding methods.
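To make the force picture concrete, the sketch below writes out one attractive and one repulsive update for a single sampled pair. It is an illustration only. The coefficients assume the gradient form commonly used with the low-dimensional kernel q = 1 / (1 + a d^(2b)); the defaults a = 1.576 and b = 0.89 are the values quoted elsewhere on this page for default UMAP; `gamma` and `eps` are assumed values; gradient clipping is omitted; and this is not the paper's modified attraction. All function names are hypothetical.

```python
def attraction_coeff(d2, a=1.576, b=0.89):
    """Coefficient multiplying (y_i - y_j) in an attractive update (negative)."""
    if d2 <= 0.0:
        raise ValueError("squared distance must be positive")
    return -2.0 * a * b * d2 ** (b - 1.0) / (1.0 + a * d2 ** b)


def repulsion_coeff(d2, a=1.576, b=0.89, gamma=1.0, eps=1e-3):
    """Coefficient multiplying (y_i - y_j) in a repulsive update (positive)."""
    if d2 < 0.0:
        raise ValueError("squared distance must be non-negative")
    # eps keeps the coefficient finite when the two points coincide.
    return 2.0 * gamma * b / ((eps + d2) * (1.0 + a * d2 ** b))


def step(y_i, y_j, coeff, lr):
    """Move y_i by lr * coeff * (y_i - y_j); y_j is held fixed in this sketch."""
    return [p + lr * coeff * (p - q) for p, q in zip(y_i, y_j)]


def squared_distance(u, v):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v))
```

With `y_i = [1.0, 0.0]` and `y_j = [0.0, 0.0]`, an attractive step with `lr = 0.1` shortens the distance, whereas `lr = 2.0` overshoots past `y_j` and ends farther away than it started. This is one simple way a discrete attractive update can behave like repulsion at a large learning rate; whether it is the same mechanism the paper analyzes is an assumption here.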
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes attractive and repulsive forces in UMAP, claiming repulsion controls cluster boundaries and inter-cluster distances while attraction exhibits dual effects that can simultaneously attract and repel in the embedding space. This duality explains the need for learning-rate annealing and differing treatment of terms. The authors propose a modification to the attraction term that improves consistency of cluster formation under random initialization, providing a mechanistic understanding of UMAP and related methods.
Significance. If the central claims hold, the work supplies a force-level mechanistic account of neighbor embedding that could guide more stable and interpretable dimensionality reduction; the reported consistency gain under random seeds would be a practical contribution if shown to be isolated from side effects on other embedding properties.
major comments (1)
- [Abstract] Abstract (paragraph on force analysis and modification): the central claim that attraction can be modified independently to improve cluster consistency rests on the untested premise that the change leaves overall force balance, negative sampling, and optimization dynamics unaltered; no evidence is supplied that trustworthiness, continuity, or convergence speed remain unchanged.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on the manuscript. We address the major comment regarding the modification to the attraction term below.
Point-by-point responses
- Referee: [Abstract] Abstract (paragraph on force analysis and modification): the central claim that attraction can be modified independently to improve cluster consistency rests on the untested premise that the change leaves overall force balance, negative sampling, and optimization dynamics unaltered; no evidence is supplied that trustworthiness, continuity, or convergence speed remain unchanged.
  Authors: We acknowledge that the current manuscript does not report explicit evaluations of trustworthiness, continuity, or convergence speed under the modified attraction term, nor does it provide a direct analysis of whether the change leaves negative sampling and optimization dynamics unaltered. The reported consistency improvement is demonstrated through cluster formation metrics under random initialization, but the referee is correct that broader embedding quality measures were not compared. In the revised version we will add these experiments, including side-by-side comparisons of trustworthiness and continuity scores as well as convergence behavior, to confirm that the modification does not introduce unintended side effects on the overall force balance.
  Revision: yes
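A side-by-side comparison of this kind could be set up along the following lines. This is a from-scratch sketch of the rank-based trustworthiness score, with continuity obtained by swapping the roles of the two spaces; it is meant only to show what is being measured. The function names are hypothetical, distances are assumed to be Euclidean, ties are broken by index order, and a real evaluation would more likely rely on an established library implementation.

```python
import math


def _neighbor_ranks(points):
    """ranks[i][j] = rank of point j among the neighbors of i (1 = nearest)."""
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        ranks.append({j: r for r, j in enumerate(order, start=1)})
    return ranks


def trustworthiness_score(high, low, k):
    """Penalize points that are k-nearest neighbors in `low` but not in `high`."""
    n = len(high)
    if len(low) != n:
        raise ValueError("both point sets must have the same length")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_high = _neighbor_ranks(high)
    r_low = _neighbor_ranks(low)
    penalty = 0
    for i in range(n):
        for j, rank_low in r_low[i].items():
            # j intrudes into i's embedded neighborhood without belonging there.
            if rank_low <= k and r_high[i][j] > k:
                penalty += r_high[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity_score(high, low, k):
    """Same measure with the spaces swapped: penalizes lost true neighbors."""
    return trustworthiness_score(low, high, k)
```

Computing both scores for embeddings produced with the standard and the modified attraction, on the same data and with the same `k`, would show whether the consistency gain comes at the cost of neighborhood preservation.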
Circularity Check
No circularity: empirical force analysis and modification are independent of inputs
The paper conducts an empirical analysis of attractive and repulsive forces in UMAP embeddings, reports observed dual effects of attraction, and describes a modification that improves cluster consistency under random initialization. No load-bearing step reduces a claimed result to its own fitted parameters or self-citations by construction. The central claims rest on direct observation of embedding behavior and comparison to other methods, remaining falsifiable against external metrics such as trustworthiness and convergence. This is the expected non-finding for an analysis paper without a closed derivation loop.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: UMAP samples pairs according to high-dimensional similarities and applies attractive/repulsive forces in the embedding space
Forward citations
Cited by 1 Pith paper
- MAPLE: Self-Supervised Learning-Enhanced Nonlinear Dimensionality Reduction for Visual Analysis
  MAPLE enhances UMAP via self-supervised MMCRs to untangle complex manifolds, yielding clearer clusters and finer subclusters than standard UMAP at similar cost.
Reference graph
Works this paper leans on
- [1] Ehsan Amid and Manfred K Warmuth. TriMap: Large-scale dimensionality reduction using triplets. arXiv preprint arXiv:1910.00204.
- [2] Cyril de Bodt, Alex Diaz-Papkovich, Michael Bleher, Kerstin Bunte, Corinna Coupette, Sebastian Damrich, Enrique Fita Sanmartin, Fred A Hamprecht, Emőke-Ágnes Horvát, Dhruv Kohli, et al. Low-dimensional embeddings of high-dimensional data. arXiv preprint arXiv:2508.15929.
- [3] Mohammad Tariqul Islam and Jason W Fleischer. Manifold-aligned neighbor embedding. In ICLR 2022 Workshop on Geometrical and Topological Representation Learning.
- [4] Noël Kury, Dmitry Kobak, and Sebastian Damrich. DREAMS: Preserving both local and global structure in dimensionality reduction. arXiv preprint arXiv:2508.13747.
- [5] Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
- [6] Aditya Ravuri and Neil D Lawrence. Towards one model for classical dimensionality reduction: A probabilistic perspective on UMAP and t-SNE. arXiv preprint arXiv:2405.17412.
- [7] Aditya Ravuri, Francisco Vargas, Vidhi Lalchand, and Neil D Lawrence. Dimensionality reduction as probabilistic inference. In ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling.
- [8] Uri Shaham and Stefan Steinerberger. Stochastic neighbor embedding separates well-separated clusters. arXiv preprint arXiv:1702.02670.
- [9] Yingfan Wang, Haiyang Huang, Cynthia Rudin, and Yaron Shaposhnik. Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization. The Journal of Machine Learning Research, 22(1):9129–9201.
- [10] Yang Yang, Hongjian Sun, Jialei Gong, Yali Du, and Di Yu. Interpretable dimensionality reduction by feature preserving manifold approximation and projection. arXiv preprint arXiv:2211.09321.
- [11] with known labels from the PubMed dataset (González-Márquez et al., 2024). The default UMAP provides a crowded structure where labels (colors) overlap (Fig. 7(a)). Figure 7: Visualization of 300K abstracts from the PubMed dataset. (a) Default UMAP (a = 1.576, b = 0.89) shows labels overlapping each other in the mapping, but (b) UMAP w...
- [12] measures similarity between two point clouds {x} and {y} under linear transformations, viz. translation, scaling, and rotation. Operationally, we hold the former fixed and vary the latter until the two sets are in maximum alignment. Let {y′} be the transformation of {y} that achieves this objective. Then the Procrustes distance is given by pd...
- [13] data (Fig. 9 and 10, respectively). The main conclusion remains unchanged: modified and composite attraction shapes, such as those that increase attraction at large distances, significantly improve the consistency of reconstruction. [Figure panels (a)-(d): attraction shapes under random and PCA initialization] ...
- [14] that modify TriMAP’s loss function that works for pairwise interactions. We then tackle t-SNE (Van der Maaten and Hinton, 2008). Then, we provide a short note on SNE (Hinton and Roweis,
- [15] that uses an alternate kernel function and finally end the section by briefly discussing multidimensional scaling (Borg and Groenen, 2007). G.1 TriMap. TriMap (Amid and Warmuth,
- [16] optimizes low-dimensional embedding at different scales. The loss function is $L_P = w_{NB} \sum_{(i,j) \in \mathrm{NN}} \frac{1}{1 + 10 q_{ij}} + w_{MN} \sum_{(i,k) \in \mathrm{MN}} \frac{1}{1 + 10000 q_{ik}} + w_{FP} \sum_{(i,l) \in \mathrm{FP}} \frac{1}{1 + \frac{1}{q_{il}}}$ (56), where $w_{NB}$, $w_{MN}$, and $w_{FP}$ are weights of the nearest neighbor (NN) pairs, mid-near (MN) pairs, and further pairs (FP), respectively (details in Appendix H). The first two te...
- [17] resembles that of TriMap, the repulsion behavior will be the same. Thus, t-SNE’s repulsive forces can give both attraction and repulsion. Since t-SNE’s forces are scaled (by wi,j for attraction and wi,j Z for repulsion), the attractive and repulsive forces are typically lower than those of UMAP. As a result, the algorithm generally uses large values for le...
- [18] is one of the oldest algorithms in this class. This follows the same formula of t-SNE (Eq. 71), but with $q_{i,j} = \exp(-\|y_i - y_j\|_2^2)$. This results in the attraction shape $f_a^{\mathrm{SNE}}(\zeta) = -2$ (80) that follows Proposition 4.1 (with $0 < \lambda w_{i,j} f_a^{\mathrm{SNE}} < -1$) and the repulsion shape $f_r^{\mathrm{SNE}}(\zeta) = 2\exp(-\zeta^2)$ (81) that follows Proposition G.3 (by replacing the t-SNE...
- [19] typically does not use gradient methods, as they often fail to converge to good mappings; instead, it employs stress majorization. Nevertheless, few works discuss gradient methods (Kruskal, 1964; Zheng et al., 2018). We offer a brief treatment for this below. Particularly, Zheng et al. (2018) formulates a successful gradient descent-based MDS algorithm fo...
- [20] with a self-tuning distance measure (Zelnik-Manor and Perona, 2004), $d_{i,j}^2 = \frac{\|x_i - x_j\|^2}{\sigma_i \sigma_j}$ (93), where $\sigma_i$ is the average distance between $x_i$ and its Euclidean nearest fourth to sixth neighbors. The purpose of this is the same as the corresponding $\sigma_i$ parameters in Eq. (87) and (90), despite all three being defined differently. Regardless of these choi...
- [21] We also used the same implementation when we changed the attraction and the repulsion shapes to those of the alternative methods (Figs. 18-34, unless otherwise stated). The trustworthiness and silhouette scores were computed using the corresponding function from the scikit-learn package. The mappings shown in Figs. 2, 3, 9, 10, and 15 are rotated to a refe...