pith. sign in

arxiv: 2503.09101 · v4 · submitted 2025-03-12 · 💻 cs.LG · cs.AI· cs.CV

The Shape of Attraction in UMAP: Exploring the Embedding Forces in Dimensionality Reduction

Pith reviewed 2026-05-23 00:23 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CV
keywords UMAPdimensionality reductionembedding forcesattractive forcesrepulsive forcescluster formationvisualizationmanifold learning
0
0 comments X

The pith

Modifying attractive forces in UMAP improves consistency of cluster formation under random initialization by resolving their dual effects in the embedding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the attractive and repulsive forces that UMAP uses to create low-dimensional embeddings from high-dimensional data. It finds that repulsion mainly sets cluster boundaries and distances between clusters, while attraction has a more complex role where tension between points can appear as both pulling together and pushing apart in the final map. This insight explains why learning rates are annealed and why attraction and repulsion are treated differently, and it leads to a modification of attraction that makes cluster results more stable no matter the starting positions.

Core claim

Repulsion emphasizes differences and controls cluster boundaries and inter-cluster distances in the embedding. Attraction is subtler because attractive tension between points can manifest simultaneously as attraction and repulsion in the lower-dimensional mapping. This dual effect explains the need for learning rate annealing and different treatments of the terms. Modifying the attraction term improves the consistency of cluster formation under random initialization.

What carries the argument

The attractive and repulsive forces applied to sampled point pairs in the low-dimensional space, with attraction showing dual manifestation effects.

If this is right

  • Repulsion controls cluster boundaries and inter-cluster distances.
  • Attraction's dual effects require separate treatment from repulsion and learning rate annealing.
  • Modifying attraction increases consistency of clusters across random initializations.
  • UMAP and similar methods gain a mechanistic understanding from force analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar force modifications could be tested in other neighbor embedding methods like t-SNE to check for consistency gains.
  • The dual nature of attraction might affect convergence speed in large datasets.
  • Users of UMAP could benefit from the modified attraction for more reliable visualizations in exploratory analysis.

Load-bearing premise

The dual effects of attraction can be isolated and changed independently without causing unwanted changes to embedding quality or how fast it converges.

What would settle it

Run multiple UMAP embeddings with the modified attraction on the same dataset but different random starts, and check if cluster assignments or shapes become more similar than with standard UMAP.

Figures

Figures reproduced from arXiv: 2503.09101 by Jason W. Fleischer, Mohammad Tariqul Islam.

Figure 1
Figure 1. Figure 1: Attraction and repulsion shapes in UMAP. (a) Effect of different values of [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Effect of random initialization on different attraction shapes for the MNIST dataset. (a) Mapping using PCA. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Control of inter-cluster distances on the MNIST dataset. (a) Computing [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Embedding of the MNIST dataset with (a) default attraction shape but repulsion shape with [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Values of g(ζ, b, a) for (a) a fixed at 1.576 and (b) b fixed at 0.89. (a) (b) (d)(c) Near optimal operating point Constant Learning Rate, 1.0 Constant Learning Rate, 0.01 Constant Learning Rate, 0.1 [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Effect of constant learning rate in embeddings. (a) When the learning rate is too high ( [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visualization of 300K abstracts from the PubMed dataset. (a) Default UMAP ( [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Effect of sorting the diagonal of Procrustes matrix on its visualization. (a) Procrustes matrix reproduced [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Effect of random UMAP initialization on different attraction shapes on FMNIST data. (a) Mapping using [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Effect of random UMAP initialization on different attraction shapes on single-cell transcriptomes data. (a) [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Sensitivity of UMAP and NEG-t-SNE to learning rate on the MNIST dataset. (a) Attraction and (b) repulsion shapes for UMAP (a = 1, b = 1) and NEG-t-SNE. (c,d) UMAP is very sensitive to the learning rate λ, as f U a < −1 as the separation distance ζ decreases. Thus, without annealing, the clusters become fuzzy. (e,f) NEG-t-SNE is less sensitive to λ as f N a ∈ [−1, 0] always, and the clusters are thus less … view at source ↗
Figure 12
Figure 12. Figure 12: (a) Attraction shapes of TriMap for different [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Attraction and repulsion behavior in TriMap. (a) [PITH_FULL_IMAGE:figures/full_fig_p022_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Attraction and repulsion shapes of PaCMAP. (a,b) Attraction shapes for (a) nearest-neighbor and (b) mid-near [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: PaCMAP behavior for different conditions. (a) PaCMAP of MNIST data with PCA initialization. (b,c) [PITH_FULL_IMAGE:figures/full_fig_p024_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Behavior of LocalMAP on MNIST data. (a) Default embedding. (b) Attraction-repulsion shape of nearest [PITH_FULL_IMAGE:figures/full_fig_p025_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: (a) Attraction-repulsion characteristic of MDS using [PITH_FULL_IMAGE:figures/full_fig_p028_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: (a) Trustworthiness and (b) Silhouette score of different methods for the MNIST dataset. [PITH_FULL_IMAGE:figures/full_fig_p030_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Varying [PITH_FULL_IMAGE:figures/full_fig_p031_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Varying [PITH_FULL_IMAGE:figures/full_fig_p032_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: Varying [PITH_FULL_IMAGE:figures/full_fig_p033_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Varying [PITH_FULL_IMAGE:figures/full_fig_p034_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: (a) Trustworthiness and (b) Silhouette score of different methods for the FMNIST dataset. [PITH_FULL_IMAGE:figures/full_fig_p035_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Varying [PITH_FULL_IMAGE:figures/full_fig_p036_24.png] view at source ↗
Figure 25
Figure 25. Figure 25: Varying [PITH_FULL_IMAGE:figures/full_fig_p037_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: Varying [PITH_FULL_IMAGE:figures/full_fig_p038_26.png] view at source ↗
Figure 27
Figure 27. Figure 27: Varying [PITH_FULL_IMAGE:figures/full_fig_p039_27.png] view at source ↗
Figure 28
Figure 28. Figure 28: (a) Trustworthiness and (b) Silhouette score of different methods for the Single-cell transcriptomes dataset. [PITH_FULL_IMAGE:figures/full_fig_p040_28.png] view at source ↗
Figure 29
Figure 29. Figure 29: Varying [PITH_FULL_IMAGE:figures/full_fig_p041_29.png] view at source ↗
Figure 30
Figure 30. Figure 30: Varying [PITH_FULL_IMAGE:figures/full_fig_p042_30.png] view at source ↗
Figure 31
Figure 31. Figure 31: Varying [PITH_FULL_IMAGE:figures/full_fig_p043_31.png] view at source ↗
Figure 32
Figure 32. Figure 32: Varying [PITH_FULL_IMAGE:figures/full_fig_p044_32.png] view at source ↗
Figure 33
Figure 33. Figure 33: Embedding of MNIST by mixing and matching attraction and repulsion shapes with a constant learning rate [PITH_FULL_IMAGE:figures/full_fig_p046_33.png] view at source ↗
Figure 34
Figure 34. Figure 34: Embedding of MNIST by mixing and matching attraction and repulsion shapes with a constant learning rate [PITH_FULL_IMAGE:figures/full_fig_p046_34.png] view at source ↗
read the original abstract

Uniform manifold approximation and projection (UMAP) is among the most popular neighbor embedding methods. The method samples pairs of point indices according to similarities in the high-dimensional space, and applies attractive and repulsive forces to their coordinates in the low-dimensional embedding. In this paper, we analyze the forces to reveal their effects on cluster formations and visualization, and compare UMAP to its contemporaries. Repulsion emphasizes differences, controlling cluster boundaries and inter-cluster distance. Attraction is more subtle, as attractive tension between points can manifest simultaneously as attraction and repulsion in the lower-dimensional mapping. This explains the need for learning rate annealing and motivates the different treatments between attractive and repulsive terms. Moreover, by modifying attraction, we improve the consistency of cluster formation under random initialization. Overall, our analysis provides a mechanistic understanding of UMAP and related embedding methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper analyzes attractive and repulsive forces in UMAP, claiming repulsion controls cluster boundaries and inter-cluster distances while attraction exhibits dual effects that can simultaneously attract and repel in the embedding space. This duality explains the need for learning-rate annealing and differing treatment of terms. The authors propose a modification to the attraction term that improves consistency of cluster formation under random initialization, providing a mechanistic understanding of UMAP and related methods.

Significance. If the central claims hold, the work supplies a force-level mechanistic account of neighbor embedding that could guide more stable and interpretable dimensionality reduction; the reported consistency gain under random seeds would be a practical contribution if shown to be isolated from side effects on other embedding properties.

major comments (1)
  1. [Abstract] Abstract (paragraph on force analysis and modification): the central claim that attraction can be modified independently to improve cluster consistency rests on the untested premise that the change leaves overall force balance, negative sampling, and optimization dynamics unaltered; no evidence is supplied that trustworthiness, continuity, or convergence speed remain unchanged.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. We address the major comment regarding the modification to the attraction term below.

read point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on force analysis and modification): the central claim that attraction can be modified independently to improve cluster consistency rests on the untested premise that the change leaves overall force balance, negative sampling, and optimization dynamics unaltered; no evidence is supplied that trustworthiness, continuity, or convergence speed remain unchanged.

    Authors: We acknowledge that the current manuscript does not report explicit evaluations of trustworthiness, continuity, or convergence speed under the modified attraction term, nor does it provide a direct analysis of whether the change leaves negative sampling and optimization dynamics unaltered. The reported consistency improvement is demonstrated through cluster formation metrics under random initialization, but the referee is correct that broader embedding quality measures were not compared. In the revised version we will add these experiments, including side-by-side comparisons of trustworthiness and continuity scores as well as convergence behavior, to confirm that the modification does not introduce unintended side effects on the overall force balance. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical force analysis and modification are independent of inputs

full rationale

The paper conducts an empirical analysis of attractive and repulsive forces in UMAP embeddings, reports observed dual effects of attraction, and describes a modification that improves cluster consistency under random initialization. No load-bearing step reduces a claimed result to its own fitted parameters or self-citations by construction. The central claims rest on direct observation of embedding behavior and comparison to other methods, remaining falsifiable against external metrics such as trustworthiness and convergence. This is the expected non-finding for an analysis paper without a closed derivation loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, ad-hoc axioms, or invented entities are stated. Standard neighbor-embedding sampling assumptions are implicit but not detailed.

axioms (1)
  • domain assumption UMAP samples pairs according to high-dimensional similarities and applies attractive/repulsive forces in the embedding space
    Invoked in the opening description of the method.

pith-pipeline@v0.9.0 · 5674 in / 1084 out tokens · 59598 ms · 2026-05-23T00:23:12.809497+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MAPLE: Self-Supervised Learning-Enhanced Nonlinear Dimensionality Reduction for Visual Analysis

    cs.LG 2026-01 unverdicted novelty 6.0

    MAPLE enhances UMAP via self-supervised MMCRs to untangle complex manifolds, yielding clearer clusters and finer subclusters than standard UMAP at similar cost.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Amid and M

    9 APREPRINT Ehsan Amid and Manfred K Warmuth. TriMap: Large-scale dimensionality reduction using triplets.arXiv preprint arXiv:1910.00204,

  2. [2]

    Low-dimensional embeddings of high- dimensional data.arXiv preprint arXiv:2508.15929,

    Cyril de Bodt, Alex Diaz-Papkovich, Michael Bleher, Kerstin Bunte, Corinna Coupette, Sebastian Damrich, Enrique Fita Sanmartin, Fred A Hamprecht, Em˝oke-Ágnes Horvát, Dhruv Kohli, et al. Low-dimensional embeddings of high- dimensional data.arXiv preprint arXiv:2508.15929,

  3. [3]

    Manifold-aligned neighbor embedding

    Mohammad Tariqul Islam and Jason W Fleischer. Manifold-aligned neighbor embedding. InICLR 2022 Workshop on Geometrical and Topological Representation Learning,

  4. [4]

    DREAMS: Preserving both local and global structure in dimension- ality reduction.arXiv preprint arXiv:2508.13747,

    Noël Kury, Dmitry Kobak, and Sebastian Damrich. DREAMS: Preserving both local and global structure in dimension- ality reduction.arXiv preprint arXiv:2508.13747,

  5. [5]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426,

  6. [6]

    Towards one model for classical dimensionality reduction: A probabilistic perspective on umap and t-sne.arXiv preprint arXiv:2405.17412,

    Aditya Ravuri and Neil D Lawrence. Towards one model for classical dimensionality reduction: A probabilistic perspective on umap and t-sne.arXiv preprint arXiv:2405.17412,

  7. [7]

    Dimensionality reduction as probabilistic inference

    Aditya Ravuri, Francisco Vargas, Vidhi Lalchand, and Neil D Lawrence. Dimensionality reduction as probabilistic inference. InICML 2023 Workshop on Structured Probabilistic Inference{\&}Generative Modeling,

  8. [8]

    Stochastic Neighbor Embedding separates well-separated clusters

    Uri Shaham and Stefan Steinerberger. Stochastic neighbor embedding separates well-separated clusters.arXiv preprint arXiv:1702.02670,

  9. [9]

    Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, St´ efan J

    doi: 10.1038/s41592-019-0686-2. Yingfan Wang, Haiyang Huang, Cynthia Rudin, and Yaron Shaposhnik. Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization.The Journal of Machine Learning Research, 22(1):9129–9201,

  10. [10]

    Interpretable dimensionality reduction by feature preserving manifold approximation and projection.arXiv preprint arXiv:2211.09321,

    Yang Yang, Hongjian Sun, Jialei Gong, Yali Du, and Di Yu. Interpretable dimensionality reduction by feature preserving manifold approximation and projection.arXiv preprint arXiv:2211.09321,

  11. [11]

    The default UMAP provides a crowded structure where labels (colors) overlap (Fig

    with known labels from the PubMed dataset (González- Márquez et al., 2024). The default UMAP provides a crowded structure where labels (colors) overlap (Fig. 7(a)). 15 APREPRINT (a) (b) Figure 7: Visualization of 300K abstracts from the PubMed dataset. (a) Default UMAP (a= 1.576, b= 0.89 ) shows labels overlapping each other in the mapping, but (b) UMAP w...

  12. [12]

    translation, scaling, and rotation

    measures similarity between two point clouds {x} and {y} under linear transformations, viz. translation, scaling, and rotation. Operationally, we hold the former fixed and vary the latter until the two sets are in maximum alignment. Let {y′} be the transformation of {y} that achieves this objective. Then the 16 APREPRINT Procrustes distance is given by pd...

  13. [13]

    9 and 10, respectively)

    data (Fig. 9 and 10, respectively). The main conclusion remains unchanged: modified and composite attraction shapes, such as those that increase attraction at large distances, significantly improve the consistency of reconstruction. (Random init.) (Random init.)(PCA init.) (Random init.) (a) (b) (c) (d) Attraction Shapes (Random init.) (Random init.) (Ran...

  14. [14]

    We then tackle t-SNE (Van der Maaten and Hinton, 2008)

    that modify TriMAP’s loss function that works for pairwise interactions. We then tackle t-SNE (Van der Maaten and Hinton, 2008). Then, we provide a short note on SNE (Hinton and Roweis,

  15. [15]

    G.1 TriMap TriMap (Amid and Warmuth,

    that uses an alternate kernel function and finally end the section by briefly discussing multidimensional scaling (Borg and Groenen, 2007). G.1 TriMap TriMap (Amid and Warmuth,

  16. [16]

    optimizes low-dimensional embedding at different scales. The loss function is LP =w N B X (i,j)∈N N 1 1 + 10qij +w M N X (i,k)∈M N 1 1 + 10000qik +w F P X (i,l)∈F P 1 1 + 1 qil ,(56) where wN B, wM N, and wF P are weights of the nearest neighbor (NN) pairs, mid-near (MN) pairs, and further pairs (FP), respectively (details in Appendix H). The first two te...

  17. [17]

    Thus, t-SNE’s repulsive forces can give both attraction and repulsion

    resembles that of TriMap, the repulsion behavior will be the same. Thus, t-SNE’s repulsive forces can give both attraction and repulsion. Since, t-SNE’s forces are scaled (bywi,j for attraction and wi,j Z for repulsion), the attractive and repulsive forces are typically lower than that of UMAP. As a result, the algorithm generally uses large values for le...

  18. [18]

    This follows the same formula of t-SNE (Eq

    is one of the oldest algorithms in this class. This follows the same formula of t-SNE (Eq. 71), but withq i,j = exp(−||yi −y j||2 2). This results in the attraction shape f SN E a (ζ) =−2,(80) that follows Proposition 4.1 (with0< λw i,jf SN E a <−1) and the repulsion shape f SN E r (ζ) = 2 exp(−ζ2),(81) that follows proposition G.3 (by replacing the t-SNE...

  19. [19]

    Nevertheless, few works discuss gradient methods (Kruskal, 1964; Zheng et al., 2018)

    typically does not use gradient methods, as they often fail to converge to good mappings; instead, it employs stress majorization. Nevertheless, few works discuss gradient methods (Kruskal, 1964; Zheng et al., 2018). We offer a brief treatment for this below. Particularly, Zheng et al. (2018) formulates a successful gradient descent-based MDS algorithm fo...

  20. [20]

    The purpose of this is the same as the correspondingσ i parameters in Eq

    with a self-tuning distance measure (Zelnik-Manor and Perona, 2004), d2 i,j = ||xi −x j||2 σiσj ,(93) where σi is the average distance between xi and its Euclidean nearest fourth to sixth neighbors. The purpose of this is the same as the correspondingσ i parameters in Eq. (87) and (90), despite all three being defined differently. Regardless of these choi...

  21. [21]

    18- 34, unless otherwise stated)

    We also used the same implementation when we changed the attraction and the repulsion shapes to those of the alternative methods (Figs. 18- 34, unless otherwise stated). The trustworthiness and silhouette scores were computed using the corresponding function from thescikit-learnpackage. The mappings shown in Figs. 2, 3, 9, 10, and 15 are rotated to a refe...