MEDAL: Manifold Embedding Distillation via Autoencoder Learning

Genevera I. Allen; Irene Chang; Tarek M. Zikry

arxiv: 2605.24244 · v1 · pith:G37QTNC3new · submitted 2026-05-22 · 📊 stat.ML · cs.LG

Irene Chang, Tarek M. Zikry, Genevera I. Allen

Pith reviewed 2026-06-30 14:19 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords: manifold embedding · autoencoder · dimension reduction · out-of-sample extension · held-out validation · embedding distillation · reconstruction error · nonlinear embedding

The pith

MEDAL distills any manifold embedding into a constrained autoencoder that matches the embedding at the bottleneck while reconstructing inputs, enabling held-out validation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Nonlinear dimension reduction techniques such as t-SNE and UMAP lack explicit maps for new points and inverses back to the original space, preventing standard held-out validation. MEDAL solves this by training an autoencoder whose bottleneck layer is forced to reproduce the teacher embedding exactly and whose decoder reconstructs the input data. The resulting model supplies an out-of-sample encoder, an approximate inverse, and a reconstruction-based distortion measure. This turns one-time embeddings into reusable models that support quantitative comparison of methods and hyperparameter choices on unseen data.

Core claim

By training a constrained autoencoder so that its bottleneck exactly reproduces any given teacher embedding while the decoder reconstructs the original inputs, MEDAL produces an explicit out-of-sample map, an approximate inverse map, and a pointwise reconstruction error that serves as a distortion measure, thereby converting static manifold embeddings into models that admit held-out validation, method comparison, and hyperparameter tuning.

What carries the argument

Constrained autoencoder whose bottleneck is forced to match the teacher embedding exactly while the decoder reconstructs the input.
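One way to make this constraint concrete is a per-sample objective that adds a distillation penalty to the usual reconstruction error. The sketch below is illustrative only: the function name `medal_style_loss`, the weight `lam`, and the use of mean squared error for both terms are assumptions, not the authors' implementation.

```python
def medal_style_loss(x, x_hat, z, t, lam=1.0):
    """Per-sample composite loss: reconstruction + lam * distillation.

    x     : original input features
    x_hat : decoder output for the same sample
    z     : encoder (bottleneck) output
    t     : teacher embedding coordinates for the sample
    lam   : hypothetical weight on the distillation term
    """
    # Mean squared error between the input and its reconstruction.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Mean squared error between the bottleneck and the teacher coordinates.
    distill = sum((a - b) ** 2 for a, b in zip(z, t)) / len(z)
    return recon + lam * distill, recon, distill


# Toy values chosen by hand, not data from the paper.
total, recon, distill = medal_style_loss(
    x=[1.0, 2.0, 3.0], x_hat=[1.0, 2.0, 5.0],
    z=[0.5, 0.5], t=[0.5, 1.5], lam=2.0,
)
```

Driving `distill` toward zero on the training points corresponds to the exact bottleneck match, while `recon` evaluated on held-out points is the quantity the validation argument relies on.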

If this is right

New samples receive an explicit map into the embedding space.
The decoder supplies an approximate inverse from embedding coordinates back to original features.
Pointwise reconstruction error quantifies local distortion in the manifold space.
Different dimension reduction methods can be compared quantitatively on held-out data.
Hyperparameters of the original embedding can be tuned using validation metrics.
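The last two consequences amount to a model-selection loop: fit one teacher per candidate setting, distill it, score the student on held-out data, and keep the setting with the lowest loss. A minimal sketch follows, assuming a hypothetical callable `fit_and_score` that stands in for the whole fit, distill and evaluate pipeline; the toy scorer is made up purely so the example runs.

```python
def tune_teacher(grid, fit_and_score):
    """Pick the hyperparameter value with the lowest held-out reconstruction loss.

    grid          : iterable of candidate hyperparameter values
    fit_and_score : hypothetical callable; in practice it would fit the teacher,
                    distill it, and return the validation reconstruction loss
    """
    scores = {h: fit_and_score(h) for h in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Stand-in scorer with a made-up minimum near 25; not a real embedding pipeline.
def toy_score(h):
    return 0.05 + (h - 25) ** 2 / 1000.0


best, scores = tune_teacher([10, 20, 40, 80], toy_score)
```

The same loop could compare different dimension reduction methods by letting the grid range over method names instead of a single numeric hyperparameter.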

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The reconstruction error could flag regions where the original embedding compresses biologically meaningful structure.
Mapping new samples and inspecting their reconstruction errors might serve as a practical test for distribution shift relative to the training manifold.
The same distillation step could be applied to other embedding algorithms to create a uniform validation layer across the field.
One could check whether the distilled model preserves higher-order neighborhood statistics better than existing out-of-sample extensions.
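The distribution-shift idea above can be sketched as a simple thresholding rule: compare each new sample's reconstruction error with a high quantile of the errors observed on reference data. The function names, the quantile level `q`, and the numbers below are all assumptions for illustration, not a procedure taken from the paper.

```python
import math


def empirical_quantile(values, q):
    """Return an order statistic that roughly marks the q-th quantile of values."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]


def flag_shift(reference_errors, new_errors, q=0.95):
    """Flag new samples whose reconstruction error exceeds the reference q-quantile."""
    threshold = empirical_quantile(reference_errors, q)
    return [e > threshold for e in new_errors], threshold


# Made-up reconstruction errors for illustration.
reference = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12, 0.11, 0.10]
flags, threshold = flag_shift(reference, new_errors=[0.10, 0.45], q=0.9)
```

A per-sample rule like this only flags individual outliers; detecting a shift at the level of a whole subject or batch would need an aggregate comparison of the two error distributions.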

Load-bearing premise

The autoencoder can be trained to reproduce the geometry and neighborhoods of an arbitrary teacher embedding without adding its own systematic distortions.

What would settle it

A held-out test set where the neighborhoods or distances in the MEDAL-mapped space differ substantially from those produced by applying the original embedding method directly to the same points.
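Such a check could be scored with a k-nearest-neighbor overlap between two sets of coordinates for the same held-out points. The sketch below assumes the `teacher` coordinates come from the original method and the `student` coordinates from the distilled encoder; the function names and toy points are hypothetical.

```python
import math


def knn_indices(points, k):
    """Index sets of the k nearest neighbors (Euclidean) for every point."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def knn_overlap(teacher, student, k):
    """Mean fraction of shared k-nearest neighbors between two embeddings."""
    a, b = knn_indices(teacher, k), knn_indices(student, k)
    return sum(len(s & t) / k for s, t in zip(a, b)) / len(a)


# Toy 2-D coordinates; the second layout swaps two points to break neighborhoods.
teacher = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
swapped = [(0.0, 0.0), (10.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
same = knn_overlap(teacher, teacher, k=2)
different = knn_overlap(teacher, swapped, k=1)
```

An overlap near 1 would indicate that the distilled map keeps the teacher's neighborhoods on unseen points, while a low value would be the kind of evidence described above.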

Figures

Figures reproduced from arXiv: 2605.24244 by Genevera I. Allen, Irene Chang, Tarek M. Zikry.

**Figure 1.** MEDAL inputs a fitted manifold embedding and distills this into a reusable model that permits quantitative validation of the embedding. A, MEDAL distills a teacher embedding into a constrained autoencoder by jointly optimizing a distillation loss that aligns the bottleneck with the teacher manifold and a reconstruction loss that preserves input-space information. B, Once distilled, the learned encoder–dec…

**Figure 2.** MEDAL enables quantitative validation, comparison, and distortion analysis on MNIST data. A, UMAP hyperparameter selection using held-out reconstruction error. For each value of n_neighbors, a fitted UMAP teacher embedding was distilled into a MEDAL student, and reconstruction loss was evaluated on train, validation, and test splits. The selected model (n_neighbors=35; dashed line) balances class separati…

**Figure 3.** MEDAL selects a biologically coherent embedding of the whole-animal Hydra single-cell RNA-seq dataset [87] and localizes cell-type-specific distortion. A, MEDAL hyperparameter tuning for t-SNE teachers on the Hydra single-cell atlas. Reconstruction loss was evaluated across t-SNE perplexities on train, validation, and test splits. The selected perplexity, chosen by the validation curve using the one-stand…

**Figure 4.** Comparison of MEDAL with existing embedding diagnostics for a t-SNE embedding of the Hydra ([87]) single-cell RNA-seq dataset. A, Hyperparameter-selection curves across t-SNE perplexities for MEDAL, neMDBD ([64]), scDEED ([103]), and EMBEDR ([51]). Dashed vertical lines indicate the selected perplexity for each method. MEDAL selects perplexity 499 by held-out reconstruction loss, whereas neMDBD selects perplexi…

**Figure 5.** MEDAL selects a manifold embedding for mouse neocortex single-cell RNA-seq ([91]) and reveals cell types poorly represented on the manifold. A, MEDAL hyperparameter tuning for t-SNE teachers on the neocortex atlas. Reconstruction loss was evaluated across t-SNE perplexities on train, validation, and test splits. The selected perplexity was 53. The corresponding training and test embeddings preserve major …

**Figure 6.** MEDAL detects subject-level distribution shift in macaque retina single-cell RNA-seq ([81]) by embedding new cells into a fixed reference manifold. A, Joint t-SNE embedding of macaque retinal cells from multiple subjects. The embedding appears organized by cell type when colored by annotated retinal cell class, but coloring the same embedding by subject reveals substantial subject-level structure, indicati…

Original abstract

Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on visual appeal alone and without rigorous quantitative validation. A major reason is that manifold embeddings typically provide neither an out-of-sample map nor an inverse back to the original feature space; this makes held-out validation, the gold standard in supervised learning, all but impossible. To address these challenges, we develop a novel framework, MEDAL (Manifold Embedding Distillation via Autoencoder Learning), which distills a fitted manifold embedding into a reusable encoder–decoder model. MEDAL trains a constrained autoencoder whose bottleneck exactly matches any teacher embedding while the decoder reconstructs the original input; this yields an explicit map for new samples, an approximate inverse, and a pointwise reconstruction-based measure of distortion in the manifold space. This converts static manifold embeddings into models that can be evaluated on held-out data, enabling quantitative validation, including comparison of different dimension reduction methods and hyperparameter tuning. Across multiple benchmark and scientific case studies, we show that MEDAL enables held-out validation to determine optimal manifold embeddings and hyperparameters, reveals biologically coherent regions that are difficult to preserve in two-dimensional embeddings, and detects distribution shift when new samples are mapped into a fixed reference manifold. MEDAL provides a general validation wrapper for any existing dimension reduction technique that will improve the rigor and reliability of dimension reduction in scientific workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

MEDAL wraps any dimension-reduction embedding in an autoencoder to provide out-of-sample maps and reconstruction-based validation, but the geometry match may not hold under the joint loss.

The main point is that MEDAL turns a static manifold embedding into an autoencoder whose bottleneck copies the teacher coordinates while the decoder reconstructs the input. This supplies an explicit map for new points, an approximate inverse, and a reconstruction error that can serve as a distortion measure on held-out data.

What is new is the concrete distillation setup that forces the bottleneck to match an arbitrary teacher and then uses reconstruction for validation. The abstract shows this applied to benchmarks and scientific cases, where it helps pick hyperparameters and flags distribution shift. That framing fills a practical gap: most nonlinear methods like t-SNE or UMAP lack out-of-sample extensions, so quantitative checks have been hard.

The experiments appear to demonstrate the workflow on real data, which is a step forward for users who need more than visual inspection.

The soft spot is the joint optimization. Matching the teacher coordinates pointwise does not automatically preserve local neighborhoods or geometry. Nothing in the loss prevents the encoder from introducing folds or warps that keep the pointwise match but break the downstream validation measure. Without ablations that check neighborhood preservation or compare against direct geometry metrics, it is unclear how reliable the reconstruction-based score actually is.

This is for applied statisticians and biologists who already run dimension reduction and want a way to validate choices on held-out samples. A reader looking for a general validation layer will find the idea useful even if the current evidence is preliminary.

It deserves peer review because the problem is real and the proposed wrapper is straightforward to implement and test. The central claim can be checked with targeted experiments on local structure.

Referee Report

2 major / 2 minor

Summary. The paper introduces MEDAL, a framework for distilling any pre-fitted nonlinear manifold embedding (e.g., t-SNE or UMAP) into a constrained autoencoder. The encoder is trained so its bottleneck layer exactly reproduces the teacher embedding coordinates on training points while the decoder reconstructs the original high-dimensional input; this supplies an explicit out-of-sample map, an approximate inverse, and a pointwise reconstruction error that serves as a distortion measure in the embedded space. The resulting model enables held-out quantitative validation, hyperparameter selection, and distribution-shift detection for dimension-reduction methods that otherwise lack these capabilities.

Significance. If the central construction holds, MEDAL would convert static, non-reusable embeddings into evaluable models, directly addressing the lack of rigorous validation that currently limits the scientific use of nonlinear dimension reduction. The approach is general (applicable to any teacher embedding) and supplies concrete tools—out-of-sample extension and a reconstruction-based fidelity metric—that are absent from standard t-SNE/UMAP pipelines. No machine-checked proofs or parameter-free derivations are claimed, but the empirical demonstration on benchmarks and biological case studies, if supported by appropriate controls, would constitute a practical advance.

major comments (2)

[§3.2] (composite loss): the coordinate-matching term ||f_θ(x) − teacher(x)||_2 does not constrain local geometry or neighborhood structure. Nothing prevents the joint optimizer from trading small increases in matching error for large reconstruction gains by introducing folds or warps invisible to the pointwise L2 term yet visible to downstream nearest-neighbor or distance-based validation; the manuscript must supply explicit evidence (e.g., k-NN preservation or trustworthiness scores on held-out data) that such distortion does not occur.
[§5] (held-out validation experiments): the reported improvements in hyperparameter selection and distribution-shift detection rely on the assumption that the bottleneck faithfully reproduces the teacher geometry on unseen points. Without an ablation that isolates the effect of the matching loss weight or compares against a pure reconstruction autoencoder, it is unclear whether the observed gains are attributable to faithful distillation or to the autoencoder’s own inductive bias.

minor comments (2)

Notation for the teacher embedding and the encoder output should be unified across equations and text to avoid confusion between the static teacher map and the learned f_θ.
Figure captions should explicitly state whether the displayed embeddings are the original teacher or the MEDAL-reconstructed versions, and whether any quantitative metric is computed on training or held-out points.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on empirical validation. We address each major comment below, agreeing where additional evidence is warranted and outlining the revisions we will make.

Referee: [§3.2] (composite loss): the coordinate-matching term ||f_θ(x) − teacher(x)||_2 does not constrain local geometry or neighborhood structure. Nothing prevents the joint optimizer from trading small increases in matching error for large reconstruction gains by introducing folds or warps invisible to the pointwise L2 term yet visible to downstream nearest-neighbor or distance-based validation; the manuscript must supply explicit evidence (e.g., k-NN preservation or trustworthiness scores on held-out data) that such distortion does not occur.

Authors: We agree that the pointwise L2 matching term alone does not explicitly regularize local geometry. While the training procedure is designed to reproduce the teacher coordinates exactly on the training points, we acknowledge that verification of neighborhood preservation on held-out data is required. In the revised manuscript we will add k-NN preservation and trustworthiness scores evaluated on held-out points, comparing the distilled MEDAL embeddings against the original teacher embeddings to provide the requested evidence that distortion does not occur. revision: yes
Referee: [§5] (held-out validation experiments): the reported improvements in hyperparameter selection and distribution-shift detection rely on the assumption that the bottleneck faithfully reproduces the teacher geometry on unseen points. Without an ablation that isolates the effect of the matching loss weight or compares against a pure reconstruction autoencoder, it is unclear whether the observed gains are attributable to faithful distillation or to the autoencoder’s own inductive bias.

Authors: We agree that isolating the contribution of the matching loss is important for attributing the observed gains. The revised manuscript will include an ablation that varies the weight of the coordinate-matching term and directly compares the full MEDAL model against a pure reconstruction autoencoder (matching weight set to zero). These experiments will clarify whether the improvements stem from faithful distillation rather than the autoencoder architecture alone. revision: yes
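The ablation discussed in this exchange can be stated compactly by writing the objective with an explicit matching weight. The form below is a sketch under stated assumptions: the decoder symbol g_φ, the weight λ, and the squared norms are introduced here for illustration, while f_θ and teacher(x) follow the referee's notation.

```latex
\mathcal{L}_{\lambda}(\theta, \phi)
  = \frac{1}{n} \sum_{i=1}^{n} \bigl\| g_{\phi}\bigl(f_{\theta}(x_i)\bigr) - x_i \bigr\|_2^2
  + \lambda \, \frac{1}{n} \sum_{i=1}^{n} \bigl\| f_{\theta}(x_i) - \mathrm{teacher}(x_i) \bigr\|_2^2
```

Setting λ = 0 recovers a pure reconstruction autoencoder, and increasing λ pushes the bottleneck toward the teacher coordinates, so sweeping λ is one way to separate the effect of distillation from the autoencoder's own inductive bias.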

Circularity Check

0 steps flagged

No circularity: derivation self-contained with independent losses and evaluation

The paper defines MEDAL as training an autoencoder with a composite objective that includes both reconstruction of the input and matching to a pre-fitted teacher embedding; the held-out validation, distortion measure, and out-of-sample mapping are defined directly from the decoder and encoder outputs without reducing to the teacher coordinates by construction. No self-citation is load-bearing for the central claim, no fitted parameter is relabeled as a prediction, and no uniqueness theorem or ansatz is imported from prior author work. The method is a standard constrained autoencoder wrapper whose outputs are independently falsifiable on held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that an autoencoder can be constrained to reproduce a given embedding geometry while still learning a useful decoder; no free parameters or invented entities are introduced in the abstract description.

axioms (1)

domain assumption: A standard autoencoder architecture can be trained so that its bottleneck layer exactly reproduces the coordinates of any given nonlinear manifold embedding.
This premise is required for the distillation step to preserve the teacher embedding properties.

pith-pipeline@v0.9.1-grok · 5806 in / 1197 out tokens · 42600 ms · 2026-06-30T14:19:28.871091+00:00 · methodology

Reference graph

Works this paper leans on

120 extracted references · 69 canonical work pages · 9 internal anchors

[1]

Allen, Luqin Gan, and Lili Zheng

Genevera I. Allen, Luqin Gan, and Lili Zheng. Interpretable machine learning for discovery: Statistical challenges and opportunities.Annual Review of Statistics and Its Application, 11 (Volume 11, 2024):97–121, 2024

2024
[2]

Bingxue An and Tiffany M. Tang. Consensus dimension reduction via multi-view learning,
[3]

URLhttps://arxiv.org/abs/2512.15802

work page arXiv
[4]

Hypernp: Interactive visual exploration of multidimensional projection hyperparameters, 2021

Gabriel Appleby, Mateus Espadoto, Rui Chen, Samuel Goree, Alexandru Telea, Erik W Anderson, and Remco Chang. Hypernp: Interactive visual exploration of multidimensional projection hyperparameters, 2021. URLhttps://arxiv.org/abs/2106.13777

work page arXiv 2021
[5]

Reidenbach, Adam Gayoso, and Nir Yosef

Tal Ashuach, Danny A. Reidenbach, Adam Gayoso, and Nir Yosef. Multivi: Deep generative model for the integration of multimodal data.Nature Methods, 20:1222–1231, 2023. doi: 10.1038/s41592-023-01909-9

work page doi:10.1038/s41592-023-01909-9 2023
[6]

Neural networks and principal component analysis: Learning from examples without local minima.Neural Networks, 2(1):53–58, 1989

Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima.Neural Networks, 2(1):53–58, 1989. ISSN 0893-6080. doi: https://doi.org/10.1016/0893-6080(89)90014-2. URLhttps://www.sciencedirect. com/science/article/pii/0893608089900142

work page doi:10.1016/0893-6080(89)90014-2 1989
[7]

Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal func- tion.IEEE Trans. Inf. Theory, 39:930–945, 1993. URLhttps://api.semanticscholar. org/CorpusID:15383918

1993
[8]

Dimensionality reduction for visualizing single-cell data using umap.Nature Biotechnology, 37(1):38–44, 2019

Etienne Becht, Leland McInnes, John Healy, Charles-Antoine Dutertre, Immanuel W H Kwok, Lai Guan Ng, Florent Ginhoux, and Evan W Newell. Dimensionality reduction for visualizing single-cell data using umap.Nature Biotechnology, 37(1):38–44, 2019

2019
[9]

Laplacian eigenmaps for dimensionality reduction and data representation.Neural Computation, 15(6):1373–1396, 2003

Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation.Neural Computation, 15(6):1373–1396, 2003. doi: 10.1162/ 089976603321780317

2003
[10]

Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering.Advances in neural information processing systems, 16, 2003

Yoshua Bengio, Jean-fran¸ ccois Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Roux, and Marie Ouimet. Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering.Advances in neural information processing systems, 16, 2003

2003
[11]

doi: 10.1109/TPAMI.2013.50

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8): 1798–1828, 2013. doi: 10.1109/TPAMI.2013.50

work page doi:10.1109/tpami.2013.50 2013
[12]

Vanderburg,˚Asa Segerstolpe, Meng Zhang, Inbal Avraham- Davidi, and Aviv Regev

Tommaso Biancalani, Gabriele Scalia, Lorenzo Buffoni, Rahul Avasthi, Ziqing Lu, Aviv Sanger, Nazli Tokcan, Charles R. Vanderburg,˚Asa Segerstolpe, Meng Zhang, Inbal Avraham- Davidi, and Aviv Regev. Deep learning and alignment of spatially resolved single-cell transcriptomes with tangram.Nature Methods, 18(11):1352–1362, 2021. doi: 10.1038/ s41592-021-01264-7

2021
[13]

Strategies for eels data analysis

Javier Blanco-Portals, Francesca Peir´ o, and S` onia Estrad´ e. Strategies for eels data analysis. introducing umap and hdbscan for dimensionality reduction and clustering.Microscopy and Microanalysis, 28(1):109–122, 2022. 26

2022
[14]

Bourlard and Y

H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition.Biological Cybernetics, 59(4):291–294, 1988. doi: 10.1007/BF00332918. URL https://doi.org/10.1007/BF00332918

work page doi:10.1007/bf00332918 1988
[15]

Stability and generalization.Journal of Machine Learning Research, 2:499–526, 2002

Olivier Bousquet and Andr´ e Elisseeff. Stability and generalization.Journal of Machine Learning Research, 2:499–526, 2002. doi: 10.1162/153244302760200704

work page doi:10.1162/153244302760200704 2002
[16]

McFaline-Figueroa, Michael I

Pierre Boyeau, Jeremy Hong, Adam Gayoso, Michelle Kim, Jos´ e L. McFaline-Figueroa, Michael I. Jordan, Elham Azizi, Can Ergen, and Nir Yosef. Deep generative model- ing of sample-level heterogeneity in single-cell genomics.Nature Methods, 2025. doi: 10.1038/s41592-025-02808-x

work page doi:10.1038/s41592-025-02808-x 2025
[17]

Friedman, Richard A

Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone.Classification and Regression Trees. Wadsworth, Belmont, CA, 1984

1984
[18]

Automatic Selection of t-SNE Perplexity

Yanshuai Cao and Luyu Wang. Automatic selection of t-sne perplexity, 2017. URLhttps: //arxiv.org/abs/1708.03229

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

A critical analysis of the usage of dimensionality reduction in four domains.IEEE Transactions on Visualization and Computer Graphics, 31(10):9405–9423, October 2025

Dylan Cashman, Mark Keller, Hyeon Jeon, Bum Chul Kwon, and Qianwen Wang. A critical analysis of the usage of dimensionality reduction in four domains.IEEE Transactions on Visualization and Computer Graphics, 31(10):9405–9423, October 2025. ISSN 2160-9306. doi: 10.1109/tvcg.2025.3567989. URLhttp://dx.doi.org/10.1109/TVCG.2025.3567989

work page doi:10.1109/tvcg.2025.3567989 2025
[20]

Raymond B. Cattell. The scree test for the number of factors.Multivariate Behavioral Research, 1(2):245–276, 1966. doi: 10.1207/s15327906mbr0102 10

work page doi:10.1207/s15327906mbr0102 1966
[21]

Tang, Tarek M

Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, and Genevera I. Allen. Unsupervised machine learning for scientific discovery: Workflow and best practices, 2025. URLhttps: //arxiv.org/abs/2506.04553

work page arXiv 2025
[22]

Dataslingers/medal: v0.1.0, May 2026

Irene Chang, tzUNC, and Matthew Shen. Dataslingers/medal: v0.1.0, May 2026. URL https://doi.org/10.5281/zenodo.20347573

work page doi:10.5281/zenodo.20347573 2026
[23]

The specious art of single-cell genomics.PLOS Computational Biology, 19(8):1–20, 08 2023

Tara Chari and Lior Pachter. The specious art of single-cell genomics.PLOS Computational Biology, 19(8):1–20, 08 2023. doi: 10.1371/journal.pcbi.1011288. URLhttps://doi.org/ 10.1371/journal.pcbi.1011288

work page doi:10.1371/journal.pcbi.1011288 2023
[24]

Neural population dynamics during reaching.Nature, 487(7405):51–56, 2012

Mark M Churchland, John P Cunningham, Matthew T Kaufman, Justin D Foster, Paul Nuyujukian, Stephen I Ryu, and Krishna V Shenoy. Neural population dynamics during reaching.Nature, 487(7405):51–56, 2012

2012
[25]

scgpt: Toward building a foundation model for single-cell multi-omics using generative ai.Nature Methods, 21:1470–1480, 2024

Haotian Cui, Chen Wang, Hassaan Maan, Kuan Pang, Feng Luo, and Bo Wang. scgpt: Toward building a foundation model for single-cell multi-omics using generative ai.Nature Methods, 21:1470–1480, 2024. doi: 10.1038/s41592-024-02201-0

work page doi:10.1038/s41592-024-02201-0 2024
[26]

Dimensionality reduction for large-scale neural record- ings.Nature neuroscience, 17(11):1500–1509, 2014

John P Cunningham and Byron M Yu. Dimensionality reduction for large-scale neural record- ings.Nature neuroscience, 17(11):1500–1509, 2014

2014
[27]

George V. Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2:303–314, 1989. URLhttps://api.semanticscholar. org/CorpusID:3958369. 27

1989
[28]

Sloan, Derek Croote, Marco Mignardi, Sophia Chernikova, Pey- man Samghababi, Ye Zhang, Norma Neff, Mark Kowarsky, Christine Caneda, Gordon Li, Steven D

Spyros Darmanis, Steven A. Sloan, Derek Croote, Marco Mignardi, Sophia Chernikova, Pey- man Samghababi, Ye Zhang, Norma Neff, Mark Kowarsky, Christine Caneda, Gordon Li, Steven D. Chang, Ian David Connolly, Yingmei Li, Ben A. Barres, Melanie Hayden Gephart, and Stephen R. Quake. Single-cell rna-seq analysis of infiltrating neoplastic cells at the mi- grat...

2017
[29]

Lowell E. Davis. Histological and ultrastructural studies of the basal disk of hydra. iii. the gastrodermis and the mesoglea.Cell and Tissue Research, 162:107–118, 1975. doi: 10.1007/BF00223266

work page doi:10.1007/bf00223266 1975
[30]

Hamprecht, Em˝ oke´Agnes Horv´ at, Dhruv Kohli, Smita Krishnaswamy, John A

Cyril de Bodt, Alex Diaz-Papkovich, Michael Bleher, Kerstin Bunte, Corinna Coupette, Se- bastian Damrich, Enrique Fita Sanmartin, Fred A. Hamprecht, Em˝ oke´Agnes Horv´ at, Dhruv Kohli, Smita Krishnaswamy, John A. Lee, Boudewijn P. F. Lelieveldt, Leland McInnes, Ian T. Nabney, Maximilian Noichl, Pavlin G. Poliˇ car, Bastian Rieck, Guy Wolf, Gal Mishne, an...

work page arXiv 2025
[31]

Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder.Nature Communications, 13 (1):1739, 2022

Kangning Dong and Shihua Zhang. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder.Nature Communications, 13 (1):1739, 2022. doi: 10.1038/s41467-022-29439-6

work page doi:10.1038/s41467-022-29439-6 2022
[32]

Dorrity, Lauren M

Michael W. Dorrity, Lauren M. Saunders, Christine Queitsch, Stanley Fields, and Cole Trapnell. Dimensionality reduction by umap to visualize physical and genetic interac- tions.Nature Communications, 11(1):1537, 2020. doi: 10.1038/s41467-020-15351-4. URL https://doi.org/10.1038/s41467-020-15351-4

work page doi:10.1038/s41467-020-15351-4 2020
[33]

Duque, Sacha Morin, Guy Wolf, and Kevin Moon

Andres F. Duque, Sacha Morin, Guy Wolf, and Kevin Moon. Extendable and invertible mani- fold learning with geometry regularized autoencoders. In2020 IEEE International Conference on Big Data (Big Data), page 5027–5036. IEEE, December 2020. doi: 10.1109/bigdata50022. 2020.9378049. URLhttp://dx.doi.org/10.1109/BigData50022.2020.9378049

work page doi:10.1109/bigdata50022 2020
[34]

gene expression cancer RNA-Seq

Samuele Fiorini. gene expression cancer RNA-Seq. UCI Machine Learning Repository, 2016. DOI: https://doi.org/10.24432/C5R88H

work page doi:10.24432/c5r88h 2016
[35]

A review of unsupervised learning in astronomy.Astronomy and Com- puting, 48:100851, 2024

Sotiria Fotopoulou. A review of unsupervised learning in astronomy.Astronomy and Com- puting, 48:100851, 2024

2024
[36]

Are machine learning interpretations reliable? a stability study on global interpretations.arXiv preprint arXiv:2505.15728, 2025

Luqin Gan, Tarek M Zikry, and Genevera I Allen. Are machine learning interpretations reliable? a stability study on global interpretations.arXiv preprint arXiv:2505.15728, 2025

work page arXiv 2025
[37]

Nazor, Aaron Streets, and Nir Yosef

Adam Gayoso, Zo¨ e Steier, Romain Lopez, Jeffrey Regier, Kristopher L. Nazor, Aaron Streets, and Nir Yosef. Joint probabilistic modeling of single-cell multi-omic data with totalvi.Nature Methods, 18(3):272–282, 2021. doi: 10.1038/s41592-020-01050-x

work page doi:10.1038/s41592-020-01050-x 2021
[38]

Rufus Gikera, Elizaphan Maina, Shadrack Maina Mambo, and Jonathan Mwaura. K- hyperparameter tuning in high-dimensional genomics using joint optimization of deep differ- ential evolutionary algorithm and unsupervised transfer learning from intelligent genoumap embeddings.International Journal of Information Technology, 17(3):1679–1701, 2025

2025
[39]

Expressivity of deep neural networks,

Ingo G¨ uhring, Mones Raslan, and Gitta Kutyniok. Expressivity of deep neural networks,
[40]

URLhttps://arxiv.org/abs/2007.04759. 28

work page arXiv 2007
[41]

Laleh Haghverdi, Aaron T. L. Lun, Michael D. Morgan, and John C. Marioni. Batch effects in single-cell rna-sequencing data are corrected by matching mutual nearest neighbors.Nature Biotechnology, 36:421–427, 2018. doi: 10.1038/nbt.4091

work page doi:10.1038/nbt.4091 2018
[42]

Approximating Continuous Functions by ReLU Nets of Minimal Width

Boris Hanin and Mark Sellke. Approximating continuous functions by relu nets of minimal width, 2018. URLhttps://arxiv.org/abs/1710.11278

work page internal anchor Pith review Pith/arXiv arXiv 2018
[43]

Mauck III, Shiwei Zheng, Andrew Butler, Maddie J

Yuhan Hao, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, Aaron J. Wilk, Charlotte Darby, Michael Zager, et al. In- tegrated analysis of multimodal single-cell data.Cell, 184(13):3573–3587.e29, 2021. doi: 10.1016/j.cell.2021.04.048

work page doi:10.1016/j.cell.2021.04.048 2021
[44]

Interactive single-cell data analysis using cellar.Nature Communications, 13(1):1998, 2022

Euxhen Hasanaj, Jingtao Wang, Arjun Sarathi, Jun Ding, and Ziv Bar-Joseph. Interactive single-cell data analysis using cellar.Nature Communications, 13(1):1998, 2022

1998
[45]

Distilling the knowledge in a neural network,

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network,
[46]

URLhttps://arxiv.org/abs/1503.02531

work page internal anchor Pith review Pith/arXiv arXiv
[47]

Hinton and Richard S

Geoffrey E. Hinton and Richard S. Zemel. Autoencoders, minimum description length and helmholtz free energy. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems 6, pages 3–10. Morgan Kaufmann, 1994

1994
[48]

Holstein

Thomas W. Holstein. The hydra stem cell system – revisited.Cells & Development, 174: 203846, 2023. doi: 10.1016/j.cdev.2023.203846

work page doi:10.1016/j.cdev.2023.203846 2023
[49]

Approximation capabilities of multilayer feedforward networks.Neural Net- works, 4(2):251–257, 1991

Kurt Hornik. Approximation capabilities of multilayer feedforward networks.Neural Net- works, 4(2):251–257, 1991

1991
[50]

Multilayer feedforward networks are universal approximators.Neural Networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators.Neural Networks, 2(5):359–366, 1989

1989
[51] Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933
[52] Haiyang Huang, Yingfan Wang, Cynthia Rudin, and Edward P. Browne. Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization. Communications Biology, 5(1):719, 2022. doi: 10.1038/s42003-022-03628-x. URL https://doi.org/10.1038/s42003-022-03628-x
[53] Hyeon Jeon, Jeongin Park, Sungbok Shin, and Jinwook Seo. Stop misusing t-sne and umap for visual analytics, 2025. URL https://arxiv.org/abs/2506.08725
[54] Eric M Johnson, William Kath, and Madhav Mani. Embedr: distinguishing signal from noise in single-cell omics data. Patterns, 3(3), 2022
[55] Abderrahmane Jouilili, Hajar Hantouti, and Rajae E L Ouazzani. Optimizing dimensionality reduction in sdn: A metaheuristic approach of umap parameter tuning. In 2024 5th International Conference on Communications, Information, Electronic and Energy Systems (CIEES), pages 1–6, 2024. doi: 10.1109/CIEES62939.2024.10811181
[56] Joyce B. Kang, Aparna Nathan, Kathryn Weinand, Fan Zhang, Nghia Millard, Laurie Rumker, D. Branch Moody, Ilya Korsunsky, and Soumya Raychaudhuri. Efficient and precise single-cell reference atlas mapping with symphony. Nature Communications, 12(1):5890. doi: 10.1038/s41467-021-25957-x
[58] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-normalizing neural networks, 2017. URL https://arxiv.org/abs/1706.02515
[59] Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W. King, Tong Li, Rasa Elmentaite, Artem Lomakin, Veronika Kedlian, Adam Gayoso, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Elizabeth Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, and Omer Ali Bayraktar. Cell2location maps fine-grai... doi: 10.1038/s41587-021-01139-4
[61] Dmitry Kobak and Philipp Berens. The art of using t-sne for single-cell transcriptomics. Nature Communications, 10(1):5416, 2019
[62] Dmitry Kobak and George C. Linderman. Initialization is critical for preserving global data structure in both t-sne and umap. Nature Biotechnology, 39(2):156–157, 2021. doi: 10.1038/s41587-020-00809-z
[63] Ilya Korsunsky, Nghia Millard, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-Ru Loh, and Soumya Raychaudhuri. Fast, sensitive and accurate integration of single-cell data with harmony. Nature Methods, 16(12):1289–1296. doi: 10.1038/s41592-019-0619-0
[65] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. doi: 10.1109/5.726791
[66] John A. Lee and Michel Verleysen. Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing, 72(7–9):1431–1443, 2009. doi: 10.1016/j.neucom.2008.12.017
[67] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993
[68] Yin-Ting Liao, Hengrui Luo, and Anna Ma. Efficient and robust bayesian selection of hyperparameters in dimension reduction for visualization, 2023. URL https://arxiv.org/abs/2306.00357
[69] Justin Lin and Julia Fukuyama. Calibrating dimension reduction hyperparameters in the presence of noise. PLOS Computational Biology, 20(9):e1012427, September 2024. ISSN 1553-. doi: 10.1371/journal.pcbi.1012427. URL http://dx.doi.org/10.1371/journal.pcbi.1012427
[71] Zhexuan Liu, Rong Ma, and Yiqiao Zhong. Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective, 2025. URL https://arxiv.org/abs/2410.16608
[72] Romain Lopez, Jeffrey Regier, Michael B. Cole, Michael I. Jordan, and Nir Yosef. Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12):1053–1058, 2018. doi: 10.1038/s41592-018-0229-2
[73] Mohammad Lotfollahi, F. Alexander Wolf, and Fabian J. Theis. scgen predicts single-cell perturbation responses. Nature Methods, 16:715–721, 2019. doi: 10.1038/s41592-019-0494-8
[74] Mohammad Lotfollahi, Mohsen Naghipourfar, Malte D. Luecken, Matin Khajavi, Maren Büttner, Marco Wagenstetter, Žiga Avsec, Adam Gayoso, Nir Yosef, Marta Interlandi, Sergei Rybakov, Alexander V. Misharin, and Fabian J. Theis. Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology, 40(1):121–130, 2022. doi: 10.1038/s41587-021-01001-7
[75] Mohammad Lotfollahi, Anna Klimovskaia Susmelj, Carlo De Donno, Yuge Ji, Ignacio L. Ibarra, Sanjay R. Srivatsan, Mohsen Naghipourfar, Riza M. Daza, Beth Martin, F. Alexander Wolf, Nailya Yakubova, Jay Shendure Lee, José L. McFaline-Figueroa, and Fabian J. Theis. Predicting cellular responses to complex perturbations in high-throughput screens. Molecular S... 2023. doi: 10.15252/msb.202211517
[76] Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width, 2017. URL https://arxiv.org/abs/1709.02540
[77] Dalton Lunga, Saurabh Prasad, Melba M Crawford, and Okan Ersoy. Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning. IEEE Signal Processing Magazine, 31(1):55–66, 2013
[78] Leland McInnes, John Healy, Nathaniel Saul, and Lukas Grossberger. Umap: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29):861, 2018. doi: 10.21105/joss.00861. URL https://doi.org/10.21105/joss.00861
[79] Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approximation and projection for dimension reduction, 2020. URL https://arxiv.org/abs/1802.03426
[80] Marina Meilă and Hanyu Zhang. Manifold learning: What, how, and why. Annual Review of Statistics and Its Application, 11:27–57, 2024. doi: 10.1146/annurev-statistics-112723-034552. URL https://arxiv.org/abs/2311.03757
