
arxiv: 2605.24244 · v1 · pith:G37QTNC3 · submitted 2026-05-22 · 📊 stat.ML · cs.LG

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

Pith reviewed 2026-06-30 14:19 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords manifold embedding, autoencoder, dimension reduction, out-of-sample extension, held-out validation, embedding distillation, reconstruction error, nonlinear embedding

The pith

MEDAL distills any manifold embedding into a constrained autoencoder that matches the embedding at the bottleneck while reconstructing inputs, enabling held-out validation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Nonlinear dimension reduction techniques such as t-SNE and UMAP lack explicit maps for new points and inverses back to the original space, preventing standard held-out validation. MEDAL solves this by training an autoencoder whose bottleneck layer is forced to reproduce the teacher embedding exactly and whose decoder reconstructs the input data. The resulting model supplies an out-of-sample encoder, an approximate inverse, and a reconstruction-based distortion measure. This turns one-time embeddings into reusable models that support quantitative comparison of methods and hyperparameter choices on unseen data.

Core claim

By training a constrained autoencoder so that its bottleneck exactly reproduces any given teacher embedding while the decoder reconstructs the original inputs, MEDAL produces an explicit out-of-sample map, an approximate inverse map, and a pointwise reconstruction error that serves as a distortion measure, thereby converting static manifold embeddings into models that admit held-out validation, method comparison, and hyperparameter tuning.

What carries the argument

Constrained autoencoder whose bottleneck is forced to match the teacher embedding exactly while the decoder reconstructs the input.
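The referee report below describes this constraint as a composite loss: a reconstruction term plus a coordinate-matching term between the bottleneck and the teacher coordinates. Whether the paper enforces the match as a hard constraint or as a weighted penalty is not stated in this summary, so the following is only a minimal NumPy sketch of the penalty form, not the authors' implementation. The function name `medal_style_loss`, the weight `lam`, and the use of squared L2 norms averaged over samples are all assumptions.

```python
import numpy as np


def medal_style_loss(X, Z_teacher, encode, decode, lam=1.0):
    """Hypothetical composite objective: reconstruction + lam * teacher matching.

    X         : (n, d) input data
    Z_teacher : (n, k) fixed teacher coordinates (e.g. a fitted t-SNE or UMAP)
    encode    : callable (n, d) -> (n, k), the student bottleneck
    decode    : callable (n, k) -> (n, d), the approximate inverse
    lam       : weight on the matching term (hypothetical name)
    """
    Z = encode(X)
    X_hat = decode(Z)
    # Squared reconstruction error per sample, averaged over samples.
    recon = float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))
    # Squared distance between bottleneck and teacher coordinates, averaged.
    match = float(np.mean(np.sum((Z - Z_teacher) ** 2, axis=1)))
    return recon + lam * match, recon, match


# Toy check: a linear encoder that reproduces the teacher coordinates exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
W, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # 5x2 matrix with orthonormal columns
Z_teacher = X @ W                              # stand-in "teacher" embedding
total, recon, match = medal_style_loss(
    X, Z_teacher, encode=lambda A: A @ W, decode=lambda Z: Z @ W.T, lam=10.0
)
```

In this toy case the matching term is exactly zero and only reconstruction error remains. In a real fit the two terms can trade off against each other, which is the concern the referee raises about folds that a pointwise matching term cannot see.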

If this is right

  • New samples receive an explicit map into the embedding space.
  • The decoder supplies an approximate inverse from embedding coordinates back to original features.
  • Pointwise reconstruction error quantifies local distortion in the manifold space.
  • Different dimension reduction methods can be compared quantitatively on held-out data.
  • Hyperparameters of the original embedding can be tuned using validation metrics.
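Once a student is trained, the out-of-sample map, the approximate inverse, and the pointwise distortion measure listed above are all forward passes. The sketch below assumes a trained `encode`/`decode` pair; to stay self-contained it uses a PCA projection fitted on the training split as a stand-in student, which is not MEDAL itself. The helper names `pointwise_reconstruction_error` and `evaluate_splits` are hypothetical.

```python
import numpy as np


def pointwise_reconstruction_error(X, encode, decode):
    """Per-sample squared error ||x - decode(encode(x))||^2 (one value per row)."""
    X_hat = decode(encode(X))
    return np.sum((X - X_hat) ** 2, axis=1)


def evaluate_splits(splits, encode, decode):
    """Embed each split out of sample and summarize its reconstruction error."""
    report = {}
    for name, X in splits.items():
        errors = pointwise_reconstruction_error(X, encode, decode)
        report[name] = {
            "embedding": encode(X),          # explicit map for new samples
            "errors": errors,                # pointwise distortion measure
            "mean_error": float(errors.mean()),
        }
    return report


# Synthetic data near a 2-D linear subspace of R^10, plus small noise.
rng = np.random.default_rng(1)
basis = rng.normal(size=(2, 10))


def make_split(n):
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 10))


splits = {"train": make_split(200), "validation": make_split(50), "test": make_split(50)}

# Stand-in student: top-2 principal directions of the training split only.
mu = splits["train"].mean(axis=0)
_, _, Vt = np.linalg.svd(splits["train"] - mu, full_matrices=False)
V = Vt[:2].T
report = evaluate_splits(splits, encode=lambda X: (X - mu) @ V, decode=lambda Z: Z @ V.T + mu)
```

Comparing `mean_error` across the train, validation, and test splits gives the kind of held-out curve shown in the figure captions; a large gap between training and held-out error would suggest the student fits the teacher coordinates without learning a map that generalizes.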

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reconstruction error could flag regions where the original embedding compresses biologically meaningful structure.
  • Mapping new samples and inspecting their reconstruction errors might serve as a practical test for distribution shift relative to the training manifold.
  • The same distillation step could be applied to other embedding algorithms to create a uniform validation layer across the field.
  • One could check whether the distilled model preserves higher-order neighborhood statistics better than existing out-of-sample extensions.
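The distribution-shift idea can be made concrete by calibrating a threshold on in-distribution held-out reconstruction errors and asking how often new samples exceed it. This is an illustrative sketch, not a procedure taken from the paper: the 95% quantile, the function name `shift_score`, and the synthetic chi-square values standing in for pointwise reconstruction errors are all assumptions.

```python
import numpy as np


def shift_score(ref_errors, new_errors, q=0.95):
    """Fraction of new samples whose error exceeds the q-quantile of reference errors.

    ref_errors : pointwise reconstruction errors on in-distribution held-out data
    new_errors : pointwise reconstruction errors on the incoming samples
    Without shift, the score should be close to 1 - q.
    """
    threshold = float(np.quantile(ref_errors, q))
    flagged = np.asarray(new_errors) > threshold
    return float(flagged.mean()), threshold, flagged


rng = np.random.default_rng(2)
ref = rng.chisquare(df=3, size=1000)              # stand-in validation errors
same = rng.chisquare(df=3, size=1000)             # new batch, same distribution
shifted = rng.chisquare(df=3, size=1000) + 10.0   # new batch with inflated errors

score_same, thr, _ = shift_score(ref, same)
score_shifted, _, _ = shift_score(ref, shifted)
```

The per-sample `flagged` mask is what would let one ask which cells, or which subjects, drive a high score.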

Load-bearing premise

The autoencoder can be trained to reproduce the geometry and neighborhoods of an arbitrary teacher embedding without adding its own systematic distortions.

What would settle it

A held-out test set where the neighborhoods or distances in the MEDAL-mapped space differ substantially from those produced by applying the original embedding method directly to the same points.
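One way to run this check is to embed the same held-out points both with the distilled encoder and with the original method, then compare neighborhoods rather than raw coordinates, since t-SNE and UMAP layouts are only meaningful up to rotation, reflection, and run-to-run variation. The sketch computes the mean fraction of shared k-nearest neighbors; the function names, k=10, and the synthetic coordinates are assumptions, and the dense distance matrix only suits a few thousand points.

```python
import numpy as np


def knn_indices(Y, k):
    """Indices of the k nearest neighbors of each row of Y, excluding the point itself."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_overlap(Y_a, Y_b, k=10):
    """Mean fraction of k-NN shared by two embeddings of the same points (1.0 = identical)."""
    nn_a, nn_b = knn_indices(Y_a, k), knn_indices(Y_b, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return float(np.mean(shared)) / k


rng = np.random.default_rng(3)
teacher = rng.normal(size=(200, 2))                          # stand-in teacher coordinates
student_close = teacher + 0.01 * rng.normal(size=(200, 2))   # faithful student
student_random = rng.normal(size=(200, 2))                   # unrelated layout

same = knn_overlap(teacher, teacher)
close = knn_overlap(teacher, student_close)
unrelated = knn_overlap(teacher, student_random)
```

A faithful student should score near 1.0, while an unrelated layout scores near k/(n-1), here about 0.05.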

Figures

Figures reproduced from arXiv: 2605.24244 by Genevera I. Allen, Irene Chang, Tarek M. Zikry.

Figure 1: MEDAL inputs a fitted manifold embedding and distills this into a reusable model that permits quantitative validation of the embedding. A, MEDAL distills a teacher embedding into a constrained autoencoder by jointly optimizing a distillation loss that aligns the bottleneck with the teacher manifold and a reconstruction loss that preserves input-space information. B, Once distilled, the learned encoder–dec…
Figure 2: MEDAL enables quantitative validation, comparison, and distortion analysis on MNIST data. A, UMAP hyperparameter selection using held-out reconstruction error. For each value of n_neighbors, a fitted UMAP teacher embedding was distilled into a MEDAL student, and reconstruction loss was evaluated on train, validation, and test splits. The selected model (n_neighbors=35; dashed line) balances class separati…
Figure 3: MEDAL selects a biologically coherent embedding of the whole-animal Hydra single-cell RNA-seq dataset [87] and localizes cell-type-specific distortion. A, MEDAL hyperparameter tuning for t-SNE teachers on the Hydra single-cell atlas. Reconstruction loss was evaluated across t-SNE perplexities on train, validation, and test splits. The selected perplexity, chosen by the validation curve using the one-stand…
Figure 4: Comparison of MEDAL with existing embedding diagnostics on a t-SNE embedding of the Hydra ([87]) single-cell RNA-seq dataset. A, Hyperparameter-selection curves across t-SNE perplexities for MEDAL, neMDBD ([64]), scDEED ([103]), and EMBEDR ([51]). Dashed vertical lines indicate the selected perplexity for each method. MEDAL selects perplexity 499 by held-out reconstruction loss, whereas neMDBD selects perplexi…
Figure 5: MEDAL selects a manifold embedding for mouse neocortex single-cell RNA-seq ([91]) and reveals cell types poorly represented on the manifold. A, MEDAL hyperparameter tuning for t-SNE teachers on the neocortex atlas. Reconstruction loss was evaluated across t-SNE perplexities on train, validation, and test splits. The selected perplexity was 53. The corresponding training and test embeddings preserve major …
Figure 6: MEDAL detects subject-level distribution shift in macaque retina single-cell RNA-seq ([81]) by embedding new cells into a fixed reference manifold. A, Joint t-SNE embedding of macaque retinal cells from multiple subjects. The embedding appears organized by cell type when colored by annotated retinal cell class, but coloring the same embedding by subject reveals substantial subject-level structure, indicati…
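Figures 2, 3, and 5 describe choosing a hyperparameter from held-out reconstruction-loss curves. One generic way to turn such a curve into a single choice is the one-standard-error rule: find the candidate with the lowest mean validation loss, then pick, among all candidates within one standard error of it, the one at a preferred end of the range. The sketch below is that generic rule under assumptions (losses from several repeats or folds per candidate, a `prefer` switch because which end counts as "simpler" is a judgment call); the function name and the loss values are made up for illustration.

```python
import numpy as np


def select_one_se(values, val_losses, prefer="smallest"):
    """One-standard-error rule over a validation curve.

    values     : (m,) candidate hyperparameter values (e.g. perplexities)
    val_losses : (m, r) held-out losses, r repeats or folds per candidate
    prefer     : "smallest" or "largest" value among candidates within one SE
    """
    values = np.asarray(values)
    losses = np.asarray(val_losses, dtype=float)
    means = losses.mean(axis=1)
    ses = losses.std(axis=1, ddof=1) / np.sqrt(losses.shape[1])
    best = int(np.argmin(means))
    # Candidates whose mean loss is within one SE of the best mean loss.
    within = np.flatnonzero(means <= means[best] + ses[best])
    if prefer == "smallest":
        pick = within[np.argmin(values[within])]
    elif prefer == "largest":
        pick = within[np.argmax(values[within])]
    else:
        raise ValueError("prefer must be 'smallest' or 'largest'")
    return values[pick], means, ses


perplexities = [5, 30, 50, 100, 500]
losses = [
    [1.00, 1.10, 0.90],
    [0.62, 0.60, 0.58],
    [0.55, 0.50, 0.45],
    [0.52, 0.51, 0.53],
    [0.70, 0.80, 0.75],
]  # made-up numbers for illustration only
chosen_small, means, ses = select_one_se(perplexities, losses, prefer="smallest")
chosen_large, _, _ = select_one_se(perplexities, losses, prefer="largest")
```

With these numbers, perplexities 50 and 100 are statistically indistinguishable, so the rule returns 50 or 100 depending on `prefer`.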
Original abstract

Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet, popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on visual appeal alone and without rigorous quantitative validation. A major reason is that manifold embeddings typically provide neither an out-of-sample map nor an inverse back to the original feature space; this makes held-out validation, the gold standard in supervised learning, all but impossible. To address these challenges, we develop a novel framework, MEDAL (Manifold Embedding Distillation via Autoencoder Learning), which distills a fitted manifold embedding into a reusable encoder–decoder model. MEDAL trains a constrained autoencoder whose bottleneck exactly matches any teacher embedding while the decoder reconstructs the original input; this yields an explicit map for new samples, an approximate inverse, and a pointwise reconstruction-based measure of distortion in the manifold space. This converts static manifold embeddings into models that can be evaluated on held-out data, enabling quantitative validation, including comparing different dimension reduction methods as well as hyperparameter tuning. Across multiple benchmark and scientific case studies, we show that MEDAL enables held-out validation to determine optimal manifold embeddings and hyperparameters, reveals biologically coherent regions that are difficult to preserve in two-dimensional embeddings, and detects distribution shift when new samples are mapped into a fixed reference manifold. MEDAL provides a general validation wrapper for any existing dimension reduction technique that will improve the rigor and reliability of dimension reduction in scientific workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MEDAL, a framework for distilling any pre-fitted nonlinear manifold embedding (e.g., t-SNE or UMAP) into a constrained autoencoder. The encoder is trained so its bottleneck layer exactly reproduces the teacher embedding coordinates on training points while the decoder reconstructs the original high-dimensional input; this supplies an explicit out-of-sample map, an approximate inverse, and a pointwise reconstruction error that serves as a distortion measure in the embedded space. The resulting model enables held-out quantitative validation, hyperparameter selection, and distribution-shift detection for dimension-reduction methods that otherwise lack these capabilities.

Significance. If the central construction holds, MEDAL would convert static, non-reusable embeddings into evaluable models, directly addressing the lack of rigorous validation that currently limits the scientific use of nonlinear dimension reduction. The approach is general (applicable to any teacher embedding) and supplies concrete tools—out-of-sample extension and a reconstruction-based fidelity metric—that are absent from standard t-SNE/UMAP pipelines. No machine-checked proofs or parameter-free derivations are claimed, but the empirical demonstration on benchmarks and biological case studies, if supported by appropriate controls, would constitute a practical advance.

major comments (2)
  1. [§3.2] §3.2 (composite loss): the coordinate-matching term ||f_θ(x) − teacher(x)||_2 does not constrain local geometry or neighborhood structure. Nothing prevents the joint optimizer from trading small increases in matching error for large reconstruction gains by introducing folds or warps invisible to the pointwise L2 term yet visible to downstream nearest-neighbor or distance-based validation; the manuscript must supply explicit evidence (e.g., k-NN preservation or trustworthiness scores on held-out data) that such distortion does not occur.
  2. [§5] §5 (held-out validation experiments): the reported improvements in hyperparameter selection and distribution-shift detection rely on the assumption that the bottleneck faithfully reproduces the teacher geometry on unseen points. Without an ablation that isolates the effect of the matching loss weight or compares against a pure reconstruction autoencoder, it is unclear whether the observed gains are attributable to faithful distillation or to the autoencoder’s own inductive bias.
minor comments (2)
  1. Notation for the teacher embedding and the encoder output should be unified across equations and text to avoid confusion between the static teacher map and the learned f_θ.
  2. Figure captions should explicitly state whether the displayed embeddings are the original teacher or the MEDAL-reconstructed versions, and whether any quantitative metric is computed on training or held-out points.
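The first major comment asks for trustworthiness scores on held-out data. Trustworthiness (Venna and Kaski) penalizes points that appear among an embedded point's k nearest neighbors but are not among its k nearest neighbors in input space, weighted by how far down the input-space ranking they sit. Below is a plain NumPy sketch of the standard definition; scikit-learn ships the same measure as `sklearn.manifold.trustworthiness`. Scoring the held-out inputs once against the teacher's coordinates and once against MEDAL's out-of-sample coordinates would give the requested comparison. The function names and data are hypothetical, and memory is O(n²).

```python
import numpy as np


def _sq_dists(Y):
    """Dense matrix of squared Euclidean distances between rows of Y."""
    return np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)


def trustworthiness_np(X, Y, k=5):
    """Trustworthiness of embedding Y (n, p) with respect to inputs X (n, d).

    1.0 means every k-NN in Y is also a k-NN in X; intruders that are far
    away in input space are penalized by (input-space rank - k).
    """
    n = X.shape[0]
    if not 0 < k < n / 2:
        raise ValueError("need 0 < k < n / 2")
    dx, dy = _sq_dists(X), _sq_dists(Y)
    np.fill_diagonal(dx, np.inf)  # exclude each point from its own neighbor lists
    np.fill_diagonal(dy, np.inf)
    rows = np.arange(n)[:, None]
    # ranks_x[i, j] = rank of j among i's input-space neighbors (1 = nearest).
    order_x = np.argsort(dx, axis=1)
    ranks_x = np.empty_like(order_x)
    ranks_x[rows, order_x] = np.arange(1, n + 1)
    nn_y = np.argsort(dy, axis=1)[:, :k]  # k-NN in the embedding
    # Only intruders (input rank > k) contribute, each by (rank - k).
    penalty = np.clip(ranks_x[rows, nn_y] - k, 0, None).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


rng = np.random.default_rng(4)
X_heldout = rng.normal(size=(100, 5))
t_identity = trustworthiness_np(X_heldout, X_heldout, k=5)
t_random = trustworthiness_np(X_heldout, rng.normal(size=(100, 2)), k=5)
```

An identical embedding scores exactly 1.0 and a random layout scores around 0.5, which brackets the range in which teacher and student scores would be compared.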

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on empirical validation. We address each major comment below, agreeing where additional evidence is warranted and outlining the revisions we will make.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (composite loss): the coordinate-matching term ||f_θ(x) − teacher(x)||_2 does not constrain local geometry or neighborhood structure. Nothing prevents the joint optimizer from trading small increases in matching error for large reconstruction gains by introducing folds or warps invisible to the pointwise L2 term yet visible to downstream nearest-neighbor or distance-based validation; the manuscript must supply explicit evidence (e.g., k-NN preservation or trustworthiness scores on held-out data) that such distortion does not occur.

    Authors: We agree that the pointwise L2 matching term alone does not explicitly regularize local geometry. While the training procedure is designed to reproduce the teacher coordinates exactly on the training points, we acknowledge that verification of neighborhood preservation on held-out data is required. In the revised manuscript we will add k-NN preservation and trustworthiness scores evaluated on held-out points, comparing the distilled MEDAL embeddings against the original teacher embeddings to provide the requested evidence that distortion does not occur. revision: yes

  2. Referee: [§5] §5 (held-out validation experiments): the reported improvements in hyperparameter selection and distribution-shift detection rely on the assumption that the bottleneck faithfully reproduces the teacher geometry on unseen points. Without an ablation that isolates the effect of the matching loss weight or compares against a pure reconstruction autoencoder, it is unclear whether the observed gains are attributable to faithful distillation or to the autoencoder’s own inductive bias.

    Authors: We agree that isolating the contribution of the matching loss is important for attributing the observed gains. The revised manuscript will include an ablation that varies the weight of the coordinate-matching term and directly compares the full MEDAL model against a pure reconstruction autoencoder (matching weight set to zero). These experiments will clarify whether the improvements stem from faithful distillation rather than the autoencoder architecture alone. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation self-contained with independent losses and evaluation

Full rationale

The paper defines MEDAL as training an autoencoder with a composite objective that includes both reconstruction of the input and matching to a pre-fitted teacher embedding; the held-out validation, distortion measure, and out-of-sample mapping are defined directly from the decoder and encoder outputs without reducing to the teacher coordinates by construction. No self-citation is load-bearing for the central claim, no fitted parameter is relabeled as a prediction, and no uniqueness theorem or ansatz is imported from prior author work. The method is a standard constrained autoencoder wrapper whose outputs are independently falsifiable on held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that an autoencoder can be constrained to reproduce a given embedding geometry while still learning a useful decoder; no free parameters or invented entities are introduced in the abstract description.

axioms (1)
  • domain assumption: A standard autoencoder architecture can be trained so that its bottleneck layer exactly reproduces the coordinates of any given nonlinear manifold embedding.
    This premise is required for the distillation step to preserve the teacher embedding properties.

pith-pipeline@v0.9.1-grok · 5806 in / 1197 out tokens · 42600 ms · 2026-06-30T14:19:28.871091+00:00 · methodology


Reference graph

Works this paper leans on

120 extracted references · 69 canonical work pages · 9 internal anchors

  1. [1]

    Genevera I. Allen, Luqin Gan, and Lili Zheng. Interpretable machine learning for discovery: Statistical challenges and opportunities. Annual Review of Statistics and Its Application, 11:97–121, 2024

  2. [2]

    Bingxue An and Tiffany M. Tang. Consensus dimension reduction via multi-view learning. URL https://arxiv.org/abs/2512.15802

  4. [4]

    Gabriel Appleby, Mateus Espadoto, Rui Chen, Samuel Goree, Alexandru Telea, Erik W Anderson, and Remco Chang. HyperNP: Interactive visual exploration of multidimensional projection hyperparameters, 2021. URL https://arxiv.org/abs/2106.13777

  5. [5]

    Tal Ashuach, Danny A. Reidenbach, Adam Gayoso, and Nir Yosef. MultiVI: Deep generative model for the integration of multimodal data. Nature Methods, 20:1222–1231, 2023. doi: 10.1038/s41592-023-01909-9

  6. [6]

    Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1):53–58, 1989. ISSN 0893-6080. doi: 10.1016/0893-6080(89)90014-2. URL https://www.sciencedirect.com/science/article/pii/0893608089900142

  7. [7]

    Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory, 39:930–945, 1993. URL https://api.semanticscholar.org/CorpusID:15383918

  8. [8]

    Etienne Becht, Leland McInnes, John Healy, Charles-Antoine Dutertre, Immanuel W H Kwok, Lai Guan Ng, Florent Ginhoux, and Evan W Newell. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 37(1):38–44, 2019

  9. [9]

    Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003. doi: 10.1162/089976603321780317

  10. [10]

    Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Roux, and Marie Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing Systems, 16, 2003

  11. [11]

    Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013. doi: 10.1109/TPAMI.2013.50

  12. [12]

    Tommaso Biancalani, Gabriele Scalia, Lorenzo Buffoni, Rahul Avasthi, Ziqing Lu, Aviv Sanger, Nazli Tokcan, Charles R. Vanderburg, Åsa Segerstolpe, Meng Zhang, Inbal Avraham-Davidi, and Aviv Regev. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nature Methods, 18(11):1352–1362, 2021. doi: 10.1038/s41592-021-01264-7

  13. [13]

    Javier Blanco-Portals, Francesca Peiró, and Sònia Estradé. Strategies for EELS data analysis. Introducing UMAP and HDBSCAN for dimensionality reduction and clustering. Microscopy and Microanalysis, 28(1):109–122, 2022

  14. [14]

    H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59(4):291–294, 1988. doi: 10.1007/BF00332918. URL https://doi.org/10.1007/BF00332918

  15. [15]

    Olivier Bousquet and André Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002. doi: 10.1162/153244302760200704

  16. [16]

    Pierre Boyeau, Jeremy Hong, Adam Gayoso, Michelle Kim, José L. McFaline-Figueroa, Michael I. Jordan, Elham Azizi, Can Ergen, and Nir Yosef. Deep generative modeling of sample-level heterogeneity in single-cell genomics. Nature Methods, 2025. doi: 10.1038/s41592-025-02808-x

  17. [17]

    Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984

  18. [18]

    Yanshuai Cao and Luyu Wang. Automatic selection of t-SNE perplexity, 2017. URL https://arxiv.org/abs/1708.03229

  19. [19]

    Dylan Cashman, Mark Keller, Hyeon Jeon, Bum Chul Kwon, and Qianwen Wang. A critical analysis of the usage of dimensionality reduction in four domains. IEEE Transactions on Visualization and Computer Graphics, 31(10):9405–9423, October 2025. ISSN 2160-9306. doi: 10.1109/tvcg.2025.3567989. URL http://dx.doi.org/10.1109/TVCG.2025.3567989

  20. [20]

    Raymond B. Cattell. The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276, 1966. doi: 10.1207/s15327906mbr0102_10

  21. [21]

    Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, and Genevera I. Allen. Unsupervised machine learning for scientific discovery: Workflow and best practices, 2025. URL https://arxiv.org/abs/2506.04553

  22. [22]

    Irene Chang, tzUNC, and Matthew Shen. Dataslingers/medal: v0.1.0, May 2026. URL https://doi.org/10.5281/zenodo.20347573

  23. [23]

    Tara Chari and Lior Pachter. The specious art of single-cell genomics. PLOS Computational Biology, 19(8):1–20, 08 2023. doi: 10.1371/journal.pcbi.1011288. URL https://doi.org/10.1371/journal.pcbi.1011288

  24. [24]

    Mark M Churchland, John P Cunningham, Matthew T Kaufman, Justin D Foster, Paul Nuyujukian, Stephen I Ryu, and Krishna V Shenoy. Neural population dynamics during reaching. Nature, 487(7405):51–56, 2012

  25. [25]

    Haotian Cui, Chen Wang, Hassaan Maan, Kuan Pang, Feng Luo, and Bo Wang. scGPT: Toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 21:1470–1480, 2024. doi: 10.1038/s41592-024-02201-0

  26. [26]

    John P Cunningham and Byron M Yu. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11):1500–1509, 2014

  27. [27]

    George V. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2:303–314, 1989. URL https://api.semanticscholar.org/CorpusID:3958369

  28. [28]

    Spyros Darmanis, Steven A. Sloan, Derek Croote, Marco Mignardi, Sophia Chernikova, Peyman Samghababi, Ye Zhang, Norma Neff, Mark Kowarsky, Christine Caneda, Gordon Li, Steven D. Chang, Ian David Connolly, Yingmei Li, Ben A. Barres, Melanie Hayden Gephart, and Stephen R. Quake. Single-cell RNA-seq analysis of infiltrating neoplastic cells at the migrat...

  29. [29]

    Lowell E. Davis. Histological and ultrastructural studies of the basal disk of Hydra. III. The gastrodermis and the mesoglea. Cell and Tissue Research, 162:107–118, 1975. doi: 10.1007/BF00223266

  30. [30]

    Cyril de Bodt, Alex Diaz-Papkovich, Michael Bleher, Kerstin Bunte, Corinna Coupette, Sebastian Damrich, Enrique Fita Sanmartin, Fred A. Hamprecht, Emőke-Ágnes Horvát, Dhruv Kohli, Smita Krishnaswamy, John A. Lee, Boudewijn P. F. Lelieveldt, Leland McInnes, Ian T. Nabney, Maximilian Noichl, Pavlin G. Poličar, Bastian Rieck, Guy Wolf, Gal Mishne, an...

  31. [31]

    Kangning Dong and Shihua Zhang. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nature Communications, 13(1):1739, 2022. doi: 10.1038/s41467-022-29439-6

  32. [32]

    Michael W. Dorrity, Lauren M. Saunders, Christine Queitsch, Stanley Fields, and Cole Trapnell. Dimensionality reduction by UMAP to visualize physical and genetic interactions. Nature Communications, 11(1):1537, 2020. doi: 10.1038/s41467-020-15351-4. URL https://doi.org/10.1038/s41467-020-15351-4

  33. [33]

    Andres F. Duque, Sacha Morin, Guy Wolf, and Kevin Moon. Extendable and invertible manifold learning with geometry regularized autoencoders. In 2020 IEEE International Conference on Big Data (Big Data), pages 5027–5036. IEEE, December 2020. doi: 10.1109/bigdata50022.2020.9378049. URL http://dx.doi.org/10.1109/BigData50022.2020.9378049

  34. [34]

    Samuele Fiorini. Gene expression cancer RNA-Seq. UCI Machine Learning Repository, 2016. doi: 10.24432/C5R88H

  35. [35]

    Sotiria Fotopoulou. A review of unsupervised learning in astronomy. Astronomy and Computing, 48:100851, 2024

  36. [36]

    Luqin Gan, Tarek M Zikry, and Genevera I Allen. Are machine learning interpretations reliable? A stability study on global interpretations. arXiv preprint arXiv:2505.15728, 2025

  37. [37]

    Adam Gayoso, Zoë Steier, Romain Lopez, Jeffrey Regier, Kristopher L. Nazor, Aaron Streets, and Nir Yosef. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nature Methods, 18(3):272–282, 2021. doi: 10.1038/s41592-020-01050-x

  38. [38]

    Rufus Gikera, Elizaphan Maina, Shadrack Maina Mambo, and Jonathan Mwaura. K-hyperparameter tuning in high-dimensional genomics using joint optimization of deep differential evolutionary algorithm and unsupervised transfer learning from intelligent genoumap embeddings. International Journal of Information Technology, 17(3):1679–1701, 2025

  39. [39]

    Ingo Gühring, Mones Raslan, and Gitta Kutyniok. Expressivity of deep neural networks. URL https://arxiv.org/abs/2007.04759

  41. [41]

    Laleh Haghverdi, Aaron T. L. Lun, Michael D. Morgan, and John C. Marioni. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology, 36:421–427, 2018. doi: 10.1038/nbt.4091

  42. [42]

    Boris Hanin and Mark Sellke. Approximating continuous functions by ReLU nets of minimal width, 2018. URL https://arxiv.org/abs/1710.11278

  43. [43]

    Yuhan Hao, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, Aaron J. Wilk, Charlotte Darby, Michael Zager, et al. Integrated analysis of multimodal single-cell data. Cell, 184(13):3573–3587.e29, 2021. doi: 10.1016/j.cell.2021.04.048

  44. [44]

    Euxhen Hasanaj, Jingtao Wang, Arjun Sarathi, Jun Ding, and Ziv Bar-Joseph. Interactive single-cell data analysis using Cellar. Nature Communications, 13(1):1998, 2022

  45. [45]

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. URL https://arxiv.org/abs/1503.02531

  47. [47]

    Geoffrey E. Hinton and Richard S. Zemel. Autoencoders, minimum description length and Helmholtz free energy. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems 6, pages 3–10. Morgan Kaufmann, 1994

  48. [48]

    Thomas W. Holstein. The Hydra stem cell system – revisited. Cells & Development, 174:203846, 2023. doi: 10.1016/j.cdev.2023.203846

  49. [49]

    Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991

  50. [50]

    Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989

  51. [51]

    Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933

  52. [52]

    Haiyang Huang, Yingfan Wang, Cynthia Rudin, and Edward P. Browne. Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization. Communications Biology, 5(1):719, 2022. doi: 10.1038/s42003-022-03628-x. URL https://doi.org/10.1038/s42003-022-03628-x

  53. [53]

    Hyeon Jeon, Jeongin Park, Sungbok Shin, and Jinwook Seo. Stop misusing t-SNE and UMAP for visual analytics, 2025. URL https://arxiv.org/abs/2506.08725

  54. [54]

    Eric M Johnson, William Kath, and Madhav Mani. EMBEDR: Distinguishing signal from noise in single-cell omics data. Patterns, 3(3), 2022

  55. [55]

    Abderrahmane Jouilili, Hajar Hantouti, and Rajae El Ouazzani. Optimizing dimensionality reduction in SDN: A metaheuristic approach of UMAP parameter tuning. In 2024 5th International Conference on Communications, Information, Electronic and Energy Systems (CIEES), pages 1–6, 2024. doi: 10.1109/CIEES62939.2024.10811181

  56. [56]

    Joyce B. Kang, Aparna Nathan, Kathryn Weinand, Fan Zhang, Nghia Millard, Laurie Rumker, D. Branch Moody, Ilya Korsunsky, and Soumya Raychaudhuri. Efficient and precise single-cell reference atlas mapping with Symphony. Nature Communications, 12(1):5890. doi: 10.1038/s41467-021-25957-x

  58. [58]

    Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-normalizing neural networks, 2017. URL https://arxiv.org/abs/1706.02515

  59. [59]

    Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W. King, Tong Li, Rasa Elmentaite, Artem Lomakin, Veronika Kedlian, Adam Gayoso, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Elizabeth Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, and Omer Ali Bayraktar. Cell2location maps fine-grai... doi: 10.1038/s41587-021-01139-4

  61. [61]

    The art of using t-sne for single-cell transcriptomics

    Dmitry Kobak and Philipp Berens. The art of using t-sne for single-cell transcriptomics. Nature communications, 10(1):5416, 2019

  62. [62]

    Dmitry Kobak and George C. Linderman. Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nature Biotechnology, 39(2):156–157, 2021. doi: 10.1038/s41587-020-00809-z

  63. [63]

    Ilya Korsunsky, Nghia Millard, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-Ru Loh, and Soumya Raychaudhuri. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 16(12):1289–1296. doi: 10.1038/s41592-019-0619-0

  65. [65]

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. doi: 10.1109/5.726791

  66. [66]

    John A. Lee and Michel Verleysen. Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing, 72(7–9):1431–1443, 2009. doi: 10.1016/j.neucom.2008.12.017

  67. [67]

    Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993

  68. [68]

    Yin-Ting Liao, Hengrui Luo, and Anna Ma. Efficient and robust Bayesian selection of hyperparameters in dimension reduction for visualization, 2023. URL https://arxiv.org/abs/2306.00357

  69. [69]

    Justin Lin and Julia Fukuyama. Calibrating dimension reduction hyperparameters in the presence of noise. PLOS Computational Biology, 20(9):e1012427, September 2024. ISSN 1553-. doi: 10.1371/journal.pcbi.1012427. URL http://dx.doi.org/10.1371/journal.pcbi.1012427

  71. [71]

    Zhexuan Liu, Rong Ma, and Yiqiao Zhong. Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective, 2025. URL https://arxiv.org/abs/2410.16608

  72. [72]

    Romain Lopez, Jeffrey Regier, Michael B. Cole, Michael I. Jordan, and Nir Yosef. Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12):1053–1058, 2018. doi: 10.1038/s41592-018-0229-2

  73. [73]

    Mohammad Lotfollahi, F. Alexander Wolf, and Fabian J. Theis. scGen predicts single-cell perturbation responses. Nature Methods, 16:715–721, 2019. doi: 10.1038/s41592-019-0494-8

  74. [74]

    Mohammad Lotfollahi, Mohsen Naghipourfar, Malte D. Luecken, Matin Khajavi, Maren Büttner, Marco Wagenstetter, Žiga Avsec, Adam Gayoso, Nir Yosef, Marta Interlandi, Sergei Rybakov, Alexander V. Misharin, and Fabian J. Theis. Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology, 40(1):121–130, 2022. doi: 10.1038/s41587-...

  75. [75]

    Mohammad Lotfollahi, Anna Klimovskaia Susmelj, Carlo De Donno, Yuge Ji, Ignacio L. Ibarra, Sanjay R. Srivatsan, Mohsen Naghipourfar, Riza M. Daza, Beth Martin, F. Alexander Wolf, Nailya Yakubova, Jay Shendure Lee, José L. McFaline-Figueroa, and Fabian J. Theis. Predicting cellular responses to complex perturbations in high-throughput screens. Molecular S...

  76. [76]

    Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width, 2017. URL https://arxiv.org/abs/1709.02540

  77. [77]

    Dalton Lunga, Saurabh Prasad, Melba M Crawford, and Okan Ersoy. Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning. IEEE Signal Processing Magazine, 31(1):55–66, 2013

  78. [78]

    Leland McInnes, John Healy, Nathaniel Saul, and Lukas Grossberger. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29):861, 2018. doi: 10.21105/joss.00861. URL https://doi.org/10.21105/joss.00861

  79. [79]

    Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction, 2020. URL https://arxiv.org/abs/1802.03426

  80. [80]

    Marina Meilă and Hanyu Zhang. Manifold learning: What, how, and why. Annual Review of Statistics and Its Application, 11:27–57, 2024. doi: 10.1146/annurev-statistics-112723-034552. URL https://arxiv.org/abs/2311.03757
