Estimating the effective dimension of large biological datasets using Fisher separability analysis

Andrei Zinovyev; Jonathan Bac; Luca Albergante

arxiv: 1901.06328 · v1 · pith:ME4U54ZQnew · submitted 2019-01-18 · 💻 cs.LG · q-bio.QM · stat.ML

classification 💻 cs.LG · q-bio.QM · stat.ML

keywords datasets · intrinsic · biological · data · dimension · dimensionality · estimating · frequently

Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds frequently possess structures, significantly decreasing their intrinsic dimensionality (ID) due to the presence of clusters, points being located close to low-dimensional varieties or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separability properties of data points, on several benchmarks and real biological datasets. We show that the introduced measure of ID has performance competitive with state-of-the-art measures, being efficient across a wide range of dimensions and performing better in the case of noisy samples. Moreover, it allows estimating the intrinsic dimension in situations where the intrinsic manifold assumption is not valid.
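
As a rough illustration of what "analysing the separability properties of data points" can look like, the sketch below counts how often one point fails to be linearly separated from another. It is a minimal sketch under stated assumptions, not the estimator evaluated in the paper. The criterion used (x is separable from y when ⟨x, y⟩ ≤ α⟨x, x⟩), the threshold `alpha=0.8`, the helper name `inseparable_fraction`, and the synthetic data are all assumptions introduced here. The abstract does not spell them out, and the preprocessing and the conversion from separability statistics to a dimension value are not reproduced.

```python
import numpy as np


def inseparable_fraction(X, alpha=0.8):
    """Fraction of ordered pairs (x, y), x != y, where x is NOT separable from y.

    Assumed criterion (not stated in the abstract): x is separable from y
    when <x, y> <= alpha * <x, x>. Points are centred and projected onto
    the unit sphere first, so <x, x> = 1.
    """
    Xc = X - X.mean(axis=0)                       # centre the cloud
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    Xs = Xc / norms                               # assumes no point sits exactly at the mean
    gram = Xs @ Xs.T                              # pairwise inner products
    inseparable = gram > alpha                    # criterion violated
    np.fill_diagonal(inseparable, False)          # ignore each point paired with itself
    n = Xs.shape[0]
    return inseparable.sum() / (n * (n - 1))


rng = np.random.default_rng(0)
n_points, ambient_dim, latent_dim = 500, 50, 3

# Hypothetical data: a 3-dimensional Gaussian cloud embedded in 50 dimensions
# through an orthonormal basis, versus a cloud that fills all 50 dimensions.
basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, latent_dim)))
low_dim_cloud = rng.standard_normal((n_points, latent_dim)) @ basis.T
full_dim_cloud = rng.standard_normal((n_points, ambient_dim))

frac_low = inseparable_fraction(low_dim_cloud)
frac_full = inseparable_fraction(full_dim_cloud)
print(f"low-dimensional cloud:  {frac_low:.4f}")
print(f"full-dimensional cloud: {frac_full:.4f}")
```

Both clouds live in the same 50-dimensional ambient space, yet points in the low-dimensional cloud are inseparable from one another far more often. A statistic of this kind is what lets a separability-based approach distinguish ambient from intrinsic dimension.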

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Symphony of high-dimensional brain
q-bio.NC · 2019-06 · unverdicted · novelty 1.0

The paper analyzes participant opinions from a Physics of Life Reviews discussion on the simplicity revolution in high-dimensional neuroscience and its implications for machine learning.