Testing the Manifold Hypothesis

Charles Fefferman; Hariharan Narayanan; Sanjoy Mitter

arxiv: 1310.0425 · v2 · pith:YPZRS5HKnew · submitted 2013-10-01 · 🧮 math.ST · math.CA· math.DG· stat.TH

Testing the Manifold Hypothesis

Charles Fefferman , Sanjoy Mitter , Hariharan Narayanan This is my paper

classification 🧮 math.ST math.CAmath.DGstat.TH

keywords manifolddistributionprobabilityalgorithmfollowinghypothesisleastpoint

0 comments

read the original abstract

The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for fitting a manifold to an unknown probability distribution supported in a separable Hilbert space, only using i.i.d samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution $P$ supported on the unit ball of a separable Hilbert space $H$. Let $G(d, V, \tau)$ be the set of submanifolds of the unit ball of $H$ whose volume is at most $V$ and reach (which is the supremum of all $r$ such that any point at a distance less than $r$ has a unique nearest point on the manifold) is at least $\tau$. Let $L(M, P)$ denote mean-squared distance of a random point from the probability distribution $P$ to $M$. We obtain an algorithm that tests the manifold hypothesis in the following sense. The algorithm takes i.i.d random samples from $P$ as input, and determines which of the following two is true (at least one must be): (a) There exists $M \in G(d, CV, \frac{\tau}{C})$ such that $L(M, P) \leq C \epsilon.$ (b) There exists no $M \in G(d, V/C, C\tau)$ such that $L(M, P) \leq \frac{\epsilon}{C}.$ The answer is correct with probability at least $1-\delta$.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Generative models on phase space
hep-ph 2026-04 unverdicted novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
Diffusion Models Memorize in Training -- and Generalize in Inference
cs.LG 2026-03 unverdicted novelty 6.0

Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.