Manifold Dimension Estimation via Local Graph Structure

Pierre Lafaye de Micheaux; Zelong Bi

arxiv: 2510.15141 · v4 · pith:PFL4BETMnew · submitted 2025-10-16 · 📊 stat.ML · cs.LG · stat.AP

Pith reviewed 2026-05-18 05:52 UTC · model grok-4.3

classification: stat.ML · cs.LG · stat.AP

keywords: manifold dimension estimation · local PCA · graph structure · quadratic embedding · total least squares · curvature · machine learning

The pith

A framework using regression on local PCA coordinates estimates manifold dimension without assuming local flatness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a new way to estimate the dimension of a manifold in data by modeling local connections through regression on principal component analysis coordinates from small neighborhoods. This avoids the usual assumption that those neighborhoods look flat and instead accounts for curvature in the data structure. Two concrete estimators are built in this setup: quadratic embedding and total least squares. On both artificial and real datasets the estimators match or exceed the accuracy of existing leading methods. The approach matters for machine learning tasks that need reliable estimates of intrinsic dimension in curved high-dimensional spaces.

Core claim

Most existing manifold dimension estimators rely on the assumption that the underlying manifold is locally flat within the neighborhoods under consideration. Motivated by curvature-adjusted PCA, the authors propose a framework that captures the local graph structure of the manifold through regression on local PCA coordinates. Within this framework, quadratic embedding (QE) and total least squares (TLS) estimators are introduced and shown through experiments to perform competitively with and often outperform state-of-the-art approaches on synthetic and real-world datasets.

What carries the argument

Regression on local PCA coordinates to capture the local graph structure of the manifold.
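
As a rough illustration of what regression on local PCA coordinates can look like, the sketch below projects a neighborhood onto its principal axes and regresses the trailing coordinates on a quadratic function of the first d. It is not the paper's QE or TLS estimator, whose definitions this page does not give. The function names (`local_pca_coords`, `quadratic_features`, `graph_residual`), the k-nearest-neighbor rule, and the residual normalization are all assumptions made for this example.

```python
import numpy as np

def local_pca_coords(X, center, k):
    """Return the k nearest neighbors of `center`, expressed in local PCA coordinates."""
    dists = np.linalg.norm(X - center, axis=1)
    nbrs = X[np.argsort(dists)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

def quadratic_features(U):
    """Intercept, linear terms, and all degree-2 monomials of the columns of U."""
    n, d = U.shape
    cols = [np.ones(n)] + [U[:, i] for i in range(d)]
    cols += [U[:, i] * U[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def graph_residual(coords, d):
    """Relative residual when the trailing coordinates are regressed on a
    quadratic function of the first d coordinates (assumes d < coords.shape[1])."""
    U, V = coords[:, :d], coords[:, d:]
    F = quadratic_features(U)
    beta, *_ = np.linalg.lstsq(F, V, rcond=None)
    return np.sum((V - F @ beta) ** 2) / np.sum(coords ** 2)

# Hypothetical demo data: points on the unit sphere in R^3 (intrinsic dimension 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

coords = local_pca_coords(X, X[0], k=100)
res_d1 = graph_residual(coords, 1)  # too few "tangent" coordinates
res_d2 = graph_residual(coords, 2)  # matches the true dimension
print(res_d1, res_d2)
```

On this curved patch the d = 2 residual should come out far smaller than the d = 1 residual, because the sphere is locally close to the graph of a quadratic function over its two tangent directions. How such residuals would be turned into a dimension estimate is a design choice this sketch leaves open.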

If this is right

The QE and TLS estimators can estimate dimension on manifolds where local flatness does not hold.
Both estimators achieve competitive or superior accuracy on synthetic data with controlled curvature.
The same estimators also perform well on real-world datasets without extra tuning for sampling density.
The framework offers a direct alternative to curvature-adjusted PCA for dimension estimation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The regression approach could be tested on manifolds whose curvature changes across regions to check robustness.
Replacing the quadratic model with other simple regressors might yield further gains in accuracy for specific data types.
The method suggests dimension estimation could be combined with local structure recovery for joint tasks like clustering.

Load-bearing premise

That performing regression on local PCA coordinates reliably captures the local graph structure of the manifold even when neighborhoods are not locally flat.

What would settle it

A controlled test on a synthetic manifold with known non-zero curvature where the QE or TLS estimator recovers the true dimension while methods that assume local flatness produce errors.
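
One way to set up the flatness-assuming half of such a test is sketched below. Points are drawn uniformly from a unit 2-sphere in R^3 (true dimension 2, constant nonzero curvature), and a plain local PCA estimator with a variance threshold is run at two neighborhood sizes. The estimator, the 95% threshold, the sample size, and the function names are assumptions chosen for illustration, not the paper's protocol or baselines. QE and TLS are not implemented here because this page does not define them.

```python
import numpy as np

def sample_sphere(n, rng):
    """Draw n points uniformly from the unit 2-sphere in R^3 (intrinsic dimension 2)."""
    pts = rng.normal(size=(n, 3))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

def flat_pca_dimension(X, center, k, threshold=0.95):
    """Smallest number of principal directions explaining `threshold` of the
    variance among the k nearest neighbors of `center`.
    This implicitly treats the neighborhood as flat."""
    dists = np.linalg.norm(X - center, axis=1)
    nbrs = X[np.argsort(dists)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    svals = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(svals ** 2) / np.sum(svals ** 2)
    return int(np.searchsorted(explained, threshold) + 1)

rng = np.random.default_rng(0)
X = sample_sphere(2000, rng)

small = flat_pca_dimension(X, X[0], k=50)    # small cap: curvature barely visible
large = flat_pca_dimension(X, X[0], k=1000)  # roughly a hemisphere: curvature inflates the estimate
print(small, large)
```

The small neighborhood is nearly flat, so the threshold rule recovers 2. The hemisphere-sized neighborhood puts enough variance in the normal direction that the same rule reports 3. A full version of the test would run the paper's estimators on the same neighborhoods and compare.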

Original abstract

Most existing manifold dimension estimators rely on the assumption that the underlying manifold is locally flat within the neighborhoods under consideration. More recently, curvature-adjusted principal component analysis (CA-PCA) has emerged as a powerful alternative by explicitly accounting for the manifold's curvature. Motivated by these ideas, we propose a manifold dimension estimation framework that captures the local graph structure of the manifold through regression on local PCA coordinates. Within this framework, we introduce two representative estimators: quadratic embedding (QE) and total least squares (TLS). Experiments on both synthetic and real-world datasets demonstrate that these methods perform competitively with, and often outperform, state-of-the-art approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The regression-on-local-PCA-coordinates framework gives a workable extension of CA-PCA ideas for dimension estimation, with QE and TLS showing decent empirical results, but the experiments are thin and the flatness relaxation claim is not fully convincing.

The main thing to know is that this paper sets up manifold dimension estimation as a regression problem on local PCA coordinates to capture the underlying graph structure, then introduces quadratic embedding and total least squares estimators that are meant to work without forcing neighborhoods to be locally flat. It reports that these beat or match existing methods on both synthetic manifolds and some real datasets. That is the core contribution and the part worth paying attention to. It builds on recent curvature-adjusted PCA work but reframes the problem around regression rather than direct curvature correction, which is a reasonable incremental step. The empirical comparisons are the strongest part of what is shown so far, since they at least test the estimators on controlled data where curvature is present.

The soft spots are more noticeable once you look at the experimental claims. There is almost no detail on how neighborhoods were sized, what exact baselines were run, how noise was varied, or whether any error bars or significance checks were done. Without that, the statement that the new estimators “often outperform” is hard to weigh. The stress-test note about local flatness also lands. Local PCA coordinates approximate the tangent space reliably only when curvature effects are small inside the neighborhood; if the regression is run directly on those coordinates with no extra correction or weighting for curvature or sampling density, then performance gains could still trace back to neighborhood tuning rather than a genuine relaxation of the flatness assumption. The paper does not appear to supply a theoretical argument or targeted experiment that separates those effects.

This is the kind of paper that would interest people working on practical manifold learning tools in statistical ML, especially when the data are known to be curved. A reader who needs a new estimator to try on high-dimensional point clouds would get something usable from the QE and TLS variants. It is worth sending to peer review so the experimental gaps and the assumption question can be addressed directly.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a manifold dimension estimation framework that captures local graph structure via regression on local PCA coordinates. It introduces two estimators—quadratic embedding (QE) and total least squares (TLS)—and reports that they perform competitively with or outperform existing methods on synthetic and real-world data while relaxing the local-flatness assumption common in prior estimators.

Significance. If the central claims hold, the work could advance manifold learning by offering estimators that incorporate graph structure without explicit local-flatness requirements, potentially improving robustness on curved manifolds. The regression-based approach on PCA coordinates is a concrete technical contribution that builds on CA-PCA ideas.

major comments (2)

[Abstract] The performance claim that QE and TLS 'perform competitively with, and often outperform, state-of-the-art approaches' is load-bearing yet unsupported by any description of experimental design, baselines, error bars, or statistical tests, making it impossible to evaluate whether the data actually support the claim.
[Method description, framework section] The central claim that regression on local PCA coordinates captures graph structure without requiring locally flat neighborhoods is not accompanied by an explicit correction, sampling-density normalization, or curvature-aware term. Local PCA approximates the tangent space only when curvature is negligible within the neighborhood, so the regression target may still encode curvature bias.
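
To make the curvature concern in the second comment concrete, the standard local-graph picture can be written out as below. This is a textbook scaling sketch, not a derivation taken from the paper. The symbols u (tangent coordinates), H (second-order coefficient matrix for one normal direction), κ (a curvature scale), and r (neighborhood radius) are notation assumed here, and the tangent variance scaling assumes roughly uniform sampling.

```latex
% Assumption: near x_0 the manifold is the graph of a smooth g over its tangent space.
\[
  x = \bigl(u,\; g(u)\bigr), \qquad g(0) = 0, \quad Dg(0) = 0, \qquad
  g(u) = \tfrac{1}{2}\, u^{\top} H\, u + O\bigl(\lVert u \rVert^{3}\bigr).
\]
% For neighbors with \lVert u \rVert \le r and \kappa = \lVert H \rVert:
\[
  \operatorname{Var}(\text{tangent coordinate}) \asymp r^{2}, \qquad
  \operatorname{Var}(\text{normal coordinate}) = O\bigl(\kappa^{2} r^{4}\bigr),
\]
\[
  \frac{\operatorname{Var}(\text{normal coordinate})}{\operatorname{Var}(\text{tangent coordinate})}
  = O\bigl(\kappa^{2} r^{2}\bigr).
\]
```

Under these assumptions the flat approximation is accurate only when κr is small, which is the regime the comment asks the authors to address explicitly.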

minor comments (1)

[Abstract] The abstract would be clearer if it briefly indicated the key equations or loss functions defining the QE and TLS estimators.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the revisions we will make.

Referee: [Abstract] The performance claim that QE and TLS 'perform competitively with, and often outperform, state-of-the-art approaches' is load-bearing yet unsupported by any description of experimental design, baselines, error bars, or statistical tests, making it impossible to evaluate whether the data actually support the claim.

Authors: We agree that the abstract, being concise, does not detail the experimental design. The full manuscript contains a dedicated experiments section describing the synthetic manifolds (with known intrinsic dimensions), real-world datasets, baseline estimators, performance metrics, and visualizations. To address the concern directly, we will revise the abstract to include a brief clause summarizing the validation, e.g., 'validated through experiments on synthetic and real datasets with comparisons to existing estimators.' We will also ensure error bars and any statistical comparisons are explicitly noted in the revised figures and tables.

Revision: yes

Referee: [Method description, framework section] The central claim that regression on local PCA coordinates captures graph structure without requiring locally flat neighborhoods is not accompanied by an explicit correction, sampling-density normalization, or curvature-aware term. Local PCA approximates the tangent space only when curvature is negligible within the neighborhood, so the regression target may still encode curvature bias.

Authors: We acknowledge that local PCA provides a tangent-space approximation that is most accurate under low curvature. Our regression-based framework, however, extends beyond this by fitting models (quadratic in QE, robust linear in TLS) directly on the PCA coordinates to encode local graph connectivity and higher-order effects. This is motivated by and builds upon CA-PCA ideas. We agree an explicit discussion would strengthen the presentation. In revision we will expand the framework section with a paragraph clarifying how the regression step captures structure beyond the first-order tangent approximation and under what neighborhood conditions the local-flatness assumption is relaxed.

Revision: partial
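
The rebuttal characterizes TLS only briefly, and this page gives no formula for the TLS estimator. For readers unfamiliar with the underlying technique, here is a minimal sketch of generic textbook total least squares via the SVD, which fits a linear relation while allowing errors in both the regressors and the response. How the paper applies TLS to local PCA coordinates is not stated here, so `tls_fit` and the toy data are purely illustrative assumptions.

```python
import numpy as np

def tls_fit(A, b):
    """Total least squares solution x of A x ≈ b, allowing errors in both A and b."""
    p = A.shape[1]
    # Right singular vector for the smallest singular value of the augmented matrix [A | b].
    _, _, vt = np.linalg.svd(np.column_stack([A, b]), full_matrices=False)
    v = vt[-1]
    if np.isclose(v[p], 0.0):
        raise ValueError("no TLS solution: last component of the singular vector is zero")
    # [A | b] @ [x, -1] ≈ 0, so x is recovered by rescaling the singular vector.
    return -v[:p] / v[p]

# Hypothetical toy data: an exact linear relation, so TLS should recover it.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))
x_true = np.array([1.5, -0.5])
b = A @ x_true
x_tls = tls_fit(A, b)
print(x_tls)
```

With noise-free data the TLS and ordinary least squares solutions coincide. The two differ once the regressors themselves are noisy, which is the usual motivation for TLS.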

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces a manifold dimension estimation framework based on regression on local PCA coordinates to capture local graph structure, along with QE and TLS estimators. The abstract motivates the approach from prior CA-PCA work but presents the new methods and their empirical performance on synthetic and real datasets as independent contributions. No equations, derivations, or self-citations are provided that reduce any central claim to a fitted parameter, self-definition, or load-bearing prior result by the authors themselves. The framework is tested externally rather than being tautological with its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unelaborated premise that local PCA regression captures graph structure.

pith-pipeline@v0.9.0 · 5629 in / 1040 out tokens · 33558 ms · 2026-05-18T05:52:15.917447+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: we propose a general framework for manifold dimension estimation that characterizes the manifold’s local graph structure through the integration of PCA and regression-based techniques... quadratic embedding (QE) and total least squares (TLS)

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: the neighborhood of x0 is represented as the graph of g... quadratic approximation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.