Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation
Pith reviewed 2026-05-24 21:10 UTC · model grok-4.3
The pith
Decomposing local intrinsic dimension along feature axes identifies subspaces that support cluster formation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An estimator of LID along axis projections is developed, and preliminary evidence is provided that this LID decomposition can indicate axis-aligned data subspaces that support the formation of clusters, by identifying axes with the greatest local discriminability or fewest LID components capturing local complexity.
What carries the argument
The estimator of local intrinsic dimension along axis projections, which identifies directions of greatest local discriminability.
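For orientation, the standard definitions from the earlier LID literature can be written as below. These are background formulas only, not the paper's axis-projection estimator, which this page does not reproduce. The symbols F, r, k, and r_i are notation assumed here for illustration.

```latex
% Assumed notation: F is the CDF of the distance from a reference point
% to the other data points; r is a distance (radius).
\mathrm{LID}_F(r) = \frac{r\,F'(r)}{F(r)}, \qquad
\mathrm{LID}^{*}_F = \lim_{r \to 0^{+}} \mathrm{LID}_F(r)

% Maximum-likelihood (Hill-type) estimate from the k nearest-neighbor
% distances r_1 \le \dots \le r_k of the reference point.
\widehat{\mathrm{LID}} = -\left( \frac{1}{k} \sum_{i=1}^{k} \ln \frac{r_i}{r_k} \right)^{-1}
```

The estimate is large when most neighbor distances sit close to the outer radius r_k, which is how distances behave in a locally high-dimensional neighborhood.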
If this is right
- For each data point the axes with highest LID components indicate the most discriminative features locally.
- The decomposition reduces the search space for subspace clustering by avoiding evaluation of every possible feature combination.
- Identified subspaces are those where clusters are likely to form due to high local discriminability.
- The fewest LID components needed to capture local complexity mark the key contributing axes.
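The consequences listed above can be made concrete with a rough per-point, per-axis profile. The sketch below is an assumption-laden illustration, not the paper's estimator: it applies the standard maximum-likelihood (Hill) LID estimate to the absolute coordinate differences between a query point and its k nearest neighbors in the full space. The function names `hill_lid` and `axis_lid_profile`, and the choice to select neighbors in the full space, are hypothetical.

```python
import numpy as np


def hill_lid(dists):
    """MLE (Hill) estimate of LID from a set of positive distances."""
    d = np.sort(np.asarray(dists, dtype=float))
    d = d[d > 0]                      # zero distances carry no scale information
    if d.size < 2:
        return float("nan")
    mean_log = np.log(d / d[-1]).mean()   # d[-1] is the largest distance
    if mean_log == 0.0:               # all distances equal: estimate undefined
        return float("nan")
    return -1.0 / mean_log


def axis_lid_profile(X, q_idx, k=20):
    """Hypothetical per-axis profile for the point X[q_idx].

    Assumption: neighbors are chosen by Euclidean distance in the full
    space, then each axis is scored from the projected differences.
    """
    q = X[q_idx]
    full = np.linalg.norm(X - q, axis=1)
    order = np.argsort(full)
    nn = order[order != q_idx][:k]    # k nearest neighbors, query excluded
    proj = np.abs(X[nn] - q)          # shape (k, n_features)
    return np.array([hill_lid(proj[:, j]) for j in range(X.shape[1])])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    print(axis_lid_profile(X, q_idx=0, k=20))
```

Whether ranking axes by such a profile recovers the features that matter locally is exactly the claim under review; this sketch only shows the shape of the computation.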
Where Pith is reading between the lines
- The per-point axis selection could be integrated into density-based clustering pipelines to focus computation on relevant dimensions.
- If the axis extension holds, the same decomposition might guide local feature weighting in outlier detection tasks.
- Experiments on data with planted axis-aligned structures would provide a direct test of whether the recovered subspaces match ground-truth clusters.
Load-bearing premise
That the Local Intrinsic Dimension model extends to axis projections in a way that preserves identification of features with greatest local discriminability.
What would settle it
A controlled experiment on synthetic data with known axis-aligned cluster subspaces where the LID decomposition fails to recover those subspaces would falsify the central claim.
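A test of this kind needs data whose relevant axes are known in advance. The following is a minimal sketch of such a generator, assuming each cluster is tight (Gaussian) on its own subset of axes and uniform on all others. The function names, default parameters, and the Jaccard score are illustrative choices, not the paper's experimental protocol.

```python
import numpy as np


def planted_subspace_clusters(n_per_cluster=100, dim=10,
                              subspaces=((0, 1, 2), (5, 6)),
                              spread=0.02, seed=0):
    """Generate clusters that are compact only on their planted axes.

    Returns (X, labels, masks), where masks[i, j] is True when axis j
    belongs to the planted subspace of the cluster containing point i.
    """
    rng = np.random.default_rng(seed)
    parts, labels, masks = [], [], []
    for c, axes in enumerate(subspaces):
        axes = list(axes)
        # Irrelevant axes: uniform noise over the unit interval.
        block = rng.uniform(0.0, 1.0, size=(n_per_cluster, dim))
        # Relevant axes: tight Gaussian around a random center.
        center = rng.uniform(0.2, 0.8, size=len(axes))
        block[:, axes] = center + spread * rng.standard_normal(
            (n_per_cluster, len(axes)))
        mask = np.zeros(dim, dtype=bool)
        mask[axes] = True
        parts.append(block)
        labels.append(np.full(n_per_cluster, c))
        masks.append(np.tile(mask, (n_per_cluster, 1)))
    return np.vstack(parts), np.concatenate(labels), np.vstack(masks)


def subspace_jaccard(pred_mask, true_mask):
    """Jaccard overlap between a recovered and a planted axis set."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0                    # both sets empty: treat as a perfect match
    return np.logical_and(pred_mask, true_mask).sum() / union


if __name__ == "__main__":
    X, labels, masks = planted_subspace_clusters()
    print(X.shape, labels.shape, masks.shape)
```

Consistently low Jaccard overlap between the axes selected per point and the planted masks would count as evidence against the central claim.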
Original abstract
Axis-aligned subspace clustering generally entails searching through enormous numbers of subspaces (feature combinations) and evaluation of cluster quality within each subspace. In this paper, we tackle the problem of identifying subsets of features with the most significant contribution to the formation of the local neighborhood surrounding a given data point. For each point, the recently-proposed Local Intrinsic Dimension (LID) model is used in identifying the axis directions along which features have the greatest local discriminability, or equivalently, the fewest number of components of LID that capture the local complexity of the data. In this paper, we develop an estimator of LID along axis projections, and provide preliminary evidence that this LID decomposition can indicate axis-aligned data subspaces that support the formation of clusters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to develop an estimator of Local Intrinsic Dimension (LID) along axis projections for each data point, using it to identify axis directions with greatest local discriminability (equivalently, the fewest LID components capturing local complexity). This is positioned as a way to determine axis-aligned subspaces supporting cluster formation without exhaustive search over feature combinations, with preliminary experimental evidence provided.
Significance. If the estimator and its extension to projections are sound, the approach could offer a scalable, non-combinatorial alternative for subspace clustering in high dimensions by directly linking LID decomposition to feature relevance for local neighborhoods.
major comments (2)
- [Abstract] Abstract: the central claim rests on developing and validating an estimator of LID along axis projections, yet the provided text gives no derivation, no explicit extension of the prior LID model to projections, and no error analysis or bias discussion; this absence makes the soundness of the extension (that it preserves identification of features with greatest local discriminability) impossible to assess from the manuscript.
- [Abstract] Abstract: preliminary evidence is asserted for the claim that LID decomposition indicates axis-aligned subspaces supporting clusters, but no details are supplied on the experimental setup, datasets, baselines, quantitative metrics, or controls for confounding factors such as dimensionality or noise; without these the evidence cannot be evaluated as supporting the claim.
minor comments (1)
- [Abstract] Abstract: the parenthetical equivalence between 'greatest local discriminability' and 'fewest number of components of LID' is stated without supporting justification or reference to prior LID work; a brief clarification would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful comments on our manuscript. We address each of the major comments below and will revise the abstract to better convey the key elements of our contribution.
- Referee: [Abstract] Abstract: the central claim rests on developing and validating an estimator of LID along axis projections, yet the provided text gives no derivation, no explicit extension of the prior LID model to projections, and no error analysis or bias discussion; this absence makes the soundness of the extension (that it preserves identification of features with greatest local discriminability) impossible to assess from the manuscript.
  Authors: While the abstract is concise by nature, the full manuscript details the derivation of the axis-projection estimator for LID in the methods section, explicitly extends the prior LID model to handle axis projections, and includes discussion of error and bias in the theoretical analysis. To address the concern that the abstract alone does not allow assessment, we will revise the abstract to include a brief overview of the estimator's development and its properties.
  Revision: yes
- Referee: [Abstract] Abstract: preliminary evidence is asserted for the claim that LID decomposition indicates axis-aligned subspaces supporting clusters, but no details are supplied on the experimental setup, datasets, baselines, quantitative metrics, or controls for confounding factors such as dimensionality or noise; without these the evidence cannot be evaluated as supporting the claim.
  Authors: The experimental details, including the setup with synthetic and real datasets, comparison to baselines, metrics used for evaluation, and controls for dimensionality and noise, are presented in the experiments section of the paper. The abstract characterizes the results as 'preliminary evidence' to reflect their initial nature. We will update the abstract to provide a short summary of the experimental validation to make this clearer.
  Revision: yes
Circularity Check
No significant circularity; derivation builds on independent prior LID model
The paper develops an estimator of LID along axis projections by extending the recently-proposed LID model to identify axis directions of greatest local discriminability. No equations or derivations are shown that reduce the target result to a fitted parameter or self-defined quantity by construction. The central claim relies on the prior LID framework as an external input rather than re-deriving it from the subspace result itself. No self-citation chain, ansatz smuggling, or renaming of known results is exhibited in the provided text that would force the outcome. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the Local Intrinsic Dimension (LID) model accurately captures local data complexity and can be meaningfully decomposed along axes.