pith. sign in

arxiv: 1907.06771 · v1 · pith:UPY6OVSZnew · submitted 2019-07-15 · 💻 cs.LG · stat.ML

Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation

Pith reviewed 2026-05-24 21:10 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords subspace clusteringlocal intrinsic dimensionaxis-aligned subspacesfeature discriminabilityhigh-dimensional dataLID estimationlocal neighborhoodscluster formation
0
0 comments X

The pith

Decomposing local intrinsic dimension along feature axes identifies subspaces that support cluster formation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to estimate local intrinsic dimension along individual axis projections for each data point. This decomposition highlights the feature directions that contribute most to the local complexity and discriminability around points. By focusing on these directions, it aims to find axis-aligned subspaces without exhaustively searching all feature combinations. If successful, this could make subspace clustering more efficient in high-dimensional spaces where many features may be irrelevant locally.

Core claim

An estimator of LID along axis projections is developed, and preliminary evidence is provided that this LID decomposition can indicate axis-aligned data subspaces that support the formation of clusters, by identifying axes with the greatest local discriminability or fewest LID components capturing local complexity.

What carries the argument

The estimator of local intrinsic dimension along axis projections, which identifies directions of greatest local discriminability.

If this is right

  • For each data point the axes with highest LID components indicate the most discriminative features locally.
  • The decomposition reduces the search space for subspace clustering by avoiding evaluation of every possible feature combination.
  • Identified subspaces are those where clusters are likely to form due to high local discriminability.
  • The fewest LID components needed to capture local complexity mark the key contributing axes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The per-point axis selection could be integrated into density-based clustering pipelines to focus computation on relevant dimensions.
  • If the axis extension holds, the same decomposition might guide local feature weighting in outlier detection tasks.
  • Experiments on data with planted axis-aligned structures would provide a direct test of whether the recovered subspaces match ground-truth clusters.

Load-bearing premise

That the Local Intrinsic Dimension model extends to axis projections in a way that preserves identification of features with greatest local discriminability.

What would settle it

A controlled experiment on synthetic data with known axis-aligned cluster subspaces where the LID decomposition fails to recover those subspaces would falsify the central claim.

Figures

Figures reproduced from arXiv: 1907.06771 by Arthur Zimek, Imane Hafnaoui, Michael E. Houle, Pan Li, Ruben Becker.

Figure 1
Figure 1. Figure 1: Consider a circular neighborhood of a reference point [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Plot for the results for estimators hill distances, sum hill projections, and sum w hill projections in a neighborhood of size k = 100, for dimensions m between 2 and 1024. Note that the neighborhoods for hill distances and sum w hill projections are chosen with respect to the Euclidean norm, while the neighborhoods for sum hill projections are chosen with respect to the L∞ norm. The errorbars denote 95% c… view at source ↗
Figure 3
Figure 3. Figure 3: ID estimates as they were computed by sum hill projections in an L∞ neighbor￾hood of k = 100 points, for different combinations of uniform distributions. We compare this value IDd∗ F with the sum Pm i=1 IDd∗ i , where we consider two ways of obtaining the estimates IDd∗ i . In the first case (sum hill projections), we pick k nearest neighbors with respect to the L∞ norm. In the second case (sum w hill proj… view at source ↗
Figure 4
Figure 4. Figure 4: DiSH 3D Dataset. (a) Description. d ||S|| Noisy Ai T1 30 {5, 5, 5, 5, 5} 5 T2 50 {3, 5, 7, 7, 11} 17 T3 100 {3, 5, 7, 7, 11} 67 (b) Results. NMI AMI ARI Recall T1 DiSH 0.535 0.362 0.264 0.582 CLIQUE 0.431 0.275 0.303 0.635 LID-DBSCAN 0.801 0.734 0.803 0.726 T2 DiSH 0.568 0.396 0.532 0.7 CLIQUE 0.644 0.473 0.568 0.78 LID-DBSCAN 0.779 0.695 0.716 0.765 T3 DiSH 0.570 0.397 0.412 0.702 CLIQUE 0.644 0.473 0.568… view at source ↗
Figure 5
Figure 5. Figure 5: Clustering results on the DiSH 3D dataset shown in Figure 4. [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: With respect to both RNIA and ARR, LID decomposition significantly outperforms [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
read the original abstract

Axis-aligned subspace clustering generally entails searching through enormous numbers of subspaces (feature combinations) and evaluation of cluster quality within each subspace. In this paper, we tackle the problem of identifying subsets of features with the most significant contribution to the formation of the local neighborhood surrounding a given data point. For each point, the recently-proposed Local Intrinsic Dimension (LID) model is used in identifying the axis directions along which features have the greatest local discriminability, or equivalently, the fewest number of components of LID that capture the local complexity of the data. In this paper, we develop an estimator of LID along axis projections, and provide preliminary evidence that this LID decomposition can indicate axis-aligned data subspaces that support the formation of clusters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to develop an estimator of Local Intrinsic Dimension (LID) along axis projections for each data point, using it to identify axis directions with greatest local discriminability (equivalently, the fewest LID components capturing local complexity). This is positioned as a way to determine axis-aligned subspaces supporting cluster formation without exhaustive search over feature combinations, with preliminary experimental evidence provided.

Significance. If the estimator and its extension to projections are sound, the approach could offer a scalable, non-combinatorial alternative for subspace clustering in high dimensions by directly linking LID decomposition to feature relevance for local neighborhoods.

major comments (2)
  1. [Abstract] Abstract: the central claim rests on developing and validating an estimator of LID along axis projections, yet the provided text gives no derivation, no explicit extension of the prior LID model to projections, and no error analysis or bias discussion; this absence makes the soundness of the extension (that it preserves identification of features with greatest local discriminability) impossible to assess from the manuscript.
  2. [Abstract] Abstract: preliminary evidence is asserted for the claim that LID decomposition indicates axis-aligned subspaces supporting clusters, but no details are supplied on the experimental setup, datasets, baselines, quantitative metrics, or controls for confounding factors such as dimensionality or noise; without these the evidence cannot be evaluated as supporting the claim.
minor comments (1)
  1. [Abstract] Abstract: the parenthetical equivalence between 'greatest local discriminability' and 'fewest number of components of LID' is stated without supporting justification or reference to prior LID work; a brief clarification would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments on our manuscript. We address each of the major comments below and will revise the abstract to better convey the key elements of our contribution.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim rests on developing and validating an estimator of LID along axis projections, yet the provided text gives no derivation, no explicit extension of the prior LID model to projections, and no error analysis or bias discussion; this absence makes the soundness of the extension (that it preserves identification of features with greatest local discriminability) impossible to assess from the manuscript.

    Authors: While the abstract is concise by nature, the full manuscript details the derivation of the axis-projection estimator for LID in the methods section, explicitly extends the prior LID model to handle axis projections, and includes discussion of error and bias in the theoretical analysis. To address the concern that the abstract alone does not allow assessment, we will revise the abstract to include a brief overview of the estimator's development and its properties. revision: yes

  2. Referee: [Abstract] Abstract: preliminary evidence is asserted for the claim that LID decomposition indicates axis-aligned subspaces supporting clusters, but no details are supplied on the experimental setup, datasets, baselines, quantitative metrics, or controls for confounding factors such as dimensionality or noise; without these the evidence cannot be evaluated as supporting the claim.

    Authors: The experimental details, including the setup with synthetic and real datasets, comparison to baselines, metrics used for evaluation, and controls for dimensionality and noise, are presented in the experiments section of the paper. The abstract characterizes the results as 'preliminary evidence' to reflect their initial nature. We will update the abstract to provide a short summary of the experimental validation to make this clearer. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation builds on independent prior LID model

full rationale

The paper develops an estimator of LID along axis projections by extending the recently-proposed LID model to identify axis directions of greatest local discriminability. No equations or derivations are shown that reduce the target result to a fitted parameter or self-defined quantity by construction. The central claim relies on the prior LID framework as an external input rather than re-deriving it from the subspace result itself. No self-citation chain, ansatz smuggling, or renaming of known results is exhibited in the provided text that would force the outcome. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review conducted from abstract only; no explicit free parameters, axioms, or invented entities are detailed. The LID model itself is treated as a background assumption from prior work.

axioms (1)
  • domain assumption The Local Intrinsic Dimension (LID) model accurately captures local data complexity and can be meaningfully decomposed along axes.
    Invoked as the foundation for the new estimator and subspace indication claim.

pith-pipeline@v0.9.0 · 5660 in / 1047 out tokens · 17507 ms · 2026-05-24T21:10:45.037893+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

  1. [1]

    Achtert, E., B¨ ohm, C., David, J., Kr¨ oger, P., Zimek, A.: Global correlation clustering based on the Hough transform. Stat. Anal. Data Min. 1(3), 111–127 (2008)

  2. [2]

    In: Proc

    Achtert, E., B¨ ohm, C., Kriegel, H.P., Kr¨ oger, P., M¨ uller-Gorman, I., Zimek, A.: Finding hierarchies of subspace clusters. In: Proc. PKDD. pp. 446–453 (2006)

  3. [3]

    In: Proc

    Achtert, E., B¨ ohm, C., Kriegel, H.P., Kr¨ oger, P., M¨ uller-Gorman, I., Zimek, A.: Detection and visualization of subspace cluster hierarchies. In: Proc. DASFAA. pp. 152–163 (2007)

  4. [4]

    In: Proc

    Achtert, E., B¨ ohm, C., Kriegel, H.P., Kr¨ oger, P., Zimek, A.: Robust, complete, and efficient correlation clustering. In: Proc. SDM. pp. 413–418 (2007)

  5. [5]

    In: Proc

    Aggarwal, C.C., Procopiuc, C.M., Wolf, J.L., Yu, P.S., Park, J.S.: Fast algorithms for projected clustering. In: Proc. SIGMOD. pp. 61–72 (1999)

  6. [6]

    In: Proc

    Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: Proc. SIGMOD. pp. 94–105 (1998)

  7. [7]

    In: Proc

    Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. VLDB. pp. 487–499 (1994)

  8. [8]

    In: Proc

    Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M.E., Kawarabayashi, K., Nett, M.: Estimating local intrinsic dimensionality. In: Proc. KDD. pp. 29–38 (2015)

  9. [9]

    In: Proc

    Assent, I., Krieger, R., M¨ uller, E., Seidl, T.: DUSC: dimensionality unbiased subspace clustering. In: Proc. ICDM. pp. 409–414 (2007)

  10. [10]

    In: Proc

    Assent, I., Krieger, R., M¨ uller, E., Seidl, T.: EDSC: efficient density-based subspace clus- tering. In: Proc. CIKM. pp. 1093–1102 (2008)

  11. [11]

    In: Proc

    Becker, R., Hafnaoui, I., Houle, M.E., Li, P., Zimek, A.: Subspace determination through local intrinsic dimensional decomposition. In: Proc. SISAP (2019)

  12. [12]

    In: Proc

    Bernecker, T., Emrich, T., Graf, F., Kriegel, H.P., Kr¨ oger, P., Renz, M., Schubert, E., Zimek, A.: Subspace similarity search: Efficient k-NN queries in arbitrary subspaces. In: Proc. SSDBM. pp. 555–564 (2010)

  13. [13]

    nearest neighbor

    Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” mean- ingful? In: Proc. ICDT. pp. 217–235 (1999)

  14. [14]

    In: Proc

    B¨ ohm, C., Kailing, K., Kriegel, H.P., Kr¨ oger, P.: Density connected clustering with local subspace preferences. In: Proc. ICDM. pp. 27–34 (2004)

  15. [15]

    In: Proc

    B¨ ohm, C., Kailing, K., Kr¨ oger, P., Zimek, A.: Computing clusters of correlation connected objects. In: Proc. SIGMOD. pp. 455–466 (2004) 15

  16. [16]

    PVLDB 10(7), 769–780 (2017)

    Casanova, G., Englmeier, E., Houle, M., Kr¨ oger, P., Nett, M., Schubert, E., Zimek, A.: Dimensional testing for reverse k-nearest neighbor search. PVLDB 10(7), 769–780 (2017)

  17. [17]

    In: Proc

    Cheng, C.H., Fu, A.W.C., Zhang, Y.: Entropy-based subspace clustering for mining nu- merical data. In: Proc. KDD. pp. 84–93 (1999)

  18. [18]

    In: Proc

    Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: Proc. SDM (2004)

  19. [19]

    In: Proc

    Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. KDD. pp. 226–231 (1996)

  20. [20]

    IEEE TKDE 19(7), 873–886 (2007)

    Fran¸ cois, D., Wertz, V., Verleysen, M.: The concentration of fractional distances. IEEE TKDE 19(7), 873–886 (2007)

  21. [21]

    Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes. J. R. Statist. Soc. B 66(4), 825–849 (2004)

  22. [22]

    In: Proc

    Houle, M.E.: Dimensionality, discriminability, density and distance distributions. In: Proc. ICDM Workshops. pp. 468–473 (2013)

  23. [23]

    In: Proc

    Houle, M.E.: Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications. In: Proc. SISAP. pp. 64–79 (2017)

  24. [24]

    In: Proc

    Houle, M.E.: Local intrinsic dimensionality II: multivariate analysis and distributional support. In: Proc. SISAP. pp. 80–95 (2017)

  25. [25]

    Houle, M.E., Kriegel, H.P., Kr¨ oger, P., Schubert, E., Zimek, A.: Can shared-neighbor distances defeat the curse of dimensionality? In: Proc. SSDBM. pp. 482–500 (2010)

  26. [26]

    In: Proc

    Houle, M.E., Ma, X., Oria, V., Sun, J.: Efficient algorithms for similarity search in axis- aligned subspaces. In: Proc. SISAP. pp. 1–12 (2014)

  27. [27]

    Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

  28. [28]

    The Teaching of Mathematics 8(1), 15–29 (2005)

    Kadelburg, Z., Marjanovi´ c, M.: Interchanging two limits. The Teaching of Mathematics 8(1), 15–29 (2005)

  29. [29]

    In: Proc

    Kailing, K., Kriegel, H.P., Kr¨ oger, P.: Density-connected subspace clustering for high- dimensional data. In: Proc. SDM. pp. 246–257 (2004)

  30. [30]

    In: Proc

    Kriegel, H.P., Kr¨ oger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: Proc. ICDM. pp. 250–257 (2005)

  31. [31]

    ACM TKDD3(1), 1–58 (2009)

    Kriegel, H.P., Kr¨ oger, P., Zimek, A.: Clustering high dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD3(1), 1–58 (2009)

  32. [32]

    WIREs DMKD 2(4), 351–364 (2012)

    Kriegel, H.P., Kr¨ oger, P., Zimek, A.: Subspace clustering. WIREs DMKD 2(4), 351–364 (2012)

  33. [33]

    In: Proc

    Ma, X., Li, B., Wang, Y., Erfani, S.M., Wijewickrema, S.N.R., Schoenebeck, G., Song, D., Houle, M.E., Bailey, J.: Characterizing adversarial subspaces using local intrinsic dimen- sionality. In: Proc. ICLR. pp. 1–15 (2018)

  34. [34]

    In: Proc

    Ma, X., Wang, Y., Houle, M.E., Zhou, S., Erfani, S.M., Xia, S., Wijewickrema, S.N.R., Bailey, J.: Dimensionality-driven learning with noisy labels. In: Proc. ICML. pp. 3361–3370 (2018) 16

  35. [35]

    KAIS 14(3), 273–298 (2008)

    Moise, G., Sander, J., Ester, M.: Robust projected clustering. KAIS 14(3), 273–298 (2008)

  36. [36]

    Muller, M.E.: A note on a method for generating points uniformly on n-dimensional spheres. Commun. ACM 2(4), 19–20 (Apr 1959)

  37. [37]

    In: Proc

    Nagesh, H.S., Goil, S., Choudhary, A.: Adaptive grids for clustering massive data sets. In: Proc. SDM. pp. 1–17 (2001)

  38. [38]

    IEEE Transactions on Knowl- edge and Data Engineering 18(7), 902–916 (2006)

    Patrikainen, A., Meila, M.: Comparing subspace clusterings. IEEE Transactions on Knowl- edge and Data Engineering 18(7), 902–916 (2006)

  39. [39]

    In: ICPR16

    Romano, S., Chelly, O., Nguyen, V., Bailey, J., Houle, M.E.: Measuring dependency via intrinsic dimensionality. In: ICPR16. pp. 1207–1212 (Dec 2016)

  40. [40]

    Science 290, 2323–2326 (2000)

    Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)

  41. [41]

    Machine learning 89(1-2), 37–65 (2012)

    Rozza, A., Lombardi, G., Ceruti, C., Casiraghi, E., Campadelli, P.: Novel high intrinsic dimensionality estimators. Machine learning 89(1-2), 37–65 (2012)

  42. [42]

    Data Min

    Sim, K., Gopalkrishnan, V., Zimek, A., Cong, G.: A survey on enhanced subspace cluster- ing. Data Min. Knowl. Disc. 26(2), 332–397 (2013)

  43. [43]

    Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

  44. [44]

    Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clustering comparison: Variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)

  45. [45]

    Woo, K.G., Lee, J.H., Kim, M.H., Lee, Y.J.: FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting. Inform. Software Technol. 46(4), 255–271 (2004)

  46. [46]

    In: Proc

    Yip, K.Y., Cheung, D.W., Ng, M.K.: On discovery of extremely low-dimensional clusters using semi-supervised projected clustering. In: Proc. ICDE. pp. 329–340 (2005)

  47. [47]

    IEEE TKDE 17(2), 176–189 (2005)

    Yiu, M.L., Mamoulis, N.: Iterative projected clustering by subspace mining. IEEE TKDE 17(2), 176–189 (2005)

  48. [48]

    In: Aggarwal, C.C., Han, J

    Zimek, A., Assent, I., Vreeken, J.: Frequent pattern mining algorithms for data clustering. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, chap. 16, pp. 403–423. Springer (2014)

  49. [49]

    Zimek, A., Schubert, E., Kriegel, H.P.: A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min. 5(5), 363–387 (2012) 17