pith. sign in

arxiv: 2406.17952 · v2 · submitted 2024-06-25 · 💻 cs.LG · cs.CG

LINSCAN -- A Linearity Based Clustering Algorithm

Pith reviewed 2026-05-24 00:11 UTC · model grok-4.3

classification 💻 cs.LG cs.CG
keywords clusteringlineated clustersDBSCANKullback-Leibler divergencenormal distributionsseismic datafault detection
0
0 comments X

The pith

LINSCAN detects lineated clusters that are spatially close but orthogonally oriented by embedding points as local normal distributions and using a KL-divergence distance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LINSCAN as a new clustering method that builds on DBSCAN and OPTICS to handle lineated structures. Points are embedded as normal distributions fitted to their local neighborhoods so that covariance directions capture linearity. A distance derived from the Kullback-Leibler divergence then separates clusters whose directions are orthogonal even when the clusters occupy overlapping space. The method is demonstrated on seismic data to recover active faults and their orientations, including cases where faults intersect. The authors also outline stability properties that any generalization of density-based clustering must preserve.

Core claim

By embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback-Leibler Divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances.

What carries the argument

Embedding each point as a normal distribution fitted to its local neighborhood, paired with a distance derived from the Kullback-Leibler Divergence between those distributions.

If this is right

  • The algorithm recovers intersecting faults and their orientations from seismic data.
  • It extends density-based clustering while aiming to keep the stability properties of DBSCAN and OPTICS.
  • Any successful generalization of DBSCAN or OPTICS must satisfy specific stability conditions stated in the paper.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same local-normal embedding could be tested on other data with crossing linear features such as road networks or filamentary structures in images.
  • Replacing the fixed neighborhood size with an adaptive scale might improve performance on lines of varying density.
  • The KL distance could be swapped for other divergences between Gaussians to check whether separation of orthogonal directions remains robust.

Load-bearing premise

Local neighborhoods can be approximated by normal distributions whose covariances capture linear direction well enough for the KL distance to separate orthogonal clusters without extra tuning.

What would settle it

Run LINSCAN on a synthetic dataset of two thin, crossing line segments whose centers are close and whose directions are orthogonal; failure to recover them as separate clusters would falsify the separation claim.

Figures

Figures reproduced from arXiv: 2406.17952 by Alexander Cloninger, Andrew Dennehy, Shabnam J. Semnani, Xiaoyu Zou, Yuri Fialko.

Figure 1
Figure 1. Figure 1: Test Data (a) Data (b) DBSCAN Results identify active seismogenic faults based on epicentral locations of micro-earthquakes [8, 2, 4]. While faults are three￾dimensional quasi-planar surfaces, in appropriate projections they appear as linear features, so that the associated locations of micro-earthquakes can be recognized as quasi-linear features after accounting for noise. To highlight the deficiency of e… view at source ↗
Figure 2
Figure 2. Figure 2: Crossing Lines (a) Data (b) DBSCAN results (c) LINSCAN Resulfts (d) ADCN Result (e) ADCN Result, Differ￾ent Initialization [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Synthetic Data (a) LINSCAN Results (b) LINSCAN Results with Noise Removed (c) LINSCAN Results with Isotropic Clusters Re￾moved 4.2. Measuring Performance To quantitatively evaluate the algorithm performance, we conducted several tests on synthetic, labeled data. We generated 10 linear clusters, 5 isotropic clusters, and 10 pairs of linear clusters intersecting at angles in the range [.1π, .9π] and separate… view at source ↗
Figure 4
Figure 4. Figure 4: Real Data (a) Data (b) LINSCAN results (c) LINSCAN Results with Noise Removed (d) LINSCAN Results with Isotropic Clusters Removed So, R(C, C ′ ) is the fraction of pairs of elements of X such that C and C ′ both agree about whether the pair of elements lie in the same cluster or not. Note that R is symmetric in C and C ′ and lies in the interval [0, 1]. However, random partitions are not guaranteed to have… view at source ↗
Figure 5
Figure 5. Figure 5: Generated Data (a) Dataset with True Labels (b) LINSCAN Results (c) OPTICS Results 6. Code availability Name of Repository: LINSCAN Public Contact: adennehy@uchicago.edu Hardware requirements: CPU Program language: Python, C Software required: Python, CPython Program size: 30 MB Author Remark: Running the code requires compiling a C function for use in the python script. Instructions are included in the re… view at source ↗
read the original abstract

DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback Leibler Divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances. We demonstrate how LINSCAN can be applied to seismic data to identify active faults, including intersecting faults, and determine their orientation. Finally, we discuss the properties a generalization of DBSCAN and OPTICS must have in order to retain the stability benefits of these algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces LINSCAN as an extension of DBSCAN and OPTICS for detecting lineated clusters. Points are embedded as normal distributions approximating local neighborhoods, and a distance derived from the Kullback-Leibler Divergence is used to distinguish clusters that are spatially close but have orthogonal covariances. The method is applied to seismic data to identify active faults including intersecting ones, and the paper discusses the properties needed for generalizations of DBSCAN/OPTICS to retain stability benefits.

Significance. If the local-normal embedding and KL-distance construction can be shown to separate orthogonal lineated structures without neighborhood contamination or hidden parameters, the approach would be significant for applications such as fault detection in seismology where standard density-based methods fail on intersecting linear features. The explicit focus on parameter-free operation and stability inheritance from DBSCAN/OPTICS is a constructive framing.

major comments (2)
  1. [Abstract] Abstract: the central claim that normal-distribution embeddings of local neighborhoods yield covariances whose principal axes reflect only one lineated cluster (enabling reliable KL-based separation of orthogonal structures) is load-bearing yet unsupported by any derivation, definition of the neighborhood, or pseudocode; without these the claim cannot be checked against the contamination mode in which boundary points mix points from spatially close orthogonal clusters.
  2. [Final section] Final section (stability discussion): the claimed retention of DBSCAN/OPTICS stability is not shown to survive the case of intersecting or spatially adjacent lineated clusters, where any fixed or adaptive neighborhood produces a mixture covariance; the closed-form KL divergence between such mixtures does not automatically encode orthogonality, leaving the motivating use-case unaddressed.
minor comments (1)
  1. The manuscript would be strengthened by the addition of explicit pseudocode for the embedding step and the KL-distance computation, together with at least one small synthetic example showing the covariance matrices and resulting distances for two orthogonal lines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and indicate the revisions that will be made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that normal-distribution embeddings of local neighborhoods yield covariances whose principal axes reflect only one lineated cluster (enabling reliable KL-based separation of orthogonal structures) is load-bearing yet unsupported by any derivation, definition of the neighborhood, or pseudocode; without these the claim cannot be checked against the contamination mode in which boundary points mix points from spatially close orthogonal clusters.

    Authors: We agree that the abstract claim is central and that the current manuscript does not supply an explicit derivation, a precise definition of the neighborhood used for the normal embedding, or pseudocode. These omissions make it difficult to verify the claim against boundary contamination. We will revise the manuscript to add (i) a formal definition of the local neighborhood, (ii) a short derivation showing that the fitted covariance principal axis aligns with a single lineated structure under the stated assumptions, (iii) pseudocode for the embedding step, and (iv) a brief discussion of how boundary points are treated to limit mixing from orthogonal clusters. revision: yes

  2. Referee: [Final section] Final section (stability discussion): the claimed retention of DBSCAN/OPTICS stability is not shown to survive the case of intersecting or spatially adjacent lineated clusters, where any fixed or adaptive neighborhood produces a mixture covariance; the closed-form KL divergence between such mixtures does not automatically encode orthogonality, leaving the motivating use-case unaddressed.

    Authors: We acknowledge that the stability discussion in the final section is framed at a general level and does not contain a specific analysis of mixture covariances arising from intersecting or adjacent lineated clusters. While the seismic application demonstrates successful separation of intersecting faults, this is empirical rather than a theoretical guarantee that the KL-based distance preserves orthogonality under mixture covariances. We will expand the final section to (a) explicitly consider the mixture case, (b) clarify the scope of the inherited stability claim, and (c) either provide additional reasoning or note the limitation and rely on the empirical results for the motivating use-case. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation is self-contained extension of DBSCAN/OPTICS

full rationale

No equations, fitted parameters, or self-citations are exhibited in the provided text that would reduce the claimed KL-based distance or local normal embeddings to a definitionally equivalent input. The algorithm is presented as a direct construction using standard KL divergence on covariance approximations, with stability properties argued from generalization of existing methods rather than internal redefinition. The reader's assessment of score 2.0 aligns with absence of load-bearing reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions stated in the abstract: that local neighborhoods admit normal-distribution approximations whose covariances encode linearity, and that KL divergence on those distributions separates orthogonal orientations. No free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption Local neighborhoods of points can be approximated by normal distributions whose covariance matrices capture linear structure
    Explicitly invoked by the phrase 'embedding points as normal distributions approximating their local neighborhoods'
  • domain assumption A distance derived from Kullback-Leibler divergence between these distributions distinguishes clusters with orthogonal covariances
    Stated as the mechanism that enables detection of spatially close but orthogonally oriented lineated clusters

pith-pipeline@v0.9.0 · 5676 in / 1409 out tokens · 25338 ms · 2026-05-24T00:11:59.038358+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

  1. [1]

    Ester, H.-P

    M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD ’96, AAAI Press, 1996, p. 226–231

  2. [2]

    Fialko, Estimation of absolute stress in the hypocentral region of the 2019 Ridgecrest, California, earthquakes, J

    Y . Fialko, Estimation of absolute stress in the hypocentral region of the 2019 Ridgecrest, California, earthquakes, J. Geophys. Res. 126 (2021) e2021JB022000

  3. [3]

    X. Zou, Y . Fialko, A. Dennehy, A. Cloninger, S. Semnani, High-angle active conjugate faults in the Anza-Borrego shear zone, Southern California, Geophys. Res. Lett. 50 (2023) e2023GL105783

  4. [4]

    D. R. Shelly, R. J. Skoumal, J. L. Hardebeck, Fracture-mesh faulting in the swarm-like 2020 Maacama sequence revealed by high-precision earthquake detection, location, and focal mechanisms, Geophys. Res. Lett. 50 (2023) e2022GL101233

  5. [5]

    Barden, Stresses and displacements in a cross-anisotropic soil, Geotechnique 13 (3) (1963) 198–210

    L. Barden, Stresses and displacements in a cross-anisotropic soil, Geotechnique 13 (3) (1963) 198–210

  6. [6]

    E. H. Isaaks, M. Srivastava, Applied geostatistics (1989)

  7. [7]

    G. Mai, K. Janowicz, Y . Hu, S. Gao, Adcn: An anisotropic density-based clustering algorithm for discovering spatial point patterns with noise, Transactions in GIS 22 (1) (2018) 348–369

  8. [8]

    E. S. Cochran, R. J. Skoumal, D. McPhillips, Z. E. Ross, K. M. Keranen, Activation of optimally and unfavourably oriented faults in a uniform local stress field during the 2011 Prague, Oklahoma, sequence, Geophys. J. Int. 222 (2020) 153–168. 8

  9. [9]

    G. Mai, K. Janowicz, Y . Hu, S. Gao, Adcn: An anisotropic density-based clustering algorithm, in: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPACIAL ’16, Association for Computing Machinery, New York, NY , USA, 2016.doi:10.1145/2996913.2996940

  10. [10]

    Ankerst, M

    M. Ankerst, M. M. Breunig, H.-P. Kriegel, J. Sander, Optics: Ordering points to identify the clustering structure, SIGMOD Rec. 28 (2) (1999) 49–60

  11. [11]

    M. A. Khamsi, W. A. Kirk, An introduction to metric spaces and fixed point theory, 304 pp., John Wiley & Sons, 2011

  12. [12]

    Zhang, W

    Y . Zhang, W. Liu, Z. Chen, K. Li, J. Wang, On the properties of Kullback-Leibler divergence between Gaussians (2021)

  13. [13]

    Fialko, Z

    Y . Fialko, Z. Jin, Simple shear origin of the cross-faults ruptured in the 2019 Ridgecrest earthquake sequence, Nature Geoscience 14 (2021) 513–518

  14. [14]

    Hubert, P

    L. Hubert, P. Arabie, Comparing partitions, Journal of Classification 2 (1) (1985) 193–218. Appendix A. Approximation of KL(P|Q) First, log |A| is the logarithm of the product of the eigenvalues ofA, which is the same as the sum of the logarithms of the eigenvalues. Therefore, log |A| = tr(log(A)) where log(A) is the matrix logarithm, which exists and is ...