arxiv: 2605.23150 · v1 · pith:6FI2WRELnew · submitted 2026-05-22 · 🌌 astro-ph.SR · astro-ph.IM

An Improved HDBSCAN-based Detection and Tracking Method for Solar Active Regions in Magnetograms

Pith reviewed 2026-05-25 03:29 UTC · model grok-4.3

classification 🌌 astro-ph.SR astro-ph.IM
keywords solar active regions · HDBSCAN · magnetograms · detection and tracking · space weather · polarity inversion line · SOHO/MDI · SDO/HMI

The pith

HDBSCAN clustering detects solar active regions more sensitively and tracks them with greater continuity than prior DBSCAN methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HARDAT, which applies hierarchical density-based clustering to line-of-sight magnetograms from SOHO/MDI and SDO/HMI. This replaces fixed-threshold DBSCAN with an approach that adapts to magnetic structures of different densities, allowing detection of small and diffuse active regions while preserving their shapes. A tracking component uses solar differential rotation and Hamming distance to maintain object identities across time. A support vector method extracts polarity inversion lines. Evaluation against NOAA and DSARD catalogues shows gains in sensitivity, accuracy, and stability, particularly for clustered regions.
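The tracking idea described here (predict where a region should be from differential rotation, then match by mask similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rotation coefficients are round placeholder values, the masks are assumed to live on a regular longitude grid so that rotation becomes a column shift, and the helper names `rotation_rate`, `shift_mask`, and `best_match` are hypothetical.

```python
import math

# ASSUMPTION: illustrative coefficients in degrees per day for the profile
# omega(lat) = A + B*sin^2(lat) + C*sin^4(lat); not the paper's values.
A_DEG, B_DEG, C_DEG = 14.4, -2.3, -1.6


def rotation_rate(lat_deg, a=A_DEG, b=B_DEG, c=C_DEG):
    """Rotation rate (deg/day) at heliographic latitude lat_deg."""
    s2 = math.sin(math.radians(lat_deg)) ** 2
    return a + b * s2 + c * s2 * s2


def predict_longitude(lon_deg, lat_deg, dt_days):
    """Longitude expected after dt_days of pure differential rotation."""
    return lon_deg + rotation_rate(lat_deg) * dt_days


def shift_mask(mask, dx):
    """Shift a binary mask (list of rows) dx columns to the right, zero-filled."""
    width = len(mask[0])
    shifted = []
    for row in mask:
        new_row = [0] * width
        for x, value in enumerate(row):
            if value and 0 <= x + dx < width:
                new_row[x + dx] = 1
        shifted.append(new_row)
    return shifted


def hamming_distance(mask_a, mask_b):
    """Number of pixels at which two equally shaped binary masks differ."""
    return sum(pa != pb
               for row_a, row_b in zip(mask_a, mask_b)
               for pa, pb in zip(row_a, row_b))


def best_match(predicted, candidates):
    """Return (label, distance) of the candidate closest to the prediction."""
    return min(((label, hamming_distance(predicted, mask))
                for label, mask in candidates.items()),
               key=lambda pair: pair[1])


# Toy example: a region from the previous frame, moved two pixels by rotation.
previous = [[0, 1, 1, 0, 0, 0],
            [0, 1, 1, 0, 0, 0],
            [0, 0, 0, 0, 0, 0]]
predicted = shift_mask(previous, 2)
candidates = {
    "AR_a": [[0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0]],
    "AR_b": [[1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
}
print(best_match(predicted, candidates))
```

In this toy case the prediction differs from `AR_a` by one pixel and from `AR_b` by eight, so the identity is carried over to `AR_a`. A real pipeline would also need a rule for unmatched, merging, and splitting regions, which this sketch leaves out.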

Core claim

The HDBSCAN-based solar active region detection and tracking (HARDAT) method processes magnetograms to identify active regions by clustering points without requiring fixed density thresholds, enabling effective handling of multi-density structures. It incorporates a differential rotation model for tracking and a support vector classification method for extracting polarity inversion lines. When applied to two decades of SOHO/MDI and SDO/HMI observations, HARDAT achieves higher sensitivity and stability in detection and tracking than previous DBSCAN-based frameworks when benchmarked against established catalogues.

What carries the argument

The HDBSCAN algorithm, which performs hierarchical density-based clustering to identify active regions in magnetic fields of varying density without a fixed, preset density threshold.
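To make the "varying density" point concrete, the sketch below computes the mutual reachability distance that HDBSCAN uses in place of the raw Euclidean distance before building its minimum spanning tree and hierarchy (the stages shown in Figure 11). It is a standard-library illustration only; the point coordinates and the choice `k = 2` are made up, and implementations differ on whether a point counts itself among its `k` neighbours (here it does not).

```python
import math


def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (itself excluded)."""
    dists = sorted(math.dist(points[i], p)
                   for j, p in enumerate(points) if j != i)
    return dists[k - 1]


def mutual_reachability(points, i, j, k):
    """max(core_k(i), core_k(j), d(i, j)): large wherever either point is sparse."""
    return max(core_distance(points, i, k),
               core_distance(points, j, k),
               math.dist(points[i], points[j]))


# Four tightly packed points followed by three loosely spread ones.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (14, 0), (10, 4)]

dense_pair = mutual_reachability(points, 0, 1, k=2)
sparse_pair = mutual_reachability(points, 4, 5, k=2)
print(dense_pair, sparse_pair)
```

Points in the compact group keep their small separation, while the loosely spread points are pushed further apart than their raw distance of 4. This is what lets a single hierarchy describe both compact and diffuse structures without one global distance cut.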

If this is right

  • Improved detection of small and diffuse active regions that fixed-threshold methods may miss.
  • Better separation and tracking of individual regions inside dense clusters.
  • More reliable long-term identity maintenance for studies of active region evolution.
  • Greater suitability for integration into operational space weather forecasting pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on full-disk vector magnetograms to check whether performance gains persist when more magnetic information is available.
  • Direct head-to-head comparison against convolutional neural network detectors on identical magnetogram sequences would clarify relative strengths.
  • The polarity inversion line extraction step might be combined with existing flare prediction models to test whether boundary accuracy improves forecast skill.

Load-bearing premise

The NOAA and DSARD catalogues provide an unbiased and complete reference for correct active region detections and tracks.

What would settle it

Independent labeling of active regions by multiple solar physicists on a held-out sample of magnetograms, followed by re-evaluation showing that HARDAT does not outperform the baselines in sensitivity or identity continuity.

Figures

Figures reproduced from arXiv: 2605.23150 by C. X. Shi, P. F. Chen, Q. Hao, Y. Guo.

Figure 1. The DSARD result for the SOHO/MDI magnetogram on 2000 May 20, with minDistance set to 20 Mm and minSize set to 70 Mm². Panel (a) The result after threshold segmentation. The blue box area is the target region. The right panels are the result of the target region after each part of DSARD processing. Panel (b) Clustering results of the first global DBSCAN. Panel (c) Clustering results of the second reformat…
Figure 2. Panel (a) is the result under the parameters in …
Figure 3. HDBSCAN-based solar active region detection result of the SOHO/MDI magnetogram on 2000 May 20. Panel (a) HDBSCAN clustering results. Panel (b) DBSCAN clustering results. Panel (c) Better integration with β set to 0.5 and γ set to 0.5. This necessitates adjusting two additional parameters, minSamples_H and ϵ_ClusterSelection, to achieve optimal results.
Figure 4. Solar active region detection results of the SOHO/MDI magnetogram on 1999 August 1 (a, b) and 2012 June 7 (c, d). Each red box with a number represents a bounding rectangle of a solar active region identified by DSARD (a, c) or HARD (b, d). The blue plus signs show the labeled ARs of NOAA.
Figure 5. An example displaying the entire Prediction and Match process for AR2 in …
Figure 6. An example of tracking result. Panel (a) The active regions detected by the HARD method on 2022 February 1. Panels (b–j) The trajectory of the active region AR2 from February 1 to February 9, with a time interval of one day.
Figure 7. Two line charts of the three tracking evaluation indicators MOTA (blue), IDR (green) and R_MT (red) as a function of lifetime threshold of the NOAA active regions. The dotted line represents the DSARD method, while the solid line represents the HARD method. The histogram above shows the number of NOAA numbers under the current lifetime threshold. Panel (a) The result of SOHO/MDI magnetograms. Panel (b) The…
Figure 8. PIL extraction based on the RBF kernel SVC classification. Panel (a) The HARD detection results of the SDO/HMI magnetogram on 2023 February 1. The right two columns represent the PIL extraction steps for the active regions AR2 and AR4, respectively. Panel (b) and panel (c) represent the original image after threshold segmentation. Panel (d) and panel (e) show the extraction result without filtering. Panel …
Figure 9. Results of PIL-mask extraction using different sigma and n_cluster parameters are presented, with the red areas denoting the extracted PIL regions. The active region selected is the same as that shown in …
Figure 10. An example of PIL-svc extraction on July 15, 2023. Panel (a) The HARD detection results. Panels (b–d) are similar to …
Figure 11. The flowchart of the HDBSCAN clustering applied to solar magnetograms on 2023 February 1. Panel (a) The result after threshold segmentation. Panel (b) Minimum spanning tree construction with random 5000 pixels from (a). Panel (c) Single linkage hierarchy. Panel (d) Condensed hierarchy tree. Panel (e) The HDBSCAN clustering result. Panel (f) The clusters with bounding box and the background.
Figure 12. Heat maps showing how the evaluation indicators change with the parameters β and γ, taking the lifetime threshold of the NOAA active regions to be 2. The first row corresponds to 2000–2002 (the maximum of solar cycle 23), the second row corresponds to 2003–2005 (the decay phase of solar cycle 23), and the third row corresponds to 2007–2009 (the minimum of solar cycle 23).
Original abstract

Solar active regions (ARs) are the primary source of solar eruptions and space weather. Accurate detection and tracking of ARs is crucial for understanding their evolution and predicting solar activity. In our previous work, based on the density-based spatial clustering of applications with noise (DBSCAN) approach, we proposed the DBSCAN-based solar active region detection (DSARD) framework. To overcome its limitations, in this paper we apply the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) approach to the detection of solar active regions, which we call the HDBSCAN-based solar active region detection and tracking (HARDAT) method. This enables the algorithm to handle multi-density magnetic structures dynamically, eliminating the need for fixed thresholds. Consequently, the algorithm can detect diffuse and small ARs more effectively while preserving morphological integrity. We have also developed a solar differential rotation based tracking algorithm that integrates physical motion models and Hamming distance similarity metrics to achieve robust multi-object tracking. Additionally, we propose a novel polarity inversion line extraction method that uses support vector classification, which offers superior generalization for complex AR boundaries. Processing line-of-sight magnetograms from SOHO/MDI (1996–2011) and SDO/HMI (2010–2024) and evaluating them against the National Oceanic and Atmospheric Administration (NOAA) and DSARD catalogues demonstrates that HARDAT is superior in terms of sensitivity, accuracy, and stability of detection and tracking. This is particularly evident when resolving clustered ARs and maintaining identity continuity. HARDAT therefore offers a comprehensive solution for the long-term analysis of AR evolution and space weather prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents HARDAT, an HDBSCAN-based pipeline for detecting solar active regions (ARs) in line-of-sight magnetograms from SOHO/MDI (1996–2011) and SDO/HMI (2010–2024). It replaces the authors’ prior DBSCAN-based DSARD method to handle variable densities without fixed thresholds, adds a differential-rotation tracking algorithm that combines physical motion models with Hamming-distance similarity, and introduces an SVC-based polarity-inversion-line extractor. The central claim is that HARDAT outperforms both the NOAA catalog and the authors’ own DSARD catalog in sensitivity, accuracy, and tracking stability, especially when resolving clustered ARs and preserving identity continuity.

Significance. If the performance gains can be shown to reflect improved fidelity to physical AR boundaries rather than agreement with the reference catalogs’ own selection biases, the method would supply a useful, threshold-free tool for long-term AR evolution studies and space-weather applications. The incorporation of hierarchical density clustering and physically motivated tracking is a clear technical advance over fixed-parameter DBSCAN approaches.

major comments (2)
  1. [Evaluation / results section] The superiority assertion in the evaluation (abstract and results sections) is established solely by agreement metrics against the NOAA and DSARD catalogs. No held-out human-expert labeling exercise on the same magnetograms, no quantitative comparison against independent third-party pipelines (e.g., other HDBSCAN variants or CNN segmenters), and no ablation of the HDBSCAN versus DBSCAN components are described. Consequently the measured gains could simply reproduce the reference catalogs’ implicit size/flux thresholds rather than demonstrate closer proximity to physical AR boundaries.
  2. [Tracking algorithm description and evaluation] The tracking algorithm’s claim of improved identity continuity is evaluated only against DSARD and NOAA; without an independent continuity metric (e.g., expert-verified AR lifetimes or cross-catalog overlap statistics on a common test set) it is impossible to separate genuine tracking improvement from catalog-specific labeling conventions.
minor comments (2)
  1. The abstract states that the method processes MDI and HMI data but does not report the exact number of magnetograms, cadence, or spatial resolution used in the quantitative comparisons.
  2. Notation for the Hamming-distance similarity metric and the SVC polarity-inversion-line classifier is introduced without explicit equations or pseudocode, making reproduction difficult.
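As an indication of what the missing notation could look like, the block below writes down one plausible form of the two quantities. It is an assumption, not the paper's own definition: $P$ and $D$ stand for a predicted and a detected binary mask on the same $N$ pixels, $\mathbf{x}_j$ and $y_j \in \{-1, +1\}$ are support-vector pixel positions and their polarity labels, and $\gamma_{\mathrm{RBF}}$ is the kernel width (written with a subscript to keep it apart from the detection parameter γ in Figures 3 and 12).

```latex
% Hamming distance between two binary masks and a normalized similarity
d_H(P, D) = \sum_{i=1}^{N} \lvert P_i - D_i \rvert ,
\qquad
s(P, D) = 1 - \frac{d_H(P, D)}{N}

% RBF-kernel SVC decision function; the PIL is read off as its zero level set
f(\mathbf{x}) = \sum_{j} \alpha_j \, y_j \,
  \exp\!\left( -\gamma_{\mathrm{RBF}} \, \lVert \mathbf{x} - \mathbf{x}_j \rVert^{2} \right) + b ,
\qquad
\mathrm{PIL} = \{ \mathbf{x} : f(\mathbf{x}) = 0 \}
```

With this convention $s = 1$ for identical masks and $s = 0$ for masks that disagree on every pixel, and the sign of $f$ separates the positive-polarity side of a region from the negative one.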

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, acknowledging where the manuscript evaluation has limitations and outlining targeted revisions.

Point-by-point responses
  1. Referee: [Evaluation / results section] The superiority assertion in the evaluation (abstract and results sections) is established solely by agreement metrics against the NOAA and DSARD catalogs. No held-out human-expert labeling exercise on the same magnetograms, no quantitative comparison against independent third-party pipelines (e.g., other HDBSCAN variants or CNN segmenters), and no ablation of the HDBSCAN versus DBSCAN components are described. Consequently the measured gains could simply reproduce the reference catalogs’ implicit size/flux thresholds rather than demonstrate closer proximity to physical AR boundaries.

    Authors: We agree that the evaluation relies on agreement metrics with NOAA and DSARD. This follows standard practice for benchmarking against established references and our prior DSARD work. However, we recognize this does not constitute fully independent validation of physical boundaries. We will revise the manuscript to add an ablation study isolating HDBSCAN versus DBSCAN on identical datasets, expand the discussion to address potential reference biases, and note the absence of expert labeling or third-party comparisons as a limitation with suggestions for future work. The hierarchical density approach in HDBSCAN still provides a methodological basis for improved handling of variable-density structures. revision: partial

  2. Referee: [Tracking algorithm description and evaluation] The tracking algorithm’s claim of improved identity continuity is evaluated only against DSARD and NOAA; without an independent continuity metric (e.g., expert-verified AR lifetimes or cross-catalog overlap statistics on a common test set) it is impossible to separate genuine tracking improvement from catalog-specific labeling conventions.

    Authors: We agree the tracking evaluation uses comparisons to DSARD and NOAA. The algorithm integrates differential-rotation physics and Hamming-distance similarity to support continuity. In revision we will add quantitative tracking metrics (e.g., average track duration and identity persistence) and explicitly discuss the limitations of catalog-dependent evaluation, while noting that independent expert-verified lifetimes are not available in the current study. revision: partial
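For readers unfamiliar with the MOTA indicator plotted in Figure 7, the sketch below follows the usual CLEAR MOT definition: one minus the ratio of accumulated misses, false positives, and identity switches to the accumulated number of ground-truth objects. The function name, the per-frame list interface, and the example counts are illustrative assumptions, and how the paper counts each error type against NOAA labels is not specified here.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """Multiple Object Tracking Accuracy from per-frame error counts.

    Each argument is a sequence with one integer per frame.
    """
    lengths = {len(false_negatives), len(false_positives),
               len(id_switches), len(ground_truth)}
    if len(lengths) != 1:
        raise ValueError("all per-frame sequences must have the same length")
    total_gt = sum(ground_truth)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


# Made-up counts for four frames, purely for illustration.
score = mota(
    false_negatives=[1, 0, 1, 0],
    false_positives=[0, 1, 0, 0],
    id_switches=[0, 0, 1, 0],
    ground_truth=[5, 5, 6, 6],
)
print(round(score, 3))
```

Four errors against 22 ground-truth objects give a score of 18/22. Because every term depends on the reference labels, the value inherits whatever conventions the chosen catalogue uses, which is the concern raised in the second major comment.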

Circularity Check

1 step flagged

Minor self-citation to prior DSARD work used as one evaluation reference; no derivation or fitted parameters present.

specific steps
  1. self citation load bearing [Abstract]
    "in the previous work, based on the density-based spatial clustering of applications with noise (DBSCAN) approach, we proposed the DBSCAN-based solar active region detection (DSARD) framework. [...] evaluating them against the National Oceanic and Atmospheric Administration (NOAA) and DSARD catalogues demonstrates that HARDAT is superior in terms of sensitivity, accuracy, and stability of detection and tracking."

    DSARD catalogue is produced by the authors' own prior method; using it as a reference label for the superiority claim makes part of the validation dependent on self-generated output rather than fully external ground truth, though NOAA provides an independent anchor and the core algorithmic description remains non-circular.

full rationale

The paper presents an algorithmic pipeline (HDBSCAN clustering plus tracking and PIL extraction) with no equations, fitted parameters, or first-principles derivation. The only self-reference is the mention of the authors' earlier DSARD framework and its catalogue as one of two evaluation benchmarks alongside the external NOAA catalogue. This is a normal self-citation for incremental method comparison and does not reduce any central claim to a self-referential loop by construction. No other patterns (self-definitional, fitted-input-as-prediction, uniqueness theorems, ansatz smuggling, or renaming) appear.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated. The method implicitly assumes that HDBSCAN's hierarchical clustering will separate physically meaningful magnetic structures without additional domain-specific constraints.

pith-pipeline@v0.9.0 · 5842 in / 1218 out tokens · 24282 ms · 2026-05-25T03:29:35.161453+00:00 · methodology
