An Improved HDBSCAN-based Detection and Tracking Method for Solar Active Regions in Magnetograms
Pith reviewed 2026-05-25 03:29 UTC · model grok-4.3
The pith
HDBSCAN clustering detects solar active regions more sensitively and tracks them with greater continuity than prior DBSCAN methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The HDBSCAN-based solar active region detection and tracking (HARDAT) method processes magnetograms to identify active regions by clustering points without requiring fixed density thresholds, enabling effective handling of multi-density structures. It incorporates a differential rotation model for tracking and a support vector classification method for extracting polarity inversion lines. When applied to two decades of SOHO/MDI and SDO/HMI observations, HARDAT achieves higher sensitivity and stability in detection and tracking than previous DBSCAN-based frameworks when benchmarked against established catalogues.
What carries the argument
The HDBSCAN algorithm, which performs hierarchical density-based clustering to identify active regions in magnetic fields of varying density without a preset global density threshold (it still takes a minimum cluster size).
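The quantity that lets HDBSCAN cope with varying density is the mutual reachability distance: each pairwise distance is inflated to at least the core distance of both endpoints, so sparse points are pushed away from dense ones before the hierarchy is built. A minimal sketch of that one step follows. The pixel coordinates and the `min_samples` value are made up for illustration, the core distance is taken to the `min_samples`-th neighbour excluding the point itself (conventions differ between implementations), and none of this is taken from the HARDAT code.

```python
import math

def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour (self excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[min_samples - 1])
    return cores

def mutual_reachability(points, min_samples):
    """Pairwise max(core_i, core_j, d_ij), the metric the cluster hierarchy is built on."""
    cores = core_distances(points, min_samples)
    n = len(points)
    return [
        [0.0 if i == j else max(cores[i], cores[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical pixel coordinates: a compact 2x2 patch plus one isolated pixel.
pixels = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
mreach = mutual_reachability(pixels, min_samples=2)
```

Within the compact patch the distances are unchanged, while the isolated pixel ends up farther from the patch than its raw Euclidean distance. The remaining stages (minimum spanning tree, condensed tree, stability-based cluster selection) are not shown.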
If this is right
- Improved detection of small and diffuse active regions that fixed-threshold methods may miss.
- Better separation and tracking of individual regions inside dense clusters.
- More reliable long-term identity maintenance for studies of active region evolution.
- Greater suitability for integration into operational space weather forecasting pipelines.
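The first two consequences both come down to what a single neighbourhood radius cannot do. The sketch below is a textbook DBSCAN written here for illustration only (it is not the DSARD implementation), run on invented pixel coordinates: two compact patches close together and one diffuse patch. The radii and `min_pts` are assumptions chosen to make the trade-off visible.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)
    return labels

# Hypothetical layout: two compact 3x3 patches three pixels apart, one diffuse patch.
compact_a = [(x, y) for x in range(3) for y in range(3)]
compact_b = [(x + 5, y) for x, y in compact_a]
diffuse = [(20 + 3 * x, 20 + 3 * y) for x, y in compact_a]
pixels = compact_a + compact_b + diffuse

small_eps = dbscan(pixels, eps=1.5, min_pts=4)
large_eps = dbscan(pixels, eps=4.5, min_pts=4)
```

With the small radius the two compact patches stay separate but the diffuse patch is labelled noise; with the large radius the diffuse patch is recovered but the two compact patches merge. No single radius gives three regions on this toy layout, which is the situation a hierarchical method is meant to address.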
Where Pith is reading between the lines
- The approach could be tested on full-disk vector magnetograms to check whether performance gains persist when more magnetic information is available.
- Direct head-to-head comparison against convolutional neural network detectors on identical magnetogram sequences would clarify relative strengths.
- The polarity inversion line extraction step might be combined with existing flare prediction models to test whether boundary accuracy improves forecast skill.
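For readers unfamiliar with the polarity inversion line step, one plausible reading is that pixels are labelled by the sign of their magnetic field and the classifier's decision boundary is taken as the line. The sketch below illustrates that idea under loud assumptions: it uses a linear soft-margin SVM trained by full-batch subgradient descent as a stand-in, since the source does not state the kernel, solver, or features the authors use, and the pixel coordinates are invented.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    w0 = w1 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        sx = sy = st = 0.0
        for (x, y), t in zip(points, labels):
            if t * (w0 * x + w1 * y + b) < 1.0:  # point violates the margin
                sx += t * x
                sy += t * y
                st += t
        w0 -= lr * (lam * w0 - sx / n)
        w1 -= lr * (lam * w1 - sy / n)
        b += lr * st / n
    return (w0, w1), b

def polarity_side(w, b, p):
    """+1 or -1 depending on which side of the boundary w.p + b = 0 the point lies."""
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1

# Hypothetical pixels: positive polarity at x > 0, negative polarity mirrored at x < 0.
positive = [(x, y) for x in (1, 2, 3) for y in (-1, 0, 1)]
negative = [(-x, y) for x, y in positive]
points = positive + negative
labels = [1] * len(positive) + [-1] * len(negative)

w, b = train_linear_svm(points, labels)
```

On this symmetric toy data the boundary `w[0] * x + w[1] * y + b = 0` is the line x = 0, which plays the role of the inversion line. Real active regions would need a non-linear boundary, so a kernel classifier would replace the linear one used here.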
Load-bearing premise
The NOAA and DSARD catalogues provide an unbiased and complete reference for correct active region detections and tracks.
What would settle it
Independent labeling of active regions by multiple solar physicists on a held-out sample of magnetograms, followed by re-evaluation showing that HARDAT does not outperform the baselines in sensitivity or identity continuity.
Original abstract
Solar active regions (ARs) are the primary source of solar eruptions and space weather. Accurate detection and tracking of ARs is crucial for understanding their evolution and predicting solar activity. In the previous work, based on the density-based spatial clustering of applications with noise (DBSCAN) approach, we proposed the DBSCAN-based solar active region detection (DSARD) framework. To overcome its limitations, in this paper we apply the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) approach to the detection of solar active regions; we call the result the HDBSCAN-based solar active region detection and tracking (HARDAT) method. This enables the algorithm to handle multi-density magnetic structures dynamically, eliminating the need for fixed thresholds. Consequently, the algorithm can detect diffuse and small ARs more effectively while preserving morphological integrity. We have also developed a solar differential rotation based tracking algorithm that integrates physical motion models and Hamming distance similarity metrics to achieve robust multi-object tracking. Additionally, we propose a novel polarity inversion line extraction method that uses support vector classification, which offers superior generalization for complex AR boundaries. Processing line-of-sight magnetograms from SOHO/MDI (1996–2011) and SDO/HMI (2010–2024) and evaluating them against the National Oceanic and Atmospheric Administration (NOAA) and DSARD catalogues demonstrates that HARDAT is superior in terms of sensitivity, accuracy, and stability of detection and tracking. This is particularly evident when resolving clustered ARs and maintaining identity continuity. HARDAT therefore offers a comprehensive solution for the long-term analysis of AR evolution and space weather prediction.
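The abstract does not say which differential rotation law the tracker uses. A common functional form is ω(φ) = A + B sin²φ + C sin⁴φ, with φ the heliographic latitude. The sketch below shows how such a law could predict where a region should appear in a later frame. The coefficients are illustrative sidereal values in degrees per day, of the size quoted in the literature, and are an assumption rather than the paper's choice; the function names are hypothetical and the conversion from a sidereal to an Earth-view frame is left out.

```python
import math

# Illustrative sidereal coefficients in degrees per day (assumed, not from the paper).
A, B, C = 14.713, -2.396, -1.787

def rotation_rate(lat_deg):
    """Angular rotation rate in deg/day at heliographic latitude lat_deg."""
    s2 = math.sin(math.radians(lat_deg)) ** 2
    return A + B * s2 + C * s2 * s2

def predict_longitude(lon_deg, lat_deg, dt_days):
    """Longitude expected after dt_days if the region simply co-rotates at its latitude."""
    return lon_deg + rotation_rate(lat_deg) * dt_days

equator_rate = rotation_rate(0.0)
mid_latitude_rate = rotation_rate(30.0)
predicted = predict_longitude(10.0, 0.0, 2.0)
```

A tracker built this way would search for the region's next detection near the predicted longitude instead of its previous position, which matters most at high cadence gaps and for regions at different latitudes drifting apart.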
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents HARDAT, an HDBSCAN-based pipeline for detecting solar active regions (ARs) in line-of-sight magnetograms from SOHO/MDI (1996–2011) and SDO/HMI (2010–2024). It replaces the authors’ prior DBSCAN-based DSARD method to handle variable densities without fixed thresholds, adds a differential-rotation tracking algorithm that combines physical motion models with Hamming-distance similarity, and introduces an SVC-based polarity-inversion-line extractor. The central claim is that HARDAT outperforms both the NOAA catalog and the authors’ own DSARD catalog in sensitivity, accuracy, and tracking stability, especially when resolving clustered ARs and preserving identity continuity.
Significance. If the performance gains can be shown to reflect improved fidelity to physical AR boundaries rather than agreement with the reference catalogs’ own selection biases, the method would supply a useful, threshold-free tool for long-term AR evolution studies and space-weather applications. The incorporation of hierarchical density clustering and physically motivated tracking is a clear technical advance over fixed-parameter DBSCAN approaches.
major comments (2)
- [Evaluation / results section] The superiority assertion in the evaluation (abstract and results sections) is established solely by agreement metrics against the NOAA and DSARD catalogs. No held-out human-expert labeling exercise on the same magnetograms, no quantitative comparison against independent third-party pipelines (e.g., other HDBSCAN variants or CNN segmenters), and no ablation of the HDBSCAN versus DBSCAN components are described. Consequently the measured gains could simply reproduce the reference catalogs’ implicit size/flux thresholds rather than demonstrate closer proximity to physical AR boundaries.
- [Tracking algorithm description and evaluation] The tracking algorithm’s claim of improved identity continuity is evaluated only against DSARD and NOAA; without an independent continuity metric (e.g., expert-verified AR lifetimes or cross-catalog overlap statistics on a common test set) it is impossible to separate genuine tracking improvement from catalog-specific labeling conventions.
minor comments (2)
- The abstract states that the method processes MDI and HMI data but does not report the exact number of magnetograms, cadence, or spatial resolution used in the quantitative comparisons.
- Notation for the Hamming-distance similarity metric and the SVC polarity-inversion-line classifier is introduced without explicit equations or pseudocode, making reproduction difficult.
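To make the second minor comment concrete, this is the kind of definition a reader would need. The sketch assumes the similarity is one minus the normalised Hamming distance between two binary region masks of equal shape, compared after rotation compensation. That reading, the function name, and the toy masks are assumptions; the manuscript may define the metric differently.

```python
def hamming_similarity(mask_a, mask_b):
    """One minus the fraction of pixels at which two same-shaped binary masks differ."""
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    differing = sum(a != b for a, b in zip(flat_a, flat_b))
    return 1.0 - differing / len(flat_a)

# Hypothetical 3x4 masks: the same region, then shifted one pixel to the right.
previous = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
shifted = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 0]]

same_score = hamming_similarity(previous, previous)
shift_score = hamming_similarity(previous, shifted)
```

Identical masks score 1.0 and the one-pixel shift scores 2/3, since 4 of the 12 pixels differ. An explicit statement of this kind, including how masks are aligned and what score counts as a match, would address the reproducibility concern.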
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below, acknowledging where the manuscript evaluation has limitations and outlining targeted revisions.
- Referee: [Evaluation / results section] The superiority assertion in the evaluation (abstract and results sections) is established solely by agreement metrics against the NOAA and DSARD catalogs. No held-out human-expert labeling exercise on the same magnetograms, no quantitative comparison against independent third-party pipelines (e.g., other HDBSCAN variants or CNN segmenters), and no ablation of the HDBSCAN versus DBSCAN components are described. Consequently the measured gains could simply reproduce the reference catalogs’ implicit size/flux thresholds rather than demonstrate closer proximity to physical AR boundaries.
  Authors: We agree that the evaluation relies on agreement metrics with NOAA and DSARD. This follows standard practice for benchmarking against established references and our prior DSARD work. However, we recognize this does not constitute fully independent validation of physical boundaries. We will revise the manuscript to add an ablation study isolating HDBSCAN versus DBSCAN on identical datasets, expand the discussion to address potential reference biases, and note the absence of expert labeling or third-party comparisons as a limitation with suggestions for future work. The hierarchical density approach in HDBSCAN still provides a methodological basis for improved handling of variable-density structures.
  revision: partial
- Referee: [Tracking algorithm description and evaluation] The tracking algorithm’s claim of improved identity continuity is evaluated only against DSARD and NOAA; without an independent continuity metric (e.g., expert-verified AR lifetimes or cross-catalog overlap statistics on a common test set) it is impossible to separate genuine tracking improvement from catalog-specific labeling conventions.
  Authors: We agree the tracking evaluation uses comparisons to DSARD and NOAA. The algorithm integrates differential-rotation physics and Hamming-distance similarity to support continuity. In revision we will add quantitative tracking metrics (e.g., average track duration and identity persistence) and explicitly discuss the limitations of catalog-dependent evaluation, while noting that independent expert-verified lifetimes are not available in the current study.
  revision: partial
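The two tracking metrics named in the rebuttal are not defined there. One hedged way to pin them down is sketched below: track duration as the frame span of each track, and identity persistence as the share of consecutive frame pairs in which one reference region keeps the same track identifier. Both definitions, the function names, and the sample data are assumptions made for illustration, and inputs are assumed non-empty.

```python
def average_track_duration(tracks):
    """Mean frame span per track; tracks maps track id -> sorted list of frame indices."""
    spans = [frames[-1] - frames[0] + 1 for frames in tracks.values()]
    return sum(spans) / len(spans)

def identity_persistence(assigned_ids):
    """Share of consecutive frame pairs where the region keeps the same track id.

    assigned_ids holds the track id given to one reference region in each frame,
    with None for frames in which it was missed.
    """
    pairs = list(zip(assigned_ids, assigned_ids[1:]))
    kept = sum(1 for a, b in pairs if a is not None and a == b)
    return kept / len(pairs)

# Hypothetical output of a tracker over eight frames.
tracks = {1: [0, 1, 2, 3], 2: [2, 3], 3: [5, 6, 7]}
mean_duration = average_track_duration(tracks)

# Hypothetical ids assigned to one catalogued region; the id switches once.
region_ids = [7, 7, 7, 9, 9]
persistence = identity_persistence(region_ids)
```

Here the mean duration is 3.0 frames and the persistence is 0.75, because one of the four frame-to-frame transitions changes identifier. Multiplying the span by the magnetogram cadence would convert the duration to physical time.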
Circularity Check
Minor self-citation to prior DSARD work used as one evaluation reference; no derivation or fitted parameters present.
specific steps
- Self-citation, load-bearing [Abstract]
"in the previous work, based on the density-based spatial clustering of applications with noise (DBSCAN) approach, we proposed the DBSCAN-based solar active region detection (DSARD) framework. [...] evaluating them against the National Oceanic and Atmospheric Administration (NOAA) and DSARD catalogues demonstrates that HARDAT is superior in terms of sensitivity, accuracy, and stability of detection and tracking."
DSARD catalogue is produced by the authors' own prior method; using it as a reference label for the superiority claim makes part of the validation dependent on self-generated output rather than fully external ground truth, though NOAA provides an independent anchor and the core algorithmic description remains non-circular.
full rationale
The paper presents an algorithmic pipeline (HDBSCAN clustering plus tracking and PIL extraction) with no equations, fitted parameters, or first-principles derivation. The only self-reference is the mention of the authors' earlier DSARD framework and its catalogue as one of two evaluation benchmarks alongside the external NOAA catalogue. This is a normal self-citation for incremental method comparison and does not reduce any central claim to a self-referential loop by construction. No other patterns (self-definitional, fitted-input-as-prediction, uniqueness theorems, ansatz smuggling, or renaming) appear.