Search for quasar pairs with Gaia astrometric data. II. Photometric redshift prediction with machine learning for the MGQPC catalogue
Pith reviewed 2026-05-12 04:45 UTC · model grok-4.3
The pith
Machine learning predicts photometric redshifts for MGQPC quasars to identify 185 consistent pair candidates, 20 of which are spectroscopically confirmed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A CatBoost regression model for photometric-redshift point estimates combined with a FlexZBoost model for full redshift probability density functions, trained on large spectroscopically confirmed quasar samples from SDSS and DESI, is applied to the MGQPC catalogue and yields 185 high-probability quasar pair candidates selected by photometric-redshift consistency, twenty of which have been independently confirmed as physical systems by spectroscopy.
What carries the argument
The CatBoost and FlexZBoost machine-learning models trained on SDSS and DESI multi-wavelength photometry to produce quasar photometric-redshift point estimates and probability density functions.
If this is right
- The MGQPC photometric-redshift catalogue supplies a ready list for targeted spectroscopic campaigns on quasar pairs and dual supermassive black holes.
- 185 candidates can be ranked by consistency probability for efficient allocation of follow-up telescope time.
- The reported performance metrics indicate that similar training sets can be used to pre-filter projected alignments in other Gaia-selected quasar samples.
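Ranking candidates by a consistency probability can be illustrated with a minimal sketch. It assumes each quasar has a redshift PDF sampled on a common uniform grid and that the two PDFs are independent. The function names, the window `delta_z = 0.1`, and the Gaussian toy PDFs are illustrative assumptions, not taken from the paper, which does not state how its consistency probability is computed.

```python
import math

def gaussian_pdf_on_grid(grid, mu, sigma):
    """Toy stand-in for a model PDF (assumption): unnormalised Gaussian on the grid."""
    return [math.exp(-0.5 * ((z - mu) / sigma) ** 2) for z in grid]

def consistency_probability(grid, pdf_a, pdf_b, delta_z=0.1):
    """P(|z_a - z_b| < delta_z) for two independent PDFs on a uniform grid."""
    # Turn PDF samples into probability weights that sum to one.
    total_a, total_b = sum(pdf_a), sum(pdf_b)
    w_a = [p / total_a for p in pdf_a]
    w_b = [p / total_b for p in pdf_b]
    prob = 0.0
    for z_i, w_i in zip(grid, w_a):
        if w_i == 0.0:
            continue  # skip grid points that carry no probability
        prob += w_i * sum(w_j for z_j, w_j in zip(grid, w_b)
                          if abs(z_i - z_j) < delta_z)
    return prob

grid = [0.01 * k for k in range(501)]  # hypothetical grid, z = 0 to 5
close_pair = consistency_probability(
    grid,
    gaussian_pdf_on_grid(grid, 1.50, 0.02),
    gaussian_pdf_on_grid(grid, 1.51, 0.02),
)
far_pair = consistency_probability(
    grid,
    gaussian_pdf_on_grid(grid, 1.50, 0.02),
    gaussian_pdf_on_grid(grid, 3.00, 0.02),
)
# Highest consistency probability first, as a follow-up priority list.
ranked = sorted([("close", close_pair), ("far", far_pair)],
                key=lambda item: item[1], reverse=True)
```

A score of this kind uses the full PDF rather than the point estimate, so broad or multi-peaked PDFs are penalised automatically. A high score still does not exclude a chance alignment at similar redshift.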
Where Pith is reading between the lines
- The same workflow could be retrained on upcoming wide-field surveys to discover additional close quasar pairs at higher redshifts.
- Confirmed physical pairs offer direct targets for studying how supermassive black holes and their host galaxies interact at kiloparsec separations.
- Reducing the fraction of line-of-sight contaminants through photometric-redshift pre-selection lowers the cost of spectroscopic confirmation campaigns.
Load-bearing premise
Photometric-redshift agreement between the two members of a candidate pair reliably signals physical association rather than chance line-of-sight alignment, and the models trained on SDSS and DESI data transfer without major bias to the MGQPC quasars.
What would settle it
Spectroscopic follow-up of the 185 candidates showing that the majority have redshift differences too large for physical association, or an independent test set of quasars where the reported normalized median absolute deviation rises well above 0.036.
Original abstract
The identification of physically associated kiloparsec-scale quasar pairs is important for understanding galaxy evolution, the growth of supermassive black holes, and their co-evolution with host galaxies. However, their rarity and the high contamination from stellar superpositions and projected alignments require efficient pre-selection methods. We develop a machine-learning framework to produce photometric-redshift point estimates and redshift probability density functions for quasars, with the main goal of identifying high-probability quasar pair candidates in the MGQPC catalogue. We construct two large spectroscopically confirmed quasar samples with multi-wavelength photometry, based on SDSS and DESI Legacy Imaging Surveys data. CatBoost is used for point-estimate photometric-redshift regression, and FlexZBoost is used for full redshift-PDF estimation. The workflow achieves robust performance, with a normalised median absolute deviation of 0.036 and an outlier fraction of 5.6% on the test sample. Applying the trained model to the MGQPC catalogue, we identify 185 high-probability quasar pair candidates based on photometric-redshift consistency. Among them, 20 systems have been subsequently confirmed as genuine physical pairs by independent spectroscopic observations. The resulting MGQPC photometric-redshift catalogue provides a useful resource for future spectroscopic follow-up of quasar pairs and dual supermassive black holes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a machine-learning pipeline (CatBoost for point-estimate photometric redshifts and FlexZBoost for redshift PDFs) trained on spectroscopically confirmed quasar samples from SDSS and DESI Legacy Imaging Surveys. The trained models are applied to the MGQPC catalogue to select 185 quasar-pair candidates whose members have consistent photometric redshifts; 20 of these systems have subsequently been confirmed as physical pairs by independent spectroscopy. The reported test-sample performance is NMAD = 0.036 and outlier fraction 5.6%.
Significance. If the photo-z estimates transfer without major bias and the candidate list has quantified low contamination, the work supplies a practical pre-selection tool for rare kpc-scale quasar pairs and a public photometric-redshift catalogue for MGQPC. The 20 spectroscopically confirmed pairs demonstrate that true positives exist and that the workflow can be useful for follow-up studies of dual SMBHs and galaxy evolution.
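For orientation, the two quoted metrics can be computed as in the sketch below. It follows one common convention: residuals normalised by (1 + z_spec), NMAD scaled by 1.4826, and outliers defined by |Δz|/(1 + z_spec) > 0.15. The outlier cut and the function name are assumptions, since the manuscript's exact definitions are not reported here.

```python
from statistics import median

def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (NMAD, outlier fraction) for paired photo-z / spec-z values.

    outlier_cut = 0.15 is an assumed, commonly used threshold.
    """
    # Normalised residuals: dz = (z_phot - z_spec) / (1 + z_spec)
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    centre = median(dz)
    # 1.4826 rescales the median absolute deviation to a Gaussian sigma.
    nmad = 1.4826 * median(abs(d - centre) for d in dz)
    outlier_fraction = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return nmad, outlier_fraction

# Toy values for illustration only, not data from the paper.
z_spec = [1.0, 2.0, 0.5, 3.0]
z_phot = [1.02, 1.97, 0.5, 1.8]
nmad, outlier_fraction = photoz_metrics(z_phot, z_spec)
```

Some authors omit the median subtraction inside the NMAD, which is one reason the precise definition matters for reproducibility.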
major comments (1)
- [Application to MGQPC catalogue] § on application to MGQPC and candidate selection (abstract and results): The selection of 185 'high-probability' quasar pair candidates rests entirely on photometric-redshift consistency between the two members. With the reported NMAD of 0.036, any finite consistency window (e.g., |Δz| < 0.1 or 3σ) will be satisfied by a non-negligible fraction of unrelated quasars drawn from the same redshift distribution. The manuscript provides no Monte-Carlo estimate of this random-alignment rate, no control sample of random pairs, and no statement of how many of the 185 were pre-selected for spectroscopic follow-up versus how many were observed. Consequently the purity of the full list cannot be assessed from the 20 confirmed pairs alone.
minor comments (2)
- [Abstract] Abstract: The abstract states that performance metrics are obtained 'on the test sample' but does not report the training/validation/test split fractions or the precise definition of the outlier fraction; these details should be added for reproducibility.
- [Methods] Methods: The exact numerical threshold or criterion used to declare photometric-redshift consistency between pair members (e.g., |Δz| < X or |Δz| < n × σ) is not stated explicitly; this should be given in the text or a table.
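To make the second minor comment concrete, the sketch below shows what an explicit point-estimate criterion could look like. The window, n_sigma × √2 × NMAD applied to |Δz|/(1 + z_mean), is a hypothetical choice for illustration and is not the criterion used in the manuscript. The √2 factor assumes independent, equal errors on the two members.

```python
import math

def photoz_consistent(z1, z2, sigma_nmad=0.036, n_sigma=3.0):
    """Hypothetical consistency test for the two members of a candidate pair.

    sigma_nmad defaults to the reported test-sample NMAD of 0.036;
    n_sigma and the functional form are assumptions.
    """
    z_mean = 0.5 * (z1 + z2)
    # Normalise the difference in the same way as the photo-z residuals.
    normalised_diff = abs(z1 - z2) / (1.0 + z_mean)
    # The error on a difference of two independent estimates grows by sqrt(2).
    return normalised_diff < n_sigma * math.sqrt(2.0) * sigma_nmad

similar = photoz_consistent(1.50, 1.55)    # members at similar photo-z
different = photoz_consistent(1.00, 2.00)  # members far apart in photo-z
```

Stating the rule in this form, or in a table, would let readers reproduce the candidate list and vary the window width.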
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We address the major comment below and have revised the manuscript to incorporate a quantitative assessment of random alignments, thereby improving the interpretation of the candidate list.
-
Referee: [Application to MGQPC catalogue] § on application to MGQPC and candidate selection (abstract and results): The selection of 185 'high-probability' quasar pair candidates rests entirely on photometric-redshift consistency between the two members. With the reported NMAD of 0.036, any finite consistency window (e.g., |Δz| < 0.1 or 3σ) will be satisfied by a non-negligible fraction of unrelated quasars drawn from the same redshift distribution. The manuscript provides no Monte-Carlo estimate of this random-alignment rate, no control sample of random pairs, and no statement of how many of the 185 were pre-selected for spectroscopic follow-up versus how many were observed. Consequently the purity of the full list cannot be assessed from the 20 confirmed pairs alone.
Authors: We agree that a Monte Carlo estimate of the random-alignment rate is necessary to contextualize the purity of the 185 candidates. In the revised manuscript we have added such an analysis: we draw mock pairs from the photometric-redshift distribution of the full MGQPC sample, apply the identical consistency window used for the real candidates, and report the resulting expected number of chance alignments. This provides a direct estimate of contamination. We also clarify the spectroscopic follow-up status, noting that the 20 confirmed physical pairs were obtained from targeted observations of a subset of the candidates (with the exact numbers now stated), while the full list of 185 is released as a resource for additional follow-up. The 20 confirmations demonstrate that genuine pairs are recovered by the method; the added simulation allows readers to assess the overall reliability of the pre-selection.
Revision: yes
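The mock-pair procedure described in the response can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the number of trials, and the simple |Δz| window are hypothetical, and the redshift list would be replaced by the actual MGQPC photometric redshifts.

```python
import random

def within_window(z1, z2, max_dz=0.1):
    """Assumed consistency window; the paper's actual criterion is not given."""
    return abs(z1 - z2) < max_dz

def expected_chance_alignments(z_phot_sample, n_pairs, is_consistent,
                               n_trials=200, seed=0):
    """Mean number of unrelated mock pairs that pass the consistency test."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    total_hits = 0
    for _ in range(n_trials):
        for _ in range(n_pairs):
            # Two distinct objects drawn at random: unrelated by construction.
            z1, z2 = rng.sample(z_phot_sample, 2)
            if is_consistent(z1, z2):
                total_hits += 1
    return total_hits / n_trials

# Toy redshift lists for illustration only.
all_same = expected_chance_alignments([1.5] * 20, 10, within_window)
never_match = expected_chance_alignments([0.5, 3.0], 10, within_window)
```

Comparing the returned expectation with the number of real pairs that pass the same window gives a rough contamination estimate. A fuller treatment would also match the magnitude and sky distribution of the mock pairs to the real sample.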
Circularity Check
No circularity: ML training and application are independent of target catalogue
The paper trains CatBoost and FlexZBoost models on spectroscopically confirmed quasar samples from SDSS and DESI Legacy Surveys, reports NMAD=0.036 and outlier fraction 5.6% on a held-out test subset, then applies the fixed trained model to the separate MGQPC catalogue to obtain photo-z values and select 185 candidates by redshift consistency. The 20 spectroscopically confirmed physical pairs are from independent follow-up observations. No equation, parameter, or selection criterion is defined in terms of the MGQPC outputs themselves, and no self-citation supplies a load-bearing uniqueness theorem or ansatz that collapses the workflow to its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- CatBoost and FlexZBoost hyperparameters
axioms (2)
- domain assumption: Spectroscopically confirmed SDSS and DESI quasars are representative of the MGQPC population
- domain assumption: Photometric-redshift consistency indicates high probability of physical association