Photometric classification of quasars from DES and photo-z estimation with Machine Learning
Pith reviewed 2026-05-20 00:20 UTC · model grok-4.3
The pith
KNN on DES photometry classifies quasars at 0.99 precision and builds an 872k-object photo-z catalog
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Cross-matching DES DR2 with SDSS DR16 produces a training set of 168,738 objects on which a KNN classifier using four-band PSF magnitudes separates quasars from contaminants at 0.99 precision with 0.77 recall. A hybrid ML approach combining boosted decision trees and a decision tree regressor then estimates photometric redshifts across 872,372 photometric objects, with 675,683 cleaned objects reliable for cosmological applications in the range 0 < z < 3 and the full set useful at z ≈ 4.
What carries the argument
A K-Nearest Neighbors classifier on PSF magnitudes in the g, r, i, z bands selects quasars; a hybrid pipeline of a boosted decision tree plus a decision tree regressor then estimates photometric redshifts.
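To make the selection step concrete, here is a minimal sketch of a nearest-neighbour vote in four-band magnitude space. It is not the paper's code: it uses pure Python rather than scikit-learn, the data are synthetic stand-ins for the cross-matched sample, and the names `knn_quasar_fraction`, `classify`, `fake_object`, the value `k=5`, and the vote threshold `0.8` are all illustrative assumptions.

```python
import math
import random

def knn_quasar_fraction(train_X, train_y, x, k=5):
    """Fraction of the k nearest training objects (Euclidean distance in
    g, r, i, z magnitude space) that are labelled quasar (1)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(label for _, label in nearest) / k

def classify(train_X, train_y, x, k=5, threshold=0.8):
    """Call x a quasar only if the neighbour vote reaches the threshold.
    Raising the threshold trades recall for precision."""
    return knn_quasar_fraction(train_X, train_y, x, k) >= threshold

# Synthetic stand-in for a labelled training set (NOT real DES photometry).
rng = random.Random(0)

def fake_object(offsets):
    """Hypothetical helper: a random g magnitude plus fixed colour offsets."""
    base = rng.gauss(20.0, 0.5)
    return [base + o + rng.gauss(0.0, 0.05) for o in offsets]

quasars = [fake_object([0.0, -0.1, -0.2, -0.25]) for _ in range(200)]
stars = [fake_object([0.0, -0.8, -1.1, -1.3]) for _ in range(200)]
train_X = quasars + stars
train_y = [1] * 200 + [0] * 200

blue_probe = [20.0, 19.9, 19.8, 19.75]  # quasar-like colours in this toy setup
red_probe = [20.0, 19.2, 18.9, 18.7]    # star-like colours in this toy setup

print(classify(train_X, train_y, blue_probe))
print(classify(train_X, train_y, red_probe))
```

A high vote threshold is one plausible way to reach a high-precision operating point at the cost of recall; how the paper actually sets its operating point is not stated in the abstract.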
Load-bearing premise
The cross-matched training sample of 168,738 objects is representative of the full DES photometric population without significant selection biases or distribution shifts.
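One way this premise could be probed is by comparing the magnitude distribution of the training sample with that of the target population. The sketch below is an assumption-laden illustration, not something the paper describes: it computes a two-sample Kolmogorov-Smirnov statistic and simple per-bin reweighting factors on synthetic magnitudes. The function names and the toy distributions are hypothetical.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for v in a + b:
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def magnitude_weights(train_mag, target_mag, edges):
    """Per-bin weight = target fraction / training fraction, so that the
    reweighted training histogram matches the target one. Bins with no
    training objects get None: no weight can repair missing coverage."""
    def fractions(mags):
        counts = [0] * (len(edges) - 1)
        for m in mags:
            i = bisect.bisect_right(edges, m) - 1
            if 0 <= i < len(counts):
                counts[i] += 1
        total = sum(counts)
        return [c / total for c in counts]
    f_train, f_target = fractions(train_mag), fractions(target_mag)
    return [t / s if s > 0 else None for s, t in zip(f_train, f_target)]

# Toy magnitudes (assumed, not DES data): a brighter "spectroscopic" sample
# and a fainter "photometric" population.
rng = random.Random(1)
train_mag = [rng.gauss(19.5, 0.8) for _ in range(500)]
target_mag = [rng.gauss(21.0, 1.0) for _ in range(500)]

print(ks_statistic(train_mag, target_mag))
print(magnitude_weights(train_mag, target_mag, [17, 18, 19, 20, 21, 22, 23, 24]))
```

A large statistic or many `None` weights would indicate regions of magnitude space that the training set barely covers, which reweighting alone cannot fix.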
What would settle it
Spectroscopic follow-up on a random subset of the photometrically classified objects to verify whether the reported 0.99 precision and 0.77 recall are reproduced on objects outside the training cross-match.
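As a rough illustration of how such a follow-up could be read, the sketch below puts a Wilson score interval on the precision measured from a followed-up subset. The counts are hypothetical, and `wilson_interval` is an illustrative helper rather than anything from the paper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one followed-up object")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical follow-up: 500 photometric quasar candidates observed,
# 495 confirmed spectroscopically.
low, high = wilson_interval(495, 500)
print(low, high)
```

Following up only objects classified as quasars constrains precision. Checking the 0.77 recall would also require spectra for objects the classifier rejected.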
Original abstract
This paper presents a comprehensive study of quasar photometric classification and redshift estimation using machine learning techniques. We cross-matched photometric data from the Dark Energy Survey Data Release 2 (DES DR2) with spectroscopic classifications from the Sloan Digital Sky Survey Data Release 16 (SDSS DR16), yielding an initial sample of 168,738 point-like objects. Using a K-Nearest Neighbors (KNN) algorithm with PSF magnitudes in the $g$, $r$, $i$, and $z$ bands, we achieved high-precision quasar/galaxy classification against stellar contaminants, reaching a recall of 0.77 at 0.99 precision. Photometric redshifts were subsequently estimated using a hybrid machine learning approach combining a Boosted Decision Tree from ANNz and a Decision Tree Regressor from scikit-learn. The resulting catalog spans redshifts from $z \approx 0.5$ to $z > 3$, with a distinct population recovered at $z \approx 4$. A stacked outlier classifier was developed to mitigate catastrophic redshift errors. The full photometric redshift sample contains 872,372 objects and remains reliable for cosmological applications at $z \approx 4$. The cleaned catalog contains 675,683 objects and is suitable for large-scale structure studies in the range $0 < z < 3$. This robustly characterized quasar catalog provides a valuable resource for future cosmological investigations.
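The abstract does not say how a catastrophic redshift error is defined. A common convention in the photo-z literature flags objects whose normalised residual |z_phot - z_spec| / (1 + z_spec) exceeds a fixed threshold, often 0.15. The sketch below assumes that convention; the threshold, the function name, and the toy redshifts are illustrative and may differ from the paper's choices.

```python
def outlier_fraction(z_phot, z_spec, threshold=0.15):
    """Return the share of objects with |z_phot - z_spec| / (1 + z_spec)
    above the threshold, plus the per-object flags."""
    if len(z_phot) != len(z_spec) or not z_spec:
        raise ValueError("need two non-empty lists of equal length")
    flagged = [abs(zp - zs) / (1.0 + zs) > threshold
               for zp, zs in zip(z_phot, z_spec)]
    return sum(flagged) / len(flagged), flagged

# Toy redshifts (assumed): the third object is a catastrophic failure.
z_spec = [0.5, 1.0, 2.0, 4.0]
z_phot = [0.55, 1.05, 0.9, 3.9]
fraction, flags = outlier_fraction(z_phot, z_spec)
print(fraction, flags)
```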
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper cross-matches DES DR2 photometry with SDSS DR16 spectroscopy to obtain 168,738 point-like objects and applies a KNN classifier on PSF g, r, i, z magnitudes to separate quasars from galaxies and stars, reporting a recall of 0.77 at 0.99 precision. A hybrid ML pipeline (ANNz boosted decision tree plus scikit-learn decision tree regressor) then produces photometric redshifts, yielding a catalog of 872,372 objects asserted to be reliable for cosmology at z ≈ 4 and a cleaned subset of 675,683 objects for large-scale structure studies in the range 0 < z < 3.
Significance. A large, photometrically classified quasar sample extending to z≈4 would be a useful resource for cosmological analyses if the quoted performance metrics generalize beyond the training set. The work demonstrates a practical application of standard ML tools to a wide-field survey.
major comments (2)
- [Abstract] The central claim that the 872,372-object catalog is 'reliable for cosmological applications at z≈4' rests on the untested assumption that the 168,738-object SDSS-DES cross-match is representative of the full DES photometric population; no reweighting, domain-adaptation diagnostics, or magnitude-color distribution comparisons are described to address spectroscopic selection biases that are known to affect high-z quasar recovery.
- [Abstract] The quoted performance (recall 0.77 at 0.99 precision) is given without error bars, cross-validation procedure, or sensitivity analysis to the choice of K or other hyperparameters, so it is impossible to judge whether the metric is robust or whether post-hoc tuning has occurred.
minor comments (2)
- [Abstract] The redshift range is described as 'z ≈ 0.5 to z > 3, with a distinct population recovered at z ≈ 4'; provide the precise redshift bounds of the final catalog and the criterion used to identify the z≈4 population.
- [Abstract] Clarify whether the 'stacked outlier classifier' is applied before or after the hybrid photo-z step and how it affects the final sample sizes.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and limitations of our work. We respond to each major comment below and indicate revisions to the manuscript.
- Referee: [Abstract] The central claim that the 872,372-object catalog is 'reliable for cosmological applications at z≈4' rests on the untested assumption that the 168,738-object SDSS-DES cross-match is representative of the full DES photometric population; no reweighting, domain-adaptation diagnostics, or magnitude-color distribution comparisons are described to address spectroscopic selection biases that are known to affect high-z quasar recovery.
  Authors: We agree that explicit checks for representativeness are needed to support the reliability claim. The manuscript uses the SDSS-DES cross-match as the largest available spectroscopic anchor for DES DR2, but we will revise the abstract and add a new subsection in the methods describing magnitude and color distribution comparisons between the training sample and the full DES point-like photometric population. We will also outline a magnitude-based reweighting scheme and note its limitations for high-z selection biases.
  Revision: yes
- Referee: [Abstract] The quoted performance (recall 0.77 at 0.99 precision) is given without error bars, cross-validation procedure, or sensitivity analysis to the choice of K or other hyperparameters, so it is impossible to judge whether the metric is robust or whether post-hoc tuning has occurred.
  Authors: The metrics were computed on a held-out test set after 5-fold cross-validation for hyperparameter tuning on the training portion. We will revise the abstract and methods to report bootstrap error bars on the recall and precision, explicitly describe the cross-validation folds, and include a sensitivity plot showing performance stability for K between 3 and 15. This will confirm that the reported values reflect validated choices rather than post-hoc adjustment.
  Revision: yes
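The bootstrap error bars promised in the second response could be obtained along the following lines. This is a sketch under stated assumptions, not the authors' procedure: it uses a percentile bootstrap over a held-out test set, the confusion counts are invented to roughly mirror the quoted metrics, and the function names are hypothetical. It does not cover the cross-validation or the K sensitivity sweep.

```python
import random

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = quasar)."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

def bootstrap_ci(y_true, y_pred, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for (precision, recall), resampling
    test objects with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(precision_recall([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx]))

    def percentile_pair(values):
        values = sorted(v for v in values if v == v)  # drop NaN resamples
        lo = values[int((alpha / 2) * len(values))]
        hi = values[max(int((1 - alpha / 2) * len(values)) - 1, 0)]
        return lo, hi

    return (percentile_pair([s[0] for s in stats]),
            percentile_pair([s[1] for s in stats]))

# Invented test-set counts: 770 true positives, 230 false negatives,
# 8 false positives, 992 true negatives.
y_true = [1] * 770 + [1] * 230 + [0] * 8 + [0] * 992
y_pred = [1] * 770 + [0] * 230 + [1] * 8 + [0] * 992

print(precision_recall(y_true, y_pred))
print(bootstrap_ci(y_true, y_pred))
```

Because precision depends on a small number of false positives, its interval is driven by very few objects, which is one reason the referee's request for error bars matters at a 0.99 operating point.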
Circularity Check
No significant circularity in empirical ML classification and photo-z pipeline
The paper applies off-the-shelf KNN and hybrid ML (ANNz BDT + scikit-learn regressor) to a cross-matched DES-SDSS training set of 168,738 objects, then reports recall/precision and produces a catalog of 872,372 objects. All performance numbers are computed directly against external spectroscopic labels; no functional form is fitted and then re-used as a 'prediction', no self-citation supplies a load-bearing uniqueness theorem, and no ansatz or renaming occurs. The representativeness of the training sample is an empirical assumption whose validity can be tested externally, but it does not make the reported metrics tautological by construction. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- K in KNN
- hyperparameters of boosted decision tree and regressor
axioms (1)
- domain assumption: The cross-matched DES-SDSS sample is free of significant selection bias relative to the full photometric population.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "Using a K-Nearest Neighbors (KNN) algorithm with PSF magnitudes in the g, r, i, and z bands, we achieved high-precision quasar/galaxy classification against stellar contaminants, reaching a recall of 0.77 at 0.99 precision."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "The full photometric redshift sample contains 872,372 objects and remains reliable for cosmological applications at z≈4."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.