arxiv: 2603.06907 · v1 · submitted 2026-03-06 · 🌌 astro-ph.CO

Recognition: 1 theorem link · Lean Theorem

Estimating the completeness of the QUBRICS Survey with 3501 QSO redshifts from Gaia DR3 spectra

Matteo Porru, Stefano Cristiani, Francesco Guarneri, Giorgio Calderone, Andrea Grazian, Konstantina Boutsia, Andrea Trost, Valentina D'Odorico, Guido Cupani, Catarina M.J. Marques, Francesco Chiti Tegli, Fabio Fontanot


Pith reviewed 2026-05-15 14:23 UTC · model grok-4.3

classification 🌌 astro-ph.CO

keywords: QSO · quasars · completeness · Gaia DR3 · XGB · PRF · redshift · southern sky survey


The pith

QUBRICS's XGB selection recovers 89 percent of eligible high-redshift quasars in an independent Gaia DR3 sample.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the completeness of the QUBRICS survey methods for finding bright high-redshift quasars in the southern sky. It does this by comparing them with an independent set of 3501 quasars drawn from Gaia DR3 low-resolution spectra. The Gaia objects are cross-matched against the datasets used by the XGB (eXtreme Gradient Boosting) and PRF (Probabilistic Random Forest) selection algorithms, and the paper counts how many of the unclassified z>2.5 quasars were correctly flagged as candidates. The XGB method recovers 89 percent of the eligible objects and the PRF method recovers 66 percent. Together they yield an overall completeness of 82 percent for spectroscopically confirmed quasars. Accurate completeness figures matter because they let researchers correct for selection effects when using the survey to study cosmic structure and evolution. The work also supplies reliable redshifts for 1223 additional quasars with a median redshift of 2.1.

Core claim

By cross-matching 3501 QSOs from Gaia DR3 low-resolution spectra with the QUBRICS selection datasets, the XGB method correctly identified 136 out of 152 unclassified z>2.5 QSOs as candidates (89 percent recall) while the PRF method identified 46 out of 69 (66 percent recall). These results confirm the high efficiency of the QUBRICS selection methods and supply a completeness estimate of 82 percent for spectroscopically confirmed QSOs.
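The headline fractions are simple ratios of the quoted counts. The Python sketch below recomputes them and adds a Wilson score interval. The interval is an illustrative addition; it is not the paper's own uncertainty treatment, which is not described here. The sketch also shows that pooling the two tests, (136 + 46) / (152 + 69) = 182/221, reproduces the quoted 82 percent. Whether the paper combines the counts exactly this way is an assumption. Pooling also treats the XGB and PRF samples as separate, even though their footprints may overlap.

```python
import math


def recall(recovered: int, eligible: int) -> float:
    """Fraction of eligible (unclassified, z>2.5) QSOs flagged as candidates."""
    return recovered / eligible


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts quoted in the abstract: (recovered as candidates, eligible unclassified z>2.5 QSOs)
counts = {"XGB": (136, 152), "PRF": (46, 69)}

for method, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{method}: recall = {recall(k, n):.3f}, 95% CI [{low:.3f}, {high:.3f}]")

# Pooling both tests (assumes the two samples do not share objects).
pooled_k = sum(k for k, _ in counts.values())
pooled_n = sum(n for _, n in counts.values())
print(f"pooled: {pooled_k}/{pooled_n} = {recall(pooled_k, pooled_n):.3f}")
```

With only 69 PRF objects, the PRF interval is noticeably wider than the XGB one. This is one reason to quote the sample sizes alongside the percentages.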

What carries the argument

Cross-matching of Gaia DR3 QSO spectra against the XGB and PRF candidate datasets to measure recall for unclassified high-redshift objects.
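The source does not state the matching radius or the algorithm used for the cross-match. As a minimal sketch, the positional step can be written as a nearest-neighbour match on the sky within a tolerance radius. The 1-arcsec radius and the toy coordinates below are hypothetical. Brute force keeps the sketch dependency-free; a real pipeline would more likely use a spatial index or a library routine.

```python
import math


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine) between two RA/Dec positions in degrees, in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600


def crossmatch(reference, catalogue, radius_arcsec=1.0):
    """For each reference source, the index of the nearest catalogue source within the radius, else None."""
    matches = []
    for ra, dec in reference:
        best, best_sep = None, radius_arcsec
        for j, (ra_c, dec_c) in enumerate(catalogue):
            sep = ang_sep_arcsec(ra, dec, ra_c, dec_c)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches


# Hypothetical positions: two reference QSOs and two selection-dataset entries.
gaia = [(10.0, -30.0), (150.0, -45.0)]
selection = [(150.0 + 0.5 / 3600, -45.0),  # about 0.35 arcsec from gaia[1]
             (10.0, -30.0 + 5 / 3600)]     # 5 arcsec from gaia[0], outside the radius
print(crossmatch(gaia, selection))
```

Objects without a match fall outside the selection dataset's footprint. Only the matched, unclassified z>2.5 objects enter the recall test.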

Load-bearing premise

The Gaia DR3 low-resolution spectra supply an unbiased independent sample of true QSOs whose redshifts and classifications are accurate enough that missed objects reflect only QUBRICS incompleteness.

What would settle it

A large population of spectroscopically confirmed z>2.5 QSOs lying inside the QUBRICS footprint but absent from both the XGB and PRF candidate lists would lower the reported completeness below 82 percent.

Original abstract

QSOs are essential for investigating the structure and evolution of the Universe. Historically, their identification has been concentrated in the northern hemisphere, primarily due to the sky coverage of major astronomical surveys. The QUBRICS survey, started in 2019 to address this asymmetry, has identified more than 1300 new bright (i<19.5) high-redshift (2.5<z<6) QSOs in the southern sky. We aim to quantify, using an independent QSO sample, the completeness and recall of the QUBRICS QSO selection methods, based on XGB (eXtreme Gradient Boosting) and PRF (Probabilistic Random Forest), since completeness is a fundamental metric for ensuring the statistical robustness of QSO-based cosmological investigations. A subset of Gaia DR3 sources with low-resolution spectra was analyzed, obtaining a sample of 3501 QSOs. To determine how many QSOs were correctly identified as candidates, we cross-matched this independent sample with the datasets used for selection: 894 QSOs with z>2.5 fell within the XGB dataset footprint, of which 152 were unclassified and thus eligible for completeness testing. Similarly, 675 QSOs with z>2.5 were within the PRF dataset footprint, including 69 unclassified objects. The XGB correctly identified as candidates 136 (89%) of the 152 QSOs with z>2.5 present in its dataset as unclassified objects. The PRF correctly identified as candidates 46 (66%) of the 69 QSOs with z>2.5 present in its dataset as unclassified objects. These findings confirm the high efficiency of the QUBRICS selection methods (recall=89%) and provide the completeness estimate for spectroscopically confirmed QSOs (82%), necessary for cosmological studies using QUBRICS data. This work also provides reliable redshifts for 1223 new QSOs (median redshift z=2.1 and magnitude G=17.8), which will help improve the performance of future selections.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a straightforward 82% completeness number for QUBRICS via Gaia DR3 cross-match, but the reference sample's redshift accuracy at z>2.5 is the untested weak point.


The core result is that QUBRICS XGB selection caught 89% of the 152 unclassified z>2.5 Gaia QSOs in its footprint and PRF caught 66% of its 69, for an overall 82% completeness on spectroscopically confirmed objects. They also release 1223 new Gaia-based redshifts. The completeness figure is new and directly useful for anyone doing high-z structure work with southern quasars, since it quantifies one large systematic that matters when combining with northern samples. The method itself is standard cross-matching against an independent catalog, but the specific fractions and the added redshift catalog are the concrete advance here.

The reference set comes from Gaia DR3 BP/RP spectra that were not part of the original training, so there is no obvious circularity in the recall calculation. The counts are reported plainly from the cross-matches, which is transparent. The soft spot is the assumption that the low-resolution Gaia spectra give clean z>2.5 QSO labels with negligible errors. The BP/RP data have R~20-100, so at these redshifts template fits can easily shift by Δz>0.05 or misclassify a galaxy or star as a quasar. If even 10-15% of the 152 objects have wrong redshifts or types, the 89% recall and the 82% completeness are correspondingly less certain. The abstract gives no overlap check against SDSS or DESI for this high-z slice, so that uncertainty is not quantified.

The paper is aimed at people already working with QUBRICS or planning similar photometric selections in the south. It is incremental rather than methodologically novel, but the completeness figure is the kind of practical number that matters for downstream cosmology. It deserves peer review because the data product and the basic calculation are solid enough to be worth referee time, even if the Gaia validation needs tightening.
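The letter's concern about contaminated reference labels can be partly bounded with arithmetic. The sketch below assumes that k of the 152 eligible objects are not genuine z>2.5 QSOs and are simply dropped from the test. The true recall then lies between two cases: all k had been counted as recovered, or all k had been among the misses. It ignores the opposite error, genuine z>2.5 QSOs that are missing from the reference set or placed below the cut, so it is only a partial bound.

```python
def recall_bounds(recovered: int, eligible: int, spurious: int) -> tuple[float, float]:
    """Bounds on recall after dropping `spurious` reference objects of unknown recovery status.

    Lower bound: every spurious object had been counted as recovered.
    Upper bound: every spurious object had been among the misses.
    """
    if not 0 <= spurious < eligible:
        raise ValueError("need 0 <= spurious < eligible")
    n = eligible - spurious
    low = max(0, recovered - spurious) / n
    high = min(recovered, n) / n
    return low, high


# XGB counts from the abstract, with 10% and 15% hypothetical contamination.
for frac in (0.10, 0.15):
    k = round(frac * 152)
    low, high = recall_bounds(136, 152, k)
    print(f"{k} spurious of 152: recall in [{low:.3f}, {high:.3f}]")
```

Under this removal-only model the lower bound moves only modestly. Larger shifts would have to come from the opposite error, which is why an external overlap check such as SDSS or DESI is the more informative test.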

Referee Report

2 major / 2 minor

Summary. The manuscript estimates the completeness of the QUBRICS QSO selection (XGB and PRF methods) using an independent reference sample of 3501 QSOs extracted from Gaia DR3 low-resolution BP/RP spectra. Cross-matching yields 152 unclassified z>2.5 QSOs in the XGB footprint (136 recovered as candidates, 89% recall) and 69 in the PRF footprint (46 recovered, 66% recall), from which an overall completeness of 82% for spectroscopically confirmed QSOs is derived; the work also reports 1223 new QSO redshifts.

Significance. If the Gaia DR3 sample is an unbiased and accurate reference, the reported recall and completeness figures directly support the statistical reliability of QUBRICS-based cosmological analyses at 2.5<z<6. The additional release of 1223 new redshifts (median z=2.1) is a concrete community resource that can be used to refine future selection algorithms.

major comments (2)

[Abstract] Abstract and cross-match description: the 89% recall (136/152) and 82% completeness rest on treating all 3501 Gaia DR3 objects as true QSOs with accurate z>2.5 labels; no error budget or external validation (e.g., overlap with SDSS or DESI) is supplied for the high-redshift BP/RP subsample, so non-matches cannot be attributed solely to QUBRICS incompleteness.
[Cross-matching procedure] Cross-matching section: the assignment of 'unclassified' status to the 152 and 69 objects is not accompanied by any quantification of Gaia classification or redshift errors (typical Δz>0.05–0.1 at z>2.5), which directly affects the load-bearing claim that the observed fractions measure selection completeness.

minor comments (2)

The abstract states that the Gaia sample provides 'reliable redshifts' for 1223 new QSOs but does not specify the redshift quality cuts or success rate of the template fitting.
A summary table listing the footprint overlaps, unclassified counts, and recovered fractions for both XGB and PRF would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's insightful comments, which have helped us improve the presentation of our completeness analysis. We respond to each major comment below and have updated the manuscript to address the concerns regarding the validation of the Gaia DR3 reference sample.


Referee: [Abstract] Abstract and cross-match description: the 89% recall (136/152) and 82% completeness rest on treating all 3501 Gaia DR3 objects as true QSOs with accurate z>2.5 labels; no error budget or external validation (e.g., overlap with SDSS or DESI) is supplied for the high-redshift BP/RP subsample, so non-matches cannot be attributed solely to QUBRICS incompleteness.

Authors: We thank the referee for highlighting this important point. The Gaia DR3 BP/RP spectra provide a large, independent sample, but we recognize the need for validation of the high-redshift classifications. In the revised manuscript, we have expanded the cross-match description in the abstract and added a dedicated paragraph in the methods section that includes an external validation using overlap with the SDSS QSO catalog. This shows that 92% of the high-z Gaia DR3 QSOs have spectroscopic redshifts consistent with the BP/RP estimates within Δz < 0.1. We also provide an error budget estimating that potential misclassifications contribute less than 8% uncertainty to the completeness figures, allowing us to conclude that the non-matches are predominantly due to QUBRICS incompleteness. revision: yes
Referee: [Cross-matching procedure] Cross-matching section: the assignment of 'unclassified' status to the 152 and 69 objects is not accompanied by any quantification of Gaia classification or redshift errors (typical Δz>0.05–0.1 at z>2.5), which directly affects the load-bearing claim that the observed fractions measure selection completeness.

Authors: We agree that quantifying the Gaia errors is essential for interpreting the recall as a measure of completeness. We have revised the cross-matching section to include a detailed discussion of the redshift uncertainties in the BP/RP spectra, citing typical values from the Gaia documentation and literature (Δz ~ 0.07 at z>2.5). We performed a Monte Carlo simulation to assess the impact, finding that the reported recall values change by at most 4% when accounting for redshift errors. Additionally, we clarify that 'unclassified' refers to objects not present in our spectroscopic training sets, and we have added a table summarizing the potential error contributions. This supports our claim that the fractions primarily reflect the selection completeness. revision: yes
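The two checks described in the simulated responses can be sketched in a few lines. The first is the fraction of overlapping objects whose BP/RP redshift agrees with an external spectroscopic redshift within a tolerance. The second is a Monte Carlo that perturbs each reference redshift with Gaussian noise of width σ, re-applies the z>2.5 cut, and recomputes recall. The |Δz| < 0.1 tolerance and σ = 0.07 come from the simulated rebuttal, not from the paper. All redshift values and the (redshift, recovered) sample are invented placeholders, not Gaia or SDSS data. Modelling the error as symmetric Gaussian noise is itself an assumption.

```python
import random
import statistics


def agreement_fraction(z_bprp, z_spec, tol=0.1, normalised=False):
    """Fraction of matched pairs with |z_bprp - z_spec| < tol (optionally divided by 1 + z_spec)."""
    if len(z_bprp) != len(z_spec) or not z_spec:
        raise ValueError("need two non-empty sequences of equal length")
    good = 0
    for zb, zs in zip(z_bprp, z_spec):
        dz = abs(zb - zs)
        if normalised:
            dz /= 1 + zs
        if dz < tol:
            good += 1
    return good / len(z_spec)


def recall_above(objects, z_cut=2.5):
    """Recall among objects with z > z_cut; each object is a (z, recovered_flag) pair."""
    eligible = [rec for z, rec in objects if z > z_cut]
    return sum(eligible) / len(eligible) if eligible else float("nan")


def mc_recall(objects, sigma=0.07, n_draws=1000, z_cut=2.5, seed=42):
    """Mean and standard deviation of recall when redshifts are perturbed by N(0, sigma)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbed = [(z + rng.gauss(0.0, sigma), rec) for z, rec in objects]
        draws.append(recall_above(perturbed, z_cut))
    return statistics.mean(draws), statistics.stdev(draws)


# Placeholder redshift pairs for the external-validation check.
z_bprp = [2.61, 3.05, 2.90, 4.10, 2.55]
z_spec = [2.63, 3.01, 2.70, 4.12, 2.57]
print("agreement within |dz| < 0.1:", agreement_fraction(z_bprp, z_spec))

# Hypothetical stand-in sample straddling the cut, with roughly 89% recovery.
rng = random.Random(0)
sample = [(rng.uniform(2.3, 3.5), rng.random() < 0.89) for _ in range(200)]
print("nominal recall:", round(recall_above(sample), 3))
mean_r, sd_r = mc_recall(sample)
print(f"MC recall: {mean_r:.3f} +/- {sd_r:.3f}")
```

The stand-in sample deliberately includes objects just below z = 2.5, so that scatter can move sources both into and out of the eligible set.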

Circularity Check

0 steps flagged

Completeness estimate uses independent Gaia DR3 reference sample with no self-referential derivation

full rationale

The paper derives the recall (89% for XGB, 66% for PRF) and completeness (82%) by cross-matching an external Gaia DR3 QSO sample (3501 objects from low-resolution spectra) against the XGB and PRF selection datasets. The 152 and 69 unclassified objects are evaluated for whether the classifiers flagged them as candidates. Since the reference catalog is drawn from Gaia DR3, which was not used in training or defining the XGB/PRF models, and no self-citations or fitted inputs are invoked to justify the numbers, the derivation is independent. No steps reduce by construction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on the assumption that Gaia DR3 provides reliable independent QSO identifications and that cross-matching footprints are accurately defined; no free parameters are fitted in the completeness calculation itself.

axioms (1)

Domain assumption: Gaia DR3 low-resolution spectra yield accurate QSO redshifts and classifications for objects brighter than the survey limit.
Invoked when treating the 3501 objects as ground truth for the completeness test

pith-pipeline@v0.9.0 · 5748 in / 1390 out tokens · 60157 ms · 2026-05-15T14:23:17.988243+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

The XGB correctly identified as candidates 136 (89%) of the 152 QSOs with z>2.5 present in its dataset as unclassified objects. The PRF correctly identified as candidates 46 (66%) of the 69 QSOs with z>2.5 present in its dataset as unclassified objects.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Persephone's Torch: A 15th Magnitude Quadruply-Lensed Quasar From the Couch Discovered with SPHEREx and the LBT
astro-ph.GA · 2026-04 · accept · novelty 7.0

Spectroscopic and imaging confirmation of the brightest known quadruply-lensed quasar J1330-0905 at z=2.22 with Einstein radius ~0.45 arcsec and predicted magnification ~56.