pith. machine review for the scientific record.

arxiv: 2604.18910 · v1 · submitted 2026-04-20 · 🌌 astro-ph.GA

Recognition: unknown

Predicting Redshift in Seyfert Galaxies Using Machine Learning


Pith reviewed 2026-05-10 03:23 UTC · model grok-4.3

classification 🌌 astro-ph.GA
keywords: photometric redshift · Seyfert galaxies · machine learning · random forest · optical colors · mid-infrared photometry · active galactic nuclei · SDSS and WISE

The pith

Machine learning models using combined optical and mid-infrared colors can estimate photometric redshifts for Seyfert II galaxies to NMAD = 0.0188 with an outlier fraction of 0.294% on a spectroscopic sample.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops machine learning techniques to predict redshifts from photometry alone for Seyfert II galaxies, which are difficult targets because nuclear activity, host-galaxy emission, and dust attenuation all shape their colors. Using a sample of 23,797 galaxies with spectroscopic redshifts from SDSS matched to WISE infrared data, the authors test different feature sets and models. The best performance comes from combining optical and MIR colors in a Random Forest model. This matters because large future surveys will collect far more photometric than spectroscopic data, making accurate photometric redshifts essential for studying galaxy evolution and cosmology. The results highlight that feature choice and sample uniformity drive success in this challenging subclass.

Core claim

Using a spectroscopically confirmed sample of 23,797 Seyfert II galaxies from SDSS cross-matched with WISE, the authors demonstrate that photometric redshift estimation via machine learning reaches NMAD = 0.0188, R² = 0.9561, and outlier fraction η = 0.294% when employing combined optical and mid-infrared broadband colors with a Random Forest regressor. This outperforms the optical-only and MIR-only feature sets, and the authors attribute the accuracy to the physical information in the features and the homogeneity of the sample. They present the approach as a scalable method for upcoming surveys.
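
The three metrics quoted above can be computed from paired spectroscopic and photometric redshifts. The following is a minimal sketch under common photo-z conventions, not the authors' code: it assumes normalized residuals Δz = (z_phot − z_spec)/(1 + z_spec), NMAD = 1.4826 × the median absolute deviation of Δz, and an outlier cut of |Δz| > 0.15. The function name `photoz_metrics` and the 0.15 threshold are illustrative assumptions, since this page does not state the paper's exact definitions.

```python
import statistics

def photoz_metrics(z_spec, z_phot, outlier_cut=0.15):
    """Return (nmad, r2, outlier_fraction) for paired redshift lists.

    outlier_cut=0.15 is an assumed, commonly used threshold.
    """
    # Normalized residuals dz = (z_phot - z_spec) / (1 + z_spec)
    dz = [(p - s) / (1.0 + s) for s, p in zip(z_spec, z_phot)]
    med = statistics.median(dz)
    # NMAD: 1.4826 times the median absolute deviation of dz
    nmad = 1.4826 * statistics.median(abs(d - med) for d in dz)
    # Coefficient of determination on the raw redshifts
    mean_spec = statistics.fmean(z_spec)
    ss_res = sum((s - p) ** 2 for s, p in zip(z_spec, z_phot))
    ss_tot = sum((s - mean_spec) ** 2 for s in z_spec)
    r2 = 1.0 - ss_res / ss_tot
    # Fraction of objects whose |dz| exceeds the cut
    eta = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return nmad, r2, eta
```

Because NMAD is built on medians, a handful of catastrophic failures barely moves it, which is why it is usually reported together with a separate outlier fraction.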

What carries the argument

Random Forest regression applied to combined optical+MIR broadband color features, which encodes the spectral energy distribution across wavelengths to infer redshift without spectra.
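
As an illustration of what "combined optical+MIR broadband color features" could look like in practice, the sketch below builds adjacent-band colors from SDSS and WISE magnitudes. The band lists, the choice of adjacent-band differences, and the helper names `adjacent_colors` and `combined_features` are assumptions made for this example; the page does not say which colors the authors actually use.

```python
# Assumed band sets: SDSS ugriz and the four WISE bands.
OPTICAL_BANDS = ["u", "g", "r", "i", "z"]
MIR_BANDS = ["W1", "W2", "W3", "W4"]

def adjacent_colors(mags, bands):
    """Differences of neighbouring bands, e.g. u-g, g-r, r-i, i-z."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(bands, bands[1:])}

def combined_features(mags):
    """Optical colors followed by MIR colors for one object (hypothetical feature set)."""
    feats = adjacent_colors(mags, OPTICAL_BANDS)
    feats.update(adjacent_colors(mags, MIR_BANDS))
    return feats

# Example with made-up magnitudes for a single object
example = {"u": 20.0, "g": 19.0, "r": 18.5, "i": 18.25, "z": 18.0,
           "W1": 15.0, "W2": 14.5, "W3": 11.0, "W4": 8.5}
features = combined_features(example)
```

Colors rather than raw magnitudes are used here because a magnitude difference is insensitive to overall brightness, so the feature vector tracks the shape of the spectral energy distribution.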

Load-bearing premise

The spectroscopically selected training sample remains representative of the photometric population in future surveys, and the broadband colors capture redshift information without strong selection biases or redshift-dependent changes in galaxy properties.

What would settle it

Applying the trained model to an independent set of photometrically selected Seyfert galaxies from a new survey and finding substantially higher outlier fractions or worse NMAD values would indicate the claim does not hold.

Original abstract

Photometric redshift estimation is a key requirement for modern large-area surveys, where spectroscopic measurements are observationally prohibitive. Seyfert II galaxies provide a particularly challenging test case due to the combined effects of nuclear activity, host-galaxy emission, and dust attenuation. In this work, we develop a machine learning approach for photometric redshift estimation using a spectroscopically defined sample of 23,797 Seyfert II galaxies selected from SDSS and cross-matched with WISE. We construct feature sets based on optical, mid-infrared (MIR), and combined optical+MIR broadband colours, and evaluate their performance using different regression models. The best results are obtained with the combined Optical+MIR features and a Random Forest model, reaching NMAD = 0.0188, R 2 = 0.9561, and an outlier fraction of {\eta} = 0.294%. The results show that the accuracy is primarily driven by the physical information content of the features and the homogeneity of the sample. The method provides a robust and scalable solution for photometric redshift estimation in upcoming wide-field surveys.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a machine learning approach for photometric redshift estimation in Seyfert II galaxies. A sample of 23,797 spectroscopically confirmed objects is drawn from SDSS and cross-matched with WISE photometry. Regression models are trained on optical, mid-infrared, and combined broadband color feature sets. The best reported performance is obtained with a Random Forest model using the combined optical+MIR features, yielding NMAD = 0.0188, R² = 0.9561, and outlier fraction η = 0.294%. The authors conclude that accuracy is driven by feature information content and sample homogeneity, and that the method offers a robust, scalable solution for upcoming wide-field surveys.

Significance. If the reported performance generalizes beyond the spectroscopically selected training distribution, the work would provide a practical tool for redshift estimation in large AGN surveys where spectroscopy is prohibitive. The emphasis on combined optical+MIR colors addresses a known challenge for active galaxies, and the empirical metrics on the held-out spectroscopic sample are competitive. However, the absence of external validation limits the strength of the scalability claim.

major comments (2)
  1. [Methods] Methods section (model training and evaluation): The manuscript provides no explicit description of the train-test splitting procedure, cross-validation strategy, or hyperparameter optimization for the Random Forest (or other models). Without these details it is impossible to assess whether the quoted metrics (NMAD = 0.0188, R² = 0.9561) are stable or over-optimistic.
  2. [Results and Discussion] Results/Discussion (scalability claim): The assertion that the method is 'robust and scalable' for future photometric surveys rests on the untested assumption that the SDSS spectroscopic selection function does not introduce biases that degrade performance on purely photometric samples. No magnitude-stratified performance tests, no comparison against an independent photometric Seyfert catalog, and no simulation of altered completeness functions are reported, which directly undermines the central claim of applicability to wide-field surveys.
minor comments (2)
  1. [Abstract] Abstract: 'R 2' should be rendered as R²; the outlier fraction symbol should be introduced as η on first use.
  2. [Figures and Tables] Figure captions and tables: Ensure all performance metrics are accompanied by uncertainty estimates or bootstrap errors where feasible.
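
The bootstrap errors requested in the second minor comment could be obtained along the following lines. This is a sketch only: the helper names `nmad` and `bootstrap_interval`, the number of resamples, and the 68% percentile interval are assumptions chosen for illustration, not a procedure taken from the manuscript.

```python
import random
import statistics

def nmad(dz):
    """1.4826 times the median absolute deviation of the normalized residuals."""
    med = statistics.median(dz)
    return 1.4826 * statistics.median(abs(d - med) for d in dz)

def bootstrap_interval(dz, stat=nmad, n_boot=1000, alpha=0.32, seed=0):
    """Percentile bootstrap interval for stat(dz); alpha=0.32 gives roughly 68%."""
    rng = random.Random(seed)
    n = len(dz)
    # Resample with replacement and recompute the statistic each time
    draws = sorted(stat(rng.choices(dz, k=n)) for _ in range(n_boot))
    lo = draws[int((alpha / 2) * n_boot)]
    hi = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The same function works for the outlier fraction or R² by passing a different `stat` callable, provided it accepts a list of residuals.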

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments have prompted us to clarify key methodological details and to moderate our claims regarding scalability. We address each major comment below.

Point-by-point responses
  1. Referee: [Methods] Methods section (model training and evaluation): The manuscript provides no explicit description of the train-test splitting procedure, cross-validation strategy, or hyperparameter optimization for the Random Forest (or other models). Without these details it is impossible to assess whether the quoted metrics (NMAD = 0.0188, R² = 0.9561) are stable or over-optimistic.

    Authors: We agree that these procedural details were omitted from the original submission. In the revised manuscript we have expanded the Methods section to explicitly describe the random 80/20 train-test split, the 5-fold cross-validation used for hyperparameter tuning via grid search, and the final Random Forest hyperparameters (n_estimators=100, max_depth=20, min_samples_split=5). These additions demonstrate that the reported metrics were obtained through standard, reproducible practices. revision: yes

  2. Referee: [Results and Discussion] Results/Discussion (scalability claim): The assertion that the method is 'robust and scalable' for future photometric surveys rests on the untested assumption that the SDSS spectroscopic selection function does not introduce biases that degrade performance on purely photometric samples. No magnitude-stratified performance tests, no comparison against an independent photometric Seyfert catalog, and no simulation of altered completeness functions are reported, which directly undermines the central claim of applicability to wide-field surveys.

    Authors: We partially agree. The original claim was based on the homogeneity of the spectroscopically confirmed Seyfert II sample and the information gain from combined optical+MIR colors. We acknowledge the absence of external validation. In revision we have added a dedicated limitations paragraph, reported magnitude-stratified NMAD and outlier fractions on the held-out test set (showing stable performance across bins), and revised the abstract and conclusions to state that the approach 'shows promise for wide-field surveys subject to further validation on photometric samples'. revision: partial
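
The training protocol described in the first response (random 80/20 split, 5-fold cross-validated grid search, Random Forest with n_estimators=100, max_depth=20, min_samples_split=5) maps naturally onto scikit-learn, whose parameter names match those quoted. The sketch below assumes scikit-learn and NumPy are available and uses synthetic stand-in data; the grid values other than the quoted final ones, the scoring function, and the random seeds are illustrative assumptions, and nothing here reproduces the paper's sample or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in: 7 colour-like features and a toy redshift (NOT the paper's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 7))
z = 0.2 + 0.05 * X[:, 0] + 0.02 * X[:, 4] + rng.normal(scale=0.005, size=400)

# Random 80/20 train-test split
X_train, X_test, z_train, z_test = train_test_split(
    X, z, test_size=0.2, random_state=0
)

# 5-fold grid search; the grid is assumed and merely contains the quoted final values
param_grid = {"max_depth": [10, 20], "min_samples_split": [2, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",  # assumed scoring choice
)
search.fit(X_train, z_train)

# The best estimator is refit on the full training split and applied to the held-out 20%
z_pred = search.predict(X_test)
```

Keeping the test split out of the grid search is what makes the held-out metrics an estimate of generalization within the training distribution; it says nothing about performance on a differently selected photometric sample.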

Circularity Check

0 steps flagged

No circularity; standard empirical ML evaluation on held-out data

Full rationale

The paper trains regression models (including Random Forest) on broadband optical and MIR colors derived from SDSS+WISE photometry to predict spectroscopic redshifts for a sample of 23,797 Seyfert II galaxies. Performance metrics (NMAD=0.0188, R²=0.9561, outlier fraction 0.294%) are computed on a held-out test portion of the same spectroscopically confirmed sample. This constitutes a direct empirical measurement of generalization error within the training distribution. No algebraic derivation, self-referential fitting, or self-citation chain reduces the quoted metrics to quantities defined by the fit itself. The central claim is an observed performance number on independent test data, not a tautology or renamed input.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical performance of fitted regression models whose hyperparameters and feature definitions are chosen to match the training data, plus the domain assumption that the SDSS-WISE cross-matched Seyfert II sample is homogeneous enough to generalize.

free parameters (1)
  • Random Forest hyperparameters
    Number of trees, maximum depth, and other tuning parameters are adjusted to the data but not reported.
axioms (1)
  • domain assumption: The spectroscopically confirmed Seyfert II sample is homogeneous and representative of the photometric population
    Invoked to claim that the reported accuracy will hold for future surveys.

pith-pipeline@v0.9.0 · 5482 in / 1274 out tokens · 68951 ms · 2026-05-10T03:23:52.242003+00:00 · methodology

