Feature importance for machine learning redshifts applied to SDSS galaxies

Ben Hoyle , Markus Michael Rau , Roman Zitlau , Stella Seitz , Jochen Weller

classification astro-ph.IM astro-ph.CO

keywords learning, machine, redshift, redshifts, photometric, catastrophic, decreases, features

We present an analysis of feature importance selection applied to photometric redshift estimation using the machine learning architecture Decision Trees with the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85 easily measured (or derived) photometric quantities (or 'features') and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNN) to show that the addition of these features, in combination with the standard magnitudes and colours, improves the machine learning redshift estimate by 18% and decreases the catastrophic outlier rate by 32%. We further compare the redshift estimate using RDF with those from two different aNNs, and with photometric redshifts available from the SDSS. We find that the RDF requires orders of magnitude less computation time than the aNNs to obtain a machine learning redshift, while reducing the catastrophic outlier rate by up to 43% and the redshift error by up to 25%. When compared to the SDSS photometric redshifts, the RDF machine learning redshifts decrease both the standard deviation of residuals scaled by 1/(1+z), by 36% from 0.066 to 0.041, and the fraction of catastrophic outliers, by 57% from 2.32% to 0.99%.
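The two headline metrics in the abstract, the standard deviation of residuals scaled by 1/(1+z) and the catastrophic outlier fraction, can be illustrated with a minimal Python sketch. The function names, the toy redshift values, and in particular the outlier threshold `outlier_cut=0.15` are assumptions made for illustration; the abstract does not state how the paper defines a catastrophic outlier.

```python
import statistics


def scaled_residuals(z_phot, z_spec):
    """Return (z_phot - z_spec) / (1 + z_spec) for each galaxy."""
    return [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]


def redshift_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (sigma, outlier_fraction) for a set of redshift estimates.

    outlier_cut is an ASSUMED threshold on |scaled residual|; it is not
    taken from the abstract and may differ from the paper's definition.
    """
    res = scaled_residuals(z_phot, z_spec)
    sigma = statistics.pstdev(res)  # population standard deviation
    outlier_fraction = sum(abs(r) > outlier_cut for r in res) / len(res)
    return sigma, outlier_fraction


# Hypothetical toy data: four galaxies, the last one badly mis-estimated.
z_spec = [0.10, 0.20, 0.30, 0.40]
z_phot = [0.11, 0.19, 0.30, 0.82]

sigma, f_out = redshift_metrics(z_phot, z_spec)
print(f"sigma = {sigma:.3f}, outlier fraction = {f_out:.2%}")
```

With these toy values only the last galaxy exceeds the assumed threshold, so one of the four counts as a catastrophic outlier.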

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Machine Learning Techniques for Astrophysics and Cosmology: Photometric Redshifts
astro-ph.IM 2026-05 unverdicted novelty 3.0

AI techniques for photometric redshift estimation have converged and are now limited by the size, systematics, and selection effects in spectroscopic training samples rather than by methodology.