The k-MENDEL sample of local analogs to reionization galaxies. Spectral identification of EELGs and properties of green peas in DESI
Pith reviewed 2026-05-10 16:46 UTC · model grok-4.3
The pith
A large DESI sample indicates that extreme emission-line galaxies are transient phases in low-mass galaxy evolution and resemble reionization-era systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The k-MENDEL sample of EELGs, selected by automatic k-means classification of DESI spectra, extends prior samples to higher redshifts and lower metallicities. These galaxies lie above the star-forming main sequence, with sSFRs up to ~100 Gyr^-1, and follow a shallower mass-metallicity relation offset by 0.3-0.5 dex from local relations. They closely resemble high-z JWST galaxies, which supports their interpretation as short-lived, non-equilibrium phases in low-mass galaxy evolution.
What carries the argument
k-means classification applied to DESI spectra to isolate EELGs, followed by SED fitting and temperature-based metallicity measurements.
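To make the selection step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) applied to toy "spectra". It is not the authors' pipeline: the five-pixel vectors, the choice of k = 2, and the function names `sq_dist` and `kmeans` are illustrative assumptions, and the paper's actual feature space is not reproduced here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy, hypothetical data: flat continuum-like vectors versus vectors
# dominated by one very strong "emission line" pixel.
continuum_like = [[1.0 + 0.01 * i, 1.0, 1.0, 1.0, 1.0] for i in range(4)]
line_dominated = [[1.0, 1.0, 20.0 + 0.5 * i, 1.0, 1.0] for i in range(4)]
labels, centroids = kmeans(continuum_like + line_dominated, k=2)
print(labels)
```

On real survey spectra one would more likely use an optimized library implementation; the pure-Python version only shows the assignment and update steps that define the technique.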
If this is right
- Only about 6% of the sample shows AGN-like signatures, while the rest are dominated by intense star formation with high ionization parameters.
- EELGs exhibit large intrinsic metallicity scatter even after accounting for the fundamental metallicity relation, indicating departures from simple bathtub models.
- The sample spans stellar masses from 10^6 to 10^10 solar masses and star formation rates from 0.1 to 100 solar masses per year.
- The most extreme star-forming systems reach O32 ratios of ~5-60, comparable to those of confirmed Lyman-continuum emitters.
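The quantities quoted in this list can be tied together with a little arithmetic. The sketch below assumes the usual definitions sSFR = SFR / M* and O32 = [O III] 5007 / [O II] 3727; the paper's exact O32 definition is not given on this page, so treat that one as an assumption. The function names and input fluxes are hypothetical.

```python
def ssfr_per_gyr(sfr_msun_per_yr, mstar_msun):
    """Specific star formation rate in Gyr^-1 (SFR per unit stellar mass)."""
    return sfr_msun_per_yr * 1e9 / mstar_msun

def mass_doubling_time_myr(ssfr_gyr):
    """Time to double the stellar mass at constant SFR, in Myr."""
    return 1000.0 / ssfr_gyr

def o32(flux_oiii_5007, flux_oii_3727):
    """Assumed definition: [O III] 5007 flux over [O II] 3727 doublet flux."""
    return flux_oiii_5007 / flux_oii_3727

# A galaxy with SFR = 1 Msun/yr and M* = 1e7 Msun sits at the quoted
# upper end of the sSFR range; both input values are illustrative.
example_ssfr = ssfr_per_gyr(1.0, 1e7)
example_doubling = mass_doubling_time_myr(example_ssfr)
# Hypothetical line fluxes in arbitrary but identical units.
example_o32 = o32(30.0, 2.0)
print(example_ssfr, example_doubling, example_o32)
```

An sSFR near 100 Gyr^-1 corresponds to a mass-doubling time of roughly 10 Myr, which is the sense in which such a phase cannot last long.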
Where Pith is reading between the lines
- Similar machine-learning selection methods could be applied to other large spectroscopic surveys to expand the census of such galaxies.
- The non-equilibrium nature suggests that chemical evolution models for low-mass galaxies need to incorporate episodic inflows and outflows more explicitly.
- These local analogs may help calibrate the escape fraction of ionizing photons in simulations of reionization.
- Future observations could test whether the properties evolve smoothly with redshift or show distinct phases.
Load-bearing premise
The k-means algorithm applied to the spectra successfully selects genuine extreme emission-line galaxies with little contamination from other populations.
What would settle it
Detailed follow-up spectroscopy of a subset of the k-MENDEL sample revealing that a large fraction lack the defining extreme emission lines or have properties inconsistent with the reported metallicities and ionization states.
Original abstract
Low-mass galaxies with intense starbursts exhibit spectra dominated by extreme nebular emission and faint stellar continua. These extreme emission-line galaxies (EELGs) are key laboratories to study star formation, feedback, and ionizing photon escape in low-metallicity environments. We exploit the DESI survey to assemble the k-Means of Extreme Nebulae from DEsi outLiers (k-MENDEL), a statistically robust sample of ~16,000 EELGs at 0.01 < z < 0.96 selected via automatic k-means classification. Using SED fitting and Te-based metallicities, we characterize EELGs including "blueberry" and "green pea" galaxies, spanning stellar masses of 10^6-10^10 Msun and SFRs of 0.1-100 Msun/yr. k-MENDEL extends previous SDSS samples toward higher redshifts and lower metallicities (12+log(O/H) ~ 7.0-8.5). EELGs lie systematically above the star-forming main sequence, with sSFRs up to ~100 Gyr^-1. They follow a shallower mass-metallicity relation offset by 0.3-0.5 dex from local relations, closely resembling young galaxies observed with JWST at z > 3-10. The large intrinsic metallicity scatter, even after projecting along the fundamental metallicity relation, indicates strong departures from simple "bathtub" models, suggesting massive inflows of metal-poor gas followed by strong feedback. While ~6% of the sample shows AGN-like signatures, the most extreme star-forming systems reach high ionization (O32 ~ 5-60) comparable to confirmed Lyman-continuum emitters. Our results support the interpretation of EELGs as short-lived, non-equilibrium phases in the evolution of low-mass galaxies and highlight their importance as nearby analogs of galaxies likely driving cosmic reionization (Abridged).
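For readers less used to logarithmic abundance units, the following sketch converts the quoted 12+log(O/H) range and the 0.3-0.5 dex offset into linear factors. The solar reference value of 8.69 (Asplund et al. 2009) is an assumption introduced here, not a number taken from the abstract, and the function names are hypothetical.

```python
# Assumed solar oxygen abundance, 12+log(O/H) = 8.69 (Asplund et al. 2009).
SOLAR_12_LOG_OH = 8.69

def oh_fraction_of_solar(twelve_log_oh, solar=SOLAR_12_LOG_OH):
    """Linear O/H relative to the assumed solar value."""
    return 10 ** (twelve_log_oh - solar)

def dex_to_factor(dex):
    """Convert a logarithmic offset in dex to a multiplicative factor."""
    return 10 ** dex

low = oh_fraction_of_solar(7.0)    # lower end of the quoted range
high = oh_fraction_of_solar(8.5)   # upper end of the quoted range
print(f"{low:.3f} to {high:.3f} of solar")
print(f"offset factor {dex_to_factor(0.3):.2f} to {dex_to_factor(0.5):.2f}")
```

Under that assumption the sample spans roughly 2% to 65% of the solar oxygen abundance, and a 0.3-0.5 dex offset corresponds to a factor of about 2 to 3 in O/H at fixed stellar mass.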
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the k-MENDEL sample of ~16,000 extreme emission-line galaxies (EELGs) at 0.01 < z < 0.96, assembled from DESI spectra via k-means clustering. Using SED fitting and Te-based metallicities, the authors characterize stellar masses (10^6-10^10 Msun), SFRs (0.1-100 Msun/yr), metallicities (12+log(O/H) ~ 7.0-8.5), sSFRs (up to ~100 Gyr^-1), and ionization ratios (O32 ~ 5-60). They report that EELGs lie above the star-forming main sequence, follow a shallower MZR offset by 0.3-0.5 dex from local relations, and show large intrinsic metallicity scatter even after FMR projection. They interpret the population as short-lived, non-equilibrium phases in low-mass galaxy evolution that serve as local analogs to reionization-era galaxies.
Significance. If the automated selection is validated as clean, the large sample size and extension to higher redshift/lower metallicity than prior SDSS work would provide a statistically robust catalog of local EELGs for comparison to JWST high-z observations. The reported MZR offset, high sSFRs, and O32 values comparable to confirmed LyC emitters would strengthen evidence for inflow/feedback-driven non-equilibrium evolution in low-mass systems and their role as reionization analogs. The automated k-means approach on a major survey like DESI offers potential for reproducibility and scalability.
major comments (1)
- [Methods] Methods (k-means classification and sample selection): The manuscript provides no quantitative validation metrics for the k-means clustering (e.g., purity/completeness against BPT diagnostics, cross-matches to known green pea/blueberry catalogs, or tests on mock spectra with varying S/N and redshift). This is load-bearing for the central claim because contamination from normal star-formers, weak AGN, or other outliers could artifactually produce the reported 0.3-0.5 dex MZR offset, high sSFR tail, and intrinsic metallicity scatter after FMR projection.
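One way the purity check requested above could look is sketched below, using the widely used [N II]-based BPT demarcation curves of Kauffmann et al. (2003) and Kewley et al. (2001). Choosing these particular curves, the three class names, and the helper names `bpt_class` and `purity` are assumptions for illustration; the manuscript may adopt different diagnostics.

```python
def bpt_class(log_n2ha, log_o3hb):
    """Classify one galaxy on the [N II]/H-alpha vs [O III]/H-beta diagram.

    Inputs are log10 line ratios. Curves: Kauffmann et al. (2003) for the
    star-forming boundary and Kewley et al. (2001) for the AGN boundary.
    """
    if log_n2ha < 0.05 and log_o3hb < 0.61 / (log_n2ha - 0.05) + 1.3:
        return "star-forming"
    if log_n2ha < 0.47 and log_o3hb < 0.61 / (log_n2ha - 0.47) + 1.19:
        return "composite"
    return "AGN"

def purity(classes, wanted="star-forming"):
    """Fraction of selected objects whose BPT class equals `wanted`."""
    if not classes:
        raise ValueError("no objects to assess")
    return sum(c == wanted for c in classes) / len(classes)

# Hypothetical line ratios for four selected objects.
ratios = [(-1.5, 0.8), (-1.2, 0.7), (-0.3, 0.0), (0.0, 1.0)]
classes = [bpt_class(n2, o3) for n2, o3 in ratios]
print(classes, purity(classes))
```

Because [N II] is often very weak in metal-poor systems, such a test would apply only to the subset with usable detections, so the resulting purity would characterize that subset rather than the full sample.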
minor comments (1)
- [Abstract] Abstract: The parenthetical '(Abridged)' at the end is unnecessary and should be removed for the final version.
Simulated Authors' Rebuttal
We thank the referee for their thoughtful review and positive evaluation of the k-MENDEL sample's potential value for comparisons with high-redshift observations. We address the single major comment on validation of the k-means selection below. We agree that strengthening this aspect will improve the manuscript and plan to incorporate the requested metrics in revision.
- Referee: [Methods] Methods (k-means classification and sample selection): The manuscript provides no quantitative validation metrics for the k-means clustering (e.g., purity/completeness against BPT diagnostics, cross-matches to known green pea/blueberry catalogs, or tests on mock spectra with varying S/N and redshift). This is load-bearing for the central claim because contamination from normal star-formers, weak AGN, or other outliers could artifactually produce the reported 0.3-0.5 dex MZR offset, high sSFR tail, and intrinsic metallicity scatter after FMR projection.
Authors: We acknowledge that the current version of the manuscript describes the k-means procedure and feature space in Section 2 but does not present explicit quantitative validation metrics such as purity/completeness, cross-matches to literature catalogs, or mock-spectrum tests. This is a fair criticism, as such metrics would directly address concerns about possible contamination affecting the reported MZR offset, sSFR distribution, and metallicity scatter. In the revised manuscript we will add a dedicated subsection (or appendix) containing: (i) BPT-based purity assessment for the subset of objects with [N II], [S II], and H-alpha coverage, (ii) recovery fractions from cross-matches to published green pea and blueberry samples (e.g., Cardamone et al. 2009 and subsequent works), and (iii) results from injecting mock extreme-emission spectra into real DESI noise at varying S/N and redshift to quantify completeness and contamination rates. These additions will demonstrate that the selection is sufficiently clean for the scientific conclusions drawn.
revision: yes
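Items (ii) and (iii) of the response reduce to simple bookkeeping once the cross-match and injection runs exist. The sketch below shows that bookkeeping only; the record layout, the half-open S/N bins, the identifiers, and the function names are all hypothetical and not taken from the manuscript.

```python
from bisect import bisect_right

def completeness_by_snr(trials, edges):
    """Recovered fraction per S/N bin.

    trials: iterable of (snr, recovered) pairs from mock injections.
    edges: ascending bin edges; bins are half-open, [lo, hi).
    Returns one fraction per bin, or None for bins with no injections.
    """
    injected = [0] * (len(edges) - 1)
    recovered = [0] * (len(edges) - 1)
    for snr, ok in trials:
        i = bisect_right(edges, snr) - 1
        if 0 <= i < len(injected):  # ignore trials outside the bin range
            injected[i] += 1
            recovered[i] += bool(ok)
    return [r / n if n else None for r, n in zip(recovered, injected)]

def recovery_fraction(known_ids, selected_ids):
    """Fraction of previously published objects that the selection recovers."""
    known = set(known_ids)
    if not known:
        raise ValueError("empty reference catalog")
    return len(known & set(selected_ids)) / len(known)

# Hypothetical injection results and catalog identifiers.
trials = [(2.0, False), (2.5, True), (5.0, True), (8.0, True), (20.0, True)]
fractions = completeness_by_snr(trials, edges=[1.0, 3.0, 10.0])
matched = recovery_fraction(["a", "b", "c", "d"], ["a", "c", "x"])
print(fractions, matched)
```

In practice the reference catalog would first be restricted to objects that fall inside the survey footprint and redshift range, since objects that could never have been observed should not count against the recovery fraction.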
Circularity Check
No significant circularity in observational selection and empirical measurements
The paper constructs the k-MENDEL sample via k-means clustering applied directly to DESI spectra, then measures stellar masses, SFRs, metallicities (Te-based), and ionization parameters from the spectra and SED fitting on the selected objects. Reported offsets (0.3-0.5 dex in MZR, high sSFR, O32 values) and intrinsic scatter are computed from these independent data products and compared to external literature relations (e.g., local SFMS, MZR, FMR). No derivation step defines a quantity in terms of itself, renames a fitted parameter as a prediction, or relies on a load-bearing self-citation whose content reduces to the present work. The interpretation of short-lived non-equilibrium phases follows from the observed empirical patterns rather than from any internal definitional loop.
Reference graph
Works this paper leans on
- [1] Amorín, R., Aguerri, J. A. L., Muñoz-Tuñón, C., & Cairós, L. M. 2009, A&A, 501, 75
  Amorín, R., Fontana, A., Pérez-Montero, E., et al. 2017, Nature Astronomy, 1, 0052
  Amorín, R., Grazian, A., Castellano, M., et al. 2014, ApJ, 788, L4
  Amorín, R., Pérez-Montero, E., Contini, T., et al. 2015, A&A, 578, A105
  Amorín, R., Pérez-Montero, E., Vílchez, J. M., & Pap...
- [2] in the redshift range 0.01 < z < 0.96. Pre-processing includes shifting spectra to the rest frame, resampling to Δλ = 0.80 Å, and normalizing to the flux in the g-band (λ_eff = 4825 Å). The parent sample is then divided into two subsets: a low-redshift subsample (0.01 < z ≤ 0.25, 267,568 spectra) and a high-redshift subsample (0.25 < z < 0.96, 552,676 spectra). Fo...
2010
- [3] In addition, a second table provides key emission-line ratios and derived physical properties for the k-MENDEL sample. A preview of these quantities is shown in Table B.2. The catalog is designed to facilitate reproducibility and enable further analysis of the results presented in this work. Article number, page 18 of 21 Bonatto & Amorín et al.: EELG char...
2000
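The pre-processing quoted in entry [2] above (rest-frame shift, resampling to Δλ = 0.80 Å, g-band normalization) can be sketched as follows. This is a rough illustration under stated assumptions: linear interpolation stands in for whatever resampling scheme the authors use, and the normalization takes the mean flux in a hypothetical ±50 Å window around 4825 Å instead of integrating over a real g-band filter curve.

```python
from bisect import bisect_right

def to_rest_frame(wave_obs, z):
    """Shift observed wavelengths (Å) to the rest frame."""
    return [w / (1.0 + z) for w in wave_obs]

def interp(wave, flux, x):
    """Linear interpolation at x, clamped to the end values."""
    i = bisect_right(wave, x)
    if i == 0:
        return flux[0]
    if i >= len(wave):
        return flux[-1]
    x0, x1 = wave[i - 1], wave[i]
    t = (x - x0) / (x1 - x0)
    return flux[i - 1] + t * (flux[i] - flux[i - 1])

def resample(wave, flux, step=0.80):
    """Resample onto a uniform grid with the given step (Å)."""
    n = int((wave[-1] - wave[0]) / step + 1e-9) + 1
    grid = [wave[0] + i * step for i in range(n)]
    return grid, [interp(wave, flux, x) for x in grid]

def normalize(wave, flux, center=4825.0, half_width=50.0):
    """Divide by the mean flux near `center`; the window is an assumption."""
    window = [f for w, f in zip(wave, flux) if abs(w - center) <= half_width]
    if not window:
        raise ValueError("spectrum does not cover the normalization window")
    scale = sum(window) / len(window)
    return [f / scale for f in flux]

# Toy spectrum: constant flux, observed at z = 0.25 on a 1 Å grid.
wave_obs = [5000.0 + i for i in range(2001)]
flux_obs = [2.0] * len(wave_obs)
grid, flux_grid = resample(to_rest_frame(wave_obs, 0.25), flux_obs)
flux_norm = normalize(grid, flux_grid)
print(grid[0], grid[1] - grid[0], flux_norm[0])
```

Normalizing every spectrum to a common band puts bright and faint galaxies on the same scale, so that a distance-based classifier responds to spectral shape and line strength relative to the continuum.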