arxiv: 2604.09516 · v1 · submitted 2026-04-10 · 🌌 astro-ph.GA · astro-ph.CO


The k-MENDEL sample of local analogs to reionization galaxies. Spectral identification of EELGs and properties of green peas in DESI

L. Bonatto, R. Amorín, A. Giménez-Alcázar, J. A. Fernández-Ontiveros, A. Hernán-Caballero, S. Suárez, J. M. Vílchez, E. Pérez-Montero, M. Llerena, J. Sánchez Almeida


Pith reviewed 2026-05-10 16:46 UTC · model grok-4.3

classification 🌌 astro-ph.GA astro-ph.CO

keywords extreme emission line galaxies · green pea galaxies · DESI survey · reionization · mass metallicity relation · starburst galaxies · low mass galaxies · k-means clustering


The pith

A large DESI sample shows extreme emission-line galaxies as transient phases in low-mass galaxy evolution that resemble reionization-era systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses k-means clustering on DESI spectra to build the k-MENDEL sample of about 16,000 extreme emission-line galaxies across redshifts from 0.01 to 0.96. These objects, which include green peas and blueberries, display elevated specific star formation rates and metallicities lower than typical local galaxies, aligning more closely with young galaxies seen at high redshift by JWST. The offset mass-metallicity relation and large scatter point to non-equilibrium processes like metal-poor gas inflows and feedback. This positions EELGs as important nearby laboratories for the conditions that likely drove cosmic reionization. Readers care because understanding these local systems can inform models of how the first galaxies ionized the universe.

Core claim

The k-MENDEL sample of EELGs selected via automatic k-means classification on DESI spectra extends prior samples to higher redshifts and lower metallicities, revealing that these galaxies lie above the star-forming main sequence with sSFRs up to 100 Gyr^-1 and follow a shallower mass-metallicity relation offset by 0.3-0.5 dex, closely resembling high-z JWST galaxies and supporting their role as short-lived non-equilibrium phases in low-mass galaxy evolution.

What carries the argument

k-means classification applied to DESI spectra to isolate EELGs, followed by SED fitting and temperature-based metallicity measurements.
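As a rough illustration of the selection idea, the sketch below runs a plain Lloyd's k-means on toy "spectra" (lists of fluxes on a common grid). It is not the authors' ASK pipeline: the toy data, the helper names `sq_dist` and `kmeans`, and the choice of k = 2 are all assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length flux vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(spectra, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(spectra, k)]
    labels = [0] * len(spectra)
    for _ in range(n_iter):
        # Assignment step: each spectrum goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(s, centroids[j]))
                      for s in spectra]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(spectra, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels

# Toy data (assumed): 20 flat "continuum" spectra and 5 with a strong line at pixel 5.
rng = random.Random(1)
flat = [[1.0 + rng.gauss(0, 0.05) for _ in range(10)] for _ in range(20)]
eelg = [[1.0 + rng.gauss(0, 0.05) + (30.0 if i == 5 else 0.0) for i in range(10)]
        for _ in range(5)]
centroids, labels = kmeans(flat + eelg, k=2)
```

In this toy case the five line-dominated spectra end up in their own small cluster, which is the sense in which rare, extreme spectra can surface as minor classes or outliers.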

If this is right

Only about 6% of the sample shows AGN-like signatures, while the rest are dominated by intense star formation with high ionization parameters.
EELGs exhibit large intrinsic metallicity scatter even after accounting for the fundamental metallicity relation, indicating departures from simple bathtub models.
The sample spans stellar masses from 10^6 to 10^10 solar masses and star formation rates from 0.1 to 100 solar masses per year.
High ionization ratios like O32 up to 60 in the most extreme systems match those in confirmed Lyman-continuum emitters.
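The quantities in this list are simple ratios, and a short sketch makes the numbers concrete. The function names are hypothetical, and the O32 definition used here ([O III]λ5007 / [O II]λ3727) is a common convention that is assumed rather than taken from the paper.

```python
def ssfr_per_gyr(sfr_msun_per_yr, stellar_mass_msun):
    """Specific SFR in Gyr^-1: SFR / M*, converted from yr^-1."""
    return sfr_msun_per_yr * 1e9 / stellar_mass_msun

def doubling_time_myr(ssfr_gyr):
    """Mass-doubling time 1/sSFR at constant SFR, in Myr."""
    return 1000.0 / ssfr_gyr

def o32(flux_oiii_5007, flux_oii_3727):
    """O32 line ratio, assumed here to be [O III]5007 / [O II]3727."""
    return flux_oiii_5007 / flux_oii_3727

# A 10^7 Msun galaxy forming 1 Msun/yr sits at the quoted upper sSFR.
ssfr = ssfr_per_gyr(1.0, 1e7)
print(ssfr)                     # 100.0
print(doubling_time_myr(ssfr))  # 10.0
```

At 100 Gyr^-1 the galaxy would double its stellar mass in about 10 Myr at constant SFR, which is why such high sSFRs are read as short-lived phases.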

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar machine-learning selection methods could be applied to other large spectroscopic surveys to expand the census of such galaxies.
The non-equilibrium nature suggests that chemical evolution models for low-mass galaxies need to incorporate episodic inflows and outflows more explicitly.
These local analogs may help calibrate the escape fraction of ionizing photons in simulations of reionization.
Future observations could test if the properties evolve smoothly with redshift or show distinct phases.

Load-bearing premise

The k-means algorithm applied to the spectra successfully selects genuine extreme emission-line galaxies with little contamination from other populations.

What would settle it

Detailed follow-up spectroscopy of a subset of the k-MENDEL sample revealing that a large fraction lack the defining extreme emission lines or have properties inconsistent with the reported metallicities and ionization states.

Figures

Figures reproduced from arXiv: 2604.09516 by A. Giménez-Alcázar, A. Hernán-Caballero, E. Pérez-Montero, J. A. Fernández-Ontiveros, J. M. Vílchez, J. Sánchez Almeida, L. Bonatto, M. Llerena, R. Amorín, S. Suárez.

**Figure 1.** (Left) Rest-frame optical median stack spectra of the major (bottom) and minor (top) ASK classes. (Right) Median stack spectra of galaxies identified as outliers (see text for details). All spectra are normalized to the continuum level at λ 4800 Å and vertically shifted for clarity. Grey shaded regions indicate the spectral windows used for the ASK classification. Labels denote the number of galaxies contr…

**Figure 2.** Distributions of emission-line equivalent widths for galaxies classified in major (ASK1-12) and minor (ASK13-15) ASK classes. For comparison, we also show galaxies belonging to ASK12, which represents the most extreme major class, and spectral outliers (ASK16-25). Equivalent widths are taken from the DESI-EDR value-added catalog of Zou et al. (2024). Minor classes and outliers are clearly associated with p…

**Figure 3.** Maximum quality associated with major and minor ASK classes for galaxies in the low-redshift subsample used to define the ASK classification (0.01 < z ≤ 0.25). Each point represents one galaxy, showing the highest quality value among major classes (ASK1-12), Qmax(Major), versus that among minor classes (ASK13-15), Qmax(Minor). Colours indicate the ASK class providing the best spectral match, including ou…

**Figure 4.** Examples of DESI EELGs from the k-MENDEL sample. Legacy Survey DR10 color-composite images are shown for a representative subsample spanning a range of stellar masses, redshifts, and emission-line equivalent widths. Images are displayed with North up and East to the left, and each panel corresponds to a field of 20′′ × 20′′. Most systems appear compact or nearly unresolved at DESI imaging resolution, consis…

**Figure 5.** Example of spectrophotometric SED fitting with CIGALE for a representative EELG (DESI TARGETID 39633286554487272). Colored symbols show the synthetic medium-band photometric points derived from the DESI spectrum after convolution with box filters of 124 Å width, while the solid black curve indicates the best-fit model including stellar, nebular, and dust components. The fit illustrates the ability of the…
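The synthetic photometry mentioned in this caption can be pictured with a minimal sketch: average the spectrum inside contiguous 124 Å boxes. The function name `box_photometry`, the contiguous layout of the boxes, and the unweighted mean are assumptions; the paper's exact filter placement and weighting are not given in the caption.

```python
def box_photometry(wave, flux, width=124.0):
    """Mean flux density in contiguous box filters of the given width (in Å).

    Returns a list of (central_wavelength, mean_flux) for every box that
    contains at least one spectral pixel. Hypothetical helper for illustration.
    """
    start = wave[0]
    sums, counts = {}, {}
    for w, f in zip(wave, flux):
        idx = int((w - start) // width)  # index of the box this pixel falls in
        sums[idx] = sums.get(idx, 0.0) + f
        counts[idx] = counts.get(idx, 0) + 1
    return [(start + (i + 0.5) * width, sums[i] / counts[i]) for i in sorted(sums)]

# Toy spectrum (assumed): flat continuum of 1.0 with one strong "line" pixel.
wave = [4000.0 + i for i in range(248)]  # 1 Å sampling, two full boxes
flux = [1.0] * 248
flux[150] = 125.0                        # line falls in the second box
bands = box_photometry(wave, flux)
print(bands)  # [(4062.0, 1.0), (4186.0, 2.0)]
```

A single strong line doubles the band flux in this toy case, which is why medium-band points derived from EELG spectra carry nebular information that the SED model has to account for.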

**Figure 6.** Ionization diagnostic diagrams for the DESI EELG sample. Top panels: Classical BPT diagrams for the subsample at z < 0.45, for which [N ii]λ6583 (left) and [S ii]λλ6716, 6731 (right) are accessible. Solid, dashed, and dotted curves show the demarcation relations from Kewley et al. (2001), Kauffmann et al. (2003), and Xiao et al. (2018), respectively. Symbols are colour-coded according to the O32 ratio, tra…
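For readers unfamiliar with the [N ii] BPT diagram, the following sketch classifies a galaxy using the widely quoted forms of the Kewley et al. (2001) and Kauffmann et al. (2003) curves. The three-way labels and the function `bpt_class` are illustrative assumptions, the Xiao et al. (2018) curve is omitted, and the paper's own AGN criteria (Sect. 5.1) may differ.

```python
import math

def kewley01(x):
    """Kewley et al. (2001) maximum-starburst line; x = log10([N II]6583/Halpha)."""
    return 0.61 / (x - 0.47) + 1.19

def kauffmann03(x):
    """Kauffmann et al. (2003) empirical star-forming boundary."""
    return 0.61 / (x - 0.05) + 1.30

def bpt_class(nii_ha, oiii_hb):
    """Classify from linear line ratios [N II]/Halpha and [O III]5007/Hbeta."""
    x, y = math.log10(nii_ha), math.log10(oiii_hb)
    if x < 0.05 and y < kauffmann03(x):
        return "star-forming"
    if x < 0.47 and y < kewley01(x):
        return "composite"
    return "AGN"

print(bpt_class(0.02, 6.0))   # star-forming: weak [N II], strong [O III]
print(bpt_class(0.5, 1.0))    # composite: between the two curves
print(bpt_class(1.0, 10.0))   # AGN: above the Kewley line
```

The first call mimics a metal-poor EELG, where very weak [N ii] keeps the point on the star-forming side even at high [O iii]/Hβ.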

**Figure 7.** Median rest-frame composite spectrum of the k-MENDEL EELG sample after excluding sources consistent with AGN activity based on emission-line diagnostics (Sect. 5.1). The spectrum is shown in several wavelength intervals to highlight both nebular emission lines and stellar continuum features. Grey dashed vertical lines mark the spectral windows used for emission-line measurements, while red labels identify …

**Figure 8.** Distributions of physical and nebular properties for the k-MENDEL EELG sample. From left to right and top to bottom we show stellar mass, Balmer-based SFR, specific SFR, effective radius, nebular color excess E(B − V), electron density ne from [O ii], electron temperature te from [O iii], and oxygen abundance derived from the direct Te method. Solid vertical lines indicate the median values of each distrib…

**Figure 9.** Star-formation rate and specific SFR vs. stellar mass relation for the k-MENDEL EELG sample. Left: galaxies colour-coded by redshift. Right: galaxies colour-coded by the combined equivalent width EW(Hβ+[O iii]). Star-formation rates are derived from Balmer-line luminosities corrected for extinction and aperture effects. Black dashed lines indicate loci of constant specific star-formation rate (sSFR). For c…

**Figure 10.** Mass–metallicity relation (left) and FMR projection (right) for EELGs with direct metallicity measurements based on the detection of the [O iii]4363 auroral line. Individual galaxies are shown as colour-coded circles according to their instantaneous SFR derived from aperture-corrected fibre spectra. In the top panel, the black solid line shows the best-fit MZR relation obtained in this work. Grey dashed c…

**Figure 11.** O32 versus R23 diagram for the DESI EELG sample. Colored circles show individual EELGs at z < 0.85, color-coded by log EW(Hβ+[O iii]). The grey density map represents normal star-forming galaxies at z < 0.1 from SDSS DR16 (Pérez-Montero et al. 2021). Green star symbols indicate metal-poor galaxies at z ∼ 3-9 observed with JWST. The horizontal dashed line marks the approximate regime of candidate LyC emitt…

Original abstract

Low-mass galaxies with intense starbursts exhibit spectra dominated by extreme nebular emission and faint stellar continua. These extreme emission-line galaxies (EELGs) are key laboratories to study star formation, feedback, and ionizing photon escape in low-metallicity environments. We exploit the DESI survey to assemble the k-Means of Extreme Nebulae from DEsi outLiers (k-MENDEL), a statistically robust sample of ~16,000 EELGs at 0.01 < z < 0.96 selected via automatic k-means classification. Using SED fitting and Te-based metallicities, we characterize EELGs including "blueberry" and "green pea" galaxies, spanning stellar masses of 10^6-10^10 Msun and SFRs of 0.1-100 Msun/yr. k-MENDEL extends previous SDSS samples toward higher redshifts and lower metallicities (12+log(O/H) ~ 7.0-8.5). EELGs lie systematically above the star-forming main sequence, with sSFRs up to ~100 Gyr^-1. They follow a shallower mass-metallicity relation offset by 0.3-0.5 dex from local relations, closely resembling young galaxies observed with JWST at z > 3-10. The large intrinsic metallicity scatter, even after projecting along the fundamental metallicity relation, indicates strong departures from simple "bathtub" models, suggesting massive inflows of metal-poor gas followed by strong feedback. While ~6% of the sample shows AGN-like signatures, the most extreme star-forming systems reach high ionization (O32 ~ 5-60) comparable to confirmed Lyman-continuum emitters. Our results support the interpretation of EELGs as short-lived, non-equilibrium phases in the evolution of low-mass galaxies and highlight their importance as nearby analogs of galaxies likely driving cosmic reionization (Abridged).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

k-MENDEL adds a large new DESI-based EELG catalog that extends prior SDSS work, but the k-means selection lacks the validation needed to firmly support the non-equilibrium and reionization-analog claims.


The paper's core deliverable is a sample of about 16,000 extreme emission-line galaxies at 0.01 < z < 0.96 drawn from DESI spectra via k-means clustering. This pushes the redshift and metallicity range beyond earlier SDSS collections and supplies a bigger statistical base for comparing local low-mass starbursts to JWST high-z systems. That extension is the concrete advance here.

The characterization uses standard SED fitting and Te-based metallicities, which is fine for a catalog paper, and the reported offsets (0.3-0.5 dex in the mass-metallicity relation, sSFRs up to 100 Gyr^-1, O32 values of 5-60) are presented clearly against literature benchmarks. The ~6% AGN fraction and the comparison to confirmed Lyman-continuum emitters are also straightforward to check. These elements give the work real utility as a reference sample.

The main weakness sits in the selection step. The abstract describes the k-means as automatic and robust, yet supplies no purity metrics, no BPT cross-checks, no overlap tests with existing green-pea or blueberry catalogs, and no mock-spectrum runs that vary S/N or redshift. Without those, it is hard to rule out that some fraction of the reported scatter and offsets comes from mixing in ordinary star-formers or weak outliers rather than from genuine inflows and feedback. The interpretation of EELGs as short-lived non-equilibrium phases therefore rests on an assumption that still needs direct evidence in the text.

This paper is aimed at galaxy-evolution groups that model feedback, ionizing-photon escape, or reionization sources and need a large local anchor sample. Readers who work with catalogs or selection techniques will get the most out of it once the full data products and selection flags are public. The sample size and topic make it worth a serious referee's time, even if the methods section requires tightening on validation. I would send it to review.

Referee Report

1 major / 1 minor

Summary. The paper presents the k-MENDEL sample of ~16,000 extreme emission-line galaxies (EELGs) at 0.01 < z < 0.96 assembled from DESI spectra via k-means clustering. Using SED fitting and Te-based metallicities, the authors characterize stellar masses (10^6-10^10 Msun), SFRs (0.1-100 Msun/yr), metallicities (12+log(O/H) ~7.0-8.5), sSFRs (up to ~100 Gyr^-1), and ionization parameters (O32 ~5-60). They report EELGs lie above the star-forming main sequence, follow a shallower MZR offset 0.3-0.5 dex from local relations, show large intrinsic metallicity scatter even after FMR projection, and interpret the population as short-lived non-equilibrium phases in low-mass galaxy evolution that serve as local analogs to reionization-era galaxies.

Significance. If the automated selection is validated as clean, the large sample size and extension to higher redshift/lower metallicity than prior SDSS work would provide a statistically robust catalog of local EELGs for comparison to JWST high-z observations. The reported MZR offset, high sSFRs, and O32 values comparable to confirmed LyC emitters would strengthen evidence for inflow/feedback-driven non-equilibrium evolution in low-mass systems and their role as reionization analogs. The automated k-means approach on a major survey like DESI offers potential for reproducibility and scalability.

major comments (1)

[Methods] Methods (k-means classification and sample selection): The manuscript provides no quantitative validation metrics for the k-means clustering (e.g., purity/completeness against BPT diagnostics, cross-matches to known green pea/blueberry catalogs, or tests on mock spectra with varying S/N and redshift). This is load-bearing for the central claim because contamination from normal star-formers, weak AGN, or other outliers could artifactually produce the reported 0.3-0.5 dex MZR offset, high sSFR tail, and intrinsic metallicity scatter after FMR projection.
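The purity and completeness the referee asks for reduce to simple counts once a reference classification exists. The sketch below is a generic illustration, with a hypothetical function name and made-up boolean flags standing in for "selected by k-means" and "confirmed EELG by an independent criterion".

```python
def purity_completeness(selected, truth):
    """Purity and completeness from equal-length sequences of per-object booleans.

    purity       = true positives / number selected
    completeness = true positives / number of genuine objects
    """
    tp = sum(1 for s, t in zip(selected, truth) if s and t)
    n_sel = sum(1 for s in selected if s)
    n_true = sum(1 for t in truth if t)
    purity = tp / n_sel if n_sel else float("nan")
    completeness = tp / n_true if n_true else float("nan")
    return purity, completeness

# Made-up flags for five objects (illustration only).
selected = [True, True, True, False, False]   # picked by the classifier
truth    = [True, True, False, True, False]   # genuine per a reference catalog
purity, completeness = purity_completeness(selected, truth)
```

Here two of the three selected objects are genuine and two of the three genuine objects are selected, so both metrics equal 2/3.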

minor comments (1)

[Abstract] Abstract: The parenthetical '(Abridged)' at the end is unnecessary and should be removed for the final version.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and positive evaluation of the k-MENDEL sample's potential value for comparisons with high-redshift observations. We address the single major comment on validation of the k-means selection below. We agree that strengthening this aspect will improve the manuscript and plan to incorporate the requested metrics in revision.


Referee: [Methods] Methods (k-means classification and sample selection): The manuscript provides no quantitative validation metrics for the k-means clustering (e.g., purity/completeness against BPT diagnostics, cross-matches to known green pea/blueberry catalogs, or tests on mock spectra with varying S/N and redshift). This is load-bearing for the central claim because contamination from normal star-formers, weak AGN, or other outliers could artifactually produce the reported 0.3-0.5 dex MZR offset, high sSFR tail, and intrinsic metallicity scatter after FMR projection.

Authors: We acknowledge that the current version of the manuscript describes the k-means procedure and feature space in Section 2 but does not present explicit quantitative validation metrics such as purity/completeness, cross-matches to literature catalogs, or mock-spectrum tests. This is a fair criticism, as such metrics would directly address concerns about possible contamination affecting the reported MZR offset, sSFR distribution, and metallicity scatter. In the revised manuscript we will add a dedicated subsection (or appendix) containing: (i) BPT-based purity assessment for the subset of objects with [N II], [S II], and H-alpha coverage, (ii) recovery fractions from cross-matches to published green pea and blueberry samples (e.g., Cardamone et al. 2009 and subsequent works), and (iii) results from injecting mock extreme-emission spectra into real DESI noise at varying S/N and redshift to quantify completeness and contamination rates. These additions will demonstrate that the selection is sufficiently clean for the scientific conclusions drawn. revision: yes
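Item (iii) of the proposed revision can be pictured with a toy injection-recovery test. Everything in this sketch is an assumption: Gaussian white noise stands in for real DESI noise, a single Gaussian line stands in for an extreme-emission spectrum, and a summed-flux threshold stands in for the actual k-means selection.

```python
import math
import random

def mock_spectrum(amplitude, n_pix=200, center=100, sigma_pix=2.0,
                  noise=1.0, rng=random):
    """Gaussian emission line of the given peak amplitude plus white noise."""
    return [amplitude * math.exp(-0.5 * ((i - center) / sigma_pix) ** 2)
            + rng.gauss(0.0, noise) for i in range(n_pix)]

def detected(spec, center=100, half_width=4, noise=1.0, threshold=5.0):
    """Flag a line if the summed window flux exceeds threshold x its noise."""
    window = spec[center - half_width:center + half_width + 1]
    return sum(window) > threshold * noise * math.sqrt(len(window))

def recovery_fraction(amplitude, n_trials=200, seed=0):
    """Fraction of injected mock lines that pass the detection cut."""
    rng = random.Random(seed)
    hits = sum(detected(mock_spectrum(amplitude, rng=rng))
               for _ in range(n_trials))
    return hits / n_trials

# Completeness curve versus peak amplitude (in units of the per-pixel noise).
curve = {amp: recovery_fraction(amp) for amp in (0.0, 3.0, 10.0)}
```

The zero-amplitude run estimates the false-positive rate of the cut, while the rising fractions at larger amplitudes trace the completeness as a function of line strength.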

Circularity Check

0 steps flagged

No significant circularity in observational selection and empirical measurements


The paper constructs the k-MENDEL sample via k-means clustering applied directly to DESI spectra, then measures stellar masses, SFRs, metallicities (Te-based), and ionization parameters from the spectra and SED fitting on the selected objects. Reported offsets (0.3-0.5 dex in MZR, high sSFR, O32 values) and intrinsic scatter are computed from these independent data products and compared to external literature relations (e.g., local SFMS, MZR, FMR). No derivation step defines a quantity in terms of itself, renames a fitted parameter as a prediction, or relies on a load-bearing self-citation whose content reduces to the present work. The interpretation of short-lived non-equilibrium phases follows from the observed empirical patterns rather than from any internal definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an observational sample-selection and characterization study with no mathematical derivations or new physical models presented in the abstract. No free parameters, axioms, or invented entities are required beyond standard astronomical techniques.



Reference graph

Works this paper leans on

3 extracted references · 1 canonical work page

[1]

Amorín, R., Aguerri, J. A. L., Muñoz-Tuñón, C., & Cairós, L. M. 2009, A&A, 501, 75
Amorín, R., Fontana, A., Pérez-Montero, E., et al. 2017, Nature Astronomy, 1, 0052
Amorín, R., Grazian, A., Castellano, M., et al. 2014, ApJ, 788, L4
Amorín, R., Pérez-Montero, E., Contini, T., et al. 2015, A&A, 578, A105
Amorín, R., Pérez-Montero, E., Vílchez, J. M., & Pap...

[2]

in the redshift range 0.01 < z < 0.96. Pre-processing includes shifting spectra to the rest frame, resampling to Δλ = 0.80 Å, and normalizing to the flux in the g-band (λ_eff = 4825 Å). The parent sample is then divided into two subsets: a low-redshift subsample (0.01 < z ≤ 0.25, 267,568 spectra) and a high-redshift subsample (0.25 < z < 0.96, 552,676 spectra). Fo...

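The pre-processing steps quoted in entry [2] (rest-frame shift, resampling to 0.80 Å, g-band normalization) can be sketched as follows. The function names, the linear interpolation, and the use of a mean flux in a ±50 Å window around 4825 Å as a stand-in for the g-band flux are assumptions for illustration, not the authors' implementation.

```python
def to_rest_frame(wave_obs, z):
    """Shift observed wavelengths to the rest frame: lambda_rest = lambda_obs / (1 + z)."""
    return [w / (1.0 + z) for w in wave_obs]

def resample_linear(wave, flux, start, stop, step=0.80):
    """Linear interpolation onto a uniform grid.

    `wave` must be increasing and must cover [start, stop].
    """
    n = int((stop - start) / step) + 1
    grid = [start + i * step for i in range(n)]
    out, j = [], 0
    for g in grid:
        # Advance to the input interval [wave[j], wave[j + 1]] containing g.
        while j < len(wave) - 2 and wave[j + 1] < g:
            j += 1
        t = (g - wave[j]) / (wave[j + 1] - wave[j])
        out.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return grid, out

def normalize(grid, flux, center=4825.0, half_width=50.0):
    """Divide by the mean flux near `center` (assumed proxy for the g-band flux)."""
    window = [f for g, f in zip(grid, flux) if abs(g - center) <= half_width]
    norm = sum(window) / len(window)
    return [f / norm for f in flux]

# Toy spectrum (assumed): flat flux of 2.0 observed at z = 0.25.
wave_obs = [5000.0 + i for i in range(2000)]
rest = to_rest_frame(wave_obs, 0.25)
grid, flux_rs = resample_linear(rest, [2.0] * 2000, 4700.0, 4950.0)
flux_norm = normalize(grid, flux_rs)
```

A flux-conserving rebinning would be preferable to plain interpolation when the output grid is coarser than the input, so this sketch should be read only as a description of the order of operations.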
[3]

In addition, a second table provides key emission-line ratios and derived physical properties for the k-MENDEL sample. A preview of these quantities is shown in Table B.2. The catalog is designed to facilitate reproducibility and enable further analysis of the results presented in this work.