
arxiv: 2606.20801 · v1 · pith:LJ34TT5Znew · submitted 2026-06-18 · 🌌 astro-ph.GA · astro-ph.IM

LEGGOS III: Mapping Star Formation and Dust in Gravitationally Lensed Galaxies with SUMAC, a UMAP and Clustering Framework

Pith reviewed 2026-06-26 16:25 UTC · model grok-4.3

classification 🌌 astro-ph.GA astro-ph.IM
keywords gravitational lensing · JWST spectroscopy · star formation · unsupervised clustering · UMAP · HDBSCAN · high-redshift galaxies · dust attenuation

The pith

An unsupervised pipeline segments JWST spaxel SEDs of a lensed galaxy into six physically distinct stellar and nebular populations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SUMAC, an automated method that embeds and clusters spectral energy distributions from JWST integral-field spectroscopy to identify star-forming regions in high-redshift lensed galaxies. Applied to observations of SGAS111020.0+645950.8 at redshift 2.481, the approach recovers six clusters whose median SEDs differ systematically in emission-line strengths, UV continuum slope, Balmer break, and dust-sensitive ratios. Bluer clusters correspond to unobscured star formation while redder ones trace dustier regions. This replaces manual, observer-dependent segmentation with a uniform, data-driven procedure that can be applied consistently across similar datasets.

Core claim

The SUMAC pipeline combines UMAP manifold embedding with HDBSCAN density clustering applied to spaxel spectral energy distributions, recovering six physically distinct stellar/nebular populations. The cluster median SEDs separate cleanly on the presence and strength of Hβ+[OIII], Hα+[NII], β_NUV slope, Balmer break strength, and the Balmer decrement, with bluer clusters tracing unobscured star-forming regions and progressively redder clusters tracing dusty star-forming regions.
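
As a rough illustration of how two of these diagnostics could be measured, the sketch below forms a per-wavelength median SED for one cluster and fits a power-law slope to its UV continuum. The function names and the 1800-2600 Å rest-frame fitting window are assumptions made for this example; the paper's actual window and fitting procedure are not given in the abstract.

```python
import math
from statistics import median


def median_sed(seds, labels, cluster_id):
    """Per-wavelength median over all spaxel SEDs assigned to one cluster."""
    members = [sed for sed, lab in zip(seds, labels) if lab == cluster_id]
    if not members:
        raise ValueError(f"no spaxels with label {cluster_id}")
    return [median(column) for column in zip(*members)]


def uv_slope(wavelengths, fluxes, lo=1800.0, hi=2600.0):
    """Least-squares slope of log10(f_lambda) against log10(lambda).

    The window [lo, hi] (rest-frame Angstrom) is an assumed default,
    not a value taken from the paper.
    """
    pts = [
        (math.log10(w), math.log10(f))
        for w, f in zip(wavelengths, fluxes)
        if lo <= w <= hi and f > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two positive flux points in the window")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

A more negative slope corresponds to a bluer continuum, so comparing `uv_slope` across the cluster medians is one simple way to order clusters from blue to red.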

What carries the argument

SUMAC: UMAP-based manifold embedding combined with HDBSCAN density clustering applied to spectral energy distributions at the spaxel level.
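
The overall flow can be sketched as follows. This is a minimal outline rather than the SUMAC implementation: the helper names, the unit-median normalization, and the parameter values are assumptions for illustration, and `embed_and_cluster` requires the third-party `numpy`, `umap-learn`, and `hdbscan` packages.

```python
from statistics import median


def flatten_cube(cube, mask):
    """Turn cube[y][x] -> SED into a list of normalized SEDs plus coordinates.

    Each kept SED is divided by its own median flux (an assumed normalization)
    so that clustering responds to SED shape rather than brightness.
    """
    seds, coords = [], []
    for y, row in enumerate(cube):
        for x, sed in enumerate(row):
            if not mask[y][x]:
                continue
            scale = median(sed)
            if scale <= 0:
                continue  # skip spaxels with non-positive median flux
            seds.append([f / scale for f in sed])
            coords.append((y, x))
    return seds, coords


def embed_and_cluster(seds, n_neighbors=15, min_dist=0.1,
                      min_cluster_size=10, min_samples=None, seed=0):
    """UMAP embedding followed by HDBSCAN; values here are illustrative only."""
    import numpy as np
    import umap      # provided by the umap-learn package
    import hdbscan

    embedding = umap.UMAP(
        n_neighbors=n_neighbors, min_dist=min_dist,
        n_components=2, random_state=seed,
    ).fit_transform(np.asarray(seds))
    labels = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size, min_samples=min_samples,
    ).fit_predict(embedding)
    return embedding, labels.tolist()  # HDBSCAN marks noise points as -1


def labels_to_map(labels, coords, shape, fill=-2):
    """Paint cluster labels back onto the image grid; `fill` marks unused spaxels."""
    ny, nx = shape
    label_map = [[fill] * nx for _ in range(ny)]
    for lab, (y, x) in zip(labels, coords):
        label_map[y][x] = lab
    return label_map
```

Keeping the coordinates alongside the flattened SEDs is what allows clusters found in the embedding to be mapped back onto the lensed arc.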

If this is right

  • The method automates identification of star-forming clumps in lensed JWST IFS data, reducing dependence on manual inspection.
  • Cluster separation shows that SED shape alone can distinguish unobscured from dusty star formation without additional priors.
  • The pipeline can be applied uniformly to other lensed galaxies at z ~ 2-4 observed with similar instruments.
  • Median SEDs of the recovered clusters supply empirical templates for stellar and nebular properties in high-redshift systems.
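
If cluster medians are to serve as templates, some estimate of their uncertainty is useful. One common option, shown here as a sketch and not as the paper's method, is to bootstrap the spaxels within a cluster and take percentiles of the resampled medians. The function name, the 16th/84th percentile choice, and the nearest-rank percentile rule are assumptions of this example.

```python
import random
from statistics import median


def bootstrap_median_sed(members, n_boot=500, seed=0, lo_pct=16.0, hi_pct=84.0):
    """Median SED of a cluster with bootstrap percentile bounds per wavelength.

    `members` is a list of SEDs (equal-length lists of fluxes) in one cluster.
    """
    if not members:
        raise ValueError("cluster has no members")
    rng = random.Random(seed)
    n = len(members)
    n_wave = len(members[0])
    draws = [[] for _ in range(n_wave)]
    for _ in range(n_boot):
        # Resample spaxels with replacement, then take the median per wavelength.
        sample = [members[rng.randrange(n)] for _ in range(n)]
        for j, column in enumerate(zip(*sample)):
            draws[j].append(median(column))

    def nearest_rank(sorted_vals, pct):
        return sorted_vals[round(pct / 100.0 * (len(sorted_vals) - 1))]

    center = [median(column) for column in zip(*members)]
    lower, upper = [], []
    for values in draws:
        values.sort()
        lower.append(nearest_rank(values, lo_pct))
        upper.append(nearest_rank(values, hi_pct))
    return center, lower, upper
```

This treats spaxels as independent draws, which is optimistic for IFS data where neighbouring spaxels are correlated, so the resulting bounds are best read as lower limits on the true scatter.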

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Testing the same clustering on non-lensed or lower-resolution data would check whether the six-population structure persists outside strong lensing.
  • Adding longer-wavelength photometry could tighten constraints on dust content within the redder clusters.
  • The framework might be used to measure the obscured fraction of star formation across a statistical sample of z ~ 2.5 galaxies.
  • Comparing cluster assignments against hydrodynamic simulations of galaxy assembly could test whether the recovered populations correspond to distinct evolutionary stages.

Load-bearing premise

The clusters produced by UMAP and HDBSCAN on spaxel SEDs represent physically distinct populations rather than artifacts of the embedding or clustering hyperparameters.

What would settle it

Re-running the pipeline on the same data with altered UMAP parameters or a different clustering algorithm produces clusters whose median SEDs no longer separate on emission lines, UV slope, Balmer features, or dust indicators.
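
One way to make "the clusters changed" quantitative is to compare the label assignments from two runs with the adjusted Rand index (ARI), which is 1 for identical partitions and near 0 for chance-level agreement. The sketch below is a plain-Python version of the standard formula. Treating HDBSCAN noise points (label -1) as one more group is a simplifying assumption; they could instead be dropped before the comparison.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same spaxels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two points")
    # Pair counts within the contingency table and its margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one group
    return (sum_cells - expected) / (max_index - expected)
```

Because the index ignores the label names themselves, it is unaffected by the arbitrary cluster numbering that changes between runs.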

Figures

Figures reproduced from arXiv: 2606.20801 by Alex Ross, Aritra Ghosh, Brian Welch, Cole Panzer, Dylan Berry, Gourav Khullar, Guillaume Mahler, Julissa Sarmiento, Michael Florian, Pedram Abedi, Taylor Hutchison, T. Emil Rivera-Thorsen, the JWST LEGGOS Collaboration.

Figure 1: Example of the primary outputs of SUMAC. Shown here is the UMAP embedding of the IFU spaxels (top left) with the HDBSCAN classified and colored spaxel clusters plotted on arbitrary units. The noise points identified by HDBSCAN are gray and do not belong to a specific cluster. In the middle left is the clusters mapped from UMAP space back to their original location on the lensed arc of SGAS1110, and an RGB …
Abstract

Strong gravitational lensing combined with JWST's spatio-spectral resolution enables resolved studies of star-forming regions in $z\sim$ 2-4 galaxies, but identifying and characterizing such regions in lensed integral-field and multi-band data remains a manual, observer-dependent process. We present $\texttt{SUMAC}$ (Software for the Uniform Manifold Approximation of Clumps), an unsupervised learning pipeline that segments JWST imaging and spectroscopy at the "spaxel" level by combining $\texttt{UMAP}$-based manifold embedding with $\texttt{HDBSCAN}$ density clustering applied to spectral energy distributions/spectra. We demonstrate the pipeline on JWST/NIRSpec PRISM IFS observations of the lensed galaxy SGAS111020.0+645950.8 at $z = 2.481$, recovering six physically distinct stellar/nebular populations. The cluster median SEDs separate cleanly on the presence and strength of H$\beta$+[OIII], H$\alpha$+[NII], $\beta_{NUV}$ slope, Balmer break strength, and the Balmer decrement, with bluer clusters tracing unobscured star-forming regions and progressively redder clusters tracing dusty star-forming regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents SUMAC, an unsupervised pipeline applying UMAP manifold embedding followed by HDBSCAN density clustering directly to spaxel SED vectors from JWST/NIRSpec PRISM IFS observations of the lensed galaxy SGAS111020.0+645950.8 at z=2.481. It claims to recover six physically distinct stellar/nebular populations whose median SEDs separate cleanly on the presence/strength of Hβ+[OIII], Hα+[NII], β_NUV slope, Balmer break, and Balmer decrement, with bluer clusters tracing unobscured star-forming regions and redder clusters tracing dusty ones.

Significance. If the clusters map to physically distinct populations, the method would supply an objective, automated alternative to manual segmentation of resolved star-forming regions in gravitationally lensed z~2-4 galaxies, reducing observer dependence and enabling statistical studies of dust and star formation with JWST IFS data.

major comments (2)
  1. [Abstract] Abstract: the central claim that the six clusters correspond to physically distinct populations rests solely on post-hoc qualitative inspection of median SED separations in emission-line and continuum features; no error bars on the medians, silhouette scores or other validation metrics, comparison against manual segmentation, or recovery tests on simulated data with known ground-truth populations are reported.
  2. [Methods] Pipeline description: no systematic sweeps of UMAP (n_neighbors, min_dist) or HDBSCAN (min_cluster_size, min_samples) hyperparameters are presented, nor are alternative embeddings (PCA, t-SNE) or robustness checks across different random seeds or data subsets; without these, the clean six-cluster solution and the reported SED separations could be induced by the chosen embedding parameters rather than reflecting intrinsic physical components.
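
For reference, the silhouette score mentioned in the first comment can be computed as below. This is a minimal plain-Python sketch that uses Euclidean distances and skips HDBSCAN noise points (label -1); both choices are assumptions of this example. Since UMAP does not preserve global distances, a score computed in the embedding may look better than one computed on the original SED vectors, so reporting both may be informative.

```python
import math


def silhouette_score(points, labels, noise_label=-1):
    """Mean silhouette coefficient over all non-noise points."""
    clusters = {}
    for i, lab in enumerate(labels):
        if lab != noise_label:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for lab, own in clusters.items():
        for i in own:
            if len(own) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.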
minor comments (1)
  1. Notation for β_NUV and the Balmer decrement should be defined explicitly on first use and used consistently in all figure captions and text.
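
For readers unfamiliar with the notation, the conventional definitions are sketched below. These are the standard textbook forms and may differ in detail from what the paper adopts: the intrinsic ratio of about 2.86 assumes Case B recombination at 10^4 K, and k(λ) stands for whichever attenuation curve is chosen. The abstract lists Hα blended with [NII]; how the paper separates the two when forming the decrement is not stated there.

```latex
% UV continuum slope: power-law index of the flux density over the fitting window
f_\lambda \propto \lambda^{\beta_{\rm NUV}}
\quad\Longleftrightarrow\quad
\log_{10} f_\lambda = \beta_{\rm NUV}\,\log_{10}\lambda + \mathrm{const}

% Balmer decrement: observed H-alpha / H-beta flux ratio, converted to reddening
E(B-V) = \frac{2.5}{k(\mathrm{H}\beta) - k(\mathrm{H}\alpha)}\,
\log_{10}\!\left[
  \frac{(\mathrm{H}\alpha/\mathrm{H}\beta)_{\rm obs}}
       {(\mathrm{H}\alpha/\mathrm{H}\beta)_{\rm int}}
\right],
\qquad (\mathrm{H}\alpha/\mathrm{H}\beta)_{\rm int} \approx 2.86
```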

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment below and outline the revisions we will make.

  1. Referee: [Abstract] Abstract: the central claim that the six clusters correspond to physically distinct populations rests solely on post-hoc qualitative inspection of median SED separations in emission-line and continuum features; no error bars on the medians, silhouette scores or other validation metrics, comparison against manual segmentation, or recovery tests on simulated data with known ground-truth populations are reported.

    Authors: We agree that the current abstract and results section rely primarily on qualitative assessment of the median SED separations. In the revised manuscript we will add error bars to all median SED plots, report silhouette scores (and optionally the Davies-Bouldin index) for the six-cluster solution, and include a direct comparison between the SUMAC clusters and a manual segmentation performed by the authors. Full recovery tests on simulated NIRSpec PRISM data with injected ground-truth populations are a valuable next step but would require a separate simulation framework and are beyond the scope of this demonstration paper; we will explicitly note this limitation and flag it for future work.
    Revision: partial

  2. Referee: [Methods] Pipeline description: no systematic sweeps of UMAP (n_neighbors, min_dist) or HDBSCAN (min_cluster_size, min_samples) hyperparameters are presented, nor are alternative embeddings (PCA, t-SNE) or robustness checks across different random seeds or data subsets; without these, the clean six-cluster solution and the reported SED separations could be induced by the chosen embedding parameters rather than reflecting intrinsic physical components.

    Authors: We acknowledge the absence of these robustness checks. The revised Methods section will contain a new subsection (or appendix) that systematically varies the key UMAP (n_neighbors, min_dist) and HDBSCAN (min_cluster_size, min_samples) parameters over plausible ranges and shows that the six-cluster solution and the associated SED separations remain stable. We will also present results using PCA as an alternative linear embedding and report clustering outcomes for multiple random seeds as well as for data subsets (e.g., different rest-frame wavelength ranges and spatial masks). These additions will demonstrate that the reported populations are not artifacts of a single hyperparameter choice.
    Revision: yes
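
A sweep of the kind promised in the second response could be organized along the following lines. This is only a harness sketch: `cluster_fn` is a hypothetical callable standing in for the embedding-plus-clustering step, and the recorded summary (cluster count and noise fraction per run) is an assumed minimal choice, not the set of diagnostics the authors commit to.

```python
from itertools import product


def sweep(cluster_fn, seds, grid, seeds=(0, 1, 2), noise_label=-1):
    """Run cluster_fn over every parameter combination and random seed.

    cluster_fn(seds, seed=..., **params) must return one label per SED.
    `grid` maps parameter names to lists of values to try.
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        for seed in seeds:
            labels = cluster_fn(seds, seed=seed, **params)
            found = {lab for lab in labels if lab != noise_label}
            noise = sum(lab == noise_label for lab in labels) / len(labels)
            results.append({
                "params": dict(params),
                "seed": seed,
                "n_clusters": len(found),
                "noise_fraction": noise,
            })
    return results


def fraction_with_k(results, k):
    """Fraction of runs that recovered exactly k clusters."""
    return sum(r["n_clusters"] == k for r in results) / len(results)
```

A call such as `fraction_with_k(results, 6)` then gives a single number for how often the six-cluster solution is recovered across the grid. Matching cluster counts alone do not show that the same spaxels are grouped together, so a partition-similarity measure would be a natural companion.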

Circularity Check

0 steps flagged

No circularity; standard unsupervised pipeline with independent validation


The paper applies UMAP manifold embedding followed by HDBSCAN clustering to spaxel SED vectors as a standard, off-the-shelf unsupervised segmentation method. The central claim that the resulting six clusters map to physically distinct populations rests on post-hoc inspection of median SED differences in emission lines and continuum features, not on any equation, fitted parameter, or self-citation that reduces the output to the input by construction. No derivation chain, uniqueness theorem, or ansatz is invoked; the method is externally falsifiable via hyperparameter sweeps or alternative embeddings, none of which are required for the absence of circularity. This is the normal case of a self-contained empirical application.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Abstract-only review; free parameters and axioms inferred from standard unsupervised clustering practice rather than explicit paper content.

free parameters (2)
  • UMAP n_neighbors and min_dist
    Hyperparameters controlling the manifold embedding; chosen to produce usable clusters but not reported in abstract.
  • HDBSCAN min_cluster_size and min_samples
    Control cluster granularity; directly determine the reported six populations.
axioms (1)
  • Domain assumption: spaxel-level SEDs contain sufficient manifold structure to separate physically distinct stellar/nebular populations via density-based clustering.
    Core premise enabling the claim that recovered clusters are physically meaningful.


