The Cluster Completeness Correction Calculator (C-4): A Neural-Network framework and pilot application to the LEGUS Survey of NGC 628

Alan Zhang; Jianling Tang; Kathryn Grasha; Mark R. Krumholz; Tomasz Różański

arxiv: 2604.05291 · v1 · submitted 2026-04-07 · 🌌 astro-ph.IM · astro-ph.GA


Pith reviewed 2026-05-10 19:53 UTC · model grok-4.3

keywords star clusters · completeness correction · neural networks · selection effects · NGC 628 · cluster demographics · artificial star clusters · LEGUS survey

The pith

Neural networks trained on injected artificial clusters learn the selection function for detecting star clusters in galaxy images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that adds simulated star clusters to real telescope images of galaxies, runs them through the exact detection and filtering steps used for the original catalogue, and trains neural networks to predict which clusters would have been found. This produces a continuous completeness function that accounts for the complex, interdependent effects of cluster mass, age, and dust extinction. In a test on NGC 628 data, the approach removes artificial flattening from the observed mass and age distributions and pushes reliable analysis down to much lower masses and younger ages.

Core claim

By training multilayer perceptron networks on the detection outcomes of artificial clusters injected into observed images and processed identically to real data, the method learns a highly accurate selection operator that captures strongly non-separable dependencies on physical parameters, allowing direct completeness corrections that extend demographic analyses by roughly an order of magnitude in mass and age while eliminating biases in the distributions.

What carries the argument

Multilayer perceptron neural networks that map cluster physical parameters (mass, age, extinction) to a continuous detection probability after the clusters have been injected into real images and passed through the catalogue construction pipeline.
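
A minimal sketch of the kind of map described above: a multilayer perceptron written as a chain of affine transformations with element-wise nonlinearities, ending in a sigmoid so that the output lies in [0, 1]. The layer sizes, the ReLU activation, the random weights, and the names `mlp_completeness` and `random_params` are illustrative assumptions, not the paper's architecture or trained parameters.

```python
import math
import random

def affine(weights, bias, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mlp_completeness(params, theta):
    """Map cluster parameters theta to a detection probability in [0, 1]."""
    h = list(theta)
    for weights, bias in params[:-1]:      # hidden layers: affine, then ReLU
        h = relu(affine(weights, bias, h))
    weights, bias = params[-1]             # output layer: one logit
    (logit,) = affine(weights, bias, h)
    return sigmoid(logit)

def random_params(layer_sizes, seed=0):
    """Untrained placeholder weights (assumption: a real use would load trained ones)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params

params = random_params([3, 16, 16, 1])
theta = (4.0, 7.5, 0.3)   # (log10 mass, log10 age, A_V), illustrative values only
p_hat = mlp_completeness(params, theta)
print(f"predicted completeness (untrained network): {p_hat:.3f}")
```

Because every step is a smooth or piecewise-linear function of the inputs, the output can be differentiated with respect to the cluster parameters, which is what makes this kind of completeness function usable inside gradient-based inference.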

If this is right

Completeness corrections can be applied directly to existing cluster catalogues to recover intrinsic populations.
Observed mass and age distributions no longer show artificial flattening at the low end.
Demographic analyses become feasible over an order of magnitude wider range in mass and age.
The differentiable completeness functions can be inserted into forward-modeling or Bayesian inference frameworks.
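
As a hedged illustration of what a direct correction can look like, the sketch below weights each detected cluster by the inverse of its predicted completeness before histogramming. The function name `corrected_histogram`, the `p_min` floor, and the input values are assumptions made for this example; the paper's actual correction procedure may differ.

```python
def corrected_histogram(values, completeness, edges, p_min=0.05):
    """Return (raw counts, completeness-corrected counts) per bin.

    Each detected cluster contributes 1/p to the corrected count, where p is
    its predicted completeness. p is floored at p_min (an illustrative guard)
    so that a single cluster with p near zero cannot dominate a bin.
    """
    n_bins = len(edges) - 1
    raw = [0] * n_bins
    corrected = [0.0] * n_bins
    for x, p in zip(values, completeness):
        for i in range(n_bins):
            if edges[i] <= x < edges[i + 1]:
                raw[i] += 1
                corrected[i] += 1.0 / max(p, p_min)
                break
    return raw, corrected

# Illustrative inputs: log10 masses of detected clusters and their predicted completeness.
log_mass = [3.2, 3.4, 3.8, 4.1, 4.5, 4.6]
p_obs = [0.25, 0.25, 0.5, 0.5, 1.0, 1.0]
raw, corrected = corrected_histogram(log_mass, p_obs, edges=[3.0, 4.0, 5.0])
print(raw, corrected)   # [3, 3] [10.0, 4.0]
```

In this toy case both bins hold three detected clusters, but the low-mass bin is corrected upward more strongly, which is the sense in which a correction can remove artificial flattening at the low end.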

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same injection-plus-network approach could be used to model selection functions for other resolved populations such as individual stars or H II regions.
Embedding the learned completeness operator inside hierarchical population models would allow joint inference of formation rates and disruption timescales.
The framework scales to large surveys by retraining on new fields or instruments without changing the core procedure.
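
One standard way to embed a selection function in a population model, given here as a sketch under stated assumptions rather than as the paper's formulation: treat clusters as a Poisson point process with intrinsic rate density $\lambda(\boldsymbol{\theta} \mid \boldsymbol{\Lambda})$, where $\boldsymbol{\Lambda}$ denotes hypothetical population parameters such as formation rates and disruption timescales, and treat detection as independent thinning with probability $p_{\rm obs}(\boldsymbol{\theta})$. Measurement errors on $\boldsymbol{\theta}$ are ignored.

```latex
\ln \mathcal{L}(\boldsymbol{\Lambda})
  = \sum_{i=1}^{N_{\rm obs}}
      \ln\!\left[ p_{\rm obs}(\boldsymbol{\theta}_i)\,
                  \lambda(\boldsymbol{\theta}_i \mid \boldsymbol{\Lambda}) \right]
  - \int p_{\rm obs}(\boldsymbol{\theta})\,
         \lambda(\boldsymbol{\theta} \mid \boldsymbol{\Lambda})\,
         \mathrm{d}\boldsymbol{\theta}
```

The first term scores the detected clusters and the second is the expected number of detections; a differentiable $p_{\rm obs}$ lets both be differentiated with respect to $\boldsymbol{\Lambda}$.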

Load-bearing premise

Artificial clusters added to the images reproduce the detection and filtering behavior of real clusters, and the trained networks generalize to actual data without significant overfitting.

What would settle it

A quantitative match between the neural network's predicted recovery fraction and the actual fraction recovered when a large new set of artificial clusters is injected and run through the pipeline as an independent test.
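
The test described above can be sketched as a binned comparison between predicted and recovered fractions. In this illustration the "independent injections" are simulated from a toy selection curve, so the predictions are correct by construction; the names `recovery_table`, `log_mass`, and the logistic curve are assumptions for the example only.

```python
import math
import random

def recovery_table(x, detected, predicted, edges):
    """Compare the recovered fraction with the mean predicted completeness per bin.

    z is the difference in units of the standard deviation expected if each
    detection were an independent Bernoulli draw with the predicted probability.
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, xi in enumerate(x) if lo <= xi < hi]
        if not idx:
            continue
        n = len(idx)
        actual = sum(detected[i] for i in idx) / n
        expected = sum(predicted[i] for i in idx) / n
        var = sum(predicted[i] * (1.0 - predicted[i]) for i in idx) / n ** 2
        z = (actual - expected) / math.sqrt(var) if var > 0 else 0.0
        rows.append({"lo": lo, "hi": hi, "n": n,
                     "actual": actual, "expected": expected, "z": z})
    return rows

# Toy stand-in for a fresh set of injected clusters run through the pipeline.
rng = random.Random(42)
log_mass = [rng.uniform(2.0, 6.0) for _ in range(5000)]
predicted = [1.0 / (1.0 + math.exp(-3.0 * (m - 4.0))) for m in log_mass]
detected = [1 if rng.random() < p else 0 for p in predicted]

table = recovery_table(log_mass, detected, predicted, [2.0, 3.0, 4.0, 5.0, 6.0])
for row in table:
    print(f"[{row['lo']}, {row['hi']}): n={row['n']}, "
          f"recovered={row['actual']:.3f}, predicted={row['expected']:.3f}, z={row['z']:+.2f}")
```

With a real network, large |z| values concentrated in particular bins would point to regions of parameter space where the learned function is miscalibrated.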

Figures

Figures reproduced from arXiv: 2604.05291 by Alan Zhang, Jianling Tang, Kathryn Grasha, Mark R. Krumholz, Tomasz Różański.

**Figure 1.** Illustration of the AST pipeline for artificial clusters with $r_{\rm eff} = 5$ pc. Top: white-light image of NGC 628 from the original LEGUS survey, with the “C” (central) field of view that we use in this paper outlined in black. The white streak across the image is an instrumental artefact caused by the chip gap in the ACS image, while the overlapping border in the lower-right corner is a stacking artefact arisi…

**Figure 2.** An illustration of the neural network used to estimate completeness. … and “photometric” networks. For both networks we define $\mathrm{MLP}_{\boldsymbol{\phi}} : \boldsymbol{\theta} \mapsto \hat{p}_{\rm obs} \in [0, 1]$ (equation 8), where $\boldsymbol{\phi}$ denotes the free parameters of the MLP trained to approximate the completeness function. Mathematically, the MLP for this binary classification task is implemented as a composition of several affine transformations, followed by element-wise nonlin…

**Figure 3.** Arithmetic means of the true completeness, $p_{\rm true}$ (black points), and the completeness values predicted by the physical and photometric neural networks, $\hat{p}_{\rm phys}$ (blue points) and $\hat{p}_{\rm phot}$ (orange points), for the test cluster sample binned by log cluster age.

**Figure 4.** Same as …

**Figure 5.** Same as …

**Figure 7.** Cumulative Distribution Function (CDF) of the absolute prediction error

**Figure 8.** Completeness derived by binning the AST data set (left) versus completeness predicted by the physical neural network (right). The binned prediction is estimated by binning injected clusters in the $(\log M, \log T)$ plane and computing the mean inclusion label per bin, while the NN prediction is evaluated on a fine grid and marginalised over an extinction prior $p(A_V)$ (equation 14).

**Figure 9.** Physical NN-predicted completeness $\hat{p}_{\rm phys}$ surface evaluated at $A_V = 0$, to consistently compare with literature completeness estimates from Adamo et al. (2017). The red solid curve shows the V band magnitude cut mapped into age-mass space, while the orange dashed curve marks the locus of masses and ages corresponding to the 90% completeness limit in V-band magnitude.

**Figure 10.** Completeness-corrected CAF. The black line shows the raw (uncorrected) age histogram, while orange and green lines show completeness-corrected CAFs assuming the CMFs from Adamo et al. (2017) and Tang et al. (2024), respectively. Shaded envelopes show the confidence intervals between the 16% and 84% percentiles of correction factors within each bin. The vertical shaded band shows ages > 200 Myr, beyond …
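
The two estimates compared in the Figure 8 caption can be sketched as follows: a binned estimate (mean inclusion label per cell of the mass-age plane) and a model estimate marginalised over an extinction prior. The function `toy_completeness` is a hypothetical stand-in for the trained physical network, and the exponential prior on $A_V$ is an assumption for illustration; neither is taken from the paper.

```python
import math

def binned_completeness(log_m, log_t, labels, m_edges, t_edges):
    """Mean inclusion label per (log M, log T) cell; None where a cell is empty."""
    n_m, n_t = len(m_edges) - 1, len(t_edges) - 1
    hits = [[0] * n_t for _ in range(n_m)]
    counts = [[0] * n_t for _ in range(n_m)]
    for m, t, y in zip(log_m, log_t, labels):
        i = next((a for a in range(n_m) if m_edges[a] <= m < m_edges[a + 1]), None)
        j = next((b for b in range(n_t) if t_edges[b] <= t < t_edges[b + 1]), None)
        if i is None or j is None:
            continue
        hits[i][j] += y
        counts[i][j] += 1
    return [[hits[i][j] / counts[i][j] if counts[i][j] else None
             for j in range(n_t)] for i in range(n_m)]

def toy_completeness(log_m, log_t, a_v):
    """Hypothetical smooth completeness surface standing in for the trained network."""
    limit = 3.0 + 0.6 * (log_t - 6.0) + 0.4 * a_v
    return 1.0 / (1.0 + math.exp(-4.0 * (log_m - limit)))

def marginalise_over_av(p_func, log_m, log_t, av_grid, prior_weights):
    """Prior-weighted average of p(log M, log T, A_V) over a grid in A_V."""
    norm = sum(prior_weights)
    return sum(w * p_func(log_m, log_t, a_v)
               for a_v, w in zip(av_grid, prior_weights)) / norm

cells = binned_completeness([3.5, 3.6, 4.5], [7.0, 7.2, 7.1], [0, 1, 1],
                            m_edges=[3.0, 4.0, 5.0], t_edges=[6.0, 8.0, 10.0])

av_grid = [0.05 * k for k in range(61)]               # A_V from 0 to 3 mag
prior = [math.exp(-a_v / 0.5) for a_v in av_grid]     # assumed exponential prior
p_marg = marginalise_over_av(toy_completeness, 4.0, 7.0, av_grid, prior)
p_zero = toy_completeness(4.0, 7.0, 0.0)
print(cells, round(p_marg, 3), round(p_zero, 3))
```

In the toy surface, extinction only ever lowers completeness, so the marginalised value sits below the zero-extinction value; this is one reason comparisons at fixed $A_V = 0$ and prior-averaged comparisons should not be mixed.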

Original abstract

Integrated-light star cluster catalogues in external galaxies are subject to complex, often poorly-characterised selection effects that can bias inferred cluster demographics and introduce significant uncertainties, limiting the physical parameter space accessible to analysis. To mitigate this problem, here we introduce the Cluster Completeness Correction Calculator (C-4): a new software tool to quantify and predict these effects in both physical and photometric parameter spaces. C-4 adds artificial star clusters to observed galaxy images, processes these images through the same detection and filtering steps used to construct the original cluster catalogue, and then trains multilayer perceptron neural networks to learn the resulting selection function. The trained neural networks provide continuous, differentiable completeness functions that can be used for direct completeness corrections or incorporated into forward models. We present a pilot application of C-4 to NGC 628, demonstrating that the learned selection operator is highly accurate and successfully captures the strongly non-separable dependence of completeness on mass, age, and extinction. Applying the completeness correction to NGC 628 extends the range of cluster demographic analyses by roughly an order of magnitude in both mass and age, and removes artificial flattening in the observed cluster mass and age distributions. These results establish neural-network-based completeness modelling as a powerful and general approach for recovering intrinsic cluster populations, and provide a scalable framework for modelling high-dimensional selection functions in resolved stellar population studies.
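
The workflow in the abstract (inject, run the pipeline, train on the resulting labels) can be illustrated with a toy end-to-end sketch. Everything here is an assumption for illustration: `mock_pipeline_detects` replaces the real image injection and LEGUS detection steps with a simple probabilistic rule, and the one-hidden-layer tanh network trained by plain full-batch gradient descent is far smaller than anything a real application would use.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))

def mock_pipeline_detects(log_m, log_t, a_v, rng):
    """Hypothetical stand-in for injection + detection + filtering."""
    limit = 3.0 + 0.5 * (log_t - 6.0) + 0.4 * a_v
    return 1 if rng.random() < sigmoid(3.0 * (log_m - limit)) else 0

def make_training_set(n, rng):
    xs, ys = [], []
    for _ in range(n):
        log_m, log_t, a_v = rng.uniform(2, 6), rng.uniform(6, 10), rng.uniform(0, 2)
        xs.append(((log_m - 4.0) / 2.0, (log_t - 8.0) / 2.0, a_v - 1.0))  # rescale to [-1, 1]
        ys.append(mock_pipeline_detects(log_m, log_t, a_v, rng))
    return xs, ys

def forward(net, x):
    w1, b1, w2, b2 = net
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

def mean_bce(net, xs, ys):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(forward(net, x)[1], 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def train(xs, ys, n_hidden=8, epochs=200, lr=0.3, seed=1):
    rng = random.Random(seed)
    n_in, n = len(xs[0]), len(xs)
    w1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
        g_b1, g_w2, g_b2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, y in zip(xs, ys):
            h, p = forward((w1, b1, w2, b2), x)
            d_logit = (p - y) / n              # gradient of mean BCE w.r.t. the logit
            g_b2 += d_logit
            for j in range(n_hidden):
                g_w2[j] += d_logit * h[j]
                d_z = d_logit * w2[j] * (1.0 - h[j] ** 2)   # back through tanh
                g_b1[j] += d_z
                for k in range(n_in):
                    g_w1[j][k] += d_z * x[k]
        for j in range(n_hidden):              # gradient-descent update
            w2[j] -= lr * g_w2[j]
            b1[j] -= lr * g_b1[j]
            for k in range(n_in):
                w1[j][k] -= lr * g_w1[j][k]
        b2 -= lr * g_b2
    return w1, b1, w2, b2

xs, ys = make_training_set(200, random.Random(0))
loss_before = mean_bce(train(xs, ys, epochs=0), xs, ys)
trained = train(xs, ys)
loss_after = mean_bce(trained, xs, ys)
print(f"mean cross-entropy: {loss_before:.3f} before, {loss_after:.3f} after training")
```

The point of the sketch is the data flow, not the optimiser: the labels come only from the (mock) pipeline outcome, so the network learns whatever selection behaviour the pipeline encodes, including any mismatch between artificial and real clusters.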

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

C-4 gives a workable neural-net route to model non-separable completeness in cluster surveys, but the accuracy still depends on unverified fidelity between artificial and real cluster injections.

The paper introduces C-4 as a tool that adds artificial clusters to real galaxy images, runs the same detection pipeline used for the catalog, and trains multilayer perceptrons to output a continuous completeness function in mass, age, extinction, and other parameters. The pilot on LEGUS NGC 628 data claims this captures the strong non-separable dependencies and extends the usable range by roughly an order of magnitude while removing artificial flattening in the mass and age distributions. That is the core new piece: a scalable, differentiable selection operator learned directly from forward simulations rather than from separable analytic approximations.

The approach is practical for anyone who already has image simulations and wants to fold the selection function into demographic modeling or forward fits. It is a clear step beyond the usual completeness maps that treat parameters independently. The execution looks competent on the simulation side, with the networks apparently interpolating well within the injected population.

The main soft spot is exactly the one the stress-test flags. All labels come from the artificial clusters, so any mismatch in morphology, crowding, PSF, or background structure between the synthetics and real clusters will propagate straight into the learned function with no independent check from actual detections. The abstract asserts high accuracy, but without the full validation plots, hold-out tests on real clusters, or sensitivity runs on injection parameters, it is hard to judge how large that systematic could be. Standard train-test splits on the synthetics only confirm internal consistency, not transfer to the real data distribution.

This work is aimed at people doing resolved cluster demographics in nearby galaxies, especially those using HST or similar surveys where selection biases are high-dimensional.

It is worth sending to peer review because the method is new, the problem is real, and the demonstration is on actual survey data. Referees can push on the validation gaps and ask for more robustness checks, but the idea itself is solid enough to deserve that discussion.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Cluster Completeness Correction Calculator (C-4), a framework that injects artificial star clusters into observed galaxy images, processes them through the same detection and filtering pipeline as the real catalog, and trains multilayer perceptron neural networks to learn a continuous, differentiable completeness function in physical and photometric parameter space. In a pilot application to the LEGUS survey of NGC 628, the authors claim that the trained networks achieve high accuracy, capture strongly non-separable dependencies on mass, age, and extinction, and that applying the correction extends the accessible range of cluster demographic analyses by roughly an order of magnitude while removing artificial flattening in the observed mass and age distributions.

Significance. If the core assumption holds, C-4 offers a scalable, general method for modeling high-dimensional selection functions in resolved stellar populations, directly addressing a long-standing limitation in extragalactic cluster studies. The provision of a trained, differentiable completeness operator that can be used in forward modeling or direct corrections is a concrete advance over traditional binned or parametric approaches.

major comments (3)

[Methods and Results] The central claim that the learned selection operator is 'highly accurate' and successfully captures non-separable dependencies rests on the untested fidelity of the artificial-cluster injection procedure to real detection, photometry, and filtering decisions. No independent ground-truth comparison (e.g., against real clusters with known properties or against an alternative completeness estimator) is presented; all reported accuracy metrics derive from train/test splits within the synthetic population.
[Application to NGC 628] The reported order-of-magnitude extension in mass and age range for NGC 628, and the removal of artificial flattening, are direct consequences of the completeness function derived from the synthetic injections. Without a quantitative assessment of how mismatches in morphology, crowding, background structure, or PSF convolution between synthetic and real clusters propagate into the learned function, the demographic corrections remain provisional.
[Neural Network Training] The manuscript does not report the specific network architecture details, hyperparameter choices, loss function, or regularization strategy used for the multilayer perceptrons, nor does it include ablation tests showing that the non-separable behavior is learned rather than imposed by the training distribution.

minor comments (2)

[Figures] Figure captions should explicitly state the range of injected parameters (mass, age, extinction) and the number of synthetic clusters used for training and testing.
[Discussion] The abstract states that the networks 'provide continuous, differentiable completeness functions'; the main text should include an explicit statement of how these functions are evaluated or interpolated for use in downstream demographic modeling.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for their constructive and detailed comments, which highlight important limitations in the current presentation of the C-4 framework. We respond point-by-point to the major comments below, indicating where we will revise the manuscript to address the concerns raised.

Referee: [Methods and Results] The central claim that the learned selection operator is 'highly accurate' and successfully captures non-separable dependencies rests on the untested fidelity of the artificial-cluster injection procedure to real detection, photometry, and filtering decisions. No independent ground-truth comparison (e.g., against real clusters with known properties or against an alternative completeness estimator) is presented; all reported accuracy metrics derive from train/test splits within the synthetic population.

Authors: We agree that all reported accuracy metrics are internal to the synthetic population and that no independent validation against real clusters with known properties is provided. This is an inherent limitation of the pilot study, as true physical parameters for the undetected or marginally detected real clusters are not independently known. The method's design relies on applying the identical detection and filtering pipeline to injected clusters, which provides a self-consistent estimate of the selection function. In the revised manuscript we will add an explicit discussion of this assumption and its caveats, including potential differences in morphology and background, while tempering the language around 'highly accurate' to reflect the synthetic nature of the validation.

revision: partial

Referee: [Application to NGC 628] The reported order-of-magnitude extension in mass and age range for NGC 628, and the removal of artificial flattening, are direct consequences of the completeness function derived from the synthetic injections. Without a quantitative assessment of how mismatches in morphology, crowding, background structure, or PSF convolution between synthetic and real clusters propagate into the learned function, the demographic corrections remain provisional.

Authors: The referee is correct that the reported extensions and corrections are provisional in the absence of a quantitative propagation of injection mismatches. We will revise the application section to include sensitivity tests that vary key injection parameters (e.g., cluster morphology, PSF convolution, and local background levels) and report the resulting variation in the derived completeness function and demographic corrections. This will provide a quantitative bound on the robustness of the NGC 628 results.

revision: yes

Referee: [Neural Network Training] The manuscript does not report the specific network architecture details, hyperparameter choices, loss function, or regularization strategy used for the multilayer perceptrons, nor does it include ablation tests showing that the non-separable behavior is learned rather than imposed by the training distribution.

Authors: We will add a dedicated subsection in the Methods describing the multilayer perceptron architecture (number of layers, neurons per layer, activation functions), the hyperparameter search procedure, the loss function (binary cross-entropy), and the regularization techniques employed (dropout and early stopping). We will also include ablation experiments that systematically remove or mask input features (mass, age, extinction, local background) and demonstrate that the non-separable dependencies emerge from the training data rather than being artifacts of the sampling distribution.

revision: yes
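
For readers unfamiliar with two of the ingredients named in the simulated response, here is a minimal sketch of a binary cross-entropy loss and an early-stopping rule. The function names, the `patience` and `min_delta` parameters, and the loss values are illustrative assumptions; the source does not specify how the paper implements either.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean of -[y ln p + (1 - y) ln(1 - p)] over the sample."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # keep the logarithms finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best epoch, best loss), stopping once the validation loss has
    failed to improve by more than min_delta for `patience` epochs in a row."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_loss

# A prediction of 0.5 for every cluster gives a loss of ln 2 regardless of the labels.
loss = binary_cross_entropy([1, 0], [0.5, 0.5])

# Made-up validation curve: improvement stalls after epoch 2, so training stops
# before the late drop at epoch 6 is ever seen.
best = early_stopping_epoch([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5], patience=3)
print(loss, best)
```

The second example shows the trade-off built into early stopping: a short patience guards against overfitting but can also halt training before a later improvement.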

standing simulated objections (not resolved)

Independent ground-truth comparison against real clusters with known properties
Complete quantitative assessment of all possible mismatches between synthetic and real cluster injections without new observational data

Circularity Check

0 steps flagged

No significant circularity; forward simulation and external pipeline provide independent basis

The paper's core method injects artificial clusters into real observed images, runs them through the identical LEGUS detection and filtering pipeline, and trains MLPs on the resulting binary detection labels to learn a continuous completeness function. This learned operator is then applied to the real catalog. No step reduces by construction to its own inputs: the training data derive from external images and a fixed external pipeline rather than from the target cluster demographics or from parameters fitted to the final corrected distributions. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The derivation chain therefore remains self-contained against external benchmarks (the LEGUS images and pipeline) and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that simulated cluster injections faithfully reproduce real detection processes and that neural networks can learn the resulting high-dimensional selection function from finite simulations.

free parameters (1)

multilayer perceptron architecture and training parameters
Number of layers, neurons per layer, activation functions, and optimization hyperparameters are chosen to fit the learned completeness function but not specified in the abstract.

axioms (1)

domain assumption: Artificial clusters added to real galaxy images undergo identical detection, photometry, and filtering steps as real clusters.
This assumption underpins the entire simulation-based training procedure described in the abstract.

pith-pipeline@v0.9.0 · 5570 in / 1486 out tokens · 73070 ms · 2026-05-10T19:53:22.688455+00:00 · methodology

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages · 1 internal anchor

[1]

Adamo A., et al., 2017, ApJ, 841, 131
Arora R., Krumholz M. R., Federrath C., 2021, MNRAS, 508, 3290
Bertin E., Arnouts S., 1996, A&AS, 117, 393
Boulares A., Cox D. P., 1990, ApJ, 365, 544
Calzetti D., et al., 2015, AJ, 149, 51
Cerviño M., Luridiana V., 2004, A&A, 413, 145
Cerviño M., Valls-Gabaud D., 2003, MNRAS, 338, 481
Chabrier G. ...

work page doi:10.1007/978-1-4020-3407-7_5 2017
[2]

Attention Is All You Need

Pedregosa F., et al., 2011, Journal of Machine Learning Research, 12, 2825
Rumelhart D. E., Hinton G. E., Williams R. J., 1986, Nature, 323, 533
Ryon J. E., et al., 2017, ApJ, 841, 92
Tang J., Grasha K., Krumholz M. R., 2023, arXiv e-prints, p. arXiv:2301.05912
Tang J., Grasha K., Krumholz M. R., 2024, MNRAS, 532, 4583
Vaswani A., Shazeer N., Parmar N., U...

work page internal anchor Pith review doi:10.48550/arxiv.1706.03762 2011
[3]

Although the validation loss begins to plateau at $N \approx 5 \times 10^{4}$, the curves continue to decrease slightly up to $N = 2.5 \times 10^{5}$, while the improvement beyond this point is minimal

Beyond this point, doubling the training set yields marginal improvement in validation loss, $\Delta\mathrm{CE} \lesssim 0.001$, while substantially increasing computational cost. Although the validation loss begins to plateau at $N \approx 5 \times 10^{4}$, the curves continue to decrease slightly up to $N = 2.5 \times 10^{5}$, while the improvement beyond this point is minimal. We therefore adopt a sample size of $N =$ ...

work page 2024
