Probing the faint end of simulated galaxy counts at z>3
Pith reviewed 2026-05-20 18:43 UTC · model grok-4.3
The pith
Hydrodynamical simulations underproduce the faint compact galaxies seen in deep near-infrared observations at redshifts above 3.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The discrepancy between observed and simulated faint-end galaxy counts at redshifts greater than 3 arises both from detection losses of diffuse galaxies and, more fundamentally, from the inability of current hydrodynamical simulations to produce enough faint compact galaxies. Forward modeling into mock images demonstrates that the deficit persists even after accounting for completeness and that increasing depth alone cannot resolve it, as simulations favor diffuse low-surface-brightness systems over the compact cores seen in data.
What carries the argument
Forward modeling of simulated lightcone catalogs into mock observational images to enable direct comparison of detected sources, which reveals systematic differences in galaxy structure between simulations and observations.
If this is right
- The faint-end deficit appears consistently across different observational fields at redshifts above 3.
- Structural analysis shows simulations produce more diffuse low-surface-brightness galaxies and fewer compact systems with bright cores.
- Increasing the depth of mock images recovers counts near the completeness peak but overpredicts the faintest sources.
- The tension indicates that adjustments to modeling of early star formation, feedback, and dust treatment are needed.
Where Pith is reading between the lines
- The structural mismatch may require revisions to how galaxy sizes and concentrations are regulated in simulations at early times.
- Resolving the count discrepancy could alter estimates of the total contribution from faint galaxies to the extragalactic background light.
- Similar forward-modeling tests applied to other simulation suites could isolate whether the compact-galaxy deficit is widespread.
Load-bearing premise
The forward-modeling procedure and completeness corrections accurately capture all observational selection effects, allowing remaining differences to be attributed to simulation physics rather than unmodeled biases.
What would settle it
Deeper imaging that reveals a population of faint compact galaxies with bright central cores whose number and properties match the simulation predictions without overproducing the very faintest sources would challenge the conclusion that simulations fundamentally lack these objects.
Original abstract
Simulations and observations now probe comparable redshift regimes with unprecedented accuracy, enabling direct consistency tests through forward modeling. In a previous work, we identified a faint-end discrepancy between observed and simulated near-infrared galaxy counts in CANDELS GOODS-South. Here we investigate whether this tension originates from the forward-modeling procedure or from limitations of the underlying simulations, and we characterize the galaxy populations responsible for the tension. Using the FORECAST forward-modeling code, we generated ten independent light-cone realizations and mock CANDELS images from the TNG100 and EAGLE simulations. We compared both the intrinsic light-cone catalogs and the mock-image detections with observations, testing dependencies on field and redshift, and validating the pipeline through stellar mass and multi-band analyses. The faint-end deficit is present in all CANDELS fields and appears at z>3 in both simulations. GOODS-South counts corrected for completeness exceed intrinsic simulation counts already at the 50% completeness limit, indicating that the missing population is not simply hidden below the detection threshold. Increasing the depth of mock images recovers the counts near the peak but overpredicts the faintest sources, showing that depth alone cannot resolve the discrepancy. Structural analyses reveal that compact galaxies with bright central cores observed in GOODS-South are underproduced in simulations, which instead favor diffuse low-surface-brightness systems. We conclude that the discrepancy arises both from detection losses of diffuse galaxies and, more fundamentally, from the inability of current hydrodynamical simulations to produce enough faint compact galaxies at z>3. This tension points to the need for improved modeling of early star formation, feedback, and dust treatment.
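The comparison at the 50% completeness limit can be made concrete with a small sketch. The abstract does not say how the completeness correction is computed, so what follows assumes the common approach of dividing detected counts in each magnitude bin by the recovered fraction. The function names and the numbers are hypothetical, chosen only for illustration.

```python
def corrected_counts(raw_counts, completeness, min_completeness=0.5):
    """Divide detected counts per magnitude bin by the completeness fraction.

    Bins below min_completeness are returned as None, because the
    correction becomes unreliable once most sources are missed.
    """
    if len(raw_counts) != len(completeness):
        raise ValueError("raw_counts and completeness must have equal length")
    corrected = []
    for n, c in zip(raw_counts, completeness):
        corrected.append(n / c if c >= min_completeness else None)
    return corrected


def magnitude_at_completeness(mags, completeness, level=0.5):
    """Linearly interpolate the magnitude where completeness falls to `level`.

    Assumes completeness decreases toward fainter (larger) magnitudes.
    Returns None if the curve never crosses `level`.
    """
    for i in range(len(mags) - 1):
        c0, c1 = completeness[i], completeness[i + 1]
        if c0 >= level > c1:
            frac = (c0 - level) / (c0 - c1)
            return mags[i] + frac * (mags[i + 1] - mags[i])
    return None


# Illustrative numbers only (not taken from the paper).
mags = [24.0, 25.0, 26.0, 27.0]
completeness = [1.0, 0.9, 0.6, 0.2]
detected = [100, 180, 240, 90]

print(corrected_counts(detected, completeness))
print(magnitude_at_completeness(mags, completeness))
```

If the corrected observed counts in the last reliable bin already exceed the intrinsic (pre-detection) simulation counts, the shortfall cannot be explained by sources hiding below the detection threshold, which is the argument the abstract makes.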
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper forward-models the TNG100 and EAGLE simulations into mock CANDELS images using the FORECAST code across ten independent lightcone realizations. It compares both intrinsic lightcone catalogs and mock detections to observed counts and finds a persistent faint-end deficit in the simulations at z>3: completeness-corrected GOODS-South counts exceed the intrinsic simulation counts already at the 50% completeness limit. The deficit is attributed to underproduction of compact galaxies with bright cores in the simulations, alongside some detection losses of diffuse systems.
Significance. If the central claim holds, the work strengthens the case for limitations in current hydrodynamical simulations regarding early star formation, feedback, and dust at z>3, while demonstrating the value of multi-realization forward modeling for isolating simulation physics from observational selection. The reported field-to-field consistency and structural diagnostics provide a solid basis for the tension identified.
major comments (2)
- [completeness and detection analysis] § on completeness and detection (abstract and results): the claim that completeness-corrected CANDELS counts exceed intrinsic simulation counts at the 50% limit is load-bearing for ruling out simple threshold losses, but the manuscript provides limited detail on exact completeness modeling, error propagation, and how post-hoc choices in FORECAST affect the z>3 comparison; this needs explicit quantification to support attribution to simulation physics.
- [structural analyses] § on structural analyses: the distinction between observed compact galaxies and simulated diffuse systems assumes identical morphology measurement (e.g., concentration or core brightness) in mocks versus data. The abstract mentions validation via stellar mass and multi-band checks but does not detail how PSF, noise, background subtraction, or source-extraction thresholds are matched for different surface-brightness profiles; this is central to the claim that simulations underproduce faint compact galaxies.
minor comments (2)
- [methods] Clarify in the methods whether the ten lightcone realizations are fully independent or share initial conditions, and report the exact field-to-field variance in the counts.
- [results] The abstract states that increasing mock depth recovers counts near the peak but overpredicts the faintest sources; add a quantitative statement on the magnitude range where this transition occurs.
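The variance requested in the first minor comment could be summarized per magnitude bin along the following lines. This is a minimal sketch and not the paper's procedure: it assumes the realizations are independent, treats the scatter beyond the Poisson expectation as a proxy for cosmic variance, and uses a hypothetical function name and made-up counts.

```python
import math
import statistics


def count_uncertainty(counts_per_realization):
    """Summarize one magnitude bin across independent lightcone realizations.

    Returns (mean, poisson_sigma, excess_sigma, total_sigma), where
    total_sigma is the sample standard deviation across realizations and
    excess_sigma is the part of that scatter not explained by Poisson noise.
    """
    if len(counts_per_realization) < 2:
        raise ValueError("need at least two realizations")
    mean = statistics.mean(counts_per_realization)
    variance = statistics.variance(counts_per_realization)  # sample variance (n - 1)
    poisson_sigma = math.sqrt(mean)
    excess_sigma = math.sqrt(max(variance - mean, 0.0))
    total_sigma = math.sqrt(variance)
    return mean, poisson_sigma, excess_sigma, total_sigma


# Ten made-up counts for a single magnitude bin, one per realization.
counts = [90, 110, 100, 95, 105, 120, 80, 100, 115, 85]
mean, poisson_sigma, excess_sigma, total_sigma = count_uncertainty(counts)
print(mean, round(poisson_sigma, 2), round(excess_sigma, 2), round(total_sigma, 2))
```

If the realizations share initial conditions, the scatter measured this way would understate the true field-to-field variance, which is why the independence question in the comment matters.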
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of our manuscript. The comments have identified areas where additional methodological detail will strengthen the presentation of our results. We address each major comment below and indicate the changes we will make in revision.
- Referee: [completeness and detection analysis] § on completeness and detection (abstract and results): the claim that completeness-corrected CANDELS counts exceed intrinsic simulation counts at the 50% limit is load-bearing for ruling out simple threshold losses, but the manuscript provides limited detail on exact completeness modeling, error propagation, and how post-hoc choices in FORECAST affect the z>3 comparison; this needs explicit quantification to support attribution to simulation physics.
  Authors: We agree that explicit quantification of the completeness procedure is required to support the claim that the discrepancy is not due to simple detection threshold effects. In the revised manuscript we will expand the completeness and detection section to provide a full description of the modeling, including the precise algorithm used to compute completeness fractions, the propagation of Poisson and cosmic-variance uncertainties across the ten lightcone realizations, and the sensitivity of the z>3 comparison to the specific post-hoc choices made in FORECAST (e.g., source-extraction parameters and background estimation). We will include supplementary figures showing completeness curves versus magnitude for each field and redshift bin, together with the resulting corrected counts and their uncertainties at the 50% limit.
  Revision: yes
- Referee: [structural analyses] § on structural analyses: the distinction between observed compact galaxies and simulated diffuse systems assumes identical morphology measurement (e.g., concentration or core brightness) in mocks versus data. The abstract mentions validation via stellar mass and multi-band checks but does not detail how PSF, noise, background subtraction, or source-extraction thresholds are matched for different surface-brightness profiles; this is central to the claim that simulations underproduce faint compact galaxies.
  Authors: We thank the referee for emphasizing the importance of demonstrating that morphological measurements are performed identically on mocks and data. The FORECAST pipeline convolves the simulated images with the CANDELS PSF, adds realistic noise, and applies the same background subtraction and source-extraction settings used on the observations. Nevertheless, we acknowledge that the current text does not provide sufficient detail on how these steps are tuned for galaxies with differing surface-brightness profiles. In revision we will add a dedicated subsection that (i) specifies the exact source-extraction thresholds and concentration/core-brightness definitions applied to both datasets, (ii) presents validation tests in which input mock morphologies are recovered after the full observational processing, and (iii) shows direct comparisons of the resulting structural-parameter distributions for the faint z>3 population.
  Revision: yes
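The processing steps named in the rebuttal (PSF convolution, added noise, then a structural measurement applied identically to every source) can be illustrated with a toy example. This is a sketch under stated assumptions and not FORECAST's implementation: the PSF is taken to be a Gaussian, the noise is Gaussian with a fixed seed, and `core_fraction` is a hypothetical stand-in for whatever concentration or core-brightness definition the authors adopt.

```python
import math
import random


def gaussian_kernel(sigma, half_width):
    """Return a normalized 2D Gaussian kernel (assumed PSF shape) as rows."""
    size = 2 * half_width + 1
    kernel = [
        [math.exp(-((x - half_width) ** 2 + (y - half_width) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def convolve(image, kernel):
    """Direct 2D convolution; flux that would fall outside the image is discarded."""
    ny, nx = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            flux = image[y][x]
            if flux == 0.0:
                continue
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < ny and 0 <= xx < nx:
                        out[yy][xx] += flux * kernel[dy + half][dx + half]
    return out


def add_gaussian_noise(image, sigma, seed=0):
    """Add independent Gaussian noise to every pixel (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in image]


def core_fraction(image, center, radius):
    """Fraction of the total flux within `radius` pixels of `center` (y, x)."""
    cy, cx = center
    total = sum(sum(row) for row in image)
    inner = sum(
        value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
    )
    return inner / total


def disc_image(size, center, radius, total_flux):
    """Uniform disc of the given radius carrying `total_flux` in total."""
    cy, cx = center
    pixels = [(y, x) for y in range(size) for x in range(size)
              if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2]
    image = [[0.0] * size for _ in range(size)]
    for y, x in pixels:
        image[y][x] = total_flux / len(pixels)
    return image


size, center = 21, (10, 10)
psf = gaussian_kernel(sigma=1.5, half_width=4)
compact = disc_image(size, center, radius=0, total_flux=100.0)  # single bright pixel
diffuse = disc_image(size, center, radius=6, total_flux=100.0)  # same flux, spread out

for name, source in (("compact", compact), ("diffuse", diffuse)):
    observed = add_gaussian_noise(convolve(source, psf), sigma=0.01, seed=1)
    print(name, round(core_fraction(observed, center, radius=2), 3))
```

Both sources carry the same total flux, so any difference in the printed core fraction reflects structure alone. A validation test of the kind promised in item (ii) would check that such input differences survive the full processing chain at realistic noise levels.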
Circularity Check
No significant circularity; comparisons use external simulations and independent forward modeling.
The paper generates mock CANDELS images from independent external simulations (TNG100 and EAGLE) via the FORECAST pipeline and directly compares intrinsic catalogs, detected counts, and structural properties to observational data. The central claim about missing faint compact galaxies follows from these count excesses (already at 50% completeness) and morphology differences, without any quantity being defined in terms of itself or a prediction forced by fitting the target dataset. The reference to prior work merely contextualizes the known discrepancy; the present analysis re-derives and extends the result across fields, depths, and structural metrics using new realizations. This keeps the chain self-contained against external benchmarks rather than reducing to self-citation or tautological input.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The TNG100 and EAGLE simulations provide a sufficiently realistic representation of galaxy formation physics at z>3 for the purposes of this count comparison.