Predicted number counts and clustering of Hi galaxies from future radio surveys

Ainulnabilah Nasirudin; Isabelle Ye; Philip Bull

arxiv: 2604.26886 · v1 · submitted 2026-04-29 · 🌌 astro-ph.CO

Predicted number counts and clustering of Hi galaxies from future radio surveys

Ainulnabilah Nasirudin , Philip Bull , Isabelle Ye This is my paper

Pith reviewed 2026-05-07 12:00 UTC · model grok-4.3

classification 🌌 astro-ph.CO

keywords galaxiesclusteringcosmologicalcountsgalaxynumberpropertiesradio

0 comments

The pith

Predictions of HI galaxy number counts, redshift distributions, and clustering for SKA-MID using S3-SAX, GAEA, and IllustrisTNG simulations, with HOD fits for sample variance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neutral hydrogen in galaxies emits a radio signal at 21 cm that can be used to measure distances precisely. The paper takes three different computer models of how galaxies form and evolve, each built on different assumptions about dark matter and gas physics. From these models it extracts how many galaxies would be visible to a future radio telescope array called SKA-MID at various depths and distances. It then estimates how well such a survey could measure the large-scale structure of the universe and how much the results might vary across different patches of sky.

Core claim

We present predictions for galaxy number counts as a function of sensitivity cut and redshift, and use these to forecast the cosmological performance of a proposed SKA-MID cosmological survey. Finally, we fit a halo occupation distribution model to low-redshift angular correlation functions to constrain clustering properties.

Load-bearing premise

The three simulations (S3-SAX, GAEA, IllustrisTNG) accurately capture the HI content, distribution, and clustering of real galaxies across the relevant redshifts and luminosities; any systematic mismatch would directly affect the number-count and cosmological forecasts.

read the original abstract

The 21cm emission line from neutral hydrogen (HI) contained within galaxies provides a way to make accurate spectroscopic redshift determinations in the radio part of the spectrum. Large radio arrays such as SKA-MID are coming online that will have the sensitivity and survey time required to catalogue hundreds of thousands to millions of HI galaxies, opening up the possibility of studying the cosmological large scale structure using this technique. The expected number counts and clustering properties of the galaxies are still quite poorly understood however. We use three different simulated galaxy catalogues to predict the properties of the HI galaxy distribution that SKA-MID will be able to observe, along with estimates of the error on these predictions due to modelling uncertainty. The simulations in question are from S$^3$-SAX (semi-analytic models based on the Millennium dark matter-only simulation); GAEA (an updated semi-analytic model partially calibrated on hydrodynamical simulations); and IllustrisTNG (a hydrodynamical simulation). We present predictions for galaxy number counts as a function of sensitivity cut and redshift, and use these to forecast the cosmological performance of a proposed SKA-MID cosmological survey. Finally, we fit a halo occupation distribution model to low-redshift angular correlation functions to constrain clustering properties of multiple sub-volumes of the simulations to gain insight into the expected variation (sample variance) over smaller survey areas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript uses three simulated galaxy catalogs (S³-SAX semi-analytic, GAEA semi-analytic, and IllustrisTNG hydrodynamical) to predict HI galaxy number counts as a function of sensitivity cut and redshift for SKA-MID, forecasts the cosmological performance of a proposed SKA-MID survey, and fits halo occupation distribution (HOD) models to low-redshift angular correlation functions from simulation sub-volumes to assess clustering variations and sample variance.

Significance. If the simulated HI populations prove faithful to observations, the work would supply practical forecasts for number counts, survey optimization, and cosmological constraints from future SKA-MID HI galaxy surveys. The explicit use of three independent simulation suites to bracket modeling uncertainty is a clear strength, as it supplies a concrete (if incomplete) estimate of systematic spread in the headline predictions.

major comments (3)

[Abstract and simulation description] Abstract and simulation description: The headline number-count predictions and SKA-MID cosmological forecasts rest entirely on the fidelity of the three input catalogs in reproducing the HI mass function, luminosity distribution, and bias across the relevant flux limits and redshifts. No quantitative comparison of the simulated HI number densities or mass functions against existing low-z observational constraints (e.g., ALFALFA or HIPASS) is presented, so the quoted inter-simulation spread cannot be shown to bound the true systematic error.
[Cosmological performance forecast section] Cosmological performance forecast section: The mapping from predicted number counts to forecasted cosmological constraints does not include an error budget that propagates potential correlated biases in the HI prescriptions (gas cooling, feedback, low-mass halo resolution) across the three suites; the modeling uncertainty is therefore likely underestimated.
[HOD fitting and clustering section] HOD fitting and clustering section: The HOD fits to low-redshift angular correlation functions are used to quantify sample variance in clustering, yet no goodness-of-fit metrics, parameter posteriors, or covariance information are reported, making it impossible to judge whether the derived variations are robust or dominated by fitting noise.

minor comments (2)

[Figures] Figure captions and axis labels should explicitly state the flux or sensitivity cuts used for each curve to avoid ambiguity when comparing number-count panels.
[Methods] The manuscript should clarify whether the three simulations share any common sub-grid physics or calibration data that could induce correlated systematics in the HI properties.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment point by point below and have revised the manuscript to incorporate the suggested improvements where appropriate.

read point-by-point responses

Referee: [Abstract and simulation description] Abstract and simulation description: The headline number-count predictions and SKA-MID cosmological forecasts rest entirely on the fidelity of the three input catalogs in reproducing the HI mass function, luminosity distribution, and bias across the relevant flux limits and redshifts. No quantitative comparison of the simulated HI number densities or mass functions against existing low-z observational constraints (e.g., ALFALFA or HIPASS) is presented, so the quoted inter-simulation spread cannot be shown to bound the true systematic error.

Authors: We agree that a direct quantitative comparison to low-redshift observational constraints would provide important context and help readers evaluate whether the inter-simulation spread is representative of the true systematic uncertainty. Although the manuscript's primary emphasis is on differences arising from distinct modeling approaches, we will add a new subsection (and associated figure) in the simulation description that compares the HI mass functions and number densities from S³-SAX, GAEA, and IllustrisTNG against ALFALFA and HIPASS data at z ≈ 0. This addition will explicitly discuss how well each simulation reproduces the observed HI mass function and will clarify the limitations of using the simulation spread alone as an error estimate. revision: yes
Referee: [Cosmological performance forecast section] Cosmological performance forecast section: The mapping from predicted number counts to forecasted cosmological constraints does not include an error budget that propagates potential correlated biases in the HI prescriptions (gas cooling, feedback, low-mass halo resolution) across the three suites; the modeling uncertainty is therefore likely underestimated.

Authors: We appreciate the referee's point on the propagation of correlated modeling biases. The three simulations employ substantially different HI modeling frameworks, which already captures a range of uncertainties in gas physics and halo resolution. Nevertheless, we acknowledge that certain systematic effects (e.g., feedback prescriptions or low-mass halo resolution limits) could be partially correlated. In the revised manuscript we will expand the discussion within the cosmological forecasts section to include a qualitative assessment of these potential correlations, explicitly stating that the quoted modeling uncertainty represents a lower bound. We will also add a corresponding caveat to the abstract and conclusions. revision: partial
Referee: [HOD fitting and clustering section] HOD fitting and clustering section: The HOD fits to low-redshift angular correlation functions are used to quantify sample variance in clustering, yet no goodness-of-fit metrics, parameter posteriors, or covariance information are reported, making it impossible to judge whether the derived variations are robust or dominated by fitting noise.

Authors: We agree that reporting fit quality metrics and parameter information is necessary for readers to assess the robustness of the sample-variance estimates. In the revised manuscript we will augment the HOD section with reduced χ² (or equivalent goodness-of-fit) values for each sub-volume fit, the best-fit HOD parameters together with their uncertainties, and a brief description of the covariance matrix used in the fitting procedure. These additions will allow a clear evaluation of whether the reported clustering variations are dominated by fitting noise. revision: yes

Circularity Check

0 steps flagged

No significant circularity; predictions extracted from independent external simulations

full rationale

The paper derives its headline number counts, redshift distributions, and SKA-MID cosmological forecasts by directly processing galaxy catalogs from three pre-existing, independent simulation suites (S³-SAX, GAEA, IllustrisTNG). The only fitting step—HOD modeling of low-redshift angular correlation functions—is performed on sub-volumes of those same simulations solely to quantify internal sample variance, not to generate or redefine the primary counts or forecasts. No self-citations, self-definitional loops, fitted inputs renamed as predictions, or ansatzes imported from prior author work appear in the derivation chain. The results remain tied to external simulation outputs rather than reducing to quantities constructed from the paper's own fitted parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters or axioms; standard cosmological assumptions (flat LCDM, halo mass functions) are implicitly used but not listed.

pith-pipeline@v0.9.0 · 5541 in / 1035 out tokens · 46986 ms · 2026-05-07T12:00:27.248828+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HI Simulations for Cosmology with the SKA Observatory
astro-ph.CO 2026-06 unverdicted novelty 2.0

Overview of HI modeling methods finds consistency in cosmic HI density but systematic differences in HI-halo mass relation shape and redshift evolution.