arxiv: 2604.26890 · v1 · submitted 2026-04-29 · 🌌 astro-ph.IM

Recognition: unknown

Bayesian component separation and power spectrum estimation for 21 cm intensity mapping data cubes

Geoff G. Murphy , Philip Bull , Mario G. Santos , Zheng Zhang , Steven Cunnington

Authors on Pith no claims yet

Pith reviewed 2026-05-07 11:47 UTC · model grok-4.3

classification 🌌 astro-ph.IM

keywords 21 cm intensity mappingforeground removalBayesian component separationGibbs samplingpower spectrum estimationRFI flaggingintensity mapping

0 comments

The pith

Bayesian sampling of a joint parametric model separates foregrounds from the 21 cm signal at the map level and recovers the HI power spectrum within statistical uncertainties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models simulated single-dish 21 cm intensity mapping observations using a parametric Bayesian framework that jointly describes foreground emission, the cosmological HI signal, and instrumental noise. Gibbs sampling draws the spectral parameters while Gaussian constrained realizations generate the spatial map coefficients, allowing efficient sampling from the posterior even with more than two million free parameters. This process yields map-level component separations together with uncertainties and produces a one-dimensional HI power spectrum estimate that lies within 2 sigma of the input model. The same machinery remains effective when radio frequency interference removes frequency channels, because the constrained realizations fill the gaps with statistically consistent values drawn from the posterior.

Core claim

Sampling the posterior distribution with Gibbs sampling for the spectral indices and Gaussian constrained realizations for the map amplitudes allows the model to jointly infer the foreground and 21 cm components while recovering the one-dimensional HI power spectrum to within the reported statistical uncertainties, even in the presence of strong foreground contamination and frequency channel flagging.

What carries the argument

Gibbs sampling combined with Gaussian constrained realizations to draw samples from the posterior of a parametric model containing foregrounds, 21 cm signal, and noise.

If this is right

Foreground and 21 cm maps are recovered at the pixel level together with uncertainties taken directly from the joint posterior.
The one-dimensional HI power spectrum is recovered to within 2 sigma of the true model despite strong foregrounds.
Missing frequency channels from RFI flagging are in-painted with statistically consistent realizations drawn from the model.
Full posterior samples provide statistical realizations of both the maps and the power spectrum for downstream use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The availability of the full joint posterior could let uncertainties propagate directly into cosmological parameter constraints without separate transfer-function corrections.
The same sampling strategy might extend to other high-dimensional component separation tasks in radio data where gaps and bright contaminants coexist.
Replacing the current parametric foreground model with more flexible forms could test how sensitive the power-spectrum recovery remains to model misspecification.

Load-bearing premise

The parametric forms chosen for foregrounds, the 21 cm signal, and noise accurately describe the statistical properties present in the data.

What would settle it

Apply the sampler to simulated data cubes with a known input HI power spectrum and check whether the recovered estimate and its uncertainty interval contain the true value at the expected frequency.

Figures

Figures reproduced from arXiv: 2604.26890 by Geoff G. Murphy, Mario G. Santos, Philip Bull, Steven Cunnington, Zheng Zhang.

**Figure 1.** Figure 1: The first five foreground eigenvectors, of which only the first four are used in the model. 3 DATA MODEL This section describes our approach to constructing and modelling the different components that we want to constrain: the amplitudes of a set of foreground modes; 3D Fourier mode amplitudes of the 21 cm field; and the 1D 21 cm power spectrum (i.e. the 21 cm signal covariance). We describe the process of… view at source ↗

**Figure 2.** Figure 2: An example of the trace of a foreground mode amplitude to highlight the burn-in period. For all results, we draw a total of 2000 samples, discard the first 750 as burn in, and use the remaining 1250 samples for our statistics. P(X m clean,Xm) in Eq. 29 is the cross-correlation power spectrum between the cleaned mock + data component, X m clean, and the mock signal, Xm. The denominator is the autocorrelati… view at source ↗

**Figure 4.** Figure 4: The recovery of the true power spectrum shown as the fractional residuals for the case of power spectra SNR values of 0.5 and 4 — orange solid line and blue dashed line, respectively. These residuals are calculated with respect to the median power spectra, with their associated 95% CL statistical uncertainties being derived from 1250 samples. Points have been offset slightly in k for legibility. Here, SCS … view at source ↗

**Figure 5.** Figure 5: The spread in P(k) samples in the highest k-bin for the two noise cases shown here. The vertical black dashed line denotes the true P(k) value. This figure is primarily shown to demonstrate that, while errors are narrow, higher noise cases do induce a wider spread in the P(k) samples. 6.1 Power spectrum and 21 cm field recovery view at source ↗

**Figure 6.** Figure 6: Upper left: An example of the foregrounds from the true data cube shown as a slice at a constant frequency of ν = 722 MHz. Upper middle: The recovered foregrounds at SNR = 4 (the marginal mean of the foreground component at this frequency). Upper right: The foreground residuals, calculated as the difference between the previous two panels. Note the much narrower range of values. Lower left: A slice of the … view at source ↗

**Figure 7.** Figure 7: shows the distribution of the total model (signal plus foreground) residuals compared to the noise, calculated as Tresiduals = ³ T True FG +T True HI +TNoise´ − ³ T Model FG +T Model HI ´ , (32) MNRAS 000, 1–17 (2026) view at source ↗

**Figure 8.** Figure 8: Sample posteriors of seven complex signal Fourier modes, |s˜|, compared to their corresponding true values. The indices denote Fourier modes in the 65 × 128 × 128-shaped rfft cube. This corner plot, and the following ones, are made with 9250 samples (i.e. 10000 samples, with the first 750 being discarded as burn in), in order to provide a clearer demonstration of the model’s convergence. The red lines deno… view at source ↗

**Figure 10.** Figure 10: The log posteriors of the four frequency-frequency foreground covariance modes in the model. the foreground covariance parameters are shared across all pixels.) The picture is more complex for these parameters; the lower order (smoother) foreground mode amplitudes are biased away from the true values, which we expect is due to the partial degeneracy between the signal and foreground modes on large scales … view at source ↗

**Figure 11.** Figure 11: Top: Plots of the foreground subtracted power spectrum when blind PCA cleaning is applied in a negligible-noise scenario (red squares), and the subsequent transfer function-corrected power spectrum (green crosses), in comparison to the true power spectrum (black points). Our sample median bandpowers are the orange diamonds, which are from the inverse-Gamma sampler. The dark purple, rightward pointing tria… view at source ↗

**Figure 12.** Figure 12: The mean transfer function used to correct the blind foreground cleaned power spectrum in view at source ↗

**Figure 13.** Figure 13: The data cube recovery in the presence of frequency channel flags. Upper left: the true data cube (foregrounds, signal, and noise) with flags applied. Upper right: The sample mean recovered data cube, where the flagged regions have been in-painted by the model. Lower left: The residuals of the model sample mean and the true, unflagged data. Lower right: The sample standard deviation of the model data cube. From view at source ↗

**Figure 14.** Figure 14: Top left: the true 21 cm field with frequency channel flags applied. Top right: A single field realisation from our model, with in-painted flagged channels Middle left: the sample mean of our recovered field. Middle right: the residuals of the model sample mean and the true, unflagged Hi field. Bottom: the sample standard deviation of the model signal field. in the flagged channels is also higher than the… view at source ↗

**Figure 16.** Figure 16: A comparison between the noise and model residuals for the results of view at source ↗

**Figure 15.** Figure 15: The fractional residuals of the recovered power spectrum in the presence of flagged data compared to complete data. We focus on the high-k modes in the lower plot for visibility. The orange squares denote recovery in the case of ∼ 28% of the data volume being flagged, which corresponds to the results of view at source ↗

**Figure 17.** Figure 17: The sampled power spectra for differing numbers of foreground PCA modes: three modes denoted by the downward teal points, four by the orange diamond (the default case), and five by the dark purple plus symbols view at source ↗

read the original abstract

Foreground removal remains an ongoing challenge in radio cosmology, and increasingly sensitive experiments necessitate more robust analysis techniques. In this work, we model simulated data from a single-dish intensity mapping experiment, and use the Gibbs sampling and Gaussian constrained realisation (GCR) techniques to draw samples from the posterior probability distribution of the model parameters. This allows for a separation of the foregrounds and 21 cm signal at the map level, as well as recovery of the 1-dimensional HI power spectrum to within statistical uncertainties. Despite the model consisting of over 2 million free parameters in the example presented here, these methods allow us to sample from the Bayesian posterior at a rate of $<30$ seconds per iteration. This framework is also resilient to frequency channel flagging (e.g. due to RFI excision), with the GCR steps effectively in-painting the missing data with statistically-consistent model realisations. The power spectrum is recovered accurately in the presence of strong foreground contamination and RFI flagging -- the estimate falling within $2\sigma$ of the true model in our example, similar to the commonly-used transfer function correction method. Statistical realisations of foreground and HI maps are also recovered, with associated uncertainties available from the full joint posterior distribution of all parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a Gibbs sampling plus GCR pipeline that jointly separates foregrounds and the 21 cm signal at map level while estimating the HI power spectrum, with built-in handling for RFI flagging, but the tests stay inside matched simulations.

read the letter

The core contribution is a Bayesian sampler that draws from the joint posterior over foreground maps, 21 cm maps, and the 1D power spectrum for single-dish intensity mapping data cubes. Gibbs sampling handles the high-dimensional parameter space (over two million free parameters in their example) at under 30 seconds per iteration, and the GCR steps in-paint flagged channels with statistically consistent realizations. On their simulated data with strong foregrounds and RFI flagging, the recovered power spectrum sits within 2 sigma of the input truth, and they also output map realizations with uncertainties from the full posterior. That combination of map-level separation and spectrum estimation in one framework is the practical advance here, and the flagging resilience is a clear plus for real survey data reduction. The work is technically clean on its own terms and shows the method scales without obvious bottlenecks in the reported runs. The main limitation is that the simulations are generated from the exact parametric foreground model, Gaussian 21 cm field, and noise model used in the inference. Recovery is therefore a consistency check rather than a test against misspecification or unmodeled systematics. There are no comparisons against real observations or against more complex foreground simulations, and the performance is described as similar to existing transfer-function approaches. Readers working on 21 cm pipeline development for MeerKAT, SKA, or similar single-dish surveys will find the implementation details and the GCR in-painting step useful. The paper is coherent and addresses a genuine analysis need, so it should go to peer review. Reviewers will probably press for additional tests on mismatched simulations and some real-data application, but the current results are solid enough to warrant that discussion.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a Bayesian component separation framework for 21 cm intensity mapping data cubes that combines Gibbs sampling with Gaussian constrained realizations (GCR) to jointly sample the posterior over foreground maps, 21 cm signal maps, noise, and the 1D HI power spectrum parameters. Demonstrated on simulated single-dish observations containing strong foregrounds, a Gaussian HI field, instrumental noise, and frequency flagging, the method recovers map-level separations with uncertainties and yields a 1D power spectrum estimate lying within 2σ of the input truth, at a sampling cost of <30 s per iteration even with >2 million free parameters.

Significance. If the reported performance holds under more varied conditions, the approach supplies a statistically coherent route to map-level foreground removal and power-spectrum estimation that automatically propagates uncertainties and handles missing channels via GCR in-painting. The computational efficiency for high-dimensional posteriors and the explicit treatment of RFI flagging constitute clear practical strengths for forthcoming intensity-mapping surveys.

major comments (2)

[numerical results / simulation section] The validation is performed exclusively on data cubes generated from the identical parametric model (foregrounds + Gaussian 21 cm field + noise) that is assumed in the likelihood and prior. Consequently the reported agreement of the recovered power spectrum to within 2σ constitutes a self-consistency check rather than a test of robustness to model misspecification or realistic foreground complexity. This directly limits the strength of the central claim that the framework recovers the HI power spectrum accurately in the presence of strong foreground contamination.
[results and discussion] The manuscript does not quantify how the recovered posterior widths or bias change when the foreground spectral index or the 21 cm field deviates from the exact parametric assumptions used to generate the test data. Such a controlled misspecification test is required to establish that the 2σ agreement is not an artifact of matched assumptions.

minor comments (2)

[abstract / introduction] The abstract and introduction would benefit from a concise statement of the precise parametric forms adopted for the foregrounds and the 21 cm covariance; this would clarify the scope of the consistency test.
[figures] Figure captions should explicitly state the number of Gibbs iterations, burn-in length, and convergence diagnostics used to produce the reported posterior summaries.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below, agreeing that the current validation is a self-consistency test and that additional misspecification experiments would strengthen the presentation of the results.

read point-by-point responses

Referee: [numerical results / simulation section] The validation is performed exclusively on data cubes generated from the identical parametric model (foregrounds + Gaussian 21 cm field + noise) that is assumed in the likelihood and prior. Consequently the reported agreement of the recovered power spectrum to within 2σ constitutes a self-consistency check rather than a test of robustness to model misspecification or realistic foreground complexity. This directly limits the strength of the central claim that the framework recovers the HI power spectrum accurately in the presence of strong foreground contamination.

Authors: We agree that the validation uses data generated from the same parametric model assumed in the likelihood and prior, rendering the 2σ agreement a self-consistency check of the Gibbs sampler and GCR under matched assumptions. The manuscript qualifies the result as applying to our simulated example, which is the appropriate scope for demonstrating that the high-dimensional posterior can be sampled efficiently while recovering the input power spectrum and handling RFI flagging via in-painting. To address the referee's concern about robustness, we will revise the numerical results section to include a discussion of the method's behavior under controlled model variations. revision: yes
Referee: [results and discussion] The manuscript does not quantify how the recovered posterior widths or bias change when the foreground spectral index or the 21 cm field deviates from the exact parametric assumptions used to generate the test data. Such a controlled misspecification test is required to establish that the 2σ agreement is not an artifact of matched assumptions.

Authors: We acknowledge that the manuscript does not currently quantify the effects of deviations in foreground spectral index or 21 cm field properties on posterior widths and bias. In the revised manuscript we will add controlled misspecification tests in the results and discussion section. These will consist of additional simulations generated with altered foreground spectral indices and non-Gaussian 21 cm fields, followed by application of the Bayesian framework and reporting of the resulting biases and changes in posterior widths. This will clarify the sensitivity of the recovered power spectrum to the assumed model form. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript applies Gibbs sampling and Gaussian constrained realizations to sample the joint posterior over a parametric model (foregrounds + 21 cm Gaussian random field + noise) on simulated data cubes generated from the identical model. The headline result—that the recovered 1D HI power spectrum lies within 2σ of the injected truth and that map-level separation is achieved—is a direct consistency check of the sampler under matched assumptions, not a derivation that reduces to its inputs by construction. No self-definitional equations, fitted parameters renamed as predictions, load-bearing self-citations, or uniqueness theorems imported from prior author work appear in the abstract or described framework. The approach is self-contained against external benchmarks (injected truth) and does not smuggle ansatzes or rename known empirical patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on a linear additive model for the data cube, Gaussian statistics for the components, and the ability to efficiently sample a posterior with millions of parameters; these are standard in the domain but must hold for the separation to be valid.

free parameters (1)

component amplitude and map parameters
Over 2 million free parameters in the model that are sampled from the posterior rather than fixed in advance.

axioms (2)

domain assumption Observed data is a linear superposition of foreground, 21 cm signal, and noise components
Invoked throughout the component separation framework described in the abstract.
domain assumption Gaussian statistics apply to the signal and foreground fields for the GCR step
Required for the Gaussian constrained realization technique to produce statistically consistent realizations.

pith-pipeline@v0.9.0 · 5532 in / 1430 out tokens · 63104 ms · 2026-05-07T11:47:52.743633+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

1 extracted references · 1 canonical work pages

[1]

SIAM, 2 edition, 2003

Aguirre J. E., et al., 2022, The Astrophysical Journal, 924, 85 Alonso D., Bull P., Ferreira P. G., Santos M. G., 2015, Monthly Notices of the Royal Astronomical Society, 447, 400 Amiri M., et al., 2023, ApJ, 947, 16 Bobin J., Starck J.-L., Fadili J., Moudden Y., 2007, IEEE Transactions on Image Processing, 16, 2662 Bull P., Ferreira P. G., Patel P., Sant...

work page doi:10.1137/1.9780898718003 2022