arxiv: 2604.24863 · v1 · submitted 2026-04-27 · 🌌 astro-ph.CO · astro-ph.GA

Bound or blown: the fate of hot gas in galaxy groups

R. Seppi, D. Eckert, J. Schaye, J. Braspenning, M. Schaller, B. D. Oppenheimer, E. O'Sullivan, F. Gastaldello, L. Lovisari, M. A. Bourne, M. Sun, A. Finoguenov, H. Khalil, G. Gozaliasl, K. Kolokythas, Y. E. Bahar, R. Santra

Pith reviewed 2026-05-08 01:32 UTC · model grok-4.3

classification 🌌 astro-ph.CO astro-ph.GA

keywords galaxy groups · AGN feedback · X-ray scaling relations · hydrodynamical simulations · hot gas content · XMM-Newton observations · selection function modeling

The pith

Intermediate AGN feedback strengths match the hot gas properties of galaxy groups observed by XMM-Newton, while the strongest ejection models do not.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how different strengths of AGN feedback affect the amount of hot gas retained in galaxy groups by comparing real X-ray data to a set of simulations. It builds realistic mock observations that include the same detection limits and measurement effects as the actual survey, then checks multiple properties such as luminosity-temperature and gas-mass-temperature relations at once. Intermediate feedback levels reproduce the observed relations with little tension, but models that eject far more gas than usual are clearly inconsistent with the data. This approach shows that the thermodynamic state of gas in these systems can discriminate among feedback prescriptions when selection effects are properly modeled.

Core claim

By generating end-to-end XMM-Newton mock observations from FLAMINGO hydrodynamical simulations that vary AGN feedback strength, the analysis finds that the normalization of the scaling relations provides the strongest test. The fgas-2sigma model yields the lowest overall tension of 0.8 sigma with the X-GAP sample, whereas the fgas-8sigma model is excluded at more than 4 sigma. Number counts fluctuate by more than 20 percent due to cosmic variance and are therefore a weaker discriminator than the relations themselves.
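The tensions quoted above are expressed in Gaussian-equivalent sigma. As a rough guide to what those numbers mean in probability terms, the sketch below converts between a two-sided p-value and a sigma level using only the Python standard library. The two-sided convention and the function names are assumptions made here for illustration; the page does not state which convention the paper adopts.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def sigma_to_p(n_sigma: float) -> float:
    """Two-sided tail probability of a |deviation| >= n_sigma (assumed convention)."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(n_sigma))


def p_to_sigma(p_two_sided: float) -> float:
    """Gaussian-equivalent significance of a two-sided p-value."""
    return _STD_NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)


if __name__ == "__main__":
    for n in (0.8, 4.0):
        print(f"{n} sigma -> two-sided p = {sigma_to_p(n):.3g}")
```

Under this convention a 0.8 sigma tension corresponds to a p-value of order 0.4, while 4 sigma corresponds to a p-value below 1e-4.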

What carries the argument

Forward modeling of the full X-GAP selection function, detection thresholds, and observational systematics applied to simulated groups, producing mock X-ray images and catalogs analyzed identically to the real data.
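To illustrate why a selection function has to be applied to the simulated groups before any comparison, here is a deliberately simplified toy: a single flux threshold applied to a mock catalogue. This is not the X-GAP selection function, which the page describes only at a high level. The flux limit, the assumed Hubble constant, the low-redshift distance approximation, and the dictionary keys are all hypothetical choices made for this sketch.

```python
import math

C_KM_S = 299_792.458  # speed of light [km/s]
H0 = 70.0             # assumed Hubble constant [km/s/Mpc], illustrative only
MPC_CM = 3.0857e24    # centimetres per megaparsec


def flux_cgs(lum_erg_s: float, z: float) -> float:
    """Flux [erg/s/cm^2] using the low-redshift approximation d ~ c z / H0."""
    d_cm = C_KM_S * z / H0 * MPC_CM
    return lum_erg_s / (4.0 * math.pi * d_cm ** 2)


def select_groups(groups, flux_limit):
    """Keep mock groups whose flux exceeds a hypothetical detection threshold."""
    return [g for g in groups if flux_cgs(g["lum"], g["z"]) >= flux_limit]


if __name__ == "__main__":
    mock = [
        {"name": "bright", "lum": 1e42, "z": 0.03},
        {"name": "faint", "lum": 1e41, "z": 0.03},
    ]
    print([g["name"] for g in select_groups(mock, flux_limit=1e-13)])
```

Even this toy shows the key effect: at fixed redshift, gas-poor (fainter) groups drop out of the sample, so the selected population is biased relative to the full one and must be compared like for like.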

If this is right

Thermodynamic properties of galaxy groups favor feedback stronger than the fiducial FLAMINGO calibration but rule out the most ejective scenarios.
The normalization of L-T and Mgas-T relations serves as the primary discriminator between feedback models.
Cosmic variance causes greater than 20 percent fluctuations in the number of detected groups, weakening counts as a standalone test.
Multi-observable constraints combined with forward modeling are required to probe the fate of hot baryons in low-mass halos.
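Since the points above single out the normalization of the scaling relations, a minimal sketch of how a normalization can be measured at fixed slope may help. It assumes a power law L = A (T / T_pivot)^slope and estimates log10(A) as the mean residual in log space. The slope, the pivot temperature, and the function name are hypothetical; the paper's actual fitting procedure is not described on this page.

```python
import math


def fit_log_normalisation(lum, temp, slope, t_pivot=1.0):
    """Return log10(A) for L = A * (T / t_pivot)**slope with the slope held fixed.

    With the slope fixed, the least-squares estimate of log10(A) is the mean of
    log10(L) - slope * log10(T / t_pivot).
    """
    resid = [
        math.log10(l) - slope * math.log10(t / t_pivot)
        for l, t in zip(lum, temp)
    ]
    return sum(resid) / len(resid)


if __name__ == "__main__":
    temps = [0.5, 1.0, 2.0]                   # keV, illustrative values
    lums = [1e42 * t ** 3.0 for t in temps]   # erg/s, exact power law
    print(fit_log_normalisation(lums, temps, slope=3.0))
```

Comparing this single number between data and each simulated sample is what makes the normalization a compact discriminator: a model that ejects more gas shifts it downward at fixed temperature.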

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar forward-modeling techniques could tighten constraints on feedback when applied to next-generation X-ray surveys with larger group samples.
The retained hot gas fraction in groups may influence how baryons are distributed on larger scales in the cosmic web.
Future simulations could be calibrated directly against these multi-observable tensions to reduce uncertainty in AGN feedback prescriptions.

Load-bearing premise

The forward model accurately recovers input luminosities, gas masses, and core-excised temperatures for regular systems, enabling direct comparison in observable space.

What would settle it

A larger X-ray sample or deeper observations that show scaling-relation normalizations matching the fgas-8sigma simulation at high significance would falsify the preference for intermediate feedback.

Figures

Figures reproduced from arXiv: 2604.24863 by A. Finoguenov, B. D. Oppenheimer, D. Eckert, E. O'Sullivan, F. Gastaldello, G. Gozaliasl, H. Khalil, J. Braspenning, J. Schaye, K. Kolokythas, L. Lovisari, M. A. Bourne, M. Schaller, M. Sun, R. Santra, R. Seppi, Y. E. Bahar.

**Figure 1.** Expected properties of an X-GAP-like sample selected from different FLAMINGO models. The L1_m8 includes tests for cosmic variance (CV) and uncertainties on the selection function. The top panel shows the number of groups within an SDSS-like area of 7430 deg², the bottom one shows the median temperature of the selected sample. The latter is a promising discriminator between FLAMINGO models.

**Figure 2.** Gas fraction as a function of mass for the selected systems (solid lines and shaded areas) and the full sample (dashed lines). Both are true input quantities. The top panel shows the mass distribution of the selected systems: skewed to higher masses for strong feedback models. The bottom panel denotes the ratio between the gas fraction in the XGAP-like selected sample and the whole population. While the L…

**Figure 3.** Workflow for the end-to-end XMM-Newton simulation of FLAMINGO galaxy groups down to the direct comparison with X-GAP.

**Figure 5.** Observables used for the comparison between X-GAP and various FLAMINGO models: the normalisation of the scaling relation between X-ray luminosity and temperature (both core-excised), between gas mass within 400 kpc and core-excised temperature, the total number of groups, the mean temperature and galaxy member velocity dispersion. X-GAP shows the best agreement with the fgas-2sigma model.

Original abstract

The impact of AGN feedback on the hot gas content of galaxy groups remains a key uncertainty in galaxy formation and its connection to the large scale structure of the Universe. We aim to compare the XMM-Newton Group AGN Project (X-GAP) sample to the hydrodynamical FLAMINGO simulations, which span a wide range of AGN feedback prescriptions. We construct X-GAP analogues by forward-modelling the full selection function, including detection and observational systematics, and generate end-to-end XMM-Newton mock observations analysed consistently with the data. We study multiple observables, including the L-T and Mgas-T relations, number of groups, mean temperature, and velocity dispersion, accounting for their covariance. The forward model accurately recovers input luminosities, gas masses, and core-excised temperatures for regular systems, enabling direct comparison in observable space. The normalisation of the scaling relations is the best discriminator between feedback models, while cosmic variance introduces > 20% fluctuations in the number of detected systems, making counts alone a weak discriminator. Models with intermediate feedback strength provide the best agreement with X-GAP, with the fgas-2sigma model yielding the lowest tension of only 0.8 sigma, while the most extreme feedback scenario (fgas-8sigma) is ruled out at > 4 sigma. Our results indicate that the thermodynamic properties of galaxy groups favour feedback stronger than the fiducial FLAMINGO calibration, but disfavour the most ejective models. This highlights the importance of combining forward modelling and multi-observable constraints to probe the fate of hot baryons in low-mass haloes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Intermediate AGN feedback in FLAMINGO fits X-GAP best while extreme models are disfavored, but the tension numbers hinge on forward-model performance for the full sample including disturbed groups.

The main result is that the fgas-2sigma FLAMINGO variant matches the X-GAP scaling relations and other observables at 0.8 sigma tension, while the strongest feedback run is excluded above 4 sigma. They reach this by generating end-to-end XMM mocks from the simulations that include the full selection function and then analyzing everything the same way as the real data.

The paper does a clean job of this forward modeling and of folding in covariance across the L-T, Mgas-T, temperature, and velocity dispersion measurements. It is also useful that they point out how cosmic variance makes raw counts a weak discriminator, so the normalization of the scaling relations does the real work. This is a straightforward, incremental application of an existing simulation suite to a new observational sample, and the multi-observable approach is a step up from single-relation comparisons.

The soft spot is the validation of the forward model. It is stated to recover luminosities, gas masses, and core-excised temperatures accurately for regular systems, but the text gives no numbers on what fraction of the X-GAP sample or the simulated analogues meet the regularity cuts, nor any test results for disturbed systems. Groups frequently show merger or AGN-driven disturbances, and any systematic offset in recovered properties for those objects would shift the observed relations and the tension metric. Without that check, the strength of the exclusion for the most ejective model is harder to judge.

This work is aimed at groups tuning AGN feedback in hydro simulations and at X-ray observers studying low-mass halos. The methods are careful enough and the question is concrete enough that it deserves a serious referee, though the review should focus on the regularity assumption and the exact covariance treatment.

Referee Report

2 major / 2 minor

Summary. The paper compares the X-GAP sample of galaxy groups from XMM-Newton observations to FLAMINGO hydrodynamical simulations spanning a range of AGN feedback strengths. By forward-modeling the full selection function, generating end-to-end XMM mock observations analyzed identically to the data, and comparing multiple observables (L-T and Mgas-T scaling relations, group counts, mean temperature, velocity dispersion) while accounting for their covariance, the authors find that intermediate feedback models best match the observations. Specifically, the fgas-2sigma model yields the lowest tension (0.8σ), while the most extreme fgas-8sigma model is ruled out at >4σ. The work concludes that group thermodynamic properties favor feedback stronger than the fiducial FLAMINGO calibration but disfavor the most ejective scenarios.

Significance. If the central results hold, this provides important empirical constraints on AGN feedback efficiency in low-mass halos, directly addressing uncertainties in how baryons are ejected or retained and their effects on large-scale structure. The methodological approach of full forward modeling of selection effects combined with multi-observable covariance-aware comparison is a clear strength, enabling more robust discrimination between feedback variants than single-relation studies. It also quantifies the limited discriminating power of number counts due to cosmic variance (>20% fluctuations).
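The point about cosmic variance can be made concrete with a small sketch that measures the fractional scatter of group counts across independent light-cone realisations and compares it with the pure Poisson expectation. The counts used here are invented for illustration, and treating "fluctuations" as a sample standard deviation divided by the mean is an assumption of this sketch rather than a definition taken from the paper.

```python
import math
from statistics import mean, stdev


def fractional_scatter(counts):
    """Sample standard deviation of the counts divided by their mean."""
    return stdev(counts) / mean(counts)


def poisson_fractional_scatter(counts):
    """Fractional scatter expected from shot noise alone, 1 / sqrt(mean count)."""
    return 1.0 / math.sqrt(mean(counts))


if __name__ == "__main__":
    counts = [80, 100, 120]  # hypothetical numbers of selected groups per light cone
    print(fractional_scatter(counts), poisson_fractional_scatter(counts))
```

When the measured scatter clearly exceeds the Poisson value, as in this made-up case (0.2 against 0.1), the excess is attributable to large-scale structure rather than shot noise, which is why counts alone discriminate poorly between models.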

major comments (2)

[Abstract and forward-modeling section] The claim that the forward model 'accurately recovers input luminosities, gas masses, and core-excised temperatures for regular systems' underpins the direct observable-space comparison and the reported tensions (0.8σ and >4σ). However, no quantitative validation is provided for non-regular/disturbed systems, and the fraction of the X-GAP sample (or simulated analogues) meeting the regularity criteria is not reported. Disturbed systems are common at group scales; any recovery biases in L, Mgas or T would shift scaling-relation normalizations and covariance matrices, altering which feedback model is preferred and the strength of the >4σ exclusion.
[Results and tension calculation] Tension calculation and covariance treatment (implied in results section): The manuscript states that observables are compared 'accounting for their covariance,' but full details on covariance matrix construction, error propagation, and the exact tension metric are not provided. Since the central claim of ruling out extreme models at >4σ rests on this multi-observable statistic, insufficient documentation prevents full verification of the quoted significances.

minor comments (2)

[Simulation descriptions] Clarify the precise parameter differences between the fgas-2sigma and fgas-8sigma variants (e.g., in a table of feedback parameters) to aid reproducibility.
[Figures and methods] Figure captions and text should explicitly state the regularity criteria used in the recovery tests to allow readers to assess applicability to the full sample.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive report and positive assessment of the work's significance. We address each major comment below with the strongest honest response possible. Where the manuscript is incomplete, we agree revisions are needed and will incorporate the requested details.

Referee: [Abstract and forward-modeling section] The claim that the forward model 'accurately recovers input luminosities, gas masses, and core-excised temperatures for regular systems' lacks quantitative validation for non-regular/disturbed systems. The fraction of the X-GAP sample (or simulated analogues) meeting regularity criteria is not reported. Disturbed systems are common; biases could alter scaling relations and the reported tensions.

Authors: We agree the manuscript does not report the fraction of regular systems in X-GAP or provide quantitative recovery tests for disturbed systems. The validation statement applies specifically to regular systems, which form the core of the X-GAP thermodynamic analysis. In the revised manuscript we will add a dedicated paragraph in the forward-modeling section stating the regularity criteria applied, the observed fraction of X-GAP groups meeting them, and recovery statistics from simulated disturbed analogues to quantify any residual biases in L, Mgas and T. revision: yes
Referee: [Results and tension calculation] The manuscript states that observables are compared 'accounting for their covariance,' but full details on covariance matrix construction, error propagation, and the exact tension metric are not provided. This prevents verification of the quoted significances including the >4σ exclusion.

Authors: We acknowledge that the covariance construction and tension metric are described only at a high level. The covariance matrix was built from the joint posterior of the simulated observables (including cosmic variance), and tension was evaluated via a multivariate chi-squared statistic. In the revised manuscript we will add an appendix that explicitly gives the covariance matrix elements, the error propagation procedure, and the precise tension formula used, allowing full reproduction of the 0.8σ and >4σ results. revision: yes
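The simulated response above names a multivariate chi-squared statistic built from a covariance matrix. A minimal, dependency-free sketch of that kind of statistic follows; it is an illustration of the general technique, not the paper's code. The helper names and the example numbers are hypothetical, and the small Gaussian-elimination solver stands in for what a linear-algebra library would normally provide.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2(data, model, cov):
    """Quadratic form d^T C^{-1} d for the residual d = data - model."""
    d = [x - y for x, y in zip(data, model)]
    w = solve(cov, d)  # w = C^{-1} d without forming the inverse
    return sum(di * wi for di, wi in zip(d, w))


if __name__ == "__main__":
    data, model = [1.0, 1.0], [0.0, 0.0]
    print(chi2(data, model, [[1.0, 0.5], [0.5, 1.0]]))  # correlated observables
    print(chi2(data, model, [[1.0, 0.0], [0.0, 1.0]]))  # correlation ignored
```

In this toy case the correlated covariance gives a smaller chi-squared than the diagonal one, showing how ignoring positive correlations between observables can overstate a tension.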

Circularity Check

0 steps flagged

No significant circularity; external data-simulation comparison

The paper's derivation chain consists of forward-modeling the X-GAP selection function on FLAMINGO simulation variants, generating end-to-end XMM mocks, and comparing multi-observable statistics (L-T, Mgas-T, counts, etc.) with covariance to the independent X-GAP dataset. The abstract states that the forward model 'accurately recovers input luminosities, gas masses, and core-excised temperatures for regular systems,' which is presented as a supporting validation test rather than a definitional step. No equation or claim reduces a 'prediction' to a fitted parameter by construction, nor does any load-bearing premise rely on a self-citation chain or imported uniqueness theorem. The tension values (0.8sigma vs. >4sigma) emerge from direct observable-space comparison to external observations, satisfying the criterion of being self-contained against external benchmarks. No circular steps are identifiable from the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the accuracy of the FLAMINGO hydrodynamical simulations, the completeness of the X-GAP selection function, and the assumption that forward-modeled mocks faithfully reproduce observational systematics.

axioms (2)

standard math: Standard Lambda-CDM cosmology and hydrodynamical equations govern the simulations. Invoked throughout the FLAMINGO runs used for comparison.
domain assumption: The X-GAP sample selection function and observational systematics are fully captured by the forward model. Central to enabling direct observable-space comparison.

pith-pipeline@v0.9.0 · 5678 in / 1328 out tokens · 84622 ms · 2026-05-08T01:32:47.558679+00:00 · methodology

Reference graph

Works this paper leans on

7 extracted references · 1 canonical work pages

[1]

Abbott, T. M. C., Aguena, M., Alarcon, A., et al. 2022, Phys. Rev. D, 105, 023520
Abril-Pla, O., Andreani, V., Carroll, C., et al. 2023, PeerJ Computer Science, 9
Akino, D., Eckert, D., Okabe, N., et al. 2022, PASJ, 74, 175
Alam, S., Albareti, F. D., Allende Prieto, C., et al. 2015, ApJS, 219, 12
Aricò, G., Angulo, R. E., Zennaro, M., et al. 2023, A&A, 6...

[2]

Posterior values are reported from the third column onward, for each case labelled in the top row

0.02±0.01 Notes. The symbol U(M,N) denotes a uniform prior between the values M and N. Posterior values are reported from the third column onward, for each case labelled in the top row. mate, but do not necessarily exactly reproduce, SDSS Petrosian magnitudes. At the low redshifts considered here (z < 0.05), such differences are expected to be subdominant rel...

2025
[3]

The result is presented in Fig. B.3. The selection bias follows a similar pattern: incompleteness at low masses and a down-scattered population at high masses caused by the upper radius cut. However, the effect is more pronounced for the L–T relation because gas fraction is less directly tied to the selection observable (X-ray flux). We measure deviatio...

2020
[4]

In addition, a slight offset and a portion of the observed scatter may originate from the cylindrical correction (see Eq

or apec (this work) does not impact the measurement of X-ray luminosity in our framework. In addition, a slight offset and a portion of the observed scatter may originate from the cylindrical correction (see Eq. 2). The correction is applied to the reference true luminosities, whereas the reconstructed values are derived independently from the mock p...

2004
[5]

4.2) we also store the input temperature and input velocity dispersion

Appendix C.1: Observables correlation. From the N-light cone generation procedure (explained in Sect. 4.2) we also store the input temperature and input velocity dispersion. Although these are not identical to the measured quantities, they provide a physically motivated baseline to quantify correlations between observables driven by halo population and s...

2020
[6]

observable we compute a two-sided tail probability $p_j$ and define the summary statistic $S$ following the Fisher method such that:

$$p_j = 2 \times \min\left(F_j(x_j),\, 1 - F_j(x_j)\right), \qquad S = -2 \sum_{j=1}^{5} \log p_j. \tag{C.1}$$

Because the $p_j$ are correlated, we do not expect $S$ to necessarily follow the standard $\chi^2$ distribution. We ...

2026
[7]

In contrast, the $M_{\rm gas}$–$T$ relation is more sensitive: an enhanced $L_X$ would require lower $M_{\rm gas}$ to match the observations

The $L$–$T$ normalisation is largely unaffected, as luminosity drives the selection. In contrast, the $M_{\rm gas}$–$T$ relation is more sensitive: an enhanced $L_X$ would require lower $M_{\rm gas}$ to match the observations. We recompute the input X-ray luminosities with pyXSIM by fixing the metallicity of gas particles within $R_{500c}$ to 0.3 $Z_\odot$ for haloes with $M_{500c} > 5\times10^{12}\,M_\odot$. Th...

2026
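Extracted snippet [6] above describes a Fisher-type summary statistic built from two-sided tail probabilities and notes that, because the probabilities are correlated, the statistic need not follow a standard chi-squared distribution. The snippet is truncated before saying how the authors handle this. The sketch below shows one generic option, stated here as an assumption and not as the paper's procedure: evaluate each tail probability against an empirical distribution of mock values, then calibrate the statistic against mock realisations. The function names and the small regularisation that keeps probabilities away from zero are hypothetical choices.

```python
import math


def two_sided_p(x, mocks):
    """p = 2 * min(F(x), 1 - F(x)), with F an empirical CDF of mock values.

    The (+0.5, +1) regularisation keeps F strictly inside (0, 1) so that
    log(p) stays finite; this is an illustrative choice.
    """
    f = (sum(1 for m in mocks if m <= x) + 0.5) / (len(mocks) + 1.0)
    return 2.0 * min(f, 1.0 - f)


def fisher_s(p_values):
    """Fisher-type summary statistic S = -2 * sum(log p_j)."""
    return -2.0 * sum(math.log(p) for p in p_values)


def empirical_tail(s_obs, s_mocks):
    """Fraction of mock realisations whose S is at least as large as s_obs."""
    return sum(1 for s in s_mocks if s >= s_obs) / len(s_mocks)


if __name__ == "__main__":
    p_obs = [0.9, 0.5, 0.7, 0.3, 0.6]  # hypothetical per-observable p-values
    print(fisher_s(p_obs))
```

Calibrating against mocks in this way sidesteps the independence assumption behind the textbook chi-squared reference distribution for Fisher's method, at the cost of needing enough realisations to resolve the tail of interest.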