Efficiently emulating distribution functions in gigaparsec volumes for varying cosmological parameters
Pith reviewed 2026-05-10 04:16 UTC · model grok-4.3
The pith
An emulator trained on small regions spanning a range of overdensities recovers the full halo mass function of gigaparsec volumes using about 0.026 percent of the simulation volume.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We train a differentiable emulator on halo mass functions from small regions of varying overdensities extracted from large N-body simulations run with different Lambda-CDM parameters. The emulator is conditioned on the region's overdensity and the global parameters. Integrating the emulator outputs weighted by the overdensity distribution recovers the global halo mass function of the entire simulation box.
What carries the argument
The emulator conditioned on local overdensity and global cosmological parameters, which predicts the halo mass function in each small region so that the predictions can be integrated over the overdensity distribution.
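The integration step can be written compactly. The notation below is an assumption made for illustration, not taken from the paper: $\delta$ is the overdensity of a region, $\theta$ the global cosmological parameters, and $p(\delta \mid \theta)$ the volume-weighted distribution of region overdensities in the parent box.

```latex
\left.\frac{dn}{dM}\right|_{\rm global}(M \mid \theta)
  = \int_{-1}^{\infty} \frac{dn}{dM}(M \mid \delta, \theta)\, p(\delta \mid \theta)\, d\delta
  \approx \sum_{i} w_i\, \frac{dn}{dM}(M \mid \delta_i, \theta),
  \qquad \sum_{i} w_i = 1 .
```

In the discretised form, the weights $w_i$ stand for the fraction of the box volume falling in the $i$-th overdensity bin. How the paper bins and weights the distribution is not stated in the material reproduced here.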
If this is right
- Targeted zoom simulations extracted from low-resolution parent volumes can emulate large-volume results at a fraction of the computational cost.
- The dynamic range extends to lower halo masses than can be reached in standard periodic box simulations.
- The same conditioning and integration approach applies to other dark matter and baryonic distribution functions.
- Higher-order statistics can be emulated with direct implications for analyzing data from wide-field surveys.
Where Pith is reading between the lines
- The differentiability of the emulator enables gradient-based methods for efficient cosmological parameter inference.
- The approach could be paired with hydrodynamical simulations inside the small regions to capture baryonic effects without running full-volume hydrodynamical boxes.
- If the overdensity dependence holds across different simulation resolutions and codes, the method could standardize emulation pipelines for existing datasets.
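The first point above can be illustrated with a toy fit. This is a minimal sketch under loud assumptions: `toy_log_hmf` is a hypothetical one-parameter stand-in for the emulator, and a central finite difference stands in for the automatic differentiation that a differentiable emulator would provide. None of these names come from the paper.

```python
def toy_log_hmf(theta, slopes, offsets):
    """Hypothetical stand-in for the emulator: log-abundance is linear in theta."""
    return [a + s * theta for a, s in zip(offsets, slopes)]


def loss(theta, data, slopes, offsets):
    """Mean squared difference between model and data in log-abundance."""
    model = toy_log_hmf(theta, slopes, offsets)
    return sum((m - d) ** 2 for m, d in zip(model, data)) / len(data)


def gradient(theta, data, slopes, offsets, eps=1e-5):
    """Central finite difference; autodiff would supply this for a real emulator."""
    up = loss(theta + eps, data, slopes, offsets)
    down = loss(theta - eps, data, slopes, offsets)
    return (up - down) / (2.0 * eps)


def fit_parameter(data, slopes, offsets, start=0.5, learning_rate=0.1, n_steps=50):
    """Plain gradient descent on the single parameter theta."""
    theta = start
    for _ in range(n_steps):
        theta -= learning_rate * gradient(theta, data, slopes, offsets)
    return theta


# Mock "observed" mass function generated at theta = 0.8 (arbitrary toy value).
slopes = [1.0, 2.0, 3.0]
offsets = [-3.0, -4.0, -5.5]
mock_data = toy_log_hmf(0.8, slopes, offsets)
best_fit = fit_parameter(mock_data, slopes, offsets)
```

With a real emulator the loss would be a likelihood over the integrated mass function, and the gradient would cover all cosmological parameters at once.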
Load-bearing premise
The halo mass function within each small region depends only on the local overdensity value and the global cosmological parameters in a manner that one emulator can learn accurately enough to produce the correct global integral.
What would settle it
Measure the halo mass function directly in a full large-volume simulation and compare it to the one recovered by integrating the emulator trained on that simulation's small subregions across the sampled overdensity range; a significant mismatch would falsify the approach.
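The comparison can be sketched with a toy model whose global answer is known analytically. Everything here is an assumption for illustration: the conditional mass function `toy_conditional_hmf`, the lognormal overdensity distribution, and the parameter values are hypothetical and are not the paper's emulator or data.

```python
import math


def toy_n0(log10_mass):
    """Toy power-law mass function at mean density (arbitrary units)."""
    return 10.0 ** (-3.0 - 0.9 * (log10_mass - 12.0))


def toy_response(log10_mass, bias_slope=0.8):
    """Toy exponent: more massive haloes respond more strongly to overdensity."""
    return 1.0 + bias_slope * (log10_mass - 12.0)


def toy_conditional_hmf(log10_mass, delta):
    """Stand-in for the emulator: n(M | delta) = n0(M) * (1 + delta)**b(M)."""
    return toy_n0(log10_mass) * (1.0 + delta) ** toy_response(log10_mass)


def pdf_in_x(x, sigma):
    """Gaussian PDF of x = ln(1 + delta), with mean -sigma^2/2 so <1 + delta> = 1."""
    mu = -0.5 * sigma ** 2
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def integrate_over_overdensity(log10_mass, sigma, n_steps=2000):
    """Trapezoidal estimate of the integral of n(M | delta) p(delta) over delta."""
    mu = -0.5 * sigma ** 2
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        delta = math.exp(x) - 1.0
        total += weight * toy_conditional_hmf(log10_mass, delta) * pdf_in_x(x, sigma)
    return total * h


def analytic_global_hmf(log10_mass, sigma):
    """Closed form for the toy model: n0 * exp(b (b - 1) sigma^2 / 2)."""
    b = toy_response(log10_mass)
    return toy_n0(log10_mass) * math.exp(0.5 * b * (b - 1.0) * sigma ** 2)


def fractional_residuals(log10_masses, sigma=0.4):
    """Integrated prediction divided by the known global answer, minus one."""
    return [
        integrate_over_overdensity(m, sigma) / analytic_global_hmf(m, sigma) - 1.0
        for m in log10_masses
    ]


residuals = fractional_residuals([12.0, 13.0, 14.0, 15.0])
```

In a real test the analytic reference would be replaced by the mass function measured in the full box, and the residuals would be judged against the expected sampling uncertainty.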
Original abstract
We present a new method for emulating the halo mass function (HMF) and other distribution functions in large effective volumes, down to low halo masses, whilst simultaneously modifying large ranges of parameters, for a fraction of the cost of traditional periodic cosmological simulations. We demonstrate the method by selecting small regions, $V \sim (50 \,h^{-1}{\rm Mpc})^3$, with a range of overdensities from the Quijote suite, consisting of tens of thousands of $(1 \,h^{-1}{\rm Gpc})^3$ $N$-body simulation volumes run with varying $\Lambda$CDM parameters. We train a differentiable emulator, conditioned on the overdensity of the region and these global parameters, to reproduce the halo mass function in these regions. We then successfully recover the global distribution of halo masses of the entire box by integrating over the overdensity distribution. Our approach uses just $\sim\,$0.026% of the original simulation volume, and suggests that suites of targeted `zoom' simulations, extracted from low resolution parent volumes, can be used to emulate large volume simulations at a fraction of the computational cost, whilst simultaneously pushing the dynamic range to much lower masses than can be achieved in periodic simulations. We discuss emulation of other key dark matter and baryonic distribution functions, as well as higher order statistics, with implications for the interpretation of upcoming wide field surveys on observatories such as Euclid, Roman and Rubin.
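As a rough consistency check on the quoted volume fraction, the arithmetic below compares one region to one box. It assumes the region is exactly $(50\,h^{-1}{\rm Mpc})^3$ and that the fraction is defined per $(1\,h^{-1}{\rm Gpc})^3$ box; the abstract gives the region size only approximately and does not state the normalisation, so the result is indicative only.

```python
region_side = 50.0               # h^-1 Mpc, taken as exact for this estimate
box_side = 1000.0                # h^-1 Mpc, i.e. 1 h^-1 Gpc
quoted_fraction = 0.026 / 100.0  # "~0.026%" from the abstract

# Volume of one region relative to one box.
fraction_per_region = (region_side / box_side) ** 3

# Number of such regions per box that the quoted fraction would correspond to.
equivalent_regions_per_box = quoted_fraction / fraction_per_region
```

Under these assumptions one region is 0.0125% of a box, so the quoted figure corresponds to roughly two regions' worth of volume per box.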
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a method to emulate the halo mass function (HMF) in gigaparsec volumes by extracting small sub-volumes ((50 h^{-1} Mpc)^3) from the Quijote suite of N-body simulations and training a neural network emulator on them. The emulator is conditioned on local overdensity and on the varying cosmological parameters; the global HMF is then recovered by integrating the emulator output over the known overdensity distribution of the full box, using only ~0.026% of the original volume. The approach is positioned as a cost-effective alternative to full periodic simulations for distribution functions and higher-order statistics relevant to surveys such as Euclid, Roman, and Rubin.
Significance. If the central recovery claim holds with quantified accuracy, the method would enable efficient emulation of distribution functions across large parameter spaces and dynamic ranges by leveraging existing large-volume suites and targeted sub-volumes. This could reduce computational costs while extending to lower masses and other statistics, with direct relevance to interpreting wide-field survey data.
major comments (2)
- [Abstract] Abstract: the claim that the global HMF is 'successfully recover[ed]' by integrating the emulator over the overdensity distribution is presented without reported error bars, quantitative validation metrics (e.g., fractional residuals or Kolmogorov-Smirnov statistics), or details on how overdensity bins are chosen and the integration is performed numerically. This prevents assessment of whether the integrated result matches the full-box truth within expected uncertainties.
- [Method] Method and results sections: the workflow assumes the HMF inside each (50 h^{-1} Mpc)^3 patch is fully determined by its scalar mean overdensity plus global cosmological parameters, with no residual dependence on larger-scale environment (tidal shear, neighboring structures, or variance on scales >50 h^{-1} Mpc). No test is described that isolates or quantifies such residual correlations, which would propagate directly into systematic bias in the integrated global HMF even if the emulator fits the training patches well.
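One way the isolation test requested in the second comment could be set up is sketched below. It is a hypothetical illustration, not a procedure from the paper: patches already selected to share the same local overdensity are split at the median of a secondary environment variable (the `env` key, for example a larger-scale overdensity), and mean halo counts in one mass bin are compared under a Poisson error assumption.

```python
import math


def split_significance(patches, mass_bin):
    """Poisson significance of the count difference between high- and low-env halves.

    patches: list of dicts with hypothetical keys
        "env"    - secondary environment variable for the patch
        "counts" - list of halo counts per mass bin in the patch
    All patches are assumed to lie in the same local-overdensity bin.
    """
    ordered = sorted(patches, key=lambda p: p["env"])
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[half:]

    n_low = sum(p["counts"][mass_bin] for p in low)
    n_high = sum(p["counts"][mass_bin] for p in high)
    mean_low = n_low / len(low)
    mean_high = n_high / len(high)

    # Poisson variance of each mean: total counts divided by (number of patches)^2.
    variance = n_low / len(low) ** 2 + n_high / len(high) ** 2
    if variance == 0.0:
        return 0.0
    return (mean_high - mean_low) / math.sqrt(variance)


# Toy inputs: no residual dependence, then a strong one.
flat = [{"env": e, "counts": [10]} for e in (0.0, 1.0, 2.0, 3.0)]
biased = [
    {"env": 0.0, "counts": [4]},
    {"env": 1.0, "counts": [4]},
    {"env": 2.0, "counts": [16]},
    {"env": 3.0, "counts": [16]},
]
z_flat = split_significance(flat, 0)
z_biased = split_significance(biased, 0)
```

A significance consistently near zero across overdensity and mass bins would support the single-variable conditioning; large values would point to the residual environmental dependence the referee describes.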
minor comments (2)
- [Abstract] The abstract mentions emulation of 'other key dark matter and baryonic distribution functions' and 'higher order statistics,' but the presented results focus exclusively on the HMF; clarify whether these extensions are demonstrated or left for future work.
- [Methods] Notation for the overdensity variable and the precise definition of the integration measure over the overdensity distribution should be introduced explicitly with an equation in the methods section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their careful reading of our manuscript and for providing constructive comments that have helped improve the clarity and robustness of our work. We address each of the major comments below and have made corresponding revisions to the manuscript.
-
Referee: [Abstract] Abstract: the claim that the global HMF is 'successfully recover[ed]' by integrating the emulator over the overdensity distribution is presented without reported error bars, quantitative validation metrics (e.g., fractional residuals or Kolmogorov-Smirnov statistics), or details on how overdensity bins are chosen and the integration is performed numerically. This prevents assessment of whether the integrated result matches the full-box truth within expected uncertainties.
Authors: We agree that the abstract should include quantitative support for the recovery claim to allow immediate assessment. The detailed validation, including fractional residuals, error bars on the integrated HMF, and the numerical procedure for binning and integrating over the overdensity distribution, is presented in the results section of the manuscript. We have revised the abstract to incorporate a concise statement of the achieved accuracy (sub-percent level agreement with the full-box HMF) along with a brief description of the integration method, while preserving the abstract's length and focus. revision: yes
-
Referee: [Method] Method and results sections: the workflow assumes the HMF inside each (50 h^{-1} Mpc)^3 patch is fully determined by its scalar mean overdensity plus global cosmological parameters, with no residual dependence on larger-scale environment (tidal shear, neighboring structures, or variance on scales >50 h^{-1} Mpc). No test is described that isolates or quantifies such residual correlations, which would propagate directly into systematic bias in the integrated global HMF even if the emulator fits the training patches well.
Authors: This is a substantive point about the completeness of the environmental modeling. Our method is predicated on the local mean overdensity (on the chosen 50 h^{-1} Mpc scale) being the dominant driver of the HMF, with larger-scale variations accounted for via the measured overdensity distribution of the parent volume. The end-to-end success in recovering the global HMF to high accuracy provides indirect validation that residual correlations do not dominate. We have revised the methods and discussion sections to explicitly articulate this assumption, to discuss its physical basis and potential limitations with reference to the literature on environmental effects, and to note that any unaccounted bias would have appeared in the integrated comparison. A direct isolation test for larger-scale residuals was not included in the original submission; we view the addition of the explicit discussion as a partial but appropriate response that strengthens the paper without requiring new simulations. revision: partial
Circularity Check
No circularity: training on conditional sub-volume HMFs and integration over overdensity PDF are independent steps
The paper selects small sub-regions from external Quijote N-body simulations, trains a differentiable emulator to reproduce the HMF conditioned only on local overdensity and global cosmological parameters, then integrates the emulator output against the separately measured overdensity distribution to recover the global HMF. This integration is not equivalent to any training input by construction; the global result functions as an empirical validation rather than a mathematical identity. No equations, self-citations, or ansatzes reduce the claimed recovery to a fitted quantity or prior result. The method is self-contained against the full-volume benchmark and does not invoke uniqueness theorems or rename known patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- emulator neural network weights
axioms (2)
- domain assumption: The halo mass function in a subvolume is fully determined by its overdensity and the global cosmological parameters.
- domain assumption: The overdensity distribution of the full volume is known or can be measured independently.
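The second assumption can be made concrete with a short sketch of how an overdensity distribution might be estimated from a parent volume. It assumes the box is tiled by equal-volume patches whose enclosed masses are available; the function names and toy numbers are hypothetical.

```python
def overdensities(patch_masses):
    """delta = M_patch / <M_patch> - 1 for equal-volume patches tiling the box."""
    mean_mass = sum(patch_masses) / len(patch_masses)
    return [m / mean_mass - 1.0 for m in patch_masses]


def normalised_histogram(values, edges):
    """Histogram of values as a probability density (integrates to 1 over the edges).

    Values outside [edges[0], edges[-1]] are ignored; the last bin is closed on the right.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for k in range(n_bins):
            is_last = k == n_bins - 1
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[k + 1]):
                counts[k] += 1
                break
    total = sum(counts)
    return [c / (total * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


# Toy patch masses in arbitrary units.
deltas = overdensities([1.0, 2.0, 3.0])
edges = [-1.0, 0.0, 1.0]
density = normalised_histogram(deltas, edges)
```

Because the patches have equal volume, the histogram is volume-weighted by construction, which is the weighting needed when the conditional mass function is averaged over the box.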