Comparative Analysis of EMCEE, Gaussian Process, and Masked Autoregressive Flow in Constraining the Hubble Constant Using Cosmic Chronometers Dataset
Pith reviewed 2026-05-23 03:08 UTC · model grok-4.3
The pith
EMCEE recovers the Hubble constant with lower bias and better calibration than Gaussian processes or masked autoregressive flows in tests on mock cosmic chronometer data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When mock cosmic chronometer datasets are generated with a fixed input H0, EMCEE recovers posteriors with the smallest bias and RMSE, the coverage closest to the nominal 68 percent and 95 percent levels, and the highest log score. GP ranks in the middle on these quantities while MAF ranks lowest. GP is also the most sensitive to removal of single data points, MAF intermediate, and EMCEE least sensitive; EMCEE and GP respond more to high-redshift points while MAF responds more to low-redshift points. These rankings hold under both Lambda-CDM-based and GP-based mock-generation prescriptions.
What carries the argument
Monte Carlo delete-d jackknife (MCDJ) applied to assess point-wise sensitivity, together with simulation tests that compare recovered H0 posteriors to a known input truth via bias, RMSE, 68 percent and 95 percent coverage, and log score.
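A rough sketch of the delete-d resampling loop may help fix ideas. The paper's estimators are EMCEE, GP, and MAF; here a closed-form weighted least-squares fit of H0 in flat ΛCDM with a fixed matter density stands in for them, purely so the example runs without external libraries. The toy data, the value 0.3 for the matter density, and all function names are assumptions.

```python
import math
import random
import statistics


def e_of_z(z, omega_m):
    """Dimensionless flat LCDM expansion rate E(z) = H(z) / H0."""
    return math.sqrt(omega_m * (1 + z) ** 3 + (1 - omega_m))


def fit_h0(data, omega_m=0.3):
    """Stand-in estimator: weighted least-squares H0 with omega_m held fixed."""
    num = sum(h * e_of_z(z, omega_m) / s**2 for z, h, s in data)
    den = sum(e_of_z(z, omega_m) ** 2 / s**2 for z, h, s in data)
    return num / den


def mcdj(data, estimator, d=1, n_draws=1000, seed=0):
    """Monte Carlo delete-d jackknife: re-estimate on random subsets of size n - d."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_draws):
        kept = sorted(rng.sample(range(n), n - d))
        estimates.append(estimator([data[i] for i in kept]))
    return estimates


# Toy data (hypothetical, not the real CC sample): 33 points (z, H, sigma).
rng = random.Random(42)
toy_data = []
for i in range(33):
    z = 0.05 + 0.06 * i
    toy_data.append((z, 70.0 * e_of_z(z, 0.3) + rng.gauss(0.0, 10.0), 10.0))

h0_full = fit_h0(toy_data)
estimates = mcdj(toy_data, fit_h0, d=1, n_draws=1000)
spread = max(estimates) - min(estimates)
print(f"full-sample H0 = {h0_full:.2f}, jackknife spread = {spread:.2f}")
```

The spread of the resulting H0 values is one simple measure of how strongly the estimator reacts to losing individual points; any other estimator with the same call signature can be passed in place of `fit_h0`.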
If this is right
- EMCEE produces H0 constraints with lower bias and better-calibrated intervals than the other two methods across the tested mock ensembles.
- Sensitivity rankings and redshift dependence differ systematically among the methods, so data-point influence is method-specific.
- The performance ordering remains the same whether mocks are drawn from a Lambda-CDM model or from a GP model.
- MAF shows the largest deviations from nominal coverage and the lowest log scores in the simulation tests.
Where Pith is reading between the lines
- Analyses aiming to tighten constraints on the Hubble tension with chronometer data would gain by defaulting to EMCEE unless new evidence shows otherwise.
- The redshift-dependent sensitivities could be used to design follow-up observations that target the redshift ranges where each method is least affected.
- Extending the same mock-based ranking exercise to other cosmological probes might reveal whether EMCEE remains preferable outside the chronometer setting.
Load-bearing premise
The mock datasets generated under the Lambda-CDM and GP prescriptions accurately capture the statistical properties, error structure, and selection effects of real cosmic chronometer observations.
What would settle it
Repeating the simulation tests after changing the mock-generation prescription to include a different error distribution or redshift-dependent selection bias and finding that the performance ordering reverses would falsify the claim that EMCEE is the best performer.
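One way such a test could be set up is sketched below: a ΛCDM-based mock generator in which the noise distribution is a switchable option. The additive form H_sim = H_fid + noise, the Laplace alternative, the matter density of 0.3, and the redshift and error grids are assumptions chosen for illustration, not the paper's prescription.

```python
import math
import random


def h_fid(z, h0, omega_m):
    """Flat LCDM expansion rate H(z) = H0 * sqrt(Om (1+z)^3 + 1 - Om)."""
    return h0 * math.sqrt(omega_m * (1 + z) ** 3 + (1 - omega_m))


def unit_noise(rng, kind):
    """Zero-mean, unit-variance draw from the chosen error distribution."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "laplace":
        # Difference of two Exp(1) draws is Laplace with variance 2; rescale to 1.
        return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)
    raise ValueError(f"unknown noise kind: {kind}")


def make_mock(z_values, sigmas, h0_true, omega_m=0.3, kind="gaussian", seed=0):
    """Return a list of (z, H_sim, sigma) with H_sim = H_fid(z) + sigma * noise."""
    rng = random.Random(seed)
    return [(z, h_fid(z, h0_true, omega_m) + s * unit_noise(rng, kind), s)
            for z, s in zip(z_values, sigmas)]


# Hypothetical redshift grid and error model, for illustration only.
z_grid = [0.1 * (i + 1) for i in range(20)]
sigma_grid = [5.0 + 10.0 * z for z in z_grid]
mock_gauss = make_mock(z_grid, sigma_grid, h0_true=70.0, kind="gaussian")
mock_laplace = make_mock(z_grid, sigma_grid, h0_true=70.0, kind="laplace")
print(mock_gauss[0], mock_laplace[0])
```

Both noise options share the same variance, so any change in the recovered bias, coverage, or log score between them would reflect the shape of the error distribution rather than its overall size.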
Original abstract
The Hubble constant ($H_0$) is essential for understanding the universe's evolution. Different methods, such as Affine Invariant Markov chain Monte Carlo Ensemble sampler (EMCEE), Gaussian Process (GP), and Masked Autoregressive Flow (MAF), are used to constrain $H_0$ using $H(z)$ data. However, these methods produce varying $H_0$ values when applied to the same dataset. To investigate these differences, we compare the methods based on their sensitivity to individual data points and their performance in constraining $H_0$. We apply Monte Carlo delete-$d$ jackknife (MCDJ) to assess their sensitivity to individual data points. Our findings reveal that GP is more sensitive to individual data points than both MAF and EMCEE, with MAF being more sensitive than EMCEE. Sensitivity also depends on redshift: EMCEE and GP are more sensitive to $H(z)$ at higher redshifts, while MAF is more sensitive at lower redshifts. In simulation-based performance tests, we generate an ensemble of mock CC datasets with a fixed input truth $H_{0,\mathrm{true}}$, apply each method to recover $H_0$ posteriors, and summarise performance by comparing the recovered posterior to $H_{0,\mathrm{true}}$: (i) posterior central value accuracy (bias and RMSE), (ii) credible-interval calibration (68\% and 95\% coverage), and (iii) overall posterior quality (log score), under two simulation prescriptions ($\Lambda$CDM-based and GP-based). Overall, EMCEE performs best, GP is intermediate, and MAF performs worst across the performance metrics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares three inference methods—EMCEE, Gaussian Process (GP), and Masked Autoregressive Flow (MAF)—for constraining the Hubble constant H0 from cosmic chronometer H(z) data. It applies Monte Carlo delete-d jackknife (MCDJ) to quantify sensitivity to individual data points (finding GP most sensitive, followed by MAF then EMCEE, with redshift dependence) and conducts simulation-based recovery tests on mock CC datasets generated under ΛCDM-based and GP-based prescriptions. Performance is ranked via bias, RMSE, 68%/95% coverage, and log score, with the headline result that EMCEE performs best, GP intermediate, and MAF worst.
Significance. If the performance ranking holds under more general mock-generation schemes, the work would provide a useful, quantitative guide for selecting inference engines when analyzing sparse H(z) datasets. The MCDJ sensitivity analysis and the multi-metric simulation protocol (bias/RMSE/coverage/log score) are concrete strengths that could be extended to other cosmological probes.
major comments (1)
- [simulation-based performance tests] The central performance ranking (EMCEE > GP > MAF) is obtained exclusively from recovery tests on mocks drawn from ΛCDM-based and GP-based prescriptions (abstract and simulation section). Because these prescriptions align with the parametric assumptions inside EMCEE and the kernel/smoothness assumptions inside GP, the design may systematically disadvantage the more flexible MAF; the manuscript does not report a cross-check using mocks generated by an independent process (e.g., bootstrap resampling of the real CC points with their covariance). This is load-bearing for the headline claim.
minor comments (3)
- [MCDJ analysis] The manuscript should specify the exact data-selection criteria, redshift range, and error model for the real CC sample used in the MCDJ analysis.
- [Methods] Hyperparameter choices for the GP kernel, MAF architecture (number of layers, hidden units, training schedule), and EMCEE settings should be stated explicitly, together with any sensitivity tests.
- [Results] Figure captions and table legends should clarify whether the reported coverage and log-score values are averaged over the ensemble of mocks or shown for representative realizations.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for identifying a key aspect of our simulation protocol. We address the single major comment below.
-
Referee: [simulation-based performance tests] The central performance ranking (EMCEE > GP > MAF) is obtained exclusively from recovery tests on mocks drawn from ΛCDM-based and GP-based prescriptions (abstract and simulation section). Because these prescriptions align with the parametric assumptions inside EMCEE and the kernel/smoothness assumptions inside GP, the design may systematically disadvantage the more flexible MAF; the manuscript does not report a cross-check using mocks generated by an independent process (e.g., bootstrap resampling of the real CC points with their covariance). This is load-bearing for the headline claim.
Authors: We thank the referee for this observation. The two mock-generation prescriptions were deliberately selected to span the modeling assumptions underlying EMCEE (parametric ΛCDM) and GP (non-parametric smoothness), thereby providing a controlled test in which the more flexible MAF is evaluated on data drawn from both classes of generative process. Nevertheless, we agree that an additional recovery test based on bootstrap resampling of the real CC points (respecting their reported covariances) would constitute an independent, data-driven check that does not presuppose any particular form for the underlying expansion history. We will therefore add this bootstrap-based recovery experiment, together with the corresponding performance metrics, to the revised manuscript.
Revision: yes
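The resampling step of such a bootstrap check might look like the sketch below. It covers only the drawing of resampled datasets, not the recovery test itself, and it resamples points independently, so it does not respect off-diagonal covariances the way the rebuttal proposes. The function name and the placeholder points are hypothetical.

```python
import random


def bootstrap_datasets(data, n_boot=1000, seed=0):
    """Yield n_boot resampled datasets, each drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    for _ in range(n_boot):
        # Assumption: points are resampled independently, so any off-diagonal
        # covariance between measurements is ignored in this sketch.
        yield sorted(rng.choices(data, k=n))


# Hypothetical placeholder points (z, H, sigma); not real CC measurements.
toy_points = [(0.2 * i, 70.0 + 40.0 * 0.2 * i, 10.0) for i in range(1, 11)]
resamples = list(bootstrap_datasets(toy_points, n_boot=50))
print(len(resamples), len(resamples[0]))
```

Each resampled dataset can then be passed to any of the three inference methods in the same way as the original sample.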
Circularity Check
No significant circularity detected
The paper evaluates method performance exclusively via recovery on mock CC datasets that are generated with an externally fixed, known H0,true value under two stated simulation prescriptions. Recovered posteriors are scored against this independent truth using bias, RMSE, coverage, and log score. No quoted equation or procedure shows a fitted parameter being renamed as a prediction, a self-definitional loop, or a load-bearing self-citation that reduces the ranking to the input by construction. The simulation design and evaluation metrics remain external to the internal fitting steps of EMCEE, GP, and MAF, rendering the central claim self-contained.
Axiom & Free-Parameter Ledger
axioms (3)
- standard math: MCMC chains in EMCEE have converged to the target posterior
- domain assumption: The chosen kernel and hyperparameters for the Gaussian Process adequately model the H(z) data covariance
- domain assumption: The MAF architecture and training procedure produce well-calibrated densities for the parameter space
Reference graph
Works this paper leans on
-
[1]
In this paper, we adopt the flat ΛCDM cosmological model
EMCEE: To determine $H_0$ from CC data using EMCEE, it is essential to choose a cosmological model. In this paper, we adopt the flat ΛCDM cosmological model. The corresponding Friedmann equation is given by $H(z) = H_0 \sqrt{\Omega_M (1+z)^3 + (1 - \Omega_M)}$ (2), where $H(z)$ represents the Hubble parameter, $H_0$ is the Hubble constant, $\Omega_M$ denotes the matter density, and z...
-
[2]
GP is a powerful tool that can model the relationship in data using a joint Gaussian distribution
GP: H0 can also be determined by reconstructing the CC data using the Gaussian process, as outlined by Rasmussen and Williams [6]. GP is a powerful tool that can model the relationship in data using a joint Gaussian distribution. It estimates values at new points without requiring additional parameters. In this study, we use a Gaussian process to est...
-
[3]
A reliable neural network estimator should balance both flexibility and tractability
MAF: The Hubble constant H0 can be estimated using CC data through neural networks. A reliable neural network estimator should balance both flexibility and tractability. Two families that embody both properties are autoregressive models [25] and normalizing flows [26]. In this study, we use MAF, proposed by Papamakarios et al. [8], to estimate H0 from ...
-
[4]
This results in the creation of 1000 randomly selected datasets, labeled with an index i ranging from 0 to 999. • Step 3: Generate 1000 constrained values of H0 with EMCEE, GP, and MAF respectively using the 1000 randomly selected H(z) datasets from Step 2. Each method generates 1000 values of H0, and these results are shown in Fig. 2. To evaluate the se...
-
[5]
We summarize these results in Table II
we compare three metrics: (1) the absolute difference between the mode of the H0 distribution and the H0 value constrained from the full 33 CC dataset, denoted as ∆H0,mode−CC = |H0,mode − H0,CC|, (2) the absolute difference between the median of the H0 distribution and H0,CC, denoted as ∆H0,median−CC = |H0,median − H0,CC|, and (3) the range spanned by ...
-
[6]
The results are shown in Fig. 3. By analyzing Fig. 3, we can assess whether H(z) values from different redshift regions have varying impacts on the sensitivity of the constrained H0. The wider range of H0 within the 1σ (∆H0,1σ) and 2σ (∆H0,2σ) intervals indicates greater sensitivity of constrained H0 to individual H(z) data points in the corresponding r...
-
[7]
At this stage, we have generated zsim, Hfid(zsim) and σsim(zsim) in Equation (20). By combining these components, we can generate a simulated dataset consisting of 33 Hsim(zsim) data points based on the ΛCDM, as de- FIG. 6. The simulated H(z) dataset based on the ΛCDM model using CC data. The simulated data points, Hsim(zsim), are shown as red triangles...
-
[8]
Planck 2018 results. VI. Cosmological parameters
Planck Collaboration, N. Aghanim, Y. Akrami, M. Ashdown, J. Aumont, C. Baccigalupi, M. Ballardini, A. J. Banday, R. B. Barreiro, N. Bartolo, S. Basak, R. Battye, K. Benabed, J. P. Bernard, M. Bersanelli, P. Bielewicz, J. J. Bock, J. R. Bond, J. Borrill, F. R. Bouchet, F. Boulanger, M. Bucher, C. Burigana, R. C. Butler, E. Calabrese, J. F. Cardoso, J. Ca...
-
[9]
A. G. Riess, W. Yuan, L. M. Macri, D. Scolnic, D. Brout, S. Casertano, D. O. Jones, Y. Murakami, G. S. Anand, L. Breuval, T. G. Brink, A. V. Filippenko, S. Hoffmann, S. W. Jha, W. D’arcy Kenworthy, J. Mackenty, B. E. Stahl, and W. Zheng, Astrophys. J. L. 934, L7 (2022), arXiv:2112.04510 [astro-ph.CO]
-
[10]
Tensions between the Early and the Late Universe
L. Verde, T. Treu, and A. G. Riess, Nature Astronomy 3, 891 (2019), arXiv:1907.10625 [astro-ph.CO]
- [11]
-
[12]
D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman, Publications of the Astronomical Society of the Pacific 125, 306 (2013), arXiv:1202.3665 [astro-ph.IM]
-
[13]
C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (2006)
-
[14]
Reconstruction of dark energy and expansion dynamics using Gaussian processes
M. Seikel, C. Clarkson, and M. Smith, JCAP 2012, 036 (2012), arXiv:1204.2832 [astro-ph.CO]
-
[15]
Masked Autoregressive Flow for Density Estimation
G. Papamakarios, T. Pavlakou, and I. Murray, Proceedings of the 31st International Conference on Neural Information Processing Systems, arXiv:1705.07057 (2017), arXiv:1705.07057 [stat.ML]
-
[16]
Cosmological Parameters from CMB Maps without Likelihood Approximation
B. Racine, J. B. Jewell, H. K. Eriksen, and I. K. Wehus, Astrophys. J. 820, 31 (2016), arXiv:1512.06619 [astro-ph.CO]
-
[17]
H. Zhang, Y.-C. Wang, T.-J. Zhang, and T. Zhang, Astrophys. J. S. 266, 27 (2023), arXiv:2304.03911 [astro-ph.CO]
-
[18]
Y.-C. Wang, Y.-B. Xie, T.-J. Zhang, H.-C. Huang, T. Zhang, and K. Liu, Astrophys. J. S. 254, 43 (2021), arXiv:2005.10628 [astro-ph.CO]
-
[19]
Constraining Cosmological Parameters Based on Relative Galaxy Ages
R. Jimenez and A. Loeb, Astrophys. J. 573, 37 (2002), arXiv:astro-ph/0106145 [astro-ph]
-
[20]
M. Moresco, R. Jimenez, L. Verde, A. Cimatti, and L. Pozzetti, Astrophys. J. 898, 82 (2020), arXiv:2003.07362 [astro-ph.GA]
- [21]
-
[22]
C. Zhang, H. Zhang, S. Yuan, S. Liu, T.-J. Zhang, and Y.-C. Sun, Research in Astronomy and Astrophysics 14, 1221-1233 (2014), arXiv:1207.4541 [astro-ph.CO]
-
[23]
Constraints on the redshift dependence of the dark energy potential
J. Simon, L. Verde, and R. Jimenez, Phys. Rev. D 71, 123001 (2005), arXiv:astro-ph/0412269 [astro-ph]
-
[24]
M. Moresco, A. Cimatti, R. Jimenez, L. Pozzetti, G. Zamorani, M. Bolzonella, J. Dunlop, F. Lamareille, M. Mignoli, H. Pearce, P. Rosati, D. Stern, L. Verde, E. Zucca, C. M. Carollo, T. Contini, J. P. Kneib, O. Le Fèvre, S. J. Lilly, V. Mainieri, A. Renzini, M. Scodeggio, I. Balestra, R. Gobat, R. McLure, S. Bardelli, A. Bongiorno, K. Caputi, O. Cuccia...
-
[25]
M. Moresco, L. Pozzetti, A. Cimatti, R. Jimenez, C. Maraston, L. Verde, D. Thomas, A. Citro, R. Tojeiro, and D. Wilkinson, JCAP 2016, 014 (2016), arXiv:1601.01701 [astro-ph.CO]
-
[26]
A. L. Ratsimbazafy, S. I. Loubser, S. M. Crawford, C. M. Cress, B. A. Bassett, R. C. Nichol, and P. Väisänen, Mon. Not. Roy. Astron. Soc. 467, 3239 (2017), arXiv:1702.00418 [astro-ph.CO]
-
[27]
Cosmic Chronometers: Constraining the Equation of State of Dark Energy. I: H(z) Measurements
D. Stern, R. Jimenez, L. Verde, M. Kamionkowski, and S. A. Stanford, JCAP 2010, 008 (2010), arXiv:0907.3149 [astro-ph.CO]
-
[28]
E. Tomasetti, M. Moresco, N. Borghi, K. Jiao, A. Cimatti, L. Pozzetti, A. C. Carnall, R. J. McLure, and L. Pentericci, Astron. Astrophys. 679, A96 (2023), arXiv:2305.16387 [astro-ph.CO]
-
[29]
Raising the bar: new constraints on the Hubble parameter with cosmic chronometers at z$\sim$2
M. Moresco, Mon. Not. Roy. Astron. Soc. 450, L16 (2015), arXiv:1503.01116 [astro-ph.CO]
-
[30]
Power of Observational Hubble Parameter Data: a Figure of Merit Exploration
C. Ma and T.-J. Zhang, Astrophys. J. 730, 74 (2011), arXiv:1007.3787 [astro-ph.CO]
- [31]
-
[32]
Neural Autoregressive Distribution Estimation
B. Uria, M.-A. Côté, K. Gregor, I. Murray, and H. Larochelle, The Journal of Machine Learning Research, arXiv:1605.02226 (2016), arXiv:1605.02226 [cs.LG]
-
[33]
D. Jimenez Rezende and S. Mohamed, Proceedings of the 32nd International Conference on Machine Learning (2015)
-
[34]
MADE: Masked Autoencoder for Distribution Estimation
M. Germain, K. Gregor, I. Murray, and H. Larochelle, arXiv e-prints, arXiv:1502.03509 (2015), arXiv:1502.03509 [cs.LG]
-
[35]
Variational Inference with Normalizing Flows
D. Jimenez Rezende and S. Mohamed, arXiv e-prints, arXiv:1505.05770 (2015), arXiv:1505.05770 [stat.ML]
-
[36]
J.-F. Chen, Y.-C. Wang, T. Zhang, and T.-J. Zhang, Phys. Rev. D 107, 063517 (2023), arXiv:2211.05064 [astro-ph.CO]
-
[37]
A. Endo, E. van Leeuwen, and M. Baguelin, Epidemics 29, 100363 (2019)
-
[38]
Y. Li, S. Rao, A. Hassaine, R. Ramakrishnan, D. Canoy, G. Salimi-Khorshidi, M. Mamouei, T. Lukasiewicz, and K. Rahimi, Scientific Reports 11, 20685 (2021)
-
[39]
Thomas, Computational Economics 60, 451 (2022)
L. Thomas, Computational Economics 60, 451 (2022)
- [40]
-
[41]
Astropy Collaboration, A. M. Price-Whelan, B. M. Sipőcz, H. M. Günther, P. L. Lim, S. M. Crawford, S. Conseil, D. L. Shupe, M. W. Craig, N. Dencheva, A. Ginsburg, J. T. VanderPlas, L. D. Bradley, D. Pérez-Suárez, M. de Val-Borro, T. L. Aldcroft, K. L. Cruz, T. P. Robitaille, E. J. Tollerud, C. Ardelean, T. Babej, Y. P. Bach, M. Bachetti, A. V. Ba...
-
[42]
Astropy Collaboration, T. P. Robitaille, E. J. Tollerud, P. Greenfield, M. Droettboom, E. Bray, T. Aldcroft, M. Davis, A. Ginsburg, A. M. Price-Whelan, W. E. Kerzendorf, A. Conley, N. Crighton, K. Barbary, D. Muna, H. Ferguson, F. Grollier, M. M. Parikh, P. H. Nair, H. M. Günther, C. Deil, J. Woillez, S. Conseil, R. Kramer, J. E. H. Turner, L. Singer, R....
-
[43]
E. Bertin and S. Arnouts, Astron. Astrophys. Suppl. Ser. 117, 393 (1996)
-
[44]
R. Cloutier, R. Doyon, F. Bouchy, and G. Hébrard, Astron. J. 156, 82 (2018), arXiv:1807.01263 [astro-ph.EP]
-
[45]
L. Corrales, Astrophys. J. 805, 23 (2015), arXiv:1503.01475 [astro-ph.HE]
-
[46]
G. J. Ferland, R. L. Porter, P. A. M. van Hoof, R. J. R. Williams, N. P. Abel, M. L. Lykins, G. Shaw, W. J. Henney, and P. C. Stancil, Revista Mexicana de Astronomía y Astrofísica 49, 137 (2013), arXiv:1302.4485 [astro-ph.GA]
-
[47]
R. J. Hanisch and C. D. Biemesderfer, in Bulletin of the American Astronomical Society (1989) p. 780
-
[48]
Lamport, LaTeX: A Document Preparation System, 2nd ed
L. Lamport, LaTeX: A Document Preparation System, 2nd ed. (Addison-Wesley Professional, 1994)
-
[49]
L. Li, J. Zhang, H. Peter, L. P. Chitta, J. Su, H. Song, C. Xia, and Y. Hou, Astrophys. J. 868, L33 (2018), arXiv:1811.08553 [astro-ph.SR]
-
[50]
Nominal values for selected solar and planetary quantities: IAU 2015 Resolution B3
A. Prša, P. Harmanec, G. Torres, E. Mamajek, M. Asplund, N. Capitaine, J. Christensen-Dalsgaard, É. Depagne, M. Haberreiter, and S. Hekker, Astron. J. 152, 41 (2016), arXiv:1605.09788 [astro-ph.SR]
-
[51]
G. J. Schwarz, J.-U. Ness, J. P. Osborne, K. L. Page, P. A. Evans, A. P. Beardmore, F. M. Walter, L. A. Helton, C. E. Woodward, M. Bode, S. Starrfield, and J. J. Drake, Astrophys. J. S. 197, 31 (2011), arXiv:1110.6224 [astro-ph.SR]
-
[52]
F. P. A. Vogt, M. A. Dopita, L. J. Kewley, R. S. Sutherland, J. Scharwächter, H. M. Basurah, A. Ali, and M. A. Amer, Astrophys. J. 793, 127 (2014), arXiv:1406.5186 [astro-ph.GA]
- [53]
-
[54]
J. Niu and T.-J. Zhang, Physics of the Dark Universe 39, 101147 (2023), arXiv:2204.10597 [astro-ph.CO]