Are all models wrong? Falsifying binary formation models in gravitational-wave astronomy
Pith reviewed 2026-05-24 01:02 UTC · model grok-4.3
The pith
A frequentist p-value test reveals that some but not all hierarchical merger models adequately explain GW190521.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a frequentist p-value to assess whether a model provides an adequate explanation for the data. Applied to hierarchical merger models for GW190521, some models in active galactic nuclei and globular clusters yield adequate explanations while others do not.
What carries the argument
A frequentist p-value calculation that diagnoses model adequacy for rare, high-mass gravitational-wave events.
If this is right
- Bayesian model selection alone is insufficient; models must also pass an adequacy test.
- Hierarchical mergers in some environments can explain exceptionally massive events like GW190521.
- When all tested models fail the p-value test, entirely new formation channels become necessary.
- The method can be applied to other exceptional events in the growing gravitational-wave catalogue.
Where Pith is reading between the lines
- The same adequacy test could be run on models proposed for the extreme mass-ratio event GW190814.
- Repeated application to future high-mass detections might systematically rule out entire classes of hierarchical merger scenarios.
- The approach highlights the value of designing population analyses that explicitly separate selection effects from the model adequacy question.
Load-bearing premise
The frequentist p-value calculation correctly diagnoses model adequacy for rare high-mass events without being undermined by unmodeled selection effects or population assumptions.
What would settle it
A specific p-value below a conventional threshold such as 0.05 for one of the hierarchical merger models applied to GW190521 would show that model is inadequate.
Figures
read the original abstract
As the catalogue of gravitational-wave transients grows, several entries appear "exceptional" within the population. Tipping the scales with a total mass of $\approx 150 M_\odot$, GW190521 likely contained black holes in the pair-instability mass gap. The event GW190814, meanwhile, is unusual for its extreme mass ratio and the mass of its secondary component. A growing model-building industry has emerged to provide explanations for such exceptional events, and Bayesian model selection is frequently used to determine the most informative model. However, Bayesian methods can only take us so far. They provide no answer to the question: does our model provide an adequate explanation for the data? If none of the models we are testing provide an adequate explanation, then it is not enough to simply rank our existing models - we need new ones. In this paper, we introduce a method to answer this question with a frequentist $p$-value. We apply the method to different models that have been suggested to explain GW190521: hierarchical mergers in active galactic nuclei and globular clusters. We show that some (but not all) of these models provide adequate explanations for exceptionally massive events like GW190521.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a frequentist p-value procedure to assess whether binary black hole formation models provide adequate explanations for exceptional events such as GW190521 (total mass ~150 M_⊙). It applies the method to hierarchical-merger scenarios in active galactic nuclei and globular clusters, concluding that some (but not all) models yield adequate p-values while others do not.
Significance. If the p-value construction is shown to be robust, the work supplies a concrete falsification tool that complements Bayesian model selection in gravitational-wave population studies, directly addressing the question of model adequacy for rare, high-mass events.
major comments (2)
- [§3.1–3.2] §3.1–3.2 (p-value definition and likelihood): the test statistic is constructed from a likelihood that encodes both formation-channel physics and the observational selection function, yet no explicit marginalization over population hyperparameters or sensitivity analysis under variations in the mass-dependent detection probability is provided; this directly affects whether the reported tail probabilities can be trusted for GW190521.
- [§4] §4 (application to GW190521 models): the claim that certain AGN and globular-cluster models are adequate rests on the computed p-values, but the manuscript supplies neither a validation study nor error analysis of the p-value procedure itself, leaving open the possibility that unmodeled selection effects bias the adequacy diagnosis.
minor comments (2)
- [§2] Notation for the selection function and the precise definition of the test statistic should be introduced earlier and used consistently throughout.
- [Figures 2–4] Figure captions could explicitly state the assumed priors on spin and redshift distributions used in the likelihood.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [§3.1–3.2] §3.1–3.2 (p-value definition and likelihood): the test statistic is constructed from a likelihood that encodes both formation-channel physics and the observational selection function, yet no explicit marginalization over population hyperparameters or sensitivity analysis under variations in the mass-dependent detection probability is provided; this directly affects whether the reported tail probabilities can be trusted for GW190521.
Authors: Our p-value procedure is constructed for a fixed formation model (with hyperparameters set to literature values), as is standard for frequentist model-adequacy tests rather than full hierarchical inference. The likelihood already folds in the selection function for that model. We nevertheless agree that a sensitivity analysis would increase confidence in the tail probabilities. We will add such an analysis, including variations in the mass-dependent detection probability, to the revised manuscript. revision: yes
-
Referee: [§4] §4 (application to GW190521 models): the claim that certain AGN and globular-cluster models are adequate rests on the computed p-values, but the manuscript supplies neither a validation study nor error analysis of the p-value procedure itself, leaving open the possibility that unmodeled selection effects bias the adequacy diagnosis.
Authors: We accept that an explicit validation study and error analysis would strengthen the application section. In the revision we will include Monte Carlo validation (drawing synthetic events from each model and confirming that the p-value distribution is uniform under the null) together with a discussion of possible biases from unmodeled selection effects and how the inclusion of the selection function in the likelihood mitigates them. revision: yes
Circularity Check
No circularity: frequentist p-value adequacy test is independent of model inputs
full rationale
The paper introduces a frequentist p-value method to assess whether formation models (hierarchical mergers in AGN or globular clusters) adequately explain exceptional events like GW190521, distinct from Bayesian ranking. The abstract presents this as an external check on model adequacy without any quoted equations or steps that reduce the p-value to a fitted parameter, self-referential definition, or self-citation chain by construction. No load-bearing step equates the test statistic to its own inputs; the derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Bayesian model selection is insufficient to determine whether any tested model provides an adequate explanation
Reference graph
Works this paper leans on
-
[1]
Aasi et al., J. 2015, Class. Quantum Grav., 32, 074001, doi: 10.1088/0264-9381/32/7/074001 Abbott et al., R. 2020a, Phys. Rev. Lett., 125, 101102, doi: 10.1103/PhysRevLett.125.101102 —. 2020b, ApJL, 896, L44, doi: 10.3847/2041-8213/ab960f —. 2023, Phys. Rev. X, 13, 041039, doi: 10.1103/PhysRevX.13.041039
-
[2]
Acernese, F., Agathos, M., Agatsuma, K., et al. 2015, Class. Quantum Grav., 32, 024001, doi: 10.1088/0264-9381/32/2/024001
-
[3]
2019, Nat Astron, 3, 35, doi: 10.1038/s41550-018-0658-y
Akutsu, T., Ando, M., Arai, K., et al. 2019, Nat Astron, 3, 35, doi: 10.1038/s41550-018-0658-y
-
[4]
2022, ApJ, 941, 4, doi: 10.3847/1538-4357/ac9d95
Anagnostou, O., Trenti, M., & Melatos, A. 2022, ApJ, 941, 4, doi: 10.3847/1538-4357/ac9d95
-
[5]
2021, ApJ, 920, 128, doi: 10.3847/1538-4357/ac1419
Arca-Sedda, M., Paolo Rizzuto, F., Naab, T., et al. 2021, ApJ, 920, 128, doi: 10.3847/1538-4357/ac1419
-
[6]
Ashton, G., H¨ ubner, M., Lasky, P. D., et al. 2019, ApJS, 241, 27, doi: 10.3847/1538-4365/ab06fc
-
[7]
2016, A&A, 594, A97, doi: 10.1051/0004-6361/201628980
Belczynski, K., Heger, A., Gladysz, W., et al. 2016, A&A, 594, A97, doi: 10.1051/0004-6361/201628980
-
[8]
2022, Physics Letters B, 829, 137040, doi: 10.1016/j.physletb.2022.137040
Chen, Z.-C., Yuan, C., & Huang, Q.-G. 2022, Physics Letters B, 829, 137040, doi: 10.1016/j.physletb.2022.137040
-
[9]
2022, Physics of the Dark Universe, 38, 101111, doi: 10.1016/j.dark.2022.101111
Clesse, S., & Garc´ ıa-Bellido, J. 2022, Physics of the Dark Universe, 38, 101111, doi: 10.1016/j.dark.2022.101111
-
[10]
Costa, G., Bressan, A., Mapelli, M., et al. 2021, Monthly Notices of the Royal Astronomical Society, 501, 4514, doi: 10.1093/mnras/staa3916 Dall’Amico, M., Mapelli, M., Di Carlo, U. N., et al. 2021, Monthly Notices of the Royal Astronomical Society, 508, 3045, doi: 10.1093/mnras/stab2783 De Luca, V., Desjacques, V., Franciolini, G., Pani, P., &
-
[11]
Riotto, A. 2021, Phys. Rev. Lett., 126, 051101, doi: 10.1103/PhysRevLett.126.051101
-
[12]
2022, ApJ, 926, 34, doi: 10.3847/1538-4357/ac3978
Essick, R., Farah, A., Galaudage, S., et al. 2022, ApJ, 926, 34, doi: 10.3847/1538-4357/ac3978
-
[13]
2019, ApJ, 887, 53, doi: 10.3847/1538-4357/ab518b
Justham, S. 2019, ApJ, 887, 53, doi: 10.3847/1538-4357/ab518b
-
[14]
Fishbach, M., Essick, R., & Holz, D. E. 2020, ApJL, 899, L8, doi: 10.3847/2041-8213/aba7b6
-
[15]
Fragione, G., Loeb, A., & Rasio, F. A. 2020, ApJL, 902, L26, doi: 10.3847/2041-8213/abbc0a
-
[16]
2023, ApJL, 945, L29, doi: 10.3847/2041-8213/acbfb8
Gayathri, V., Wysocki, D., Yang, Y., et al. 2023, ApJL, 945, L29, doi: 10.3847/2041-8213/acbfb8
-
[17]
Heger, A., & Woosley, S. E. 2002, ApJ, 567, 532, doi: 10.1086/338487
-
[18]
Kimball, C., Talbot, C., Berry, C. P. L., et al. 2021, ApJL, 915, L35, doi: 10.3847/2041-8213/ac0aef
-
[19]
Kinugawa, T., Nakamura, T., & Nakano, H. 2021, Monthly Notices of the Royal Astronomical Society: Letters, 501, L49, doi: 10.1093/mnrasl/slaa191
-
[20]
2020, ApJL, 903, L40, doi: 10.3847/2041-8213/abc552
Liu, B., & Bromm, V. 2020, ApJL, 903, L40, doi: 10.3847/2041-8213/abc552
-
[21]
2021, Monthly Notices of the Royal Astronomical Society, 502, 2049, doi: 10.1093/mnras/stab178
Liu, B., & Lai, D. 2021, Monthly Notices of the Royal Astronomical Society, 502, 2049, doi: 10.1093/mnras/stab178
-
[22]
2021, Monthly Notices of the Royal Astronomical Society, 505, 339, doi: 10.1093/mnras/stab1334
Mapelli, M., Dall’Amico, M., Bouffanais, Y., et al. 2021, Monthly Notices of the Royal Astronomical Society, 505, 339, doi: 10.1093/mnras/stab1334
-
[23]
L., Rinaldi, S., Torres-Orjuela, A., et al
Morton, S. L., Rinaldi, S., Torres-Orjuela, A., et al. 2023, Phys. Rev. D, 108, 123039, doi: 10.1103/PhysRevD.108.123039
-
[24]
2023, Monthly Notices of the Royal Astronomical Society, 525, 3986, doi: 10.1093/mnras/stad2502
Mould, M., Gerosa, D., Dall’Amico, M., & Mapelli, M. 2023, Monthly Notices of the Royal Astronomical Society, 525, 3986, doi: 10.1093/mnras/stad2502
-
[25]
Palmese, A., Fishbach, M., Burke, C. J., Annis, J., & Liu, X. 2021, ApJL, 914, L34, doi: 10.3847/2041-8213/ac0883
- [26]
-
[27]
L., Zevin, M., Amaro-Seoane, P., et al
Rodriguez, C. L., Zevin, M., Amaro-Seoane, P., et al. 2019, Phys. Rev. D, 100, 043027, doi: 10.1103/PhysRevD.100.043027
-
[28]
Romero-Shaw, I., Lasky, P. D., Thrane, E., & Bustillo, J. C. 2020, ApJL, 903, L5, doi: 10.3847/2041-8213/abbe26
-
[29]
Romero-Shaw, I. M., Thrane, E., & Lasky, P. D. 2022, Publ. Astron. Soc. Aust., 39, e025, doi: 10.1017/pasa.2022.24
-
[30]
2020, ApJL, 903, L21, doi: 10.3847/2041-8213/abc253
Safarzadeh, M., & Haiman, Z. 2020, ApJL, 903, L21, doi: 10.3847/2041-8213/abc253
-
[31]
Samsing, J., Bartos, I., D’Orazio, D. J., et al. 2022, Nature, 603, 237, doi: 10.1038/s41586-021-04333-1
-
[32]
Schmidt, P., Hannam, M., & Husa, S. 2012, Phys. Rev. D, 86, 104063, doi: 10.1103/PhysRevD.86.104063
-
[33]
Speagle, J. S. 2020, Monthly Notices of the Royal Astronomical Society, 493, 3132, doi: 10.1093/mnras/staa278
-
[34]
Stevenson, S., Berry, C. P. L., & Mandel, I. 2017, Monthly Notices of the Royal Astronomical Society, 471, 2801, doi: 10.1093/mnras/stx1764
-
[35]
2021, ApJ, 908, 194, doi: 10.3847/1538-4357/abd555
Tagawa, H., Kocsis, B., Haiman, Z., et al. 2021, ApJ, 908, 194, doi: 10.3847/1538-4357/abd555
-
[36]
2021, Monthly Notices of the Royal Astronomical Society, 505, 2170, doi: 10.1093/mnras/stab1421
Umeda, H. 2021, Monthly Notices of the Royal Astronomical Society, 505, 2170, doi: 10.1093/mnras/stab1421
- [37]
-
[38]
Vajpeyi, A., Thrane, E., Smith, R., McKernan, B., & Ford, K. E. S. 2022, ApJ, 931, 82, doi: 10.3847/1538-4357/ac6180
-
[39]
Winch, E. R. J., Vink, J. S., Higgins, E. R., & Sabhahitf, G. N. 2024, Monthly Notices of the Royal Astronomical Society, 529, 2980, doi: 10.1093/mnras/stae393
-
[40]
Woosley, S. E. 2017, ApJ, 836, 244, doi: 10.3847/1538-4357/836/2/244 —. 2019, ApJ, 878, 49, doi: 10.3847/1538-4357/ab1b41
-
[41]
Woosley, S. E., Blinnikov, S., & Heger, A. 2007, Nature, 450, 390, doi: 10.1038/nature06333
-
[42]
Woosley, S. E., & Heger, A. 2021, ApJL, 912, L31, doi: 10.3847/2041-8213/abf2c4
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.