pith. sign in

arxiv: 2405.09739 · v2 · submitted 2024-05-16 · 🌌 astro-ph.HE · gr-qc

Are all models wrong? Falsifying binary formation models in gravitational-wave astronomy

Pith reviewed 2026-05-24 01:02 UTC · model grok-4.3

classification 🌌 astro-ph.HE gr-qc
keywords gravitational wavesblack hole mergersmodel selectionp-valueGW190521hierarchical mergersactive galactic nucleiglobular clusters
0
0 comments X

The pith

A frequentist p-value test reveals that some but not all hierarchical merger models adequately explain GW190521.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a frequentist p-value method to check whether proposed models actually provide adequate explanations for exceptional gravitational-wave events, rather than merely ranking them with Bayesian selection. It applies the test to hierarchical black hole merger models in active galactic nuclei and globular clusters as explanations for the high-mass event GW190521. Some of these models pass the adequacy test while others fail. A sympathetic reader would care because if every tested model is inadequate, the field needs new formation scenarios instead of continued comparison among existing ones. The method supplies a concrete way to decide when the growing catalogue of events demands revised physics.

Core claim

We introduce a frequentist p-value to assess whether a model provides an adequate explanation for the data. Applied to hierarchical merger models for GW190521, some models in active galactic nuclei and globular clusters yield adequate explanations while others do not.

What carries the argument

A frequentist p-value calculation that diagnoses model adequacy for rare, high-mass gravitational-wave events.

If this is right

  • Bayesian model selection alone is insufficient; models must also pass an adequacy test.
  • Hierarchical mergers in some environments can explain exceptionally massive events like GW190521.
  • When all tested models fail the p-value test, entirely new formation channels become necessary.
  • The method can be applied to other exceptional events in the growing gravitational-wave catalogue.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adequacy test could be run on models proposed for the extreme mass-ratio event GW190814.
  • Repeated application to future high-mass detections might systematically rule out entire classes of hierarchical merger scenarios.
  • The approach highlights the value of designing population analyses that explicitly separate selection effects from the model adequacy question.

Load-bearing premise

The frequentist p-value calculation correctly diagnoses model adequacy for rare high-mass events without being undermined by unmodeled selection effects or population assumptions.

What would settle it

A specific p-value below a conventional threshold such as 0.05 for one of the hierarchical merger models applied to GW190521 would show that model is inadequate.

Figures

Figures reproduced from arXiv: 2405.09739 by Ben Farr, Eric Thrane, Ethan Payne, Lachlan Passenger, Paul D. Lasky, Simon Stevenson.

Figure 1
Figure 1. Figure 1: The distribution of total mass mtot for different models. The solid curves are the original models while the histograms show the distribution of the most massive event in each simulated catalogue. We consider three variations of the AGN model from Gayathri et al. (2023) with different maximum black hole masses mmax (blue, orange, green). We include a globular cluster model from Rodriguez et al. (2019) (pin… view at source ↗
Figure 2
Figure 2. Figure 2: Log-normalised evidence ln Z versus peak total mass likelihood for the most-massive events of the mmax = 50M⊙ AGN model. Total masses are the median posterior value. Lower ln Z values correspond to catalogues that are relatively unusual—either because the maximum total mass is either unusually small or unusually large. The highest values of ln Z correspond to typical catalogues with usual values for the ma… view at source ↗
Figure 3
Figure 3. Figure 3: Probability distribution of log-normalised evidence (ln Z) for distributions of most-massive, detectable events drawn from Eq. 4 (histograms), against ln Z ′ , the log-normalised evidence for GW190521 (black-dashed line), for (from left to right, top to bottom) AGN models with mmax = 15M⊙ (blue), 50M⊙ (orange) and 75M⊙ (green), as well as a globular cluster model with χbirth = 0 (pink). The p-value calcula… view at source ↗
read the original abstract

As the catalogue of gravitational-wave transients grows, several entries appear "exceptional" within the population. Tipping the scales with a total mass of $\approx 150 M_\odot$, GW190521 likely contained black holes in the pair-instability mass gap. The event GW190814, meanwhile, is unusual for its extreme mass ratio and the mass of its secondary component. A growing model-building industry has emerged to provide explanations for such exceptional events, and Bayesian model selection is frequently used to determine the most informative model. However, Bayesian methods can only take us so far. They provide no answer to the question: does our model provide an adequate explanation for the data? If none of the models we are testing provide an adequate explanation, then it is not enough to simply rank our existing models - we need new ones. In this paper, we introduce a method to answer this question with a frequentist $p$-value. We apply the method to different models that have been suggested to explain GW190521: hierarchical mergers in active galactic nuclei and globular clusters. We show that some (but not all) of these models provide adequate explanations for exceptionally massive events like GW190521.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a frequentist p-value procedure to assess whether binary black hole formation models provide adequate explanations for exceptional events such as GW190521 (total mass ~150 M_⊙). It applies the method to hierarchical-merger scenarios in active galactic nuclei and globular clusters, concluding that some (but not all) models yield adequate p-values while others do not.

Significance. If the p-value construction is shown to be robust, the work supplies a concrete falsification tool that complements Bayesian model selection in gravitational-wave population studies, directly addressing the question of model adequacy for rare, high-mass events.

major comments (2)
  1. [§3.1–3.2] §3.1–3.2 (p-value definition and likelihood): the test statistic is constructed from a likelihood that encodes both formation-channel physics and the observational selection function, yet no explicit marginalization over population hyperparameters or sensitivity analysis under variations in the mass-dependent detection probability is provided; this directly affects whether the reported tail probabilities can be trusted for GW190521.
  2. [§4] §4 (application to GW190521 models): the claim that certain AGN and globular-cluster models are adequate rests on the computed p-values, but the manuscript supplies neither a validation study nor error analysis of the p-value procedure itself, leaving open the possibility that unmodeled selection effects bias the adequacy diagnosis.
minor comments (2)
  1. [§2] Notation for the selection function and the precise definition of the test statistic should be introduced earlier and used consistently throughout.
  2. [Figures 2–4] Figure captions could explicitly state the assumed priors on spin and redshift distributions used in the likelihood.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§3.1–3.2] §3.1–3.2 (p-value definition and likelihood): the test statistic is constructed from a likelihood that encodes both formation-channel physics and the observational selection function, yet no explicit marginalization over population hyperparameters or sensitivity analysis under variations in the mass-dependent detection probability is provided; this directly affects whether the reported tail probabilities can be trusted for GW190521.

    Authors: Our p-value procedure is constructed for a fixed formation model (with hyperparameters set to literature values), as is standard for frequentist model-adequacy tests rather than full hierarchical inference. The likelihood already folds in the selection function for that model. We nevertheless agree that a sensitivity analysis would increase confidence in the tail probabilities. We will add such an analysis, including variations in the mass-dependent detection probability, to the revised manuscript. revision: yes

  2. Referee: [§4] §4 (application to GW190521 models): the claim that certain AGN and globular-cluster models are adequate rests on the computed p-values, but the manuscript supplies neither a validation study nor error analysis of the p-value procedure itself, leaving open the possibility that unmodeled selection effects bias the adequacy diagnosis.

    Authors: We accept that an explicit validation study and error analysis would strengthen the application section. In the revision we will include Monte Carlo validation (drawing synthetic events from each model and confirming that the p-value distribution is uniform under the null) together with a discussion of possible biases from unmodeled selection effects and how the inclusion of the selection function in the likelihood mitigates them. revision: yes

Circularity Check

0 steps flagged

No circularity: frequentist p-value adequacy test is independent of model inputs

full rationale

The paper introduces a frequentist p-value method to assess whether formation models (hierarchical mergers in AGN or globular clusters) adequately explain exceptional events like GW190521, distinct from Bayesian ranking. The abstract presents this as an external check on model adequacy without any quoted equations or steps that reduce the p-value to a fitted parameter, self-referential definition, or self-citation chain by construction. No load-bearing step equates the test statistic to its own inputs; the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central motivation rests on the domain assumption that Bayesian model selection alone cannot answer adequacy questions.

axioms (1)
  • domain assumption Bayesian model selection is insufficient to determine whether any tested model provides an adequate explanation
    Explicitly stated as motivation in the abstract.

pith-pipeline@v0.9.0 · 5756 in / 1195 out tokens · 28423 ms · 2026-05-24T01:02:50.602890+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    2015, Class

    Aasi et al., J. 2015, Class. Quantum Grav., 32, 074001, doi: 10.1088/0264-9381/32/7/074001 Abbott et al., R. 2020a, Phys. Rev. Lett., 125, 101102, doi: 10.1103/PhysRevLett.125.101102 —. 2020b, ApJL, 896, L44, doi: 10.3847/2041-8213/ab960f —. 2023, Phys. Rev. X, 13, 041039, doi: 10.1103/PhysRevX.13.041039

  2. [2]

    2015, Class

    Acernese, F., Agathos, M., Agatsuma, K., et al. 2015, Class. Quantum Grav., 32, 024001, doi: 10.1088/0264-9381/32/2/024001

  3. [3]

    2019, Nat Astron, 3, 35, doi: 10.1038/s41550-018-0658-y

    Akutsu, T., Ando, M., Arai, K., et al. 2019, Nat Astron, 3, 35, doi: 10.1038/s41550-018-0658-y

  4. [4]

    2022, ApJ, 941, 4, doi: 10.3847/1538-4357/ac9d95

    Anagnostou, O., Trenti, M., & Melatos, A. 2022, ApJ, 941, 4, doi: 10.3847/1538-4357/ac9d95

  5. [5]

    2021, ApJ, 920, 128, doi: 10.3847/1538-4357/ac1419

    Arca-Sedda, M., Paolo Rizzuto, F., Naab, T., et al. 2021, ApJ, 920, 128, doi: 10.3847/1538-4357/ac1419

  6. [6]

    D., et al

    Ashton, G., H¨ ubner, M., Lasky, P. D., et al. 2019, ApJS, 241, 27, doi: 10.3847/1538-4365/ab06fc

  7. [7]

    2016, A&A, 594, A97, doi: 10.1051/0004-6361/201628980

    Belczynski, K., Heger, A., Gladysz, W., et al. 2016, A&A, 594, A97, doi: 10.1051/0004-6361/201628980

  8. [8]

    2022, Physics Letters B, 829, 137040, doi: 10.1016/j.physletb.2022.137040

    Chen, Z.-C., Yuan, C., & Huang, Q.-G. 2022, Physics Letters B, 829, 137040, doi: 10.1016/j.physletb.2022.137040

  9. [9]

    2022, Physics of the Dark Universe, 38, 101111, doi: 10.1016/j.dark.2022.101111

    Clesse, S., & Garc´ ıa-Bellido, J. 2022, Physics of the Dark Universe, 38, 101111, doi: 10.1016/j.dark.2022.101111

  10. [10]

    2021, Monthly Notices of the Royal Astronomical Society, 501, 4514, doi: 10.1093/mnras/staa3916 Dall’Amico, M., Mapelli, M., Di Carlo, U

    Costa, G., Bressan, A., Mapelli, M., et al. 2021, Monthly Notices of the Royal Astronomical Society, 501, 4514, doi: 10.1093/mnras/staa3916 Dall’Amico, M., Mapelli, M., Di Carlo, U. N., et al. 2021, Monthly Notices of the Royal Astronomical Society, 508, 3045, doi: 10.1093/mnras/stab2783 De Luca, V., Desjacques, V., Franciolini, G., Pani, P., &

  11. [11]

    2021, Phys

    Riotto, A. 2021, Phys. Rev. Lett., 126, 051101, doi: 10.1103/PhysRevLett.126.051101

  12. [12]

    2022, ApJ, 926, 34, doi: 10.3847/1538-4357/ac3978

    Essick, R., Farah, A., Galaudage, S., et al. 2022, ApJ, 926, 34, doi: 10.3847/1538-4357/ac3978

  13. [13]

    2019, ApJ, 887, 53, doi: 10.3847/1538-4357/ab518b

    Justham, S. 2019, ApJ, 887, 53, doi: 10.3847/1538-4357/ab518b

  14. [14]

    Fishbach, M., Essick, R., & Holz, D. E. 2020, ApJL, 899, L8, doi: 10.3847/2041-8213/aba7b6

  15. [15]

    Fragione, G., Loeb, A., & Rasio, F. A. 2020, ApJL, 902, L26, doi: 10.3847/2041-8213/abbc0a

  16. [16]

    2023, ApJL, 945, L29, doi: 10.3847/2041-8213/acbfb8

    Gayathri, V., Wysocki, D., Yang, Y., et al. 2023, ApJL, 945, L29, doi: 10.3847/2041-8213/acbfb8

  17. [17]

    Heger, A., & Woosley, S. E. 2002, ApJ, 567, 532, doi: 10.1086/338487

  18. [18]

    Kimball, C., Talbot, C., Berry, C. P. L., et al. 2021, ApJL, 915, L35, doi: 10.3847/2041-8213/ac0aef

  19. [19]

    2021, Monthly Notices of the Royal Astronomical Society: Letters, 501, L49, doi: 10.1093/mnrasl/slaa191

    Kinugawa, T., Nakamura, T., & Nakano, H. 2021, Monthly Notices of the Royal Astronomical Society: Letters, 501, L49, doi: 10.1093/mnrasl/slaa191

  20. [20]

    2020, ApJL, 903, L40, doi: 10.3847/2041-8213/abc552

    Liu, B., & Bromm, V. 2020, ApJL, 903, L40, doi: 10.3847/2041-8213/abc552

  21. [21]

    2021, Monthly Notices of the Royal Astronomical Society, 502, 2049, doi: 10.1093/mnras/stab178

    Liu, B., & Lai, D. 2021, Monthly Notices of the Royal Astronomical Society, 502, 2049, doi: 10.1093/mnras/stab178

  22. [22]

    2021, Monthly Notices of the Royal Astronomical Society, 505, 339, doi: 10.1093/mnras/stab1334

    Mapelli, M., Dall’Amico, M., Bouffanais, Y., et al. 2021, Monthly Notices of the Royal Astronomical Society, 505, 339, doi: 10.1093/mnras/stab1334

  23. [23]

    L., Rinaldi, S., Torres-Orjuela, A., et al

    Morton, S. L., Rinaldi, S., Torres-Orjuela, A., et al. 2023, Phys. Rev. D, 108, 123039, doi: 10.1103/PhysRevD.108.123039

  24. [24]

    2023, Monthly Notices of the Royal Astronomical Society, 525, 3986, doi: 10.1093/mnras/stad2502

    Mould, M., Gerosa, D., Dall’Amico, M., & Mapelli, M. 2023, Monthly Notices of the Royal Astronomical Society, 525, 3986, doi: 10.1093/mnras/stad2502

  25. [25]

    J., Annis, J., & Liu, X

    Palmese, A., Fishbach, M., Burke, C. J., Annis, J., & Liu, X. 2021, ApJL, 914, L34, doi: 10.3847/2041-8213/ac0883

  26. [26]

    2023, Phys

    Payne, E., & Thrane, E. 2023, Phys. Rev. Res., 5, 023013

  27. [27]

    L., Zevin, M., Amaro-Seoane, P., et al

    Rodriguez, C. L., Zevin, M., Amaro-Seoane, P., et al. 2019, Phys. Rev. D, 100, 043027, doi: 10.1103/PhysRevD.100.043027

  28. [28]

    D., Thrane, E., & Bustillo, J

    Romero-Shaw, I., Lasky, P. D., Thrane, E., & Bustillo, J. C. 2020, ApJL, 903, L5, doi: 10.3847/2041-8213/abbe26

  29. [29]

    M., Thrane, E., & Lasky, P

    Romero-Shaw, I. M., Thrane, E., & Lasky, P. D. 2022, Publ. Astron. Soc. Aust., 39, e025, doi: 10.1017/pasa.2022.24

  30. [30]

    2020, ApJL, 903, L21, doi: 10.3847/2041-8213/abc253

    Safarzadeh, M., & Haiman, Z. 2020, ApJL, 903, L21, doi: 10.3847/2041-8213/abc253

  31. [31]

    J., et al

    Samsing, J., Bartos, I., D’Orazio, D. J., et al. 2022, Nature, 603, 237, doi: 10.1038/s41586-021-04333-1

  32. [32]

    2012, Phys

    Schmidt, P., Hannam, M., & Husa, S. 2012, Phys. Rev. D, 86, 104063, doi: 10.1103/PhysRevD.86.104063

  33. [33]

    Speagle, J. S. 2020, Monthly Notices of the Royal Astronomical Society, 493, 3132, doi: 10.1093/mnras/staa278

  34. [34]

    Stevenson, S., Berry, C. P. L., & Mandel, I. 2017, Monthly Notices of the Royal Astronomical Society, 471, 2801, doi: 10.1093/mnras/stx1764

  35. [35]

    2021, ApJ, 908, 194, doi: 10.3847/1538-4357/abd555

    Tagawa, H., Kocsis, B., Haiman, Z., et al. 2021, ApJ, 908, 194, doi: 10.3847/1538-4357/abd555

  36. [36]

    2021, Monthly Notices of the Royal Astronomical Society, 505, 2170, doi: 10.1093/mnras/stab1421

    Umeda, H. 2021, Monthly Notices of the Royal Astronomical Society, 505, 2170, doi: 10.1093/mnras/stab1421

  37. [37]

    2019, Pub

    Thrane, E., & Talbot, C. 2019, Pub. Astron. Soc. Aust., 36, E010 8

  38. [38]

    Vajpeyi, A., Thrane, E., Smith, R., McKernan, B., & Ford, K. E. S. 2022, ApJ, 931, 82, doi: 10.3847/1538-4357/ac6180

  39. [39]

    Winch, E. R. J., Vink, J. S., Higgins, E. R., & Sabhahitf, G. N. 2024, Monthly Notices of the Royal Astronomical Society, 529, 2980, doi: 10.1093/mnras/stae393

  40. [40]

    Woosley, S. E. 2017, ApJ, 836, 244, doi: 10.3847/1538-4357/836/2/244 —. 2019, ApJ, 878, 49, doi: 10.3847/1538-4357/ab1b41

  41. [41]

    E., Blinnikov, S., & Heger, A

    Woosley, S. E., Blinnikov, S., & Heger, A. 2007, Nature, 450, 390, doi: 10.1038/nature06333

  42. [42]

    E., & Heger, A

    Woosley, S. E., & Heger, A. 2021, ApJL, 912, L31, doi: 10.3847/2041-8213/abf2c4