Can Bayes Factors "Prove" the Null Hypothesis?

Michael Smithson

arxiv: 1907.05583 · v1 · pith:KJ6FB25Fnew · submitted 2019-07-12 · 📊 stat.ME

Pith reviewed 2026-05-24 22:45 UTC · model grok-4.3

classification 📊 stat.ME

keywords Bayes factors · null hypothesis · vague priors · point alternatives · sample size · hypothesis testing

The pith

A large Bayes factor can favor the null hypothesis against a vague alternative while specific point alternatives remain better supported by the data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Bayes factors can produce strong evidence for a point null hypothesis even when the data actually support other specific values more strongly. This happens because the conventional vague prior for the alternative spreads probability so thinly that it assigns low density near the observed sample mean. With larger samples the mismatch grows, so that a BF exceeding any chosen threshold q in favor of the null versus the vague alternative can still be accompanied by BFs below 1/q in favor of certain point alternatives. A reader would care because this pattern undermines the common practice of treating a high BF as decisive proof of the null. The paper derives the conditions under which the pattern is guaranteed and examines ways to address it.
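A minimal numerical sketch of this pattern follows. It assumes a setup the summary does not spell out: normally distributed data with known standard deviation `sigma`, a point null at μ = 0, and a vague alternative μ ~ Normal(0, `tau`²). The function names and the particular values of `n`, `xbar`, `tau`, and `q` are illustrative choices, not taken from the paper.

```python
import math

def bf01_vague(xbar, n, sigma=1.0, tau=10.0):
    """BF of H0: mu = 0 against H1: mu ~ Normal(0, tau^2), from the sample mean."""
    se2 = sigma**2 / n                      # sampling variance of the mean
    v1 = tau**2 + se2                       # marginal variance of the mean under H1
    log_m0 = -0.5 * math.log(2 * math.pi * se2) - xbar**2 / (2 * se2)
    log_m1 = -0.5 * math.log(2 * math.pi * v1) - xbar**2 / (2 * v1)
    return math.exp(log_m0 - log_m1)

def bf0_point(xbar, n, mu, sigma=1.0):
    """BF of H0: mu = 0 against the point alternative that the mean equals mu."""
    se2 = sigma**2 / n
    return math.exp((-(xbar**2) + (xbar - mu)**2) / (2 * se2))

q = 10.0                                    # assumed evidence threshold
n, xbar = 10_000, 0.025                     # standardized mean z = 2.5
bf_vague = bf01_vague(xbar, n)
bf_point = bf0_point(xbar, n, mu=xbar)
print(f"BF(H0 vs vague H1)  = {bf_vague:.1f}")
print(f"BF(H0 vs mu = xbar) = {bf_point:.3f}")
print("mismatch:", bf_vague > q and bf_point < 1 / q)
```

Under these assumed values the first Bayes factor exceeds q while the second falls below 1/q, so the same data appear to favor the null over the vague alternative and to favor μ = x̄ over the null.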

Core claim

It is possible to obtain a large Bayes Factor (BF) favoring the null hypothesis when both the null and alternative hypotheses have low likelihoods, and there are other hypotheses being ignored that are much more strongly supported by the data. As sample sizes become large it becomes increasingly probable that a strong BF favouring a point null against a conventional Bayesian vague alternative co-occurs with a BF favouring various specific alternatives against the null. For any BF threshold q and sample mean, there is a value n such that sample sizes larger than n guarantee that although the BF comparing H0 against a conventional (vague) alternative exceeds q, nevertheless for some range of hypothetical μ, a BF comparing H0 against μ in that range falls below 1/q.

What carries the argument

Bayes factor comparing a point null to a conventional vague prior on the alternative, contrasted with Bayes factors comparing the null to specific point alternatives.
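For concreteness, the two quantities can be written in closed form under one assumed setup, which is not necessarily the paper's: n normal observations with known standard deviation σ, a point null μ = 0, and a vague alternative μ ~ Normal(0, τ²). The symbols τ and z are introduced here for illustration.

```latex
% Assumed model: \bar{x} \sim \mathrm{Normal}(\mu, \sigma^2/n); H_0: \mu = 0; H_1: \mu \sim \mathrm{Normal}(0, \tau^2)
\[
  BF_{01}
  = \frac{p(\bar{x} \mid \mu = 0)}{\int p(\bar{x} \mid \mu)\,\pi(\mu)\,d\mu}
  = \sqrt{1 + \frac{n\tau^{2}}{\sigma^{2}}}\;
    \exp\!\left(-\frac{z^{2}}{2}\cdot\frac{n\tau^{2}}{\sigma^{2} + n\tau^{2}}\right),
  \qquad z = \frac{\sqrt{n}\,\bar{x}}{\sigma}
\]
\[
  BF_{0\mu}
  = \frac{p(\bar{x} \mid \mu = 0)}{p(\bar{x} \mid \mu)}
  = \exp\!\left(-\frac{n}{2\sigma^{2}}\left[\bar{x}^{2} - (\bar{x} - \mu)^{2}\right]\right),
  \qquad BF_{0\bar{x}} = e^{-z^{2}/2}
\]
```

In this setup the square-root factor grows with n while the exponential factor depends on the data only through z, which is why a fixed z can yield a large BF_{01} alongside a small BF_{0μ} for μ near x̄.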

If this is right

For any fixed threshold q the probability of the described mismatch increases as n grows.
Standard use of a single vague prior can produce apparently decisive support for the null while ignoring better-supported alternatives.
The mismatch is guaranteed once n exceeds a value that depends on q and the observed sample mean.
Resolution requires either checking Bayes factors against a range of point alternatives or replacing the vague prior with one that better reflects plausible values.
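The existence of a critical sample size in the list above can be sketched numerically. The sketch assumes normal data with known `sigma`, a null at μ = 0, and a Normal(0, `tau`²) alternative, and it works on the standardized scale z = √n·x̄/σ. It finds the smallest n at which some z gives both BF(H0 vs vague) > q and BF(H0 vs μ = x̄) < 1/q. The helper names, and the reading of the guarantee in terms of z, are assumptions made for illustration.

```python
import math

def log_bf01(z, r):
    """log BF of H0 against mu ~ Normal(0, tau^2); r = n * tau^2 / sigma^2."""
    return 0.5 * math.log1p(r) - 0.5 * z**2 * r / (1.0 + r)

def window_open(n, q, sigma=1.0, tau=1.0):
    """True if some z gives BF(H0 vs vague) > q and BF(H0 vs mu = xbar) < 1/q."""
    z_edge = math.sqrt(2.0 * math.log(q))   # at this z, exp(-z^2 / 2) equals 1/q
    # BF(H0 vs vague) decreases in |z|, so it suffices to check it at the edge.
    return log_bf01(z_edge, n * tau**2 / sigma**2) > math.log(q)

def critical_n(q, sigma=1.0, tau=1.0):
    """Smallest n at which the window opens; requires q > 1."""
    hi = 1
    while not window_open(hi, q, sigma, tau):
        hi *= 2                              # bracket the answer by doubling
    lo = hi // 2                             # window is closed at lo (or lo == 0)
    while hi - lo > 1:                       # integer bisection on [lo, hi]
        mid = (lo + hi) // 2
        if window_open(mid, q, sigma, tau):
            hi = mid
        else:
            lo = mid
    return hi

n_star = critical_n(q=10.0)
print(n_star)
```

With `tau` equal to `sigma`, the critical n in this sketch is on the order of q⁴, and a wider prior (larger `tau`) lowers it. This illustrates that the threshold depends on the prior scale as well as on q.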

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers may need to report a profile of Bayes factors across a grid of point alternatives rather than a single comparison.
The result suggests that large-sample Bayesian tests of point nulls are sensitive to prior choice in ways that frequentist tests are not.
One practical extension is to examine how the critical n scales with the distance between the sample mean and the null value.
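A Bayes factor profile of the kind suggested above might look like the following sketch. It assumes normal data with known `sigma` and a null at μ = 0. The grid, the threshold `q`, and the function names are hypothetical choices for illustration.

```python
import math

def bf0_point(xbar, n, mu, sigma=1.0):
    """BF of H0: mu = 0 against the point alternative that the mean equals mu."""
    se2 = sigma**2 / n
    return math.exp((-(xbar**2) + (xbar - mu)**2) / (2 * se2))

def bf_profile(xbar, n, grid, sigma=1.0):
    """Pairs (mu, BF of H0 against mu) for every mu on the grid."""
    return [(mu, bf0_point(xbar, n, mu, sigma)) for mu in grid]

q = 10.0                                    # assumed evidence threshold
n, xbar = 10_000, 0.025                     # assumed sample size and sample mean
grid = [i / 1000 for i in range(0, 51)]     # mu = 0.000, 0.001, ..., 0.050
profile = bf_profile(xbar, n, grid)

# Point alternatives that the data favor over the null by more than a factor q.
favored = [mu for mu, bf in profile if bf < 1 / q]
print(f"BF(H0 vs mu) < 1/q for mu from {favored[0]:.3f} to {favored[-1]:.3f}")
```

Reporting the whole profile, rather than the single comparison against a vague prior, makes visible any interval of point alternatives that the data support more strongly than the null.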

Load-bearing premise

The alternative hypothesis is represented by a conventional vague prior whose probability density near the data may be much lower than the density at the best-fitting point values.

What would settle it

A concrete counter-example in which, for a chosen q and observed sample mean, no sample size n exists such that BF(H0 vs vague) exceeds q while some point-alternative BF falls below 1/q.

Original abstract

It is possible to obtain a large Bayes Factor (BF) favoring the null hypothesis when both the null and alternative hypotheses have low likelihoods, and there are other hypotheses being ignored that are much more strongly supported by the data. As sample sizes become large it becomes increasingly probable that a strong BF favouring a point null against a conventional Bayesian vague alternative co-occurs with a BF favouring various specific alternatives against the null. For any BF threshold q and sample mean, there is a value n such that sample sizes larger than n guarantee that although the BF comparing H0 against a conventional (vague) alternative exceeds q, nevertheless for some range of hypothetical μ, a BF comparing H0 against μ in that range falls below 1/q. This paper discusses the conditions under which this conundrum occurs and investigates methods for resolving it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives an explicit existence result: a sample size beyond which a Bayes factor favoring the null over the vague alternative co-occurs with a Bayes factor favoring point alternatives near the sample mean over the null.

The core observation is that for any Bayes factor threshold q and any fixed sample mean, there is some n beyond which larger samples produce a Bayes factor above q for the null against a conventional vague alternative, while the Bayes factor for the null against certain point alternatives near the observed mean stays below 1/q. This is the Lindley-Jeffreys paradox turned into a concrete sample-size guarantee rather than just an asymptotic statement. The claim is stated cleanly, and it matches the familiar mechanism: at a fixed standardized distance between the sample mean and the null value, the likelihood at the null outgrows the marginal likelihood under the vague prior by a factor of order √n as n grows. That part looks solid on the abstract alone and is the main new element.

The work is useful because it makes the practical implication explicit: you can get strong evidence for the null against the usual vague alternative even when the data clearly prefer other specific values. The discussion of conditions and possible resolutions is the applied part worth checking in the full text. The main limitation is the reliance on the standard vague prior for the alternative; change that prior and the guarantee need not hold, which the abstract already flags as the key assumption. No other load-bearing issues jump out from the description.

This is the sort of targeted methodological note that statistical practitioners and methodologists should see, especially those who use Bayes factors for point-null testing. It is not a broad advance, but the specific guarantee is worth having on record. I would send it to peer review so the derivation and any resolution proposals can be checked in detail.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that a large Bayes factor favoring a point null against a conventional vague alternative can occur even when the data more strongly support specific alternatives, and that this becomes guaranteed for sufficiently large n. Specifically, for any BF threshold q and any sample mean, there exists n such that for larger sample sizes BF_{01} > q against the vague alternative, yet for some range of μ, BF_{0μ} < 1/q. The paper discusses the conditions under which this occurs (tied to the Lindley-Jeffreys paradox) and investigates methods for resolving the resulting interpretive conundrum.

Significance. If the central mathematical guarantee holds, the paper provides a clean formalization of a known asymptotic behavior under vague priors, with the 'for any q and sample mean' result being a strength that does not depend on simulations or fitted quantities. This could usefully inform methodological discussions on BF interpretation in large samples, though the result is a direct consequence of standard marginal-likelihood asymptotics rather than a novel derivation.

minor comments (3)

[Abstract] The abstract states the guarantee clearly but does not name the sampling model or prior family (e.g., normal data with wide normal or Cauchy alternative); adding this would make the claim immediately verifiable from the abstract alone.
[Abstract] The discussion of 'methods for resolving it' is mentioned but not previewed; a one-sentence indication of the proposed resolutions (e.g., local priors, posterior predictive checks) would improve the abstract's utility.
Consider citing the original Lindley (1957) and Jeffreys (1961) statements of the paradox to situate the contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the review and the recommendation of minor revision. We address the points raised below.

Referee: the result is a direct consequence of standard marginal-likelihood asymptotics rather than a novel derivation.

Authors: We agree that the asymptotic behavior of Bayes factors under vague priors is related to known properties, including those underlying the Lindley-Jeffreys paradox. The manuscript's specific contribution is the explicit guarantee that, for any threshold q and any observed sample mean, sufficiently large n ensures BF_{01} > q against the vague alternative while BF_{0μ} < 1/q for some μ. We will add a paragraph in the discussion clarifying the connection to standard marginal-likelihood asymptotics to better contextualize the result.

revision: partial

Circularity Check

0 steps flagged

No significant circularity; mathematical demonstration of known asymptotic behavior

The paper's central claim is a direct mathematical demonstration that, for any fixed BF threshold q and sample mean, sufficiently large n guarantees BF_{01} > q against a conventional vague alternative while BF_{0μ} < 1/q against some point alternatives μ near the MLE. This follows from the definitions of the marginal likelihoods under a normal sampling model with a fixed vague prior (wide normal or Cauchy) on the alternative: as n grows with the standardized mean z = √n·x̄/σ held fixed, the likelihood at the null outgrows the marginal likelihood under the vague alternative by a factor of order √n, driving BF_{01} → ∞, while the pointwise BF against μ = x̄ stays fixed at exp(−z²/2). No parameter is fitted to data and then renamed a prediction; no self-citation chain is invoked to justify a uniqueness theorem or ansatz; the result is self-contained in the standard Lindley-Jeffreys asymptotics and does not reduce to any input by construction. The abstract and described derivation contain no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on standard Bayesian hypothesis testing setup with a point null and vague alternative prior; no free parameters, invented entities, or additional axioms are specified.

axioms (1)

domain assumption: Bayes factors are computed using a point null hypothesis against a conventional vague alternative prior. This is the explicit comparison setup described in the abstract.
