Strong Likelihood Principle: Strengthening a Principle or Misunderstanding the Likelihood Function

Paul William Vos

arxiv: 2606.08975 · v1 · pith:UXYNHI45new · submitted 2026-06-08 · 📊 stat.OT

Pith reviewed 2026-06-27 14:21 UTC · model grok-4.3

keywords strong likelihood principle · weak likelihood principle · likelihood function domain · family of distributions · Fisher information metric · binomial negative binomial · statistical inference principles · manifold geometry

The pith

The strong likelihood principle collapses to the weak one once the likelihood function is defined on the family of distributions M rather than a parameter space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that the strong likelihood principle (SLP) stems from a misplacement of the likelihood function's domain. When the likelihood is treated as a function on the family of distributions M, the SLP reduces directly to the weak likelihood principle without additional strength. This reduction is shown by comparing the binomial and negative binomial families that share a parameter, and by linking the result to the Fisher information metric on the manifold M. A reader would care because the argument reframes longstanding debates about Birnbaum's derivation of the SLP from sufficiency and conditionality. The same standardization appears in both a statistical comparison of measurements across populations and a geometric argument about manifold distance.

Core claim

When the likelihood function is defined on the family of distributions M rather than on a parameter space, the strong likelihood principle collapses into the weak likelihood principle. The paper illustrates this by analogy with monetary value and develops the claim through the binomial and negative binomial families sharing a parameter, connecting the result to the geometric structure of M via the Fisher information metric. The same standardization arises from a statistical argument about comparing measurements across populations and from a geometric argument about manifold distance, supplying the positive content of the weak likelihood principle.
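The binomial and negative binomial comparison can be made concrete with a small numerical sketch. It is an illustration under assumptions, not the paper's own computation: the data values n = 20 and y = 8 are borrowed from the Figure 1 caption, the negative binomial is taken to be "sample until the y-th success and record the number of trials", and the function names `binom_lik` and `negbin_lik` are hypothetical. With matched data the two likelihoods differ only by a constant factor in p, which is the situation the strong likelihood principle speaks to.

```python
from math import comb

def binom_lik(p, n, y):
    """P(Y = y) for Y ~ Binomial(n, p): n is fixed, y successes are observed."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def negbin_lik(p, r, n):
    """P(N = n) when sampling until the r-th success: r is fixed, n trials are observed."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

# Matched data (an assumption for illustration): 8 successes in 20 trials.
n, y = 20, 8

# Ratio of the two likelihoods at several values of the shared parameter p.
ratios = [binom_lik(p, n, y) / negbin_lik(p, y, n)
          for p in (0.1, 0.25, 0.4, 0.6, 0.9)]

# The ratio does not depend on p; it equals comb(n, y) / comb(n - 1, y - 1) = n / y.
print(ratios)
```

The constant ratio shows that the two parametric likelihoods carry the same shape in p. The paper's argument, as summarized here, is that this agreement in a shared coordinate does not make them the same function, because each lives on a different family M.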

What carries the argument

The domain of the likelihood function as the family of distributions M, which forces the strong likelihood principle to reduce to the weak likelihood principle.

If this is right

The strong likelihood principle supplies no additional inferential content beyond the weak likelihood principle.
Likelihood comparisons across sampling models that share a parameter become standardized by the geometry of the family M.
The weak likelihood principle acquires positive content from both statistical measurement arguments and manifold-distance arguments.
Birnbaum's derivation of the strong principle from sufficiency and conditionality reflects a confusion about the likelihood's domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The geometric standardization on M could be tested by checking whether likelihood-based inferences remain invariant under reparameterizations that preserve the family structure.
This domain clarification may apply to other likelihood-based principles such as those involving profile likelihoods or marginal likelihoods.
The convergence of statistical and geometric arguments suggests examining whether Fisher information distances directly quantify the standardization needed for cross-population measurements.
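The reparameterization check suggested in the first extension can be sketched for the binomial family. This is a minimal illustration under assumptions: n = 20 and y = 8 come from the Figure 1 caption, the five reference values of p are arbitrary choices (the source does not list them), and all function names are hypothetical. The Fisher information formulas used are the standard ones for a binomial in the probability and logit coordinates.

```python
import math

n, y = 20, 8  # values quoted in the Figure 1 caption

def expit(xi):
    return 1.0 / (1.0 + math.exp(-xi))

def logit(p):
    return math.log(p / (1.0 - p))

def loglik_p(p):
    """Binomial log likelihood in the coordinate theta = p (constant dropped)."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def loglik_xi(xi):
    """The same log likelihood expressed in the coordinate xi = logit(p)."""
    return loglik_p(expit(xi))

def score_p(p):
    """d loglik / dp."""
    return y / p - (n - y) / (1.0 - p)

def score_xi(xi):
    """d loglik / dxi, obtained from score_p by the chain rule (dp/dxi = p(1-p))."""
    return y - n * expit(xi)

def info_p(p):
    """Fisher information of Binomial(n, p) in the p coordinate."""
    return n / (p * (1.0 - p))

def info_xi(xi):
    """Fisher information of the same family in the logit coordinate."""
    p = expit(xi)
    return n * p * (1.0 - p)

# Hypothetical reference distributions; the paper's five points are not given here.
reference_ps = (0.2, 0.3, 0.4, 0.5, 0.6)

for p in reference_ps:
    xi = logit(p)
    z_p = score_p(p) / math.sqrt(info_p(p))
    z_xi = score_xi(xi) / math.sqrt(info_xi(xi))
    print(f"p={p:.1f}  height: {loglik_p(p):.4f} vs {loglik_xi(xi):.4f}  "
          f"raw score: {score_p(p):.3f} vs {score_xi(xi):.3f}  "
          f"standardized: {z_p:.4f} vs {z_xi:.4f}")
```

In this sketch the heights and the standardized scores agree across the two coordinates, while the raw scores do not. That is consistent with the reading that the log likelihood and the standardized score are functions on M, whereas the raw slope depends on the units of the chosen parameter.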

Load-bearing premise

The likelihood function is naturally defined as a function on a family of distributions M rather than on a parameter space.

What would settle it

An explicit pair of experiments, one binomial and one negative binomial, sharing the same parameter value, where the likelihood ratio computed on M fails to match the standardized comparison required by the weak principle.

Figures

Figures reproduced from arXiv: 2606.08975 by Paul William Vos.

**Figure 1.** Binomial log likelihood plotted against θ = p (left) and ξ = logit(p) (right), for n = 20, y = 8. Gray segments rise from the floor to five reference distributions; dashed horizontals connect the two coordinate pictures of each m. The heights at matched segments agree: $\ell_y^M$ is one function on $M$. The slopes differ, because they are expressed in different units.

**Figure 2.** Standardized score $\partial \ell_y / \sqrt{I}$ for the binomial family in coordinates θ = p (left) and ξ = logit(p) (right). Heights at the five reference distributions agree across the two panels: the standardized score is a function on M, not on the parameter space. The curve crosses zero at the MLE, $\hat{m}_y$.

**Figure 3.** Standardized score $\partial \ell_y / \sqrt{I}$ for the binomial and negative binomial families, plotted against the shared parameter p, for matched data. The two parametric log likelihoods agree as functions of p up to a constant, but the standardized scores are different functions of p because the Fisher informations differ. The closed-form KL divergences make this concrete: KLBin(p1, p2) = n p1 log p1 p2 + (1 − p1) log 1…
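The contrast described in the Figure 3 caption can be reproduced numerically. The sketch below is an illustration under assumptions, not the paper's code: "matched data" is taken to mean y = 8 successes in n = 20 trials for both designs, the negative binomial counts trials until the y-th success, and the grid of p values and all function names are hypothetical. The Fisher information formulas are the standard ones for these two families.

```python
import math

def score(p, n, y):
    """d loglik / dp; identical for both families because the likelihoods are proportional."""
    return y / p - (n - y) / (1 - p)

def info_binomial(p, n):
    """Fisher information of Binomial(n, p) in the p coordinate."""
    return n / (p * (1 - p))

def info_negbin(p, r):
    """Fisher information when sampling until the r-th success (trial count observed)."""
    return r / (p**2 * (1 - p))

n, y = 20, 8  # matched data, assumed for illustration

# Map each p to (standardized binomial score, standardized negative binomial score).
table = {}
for p in (0.2, 0.3, 0.4, 0.5, 0.6):
    s = score(p, n, y)
    table[p] = (s / math.sqrt(info_binomial(p, n)),
                s / math.sqrt(info_negbin(p, y)))

for p, (z_bin, z_nb) in table.items():
    print(f"p={p:.1f}  binomial: {z_bin:+.4f}  negative binomial: {z_nb:+.4f}")
```

Both standardized scores vanish at the shared MLE p = y/n = 0.4, but away from it they take different values, because the same raw score is divided by different information quantities.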

Original abstract

The strong likelihood principle (SLP) is conventionally derived from the sufficiency principle and a conditionality principle in an argument due to Birnbaum, and much of the literature contests whether the derivation is sound. We take a different approach. We ask what the SLP says when its terms are read carefully, and argue that the principle as ordinarily stated reflects a confusion about the domain of the likelihood function. The likelihood is naturally defined as a function on a family of distributions $M$, not on a parameter space, and once it is so defined the SLP collapses into its weak counterpart, the weak likelihood principle. The diagnosis is illustrated by analogy with monetary value, developed concretely through a comparison of the binomial and negative binomial families that share a parameter, and connected to the geometric structure of $M$ through the Fisher information metric. The same standardization emerges from a statistical argument about comparing measurements across populations and from a geometric argument about manifold distance; this convergence supplies the positive content of the weak likelihood principle.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims that treating the likelihood as a map on the model family M rather than the parameter space makes the strong likelihood principle collapse to the weak version, illustrated via binomial/negative-binomial and Fisher geometry.

The main takeaway is that the strong likelihood principle arises from a domain mix-up: likelihood should be a function on the family of distributions M, not the parameter space, and once fixed that way the strong version reduces to the weak one.

The paper does something useful by avoiding the usual Birnbaum sufficiency-plus-conditionality route and instead reading the terms directly. The binomial versus negative binomial comparison (same parameter, different sampling) makes the point concrete, and the monetary value analogy helps. Linking the standardization to the Fisher metric on M and showing convergence from both a statistical comparison argument and a manifold distance argument gives the claim some independent support.

The soft spot is the premise that the domain on M is the natural one. The standard definition writes L(θ; x) = f(x|θ) with data fixed and θ varying, so shifting the domain changes what the principle asserts rather than exposing an internal flaw in the original statement. The abstract does not supply a derivation showing that every formulation of the SLP becomes logically identical to the WLP under this shift, so the collapse risks being partly by construction. The geometric and statistical arguments are suggestive but would need checking against further examples to confirm they are not special to the cases shown.

This is for readers already working on the foundations of inference who are comfortable with model-manifold or geometric views. Someone who has followed the post-Birnbaum literature on the likelihood principle will see the new angle most clearly.

It deserves a serious referee. The engagement with the existing debate is direct, the illustrations are specific, and the domain claim is the sort of thing referees can test against the literature and counterexamples.

Referee Report

3 major / 2 minor

Summary. The paper claims that the strong likelihood principle (SLP), conventionally derived from sufficiency and conditionality, reflects a confusion about the domain of the likelihood function. The likelihood is naturally a function on the family of distributions M rather than a parameter space; once redefined on M, the SLP collapses into the weak likelihood principle (WLP). This is illustrated via an analogy to monetary value, a concrete binomial versus negative-binomial comparison sharing a parameter, and connections to the Fisher information metric on M. Convergence of statistical arguments about cross-population measurement and geometric manifold-distance arguments supplies positive content for the WLP.

Significance. If the central claim holds, the paper would reframe decades of debate on Birnbaum's derivation as a domain-specification issue rather than a question of logical validity, potentially redirecting foundational statistics toward the geometric structure of likelihood. The explicit convergence of definitional, statistical, and geometric lines of argument is a strength that supplies independent motivation for the WLP. The result would be significant for the literature on likelihood principles if accompanied by a general demonstration that the domain shift preserves the intended content of SLP statements.

major comments (3)

[Abstract] Abstract and opening sections: the claim that 'once it is so defined the SLP collapses into its weak counterpart' is not supported by an explicit logical mapping or derivation showing that every standard SLP statement becomes equivalent to the corresponding WLP statement under the domain M; the reduction therefore risks being definitional by construction rather than exposing an internal error in the original formulation.
[Binomial/negative binomial comparison] Binomial/negative-binomial comparison (the concrete illustration section): while the example shows that the same numerical parameter can label distinct elements of M, it does not establish that the SLP-to-WLP collapse holds for arbitrary sampling models or that the original SLP statements are thereby rendered identical to WLP statements; a general argument is required for the central claim.
[Geometric argument] Geometric argument via Fisher metric: the connection to manifold distance supplies motivation for standardization on M, but does not by itself demonstrate the logical collapse of the SLP; the paper must show how this geometric fact entails equivalence of the two principles rather than merely consistency with the WLP.
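The link between Fisher information and distance on M that the third comment refers to can be illustrated with a standard fact: for nearby distributions, the Kullback-Leibler divergence is approximately half the Fisher information times the squared parameter step. The sketch below checks this for both families. It is a hedged illustration, not the paper's argument: the divergence formulas are the textbook closed forms for a binomial with n trials and for a negative binomial that counts trials until the r-th success, and the values n = 20, r = 8, p = 0.25 and the function names are assumptions chosen for the example.

```python
import math

def kl_binomial(p1, p2, n):
    """KL divergence from Binomial(n, p1) to Binomial(n, p2)."""
    return n * (p1 * math.log(p1 / p2)
                + (1 - p1) * math.log((1 - p1) / (1 - p2)))

def kl_negbin(p1, p2, r):
    """KL divergence between negative binomials (trials until the r-th success)."""
    return r * (math.log(p1 / p2)
                + ((1 - p1) / p1) * math.log((1 - p1) / (1 - p2)))

def info_binomial(p, n):
    return n / (p * (1 - p))

def info_negbin(p, r):
    return r / (p**2 * (1 - p))

n, r = 20, 8            # matched designs, assumed for illustration
p, delta = 0.25, 1e-3   # base point and a small step in the shared parameter

kl_bin = kl_binomial(p, p + delta, n)
kl_nb = kl_negbin(p, p + delta, r)

# Second-order approximation: KL is roughly 0.5 * I(p) * delta**2.
quad_bin = 0.5 * info_binomial(p, n) * delta**2
quad_nb = 0.5 * info_negbin(p, r) * delta**2

print(f"binomial:          KL={kl_bin:.6e}  quadratic={quad_bin:.6e}")
print(f"negative binomial: KL={kl_nb:.6e}  quadratic={quad_nb:.6e}")
```

The same step in p corresponds to different divergences in the two families, which is one way to see that a shared parameter value does not imply a shared geometry. As the referee notes, this supports standardization on M but does not by itself establish the logical collapse of the SLP.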

minor comments (2)

The monetary-value analogy is suggestive but would benefit from a short numerical table contrasting 'value' under different units to parallel the binomial/negative-binomial case.
Notation for the family M and the likelihood map L: M → ℝ should be introduced with a formal definition early in the text to avoid ambiguity when contrasting with the conventional L(θ; x).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. The comments correctly identify places where the logical steps from domain redefinition to the collapse of the SLP can be made more explicit. We will revise the manuscript to supply the requested mappings and general arguments while preserving the paper's core thesis that the domain of the likelihood is M.

Referee: [Abstract] Abstract and opening sections: the claim that 'once it is so defined the SLP collapses into its weak counterpart' is not supported by an explicit logical mapping or derivation showing that every standard SLP statement becomes equivalent to the corresponding WLP statement under the domain M; the reduction therefore risks being definitional by construction rather than exposing an internal error in the original formulation.

Authors: We accept that an explicit logical mapping would strengthen the presentation and prevent any appearance of definitional collapse. In the revised version we will insert a new subsection that takes standard formulations of the SLP (including Birnbaum's sufficiency-plus-conditionality derivation) and shows, statement by statement, how each becomes a WLP statement once the likelihood is treated as a function on M rather than on a shared parameter space. The mapping rests on the observation that distinct sampling models correspond to distinct points of M, so cross-model likelihood ratios are undefined; this is an internal consequence of the domain choice rather than an external stipulation.
Revision: yes

Referee: [Binomial/negative binomial comparison] Binomial/negative-binomial comparison (the concrete illustration section): while the example shows that the same numerical parameter can label distinct elements of M, it does not establish that the SLP-to-WLP collapse holds for arbitrary sampling models or that the original SLP statements are thereby rendered identical to WLP statements; a general argument is required for the central claim.

Authors: The binomial-negative-binomial comparison is offered only as a concrete illustration of how a shared numerical parameter can index distinct elements of M. The general argument is the domain redefinition itself, which applies to any pair of sampling models. To address the request for an explicit general demonstration, the revision will add a short section that considers arbitrary distinct models M1 and M2 sharing a parameter label and shows that any SLP claim comparing L_M1 and L_M2 is ill-formed, reducing directly to separate WLP claims within each M. The concrete example will be retained as motivation.
Revision: yes

Referee: [Geometric argument] Geometric argument via Fisher metric: the connection to manifold distance supplies motivation for standardization on M, but does not by itself demonstrate the logical collapse of the SLP; the paper must show how this geometric fact entails equivalence of the two principles rather than merely consistency with the WLP.

Authors: We agree that the Fisher-metric argument is not presented as a standalone proof of the collapse; it is one of three independent lines (definitional, statistical, geometric) that converge on standardization to M. The logical collapse is derived from the domain redefinition. In revision we will add an explicit clarifying sentence stating that the geometric construction supplies supporting structure and independent motivation for the WLP without being claimed to entail the equivalence by itself.
Revision: yes

Circularity Check

1 step flagged

Central claim of SLP collapse follows directly from redefinition of likelihood domain

specific steps

self-definitional [Abstract]
"The likelihood is naturally defined as a function on a family of distributions $M$, not on a parameter space, and once it is so defined the SLP collapses into its weak counterpart, the weak likelihood principle."

The argument asserts that the SLP as ordinarily stated reflects confusion about the domain, and that redefining likelihood as a map on M makes SLP identical to WLP. The collapse is presented as following immediately from the domain change, rendering the reduction definitional by construction rather than a derived equivalence independent of the redefinition.

full rationale

The paper's main thesis is that careful reading of the SLP reveals a domain confusion, leading to its collapse into the WLP upon correcting the domain to the family of distributions M. This is illustrated with examples and connected to geometric structure. While supporting arguments from statistics and geometry provide independent content for the WLP, the specific claim that SLP collapses is tied to the definitional shift, creating moderate circularity in the diagnosis of the principle as a misunderstanding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the premise that the likelihood's natural domain is the model family M; this is treated as a domain assumption rather than derived. No free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: The likelihood function is naturally defined on the family of distributions M rather than on a parameter space.
Invoked directly in the abstract as the key to showing SLP collapses to WLP.

Reference graph

Works this paper leans on

13 extracted references · 11 canonical work pages

[1] Barnard, G. A. and Sprott, D. A. (2006). Likelihood. In Encyclopedia of Statistical Sciences. John Wiley & Sons, New York. https://doi.org/10.1002/0471667196.ess1448.pub2

[2] Berger, J. O. and Wolpert, R. L. (1988). The Likelihood Principle, 2nd ed. Lecture Notes–Monograph Series 6. IMS, Hayward, CA. https://doi.org/10.1214/lnms/1215466210

[3] Birnbaum, A. (1962). On the foundations of statistical inference. J. Amer. Statist. Assoc. 57 269–306. https://doi.org/10.1080/01621459.1962.10480660

[4] Cox, D. R. (1958). Some problems connected with statistical inference. Ann. Math. Statist. 29 357–372. https://doi.org/10.1214/aoms/1177706618

[5] Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall, London.

[6] Dawid, A. P. (2014). Discussion of "On the Birnbaum argument for the strong likelihood principle." Statist. Sci. 29 240–241. https://doi.org/10.1214/14-STS470

[7] Durbin, J. (1970). On Birnbaum's theorem on the relation between sufficiency, conditionality and likelihood. J. Amer. Statist. Assoc. 65 395–398. https://doi.org/10.1080/01621459.1970.10481088

[8] Evans, M. J., Fraser, D. A. S. and Monette, G. (1986). On principles and arguments to likelihood. Canad. J. Statist. 14 181–199. https://doi.org/10.2307/3314794

[9] Fraser, D. A. S. (1963). On the sufficiency and likelihood principles. J. Amer. Statist. Assoc. 58 641–647.

[10] Kalbfleisch, J. D. (1975). Sufficiency and conditionality. Biometrika 62 251–268. https://doi.org/10.1093/biomet/62.2.251

[11] Mayo, D. G. (2014). On the Birnbaum argument for the strong likelihood principle. Statist. Sci. 29 227–239. https://doi.org/10.1214/13-STS457

[12] Vos, P. W. (2022). Generalized estimators, slope, efficiency, and Fisher information bounds. Information Geometry 7 151–170. https://doi.org/10.1007/s41884-022-00085-7

[13] Vos, P. W. and Wu, Q. (2025). Generalized estimation and information. Information Geometry 8 99–123. https://doi.org/10.1007/s41884-025-00164-5
