Confidence, Statistical Evidence and Relative Belief with Applications to a Problem in Particle Physics

Michael Evans; Siqi Zheng

arXiv: 2606.10256 · v1 · pith:5UKDXET4new · submitted 2026-06-08 · ⚛️ physics.data-an · hep-ex · stat.AP

Pith reviewed 2026-06-27 13:57 UTC · model grok-4.3

classification ⚛️ physics.data-an · hep-ex · stat.AP

keywords relative belief · statistical evidence · Poisson model · confidence intervals · particle physics · Feldman-Cousins · uncertainty quantification · principle of evidence

The pith

Relative belief inferences satisfy the principle of evidence and achieve frequentist confidence levels for intervals in the Poisson model used in particle physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that relative belief inferences follow the principle of evidence by ordering hypotheses according to how the data supports them. When the errors attached to these inferences are controlled, the resulting intervals also attain specified confidence levels under repeated sampling from the model. The authors apply this to the construction of intervals for a signal parameter in the presence of background noise, modeled by a Poisson distribution. The intervals are presented as an alternative to the Feldman-Cousins construction for the same problem. A reader would care because the approach aims to deliver both evidence-based and frequentist properties in a setting common to particle physics experiments.

Core claim

Relative belief inferences satisfy the principle of evidence and, when the errors in these inferences are controlled, also satisfy repeated sampling requirements such as achieving given confidence levels for intervals in the Poisson signal-plus-background model.

What carries the argument

Relative belief inferences based on the relative belief ratio, which orders parameter values by the principle of evidence without requiring a prior.

If this is right

Intervals for the Poisson signal-plus-background model can be constructed that both respect the ordering of evidence and attain specified confidence levels.
The method supplies uncertainty quantification that meets both evidence and repeated-sampling criteria in the particle-physics setting.
These intervals stand as a direct alternative to Feldman-Cousins intervals for the same Poisson problem.
Error control on relative belief inferences yields frequentist validity without introducing a prior distribution.
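
For readers unfamiliar with the comparison baseline, the following is a minimal sketch of the standard Feldman-Cousins likelihood-ratio construction for a Poisson count with known background. It is not taken from the paper: the function names, the grid step, the cutoff `n_max`, and the 90% level are illustrative assumptions, and the sketch omits the adjustments used in published tables.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of n counts, computed in log space for stability."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def fc_acceptance(mu, b, cl=0.90, n_max=100):
    """Counts accepted for signal mu, added in decreasing likelihood-ratio order."""
    ranked = []
    for n in range(n_max + 1):
        mu_best = max(0.0, n - b)  # best-fit signal restricted to mu >= 0
        ratio = poisson_pmf(n, mu + b) / poisson_pmf(n, mu_best + b)
        ranked.append((ratio, n))
    ranked.sort(reverse=True)
    accepted, total = set(), 0.0
    for _, n in ranked:
        accepted.add(n)
        total += poisson_pmf(n, mu + b)
        if total >= cl:  # stop once the region holds at least cl probability
            break
    return accepted

def fc_interval(n_obs, b, cl=0.90, mu_max=15.0, step=0.01):
    """Smallest and largest grid values of mu whose acceptance region contains n_obs."""
    grid = [i * step for i in range(int(round(mu_max / step)) + 1)]
    inside = [mu for mu in grid if n_obs in fc_acceptance(mu, b, cl)]
    return min(inside), max(inside)
```

A call such as `fc_interval(n_obs=0, b=3.0)` returns the grid endpoints of the interval for zero observed counts. The resolution is limited by `step`, and `mu_max` is assumed to be large enough to contain the upper limit.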

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same error-control technique might be applied to other counting models that arise in physics experiments.
Computational procedures for controlling the errors could be developed once for a family of discrete distributions rather than case by case.
Reporting both the evidence ordering and the achieved coverage could become a standard practice for interval estimation in high-energy physics analyses.

Load-bearing premise

The errors in relative belief inferences can be controlled in a manner that delivers the stated frequentist confidence levels for the Poisson signal-plus-background model without further model-specific assumptions.

What would settle it

A Monte Carlo simulation that checks whether the relative belief intervals attain the nominal coverage probability for the true signal strength when data are repeatedly drawn from the Poisson signal-plus-background distribution.
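
Such a check could be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses a discrete uniform prior on a grid of signal values, a known background `b`, a single Poisson count, and the plausible region {λ : RB(λ | x) > 1} with no additional error control. All function names are hypothetical.

```python
import math
import numpy as np

def poisson_pmf(x, mean):
    """Poisson probability of x counts for a strictly positive mean."""
    x = int(x)
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))

def rb_ratio(lam, x, b, lam_grid):
    """Relative belief ratio of lam given count x, uniform prior on lam_grid (assumed)."""
    prior_predictive = np.mean([poisson_pmf(x, l + b) for l in lam_grid])
    return poisson_pmf(x, lam + b) / prior_predictive

def mc_coverage(lam_true, b, lam_grid, n_rep=20000, seed=0):
    """Fraction of simulated counts whose plausible region contains lam_true."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam_true + b, size=n_rep)
    # Evaluate the membership test once per distinct count, then look it up.
    covered = {int(x): rb_ratio(lam_true, x, b, lam_grid) > 1.0
               for x in np.unique(counts)}
    return float(np.mean([covered[int(x)] for x in counts]))

if __name__ == "__main__":
    grid = np.linspace(0.0, 10.0, 101)
    for lam in [0.0, 1.0, 2.0, 5.0]:
        print(lam, mc_coverage(lam, b=3.0, lam_grid=grid))
```

Because the count is discrete, the estimated coverage is expected to move in steps as the true signal varies. Comparing it with a nominal level is only meaningful once the paper's error-control step is substituted for the bare plausible region used here.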

Figures

Figures reproduced from arXiv: 2606.10256 by Michael Evans, Siqi Zheng.

**Figure 2.** Plots for Example 3.3. With 0 counts observed, the FCCI upper limit fluctuates and ...

**Figure 3.** Plots for Example 3.4 based on a generated sample of ...

**Figure 4.** Plots for Example 3.5 based on a generated sample of ...

Original abstract

Probability theory provides a clear definition of what is meant by evidence in favor, against or none either way, of an event occurring for an unobserved response, via the principle of evidence. This is immediately applicable when carrying out a proper Bayesian analysis. Even without a prior, this imposes restrictions on reported inferences as these need to reflect the likelihood ordering. Relative belief inferences satisfy this requirement and, when the errors in these inferences are controlled, they also satisfy repeated sampling, or frequentist, requirements such as achieving given confidence levels. Relative belief inferences are considered here for the construction of intervals for uncertainty quantification in the context of a Poisson model for a signal with background noise. These intervals are contrasted with the well-known Feldman-Cousins intervals for this problem.
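
As a reading aid, the quantities named in the abstract can be written out as below. The relative belief ratio and the evidence rule follow the standard definitions in the relative belief literature. The Poisson form, with total count t, sample size n, known background b, and signal λ, is an assumption chosen to match the bias formulas excerpted further down this page; π and m denote a prior on λ and the prior predictive and are notation introduced here.

```latex
% Relative belief ratio: posterior over prior at the parameter value \psi
RB(\psi \mid x) = \frac{\pi(\psi \mid x)}{\pi(\psi)}

% Principle of evidence:
%   RB(\psi \mid x) > 1  evidence in favor of \psi
%   RB(\psi \mid x) < 1  evidence against \psi
%   RB(\psi \mid x) = 1  no evidence either way

% Poisson signal-plus-background model with known b (assumed form)
f(t \mid \lambda) = \frac{(n(b+\lambda))^{t}}{t!}\, e^{-n(b+\lambda)},
\qquad
RB(\lambda \mid t) = \frac{f(t \mid \lambda)}{m(t)},
\qquad
m(t) = \int f(t \mid \lambda)\, \pi(\lambda)\, d\lambda
```

Since m(t) does not depend on λ, ordering signal values by RB(λ | t) in this setting is the same as ordering them by likelihood, which is the ordering the abstract says reported inferences must respect.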

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Relative belief applied to Poisson signal-plus-background intervals claims both evidence ordering and frequentist coverage after error control, but the abstract leaves the actual control mechanism and comparisons unexamined.

The main point is that this paper takes relative belief inferences, which already respect the principle of evidence from probability, and applies them to interval construction for a Poisson signal with background. It claims that once errors are controlled, the intervals also meet repeated-sampling confidence requirements, and it sets them against Feldman-Cousins intervals.

The work does a clean job of spelling out how relative belief uses the likelihood ordering directly and why that matters for reporting inferences in this model. The contrast with Feldman-Cousins is straightforward and relevant for anyone who works with this exact setup in particle physics.

The soft spot is that the abstract gives no derivations, no explicit error-control rule, and no numerical results. Without those, it is impossible to check whether the claimed frequentist coverage holds across signal strengths and background rates or whether it requires extra tuning. The central promise—that error control delivers the stated confidence levels—remains unverified on the basis of what is shown here.

This is for readers who already follow relative-belief methods or who need practical alternatives to standard frequentist intervals in the Poisson-with-background case. It is incremental rather than foundational, but the topic is narrow and applied, so a serious referee report would still be worthwhile if the full derivations and checks are solid.

Referee Report

2 major / 2 minor

Summary. The paper claims that the principle of evidence from probability theory restricts inferences to respect likelihood ordering, that relative belief (RB) inferences satisfy this principle, and that when errors in RB inferences are controlled they also achieve repeated-sampling properties such as exact or conservative frequentist coverage. It develops RB interval constructions for a Poisson signal-plus-background model and contrasts them with Feldman-Cousins intervals.

Significance. If a general, assumption-light error-control procedure for RB intervals can be shown to deliver the stated frequentist coverage in the Poisson model, the work would supply a coherent bridge between evidence-based ordering and frequentist guarantees, which is of direct interest for uncertainty quantification in particle-physics searches where Feldman-Cousins is the current standard.

major comments (2)

[Abstract, §3] Abstract and §3 (Poisson application): the central claim that 'when the errors in these inferences are controlled' the RB intervals achieve given confidence levels is load-bearing, yet the manuscript provides only illustrative numerical comparisons with Feldman-Cousins and does not derive or demonstrate a general error-control rule that produces exact or conservative coverage for all signal strengths, background rates, and observation regimes without further model-specific tuning.
[§3] §3, discussion of coverage: the paper asserts that RB intervals can be made to satisfy repeated-sampling requirements, but no table or figure reports empirical coverage probabilities over a grid of true signal values; without such verification the frequentist guarantee does not follow from the principle of evidence alone.

minor comments (2)

Notation for the relative-belief ratio and the error-control threshold should be introduced once with a single symbol and used consistently thereafter.
The manuscript would benefit from an explicit statement of the precise frequentist coverage target (e.g., exact 95% or conservative) that the error-control procedure is intended to achieve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below, with revisions where the manuscript requires clarification or additional support.

Referee: [Abstract, §3] Abstract and §3 (Poisson application): the central claim that 'when the errors in these inferences are controlled' the RB intervals achieve given confidence levels is load-bearing, yet the manuscript provides only illustrative numerical comparisons with Feldman-Cousins and does not derive or demonstrate a general error-control rule that produces exact or conservative coverage for all signal strengths, background rates, and observation regimes without further model-specific tuning.

Authors: The manuscript focuses on the Poisson signal-plus-background model and does not claim or derive a general error-control rule that applies without model-specific tuning across arbitrary regimes. The claim is that relative belief inferences satisfy the principle of evidence and, when errors are controlled in this setting, the resulting intervals achieve the stated frequentist properties, as shown through the direct numerical comparisons with Feldman-Cousins. We will revise the abstract and §3 to make the model-specific scope of the error control and frequentist results explicit. Revision: partial.

Referee: [§3] §3, discussion of coverage: the paper asserts that RB intervals can be made to satisfy repeated-sampling requirements, but no table or figure reports empirical coverage probabilities over a grid of true signal values; without such verification the frequentist guarantee does not follow from the principle of evidence alone.

Authors: The referee correctly notes that the manuscript contains no table or figure reporting empirical coverage probabilities over a grid of true signal values. The frequentist properties are illustrated via targeted comparisons rather than a systematic coverage study. We will add a figure or table with coverage results over a range of signal strengths in the revised manuscript. Revision: yes.
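
The coverage table discussed in this exchange does not need simulation, because for a Poisson count the coverage at a given true signal is a finite sum over outcomes. The sketch below shows that calculation for an arbitrary interval procedure. The `wald_interval` function is only a placeholder normal-approximation interval introduced for illustration; it is neither the relative belief interval nor the Feldman-Cousins interval, and every name here is hypothetical.

```python
import math

def poisson_pmf(x, mean):
    """Poisson probability of x counts for a strictly positive mean."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))

def wald_interval(x, b, z=1.645):
    """Placeholder procedure: normal-approximation interval, truncated at zero."""
    half = z * math.sqrt(max(x, 1))
    return max(0.0, x - b - half), max(0.0, x - b + half)

def exact_coverage(interval_fn, lam_true, b, x_max=200):
    """Sum the Poisson probabilities of all counts whose interval contains lam_true."""
    total = 0.0
    for x in range(x_max + 1):
        lo, hi = interval_fn(x, b)
        if lo <= lam_true <= hi:
            total += poisson_pmf(x, lam_true + b)
    return total

# One row per true signal value: the layout of a coverage table.
for lam in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]:
    print(f"{lam:5.1f}  {exact_coverage(wald_interval, lam, b=3.0):.3f}")
```

Passing a different `interval_fn` with the same `(x, b)` signature would produce the corresponding table for that procedure. The cutoff `x_max` is assumed large enough that the neglected tail probability is negligible for the signal values considered.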

Circularity Check

0 steps flagged

No circularity: relative belief properties and error-controlled frequentist coverage are presented as independent consequences, without self-referential definitions or load-bearing self-citations in the provided text.

The abstract states that relative belief inferences satisfy the principle of evidence by construction from likelihood ordering and, separately, that error control allows them to meet frequentist confidence levels in the Poisson model. No equations, fitted parameters, or self-citations are exhibited that would reduce the coverage claim to a definition or prior result by the same authors. The derivation chain is not shown to collapse by construction; the error-control step is described as an additional requirement rather than tautological. This is the expected honest non-finding when no specific reduction (e.g., Eq. X defined via the coverage it claims to achieve) can be quoted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the principle of evidence and the controllability of errors in relative belief inferences; both are taken as given from prior work on relative belief.

axioms (2)

[standard math] Principle of evidence provides a clear definition of evidence in favor, against, or neutral for an event.
Stated in the opening sentence of the abstract as the foundation for the approach.
[domain assumption] Errors in relative belief inferences can be controlled to achieve given confidence levels.
Invoked when the abstract asserts that the inferences satisfy repeated-sampling requirements.

Reference graph

Works this paper leans on

10 extracted references · 2 canonical work pages

[1]

This is immediately applicable when carrying out a proper Bayesian analysis

Confidence, Statistical Evidence and Relative Belief with Applications to a Problem in Particle Physics. Michael Evans and Siqi Zheng, Department of Statistical Sciences, University of Toronto. Abstract: Probability theory provides a clear definition of what is meant by evidence in favor, against or none either way, of an event occurring for an unobserved r...

Pith/arXiv arXiv 1930
[2]

Certainly, the consideration of such error is essential as it is a measure of the reliability of the inference being quoted

This criticism is not surprising because the concept of confidence itself was not designed to reflect evidence, but rather confidence regions are used to measure the error in an estimate lying within the region. Certainly, the consideration of such error is essential as it is a measure of the reliability of the inference being quoted. As will be shown in ...

1986
[3]

because the observed data is more probable when $\psi_1$ is the true value than when $\psi_2$ is the true value. The likelihood ordering seems very natural and Theorem 1 in Section 2.2 implies that any region quoted as a candidate to contain the true value, must respect the ordering and so be a likelihood region. To use the likelihood as a basis for inference and the...

1998
[4]

Such an outcome is commonly considered as an absurdity and a region that exhibits such behavior is called improper or absurd

Notice that C(1) = C(2) = Θ, the whole parameter space. Such an outcome is commonly considered as an absurdity and a region that exhibits such behavior is called improper or absurd. The reason for this is that, if C(1) or C(2) is stated, then it is categorically known that the true value is in the set and the confidence level 11/12 seems irrelevant. It has been ...

2024
[5]

One is that it is silent about which interval to quote. It is reasonable to answer, however, that this is not a problem when a prior on ψ is provided, as with relative belief, and so this is a problem that other approaches to inference have to deal with. A more serious concern is how to obtain the inference base $I_\Psi$ for a marginal parameter? This is not a p...

2006
[6]

If (7) is large, this indicates that not finding evidence in favor of $H_0$, based on the observed data, will happen with high prior probability when $H_0$ is true

denotes the conditional prior distribution of the data given that $H_0$ is true, namely, the nuisance parameters have been integrated out. If (7) is large, this indicates that not finding evidence in favor of $H_0$, based on the observed data, will happen with high prior probability when $H_0$ is true. As such, it cannot be claimed that finding evidence against is ...

2015
[7]

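For the estimation problems the biases are obtained by averaging these biases over $\lambda_0$ with respect to the prior (now placed on $\lambda_0$)

$= 1 - \sum_{\{t:\, RB(\lambda_0 \mid t) > 1\}} \frac{(n(b+\lambda_0))^{t}}{t!}\, e^{-n(b+\lambda_0)}$, bias in favor$(\lambda_0) = \sup_{\lambda:\, |\lambda_0 - \lambda| \ge \delta/2} M_T(RB_\Psi(\lambda_0 \mid t) \ge 1 \mid \lambda) = \sup_{\lambda \in \{\lambda_0 - \delta/2,\ \lambda_0 + \delta/2\}} \sum_{\{t:\, RB(\lambda_0 \mid t) \ge 1\}} \frac{(n(b+\lambda))^{t}}{t!}\, e^{-n(b+\lambda)}$, where $\delta$ is the difference that matters. For the estimation problems the biases are obtained by averaging these biases over $\lambda_0$ with respect to the prior (now placed on $\lambda_0$). Consider now impl...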

2025
[8]

With the background so much greater than the signal, it is hard to discern the signal, at least with such a small sample size

The strength of this evidence is Str(0|x) = 0.18, so there is moderate evidence in favor, but certainly not worth claiming that $H_0$ is true. With the background so much greater than the signal, it is hard to discern the signal, at least with such a small sample size. The resulting plausible interval is (0, 1.87), and it contains 92.3% of the posterior probab...

1999
[9]

Right panel: Relative belief ratio for λ where the horizontal dashed line at 1 marks the evidence cutoff and plausible interval

Left panel: prior on λ, prior on b, posterior density, and plausible interval for λ. Right panel: Relative belief ratio for λ where the horizontal dashed line at 1 marks the evidence cutoff and plausible interval. Section 4, Code Availability: The methods described in this paper are implemented in the Python package rbinfer, freely available at https://github.com/siqi-z...

work page doi:10.1002/cjs.70015 2025
[10]

Teo, Y.S., Jeong, H., Prasannan, N., Brecht, B., Silberhorn, C., Evans, M., Mogilevtsev, D

DOI: 10.1103/PhysRevA.110.012231. Teo, Y.S., Jeong, H., Prasannan, N., Brecht, B., Silberhorn, C., Evans, M., Mogilevtsev, D. and Sanchez-Soto, L.L. (2024b) Evidence-based certification of quantum dimensions. Physical Review Letters 133, 050204, DOI: 10.1103/PhysRevLett.133.050204. [Table residue from the following page: Belief = 0.75; column headers UL init, UL 1, Cont 1, UL 2, Cont 2, RB_Λ(0|x), Str_Λ(0|x); first entry 5.0 ...]

work page doi:10.1103/physreva.110.012231
