
arXiv: 2604.02664 · v1 · submitted 2026-04-03 · stat.ME · astro-ph.IM · stat.AP

A comparison of methods for Poisson regression in the presence of background

Pith reviewed 2026-05-13 18:59 UTC · model grok-4.3

keywords: Poisson regression · background subtraction · wstat · Cash statistic · spectral analysis · statistical bias · degrees of freedom

The pith

Joint parametric fitting avoids bias in Poisson regression with background, unlike wstat or fixed-background methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares three methods for regression on Poisson data with Poisson background: joint parametric fit of source and background, the non-parametric wstat method, and regression with fixed background. It establishes that the wstat and fixed-background approaches produce significantly biased results, particularly in low-count and background-dominated cases. The joint-fit method instead provides unbiased source parameter estimates and supports reliable hypothesis testing via the Cash statistic. It further shows that wstat inflates the effective number of degrees of freedom beyond the source model's free parameters. Accurate handling of background is critical in applications such as astronomical spectral analysis where counts are often sparse.

Core claim

The non-parametric background method is found to be significantly biased, especially in the low-count and background-dominated regimes. Similar conclusions apply to the fixed-background regression. The joint-fit method, on the other hand, simultaneously affords reliable hypothesis testing by means of the usual Cash statistic and unbiased reconstruction of source parameters. The wstat method adds a significantly larger number of degrees of freedom, compared to the number of free parameters in the source model.

What carries the argument

The joint-fit method that models both the source and background with parametric forms simultaneously.
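
For the constant-rate case shown in Figure 1 (tS = tB = 1), the joint fit has a closed form, and a minimal Python sketch can make it concrete. The names `cash_term` and `joint_fit` are illustrative and not taken from the paper, and summing the Cash statistic over the source and background regions is an assumption about how the joint statistic is assembled.

```python
import math


def cash_term(y, m):
    """One bin of the Cash statistic, 2 * (m - y + y * log(y / m)), with 0 * log(0) = 0."""
    if y == 0:
        return 2.0 * m
    return 2.0 * (m - y + y * math.log(y / m))


def joint_fit(source, background):
    """Joint fit of a constant source rate theta and a constant background rate.

    Uses the closed-form estimates quoted in the Figure 1 caption:
    phi_hat = n_B / N and theta_hat = n_S / N - phi_hat, for equal exposures.
    """
    n = len(source)
    phi_hat = sum(background) / n          # background rate estimate
    theta_hat = sum(source) / n - phi_hat  # source rate estimate (may be negative)
    # Assumption: the joint statistic is the sum of the Cash terms of both regions.
    c_min = sum(cash_term(s, theta_hat + phi_hat) for s in source)
    c_min += sum(cash_term(b, phi_hat) for b in background)
    return theta_hat, phi_hat, c_min


if __name__ == "__main__":
    # Tiny hand-made data set, for illustration only.
    print(joint_fit([2, 0, 3, 1], [1, 0, 1, 0]))
```

Only two parameters are adjusted here, regardless of the number of bins, which is the property the degrees-of-freedom comparison turns on.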

If this is right

  • Source parameters are reconstructed without bias using the joint-fit method.
  • Standard Cash statistic hypothesis testing is reliable under the joint-fit approach.
  • The wstat method increases effective degrees of freedom significantly.
  • Fixed-background regression leads to biased results similar to wstat.
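
To see where the extra degrees of freedom in wstat can come from, the following sketch profiles out one background rate per bin at a fixed source rate. It assumes equal exposures (tS = tB = 1) and a constant source rate theta > 0; the function names and the coarse grid minimizer `fit_wstat` are illustrative choices, not the paper's implementation.

```python
import math


def cash_term(y, m):
    """One bin of the Cash statistic, with the convention 0 * log(0) = 0."""
    if y == 0:
        return 2.0 * m
    return 2.0 * (m - y + y * math.log(y / m))


def profile_background(s, b, theta):
    """Background rate maximizing the Poisson likelihood of one bin at fixed theta.

    Solves S / (theta + f) + B / f = 2 for f, i.e. the positive root of
    2 f^2 + (2 theta - S - B) f - B theta = 0.
    """
    d = math.sqrt((2.0 * theta - s - b) ** 2 + 8.0 * b * theta)
    return max((s + b - 2.0 * theta + d) / 4.0, 0.0)


def wstat(theta, source, background):
    """W statistic: Cash terms of both regions with a per-bin profiled background."""
    total = 0.0
    for s, b in zip(source, background):
        f = profile_background(s, b, theta)  # one nuisance value per bin
        total += cash_term(s, theta + f) + cash_term(b, f)
    return total


def fit_wstat(source, background, lo=1e-3, hi=10.0, steps=2000):
    """Crude grid minimization of W over theta (hypothetical search range)."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda t: wstat(t, source, background))
```

Each bin carries its own fitted background value, so a data set with N bins implicitly adjusts N nuisance quantities in addition to the source parameters.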

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If background structure cannot be captured parametrically, the joint-fit unbiasedness claim may fail.
  • Software for spectral fitting could default to joint modeling when possible to reduce bias.
  • Further tests could compare these methods on real datasets with independent background measurements.

Load-bearing premise

The background signal can be adequately captured by a chosen parametric model.

What would settle it

Generate synthetic Poisson data sets with a known complex non-parametric background, apply the joint-fit assuming a simple parametric background, and check if the recovered source parameters match the input values within expected errors.
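
The check described above could be set up along the following lines. This is a sketch under stated assumptions: the sinusoidal background shape, the helper names, and the default values are all hypothetical, and the fit is the constant-rate joint fit with equal exposures. Because that estimator depends only on total counts, this toy case is likely too forgiving; a more stringent test would need a source model that varies across bins.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def structured_background(n_bins, beta, amplitude):
    """Hypothetical non-constant background: mean beta with a sinusoidal ripple."""
    return [beta * (1.0 + amplitude * math.sin(2.0 * math.pi * i / n_bins))
            for i in range(n_bins)]


def constant_joint_fit(source, background):
    """Source-rate estimate of the joint fit that assumes a constant background."""
    return (sum(source) - sum(background)) / len(source)


def run_check(theta=1.0, beta=1.0, amplitude=0.8, n_bins=100, n_sims=500, seed=0):
    """Return the mean estimate, its standard error, and the bias in units of that error."""
    rng = random.Random(seed)
    rates = structured_background(n_bins, beta, amplitude)
    estimates = []
    for _ in range(n_sims):
        source = [poisson(rng, theta + r) for r in rates]
        background = [poisson(rng, r) for r in rates]
        estimates.append(constant_joint_fit(source, background))
    mean = sum(estimates) / n_sims
    var = sum((e - mean) ** 2 for e in estimates) / (n_sims - 1)
    stderr = math.sqrt(var / n_sims)
    return mean, stderr, (mean - theta) / stderr


if __name__ == "__main__":
    print(run_check())
```

A bias of more than a few standard errors would indicate that the misspecified background leaks into the source parameter.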

Figures

Figures reproduced from arXiv: 2604.02664 by Jelle de Plaa, Massimiliano Bonamente, Vinay Kashyap, Xiaoli Li.

Figure 1: Sample data set and best-fit models for θ = β = 1 and tS = tB = 1, with N = 100. Data points for the source region (red) are drawn from a Poisson distribution with mean θ + β, and for the background region (blue) from a Poisson distribution with mean β. The joint fit has MLE estimates given by ϕ̂ = nB/N ≥ 0 and θ̂ = nS/N − ϕ̂ (Eq. 20), where nB is the total number of counts in the background region, and nS …
Figure 2: Experimental Cumulative Distribution Function (eCDF) for the statistics and best-fit parameters, based on 1,000 simulations with intensity θ = β = 1 and N = 100 data points. … assume that the θ-dependence of b̂i is ignored (see Appendix B). Therefore the same considerations also apply to the non-parametric background to explain the strong bias in the low-mean regime. Remark 5 (Biases with Wmin and Cmin(FB) …
Figure 3: (Top) eCDFs for the simulations with N = 100, θ = β = 0.1. (Bottom) eCDFs for the simulations with N = 100, θ = 0.1, β = 10. In the background-dominated data regime, Si, Bi ≫ 1 and θ̂ ≪ 1. According to Eq. (6), it is possible to show that d b̂i(θ)/dθ = 1/2 …
Figure 4: eCDFs for the most extreme case of θ = 0.1, β = 100, N = 100.
Original abstract

This paper provides a statistical analysis of three common methods of regression for Poisson data in the presence of Poisson background, namely the joint fit with two parametric models for the source and the background, the use of a non-parametric model for the background known as the wstat method, and the regression with a fixed background. The non-parametric background method, which is a popular method for spectral data, is found to be significantly biased, especially in the low-count and background-dominated regimes. Similar conclusions apply to the fixed-background regression. The joint-fit method, on the other hand, simultaneously affords reliable hypothesis testing by means of the usual Cash statistic and unbiased reconstruction of source parameters. We also investigate the effect of non-parametric regression on the number of effective degrees of freedom by means of the Efron degree of freedom function. We find that the wstat method adds a significantly larger number of degrees of freedom, compared to the number of free parameters in the source model. The other two methods have a number of degrees of freedom consistent with the number of adjustable parameters, at least for the simple models investigated in this paper.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript compares three methods for fitting Poisson data in the presence of Poisson background: joint parametric modeling of both source and background, the wstat non-parametric background method, and regression with a fixed background. Monte Carlo simulations show that wstat and fixed-background approaches produce biased source parameter estimates, particularly in low-count and background-dominated regimes. The joint-fit method yields unbiased estimates while permitting standard Cash-statistic hypothesis testing. The paper further applies Efron's effective degrees-of-freedom function and reports that wstat inflates the effective dof relative to the number of free source parameters, whereas the other two methods align with the expected parameter count for the simple models examined.

Significance. If the central simulation results hold, the work is significant for statistical methodology in fields that routinely analyze Poisson counts with background (e.g., X-ray spectroscopy). It supplies concrete evidence that a widely used non-parametric technique can introduce bias and quantifies the accompanying inflation in effective degrees of freedom. The explicit contrast with the joint-fit approach, which preserves both unbiasedness and standard inferential tools, offers a practical recommendation when a parametric background model is defensible. The use of known-truth simulations and the Cash statistic provides a reproducible benchmark.

major comments (2)
  1. [§3] §3 (Simulation study): The reported unbiasedness of the joint-fit method is shown only when the background is generated from the identical parametric family later assumed in the fit. No simulations explore deliberate misspecification (e.g., background with unmodeled curvature or non-parametric structure). Because the abstract already highlights sensitivity in the background-dominated regime, this omission is load-bearing for the general claim that joint fit is unbiased.
  2. [§5] §5 (Effective degrees of freedom): The Efron dof calculation for wstat is stated to yield a significantly larger number than the source-model parameters, yet the manuscript does not supply the explicit formula or the precise handling of the background component inside the dof function. Without this, the quantitative comparison cannot be independently verified.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'simple models investigated in this paper' is used without naming the functional forms; a one-sentence description would improve readability.
  2. [Figures] Figure captions: several panels compare bias across count regimes; adding explicit numerical thresholds (e.g., 'source counts < 10') would make the low-count and background-dominated labels unambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. We respond to each major comment below and will update the manuscript accordingly.

point-by-point responses
  1. Referee: [§3] §3 (Simulation study): The reported unbiasedness of the joint-fit method is shown only when the background is generated from the identical parametric family later assumed in the fit. No simulations explore deliberate misspecification (e.g., background with unmodeled curvature or non-parametric structure). Because the abstract already highlights sensitivity in the background-dominated regime, this omission is load-bearing for the general claim that joint fit is unbiased.

    Authors: The simulations in §3 are constructed under the assumption that the parametric forms for both source and background are correctly specified, which matches the intended use case for the joint-fit method when a defensible parametric background model is available. The manuscript contrasts this with wstat and fixed-background approaches, which remain biased even under correct specification. We will revise the abstract and §3 to state this modeling assumption explicitly and add a short discussion paragraph noting that misspecification would generally bias all methods, while the joint-fit approach still permits Cash-statistic diagnostics for model adequacy. This addresses the concern without requiring new simulations for a minor revision.
    revision: yes

  2. Referee: [§5] §5 (Effective degrees of freedom): The Efron dof calculation for wstat is stated to yield a significantly larger number than the source-model parameters, yet the manuscript does not supply the explicit formula or the precise handling of the background component inside the dof function. Without this, the quantitative comparison cannot be independently verified.

    Authors: We agree that the implementation details were omitted. The Efron effective number of degrees of freedom is computed via the trace of the hat matrix obtained from the observed information matrix of the Poisson log-likelihood. For wstat, the background is profiled out non-parametrically on a per-bin basis, so the effective dof includes both the source-model parameters and an additional term equal to the number of independent background observations. In the revised §5 we will insert the explicit formula, the adaptation to the Cash statistic, and the precise treatment of the background component to permit independent verification.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity: claims rest on external Monte Carlo benchmarks and standard statistics

full rationale

The paper evaluates the three regression methods via Monte Carlo simulations that generate data from known parametric source and background models, then measures bias and coverage against those independent truth values. This constitutes an external benchmark rather than any reduction of reported bias or degrees-of-freedom results to quantities defined by the fit itself. The Efron effective-degrees-of-freedom function is invoked as an external reference. No self-citations appear as load-bearing premises, no equations equate a 'prediction' to a fitted input by construction, and the joint-fit unbiasedness result is explicitly conditioned on correct parametric specification—the very condition the simulations test. The derivation chain is therefore self-contained against the simulation truths.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard Poisson likelihood assumptions and the validity of the Cash statistic for hypothesis testing; no new entities are introduced and the comparison uses simulation-based validation rather than additional fitted constants.

axioms (2)
  • domain assumption Observed counts follow independent Poisson distributions for source and background components.
    Invoked throughout the comparison of regression methods for count data.
  • standard math The Cash statistic provides reliable hypothesis testing when the model is correctly specified.
    Used to support the claim of reliable testing for the joint-fit method.

pith-pipeline@v0.9.0 · 5506 in / 1391 out tokens · 34419 ms · 2026-05-13T18:59:44.674633+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    The non-parametric background method... is found to be significantly biased... The joint-fit method... affords reliable hypothesis testing by means of the usual Cash statistic and unbiased reconstruction of source parameters.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
