
arxiv: 2406.09195 · v6 · pith:E3CMPZT4new · submitted 2024-06-13 · stat.ME · math.ST · physics.data-an · stat.CO · stat.TH

On the statistical analysis of grouped data: when Pearson chi² and other divisible statistics are not goodness-of-fit tests

classification: stat.ME, math.ST, physics.data-an, stat.CO, stat.TH
keywords: tests, statistics, analysis, divisible, goodness-of-fit, analyzed, data, grouped

Thousands of experiments are analyzed, and papers are published each year involving the statistical analysis of grouped data. While this area of statistics is often perceived -- somewhat naively -- as saturated, several misconceptions still affect everyday practice, and new frontiers have so far remained unexplored. Researchers must be aware of the limitations affecting their analyses and what new possibilities are at their hands. The article introduces a unifying approach to the analysis of divisible statistics -- that includes Pearson's $\chi^2$, the likelihood ratio, and spectral statistics, as special cases -- when a statistician deals with a large number of bins/groups, thus leading to a large number of small or moderate frequencies. Performance of the tests is analyzed against the class of contiguous (local) alternatives. Perhaps the most surprising result here is that, in this `sparse' regime, most of the tests proposed in the literature can be modified to produce more powerful tests, and no single test based on a divisible statistic leads to a goodness-of-fit test. Distribution-free goodness-of-fit tests are also constructed.
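As a rough illustration of the objects the abstract refers to, the sketch below assumes the usual reading of a "divisible statistic" as a sum over bins of a function of each bin's observed frequency and its expected frequency. The function names, the toy counts, and the uniform null hypothesis are all assumptions made for this example and are not taken from the paper. The counts are chosen to mimic the sparse regime, with many bins and small frequencies.

```python
import math

def divisible_statistic(counts, expected, g):
    """Sum g(nu_i, e_i) over bins; the statistic is additive across bins."""
    return sum(g(nu, e) for nu, e in zip(counts, expected))

def pearson_term(nu, e):
    # Per-bin contribution to Pearson's chi-squared statistic.
    return (nu - e) ** 2 / e

def likelihood_ratio_term(nu, e):
    # Per-bin contribution to the likelihood ratio statistic,
    # with the convention 0 * log(0) = 0 for empty bins.
    return 2.0 * nu * math.log(nu / e) if nu > 0 else 0.0

def spectral_term(k):
    # Indicator that a bin holds exactly k observations; summing it
    # over bins counts the bins with frequency k.
    return lambda nu, e: 1.0 if nu == k else 0.0

# Hypothetical sparse sample: 8 observations spread over 8 equiprobable bins.
counts = [0, 1, 0, 2, 1, 0, 3, 1]
n = sum(counts)
probs = [1 / 8] * 8
expected = [n * p for p in probs]

pearson = divisible_statistic(counts, expected, pearson_term)
likelihood_ratio = divisible_statistic(counts, expected, likelihood_ratio_term)
empty_bins = divisible_statistic(counts, expected, spectral_term(0))

print(pearson, likelihood_ratio, empty_bins)
```

All three statistics share the same additive form and differ only in the per-bin function, which is what makes a unified treatment of the kind the abstract describes plausible. The sketch computes the statistics only; it does not implement the paper's modified or distribution-free tests.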


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Compensator-Based Inference for Signal Detection Under Unknown Background

    stat.ME · 2026-05 · unverdicted · novelty 6.0

    Estimating a single compensator parameter suffices to infer signal intensity under unknown background and controls inference conservativeness.