pith. sign in

arxiv: 1907.07850 · v2 · pith:NPE7AAVEnew · submitted 2019-07-18 · 📊 stat.AP

Interval estimators for inequality measures using grouped data

Pith reviewed 2026-05-24 19:50 UTC · model grok-4.3

classification 📊 stat.AP
keywords inequality measuresgrouped databootstrap intervalsWald intervalsquantile-based measuresGini indexGeneralized Lambda Distribution
0
0 comments X

The pith

Bootstrap and Wald-type intervals achieve good coverage for quantile-based inequality measures with only grouped data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that reliable confidence intervals for quantile-based inequality measures can be obtained even when data are available only as grouped frequencies or bins. It demonstrates this by constructing bootstrap and Wald-type intervals after approximating the underlying distribution with the Generalized Lambda Distribution or, when available, linear interpolation using group means. These intervals typically attain better coverage than comparable methods for the Gini index in the grouped-data setting. The approach is illustrated on both simulated and real income datasets to show practical applicability when individual records cannot be released.

Core claim

When income data are supplied only in grouped form, bootstrap and Wald-type intervals for quantile-based inequality measures attain good coverage probabilities by first approximating the underlying density via the Generalized Lambda Distribution or linear interpolation; these coverages are typically superior to those obtained for the Gini index under the same grouped-data constraints.

What carries the argument

Bootstrap and Wald-type interval estimators for quantile-based measures, constructed after approximating the density from grouped data using the Generalized Lambda Distribution or linear interpolation.

If this is right

  • Quantile-based inequality measures become usable for statistical inference when only binned summary tables are released.
  • Linear interpolation becomes a viable approximation route whenever group means accompany the frequency counts.
  • The Gini index remains harder to interval-estimate reliably from the same grouped data.
  • Real-data applications become feasible for privacy-protected income tables without requiring microdata access.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The methods could be stress-tested against alternative binning schemes or heavier-tailed distributions to map their robustness limits.
  • Similar approximation-plus-bootstrap pipelines might apply to other summary statistics that depend on order statistics.
  • Policy analysts could compare coverage performance across different inequality indices on the same grouped releases to choose the most stable option.

Load-bearing premise

The Generalized Lambda Distribution and linear interpolation approximations are accurate enough representations of the true income distribution to produce the claimed coverage properties for the intervals.

What would settle it

A simulation study or real dataset in which the empirical coverage of the proposed intervals falls well below the nominal level when the true distribution deviates from the Generalized Lambda or linear-interpolation approximations.

read the original abstract

Income inequality measures are often used as an indication of economic health. How to obtain reliable confidence intervals for these measures based on sampled data has been studied extensively in recent years. To preserve confidentiality, income data is often made available in summary form only (i.e. histograms, frequencies between quintiles, etc.). In this paper, we show that good coverage can be achieved for bootstrap and Wald-type intervals for quantile-based measures when only grouped (binned) data are available. These coverages are typically superior to those that we have been able to achieve for intervals for popular measures such as the Gini index in this grouped data setting. To facilitate the bootstrapping, we use the Generalized Lambda Distribution and also a linear interpolation approximation method to approximate the underlying density. The latter is possible when groups means are available. We also apply our methods to real data sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops methods for constructing bootstrap and Wald-type confidence intervals for quantile-based inequality measures when only grouped (binned) data are available. It approximates the underlying density via the Generalized Lambda Distribution fitted to bin probabilities or via linear interpolation of the CDF (when group means are supplied), generates pseudo-samples for bootstrapping, and applies the resulting intervals to real datasets, claiming that good coverage is achieved and that these intervals are typically superior to those obtainable for the Gini index in the grouped-data setting.

Significance. If the coverage properties hold under the stated approximations, the work would supply practical tools for reliable inference on inequality measures from the binned summaries that are commonly released for confidentiality reasons (e.g., census or tax data). The approach extends existing micro-data interval methods to the grouped case using standard bootstrap and Wald machinery, which is a direct and potentially useful contribution provided the approximation error does not materially affect the finite-sample distributions of the target measures.

major comments (3)
  1. [Abstract and §1] Abstract and §1: The central claim that 'good coverage can be achieved' for the bootstrap and Wald intervals and that these coverages are 'typically superior' to those for the Gini index is asserted without any accompanying quantitative evidence—simulation design, number of replications, coverage probabilities, or error analysis—in the manuscript. The support is said to rest on real-data applications whose details and results are not reported.
  2. [§3] §3 (GLD and linear-interpolation approximations): The validity of the reported coverage properties hinges on the four-parameter GLD and the piecewise-linear CDF interpolation being sufficiently accurate representations of the true income distribution. No analytic bound is supplied on the resulting distortion to the sampling distribution of the quantile-based measures, and it is unclear whether any simulation design tested distributions with features (heavy tails, multimodality, point masses) that lie outside the GLD family.
  3. [Simulation study] Simulation study (wherever presented): If Monte Carlo results exist, they must be reported in a table that shows, for each quantile-based measure and each grouped-data configuration, the empirical coverage of the bootstrap and Wald intervals together with the corresponding figures for the Gini index; without such a table the superiority claim cannot be evaluated.
minor comments (2)
  1. [§2] Notation for the quantile-based measures (e.g., inter-quantile ratios or shares) should be defined explicitly in §2 before the approximation methods are introduced.
  2. [§3] The manuscript should state the precise fitting criterion used for the GLD parameters (maximum likelihood on bin probabilities, moment matching, etc.) and any constraints imposed on the parameter space.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the thoughtful comments, which highlight areas where the manuscript's evidence and discussion can be strengthened. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract and §1] Abstract and §1: The central claim that 'good coverage can be achieved' for the bootstrap and Wald intervals and that these coverages are 'typically superior' to those for the Gini index is asserted without any accompanying quantitative evidence—simulation design, number of replications, coverage probabilities, or error analysis—in the manuscript. The support is said to rest on real-data applications whose details and results are not reported.

    Authors: We acknowledge that the real-data applications were not presented with sufficient quantitative detail to fully support the coverage claims. In the revised manuscript we will add tables (in a new results subsection or appendix) that report the estimated measures, the constructed bootstrap and Wald intervals, and direct numerical comparisons with the corresponding Gini intervals for each dataset. This will make the supporting evidence explicit and allow readers to evaluate the 'typically superior' claim. revision: yes

  2. Referee: [§3] §3 (GLD and linear-interpolation approximations): The validity of the reported coverage properties hinges on the four-parameter GLD and the piecewise-linear CDF interpolation being sufficiently accurate representations of the true income distribution. No analytic bound is supplied on the resulting distortion to the sampling distribution of the quantile-based measures, and it is unclear whether any simulation design tested distributions with features (heavy tails, multimodality, point masses) that lie outside the GLD family.

    Authors: We agree that an analytic bound on the approximation-induced distortion would be valuable but is technically difficult to obtain for these nonlinear functionals of the quantile function. The GLD was selected for its documented flexibility with income-type distributions that commonly exhibit heavy tails; the linear-interpolation method is used only when group means are supplied. We will expand the discussion in §3 to explicitly note the lack of an analytic error bound, to describe the range of distributions for which the approximations are expected to perform well, and to acknowledge that multimodality or point masses may not be captured accurately. revision: partial

  3. Referee: [Simulation study] Simulation study (wherever presented): If Monte Carlo results exist, they must be reported in a table that shows, for each quantile-based measure and each grouped-data configuration, the empirical coverage of the bootstrap and Wald intervals together with the corresponding figures for the Gini index; without such a table the superiority claim cannot be evaluated.

    Authors: The present manuscript contains no Monte Carlo simulation study; the coverage statements rest on the real-data applications. To address the referee's request for quantitative coverage evidence, we will add a simulation section that generates data from a range of distributions (including heavy-tailed, multimodal, and non-GLD cases), applies the grouped-data procedures, and reports empirical coverage rates for the bootstrap and Wald intervals alongside the Gini intervals in the requested tabular format. revision: yes

standing simulated objections not resolved
  • Supplying a rigorous analytic bound on the distortion that the GLD or linear-CDF approximations introduce into the finite-sample distribution of the quantile-based inequality measures.

Circularity Check

0 steps flagged

No significant circularity; standard methods applied to grouped data approximations

full rationale

The paper's core contribution is the application of bootstrap resampling and Wald-type intervals to quantile-based inequality measures computed from grouped (binned) data, using the Generalized Lambda Distribution (a pre-existing four-parameter family) or linear CDF interpolation as density approximations when group means are available. Coverage is assessed via simulation on known distributions and real data examples. No derivation step reduces by construction to its own fitted inputs, no self-citation forms a load-bearing uniqueness claim, and no prediction is statistically forced by the fitting process itself. The approximations are external tools whose accuracy is an empirical question separate from the interval construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the adequacy of two density approximations (GLD and linear interpolation) for bootstrap inference from grouped data and on the assumption that grouped data retain enough information for reliable quantile-based inference.

free parameters (1)
  • Generalized Lambda Distribution parameters
    Four parameters of the GLD are fitted to the grouped data to enable bootstrap resampling; these are data-dependent and central to the procedure.
axioms (1)
  • domain assumption The income distribution can be adequately approximated by the Generalized Lambda Distribution or by linear interpolation between group means for the purpose of producing bootstrap and Wald intervals with good coverage.
    This assumption is invoked to justify the use of the approximations when only binned data are available.

pith-pipeline@v0.9.0 · 5673 in / 1302 out tokens · 28645 ms · 2026-05-24T19:50:01.861935+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.