arxiv: 1907.09053 · v1 · pith:UPA5FSLBnew · submitted 2019-07-21 · 📊 stat.ML · cs.LG · stat.ME

Some New Results for Poisson Binomial Models

Pith reviewed 2026-05-24 18:10 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords ecological inference · Poisson binomial · logistic regression · maximum likelihood estimation · heteroscedastic Gaussian · voter preferences · aggregate data

The pith

The maximum likelihood estimator exists for logistic parameters in ecological inference, and the heteroscedastic Gaussian approximation to the Poisson binomial likelihood has controlled curvature despite not being log-concave.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends earlier modeling of voter preferences from aggregate data by proving that the maximum likelihood estimator for the logistic regression coefficients exists. It also establishes results on the curvature of the likelihood. That likelihood is a Poisson binomial, the distribution of a sum of independent but non-identically distributed Bernoulli trials, and it is approximated by a heteroscedastic Gaussian for computational efficiency. These properties matter because they support reliable optimization when inferring individual-level probabilities from group-level observations such as election returns. The method is then tested on real voter data from Morris County, New Jersey, where it outperforms other ecological inference approaches at predicting a related outcome that is known at the individual level: whether an individual votes.

Core claim

We prove results about the existence of the MLE and the curvature of this likelihood, which is not log-concave in general. We further demonstrate the utility of our method on a real data example. Using data on voters in Morris County, NJ, we demonstrate that our approach outperforms other ecological inference methods in predicting a related, but known outcome: whether an individual votes.

What carries the argument

The heteroscedastic Gaussian approximation to the Poisson binomial likelihood, which carries the proofs of MLE existence and curvature control for logistic regression parameters estimated from aggregate data.
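
For orientation, the objects named above can be written out as follows. This is a sketch, not the paper's own statement: the notation ($x_i$ for individual covariates, $g$ for an aggregation unit such as a precinct, $D_g$ for its observed count, $Z_i$ for the unobserved individual outcomes) is assumed here for illustration and may differ from the manuscript. The mean and variance are the standard moments of a Poisson binomial, and the approximation is heteroscedastic because the variance depends on $\beta$ and differs across groups.

```latex
\begin{align*}
p_i(\beta) &= \frac{1}{1 + e^{-x_i^\top \beta}}, \\
D_g &= \sum_{i \in g} Z_i, \qquad Z_i \sim \mathrm{Bernoulli}\big(p_i(\beta)\big) \text{ independently}, \\
\mu_g(\beta) &= \sum_{i \in g} p_i(\beta), \qquad
\sigma_g^2(\beta) = \sum_{i \in g} p_i(\beta)\big(1 - p_i(\beta)\big), \\
\ell_{\mathrm{approx}}(\beta) &= \sum_g \left[ -\tfrac{1}{2}\log\big(2\pi\,\sigma_g^2(\beta)\big)
  - \frac{\big(D_g - \mu_g(\beta)\big)^2}{2\,\sigma_g^2(\beta)} \right].
\end{align*}
```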

Load-bearing premise

The logistic regression model for individual probabilities combined with the heteroscedastic Gaussian approximation to the Poisson binomial likelihood is adequate for the claimed MLE existence and curvature properties to transfer to the real-data setting.

What would settle it

A concrete counterexample dataset or parameter vector where numerical optimization of the approximated likelihood fails to converge to a unique point or encounters multiple local maxima would falsify the existence and curvature claims.
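
One inexpensive way to look for such a counterexample is to scan the approximated log-likelihood along a single coefficient and list its interior peaks. The sketch below is only a diagnostic under strong assumptions: a scalar coefficient `beta`, one covariate per individual, and made-up groups that are not taken from the paper. The function names (`surrogate_loglik`, `local_maxima`) are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def surrogate_loglik(beta, groups):
    """Gaussian-approximation log-likelihood summed over groups.

    groups: list of (xs, count), where xs holds one scalar covariate per
    individual and count is the observed aggregate for that group.
    """
    total = 0.0
    for xs, count in groups:
        probs = [sigmoid(beta * x) for x in xs]
        mu = sum(probs)
        # Variance can round to zero for extreme beta; the grid below avoids that.
        var = sum(p * (1.0 - p) for p in probs)
        total += -0.5 * math.log(2.0 * math.pi * var) - (count - mu) ** 2 / (2.0 * var)
    return total

def local_maxima(values):
    """Indices of strict interior local maxima of a sequence."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]

# Made-up toy data for illustration only (not from the paper).
groups = [
    ([0.5, -1.0, 2.0, 0.3], 2),
    ([1.2, -0.7, 0.1, -2.0], 1),
    ([0.9, 1.1, -0.2, 0.4], 3),
]
grid = [-5.0 + 0.05 * i for i in range(201)]   # beta from -5 to 5
profile = [surrogate_loglik(b, groups) for b in grid]
peaks = [grid[i] for i in local_maxima(profile)]
print("interior peaks at beta =", peaks)
```

More than one entry in `peaks` would indicate multiple local maxima on this slice. A single peak proves nothing about uniqueness in higher dimensions or outside the scanned range.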

Original abstract

We consider a problem of ecological inference, in which individual-level covariates are known, but labeled data is available only at the aggregate level. The intended application is modeling voter preferences in elections. In Rosenman and Viswanathan (2018), we proposed modeling individual voter probabilities via a logistic regression, and posing the problem as a maximum likelihood estimation for the parameter vector beta. The likelihood is a Poisson binomial, the distribution of the sum of independent but not identically distributed Bernoulli variables, though we approximate it with a heteroscedastic Gaussian for computational efficiency. Here, we extend the prior work by proving results about the existence of the MLE and the curvature of this likelihood, which is not log-concave in general. We further demonstrate the utility of our method on a real data example. Using data on voters in Morris County, NJ, we demonstrate that our approach outperforms other ecological inference methods in predicting a related, but known outcome: whether an individual votes.
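
The gap between the exact likelihood and its Gaussian stand-in can be seen on a single small group. The following is a minimal sketch, not the authors' implementation: it computes the exact Poisson binomial PMF by dynamic programming and compares its log value at an observed count with the Gaussian log density that uses the same mean and variance. The covariates, coefficients, and observed count are made up for illustration, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def individual_probs(X, beta):
    """p_i = sigmoid(x_i . beta) for each covariate row x_i."""
    return [sigmoid(sum(x * b for x, b in zip(row, beta))) for row in X]

def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli(p_i) variables.

    Adds one individual at a time, convolving the running PMF
    with that individual's two-point distribution.
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)      # individual contributes 0
            nxt[k + 1] += mass * p          # individual contributes 1
        pmf = nxt
    return pmf

def gaussian_loglik(count, probs):
    """Log density of N(sum p_i, sum p_i (1 - p_i)) at the observed count."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return -0.5 * math.log(2.0 * math.pi * var) - (count - mu) ** 2 / (2.0 * var)

# Made-up toy inputs for illustration only (not from the paper).
X = [[1.0, 0.2], [1.0, -1.3], [1.0, 0.7], [1.0, 1.5], [1.0, -0.4]]
beta = [0.1, 0.8]
observed = 3

probs = individual_probs(X, beta)
pmf = poisson_binomial_pmf(probs)
exact_loglik = math.log(pmf[observed])
approx_loglik = gaussian_loglik(observed, probs)
print("exact:", exact_loglik, "gaussian:", approx_loglik)
```

The dynamic program costs quadratic time in the group size, which is one plausible reason to prefer a closed-form surrogate when groups contain thousands of individuals.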

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript extends Rosenman and Viswanathan (2018) on ecological inference for voter preferences. Individual probabilities are modeled via logistic regression, yielding a Poisson binomial likelihood for aggregate counts; this is approximated by a heteroscedastic Gaussian for computational tractability. The paper claims to prove existence of the MLE and to analyze curvature properties of the likelihood (noting it is not log-concave in general), and reports that the method outperforms alternatives on Morris County, NJ voter data when predicting a related observed outcome.

Significance. Theoretical guarantees on MLE existence and curvature for a non-log-concave Poisson binomial likelihood would be useful for ecological inference applications. The real-data demonstration provides a concrete test case. However, because the implemented procedure optimizes the Gaussian surrogate rather than the exact likelihood for which the proofs are stated, the practical significance hinges on whether those properties carry over to the approximation actually used.

major comments (2)
  1. [Abstract] The existence and curvature results are stated for the Poisson binomial likelihood, yet the text immediately notes that this likelihood 'is approximated ... for computational efficiency' and that the real-data example optimizes the heteroscedastic Gaussian surrogate. No indication is given that the stated theorems apply to the objective actually maximized.
  2. [Abstract, and implied methods] The central claims concern MLE existence and curvature for the exact convolution structure of the Poisson binomial. Because the optimization performed on real data uses the Gaussian approximation, it is necessary to verify (or extend the proofs to show) that the same existence and curvature properties hold for the surrogate likelihood.
minor comments (1)
  1. [Abstract] Clarify in the abstract and introduction whether 'this likelihood' refers to the exact Poisson binomial or to the Gaussian approximation used in practice.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their comments, which highlight an important distinction between our theoretical results and the computational approach. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The existence and curvature results are stated for the Poisson binomial likelihood, yet the text immediately notes that this likelihood 'is approximated ... for computational efficiency' and that the real-data example optimizes the heteroscedastic Gaussian surrogate. No indication is given that the stated theorems apply to the objective actually maximized.

    Authors: We agree that the abstract should more clearly delineate the scope of the theorems. The existence of the MLE and the analysis of curvature (non-log-concavity) are proven for the exact Poisson binomial likelihood. The Gaussian approximation is introduced solely for computational tractability in optimization and inference. We will revise the abstract to state explicitly that the theoretical results apply to the exact likelihood, while the implemented procedure uses the surrogate.
    revision: yes

  2. Referee: [Abstract, and implied methods] The central claims concern MLE existence and curvature for the exact convolution structure of the Poisson binomial. Because the optimization performed on real data uses the Gaussian approximation, it is necessary to verify (or extend the proofs to show) that the same existence and curvature properties hold for the surrogate likelihood.

    Authors: The central claims are for the exact Poisson binomial model as stated. We acknowledge that the real-data optimization uses the Gaussian surrogate, and we do not claim that the MLE existence or curvature results proved for the exact likelihood hold for the surrogate. The surrogate is used as a practical approximation, and its performance is validated empirically on the Morris County data. Extending the theoretical results to the surrogate would be a separate and substantial undertaking, because the Gaussian is a different (approximating) objective. We believe the current separation (exact theory for the model, approximation for computation) is appropriate and will clarify this in the text. Any further analysis could be addressed in future work.
    revision: partial

Simulated objections left unresolved
  • Verification or extension of the MLE existence and non-log-concavity proofs to the heteroscedastic Gaussian surrogate likelihood used in the real-data experiments.

Circularity Check

0 steps flagged

Minor self-citation to 2018 model; new MLE existence and curvature proofs are independent extensions

Full rationale

The paper cites Rosenman and Viswanathan (2018) only to establish the logistic regression setup and heteroscedastic Gaussian approximation for computation. The claimed new results are mathematical proofs on existence of the MLE and curvature properties for the exact Poisson binomial likelihood. These proofs are presented as extensions and do not reduce by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. The approximation is used in practice but the theorems target the exact PMF; no step equates a prediction to its own input by definition. This is the normal case of incremental work with a prior citation that is not circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract states no explicit free parameters, axioms, or invented entities beyond the standard logistic regression and Poisson binomial assumptions. The full manuscript would be needed to audit any implicit modeling choices.

pith-pipeline@v0.9.0 · 5686 in / 1058 out tokens · 22361 ms · 2026-05-24T18:10:50.484687+00:00 · methodology
