Identification and Semiparametric Estimation of Conditional Means from Aggregate Data

Cory McCartan; Shiro Kuriwaki

arXiv: 2509.20194 · v2 · submitted 2025-09-24 · stat.ME · econ.EM

Pith reviewed 2026-05-18 13:53 UTC · model grok-4.3

keywords: ecological inference, aggregate data, conditional means, semiparametric estimation, debiased machine learning, sensitivity analysis, identification conditions

The pith

A new method estimates conditional means from aggregate data using weaker conditions that hold given covariates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to recover the average outcome within each group when researchers observe only the average outcome and the group composition of larger aggregation units, such as geographic areas. It shows that identification is possible under conditional independence assumptions given observed covariates rather than stronger unconditional ones. To handle many covariates, it introduces a debiased machine learning estimator that keeps nuisance functions in a partially linear form. This setup also supports sensitivity checks for assumption violations and a test of the assumption itself, with valid confidence intervals for local, unit-level estimates under additional assumptions. Readers in social science and statistics would care because it makes ecological inference more reliable without requiring individual-level data.
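To make the setup concrete, the aggregate-data problem can be written as an accounting identity. The notation here is an illustrative assumption rather than the paper's own: $g$ indexes aggregation units, $k$ indexes groups, $X_{gk}$ is the share of group $k$ in unit $g$, $\beta_{gk}$ is the unobserved mean outcome of group $k$ within unit $g$, and $Z_g$ collects unit-level covariates. The second line is the classical unconditional (Goodman-style) restriction; the third is one way to express a restriction that holds only conditionally on covariates.

```latex
\begin{align*}
\bar{Y}_g &= \sum_{k=1}^{K} X_{gk}\,\beta_{gk},
  \qquad \sum_{k=1}^{K} X_{gk} = 1, \\
\mathbb{E}[\beta_{gk} \mid X_g] = B_k
  &\;\Longrightarrow\;
  \mathbb{E}[\bar{Y}_g \mid X_g] = \sum_{k=1}^{K} X_{gk}\, B_k, \\
\mathbb{E}[\beta_{gk} \mid X_g, Z_g] = \eta_k(Z_g)
  &\;\Longrightarrow\;
  \mathbb{E}[\bar{Y}_g \mid X_g, Z_g] = \eta(Z_g)^{\top} X_g .
\end{align*}
```

Under the conditional version, the regression of the unit average on the group shares is linear in $X_g$ with coefficients that vary with $Z_g$, which is why a partially linear structure is a natural fit for the nuisance functions.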

Core claim

Under weaker conditions for identification that hold conditionally on covariates, the mean of an outcome within groups can be estimated from aggregate data using a debiased machine learning estimator based on nuisance functions restricted to a partially linear form, which also enables semiparametric sensitivity analysis for violations of the key assumption.

What carries the argument

A debiased machine learning estimator whose nuisance functions are restricted to a partially linear form. This restriction is what allows the estimator to control for many covariates and what supports the sensitivity analysis.
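The partialling-out idea behind debiased machine learning for a partially linear model can be sketched in a few lines. This is a generic illustration and not the paper's estimator: the model y = theta * x + g(z) + noise, the binned-mean nuisance learner, every function name, and the simulated data are assumptions made for the example.

```python
import math
import random


def binned_mean_predictor(z_train, t_train, n_bins=20, lo=-1.0, hi=1.0):
    """Regressogram: predict t by the training mean of t in the bin containing z."""
    def bin_of(z):
        return min(n_bins - 1, max(0, int((z - lo) / (hi - lo) * n_bins)))

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for z, t in zip(z_train, t_train):
        b = bin_of(z)
        sums[b] += t
        counts[b] += 1
    overall = sum(t_train) / len(t_train)  # fallback for empty bins
    means = [s / c if c > 0 else overall for s, c in zip(sums, counts)]
    return lambda z: means[bin_of(z)]


def dml_partially_linear(y, x, z, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*x + g(z) + noise."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_of = [0] * n
    for pos, i in enumerate(order):
        fold_of[i] = pos % n_folds

    y_res = [0.0] * n
    x_res = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if fold_of[i] != k]
        # Nuisance functions E[y | z] and E[x | z] are fit on the other folds only.
        predict_y = binned_mean_predictor([z[i] for i in train], [y[i] for i in train])
        predict_x = binned_mean_predictor([z[i] for i in train], [x[i] for i in train])
        for i in range(n):
            if fold_of[i] == k:
                y_res[i] = y[i] - predict_y(z[i])
                x_res[i] = x[i] - predict_x(z[i])

    # Regress residualized y on residualized x (no intercept).
    sxx = sum(u * u for u in x_res)
    theta = sum(u * v for u, v in zip(x_res, y_res)) / sxx
    # Sandwich standard error based on the estimated score x_res * (y_res - theta * x_res).
    score_sq = sum((u * (v - theta * u)) ** 2 for u, v in zip(x_res, y_res))
    se = math.sqrt(score_sq) / sxx
    return theta, se


# Simulated data with a nonlinear confounder z (all values hypothetical).
rng = random.Random(1)
n = 2000
z = [rng.uniform(-1, 1) for _ in range(n)]
x = [0.5 * math.sin(2 * zi) + rng.gauss(0, 0.3) for zi in z]
y = [1.5 * xi + zi + zi ** 2 + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]

theta_hat, se = dml_partially_linear(y, x, z)

# Naive slope of y on x that ignores z, for comparison.
mx, my = sum(x) / n, sum(y) / n
naive_slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

print(f"true theta = 1.5, DML estimate = {theta_hat:.3f} (se {se:.3f}), naive = {naive_slope:.3f}")
```

Cross-fitting keeps each observation out of the sample used to fit its own nuisance predictions, which is the usual device for limiting overfitting bias in the final residual-on-residual regression. Any flexible learner could replace the binned means.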

If this is right

Efficient control for many covariates is possible without strong parametric assumptions on the nuisance functions.
Semiparametric sensitivity analysis quantifies the impact of violations of the identifying assumption.
A nonparametric test can assess the validity of the key identifying assumption directly.
Asymptotically valid confidence intervals can be derived for local, unit-level estimates under additional assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be applied to other settings with aggregated data such as in public health or economic surveys where individual records are unavailable.
The sensitivity analysis offers a tool for researchers to prioritize collection of additional covariates that might strengthen identification.
Integration with existing software for aggregate data analysis could allow routine robustness reporting in applied work.

Load-bearing premise

The aggregation process satisfies a form of conditional independence or no unmeasured confounding given the observed covariates.

What would settle it

A validation exercise on data with known ground truth where the estimates change substantially under plausible violations of the conditional independence assumption or where the nonparametric test rejects the assumption.
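A toy version of such a ground-truth check can be simulated. The sketch below uses classical Goodman ecological regression, not the paper's estimator, with two groups and equal-sized units; the data-generating process, the `context_effect` parameter, and all function names are hypothetical. When the within-unit group mean varies with the group share, the identifying assumption fails and the estimate drifts from the known truth.

```python
import random


def goodman_regression(x, ybar):
    """OLS of the unit average on the group-1 share; returns (group-1 mean, group-0 mean)."""
    n = len(x)
    mx, my = sum(x) / n, sum(ybar) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, ybar)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept + slope, intercept


def simulate(context_effect, n_units=5000, seed=0):
    """Simulate aggregate data and compare the estimate to the known group-1 mean."""
    rng = random.Random(seed)
    x = [rng.uniform(0.05, 0.95) for _ in range(n_units)]  # share of group 1 in each unit
    # Group-1 mean in each unit; it depends on x when context_effect != 0.
    beta1 = [0.7 + context_effect * (xi - 0.5) + rng.gauss(0, 0.05) for xi in x]
    beta0 = [0.3 + rng.gauss(0, 0.05) for _ in x]
    # Only the unit average (and x) would be observed in practice.
    ybar = [xi * b1 + (1 - xi) * b0 for xi, b1, b0 in zip(x, beta1, beta0)]
    # Ground truth: group-1 mean over all group-1 members (equal unit sizes assumed).
    truth1 = sum(xi * b1 for xi, b1 in zip(x, beta1)) / sum(x)
    est1, _ = goodman_regression(x, ybar)
    return est1, truth1


est_ok, truth_ok = simulate(context_effect=0.0)    # assumption holds
est_bad, truth_bad = simulate(context_effect=0.4)  # assumption violated

print(f"assumption holds:    estimate {est_ok:.3f} vs truth {truth_ok:.3f}")
print(f"assumption violated: estimate {est_bad:.3f} vs truth {truth_bad:.3f}")
```

Real validation data would replace the simulated `beta1` with observed group-level outcomes, as in the ground-truth comparison the abstract describes.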

Original abstract

We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing methods for this problem, also known as ecological inference, implicitly make strong assumptions about the aggregation process. We first formalize weaker conditions for identification which hold conditionally on covariates. To efficiently control for many covariates, we propose a debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form. Our estimator admits a semiparametric sensitivity analysis which allows researchers to evaluate the impact of violations of the key identifying assumption. We also propose a nonparametric test for the identifying assumption itself. Finally, we derive asymptotically valid confidence intervals for local, unit-level estimates under additional assumptions. Simulations and validation on real-world data where ground truth is available demonstrate the advantages of our approach over existing methods. Open-source software is available which implements the proposed methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a workable semiparametric route to subgroup means from aggregates under conditional assumptions, with sensitivity analysis and a test attached.

The core advance is formalizing identification that holds conditionally on covariates rather than unconditionally, then delivering a debiased partially linear ML estimator that scales to many controls. They add a semiparametric sensitivity analysis for violations of that conditional independence and a nonparametric test for the assumption itself, plus unit-level intervals under extra conditions. Simulations and real-data checks with ground truth are included, along with open software.

Referee Report

3 major / 2 minor

Summary. The paper introduces a semiparametric method for estimating conditional means of an outcome variable from aggregate (ecological) data, where only group averages and indicators are observed. It formalizes weaker identification conditions that hold conditionally on covariates, proposes a debiased machine learning estimator based on partially linear nuisance functions to handle many covariates, develops a semiparametric sensitivity analysis for violations of the key assumption, includes a nonparametric test for the identifying assumption, and derives asymptotically valid unit-level confidence intervals under additional assumptions. The approach is supported by simulations and validation on real-world data with ground truth, along with open-source software.

Significance. If the central results hold, this contributes a practically useful advance in ecological inference by relaxing strong implicit assumptions in prior methods and integrating modern debiased ML tools for high-dimensional settings. The sensitivity analysis and nonparametric test for the identifying assumption are particularly valuable for applied work in social sciences and epidemiology. Credit is due for the open-source software implementation, simulation studies, and real-data validation with known ground truth, which support reproducibility and empirical assessment of the method.

major comments (3)

[§2] §2 (Identification): The weaker conditional identification conditions are formalized as a form of conditional independence or no unmeasured confounding given covariates. However, this remains load-bearing for the entire estimator, sensitivity analysis, and unit-level CIs; the manuscript should explicitly address whether unmeasured group-level or spatial factors (common in aggregate data) could violate the condition even after conditioning on observed covariates, with a concrete discussion or counterexample.
[§4] §4 (Estimator), partially linear nuisance restriction: The debiased ML estimator restricts nuisance functions to a partially linear form to control for many covariates. It is unclear whether this restriction preserves double robustness or the claimed asymptotic properties relative to fully nonparametric nuisances; a derivation or reference showing the semiparametric efficiency bound under this restriction is needed.
[Sensitivity analysis] Sensitivity analysis section: The semiparametric sensitivity analysis is a strength for evaluating violations, but lacks specific guidance on calibrating or bounding the sensitivity parameters in finite samples or applied settings. This detail is load-bearing for the practical utility claimed in the abstract.

minor comments (2)

[Abstract] Abstract: Mentions open-source software but does not include the repository link or package name.
[Simulations] Simulation tables: Reported performance metrics (bias, RMSE) would be clearer with accompanying standard errors or interval estimates across replications.
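One way to attach Monte Carlo uncertainty to simulation metrics is sketched below. The function name and the simulated estimates are hypothetical; the standard error of the RMSE uses a delta-method approximation, which is an assumption of this sketch rather than anything the paper reports.

```python
import math
import random
import statistics


def mc_summary(estimates, truth):
    """Bias and RMSE across replications, each with a Monte Carlo standard error."""
    r = len(estimates)
    errors = [e - truth for e in estimates]
    sq_errors = [err * err for err in errors]

    bias = statistics.fmean(errors)
    bias_se = statistics.stdev(errors) / math.sqrt(r)

    mse = statistics.fmean(sq_errors)
    mse_se = statistics.stdev(sq_errors) / math.sqrt(r)
    rmse = math.sqrt(mse)
    # Delta method: d sqrt(m) / dm = 1 / (2 sqrt(m)).
    rmse_se = mse_se / (2 * rmse)

    return {"bias": bias, "bias_se": bias_se, "rmse": rmse, "rmse_se": rmse_se}


# Hypothetical replications: an estimator with bias 0.1 and standard deviation 0.2.
rng = random.Random(42)
truth = 0.5
estimates = [truth + 0.1 + rng.gauss(0, 0.2) for _ in range(1000)]

summary = mc_summary(estimates, truth)
print({name: round(value, 4) for name, value in summary.items()})
```

Reporting each metric as a value plus or minus about two Monte Carlo standard errors lets readers judge whether differences between methods exceed simulation noise.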

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments highlight important areas for clarification and strengthening, particularly around the identifying assumptions, the properties of the partially linear nuisance restriction, and practical guidance for the sensitivity analysis. We address each major comment below and will incorporate revisions to improve the manuscript.

Referee: §2 (Identification): The weaker conditional identification conditions are formalized as a form of conditional independence or no unmeasured confounding given covariates. However, this remains load-bearing for the entire estimator, sensitivity analysis, and unit-level CIs; the manuscript should explicitly address whether unmeasured group-level or spatial factors (common in aggregate data) could violate the condition even after conditioning on observed covariates, with a concrete discussion or counterexample.

Authors: We agree that an explicit discussion of potential violations from unmeasured group-level or spatial factors is valuable. In the revised version, we will add a dedicated paragraph in §2 that discusses how unobserved spatial autocorrelation or group-level confounders (e.g., unmeasured neighborhood effects in geographic aggregates) could violate the conditional independence assumption even after conditioning on observed covariates. We will provide a concrete counterexample involving spatially correlated residuals in ecological data and explain the implications for the estimator, sensitivity analysis, and unit-level confidence intervals. This addition will clarify the scope and limitations of the identifying conditions without changing the formal results.

Revision: yes

Referee: §4 (Estimator), partially linear nuisance restriction: The debiased ML estimator restricts nuisance functions to a partially linear form to control for many covariates. It is unclear whether this restriction preserves double robustness or the claimed asymptotic properties relative to fully nonparametric nuisances; a derivation or reference showing the semiparametric efficiency bound under this restriction is needed.

Authors: The partially linear restriction is imposed to enable scalable estimation with high-dimensional covariates while preserving key robustness properties. We will add a derivation in the appendix demonstrating that the estimator remains doubly robust and attains the semiparametric efficiency bound within the partially linear nuisance class. We will also cite relevant results from the debiased ML literature (e.g., Chernozhukov et al. on double/debiased machine learning for partially linear models) to support the asymptotic claims. This addresses the concern directly and confirms that the restriction does not compromise the stated properties relative to the model class considered.

Revision: yes

Referee: Sensitivity analysis section: The semiparametric sensitivity analysis is a strength for evaluating violations, but lacks specific guidance on calibrating or bounding the sensitivity parameters in finite samples or applied settings. This detail is load-bearing for the practical utility claimed in the abstract.

Authors: We recognize that concrete guidance on calibrating and bounding the sensitivity parameters would strengthen the practical applicability. In the revision, we will expand the sensitivity analysis section with recommendations for choosing bounds based on substantive knowledge, such as ranges informed by prior literature or plausible violation magnitudes in social science and epidemiology applications. We will also include a brief discussion of finite-sample considerations, supported by additional simulation results that illustrate how different bound choices affect inference in moderate sample sizes. These additions will provide actionable guidance without altering the core semiparametric framework.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation draws from external semiparametric and ML theory.

The paper formalizes weaker conditional identification conditions (conditional independence or no unmeasured confounding given covariates) and derives a debiased ML estimator under a partially linear nuisance restriction. This follows standard semiparametric estimation frameworks and external machine-learning results rather than defining the target functional or estimator in terms of its own fitted values. The sensitivity analysis, nonparametric test, and unit-level CIs are presented as extensions under additional assumptions, without any quoted reduction of the main result to a self-referential fit or self-citation chain. No equations or steps exhibit self-definitional, fitted-input, or ansatz-smuggling patterns. The framework is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on a conditional identification assumption whose precise statement is not given in the abstract, plus standard regularity conditions for debiased machine learning and asymptotic normality of the resulting estimator. No free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: weaker conditions for identification hold conditionally on covariates.
Stated in the abstract as the foundation for the estimator and sensitivity analysis.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We first formalize weaker conditions for identification which hold conditionally on covariates... debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Assumption CAR (Coarsening at random)... η₀(Z_G)ᵀ X_G

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1] Ansolabehere, S. and Rivers, D. (1995). Bias in ecological regression. Working paper.
[2] Beran, R. and Hall, P. (1992). Estimating coefficient distributions in random coefficient regressions. The Annals of Statistics, pages 1970–1984.
[3] Bontemps, C., Florens, J.-P., and Meddahi, N. (2025). Functional ecological inference. Journal of Econometrics, 248:105918.
[4] Breunig, C. (2021). Varying random coefficient models. Journal of Econometrics, 221(2):381–408.
[5] Celisse, A. and Guedj, B. (2016). Stability revisited: new generalisation bounds for the leave-one-out. arXiv preprint arXiv:1608.06412.
[6] Chen, Q., Syrgkanis, V., and Austern, M. (2022). Debiased machine learning without sample-splitting for stable estimators. Advances in Neural Information Processing Systems, 35:3096–3109.
[7] Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics, 6:5549–5632.
[8] Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., and Syrgkanis, V. (2024). Long story short: Omitted variable bias in causal machine learning. arXiv preprint arXiv:2112.13398.
[9] Chernozhukov, V., Newey, W. K., and Singh, R. (2022). Debiased machine learning of global and local parameters using regularized Riesz representers. The Econometrics Journal, 25(3):576–601.
[10] Cho, W. T. and Manski, C. F. (2008). Cross-Level/Ecological Inference, chapter 24, pages 547–569.
[11] Cross, P. J. and Manski, C. F. (2002). Regressions, short and long. Econometrica, 70(1):357–368.
[12] Duncan, O. D. and Davis, B. (1953). An alternative to ecological correlation. American Sociological Review.
[13] Fan, Y., Sherman, R., and Shum, M. (2016). Estimation and inference in an ecological inference model. Journal of Econometric Methods, 5(1):17–48.
[14] Fishman, N. and Rosenman, E. (2024). Estimating vote choice in US elections with approximate Poisson-binomial logistic regression. In OPT 2024: Optimization for Machine Learning.
[15] Flaxman, S. R., Wang, Y.-X., and Smola, A. J. (2015). Who supported Obama in 2012? Ecological inference through distribution regression. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 289–298.
[16] Freedman, D. A., Klein, S. P., Ostland, M., and Roberts, M. R. (1998). A solution to the 'ecological inference' problem. Journal of the American Statistical Association, 93(444):1518–1521.
[17] Goodman, L. A. (1953). Ecological regressions and behavior of individuals. American Sociological Review, 18(6):663.
[18] Goodman, L. A. (1959). Some alternatives to ecological correlation. American Journal of Sociology, 64(6):610–625.
[19] Greenland, S. and Robins, J. (1994). Invited commentary: ecologic studies – biases, misconceptions, and counterexamples. American Journal of Epidemiology, 139(8):747–760.
[20] Greiner, D. J. (2006). Ecological inference in Voting Rights Act disputes: Where are we now, and where do we want to be. Jurimetrics, 47:115.
[21] Greiner, J. D. and Quinn, K. M. (2009). R × C ecological inference: bounds, correlations, flexibility and transparency of assumptions. Journal of the Royal Statistical Society Series A: Statistics in Society, 172(1):67–81.
[22] Heitjan, D. F. and Rubin, D. B. (1991). Ignorability and coarse data. The Annals of Statistics, pages 2244–2253.
[23] Helwig, N. E. (2022). Robust permutation tests for penalized splines. Stats, 5(3):916–933.
[24] Huang, J. Z. (2001). Concave extended linear modeling: a theoretical synthesis. Statistica Sinica, pages 173–197.
[25] Imai, K., Lu, Y., and Strauss, A. (2008). Bayesian and likelihood inference for 2 × 2 ecological tables: An incomplete-data approach. Political Analysis, 16(1):41–69.
[26] Jbaily, A., Zhou, X., Liu, J., Lee, T.-H., Kamareddine, L., Verguet, S., and Dominici, F. (2022). Air pollution exposure disparities across US population and income groups. Nature, 601(7892):228–233.
[27] Jiang, W., King, G., Schmaltz, A., and Tanner, M. A. (2020). Ecological regression with partial identification. Political Analysis, 28(1):65–86.
[28] Judge, G. G. and Cho, T. (2004). An information theoretic approach to ecological estimation. In King, G., Tanner, M. A., and Rosen, O., editors, Ecological Inference: New Methodological Strategies, chapter 7, page 162. Cambridge University Press.
[29] Kennedy, P. E. and Cade, B. S. (1996). Randomization tests for multiple regression. Communications in Statistics-Simulation and Computation, 25(4):923–936.
[30] King, G. (1997). A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton University Press.
[31] Kuriwaki, S. and McCartan, C. (2025). The role of confounders and linearity in ecological inference: A reassessment. Working paper.
[32] Manski, C. F. (2018). Credible ecological inference for medical decisions with personalized risk assessment. Quantitative Economics, 9(2):541–569.
[33] McCartan, C. and Kuriwaki, S. (2025). seine: Semiparametric Ecological Inference. R package.
[34] Muzellec, B., Nock, R., Patrini, G., and Nielsen, F. (2017). Tsallis regularized optimal transport and ecological inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31.
[35] Newey, W. K. (1994). The asymptotic variance of semiparametric estimators. Econometrica: Journal of the Econometric Society, pages 1349–1382.
[36] Park, B. U., Mammen, E., Lee, Y. K., and Lee, E. R. (2015). Varying coefficient regression models: a review and new developments. International Statistical Review, 83(1):36–64.
[37] Patil, P., Wei, Y., Rinaldo, A., and Tibshirani, R. (2021). Uniform consistency of cross-validation estimators for high-dimensional ridge regression. In International Conference on Artificial Intelligence and Statistics, pages 3178–3186. PMLR.
[38] Rao, S. T., Porter, P., Mobley, J., and Hurley, F. (2011). Understanding the spatio-temporal variability in air pollution concentrations. Environ. Manage, 70:42–48.
[39] Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15(3):351–357.
[40] Rosen, O., Jiang, W., King, G., and Tanner, M. A. (2001). Bayesian and frequentist inference for ecological inference: The R × C case. Statistica Neerlandica, 55(2):134–156.
[41] Singh, R., Xu, L., and Gretton, A. (2024). Kernel methods for causal functions: dose, heterogeneous and incremental response curves. Biometrika, 111(2):497–516.
[42] Voting and Election Science Team (2022). 2022 Precinct-Level Election Results.
[43] Vysochanskij, D. and Petunin, Y. I. (1980). Justification of the 3σ rule for unimodal distributions. Theory of Probability and Mathematical Statistics, 21(25-36).
[44] Wakefield, J. (2004). Ecological inference for 2 × 2 tables (with discussion). Journal of the Royal Statistical Society Series A: Statistics in Society, 167(3):385–445.
[45] Zhang, T. and Simon, N. (2023). Regression in tensor product spaces by the method of sieves. Electronic Journal of Statistics, 17(2):3660.
