
arXiv: 1906.11920 · v2 · pith:APMYHNVHnew · submitted 2019-06-27 · stat.CO · stat.AP · stat.ME

A Python Library For Empirical Calibration

Pith reviewed 2026-05-25 13:32 UTC · model grok-4.3

classification: stat.CO, stat.AP, stat.ME
keywords: empirical calibration, convex optimization, Python library, bias correction, survey sampling, causal inference, weighting methods, dual optimization

The pith

A Python library called EC computes empirical calibration weights by solving convex optimization in dual form.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Python library EC that calculates empirical calibration weights to reduce bias in samples from surveys or observational causal studies. It casts the weighting task as a convex optimization problem and solves it in the dual form for claimed gains in speed and numerical stability. The library adds practical options such as multiple objective functions, weight clipping, and inexact calibration targets. A reader would care because these features turn a theoretically useful weighting method into a usable computational tool for correcting data biases across common statistical settings.

Core claim

The EC library formulates empirical calibration as a convex optimization problem and solves it efficiently in the dual form, delivering greater computational efficiency and robustness than existing software while also supporting different optimization objectives, weight clipping, and inexact calibration to improve practical usability.

What carries the argument

Dual-form convex optimization solver for empirical calibration weights.
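
To make the dual route concrete, the sketch below solves the entropy-objective calibration problem with an undamped Newton iteration on the dual variables. It is an illustration under stated assumptions, not the EC library's API: the function name `entropy_calibrate`, its arguments, and the stopping rule are hypothetical, and a production solver would likely add a line search and other safeguards.

```python
import numpy as np

def _softmax_weights(z, lam):
    """Primal weights implied by the dual variables lam (sum to one)."""
    scores = z @ lam
    w = np.exp(scores - scores.max())  # shift for numerical stability
    return w / w.sum()

def entropy_calibrate(x, target, max_iter=50, tol=1e-10):
    """Hypothetical helper: entropy calibration solved in the dual.

    Minimizes log(sum_i exp(lam . (x_i - target))) over lam, which has one
    unknown per covariate instead of one unknown per sample.
    """
    z = x - target                     # center covariates at the target
    lam = np.zeros(z.shape[1])
    for _ in range(max_iter):
        w = _softmax_weights(z, lam)
        grad = w @ z                   # remaining imbalance under current weights
        if np.abs(grad).max() < tol:
            break
        # Hessian of the dual: weighted covariance of the centered covariates.
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)
        lam = lam - np.linalg.solve(hess, grad)  # undamped Newton step
    return _softmax_weights(z, lam), lam

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))          # simulated sample covariates
target = np.array([0.3, -0.2])         # illustrative population means
w, lam = entropy_calibrate(x, target)
```

The weighted covariate means `w @ x` should match `target` up to the tolerance, while the solver only ever works with a two-dimensional unknown.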

If this is right

  • Survey sampling estimates become less biased when weights are computed with the dual solver.
  • Causal studies with observational data achieve better covariate balance through the same weighting routine.
  • Users can apply weight clipping without separate post-processing steps.
  • Inexact calibration targets allow solutions on datasets where perfect balance is infeasible.
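
The last point can be illustrated with a small sketch. It assumes one common relaxation, a quadratic penalty on the remaining imbalance, which adds a ridge term to the dual; the paper's text here does not say that EC uses this particular form, and `inexact_entropy_calibrate` and `rho` are hypothetical names. The target is deliberately placed outside the range of the data, so exact calibration is infeasible.

```python
import numpy as np

def inexact_entropy_calibrate(x, target, rho, max_iter=50, tol=1e-10):
    """Hypothetical helper: entropy calibration with a soft balance constraint.

    Minimizes log(sum_i exp(lam . z_i)) + (rho / 2) * ||lam||^2 with
    z_i = x_i - target. At the optimum the imbalance equals -rho * lam.
    """
    z = x - target
    dim = z.shape[1]
    lam = np.zeros(dim)
    for _ in range(max_iter):
        scores = z @ lam
        w = np.exp(scores - scores.max())
        w /= w.sum()
        mean = w @ z
        grad = mean + rho * lam
        if np.abs(grad).max() < tol:
            break
        hess = (z * w[:, None]).T @ z - np.outer(mean, mean) + rho * np.eye(dim)
        lam = lam - np.linalg.solve(hess, grad)  # undamped Newton step
    scores = z @ lam
    w = np.exp(scores - scores.max())
    return w / w.sum(), lam

rng = np.random.default_rng(1)
x = rng.uniform(size=(100, 1))   # covariate lies in [0, 1]
target = np.array([1.5])         # unreachable by any convex combination
w, lam = inexact_entropy_calibrate(x, target, rho=1.0)
gap = w @ x - target             # imbalance that remains after weighting
```

The weights shift the sample mean toward the target without reaching it, and the leftover gap is controlled by `rho`: smaller values push harder toward balance at the cost of more extreme weights.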

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The library could serve as a drop-in replacement in existing analysis scripts that currently call slower calibration routines.
  • Its dual formulation may scale more readily to larger sample sizes than primal approaches used in other packages.
  • Adding support for user-defined constraints beyond the built-in options would extend its reach to custom balancing problems.

Load-bearing premise

The dual-form convex optimization implementation will deliver measurable gains in efficiency and robustness over existing software without post-hoc tuning or dataset-specific adjustments.

What would settle it

A head-to-head timing and success-rate comparison on a standard suite of survey and observational datasets where EC shows no consistent speed advantage or higher completion rate than prior libraries.
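
A comparison of this kind could be organized along the lines of the harness below. It is a sketch only: `benchmark`, `always_ok`, and `fails_on_empty` are hypothetical names, the two solvers are stand-ins rather than real packages, and a real study would wrap EC and its competitors behind the same callable interface.

```python
import statistics
import time

def benchmark(solvers, problems, repeats=3):
    """Time each solver on each problem; a raised exception counts as a failure."""
    report = {}
    for name, solve in solvers.items():
        times, failures = [], 0
        for problem in problems:
            try:
                best = float("inf")
                for _ in range(repeats):
                    start = time.perf_counter()
                    solve(problem)
                    best = min(best, time.perf_counter() - start)
                times.append(best)  # best-of-repeats wall-clock time
            except Exception:
                failures += 1
        report[name] = {
            "median_seconds": statistics.median(times) if times else None,
            "completion_rate": 1 - failures / len(problems),
        }
    return report

# Hypothetical stand-ins for two calibration routines.
def always_ok(problem):
    return sum(problem)

def fails_on_empty(problem):
    if not problem:
        raise ValueError("no feasible weights")
    return sum(problem)

report = benchmark(
    {"solver_a": always_ok, "solver_b": fails_on_empty},
    problems=[[1.0, 2.0], [], [3.0]],
)
```

Reporting completion rate alongside time matters here because the robustness claim is about whether a solver finishes at all, not only how fast it runs when it does.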

Figures

Figures reproduced from arXiv: 1906.11920 by Jingang Miao, Xiaojing Wang, Yunting Sun.

Figure 1. Estimate population mean: Kernel density estimates of the true, unweighted, and weighted
Figure 2. Causal inference for geo 30 with unweighted and weighted mean of control geos as the counterfactual

Original abstract

Dealing with biased data samples is a common task across many statistical fields. In survey sampling, bias often occurs due to unrepresentative samples. In causal studies with observational data, the treated versus untreated group assignment is often correlated with covariates, i.e., not random. Empirical calibration is a generic weighting method that presents a unified view on correcting or reducing the data biases for the tasks mentioned above. We provide a Python library EC to compute the empirical calibration weights. The problem is formulated as convex optimization and solved efficiently in the dual form. Compared to existing software, EC is both more efficient and robust. EC also accommodates different optimization objectives, supports weight clipping, and allows inexact calibration, which improves usability. We demonstrate its usage across various experiments with both simulated and real-world data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the Python library EC for computing empirical calibration weights to correct bias in survey samples and observational causal data. The weighting problem is cast as a convex optimization task solved in dual form; the library is claimed to be more efficient and robust than existing packages, while also supporting multiple objectives, weight clipping, and inexact calibration. Usage is illustrated on simulated and real-world data.

Significance. If the efficiency and robustness claims are substantiated, the library would supply a practical, unified implementation of empirical calibration with usability extensions that could see adoption in survey sampling and causal inference workflows.

major comments (2)
  1. [Abstract] Abstract and implementation section: the central claim that EC is 'both more efficient and robust' than existing software is not supported by any reported timing benchmarks, accuracy tables, or comparisons against named competitor packages (e.g., no wall-clock times, iteration counts, or failure rates on standardized test problems). Without such evidence the efficiency/robustness advantage cannot be evaluated.
  2. [Implementation] The manuscript formulates the problem as convex optimization solved in dual form but does not isolate whether the dual route (versus a primal solver or off-the-shelf convex packages) is responsible for any observed gains, nor does it report numerical stability metrics (condition numbers, failure rates under clipping) that would substantiate the robustness claim.
minor comments (2)
  1. [Abstract] The abstract states that 'different optimization objectives' are accommodated but does not list the supported objectives or their corresponding dual formulations.
  2. No mention of software dependencies, installation instructions, or reproducibility artifacts (e.g., exact solver tolerances, random seeds for experiments) appears in the provided text.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate the planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract and implementation section: the central claim that EC is 'both more efficient and robust' than existing software is not supported by any reported timing benchmarks, accuracy tables, or comparisons against named competitor packages (e.g., no wall-clock times, iteration counts, or failure rates on standardized test problems). Without such evidence the efficiency/robustness advantage cannot be evaluated.

    Authors: We acknowledge that the abstract asserts an efficiency and robustness advantage without accompanying benchmarks or named comparisons in the current manuscript. The claim reflects the authors' development experience, but we agree it requires explicit supporting evidence to be evaluable. In revision we will add a dedicated experiments subsection reporting wall-clock timings, iteration counts, accuracy metrics, and failure rates on standardized test problems against relevant existing packages, allowing direct assessment of the claims.

    revision: yes

  2. Referee: [Implementation] The manuscript formulates the problem as convex optimization solved in dual form but does not isolate whether the dual route (versus a primal solver or off-the-shelf convex packages) is responsible for any observed gains, nor does it report numerical stability metrics (condition numbers, failure rates under clipping) that would substantiate the robustness claim.

    Authors: The dual formulation is presented as the core computational approach, yet the manuscript does not contain an ablation isolating its contribution nor quantitative stability metrics. We will revise the implementation section to include (i) a brief rationale for preferring the dual route with reference to problem structure and (ii) reported numerical stability measures such as condition numbers of the Hessian and observed failure rates under weight clipping. A limited comparison against a primal formulation will be added if space allows.

    revision: yes
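
The Hessian condition number mentioned in the second response could be measured roughly as follows. This is a sketch that assumes the entropy objective, whose dual Hessian is the weighted covariance of the centered covariates; `dual_hessian` is a hypothetical helper, not a function of the EC library, and the data are simulated.

```python
import numpy as np

def dual_hessian(x, target, lam):
    """Hessian of log(sum_i exp(lam . (x_i - target))) with respect to lam."""
    z = x - target
    scores = z @ lam
    w = np.exp(scores - scores.max())
    w /= w.sum()
    mean = w @ z
    return (z * w[:, None]).T @ z - np.outer(mean, mean)

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
# Two independent covariates versus two nearly collinear ones.
x_good = np.hstack([base, rng.normal(size=(500, 1))])
x_bad = np.hstack([base, base + 1e-6 * rng.normal(size=(500, 1))])

lam0 = np.zeros(2)  # evaluate at the starting point of the iteration
cond_good = np.linalg.cond(dual_hessian(x_good, x_good.mean(axis=0), lam0))
cond_bad = np.linalg.cond(dual_hessian(x_bad, x_bad.mean(axis=0), lam0))
```

Near-collinear covariates inflate the condition number by many orders of magnitude, which is the kind of case where a robustness claim would need to be demonstrated rather than asserted.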

Circularity Check

0 steps flagged

No circularity: standard convex optimization implementation for library

full rationale

The paper describes a Python library EC that implements empirical calibration as a convex optimization problem solved in dual form. This is a direct encoding of a known formulation rather than any derivation that reduces to fitted parameters, self-citations, or ansatzes from the authors' prior work. No equations or claims in the provided text equate a 'prediction' to its own inputs by construction, and efficiency/robustness statements are presented as empirical comparisons against existing software rather than self-referential results. The contribution is self-contained as software engineering without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that empirical calibration admits an efficient dual convex formulation and that added features improve usability; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Empirical calibration can be cast as a convex optimization problem solvable efficiently in dual form.
    Stated directly in the abstract as the basis for the library implementation.
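
One concrete instance of this assumption is the entropy objective. The notation below is introduced here for illustration and is not taken from the paper: $x_i \in \mathbb{R}^p$ are the sample covariates for $i = 1, \dots, n$, $\bar{x}$ is the calibration target, $w$ are the weights, and $\lambda$ are the dual variables.

```latex
\begin{aligned}
\text{Primal:}\quad
  & \min_{w \in \mathbb{R}^n} \; \sum_{i=1}^{n} w_i \log(n w_i)
    \quad \text{s.t.} \quad
    \sum_{i=1}^{n} w_i x_i = \bar{x}, \quad
    \sum_{i=1}^{n} w_i = 1, \quad
    w_i \ge 0, \\
\text{Dual:}\quad
  & \min_{\lambda \in \mathbb{R}^p} \;
    \log \sum_{i=1}^{n} \exp\!\big(\lambda^{\top} (x_i - \bar{x})\big), \\
\text{Recovery:}\quad
  & w_i(\lambda) =
    \frac{\exp\!\big(\lambda^{\top} (x_i - \bar{x})\big)}
         {\sum_{j=1}^{n} \exp\!\big(\lambda^{\top} (x_j - \bar{x})\big)}.
\end{aligned}
```

The primal has $n$ constrained unknowns, while the dual is smooth, unconstrained, and has only $p$ unknowns. The gradient of the dual is $\sum_i w_i(\lambda)(x_i - \bar{x})$, so setting it to zero recovers the calibration constraint exactly.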


