Randomization-Based Inference for Average Treatment Effects in Inexactly Matched Observational Studies

Jeffrey Zhang; Jianan Zhu; Siyu Heng; Zijian Guo

arXiv: 2308.02005 · v5 · submitted 2023-08-03 · stat.ME

Jianan Zhu, Jeffrey Zhang, Zijian Guo, Siyu Heng

Pith reviewed 2026-05-24 06:59 UTC · model grok-4.3

Classification: stat.ME

Keywords: randomization inference, average treatment effect, matching, causal inference, observational studies, bias correction, inexact matching

The pith

Inverse post-matching probability weighting corrects bias in randomization inference for average treatment effects when matching is inexact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Matching forms sets of treated and control units with similar covariates to mimic a randomized experiment, but inexact matches leave residual covariate imbalance that biases standard randomization-based inference for average treatment effects. Prior bias corrections focused mainly on constant treatment effects and their extensions, i.e., Fisher's sharp null. The paper develops inverse post-matching probability weighting to adjust for residual imbalance while keeping inference randomization-based. The authors' theory and simulations indicate lower bias and better coverage than conventional randomization-based inference. The target is Neyman's weak null about the average effect rather than a sharp null.

Core claim

Inverse post-matching probability weighting performs randomization-based inference for average treatment effects under inexact matching by reweighting to remove bias from residual covariate imbalance, with theoretical and simulation results showing reduced bias and improved coverage relative to unadjusted randomization inference.

What carries the argument

Inverse post-matching probability weighting, which reweights the post-matching difference-in-means estimator according to discrepancies in the post-matching treatment assignment probabilities, so as to restore unbiasedness for the average treatment effect under the randomization distribution.
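To make the weighting idea concrete, here is a minimal Python sketch for matched pairs. It assumes independent Bernoulli treatment assignment with known propensity scores e(x), no unobserved covariates, and exactly one treated unit per pair. Under these assumptions, the probability that a given unit of a pair is the treated one is e1(1 - e2) / [e1(1 - e2) + e2(1 - e1)], which equals 1/2 only when the two propensity scores coincide. Dividing each pair's treated-minus-control difference by twice that probability is a Horvitz-Thompson-style construction; the function names are hypothetical, and the sketch is not claimed to reproduce the authors' exact IPPW estimator.

```python
from typing import Sequence, Tuple


def post_matching_prob(e_a: float, e_b: float) -> float:
    """Probability that unit a (not unit b) is the treated unit in a pair,
    given that exactly one of the two is treated and assignment is
    independent Bernoulli with propensity scores e_a and e_b (an assumption)."""
    num = e_a * (1.0 - e_b)
    return num / (num + e_b * (1.0 - e_a))


def weighted_pair_diff(y_treated: float, y_control: float, p_treated: float) -> float:
    """Treated-minus-control difference divided by 2 * p_treated, where p_treated
    is the post-matching probability that the observed treated unit was treated.
    With p_treated = 0.5 (exact matching) this is the ordinary pair difference."""
    return (y_treated - y_control) / (2.0 * p_treated)


def pair_estimates(y_t: Sequence[float], y_c: Sequence[float],
                   e_t: Sequence[float], e_c: Sequence[float]) -> Tuple[float, float]:
    """Return (naive, weighted) averages over pairs; y_t/y_c are outcomes of
    the treated/control unit in each pair and e_t/e_c their propensity scores."""
    n = len(y_t)
    naive = sum(a - b for a, b in zip(y_t, y_c)) / n
    weighted = sum(
        weighted_pair_diff(a, b, post_matching_prob(et, ec))
        for a, b, et, ec in zip(y_t, y_c, e_t, e_c)
    ) / n
    return naive, weighted


# One pair with propensity scores 0.7 and 0.4 and fixed potential outcomes:
# unit 1: Y(0) = 2, Y(1) = 5; unit 2: Y(0) = 1, Y(1) = 4 (pair effect = 3).
p1 = post_matching_prob(0.7, 0.4)  # P(unit 1 is the treated one)
if_1_treated = weighted_pair_diff(5, 1, p1)
if_2_treated = weighted_pair_diff(4, 2, 1 - p1)
expected_weighted = p1 * if_1_treated + (1 - p1) * if_2_treated  # equals 3
expected_naive = p1 * (5 - 1) + (1 - p1) * (4 - 2)  # 32/9, about 3.56
print(round(expected_weighted, 6), round(expected_naive, 6))
```

Averaging over the two possible assignments, the weighted pair difference recovers the pair effect of 3, while the unweighted difference has expectation 32/9 because the unit with the larger propensity score (and the larger Y(0)) is treated more often.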

If this is right

Randomization-based inference for average treatment effects remains valid, asymptotically and under the paper's regularity conditions, without requiring exact matches on all covariates.
Bias from residual imbalance after matching is reduced compared to standard methods that assume balance criteria are met.
Coverage rates of resulting confidence intervals increase in settings with inexact matching.
The approach targets Neyman's weak null, not only constant effects under Fisher's sharp null.
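The coverage claim concerns confidence intervals for the average effect. One standard way to build such an interval from pair-level estimates, assuming independent pairs, pair-level estimates that are unbiased for their own pair effects, and a normal approximation, is the Neyman-style matched-pair interval below. It is a sketch under those assumptions (the function name is hypothetical), not the interval constructed in the paper.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def neyman_pair_ci(pair_estimates: Sequence[float],
                   alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (estimate, lower, upper) for the mean of pair-level estimates.

    The variance estimator sum_i (t_i - t_bar)^2 / (I * (I - 1)) is
    conservative in expectation when pairs are independent and each t_i is
    unbiased for its own pair effect: it also absorbs effect heterogeneity
    across pairs, which can only inflate it.
    """
    n = len(pair_estimates)
    if n < 2:
        raise ValueError("need at least two matched pairs")
    t_bar = sum(pair_estimates) / n
    var_hat = sum((t - t_bar) ** 2 for t in pair_estimates) / (n * (n - 1))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(var_hat)
    return t_bar, t_bar - half_width, t_bar + half_width


print(neyman_pair_ci([1.0, 2.0, 3.0, 4.0]))
```

The interval inherits any bias in its inputs: if they are unadjusted treated-minus-control differences from inexactly matched pairs, the interval is centered in the wrong place, which is the coverage failure the paper targets.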

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Applied researchers could use this to analyze observational datasets where perfect covariate matches are uncommon, such as in medical or economic studies.
The weighting might extend to other imperfect balance designs like coarsened exact matching or propensity score methods.
Testing the method on datasets with known treatment effects from prior randomized trials would provide external validation.

Load-bearing premise

The weighting fully removes bias from inexact matching for the average treatment effect while leaving the randomization distribution unchanged.

What would settle it

A simulation in which the proposed intervals fail to achieve nominal coverage under known inexact matching with correct post-matching probabilities would show the correction does not work as claimed.
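A check of this kind can be mocked up in a few lines. The Python sketch below uses a hypothetical design, not the paper's simulation study: each pair has covariates x and x + 0.5, logistic propensity scores, and fixed potential outcomes Y(0) = 2x + noise and Y(1) = Y(0) + 1 + 0.5x; only the within-pair treatment assignment is re-randomized, as in randomization-based inference. Because the post-matching probabilities come from the true propensity scores, the weighted estimator is unbiased by construction here; the sketch only shows how bias and coverage of an unadjusted and a probability-weighted matched-pair analysis would be compared.

```python
import math
import random
from statistics import NormalDist


def expit(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def simulate(n_pairs: int = 200, n_reps: int = 500, gap: float = 0.5,
             alpha: float = 0.05, seed: int = 1) -> dict:
    """Monte Carlo bias and coverage of naive vs probability-weighted
    matched-pair estimators under a hypothetical inexact-matching design."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    # Fixed finite population: (Y1(0), Y1(1), Y2(0), Y2(1), P(unit 1 treated)).
    pairs = []
    for _ in range(n_pairs):
        x1 = rng.gauss(0.0, 1.0)
        x2 = x1 + gap  # inexact match: systematic covariate gap
        a0 = 2 * x1 + rng.gauss(0.0, 1.0)
        b0 = 2 * x2 + rng.gauss(0.0, 1.0)
        a1, b1 = a0 + 1 + 0.5 * x1, b0 + 1 + 0.5 * x2
        e1, e2 = expit(x1), expit(x2)
        p = e1 * (1 - e2) / (e1 * (1 - e2) + e2 * (1 - e1))
        pairs.append((a0, a1, b0, b1, p))

    # Sample average treatment effect over all 2 * n_pairs units.
    sate = sum((a1 - a0) + (b1 - b0) for a0, a1, b0, b1, _ in pairs) / (2 * n_pairs)

    def mean_and_covers(est):
        # Conservative matched-pair standard error and normal-approximation CI.
        m = sum(est) / len(est)
        se = math.sqrt(sum((t - m) ** 2 for t in est) / (len(est) * (len(est) - 1)))
        return m, abs(m - sate) <= z * se

    totals = {"naive": [0.0, 0], "weighted": [0.0, 0]}  # [sum of errors, hits]
    for _ in range(n_reps):
        naive, weighted = [], []
        for a0, a1, b0, b1, p in pairs:
            if rng.random() < p:  # unit 1 treated, unit 2 control
                d, p_treated = a1 - b0, p
            else:  # unit 2 treated, unit 1 control
                d, p_treated = b1 - a0, 1 - p
            naive.append(d)
            weighted.append(d / (2 * p_treated))
        for name, est in (("naive", naive), ("weighted", weighted)):
            m, covered = mean_and_covers(est)
            totals[name][0] += m - sate
            totals[name][1] += covered
    return {k: {"bias": s / n_reps, "coverage": h / n_reps}
            for k, (s, h) in totals.items()}


print(simulate())
```

Swapping in estimated rather than true propensity scores would turn this into a check of the practically relevant case, in which the post-matching probabilities are themselves estimated.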

Original abstract

Matching is a widely used causal inference design that aims to approximate a randomized experiment using observational data by forming matched sets of treated and control units based on similarities in their covariates. Ideally, treated units are exactly matched with controls on these covariates, enabling randomization-based inference for treatment effects as in a randomized experiment, under the assumption of no unobserved covariates. However, inexact matching often occurs, leading to residual covariate imbalance after matching. Previous matched studies have typically overlooked this issue and relied on conventional randomization-based inference, assuming that some covariate balance criteria are met. Recent research, however, has shown that this approach can introduce significant bias and proposed methods to correct for bias arising from inexact matching in randomization-based inference. These methods, however, are primarily focused on the constant treatment effect and its extensions (i.e., Fisher's sharp null) and do not apply to average treatment effects (i.e., Neyman's weak null). To address this gap, we introduce a new method, inverse post-matching probability weighting, for conducting randomization-based inference for average treatment effects under inexact matching. Our theoretical and simulation results indicate that, compared to conventional randomization-based inference methods, our approach significantly reduces bias and improves coverage rates in the presence of inexact matching.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Inverse post-matching probability weighting extends prior bias-correction methods to average treatment effects, but its validity under the weak null needs more visible support than the abstract provides.

The main takeaway is that the authors propose inverse post-matching probability weighting as a way to do valid randomization-based inference for average treatment effects when matching leaves some covariate imbalance. This extends earlier bias corrections, which focused mainly on constant treatment effects. The paper does a good job of naming the practical problem and offering a new procedure aimed at bias reduction. If the simulations back it up as claimed, it could help researchers who rely on matching but cannot achieve perfect balance.

The soft spot is the missing theoretical detail. The claim that this weighting corrects bias while keeping the randomization distribution usable for Neyman's weak null is central, but the abstract gives no equations or conditions. That leaves open whether it works exactly or only under assumptions that may fail with continuous covariates or certain matching schemes. The stress-test note captures this accurately based on what is provided.

This paper is for methodologists and applied statisticians working in causal inference with observational data. It deserves a serious referee because it targets a genuine gap with a concrete proposal, even though the current presentation is limited. I recommend sending it to peer review so the authors can provide the full derivations and any necessary conditions.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes inverse post-matching probability weighting as a new method for randomization-based inference of average treatment effects (Neyman's weak null) in observational studies with inexact matching. It claims that, unlike conventional randomization-based methods or prior corrections limited to Fisher's sharp null, this weighting reduces bias from residual covariate imbalance and improves coverage, as supported by theoretical derivations and simulation results.

Significance. If the weighting scheme is shown to correct residual imbalance while preserving a usable randomization distribution under the weak null, the result would address a documented gap in matched observational studies and extend randomization inference beyond constant-effect settings. The provision of both theoretical derivations and simulation evidence would strengthen the contribution relative to purely asymptotic or simulation-only approaches.

major comments (2)

[Abstract] Abstract and theory section: the central claim that inverse post-matching probability weighting 'corrects bias for Neyman's weak null while preserving the randomization distribution' requires an explicit derivation or set of regularity conditions showing how the weights are constructed from the observed matching and why the reference distribution remains exactly or asymptotically known; no such derivation or conditions appear in the provided abstract, leaving the coverage improvement claim unverified.
[Simulations] Simulation results: the reported bias reduction and coverage improvement must be accompanied by the precise definition of post-matching probabilities (e.g., how they are estimated from the matching procedure) and the exact simulation design (including covariate dimension and degree of inexactness); without these, it is impossible to confirm that the improvement is not an artifact of the chosen data-generating process.

minor comments (2)

Notation for the weighting scheme should be introduced with a clear distinction between the matching-induced randomization distribution and the weighted estimator.
The manuscript should cite the specific prior work on bias correction under Fisher's sharp null to clarify the precise extension being made.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment below and agree that targeted revisions will improve clarity without altering the core contribution.

Referee: [Abstract] Abstract and theory section: the central claim that inverse post-matching probability weighting 'corrects bias for Neyman's weak null while preserving the randomization distribution' requires an explicit derivation or set of regularity conditions showing how the weights are constructed from the observed matching and why the reference distribution remains exactly or asymptotically known; no such derivation or conditions appear in the provided abstract, leaving the coverage improvement claim unverified.

Authors: The theory section of the manuscript (Section 3) contains the explicit construction of the inverse post-matching probability weights from the observed matching and the regularity conditions under which the weighted randomization distribution remains valid for Neyman's weak null. The abstract is intentionally concise, but we will revise it to include a brief outline of the weight construction and the key asymptotic conditions to make the central claim verifiable directly from the abstract. revision: yes
Referee: [Simulations] Simulation results: the reported bias reduction and coverage improvement must be accompanied by the precise definition of post-matching probabilities (e.g., how they are estimated from the matching procedure) and the exact simulation design (including covariate dimension and degree of inexactness); without these, it is impossible to confirm that the improvement is not an artifact of the chosen data-generating process.

Authors: The definitions of the post-matching probabilities and the full simulation design (including how probabilities are estimated from the matching procedure, covariate dimensions, and degree of inexactness) are already specified in Sections 4 and 5 of the manuscript. To address the referee's concern about reproducibility and potential artifacts, we will expand the simulation section with a dedicated summary table or subsection that restates these elements more explicitly and prominently. revision: yes

Circularity Check

0 steps flagged

No circularity: new weighting method derived independently to address acknowledged gap for Neyman weak null.

The paper introduces inverse post-matching probability weighting as a novel correction for bias under inexact matching when targeting the average treatment effect (Neyman's weak null). The abstract and description frame this as filling a gap left by prior methods limited to Fisher's sharp null, with the new approach supported by separate theoretical derivations and simulations rather than any redefinition of inputs, fitted parameters renamed as predictions, or load-bearing self-citations. No step reduces the claimed coverage improvement or randomization preservation to a tautology by construction; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on standard causal assumptions plus the new weighting construction; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: no unobserved covariates affect treatment assignment or outcome.
Invoked in the abstract as the ideal-case assumption under which matching approximates a randomized experiment.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "inverse post-matching probability weighting (IPPW) estimator... re-weighting the post-matching difference-in-means estimator according to discrepancies of post-matching treatment assignment probabilities"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Paper passage: "Theorem 1... lim pr(λ ∈ CI_λ_* | Z) ≥ 1-α under Conditions 1–4"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

