pith. machine review for the scientific record. sign in

arxiv: 2604.10412 · v1 · submitted 2026-04-12 · 📊 stat.ML · cs.LG· stat.ME

Recognition: unknown

Orthogonal machine learning for conditional odds and risk ratios

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:39 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME
keywords conditional odds ratioconditional risk ratioorthogonal machine learningdoubly robust estimationR-learnerDR-learnertreatment effect heterogeneitynonparametric causal inference
0
0 comments X

The pith

Orthogonal risk functions for conditional odds and risk ratios produce pseudo-outcomes with second-order remainder properties like those for average treatment effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops estimators for conditional odds ratios and risk ratios that measure how treatment effects vary across groups. It generalizes orthogonal machine learning techniques previously used for average treatment effects to these ratio measures by deriving new risk functions. The resulting pseudo-outcomes have errors that depend on products rather than sums of nuisance estimation errors, allowing flexible machine learning for those nuisances. Simulations across hundreds of data scenarios show the new estimators reduce bias and mean squared error compared with standard alternatives in complex settings. An analysis of national health survey data demonstrates that the approach reveals treatment heterogeneity missed by traditional regression.

Core claim

We derive orthogonal risk functions for the OR and RR and show that the associated pseudo-outcomes satisfy second-order conditional-mean remainder properties analogous to the ATE case. We also evaluate estimators for the conditional ATE, OR, and RR in a comprehensive nonparametric Monte Carlo simulation study to compare them with common alternatives under hundreds of different data-generating distributions. Our numerical studies provide empirical guidance for choosing an estimator, showing that nonparametric estimators significantly reduce bias and mean squared error in more complex settings.

What carries the argument

Orthogonal risk functions for the odds ratio and risk ratio that generate pseudo-outcomes satisfying second-order conditional-mean remainder properties.

If this is right

  • The estimators achieve lower bias and mean squared error than parametric models when data patterns are complex.
  • Treatment targeting improves by uncovering heterogeneity obscured by standard regression.
  • The approach extends the benefits of DR-learner and R-learner methods from ATE to OR and RR parameters.
  • Real-world applications such as NHANES data analysis yield improved decision rules for interventions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar orthogonalization may apply to other nonlinear causal effect measures beyond OR and RR.
  • Practitioners gain most when they pair these methods with flexible, high-quality nuisance estimators.
  • The methods support more reliable subgroup-specific treatment recommendations in precision health settings.
  • The simulation design provides a reusable template for evaluating estimators of other conditional causal parameters.

Load-bearing premise

Nuisance functions such as propensity scores and outcome regressions can be estimated at rates fast enough that their product terms dominate the remainder and vanish asymptotically.

What would settle it

A Monte Carlo experiment in which the proposed OR and RR estimators fail to show lower mean squared error than non-orthogonal alternatives despite accurate nuisance function fits in complex data-generating processes.

Figures

Figures reproduced from arXiv: 2604.10412 by Iv\'an D\'iaz, Jiacheng Ge.

Figure 1
Figure 1. Figure 1: Conditional OR: interaction order 3, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p016_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Conditional OR: interaction order 3, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Conditional OR: interaction order 1, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Conditional OR: interaction order 1, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: NHANES application for the effect of physical activity on sleep trouble. The left panel shows [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Conditional OR: interaction order 3, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p033_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Conditional OR: interaction order 3, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p034_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Conditional OR: interaction order 2, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p035_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Conditional OR: interaction order 2, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p036_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Conditional OR: interaction order 2, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p037_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Conditional OR: interaction order 2, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p038_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Conditional OR: interaction order 1, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p039_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Conditional OR: interaction order 1, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p040_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Conditional RR: interaction order 3, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p041_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Conditional RR: interaction order 3, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p042_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Conditional RR: interaction order 3, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p043_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Conditional RR: interaction order 3, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p044_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Conditional RR: interaction order 2, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p045_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Conditional RR: interaction order 2, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p046_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Conditional RR: interaction order 2, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p047_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: Conditional RR: interaction order 2, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p048_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Conditional RR: interaction order 1, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p049_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Conditional RR: interaction order 1, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p050_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Conditional RR: interaction order 1, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p051_24.png] view at source ↗
Figure 25
Figure 25. Figure 25: Conditional RR: interaction order 1, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p052_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: Conditional ATE: interaction order 3, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p053_26.png] view at source ↗
Figure 27
Figure 27. Figure 27: Conditional ATE: interaction order 3, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p054_27.png] view at source ↗
Figure 28
Figure 28. Figure 28: Conditional ATE: interaction order 3, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p055_28.png] view at source ↗
Figure 29
Figure 29. Figure 29: Conditional ATE: interaction order 3, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p056_29.png] view at source ↗
Figure 30
Figure 30. Figure 30: Conditional ATE: interaction order 2, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p057_30.png] view at source ↗
Figure 31
Figure 31. Figure 31: Conditional ATE: interaction order 2, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p058_31.png] view at source ↗
Figure 32
Figure 32. Figure 32: Conditional ATE: interaction order 2, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p059_32.png] view at source ↗
Figure 33
Figure 33. Figure 33: Conditional ATE: interaction order 2, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p060_33.png] view at source ↗
Figure 34
Figure 34. Figure 34: Conditional ATE: interaction order 1, sample size 2000 [PITH_FULL_IMAGE:figures/full_fig_p061_34.png] view at source ↗
Figure 35
Figure 35. Figure 35: Conditional ATE: interaction order 1, sample size 1000 [PITH_FULL_IMAGE:figures/full_fig_p062_35.png] view at source ↗
Figure 36
Figure 36. Figure 36: Conditional ATE: interaction order 1, sample size 500 [PITH_FULL_IMAGE:figures/full_fig_p063_36.png] view at source ↗
Figure 37
Figure 37. Figure 37: Conditional ATE: interaction order 1, sample size 200 [PITH_FULL_IMAGE:figures/full_fig_p064_37.png] view at source ↗
read the original abstract

Conditional effects are commonly used measures for understanding how treatment effects vary across different groups, and are often used to target treatments/interventions to groups who benefit most. In this work we review existing methods and propose novel ones, focusing on the odds ratio (OR) and the risk ratio (RR). While estimation of the conditional average treatment effect (ATE) has been widely studied, estimators for the OR and RR lag behind, and cutting edge estimators such as those based on doubly robust transformations or orthogonal risk functions have not been generalized to these parameters. We propose such a generalization here, focusing on the DR-learner and the R-learner. We derive orthogonal risk functions for the OR and RR and show that the associated pseudo-outcomes satisfy second-order conditional-mean remainder properties analogous to the ATE case. We also evaluate estimators for the conditional ATE, OR, and RR in a comprehensive nonparametric Monte Carlo simulation study to compare them with common alternatives under hundreds of different data-generating distributions. Our numerical studies provide empirical guidance for choosing an estimator. For instance, they show that while parametric models are useful in very simple settings, the proposed nonparametric estimators significantly reduce bias and mean squared error in the more complex settings expected in the real world. We illustrate the methods in the analysis of physical activity and sleep trouble in U.S. adults using data from the National Health and Nutrition Examination Survey (NHANES). The results demonstrate that our estimators uncover substantial treatment effect heterogeneity that is obscured by traditional regression approaches and lead to improved treatment decision rules, highlighting the importance of data-adaptive methods for advancing precision health research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper derives orthogonal risk functions and associated pseudo-outcomes for the conditional odds ratio (OR) and risk ratio (RR) that exhibit second-order conditional-mean remainder properties analogous to those for the conditional average treatment effect. It proposes generalizations of the DR-learner and R-learner to these functionals, evaluates the resulting estimators against common alternatives in a large-scale nonparametric Monte Carlo study across hundreds of data-generating distributions, and applies the methods to NHANES data on physical activity and sleep trouble to demonstrate improved detection of treatment effect heterogeneity.

Significance. If the second-order remainder properties hold under standard nonparametric nuisance estimators, the work extends doubly robust orthogonal learning to two important conditional effect measures for binary outcomes, enabling more robust estimation of heterogeneous treatment effects in settings where ATE alone is insufficient. The comprehensive simulation design across diverse distributions is a clear strength and supplies practical guidance on estimator choice; the real-data illustration shows the methods can uncover heterogeneity missed by parametric regression.

major comments (2)
  1. [§3.3, Eq. (12)] §3.3, Eq. (12) and surrounding derivation: the claimed second-order remainder for the RR pseudo-outcome is stated as the product of propensity and outcome-regression errors, but the text does not derive or cite the precise rate conditions (e.g., each nuisance converging faster than n^{-1/4}) under which this product is o_p(n^{-1/2}) when the conditional probabilities approach the boundary; without these conditions the advertised sqrt(n) consistency is not guaranteed for standard estimators such as random forests or kernels.
  2. [§5.1] §5.1, simulation design: none of the 200+ data-generating processes include regimes in which the propensity score approaches 0 or 1 while the outcome regression remains bounded away from 0/1; this omission leaves untested the finite-sample behavior of the second-order property precisely where the product remainder is most likely to degrade.
minor comments (2)
  1. [§3 and §4] Notation for the OR and RR pseudo-outcomes is introduced in §3 but reused with slight variations in §4; a single consolidated definition table would improve readability.
  2. [§6] The NHANES application in §6 reports point estimates and confidence intervals but does not include a sensitivity analysis to the choice of nuisance estimators or bandwidths.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which have identified important points for clarification and strengthening. We address each major comment below and will incorporate revisions accordingly.

read point-by-point responses
  1. Referee: [§3.3, Eq. (12)] §3.3, Eq. (12) and surrounding derivation: the claimed second-order remainder for the RR pseudo-outcome is stated as the product of propensity and outcome-regression errors, but the text does not derive or cite the precise rate conditions (e.g., each nuisance converging faster than n^{-1/4}) under which this product is o_p(n^{-1/2}) when the conditional probabilities approach the boundary; without these conditions the advertised sqrt(n) consistency is not guaranteed for standard estimators such as random forests or kernels.

    Authors: We appreciate the referee highlighting the need for explicit rate conditions. The derivation in §3.3 shows that the remainder for the RR pseudo-outcome is the product of the propensity-score and outcome-regression errors. Under the maintained assumption that the conditional probabilities are bounded away from 0 and 1, standard nonparametric rates (each nuisance estimator o_p(n^{-1/4})) make the product o_p(n^{-1/2}), as is standard in the orthogonal-learning literature. Near boundaries, additional regularity or trimming is indeed required, analogous to conditions in doubly robust estimation for binary outcomes. We will revise the text to state these conditions explicitly and cite the relevant rate results for product remainders. revision: yes

  2. Referee: [§5.1] §5.1, simulation design: none of the 200+ data-generating processes include regimes in which the propensity score approaches 0 or 1 while the outcome regression remains bounded away from 0/1; this omission leaves untested the finite-sample behavior of the second-order property precisely where the product remainder is most likely to degrade.

    Authors: The referee is correct that our simulation design did not include propensity scores approaching the boundaries. The 200+ DGPs were chosen to span a wide range of complexities in the conditional effects and nuisance functions, but we agree that boundary regimes are a natural stress test for the product remainder. In the revision we will add a targeted set of simulations with propensity scores near 0 and 1 (while keeping outcome regressions bounded away from 0/1) to evaluate finite-sample behavior in these cases. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation of OR/RR orthogonal risks

full rationale

The paper derives orthogonal risk functions and associated pseudo-outcomes for conditional OR and RR as a direct mathematical generalization of the ATE framework, establishing second-order remainder properties through algebraic manipulation of the relevant expressions. This process does not reduce any claimed result to its inputs by construction, nor does it rely on fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations. The derivations stand as independent content, with the simulation study and NHANES application providing separate empirical support rather than circular reinforcement.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone, the paper relies on standard causal inference assumptions but introduces no explicit new free parameters or invented entities; full details on any implicit fitting choices are unavailable.

axioms (1)
  • domain assumption Standard causal assumptions including consistency, no unmeasured confounding, and positivity.
    Required for defining and identifying conditional OR and RR in observational data.

pith-pipeline@v0.9.0 · 5590 in / 1322 out tokens · 63819 ms · 2026-05-10T16:39:04.711219+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 3 canonical work pages

  1. [1]

    and Pearl, J

    Bareinboim, E. and Pearl, J. (2012). Controlling selection bias in causal inference. In Lawrence, N. D. and Girolami, M., editors, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , volume 22 of Proceedings of Machine Learning Research , pages 100--108, La Palma, Canary Islands. PMLR

  2. [2]

    J., Klaassen, C

    Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models . Johns Hopkins University Press

  3. [3]

    Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal , 21(1):C1--C68

  4. [4]

    A., George, E

    Chipman, H. A., George, E. I., and McCulloch, R. E. (2010). Bart: Bayesian additive regression trees. The Annals of Applied Statistics , 4(1):266--298

  5. [5]

    D \' az, I., Savenkov, O., and Ballman, K. (2018). Targeted learning ensembles for optimal individualized treatment rules with time-to-event outcomes. Biometrika , 105(3):723--738

  6. [6]

    L., and Schenck, E

    D \' az, I., Williams, N., Hoffman, K. L., and Schenck, E. J. (2023). Nonparametric causal effects based on longitudinal modified treatment policies. Journal of the American Statistical Association , 118(542):846--857

  7. [7]

    portable

    Doi, S. A., Furuya-Kanamori, L., Xu, C., Chivese, T., Lin, L., Musa, O. A., Hindy, G., Thalib, L., and Harrell Jr, F. E. (2022). The odds ratio is “portable” across baseline risk but not the relative risk: time to do away with the log link in binomial regression. Journal of Clinical Epidemiology , 142:288--293

  8. [8]

    Dunson, D. B. and Xing, C. (2009). Nonparametric bayes modeling of multivariate categorical data. Journal of the American Statistical Association , 104(487):1042--1051

  9. [9]

    Foster, D. J. and Syrgkanis, V. (2023). Orthogonal statistical learning. The Annals of Statistics , 51(3):879--908

  10. [10]

    Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics , 19(1):1--67

  11. [11]

    Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis

    Harrell (2015). Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis . Springer Series in Statistics. Springer, Cham, 2 edition

  12. [12]

    Hines, O., Dukes, O., Diaz-Ordaz, K., and Vansteelandt, S. (2022). Demystifying statistical learning based on efficient influence functions. The American Statistician , 76(3):292--304

  13. [13]

    W., Lemeshow, S., and Sturdivant, R

    Hosmer Jr, D. W., Lemeshow, S., and Sturdivant, R. X. (2013). Applied logistic regression . John Wiley & Sons

  14. [14]

    Jun, S. J. and Lee, S. (2023). Average adjusted association: Efficient estimation with high dimensional confounders. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics , volume 206 of Proceedings of Machine Learning Research , pages 5980--5996

  15. [15]

    Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017)

  16. [16]

    Kennedy, E. H. (2023). Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics , 17(2):3008--3049

  17. [17]

    H., Ma, Z., McHugh, M

    Kennedy, E. H., Ma, Z., McHugh, M. D., and Small, D. S. (2017). Nonparametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 79(4):1229--1245

  18. [18]

    R., Sekhon, J

    K \"u nzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences , 116(10):4156--4165

  19. [19]

    Sequential Double Robustness in Right-Censored Longitudinal Models

    Luedtke, A. R., Sofrygin, O., van der Laan, M. J., and Carone, M. (2017). Sequential double robustness in right-censored longitudinal models. arXiv preprint arXiv:1705.02459

  20. [20]

    Luedtke, A. R. and van der Laan, M. J. (2016). Super-learning of an optimal dynamic treatment rule. The international journal of biostatistics , 12(1):305--332

  21. [21]

    Malinsky, D., Shpitser, I., and Tchetgen Tchetgen, E. J. (2022). Semiparametric inference for nonmonotone missing-not-at-random data: the no self-censoring model. Journal of the American Statistical Association , 117(539):1415--1423

  22. [22]

    Mittinty, M. N. and Lynch, J. (2023). Reflection on modern methods: risk ratio regression—simple concept yet complex computation. International journal of epidemiology , 52(1):309--314

  23. [23]

    Morzywo ek, P., Decruyenaere, J., and Vansteelandt, S. (2025). On weighted orthogonal learners for heterogeneous treatment effects. Statistical Science . Future paper. Preprint: arXiv:2303.12687

  24. [24]

    and Wager, S

    Nie, X. and Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika , 108(2):299--319

  25. [25]

    Pearl, J. (2009). Causality: Models, Reasoning, and Inference . Cambridge University Press, Cambridge, 2 edition

  26. [26]

    Pruim, R. (2014). NHANES: Data from the US National Health and Nutrition Examination Study . R package version 2.1.0

  27. [27]

    and van der Laan , M

    Rubin, D. and van der Laan , M. J. (2007). A doubly robust censoring unbiased transformation. The International Journal of Biostatistics , 3(1):Article 4

  28. [28]

    E., Williams, N

    Rudolph, K. E., Williams, N. T., Miles, C. H., Antonelli, J., and D \' az, I. (2023). All models are wrong, but which are useful? comparing parametric and nonparametric estimation of causal effects in finite samples. Journal of Causal Inference , 11(1):20230022

  29. [29]

    van der Laan , L., Carone, M., and Luedtke, A. (2024). Combining t-learning and dr-learning: a framework for oracle-efficient estimation of causal contrasts. arXiv preprint arXiv:2402.01972

  30. [30]

    van der Laan, M. J. (2006). Statistical inference for variable importance. The International Journal of Biostatistics , 2(1)

  31. [31]

    J., Polley, E

    van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology , 6(1):Article 25

  32. [32]

    van der Laan , M. J. and Rubin, D. (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics , 2(1)