
arxiv: 2412.11136 · v2 · submitted 2024-12-15 · 📊 stat.ME · stat.ML

Minimax Regret Estimation for Generalizing Heterogeneous Treatment Effects with Multisite Data

Pith reviewed 2026-05-23 06:55 UTC · model grok-4.3

keywords: causal inference · heterogeneous treatment effects · multisite studies · minimax regret · generalizability · conditional average treatment effect · robust optimization · distribution shift

The pith

Minimax regret estimation yields a weighted average of site-specific CATE models for generalizing heterogeneous treatment effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Researchers often want to learn how treatment effects differ across people, but data from multiple sites may differ from the target population in unknown and unobservable ways. This paper introduces a minimax-regret framework that finds a CATE model minimizing the worst-case regret over all target populations whose CATE can be written as a convex combination of the site-specific CATEs. The solution is a closed-form weighted average of the site-specific CATE estimates, so it can be computed after applying any preferred estimation method within each site. The approach explicitly accounts for shifts in both the covariate distribution and treatment effect heterogeneity across sites. A sympathetic reader would care because it offers a way to make causal findings more transportable without assuming that the sites are representative of the target.

Core claim

The methodology minimizes the worst-case regret over the class of target CATE functions that are convex combinations of site-specific CATE functions. Using robust optimization, it produces a CATE estimator that is the weighted average of site-specific estimators and accounts for distribution shifts in covariates and treatment effect heterogeneity.

What carries the argument

The minimax-regret criterion over convex combinations of site-specific CATEs, leading to a closed-form solution as their weighted average.
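
One way to see why a weighted average emerges is the sketch below. It is an illustration under simplifying assumptions, not the paper's derivation: regret is taken to be the squared L2 distance between a candidate CATE and the target CATE under a fixed reference covariate distribution, and the symbols \tau_s, q, w, and \Delta_S are notation introduced here. The paper's exact regret definition and weight formula may differ.

```latex
% Assumed notation (not taken from the paper): S sites with CATEs \tau_1, \dots, \tau_S,
% probability simplex \Delta_S, and squared-L2 regret under a fixed covariate distribution.
\begin{align*}
\tau_q &= \sum_{s=1}^{S} q_s \tau_s, \qquad q \in \Delta_S, \\
R(\tau; q) &= \mathbb{E}\big[(\tau(X) - \tau_q(X))^2\big], \\
\max_{q \in \Delta_S} R(\tau; q) &= \max_{1 \le s \le S} \mathbb{E}\big[(\tau(X) - \tau_s(X))^2\big], \\
\tau^\ast &= \arg\min_{\tau} \max_{1 \le s \le S} \mathbb{E}\big[(\tau(X) - \tau_s(X))^2\big]
  = \sum_{s=1}^{S} w_s \tau_s, \qquad w \in \Delta_S.
\end{align*}
```

The third line holds because R(\tau; q) is convex in q, so its maximum over the simplex is attained at a vertex. Under these assumptions the minimizer in the last line is the center of the smallest ball enclosing the site CATEs, and that center lies in their convex hull, which is why it can be written as a weighted average.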

If this is right

  • The resulting model is more robust to unknown distribution shifts than pooled or single-site estimates.
  • Any site-specific CATE method can be used and then aggregated using the derived weights.
  • The framework improves generalizability in multisite studies with heterogeneous populations.
  • Empirical results in simulations and applications demonstrate better performance under shifts.
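
The aggregation step described in these bullets can be sketched as follows. This is a minimal illustration, not the paper's estimator: it assumes squared-error regret averaged uniformly over a small grid of covariate values, uses made-up site predictions, and finds the weights with the Badoiu-Clarkson farthest-point iteration for the smallest enclosing ball rather than the paper's closed-form expression. The names `mean_sq_diff`, `minimax_weights`, and `worst_case_regret` are hypothetical.

```python
def mean_sq_diff(u, v):
    """Mean squared difference between two prediction vectors on a common grid."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def minimax_weights(site_preds, n_iter=2000):
    """Approximate the weights whose mix minimizes the largest distance to any site.

    Badoiu-Clarkson iteration: start at one site, then repeatedly move the
    current center a shrinking step toward the farthest site. The center is
    always a convex combination of the sites, so the weights stay on the simplex.
    """
    n_sites = len(site_preds)
    weights = [0.0] * n_sites
    weights[0] = 1.0
    center = list(site_preds[0])
    for k in range(1, n_iter + 1):
        far = max(range(n_sites), key=lambda s: mean_sq_diff(center, site_preds[s]))
        step = 1.0 / (k + 1)
        weights = [(1.0 - step) * w for w in weights]
        weights[far] += step
        center = [(1.0 - step) * c + step * p for c, p in zip(center, site_preds[far])]
    return weights, center


def worst_case_regret(candidate, site_preds):
    """Largest mean squared difference between the candidate and any site."""
    return max(mean_sq_diff(candidate, p) for p in site_preds)


# Made-up site-specific CATE predictions at two covariate values (illustrative only).
# Sites 1 and 3 are similar; site 2 is very different.
site_preds = [[0.0, 0.0], [2.0, 0.0], [0.2, 0.1]]

weights, center = minimax_weights(site_preds)
equal_mix = [sum(col) / len(col) for col in zip(*site_preds)]

print("approximate minimax weights:", [round(w, 3) for w in weights])
print("worst-case regret, minimax mix:", round(worst_case_regret(center, site_preds), 3))
print("worst-case regret, equal weights:", round(worst_case_regret(equal_mix, site_preds), 3))
```

In this toy example the weights settle near (0.5, 0.5, 0): the two most dissimilar sites determine the aggregate and the redundant third site gets no weight, whereas equal weighting is pulled toward the two similar sites and has a larger worst-case regret.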

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The convex combination assumption may limit applicability if target effects lie outside the convex hull of sites.
  • This method could be tested by simulating target populations not in the convex hull to see performance degradation.
  • In practice, the weights could inform how much to trust each site's data for a new context.

Load-bearing premise

Any possible target population's conditional average treatment effect can be written as a convex combination of the conditional average treatment effects estimated from the observed sites.

What would settle it

Observe a target population where the true CATE function lies outside the convex hull of the site-specific CATE functions; if the minimax model then shows higher regret or worse accuracy than alternatives, the central claim would be falsified.
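
A toy version of this check can be sketched as follows. It assumes squared-error regret averaged over a two-point covariate grid, made-up site predictions, and an aggregate fixed at an even mix of the first two sites; none of these values come from the paper. It compares the regret against targets drawn inside the convex hull of the sites with the regret against one extrapolated target outside it.

```python
import random


def mean_sq_diff(u, v):
    """Mean squared difference between two prediction vectors on a common grid."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def mix(weights, vectors):
    """Weighted combination of prediction vectors (weights need not be convex)."""
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(len(vectors[0]))]


# Made-up site-specific CATE predictions at two covariate values (illustrative only).
site_preds = [[0.0, 0.0], [2.0, 0.0], [0.2, 0.1]]

# Illustrative aggregate: an even mix of the first two sites.
aggregate = mix([0.5, 0.5, 0.0], site_preds)

# Regret against each site itself; by convexity this bounds regret inside the hull.
vertex_bound = max(mean_sq_diff(aggregate, p) for p in site_preds)

# Targets inside the hull: random convex combinations of the site predictions.
rng = random.Random(0)
inside_regrets = []
for _ in range(1000):
    raw = [rng.random() for _ in site_preds]
    q = [r / sum(raw) for r in raw]
    inside_regrets.append(mean_sq_diff(aggregate, mix(q, site_preds)))

# A target outside the hull: extrapolating past site 2, away from site 1.
outside_target = mix([-0.5, 1.5, 0.0], site_preds)
outside_regret = mean_sq_diff(aggregate, outside_target)

print("bound from the sites:", vertex_bound)
print("largest regret inside the hull:", max(inside_regrets))
print("regret for the outside target:", outside_regret)
```

Inside the hull the regret never exceeds the bound set by the sites themselves, while the extrapolated target exceeds it, which is the kind of degradation the test above would look for.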

Figures

Figures reproduced from arXiv: 2412.11136 by Kosuke Imai, Melody Huang, Yi Zhang.

Figure 1: Average MSE of multisite CATE estimates from three methods across 1,000 simulations, …
Figure 2: Worst-case MSE averaged across 1,000 iterations under varying sample size ratios: (1) …
Original abstract

To test scientific theories and develop individualized treatment rules, researchers often wish to learn heterogeneous treatment effects that can be consistently found across diverse populations and contexts. We consider the problem of generalizing heterogeneous treatment effects (HTE) based on data from multiple sites. A key challenge is that a target population may differ from the source sites in unknown and unobservable ways. This means that the estimates from site-specific models lack external validity, and a simple pooled analysis risks bias. We develop a robust CATE (conditional average treatment effect) estimation methodology with multisite data from heterogeneous populations. We propose a minimax-regret framework that learns a generalizable CATE model by minimizing the worst-case regret over a class of target populations whose CATE can be represented as convex combinations of site-specific CATEs. Using robust optimization, the proposed methodology accounts for distribution shifts in both individual covariates and treatment effect heterogeneity across sites. We show that the resulting CATE model has an interpretable closed-form solution, expressed as a weighted average of site-specific CATE models. Thus, researchers can utilize a flexible CATE estimation method within each site and aggregate site-specific estimates to produce the final model. Through simulations and a real-world application, we show that the proposed methodology improves the robustness and generalizability of existing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a minimax-regret framework for generalizing CATE estimation from multisite data. It minimizes worst-case regret over target populations whose CATE functions lie in the convex hull of site-specific CATEs, derives a closed-form solution as a weighted average of site-specific models via robust optimization, and claims this accounts for shifts in both covariates and treatment-effect heterogeneity. The approach is evaluated in simulations and one real-world application, with the final model allowing flexible site-specific estimators.

Significance. If the derivation and scope hold, the closed-form weighted-average solution is a clear strength, as it permits plug-in use of arbitrary site-specific CATE estimators while providing an interpretable aggregator. The simulations and real-data demonstration add concrete evidence of practical improvement over pooled or single-site baselines. The significance is limited by the fact that all robustness guarantees are confined to the convex-combination class by construction.

major comments (2)
  1. [Abstract and §2–3] Abstract and the definition of the uncertainty set (likely §2–3): the claim that the method 'accounts for distribution shifts in both individual covariates and treatment effect heterogeneity across sites' is overstated. The uncertainty set restricts target CATEs to convex combinations of the observed site-specific CATEs; outside this class (e.g., non-linear combinations, new moderators, or functional forms not spanned by the sites), the worst-case regret guarantee does not apply and the procedure reduces to an ad-hoc aggregator. No argument or diagnostic is supplied that real multisite heterogeneity shifts typically satisfy the convex-combination representation.
  2. [§3] §3 (robust-optimization step and closed-form derivation): the manuscript states that the solution is obtained via robust optimization but provides no explicit verification that the resulting weights remain stable under perturbations of the site-specific CATE estimators or under modest violations of the convex-hull assumption. Because the closed-form expression is the central deliverable, the absence of such sensitivity analysis or error bounds is load-bearing for the robustness claim.
minor comments (2)
  1. [§3] Notation for the site-specific CATE estimators and the convex weights should be introduced with a single consistent symbol table to avoid ambiguity when moving between the minimax objective and the final weighted-average expression.
  2. [Simulation study] The simulation section would benefit from an explicit statement of whether the data-generating processes used to create the 'target' populations lie inside or outside the convex hull of the training-site CATEs; this would directly address the scope of the reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We agree that the abstract language regarding the scope of robustness could be clarified to more accurately reflect the convex-combination uncertainty set, and we will revise accordingly. We will also add discussion and analysis addressing sensitivity of the closed-form weights. Point-by-point responses follow.

  1. Referee: [Abstract and §2–3] Abstract and the definition of the uncertainty set (likely §2–3): the claim that the method 'accounts for distribution shifts in both individual covariates and treatment effect heterogeneity across sites' is overstated. The uncertainty set restricts target CATEs to convex combinations of the observed site-specific CATEs; outside this class (e.g., non-linear combinations, new moderators, or functional forms not spanned by the sites), the worst-case regret guarantee does not apply and the procedure reduces to an ad-hoc aggregator. No argument or diagnostic is supplied that real multisite heterogeneity shifts typically satisfy the convex-combination representation.

    Authors: We agree the phrasing in the abstract is imprecise and could suggest broader applicability than the framework provides. The uncertainty set is explicitly the convex hull of site-specific CATEs, which permits certain linear interpolations of heterogeneity and covariate shifts within the span of observed sites but does not cover arbitrary functional forms or extrapolation. We will revise the abstract, introduction, and §2–3 to state clearly that minimax-regret guarantees hold for target CATEs in this convex hull. We will add a limitations paragraph noting that the convex-combination assumption is central to the guarantees and that the method reduces to an aggregator outside this class. While a general empirical diagnostic across all multisite settings is beyond the scope of this work, the framework is motivated by the practical case where site-specific CATEs capture the relevant range of observed heterogeneity. revision: yes

  2. Referee: [§3] §3 (robust-optimization step and closed-form derivation): the manuscript states that the solution is obtained via robust optimization but provides no explicit verification that the resulting weights remain stable under perturbations of the site-specific CATE estimators or under modest violations of the convex-hull assumption. Because the closed-form expression is the central deliverable, the absence of such sensitivity analysis or error bounds is load-bearing for the robustness claim.

    Authors: We acknowledge that the derivation assumes fixed site-specific estimators and exact satisfaction of the convex-hull condition. To strengthen the robustness claim for the closed-form solution, we will add an appendix deriving first-order bounds on weight changes under additive perturbations to the site-specific CATE estimators (using the explicit form of the weights) and include a small simulation study examining performance degradation under modest convex-hull violations. These additions will be included in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity; minimax derivation is self-contained within explicitly stated uncertainty class


The paper defines an uncertainty set consisting of target CATE functions that are convex combinations of the observed site-specific CATEs, then applies standard robust optimization (minimax regret) to that set. The resulting closed-form weighted-average estimator follows directly from the mathematics of the chosen set and objective; it is not obtained by re-fitting or re-labeling the inputs. The assumption that real targets lie in this convex hull is stated explicitly as the scope of the guarantee rather than derived or smuggled in. No self-citation chains, fitted parameters renamed as predictions, or uniqueness theorems imported from the authors' prior work appear in the derivation. The framework is therefore independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that any plausible target CATE lies in the convex hull of the site-specific CATE functions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • Domain assumption: the target population's CATE belongs to the convex hull of the site-specific CATEs.
    Explicitly stated as the class of target populations over which worst-case regret is minimized.




Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Privacy-preserving Meta-analysis through Low-Rank Basis Hunting

    stat.ME · 2026-04 · unverdicted · novelty 7.0

    MetaHunt recovers latent basis functions via an extended successive projection algorithm to enable privacy-preserving prediction of function-valued meta-analytic quantities from study-level covariates and estimates alone.

  2. A Functional-Class Meta-Analytic Framework for Quantifying Surrogate Resilience

    stat.ME · 2026-04 · unverdicted · novelty 6.0

    A meta-analytic framework estimates the resilience probability of a surrogate marker to the surrogate paradox in a new study by modeling deviations from functional relationships observed in completed trials.
