Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation
Pith reviewed 2026-05-12 04:33 UTC · model grok-4.3
The pith
Semi-simulated benchmarks with counterfactual metrics do not identify the treatment effect estimators that perform best under observable metrics on real data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our results reveal two complementary gaps. First, counterfactual metrics do not reliably recover the estimators preferred by observable metrics, even on the same semi-simulated benchmarks. Second, rankings obtained on semi-simulated benchmarks do not transfer to real datasets. We further find that simple meta-learners with strong base models are consistently competitive, in contrast to specialized causal models.
What carries the argument
Large-scale empirical comparison of meta-learners and specialized causal models across counterfactual metrics on semi-simulated benchmarks versus observable metrics on real datasets.
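The contrast between the two evaluation regimes can be made concrete. A counterfactual metric such as PEHE requires the true effect for every unit, which only (semi-)simulated data provides, while an observable metric such as the Qini/uplift curve uses only factual outcomes and treatment assignments and so works on real data. A minimal sketch of the two, following the standard formulations (the paper's exact metric choices are not reproduced here; the toy data is hypothetical):

```python
import numpy as np

def pehe(tau_true, tau_hat):
    """Counterfactual metric: root Precision in Estimating Heterogeneous
    Effects. Needs the true CATE, so it only exists on (semi-)simulated data."""
    return np.sqrt(np.mean((tau_true - tau_hat) ** 2))

def qini_curve(tau_hat, y, t):
    """Observable metric: cumulative uplift when units are targeted in
    descending order of predicted effect. Uses only factual outcomes y
    and treatment indicators t, so it is computable on real data."""
    order = np.argsort(-tau_hat)
    y, t = y[order], t[order]
    n_t = np.cumsum(t)                      # treated units seen so far
    n_c = np.cumsum(1 - t)                  # control units seen so far
    y_t = np.cumsum(y * t)                  # treated successes
    y_c = np.cumsum(y * (1 - t))            # control successes
    # Qini value at each cutoff: treated successes minus scaled control successes
    return y_t - y_c * np.where(n_c > 0, n_t / np.maximum(n_c, 1), 0.0)

# Hypothetical semi-simulated data: true CATE is known by construction.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
tau_true = 0.5 * (x > 0)                    # heterogeneous effect
t = rng.integers(0, 2, size=500)
y = (rng.random(500) < 0.3 + tau_true * t).astype(float)
tau_hat = tau_true + rng.normal(scale=0.1, size=500)

print(pehe(tau_true, tau_hat))              # near the injected noise scale
print(qini_curve(tau_hat, y, t)[-1])        # cumulative uplift at full targeting
```

The paper's observed gaps amount to the claim that ranking estimators by the first quantity and by the second need not agree.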
If this is right
- Treatment effect research should combine observable metrics and real-data validation with existing benchmark practices.
- Simple meta-learners with strong base models can serve as competitive baselines without requiring specialized causal architectures.
- Model selection for applications cannot rely solely on semi-simulated rankings.
- Specialized causal models require additional real-world evidence to demonstrate advantage over simpler alternatives.
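To make "simple meta-learners with strong base models" concrete: a T-learner just fits one outcome regressor per treatment arm and differences the predictions. The sketch below uses closed-form ridge regression as a stand-in for a strong base model (the paper's actual base-learner grid, e.g. boosted trees, is not reproduced here; data and coefficients are illustrative):

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with intercept; stands in for any
    strong base regressor a meta-learner would be paired with."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ w

def t_learner(X, y, t):
    """T-learner: fit separate outcome models on treated and control
    units; the CATE estimate is the difference of their predictions."""
    mu1 = fit_ridge(X[t == 1], y[t == 1])
    mu0 = fit_ridge(X[t == 0], y[t == 0])
    return lambda Xn: mu1(Xn) - mu0(Xn)

# Illustrative data with a linear heterogeneous effect.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
t = rng.integers(0, 2, size=1000)
tau = 1.0 + 2.0 * X[:, 0]
y = X @ np.array([0.5, -0.3, 0.2]) + tau * t + rng.normal(scale=0.1, size=1000)

cate = t_learner(X, y, t)
tau_hat = cate(X)
print(np.mean(np.abs(tau_hat - tau)))       # small in this easy linear setting
```

No causal-specific architecture is involved; all the causal structure lives in how the two fits are combined, which is the kind of baseline the paper reports as consistently competitive.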
Where Pith is reading between the lines
- New evaluation protocols could be built around proxy tasks that approximate observable outcomes without needing counterfactual ground truth.
- Emphasis may shift toward improving base learners rather than inventing more elaborate causal wrappers.
- Additional real datasets from domains such as medicine or policy could be collected to test whether the observed gaps persist.
Load-bearing premise
The chosen real-world datasets are representative of practical deployment, and the observable metrics based on ranking or test outcomes accurately reflect the value of treatment effect estimates in those settings.
What would settle it
A new study that finds the same estimators consistently rank at the top under both counterfactual metrics on semi-simulated data and observable metrics on a broad collection of real datasets would contradict the reported gaps.
Original abstract
Estimating heterogeneous treatment effects with machine learning has attracted substantial attention in both academic research and industrial practice. However, the two communities often evaluate models under markedly different conditions. Methodological work typically relies on semi-simulated benchmarks and metrics that require counterfactual outcomes, whereas real-world applications rely on observable metrics based on ranking or test outcomes. Despite the well-known gap between methodological progress and practical deployment, the relationship between these evaluation regimes has not been examined systematically. We conduct a large-scale empirical study of treatment effect evaluation across standard semi-simulated benchmark families and real-world datasets. Our benchmark covers meta-learners paired with multiple base learners, as well as specialized causal machine learning models. We evaluate these methods using observable metrics common in application-oriented literature, alongside counterfactual metrics commonly used in methods papers. Our results reveal two complementary gaps. First, counterfactual metrics do not reliably recover the estimators preferred by observable metrics, even on the same semi-simulated benchmarks. Second, rankings obtained on semi-simulated benchmarks do not transfer to real datasets. We further find that simple meta-learners with strong base models are consistently competitive, in contrast to specialized causal models. Overall, our findings suggest that progress in treatment effect estimation research should not be assessed solely through counterfactual metrics and semi-simulated benchmarks, but it would benefit from incorporating observable metrics and real-data validation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a large-scale empirical study comparing treatment effect estimation methods under two evaluation regimes: counterfactual metrics on semi-simulated benchmarks versus observable metrics (ranking- or test-outcome-based) on real-world datasets. It covers meta-learners with various base learners and specialized causal models. The central claims are that counterfactual metrics fail to recover the estimators preferred by observable metrics even on identical semi-simulated benchmarks, that method rankings from semi-simulated data do not transfer to real datasets, and that simple meta-learners with strong base models are consistently competitive with specialized causal models.
Significance. If the reported gaps prove robust, the work is significant for causal machine learning because it supplies concrete empirical evidence of misalignment between standard academic benchmarks and practical deployment criteria. This could prompt the field to broaden evaluation beyond counterfactual metrics and semi-simulated data. The study is strengthened by its scale and direct comparison of common methodological and application-oriented practices.
Major comments (2)
- [Abstract] The claim that 'rankings obtained on semi-simulated benchmarks do not transfer to real datasets' is load-bearing for the paper's main message. This conclusion rests on observable metrics serving as faithful proxies for CATE quality, yet such metrics can be driven by overall response-model accuracy, treatment-selection patterns, or non-causal signals; the manuscript should include targeted analyses (e.g., ablation on base-model performance or synthetic controls) to isolate the contribution of heterogeneous-effect recovery.
- [Abstract] The statement that 'counterfactual metrics do not reliably recover the estimators preferred by observable metrics' requires explicit quantification of disagreement (e.g., rank correlation or top-k overlap) together with statistical tests and robustness checks across different observable metrics; without these, it is difficult to judge whether the observed mismatch is systematic or sensitive to metric choice.
Minor comments (1)
- [Abstract] Naming the specific semi-simulated benchmark families (e.g., IHDP, ACIC) and real-world datasets, along with their key characteristics (sample size, treatment prevalence), would improve reproducibility and allow readers to assess generalizability.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below and outline revisions to strengthen the manuscript's claims.
Point-by-point responses
-
Referee: [Abstract] The claim that 'rankings obtained on semi-simulated benchmarks do not transfer to real datasets' is load-bearing for the paper's main message. This conclusion rests on observable metrics serving as faithful proxies for CATE quality, yet such metrics can be driven by overall response-model accuracy, treatment-selection patterns, or non-causal signals; the manuscript should include targeted analyses (e.g., ablation on base-model performance or synthetic controls) to isolate the contribution of heterogeneous-effect recovery.
Authors: We appreciate this observation. Observable metrics on real data can indeed capture non-causal signals, but they reflect the evaluation standards used in practical deployment, which is the core motivation for contrasting them with academic counterfactual metrics. To isolate the contribution of heterogeneous effect recovery, we will add ablations comparing meta-learners to direct base-learner regressions that ignore treatment assignment (i.e., no meta-learning for CATE). Where data permits, we will also incorporate synthetic control analyses. These will appear in a new subsection of the experiments and be linked explicitly to the transferability claim. revision: partial
-
Referee: [Abstract] The statement that 'counterfactual metrics do not reliably recover the estimators preferred by observable metrics' requires explicit quantification of disagreement (e.g., rank correlation or top-k overlap) together with statistical tests and robustness checks across different observable metrics; without these, it is difficult to judge whether the observed mismatch is systematic or sensitive to metric choice.
Authors: We agree that explicit quantification is needed for rigor. In the revision we will report Spearman's rho and Kendall's tau rank correlations between counterfactual-metric and observable-metric rankings on the semi-simulated benchmarks, together with top-k overlap percentages. We will apply permutation tests to assess whether the observed disagreements are statistically significant and will repeat the entire analysis across alternative observable metrics drawn from the application literature. These results and robustness checks will be added to Section 4 and the appendix. revision: yes
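The disagreement statistics promised in this response can all be computed directly from the two metric-induced rankings. A minimal self-contained sketch of Spearman's rho, Kendall's tau, top-k overlap, and a permutation p-value for the rank correlation (the estimator scores below are hypothetical, not the paper's results):

```python
import numpy as np

def ranks(scores):
    """Rank positions (0 = best) induced by a metric; higher score = better."""
    return np.argsort(np.argsort(-scores))

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

def kendall(a, b):
    """Kendall's tau: (concordant - discordant) pairs / total pairs (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    conc = sum(np.sign(ra[i] - ra[j]) == np.sign(rb[i] - rb[j])
               for i in range(n) for j in range(i + 1, n))
    pairs = n * (n - 1) // 2
    return (2 * conc - pairs) / pairs

def top_k_overlap(a, b, k):
    """Fraction of the top-k estimators shared by both rankings."""
    return len(set(np.argsort(-a)[:k]) & set(np.argsort(-b)[:k])) / k

def perm_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation test: how often does a shuffled ranking match or exceed
    the observed |rho|?"""
    rng = np.random.default_rng(seed)
    obs = abs(spearman(a, b))
    null = [abs(spearman(a, rng.permutation(b))) for _ in range(n_perm)]
    return float(np.mean([s >= obs for s in null]))

# Hypothetical scores for 8 estimators under the two evaluation regimes.
counterfactual = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
observable     = np.array([0.5, 0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6])

print(spearman(counterfactual, observable))
print(kendall(counterfactual, observable))
print(top_k_overlap(counterfactual, observable, k=3))
print(perm_pvalue(counterfactual, observable))
```

Low rank correlation and small top-k overlap, with a non-significant permutation p-value, would be the quantitative form of the "counterfactual metrics do not reliably recover" claim.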
Circularity Check
No circularity: purely empirical benchmarking with no derivations
Full rationale
This paper conducts a large-scale empirical comparison of treatment effect estimators across semi-simulated benchmarks and real-world datasets, evaluating them with both counterfactual and observable metrics. The abstract and described content contain no mathematical derivations, equations, fitted parameters, or ansatzes that could reduce to self-definitions or prior self-citations. Claims about gaps between evaluation regimes rest directly on experimental runs against external data sources rather than internal constructions, rendering the study self-contained with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Standard causal assumptions (e.g., no unmeasured confounding, positivity) hold sufficiently in the real-world datasets for observable metrics to be meaningful.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction (tagged unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
We evaluate these methods using observable metrics common in application-oriented literature, alongside counterfactual metrics commonly used in methods papers.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.