Annotation-Assisted Learning of Treatment Policies From Multimodal Electronic Health Records

Henri Arno; Thomas Demeester

arxiv: 2507.20993 · v3 · submitted 2025-07-28 · 💻 cs.LG · cs.AI· stat.ML

Annotation-Assisted Learning of Treatment Policies From Multimodal Electronic Health Records

Henri Arno , Thomas Demeester This is my paper

Pith reviewed 2026-05-19 02:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML

keywords causal policy learningmultimodal EHRtreatment policiesconfounding adjustmentannotation-assisted learningclinical decision supportelectronic health records

0 comments

The pith

Expert annotations during training enable valid treatment benefit predictions from multimodal EHR representations alone at inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to learn treatment policies from multimodal electronic health records that combine tabular data and clinical text. Standard causal estimators risk bias when applied directly to learned representations because those representations may omit key confounding factors. The approach incorporates expert annotations only during training to perform confounding adjustment, then trains a predictor that outputs treatment benefit estimates from multimodal representations at inference time without any annotations. This setup matters because it allows causal policy learning to work in real clinical settings where annotations are costly to obtain continuously while still respecting the need to identify patients who benefit most from treatment rather than just high-risk ones. Empirical results on synthetic, semi-synthetic, and real EHR data show the method outperforms both risk-based baselines and direct representation-based causal estimators.

Core claim

AACE (Annotation-Assisted Coarsened Effects) uses expert-provided annotations during training to support confounding adjustment in multimodal electronic health records, allowing the model to predict treatment benefits accurately from multimodal representations without annotations at inference time.

What carries the argument

Annotation-Assisted Coarsened Effects (AACE), which leverages expert annotations at training to adjust for confounding while learning predictors that operate solely on multimodal representations at test time.

If this is right

Treatment policies can be deployed using only multimodal EHR data at the point of care without requiring ongoing expert annotations.
The learned policies identify patients with the largest expected treatment benefit rather than simply those at highest baseline risk.
Performance gains appear across synthetic, semi-synthetic, and real-world EHR datasets compared with risk-based and representation-based causal baselines.
Annotations are needed only during model development, lowering the barrier to applying causal methods in multimodal clinical data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same training-time annotation strategy could extend to other multimodal domains where rich sensor or text data exist but expert labels are expensive to maintain at scale.
Minimizing the annotation budget while preserving adjustment quality would be a direct next test of practicality.
Domain-specific annotation protocols might be required when the confounders in EHR text differ from those in tabular fields.

Load-bearing premise

Expert annotations collected only in training capture the confounding information needed so that multimodal representations alone produce valid treatment benefit estimates at inference.

What would settle it

Observe patient outcomes under policies derived from AACE versus baselines in a prospective clinical setting; if the method shows no reduction in bias or no improvement in benefit identification when annotations are absent at deployment, the central claim fails.

Figures

Figures reproduced from arXiv: 2507.20993 by Henri Arno, Thomas Demeester.

**Figure 2.** Figure 2: Overview of methods for estimating τ ϕ (ϕ) from a subset of structured data (the information extraction approach in the yellow panel, and direct regression in red), with an optional correction for sampling bias (blue panel). Nuisance functions are estimated on the structured subset (S = 1), enabling construction of DR pseudo-outcomes ∆x . The proposed estimators leverage these to estimate the target effect… view at source ↗

**Figure 3.** Figure 3: Performance of all methods on the MIMIC (top row) and SynSUM (bottom row) datasets, [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

We study how to learn treatment policies from multimodal electronic health records (EHRs) that consist of tabular data and clinical text. These policies can help physicians make better treatment decisions and allocate healthcare resources more efficiently. Causal policy learning methods prioritize patients with the largest expected treatment benefit. Yet, existing estimators are designed for tabular covariates under causal assumptions that may be hard to justify in the multimodal setting. A pragmatic alternative is to apply causal estimators directly to multimodal representations, but this can produce biased treatment effect estimates when the representations do not preserve the relevant confounding information. As a result, predictive models of baseline risk are commonly used in practice to guide treatment decisions, although they are not designed to identify which patients benefit most from treatment. We propose AACE (Annotation-Assisted Coarsened Effects), an annotation-assisted approach to causal policy learning for multimodal EHRs. The method uses expert-provided annotations during training to support confounding adjustment, and then predicts treatment benefit from only multimodal representations at inference. We show that the proposed method achieves strong empirical performance across synthetic, semi-synthetic, and real-world EHR datasets, outperforming risk-based and representation-based causal baselines, and offering practical insights for applying causal machine learning in clinical practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes AACE (Annotation-Assisted Coarsened Effects), a method for learning treatment policies from multimodal EHRs (tabular + clinical text). Expert annotations are used only during training to support coarsened-effect confounding adjustment; at inference the model predicts treatment benefit from multimodal representations alone. Empirical results across synthetic, semi-synthetic, and real-world datasets are reported to show outperformance relative to risk-based and representation-based causal baselines.

Significance. If the central empirical claims hold, the work supplies a pragmatic route to causal policy learning in multimodal clinical data where annotations are costly at deployment. The multi-regime evaluation (synthetic through real EHR) is a concrete strength that allows direct comparison of policy value estimates.

major comments (2)

[§3.2] §3.2 (Method): the claim that the learned multimodal encoder necessarily recovers the annotation-adjusted treatment-benefit ordering at inference is asserted without a supporting bound, sensitivity result, or identifiability argument. Because the reported policy-value gains rest on this preservation property, the absence of such analysis makes the validity of the annotation-free inference step load-bearing for the central claim.
[§4.3] §4.3 (Real-world EHR experiments): the performance tables report point estimates of policy value but do not include error bars, bootstrap intervals, or sensitivity checks to annotation quality or residual confounding; without these, it is impossible to assess whether the observed gains over representation-based baselines are robust or dataset-specific.

minor comments (2)

[§3] Notation for the coarsened effect estimator and the multimodal encoder should be unified across §3.1 and §3.2 to avoid ambiguity in how the annotation signal is injected during training.
[Figure 2] Figure 2 (synthetic data results) would benefit from explicit labeling of the x-axis as the degree of confounding strength rather than an opaque index.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [§3.2] §3.2 (Method): the claim that the learned multimodal encoder necessarily recovers the annotation-adjusted treatment-benefit ordering at inference is asserted without a supporting bound, sensitivity result, or identifiability argument. Because the reported policy-value gains rest on this preservation property, the absence of such analysis makes the validity of the annotation-free inference step load-bearing for the central claim.

Authors: We appreciate the referee's observation that a formal bound or identifiability argument would strengthen the presentation. The AACE training objective explicitly incorporates expert annotations to produce coarsened-effect-adjusted targets; the multimodal encoder is then optimized to predict these targets from the available modalities. This design choice is intended to align the learned representations with the annotation-adjusted ordering. While the original submission emphasizes empirical validation across synthetic, semi-synthetic, and real regimes rather than a complete theoretical guarantee, we will add a dedicated paragraph in §3.2 discussing the preservation property under the coarsened-effects framework and include a sensitivity analysis that varies annotation quality and measures resulting changes in policy value. A full identifiability proof remains an open theoretical question. revision: partial
Referee: [§4.3] §4.3 (Real-world EHR experiments): the performance tables report point estimates of policy value but do not include error bars, bootstrap intervals, or sensitivity checks to annotation quality or residual confounding; without these, it is impossible to assess whether the observed gains over representation-based baselines are robust or dataset-specific.

Authors: We agree that uncertainty quantification and sensitivity checks are necessary to evaluate robustness. In the revised manuscript we will replace the point estimates in the real-world tables of §4.3 with bootstrap confidence intervals (1,000 resamples) and add two new sensitivity panels: one varying the fraction and quality of annotations used during training, and one examining performance under increasing levels of simulated residual confounding. These additions will allow readers to assess whether the reported gains are stable across plausible annotation and confounding regimes. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external annotations and empirical validation rather than self-referential definitions or fits

full rationale

The paper proposes the AACE method, which incorporates expert annotations only during training to aid confounding adjustment on multimodal EHR inputs and then drops them at inference to predict from learned representations. No load-bearing derivation step reduces by construction to its own inputs, fitted parameters, or self-citation chains; the central claim of improved policy learning is supported by explicit experiments across synthetic, semi-synthetic, and real-world datasets rather than being defined into the result. The method is self-contained against external benchmarks, with validity treated as an empirical question.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the premise that expert annotations can be obtained and used to correct confounding without introducing new bias, plus standard causal assumptions that may be harder to justify in multimodal EHR settings. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption Expert annotations during training can support valid confounding adjustment for multimodal data.
Invoked to justify why the method works at inference without annotations.

pith-pipeline@v0.9.0 · 5745 in / 1164 out tokens · 28284 ms · 2026-05-19T02:15:11.733722+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose AACE (Annotation-Assisted Coarsened Effects), an annotation-assisted approach to causal policy learning for multimodal EHRs. The method uses expert-provided annotations during training to support confounding adjustment, and then predicts treatment benefit from only multimodal representations at inference.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The doubly robust learner... nuisance functions... pseudo-outcome Δx_i

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

[1]

Bayesian inference of individualized treatment effects using multi-task gaussian processes

Ahmed Alaa and Mihaela van der Schaar. Bayesian inference of individualized treatment effects using multi-task gaussian processes. In Advances in Neural Information Processing Systems, 2017

work page 2017
[2]

From text to treatment effects: A meta-learning approach to handling text-based confounding

Henri Arno, Paloma Rabaey, and Thomas Demeester. From text to treatment effects: A meta-learning approach to handling text-based confounding. In NeurIPS 2024 Workshop on Causal Representation Learning (CRL), 2024. 9

work page 2024
[3]

Proximal causal inference with text data

Jacob Chen, Rohit Bhattacharya, and Katherine Keith. Proximal causal inference with text data. In Advances in Neural Information Processing Systems, 2024

work page 2024
[4]

Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms

Alicia Curth and Mihaela van der Schaar. Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

work page 2021
[5]

On inductive biases for heterogeneous treatment effect estimation

Alicia Curth and Mihaela van der Schaar. On inductive biases for heterogeneous treatment effect estimation. In Advances in Neural Information Processing Systems, 2021

work page 2021
[6]

Conceptualizing treatment leakage in text-based causal inference, 2022

Adel Daoud, Connor Jerzak, and Richard Johansson. Conceptualizing treatment leakage in text-based causal inference, 2022. preprint - arXiv:2205.00465

work page arXiv 2022
[7]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019

work page 2019
[8]

How to make causal inferences using texts

Naoki Egami, Christian Fong, Justin Grimmer, Margaret Roberts, and Brandon Stewart. How to make causal inferences using texts. Science Advances, 8(42), 2022

work page 2022
[9]

Language-agnostic BERT sentence embedding

Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

work page 2022
[10]

Causal machine learning for predicting treatment outcomes

Stefan Feuerriegel, Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Konstantin Hess, Alicia Curth, Stefan Bauer, Niki Kilbertus, Isaac Kohane, and Mihaela van der Schaar. Causal machine learning for predicting treatment outcomes. Nature Medicine, 30(4), 2024

work page 2024
[11]

SimCSE: Simple contrastive learning of sentence embed- dings

Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embed- dings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

work page 2021
[12]

Operationalizing complex causes: A pragmatic view of mediation

Limor Gultchin, David Watson, Matt Kusner, and Ricardo Silva. Operationalizing complex causes: A pragmatic view of mediation. In Proceedings of the 38th International Conference on Machine Learning, 2021

work page 2021
[13]

Image-based treatment effect heterogeneity

Connor Jerzak, Fredrik Johansson, and Adel Daoud. Image-based treatment effect heterogeneity. In Proceedings of the 2nd Conference on Causal Learning and Reasoning, 2023

work page 2023
[14]

Quantifying ignorance in individual-level causal-effect estimates under hidden confounding

Andrew Jesson, Sören Mindermann, Yarin Gal, and Uri Shalit. Quantifying ignorance in individual-level causal-effect estimates under hidden confounding. In Proceedings of the 38th International Conference on Machine Learning, 2021

work page 2021
[15]

Generalization bounds and representation learning for estimation of potential outcomes and causal effects

Fredrik Johansson, Uri Shalit, Nathan Kallus, and David Sontag. Generalization bounds and representation learning for estimation of potential outcomes and causal effects. Journal of Machine Learning Research, 23(166), 2022

work page 2022
[16]

Learning representations for counterfactual inference

Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference. In Proceedings of the 33rd International Conference on Machine Learning, 2016

work page 2016
[17]

MIMIC-III, a freely accessible critical care database

Alistair Johnson, Tom Pollard, Lu Shen, Li-wei Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger Mark. MIMIC-III, a freely accessible critical care database. Scientific data, 3(1), 2016

work page 2016
[18]

Causal inference with noisy and missing covariates via matrix factorization

Nathan Kallus, Xiaojie Mao, and Madeleine Udell. Causal inference with noisy and missing covariates via matrix factorization. Advances in Neural Information Processing Systems, 2018

work page 2018
[19]

Text and causal inference: A review of using text to remove confounding from causal estimates

Katherine Keith, David Jensen, and Brendan O’Connor. Text and causal inference: A review of using text to remove confounding from causal estimates. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

work page 2020
[20]

Text as causal mediators: research design for causal estimates of differential treatment of social groups via language aspects

Katherine Keith, Douglas Rice, and Brendan O’Connor. Text as causal mediators: research design for causal estimates of differential treatment of social groups via language aspects. In EMNLP 2021 Workshop on Causal Inference and NLP, 2021

work page 2021
[21]

Towards optimal doubly robust estimation of heterogeneous causal effects

Edward Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2), 2023

work page 2023
[22]

Doublemldeep: Estimation of causal effects with multimodal data, 2024

Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov, Martin Spindler, and Suhas Vijaykumar. Doublemldeep: Estimation of causal effects with multimodal data, 2024. preprint - arXiv:2402.01785

work page arXiv 2024
[23]

Meta learners for estimating heterogeneous treatment effects using machine learning

Sören Künzel, Jasjeet Sekhon, Peter Bickel, and Bin Yu. Meta learners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences , 116(10), 2019

work page 2019
[24]

Llm-driven treatment effect estimation under inference time text confounding, 2025

Yuchen Ma, Dennis Frauen, Jonas Schweisthal, and Stefan Feuerriegel. Llm-driven treatment effect estimation under inference time text confounding, 2025. preprint - arXiv:2507.02843. 10

work page arXiv 2025
[25]

CausalNLP: A practical toolkit for causal inference with text, 2021

Arun Maiya. CausalNLP: A practical toolkit for causal inference with text, 2021. preprint - arXiv:2106.08043

work page arXiv 2021
[26]

Bounds on representation-induced confound- ing bias for treatment effect estimation

Valentyn Melnychuk, Dennis Frauen, and Stefan Feuerriegel. Bounds on representation-induced confound- ing bias for treatment effect estimation. In Proceedings of the 12th International Conference on Learning Representations, 2024

work page 2024
[27]

Orthogonal Representation Learning for Estimating Causal Quantities

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, and Stefan Feuerriegel. Orthogonal representation learning for estimating causal quantities, 2025. preprint - arXiv:2502.04274

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

On a general class of orthogonal learners for the estimation of heterogeneous treatment effects.arXiv preprint arXiv:2303.12687,

Pawel Morzywolek, Johan Decruyenaere, and Stijn Vansteelandt. On weighted orthogonal learners for heterogeneous treatment effects, 2024. preprint - arXiv:2303.12687v2

work page arXiv 2024
[29]

Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality

Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, and Jason Anastasopoulos. Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality. Political Analysis, 28(4), 2020

work page 2020
[30]

Quasi-oracle estimation of heterogeneous treatment effects

Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 2020

work page 2020
[31]

Causal effects of linguistic properties

Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, and Dhanya Sridhar. Causal effects of linguistic properties. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, 2021

work page 2021
[32]

Synsum – synthetic benchmark with structured and unstructured medical records

Paloma Rabaey, Henri Arno, Stefan Heytens, and Thomas Demeester. Synsum – synthetic benchmark with structured and unstructured medical records. In AAAI 2025 Workshop on GenAI4Health, 2024

work page 2025
[33]

Biolord-2023: semantic textual representations fusing large language models and clinical knowledge graph insights

François Remy, Kris Demuynck, and Thomas Demeester. Biolord-2023: semantic textual representations fusing large language models and clinical knowledge graph insights. Journal of the American Medical Informatics Association, 31(9), 2024

work page 2023
[34]

Adjusting for confounding with text matching

Margaret Roberts, Brandon Stewart, and Richard Nielsen. Adjusting for confounding with text matching. American Journal of Political Science, 64(4), 2020

work page 2020
[35]

Causal inference using potential outcomes

Donald Rubin. Causal inference using potential outcomes. Journal of the American Statistical Association, 100(469), 2005

work page 2005
[36]

Diffusion causal models for counterfactual estimation

Pedro Sanchez and Sotirios Tsaftaris. Diffusion causal models for counterfactual estimation. InProceedings of the 1st Conference on Causal Learning and Reasoning, 2022

work page 2022
[37]

Estimating individual treatment effect: generalization bounds and algorithms

Uri Shalit, Fredrik Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, 2017

work page 2017
[38]

Adapting neural networks for the estimation of treatment effects

Claudia Shi, David Blei, and Victor Veitch. Adapting neural networks for the estimation of treatment effects. In Advances in Neural Information Processing Systems, 2019

work page 2019
[39]

I see, therefore i do: Estimating causal effects for image treatments, 2024

Abhinav Thorat, Ravi Kolla, and Niranjan Pedanekar. I see, therefore i do: Estimating causal effects for image treatments, 2024. preprint - arXiv:2412.06810

work page arXiv 2024
[40]

Adapting text embeddings for causal inference

Victor Veitch, Dhanya Sridhar, and David Blei. Adapting text embeddings for causal inference. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, 2020

work page 2020
[41]

Estimation and inference of heterogeneous treatment effects using random forests

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 2018

work page 2018
[42]

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, and Iacopo Poli. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 20...

work page internal anchor Pith review arXiv 2024
[43]

Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference

Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, and Tim Althoff. Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference. In Proceedings of the 15th International AAAI Conference on Web and Social Media, 2022

work page 2022
[44]

Challenges of using text classifiers for causal inference

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. Challenges of using text classifiers for causal inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

work page 2018
[45]

GANITE: Estimation of individualized treatment effects using generative adversarial nets

Jinsung Yoon, James Jordon, and Mihaela van der Schaar. GANITE: Estimation of individualized treatment effects using generative adversarial nets. In Proceedings of the 6th International Conference on Learning Representations, 2018

work page 2018
[46]

Optimizing multi-scale representations to detect effect heterogeneity using earth observation and computer vision: Application to two anti-poverty rcts

Fucheng Warren Zhu, Connor Jerzak, and Adel Daoud. Optimizing multi-scale representations to detect effect heterogeneity using earth observation and computer vision: Application to two anti-poverty rcts. In Proceedings of the 4th Conference on Causal Learning and Reasoning, 2025. 11 Appendix This appendix provides additional technical details to support t...

work page 2025

[1] [1]

Bayesian inference of individualized treatment effects using multi-task gaussian processes

Ahmed Alaa and Mihaela van der Schaar. Bayesian inference of individualized treatment effects using multi-task gaussian processes. In Advances in Neural Information Processing Systems, 2017

work page 2017

[2] [2]

From text to treatment effects: A meta-learning approach to handling text-based confounding

Henri Arno, Paloma Rabaey, and Thomas Demeester. From text to treatment effects: A meta-learning approach to handling text-based confounding. In NeurIPS 2024 Workshop on Causal Representation Learning (CRL), 2024. 9

work page 2024

[3] [3]

Proximal causal inference with text data

Jacob Chen, Rohit Bhattacharya, and Katherine Keith. Proximal causal inference with text data. In Advances in Neural Information Processing Systems, 2024

work page 2024

[4] [4]

Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms

Alicia Curth and Mihaela van der Schaar. Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

work page 2021

[5] [5]

On inductive biases for heterogeneous treatment effect estimation

Alicia Curth and Mihaela van der Schaar. On inductive biases for heterogeneous treatment effect estimation. In Advances in Neural Information Processing Systems, 2021

work page 2021

[6] [6]

Conceptualizing treatment leakage in text-based causal inference, 2022

Adel Daoud, Connor Jerzak, and Richard Johansson. Conceptualizing treatment leakage in text-based causal inference, 2022. preprint - arXiv:2205.00465

work page arXiv 2022

[7] [7]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019

work page 2019

[8] [8]

How to make causal inferences using texts

Naoki Egami, Christian Fong, Justin Grimmer, Margaret Roberts, and Brandon Stewart. How to make causal inferences using texts. Science Advances, 8(42), 2022

work page 2022

[9] [9]

Language-agnostic BERT sentence embedding

Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

work page 2022

[10] [10]

Causal machine learning for predicting treatment outcomes

Stefan Feuerriegel, Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Konstantin Hess, Alicia Curth, Stefan Bauer, Niki Kilbertus, Isaac Kohane, and Mihaela van der Schaar. Causal machine learning for predicting treatment outcomes. Nature Medicine, 30(4), 2024

work page 2024

[11] [11]

SimCSE: Simple contrastive learning of sentence embed- dings

Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embed- dings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

work page 2021

[12] [12]

Operationalizing complex causes: A pragmatic view of mediation

Limor Gultchin, David Watson, Matt Kusner, and Ricardo Silva. Operationalizing complex causes: A pragmatic view of mediation. In Proceedings of the 38th International Conference on Machine Learning, 2021

work page 2021

[13] [13]

Image-based treatment effect heterogeneity

Connor Jerzak, Fredrik Johansson, and Adel Daoud. Image-based treatment effect heterogeneity. In Proceedings of the 2nd Conference on Causal Learning and Reasoning, 2023

work page 2023

[14] [14]

Quantifying ignorance in individual-level causal-effect estimates under hidden confounding

Andrew Jesson, Sören Mindermann, Yarin Gal, and Uri Shalit. Quantifying ignorance in individual-level causal-effect estimates under hidden confounding. In Proceedings of the 38th International Conference on Machine Learning, 2021

work page 2021

[15] [15]

Generalization bounds and representation learning for estimation of potential outcomes and causal effects

Fredrik Johansson, Uri Shalit, Nathan Kallus, and David Sontag. Generalization bounds and representation learning for estimation of potential outcomes and causal effects. Journal of Machine Learning Research, 23(166), 2022

work page 2022

[16] [16]

Learning representations for counterfactual inference

Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference. In Proceedings of the 33rd International Conference on Machine Learning, 2016

work page 2016

[17] [17]

MIMIC-III, a freely accessible critical care database

Alistair Johnson, Tom Pollard, Lu Shen, Li-wei Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger Mark. MIMIC-III, a freely accessible critical care database. Scientific data, 3(1), 2016

work page 2016

[18] [18]

Causal inference with noisy and missing covariates via matrix factorization

Nathan Kallus, Xiaojie Mao, and Madeleine Udell. Causal inference with noisy and missing covariates via matrix factorization. Advances in Neural Information Processing Systems, 2018

work page 2018

[19] [19]

Text and causal inference: A review of using text to remove confounding from causal estimates

Katherine Keith, David Jensen, and Brendan O’Connor. Text and causal inference: A review of using text to remove confounding from causal estimates. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

work page 2020

[20] [20]

Text as causal mediators: research design for causal estimates of differential treatment of social groups via language aspects

Katherine Keith, Douglas Rice, and Brendan O’Connor. Text as causal mediators: research design for causal estimates of differential treatment of social groups via language aspects. In EMNLP 2021 Workshop on Causal Inference and NLP, 2021

work page 2021

[21] [21]

Towards optimal doubly robust estimation of heterogeneous causal effects

Edward Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2), 2023

work page 2023

[22] [22]

Doublemldeep: Estimation of causal effects with multimodal data, 2024

Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov, Martin Spindler, and Suhas Vijaykumar. Doublemldeep: Estimation of causal effects with multimodal data, 2024. preprint - arXiv:2402.01785

work page arXiv 2024

[23] [23]

Meta learners for estimating heterogeneous treatment effects using machine learning

Sören Künzel, Jasjeet Sekhon, Peter Bickel, and Bin Yu. Meta learners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences , 116(10), 2019

work page 2019

[24] [24]

Llm-driven treatment effect estimation under inference time text confounding, 2025

Yuchen Ma, Dennis Frauen, Jonas Schweisthal, and Stefan Feuerriegel. Llm-driven treatment effect estimation under inference time text confounding, 2025. preprint - arXiv:2507.02843. 10

work page arXiv 2025

[25] [25]

CausalNLP: A practical toolkit for causal inference with text, 2021

Arun Maiya. CausalNLP: A practical toolkit for causal inference with text, 2021. preprint - arXiv:2106.08043

work page arXiv 2021

[26] [26]

Bounds on representation-induced confound- ing bias for treatment effect estimation

Valentyn Melnychuk, Dennis Frauen, and Stefan Feuerriegel. Bounds on representation-induced confound- ing bias for treatment effect estimation. In Proceedings of the 12th International Conference on Learning Representations, 2024

work page 2024

[27] [27]

Orthogonal Representation Learning for Estimating Causal Quantities

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, and Stefan Feuerriegel. Orthogonal representation learning for estimating causal quantities, 2025. preprint - arXiv:2502.04274

work page internal anchor Pith review Pith/arXiv arXiv 2025

[28] [28]

On a general class of orthogonal learners for the estimation of heterogeneous treatment effects.arXiv preprint arXiv:2303.12687,

Pawel Morzywolek, Johan Decruyenaere, and Stijn Vansteelandt. On weighted orthogonal learners for heterogeneous treatment effects, 2024. preprint - arXiv:2303.12687v2

work page arXiv 2024

[29] [29]

Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality

Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, and Jason Anastasopoulos. Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality. Political Analysis, 28(4), 2020

work page 2020

[30] [30]

Quasi-oracle estimation of heterogeneous treatment effects

Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 2020

work page 2020

[31] [31]

Causal effects of linguistic properties

Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, and Dhanya Sridhar. Causal effects of linguistic properties. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, 2021

work page 2021

[32] [32]

Synsum – synthetic benchmark with structured and unstructured medical records

Paloma Rabaey, Henri Arno, Stefan Heytens, and Thomas Demeester. Synsum – synthetic benchmark with structured and unstructured medical records. In AAAI 2025 Workshop on GenAI4Health, 2024

work page 2025

[33] [33]

Biolord-2023: semantic textual representations fusing large language models and clinical knowledge graph insights

François Remy, Kris Demuynck, and Thomas Demeester. Biolord-2023: semantic textual representations fusing large language models and clinical knowledge graph insights. Journal of the American Medical Informatics Association, 31(9), 2024

work page 2023

[34] [34]

Adjusting for confounding with text matching

Margaret Roberts, Brandon Stewart, and Richard Nielsen. Adjusting for confounding with text matching. American Journal of Political Science, 64(4), 2020

work page 2020

[35] [35]

Causal inference using potential outcomes

Donald Rubin. Causal inference using potential outcomes. Journal of the American Statistical Association, 100(469), 2005

work page 2005

[36] [36]

Diffusion causal models for counterfactual estimation

Pedro Sanchez and Sotirios Tsaftaris. Diffusion causal models for counterfactual estimation. InProceedings of the 1st Conference on Causal Learning and Reasoning, 2022

work page 2022

[37] [37]

Estimating individual treatment effect: generalization bounds and algorithms

Uri Shalit, Fredrik Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, 2017

work page 2017

[38] [38]

Adapting neural networks for the estimation of treatment effects

Claudia Shi, David Blei, and Victor Veitch. Adapting neural networks for the estimation of treatment effects. In Advances in Neural Information Processing Systems, 2019

work page 2019

[39] [39]

I see, therefore i do: Estimating causal effects for image treatments, 2024

Abhinav Thorat, Ravi Kolla, and Niranjan Pedanekar. I see, therefore i do: Estimating causal effects for image treatments, 2024. preprint - arXiv:2412.06810

work page arXiv 2024

[40] [40]

Adapting text embeddings for causal inference

Victor Veitch, Dhanya Sridhar, and David Blei. Adapting text embeddings for causal inference. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, 2020

work page 2020

[41] [41]

Estimation and inference of heterogeneous treatment effects using random forests

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 2018

work page 2018

[42] [42]

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, and Iacopo Poli. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 20...

work page internal anchor Pith review arXiv 2024

[43] [43]

Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference

Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, and Tim Althoff. Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference. In Proceedings of the 15th International AAAI Conference on Web and Social Media, 2022

work page 2022

[44] [44]

Challenges of using text classifiers for causal inference

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. Challenges of using text classifiers for causal inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

work page 2018

[45] [45]

GANITE: Estimation of individualized treatment effects using generative adversarial nets

Jinsung Yoon, James Jordon, and Mihaela van der Schaar. GANITE: Estimation of individualized treatment effects using generative adversarial nets. In Proceedings of the 6th International Conference on Learning Representations, 2018

work page 2018

[46] [46]

Optimizing multi-scale representations to detect effect heterogeneity using earth observation and computer vision: Application to two anti-poverty rcts

Fucheng Warren Zhu, Connor Jerzak, and Adel Daoud. Optimizing multi-scale representations to detect effect heterogeneity using earth observation and computer vision: Application to two anti-poverty rcts. In Proceedings of the 4th Conference on Causal Learning and Reasoning, 2025. 11 Appendix This appendix provides additional technical details to support t...

work page 2025