IV-ICL: Bounding Causal Effects with Instrumental Variables via In-Context Learning
Pith reviewed 2026-05-14 19:45 UTC · model grok-4.3
The pith
An amortized in-context learner recovers the full identified set of causal effects from instrumental variable data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training an in-context learner to minimize the expected inclusive KL divergence on instrumental-variable data, IV-ICL obtains the marginal posterior distribution of the causal effect; its quantiles then supply bounds that empirically cover the full identified set for a range of data-generating processes.
What carries the argument
An amortized in-context learner trained to minimize the expected inclusive KL divergence, so that it outputs the marginal posterior over causal effects.
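For orientation, the objective named here can be written in the standard amortized-inference form. The symbols ($\theta$ for the causal effect, $\mathcal{D}$ for an IV dataset, $q_\phi$ for the in-context learner) are notation assumed for this sketch and may differ from the paper's:

```latex
\mathbb{E}_{p(\mathcal{D})}\!\left[ \mathrm{KL}\big( p(\theta \mid \mathcal{D}) \,\|\, q_\phi(\theta \mid \mathcal{D}) \big) \right]
  = -\,\mathbb{E}_{p(\theta, \mathcal{D})}\!\left[ \log q_\phi(\theta \mid \mathcal{D}) \right] + \mathrm{const}
```

Under this reading, training only needs joint samples $(\theta, \mathcal{D})$ drawn from the prior over data-generating processes, and the constant does not depend on $\phi$. The exclusive alternative, $\mathrm{KL}(q_\phi \,\|\, p)$, takes its expectation under $q_\phi$ instead.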
Load-bearing premise
That the inclusive-KL objective will recover the full identified set for arbitrary data-generating processes rather than only the synthetic ones seen in training.
What would settle it
A counterexample data-generating process where the quantiles of the learned posterior do not contain all values in the true identified set for the causal effect.
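A check of this kind could be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the 95% level (`alpha=0.05`), and the toy sample lists are all assumptions made for the example.

```python
import math


def quantile(sorted_samples, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_samples) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_samples[lo] * (1.0 - frac) + sorted_samples[hi] * frac


def quantile_bounds(samples, alpha=0.05):
    """Lower and upper bounds taken as the alpha/2 and 1 - alpha/2 quantiles."""
    s = sorted(samples)
    return quantile(s, alpha / 2.0), quantile(s, 1.0 - alpha / 2.0)


def covers_identified_set(samples, true_lower, true_upper, alpha=0.05):
    """True if the quantile interval contains the whole identified set."""
    lo, hi = quantile_bounds(samples, alpha)
    return lo <= true_lower and true_upper <= hi


# Hypothetical posterior draws: one spread out, one collapsed near a single value.
spread = [i / 100 - 0.5 for i in range(101)]      # evenly spaced on [-0.5, 0.5]
collapsed = [0.1 + i / 1000 for i in range(101)]  # evenly spaced on [0.1, 0.2]

print(covers_identified_set(spread, -0.3, 0.4))     # spread posterior
print(covers_identified_set(collapsed, -0.3, 0.4))  # collapsed posterior
```

A data-generating process for which the learned posterior behaves like `collapsed` while the true identified set is wider would be the kind of counterexample described above.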
Original abstract
The instrumental-variables (IV) setting is standard for partial identification of causal effects when unobserved confounding makes point identification impossible. Existing approaches face methodological bottlenecks: closed-form bound estimands are required -- e.g., Balke-Pearl equations in binary IV -- and even when available, designing accurate estimators requires manual effort tailored to each estimand. While direct Bayesian inference of the causal effects, instead of the bounds, circumvents these challenges, it is often computationally intensive and suffers from high prior sensitivity or under-dispersed posteriors. As a remedy, we introduce IV-ICL, an amortized Bayesian in-context learning method that learns the marginal posterior distribution of the causal effects directly and derives bounds as its quantiles. Unlike standard variational inference that optimizes exclusive KL divergence, amortized Bayesian inference minimizes the expected inclusive KL, a mass-covering objective. We empirically observe that optimizing inclusive KL can recover the entire identified set across diverse data-generating processes, while exclusive-KL (e.g. with variational inference) of the same Bayesian formulation collapses onto a single mode and fails to cover the identified set. We evaluate IV-ICL on synthetic and semi-synthetic IV benchmarks and show it produces intervals that are more reliably valid and more informative compared to efficient semi-parametric, Bayesian, and plug-in baselines, at 20-500x lower inference time. Beyond methodology, we propose a procedure to convert randomized controlled trials into IV benchmarks with provably preserved ground-truth causal effects that enables a more realistic evaluation of partial-identification methods.
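The contrast between the mass-covering inclusive KL and the mode-seeking exclusive KL can be seen on a toy problem. The sketch below is not the paper's experiment: the bimodal target, the single-Gaussian approximating family, the integration grid, and every name in it are assumptions chosen for illustration. It fits a Gaussian to an equal mixture of N(-3, 0.5²) and N(3, 0.5²) under each divergence.

```python
import math


def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_target(x):
    """Log density of the toy target: 0.5 * N(-3, 0.5^2) + 0.5 * N(3, 0.5^2)."""
    a = math.log(0.5) + log_normal_pdf(x, -3.0, 0.5)
    b = math.log(0.5) + log_normal_pdf(x, 3.0, 0.5)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


DX = 0.05
GRID = [-15.0 + DX * i for i in range(601)]  # integration grid on [-15, 15]
LOG_P = [log_target(x) for x in GRID]


def exclusive_kl(mu, sigma):
    """Riemann-sum estimate of KL(q || p) for q = N(mu, sigma^2)."""
    total = 0.0
    for x, lp in zip(GRID, LOG_P):
        lq = log_normal_pdf(x, mu, sigma)
        total += math.exp(lq) * (lq - lp) * DX
    return total


# Inclusive KL(p || q) over Gaussians is minimized by matching the mean and variance of p.
weights = [math.exp(lp) for lp in LOG_P]
mass = sum(weights) * DX
incl_mu = sum(w * x for w, x in zip(weights, GRID)) * DX / mass
incl_sigma = math.sqrt(sum(w * (x - incl_mu) ** 2 for w, x in zip(weights, GRID)) * DX / mass)

# Exclusive KL(q || p) is minimized here by a coarse grid search over (mu, sigma).
candidates = [(m / 2.0, s / 4.0) for m in range(-10, 11) for s in range(1, 17)]
excl_mu, excl_sigma = min(candidates, key=lambda c: exclusive_kl(*c))

print("inclusive fit:", round(incl_mu, 3), round(incl_sigma, 3))
print("exclusive fit:", excl_mu, excl_sigma)
```

In this toy setting the inclusive fit is wide enough to span both modes, while the exclusive fit sits on one of them. Whether the same behavior carries over to identified sets in IV models is the empirical question the abstract raises.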
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces IV-ICL, an amortized Bayesian in-context learning method for partial identification of causal effects in instrumental variable (IV) settings. By directly learning the marginal posterior distribution of the causal effects through minimization of the inclusive KL divergence and extracting bounds as posterior quantiles, the approach aims to circumvent the need for closed-form bound estimands and manual estimator design. The key empirical observation is that inclusive KL optimization recovers the full identified set across diverse data-generating processes, in contrast to exclusive KL which collapses to a single mode. The paper evaluates this on synthetic and semi-synthetic benchmarks, reporting more valid and informative intervals than baselines at significantly lower inference time, and proposes a method to convert RCTs into IV benchmarks preserving ground-truth effects.
Significance. If the empirical observation that inclusive KL recovers the identified set generalizes, the method offers a scalable alternative to existing IV bounding techniques that avoids closed-form requirements and reduces computational burden while improving coverage over under-dispersed Bayesian or plug-in estimators. The RCT-to-IV benchmark conversion procedure is a concrete methodological contribution that could improve evaluation standards in partial identification research.
major comments (2)
- [Abstract and §3] In the method description, the central claim that 'optimizing inclusive KL can recover the entire identified set across diverse data-generating processes' is supported solely by empirical behavior on the synthetic families used to train the amortizer. No derivation is given showing that the mass-covering property of the inclusive KL objective on the marginal posterior necessarily yields the full identified set for arbitrary IV models (e.g., continuous instruments, non-linear response surfaces, or DGPs outside the training support). Without this, the quantile-derived bounds lack a general validity guarantee.
- [§4] In the experiments, the reported favorable results on synthetic and semi-synthetic benchmarks use data-generating processes drawn from the same distributional families employed during amortizer training. This leaves open whether the learned posterior covers the identified set on out-of-distribution DGPs; additional stress tests on held-out model classes would be required to substantiate the generalization claim.
minor comments (2)
- [Abstract] The abstract states '20-500x lower inference time' without specifying the exact baseline methods, hardware, or batch sizes used for the timing comparison; this detail should be added for reproducibility.
- [§3] Notation for the in-context learning model hyperparameters and the precise form of the inclusive KL objective should be introduced earlier and used consistently throughout the method section.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that the central claims rest on empirical observations rather than general theory and that the experiments would benefit from explicit out-of-distribution tests. We outline targeted revisions below.
point-by-point responses
- Referee: [Abstract and §3] the central claim that 'optimizing inclusive KL can recover the entire identified set across diverse data-generating processes' is supported solely by empirical behavior on the synthetic families used to train the amortizer. No derivation is given showing that the mass-covering property of the inclusive KL objective on the marginal posterior necessarily yields the full identified set for arbitrary IV models.
  Authors: We agree the claim is empirical. The manuscript presents the recovery of the identified set as an observed property of inclusive-KL optimization on the families studied, in contrast to exclusive KL. We will revise the abstract and §3 to state explicitly that this is an empirical finding without a general validity guarantee for arbitrary IV models (e.g., continuous instruments or DGPs outside the training support). A brief limitations paragraph will be added noting the absence of a theoretical derivation and the need for future work on conditions under which inclusive KL covers the identified set.
  Revision: yes
- Referee: [§4] the reported favorable results on synthetic and semi-synthetic benchmarks use data-generating processes drawn from the same distributional families employed during amortizer training. This leaves open whether the learned posterior covers the identified set on out-of-distribution DGPs.
  Authors: We acknowledge the overlap between training and test DGPs. While the semi-synthetic benchmarks introduce realistic variation, we will add new experiments using held-out model classes (different response surfaces, continuous instruments, and functional forms not seen in training) to test whether the learned posterior continues to cover the identified set on OOD DGPs. These results will be reported with the same coverage and interval-width metrics.
  Revision: yes
- No general theoretical derivation is available showing that inclusive KL necessarily recovers the full identified set for arbitrary IV models.
Circularity Check
No circularity; central claim is empirical observation on synthetic DGPs, not reduction by construction
full rationale
The paper defines IV-ICL as an amortized Bayesian method that directly learns the marginal posterior of causal effects via inclusive KL minimization and takes quantiles as bounds. The key statement is an empirical observation ('We empirically observe that optimizing inclusive KL can recover the entire identified set across diverse data-generating processes') rather than a derivation that equates the output bounds to fitted inputs or self-cited uniqueness theorems. No equations, self-citations, or ansatzes are shown that would force the identified-set coverage by definition. Training on synthetics followed by application to new data is standard amortized inference and does not create circularity. The method remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- in-context learning model hyperparameters
axioms (1)
- domain assumption: The data-generating processes used for training cover the relevant class of IV problems, so that the learned posterior generalizes to real data.
E Details of the Benchmark (appendix excerpt)
E.1 Details of the Synthetic ...
1. Covariates: Sample $X \in \mathbb{R}^{n \times d}$ with $d \sim \mathrm{Unif}\{5, 6, 7, 8, 9, 10\}$, where each entry is drawn from either $\mathcal{N}(5, 1)$ or $\mathrm{Unif}(-10, 5)$, chosen randomly.
2. Instrument generation: Generate weights $w_Z \in \mathbb{R}^{d}$ from either $\mathcal{N}(1, 2)$ or $\mathrm{Unif}(-2, 2)$. Compute logits as $\ell_Z = X w_Z + \varepsilon_Z$, where $\varepsilon_Z$ is noise from either $\mathcal{N}(0, 1)$ or $\mathrm{Laplace}(0, 1)$. Standardize: $\tilde{\ell}_Z = (\ell_Z - \bar{\ell}_Z) / \mathrm{std}(\ell_Z)$. Sample $Z \sim \mathrm{Bernoulli}(\sigma(\tilde{\ell}_Z))$.
3. Potential treatment/outcome: Generate weights $W \in \mathbb{R}^{d \times 16}$ and compute logits $L = XW + E$, where $E \in \mathbb{R}^{n \times 16}$ is noise. Apply row-wise softmax to obtain strata probabilities $P \in \mathbb{R}^{n \times 16}$, where each row sums to 1.
4. Treatment and outcome strata: The 16 columns correspond to combinations of:
   • Treatment strata: Always-Takers (AT), Never-Takers (NT), Defiers (DE), Compliers (C...
5. Observable generation: For each unit $i$, sample the stratum from the categorical distribution defined by $P_i$, then determine $(T_i, Y_i)$ based on $Z_i$ and the sampled stratum.
6. Ground-truth bounds: Compute the observational probabilities $p_{yt.z}(x_i)$ analytically from the strata probabilities, then apply the Balke-Pearl equations to obtain $\ell(x_i)$ and $u(x_i)$.
E.2 Details of the Jobs Benchmark
The original National Supported Work (NSW) Demonstration is an RCT evaluating job training effects on earnings. It includes the following covariat...
The treatment is a binary indicator of assignment to the job training program. Finally, the outcome variable is the amount of earnings in 1978 (re78). We apply log transforms to the earnings variables to get a less skewed outcome distribution: $\mathrm{re74} \leftarrow \log(\mathrm{re74} + 1)$, $\mathrm{re75} \leftarrow \log(\mathrm{re75} + 1)$, $Y \leftarrow \log(\mathrm{re78} + 1)$.
Covariate Split. We split the features into observed covariates (O), which...
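The covariate, instrument, strata, and observable generation steps of the synthetic benchmark described above can be sketched in plain Python. This is a rough illustration under several assumptions that the excerpt does not settle: the entry distribution is picked once per dataset, N(1, 2) is read as mean 1 and standard deviation 2, W and E are standard normal, and the outcome strata labels and the column ordering (`4 * treatment_type + outcome_type`) are hypothetical because the source list is truncated.

```python
import math
import random
import statistics

TREATMENT_TYPES = ("AT", "NT", "DE", "CO")                # CO = compliers
OUTCOME_TYPES = ("always1", "always0", "helped", "hurt")  # hypothetical labels


def treatment_from(z, t_type):
    """Potential treatment T(z) for a treatment stratum."""
    return {"AT": 1, "NT": 0, "CO": z, "DE": 1 - z}[t_type]


def outcome_from(t, y_type):
    """Potential outcome Y(t) for an (assumed) outcome stratum."""
    return {"always1": 1, "always0": 0, "helped": t, "hurt": 1 - t}[y_type]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sample_dataset(n, rng):
    # Covariates: d in {5, ..., 10}; one entry distribution per dataset (assumption).
    d = rng.choice([5, 6, 7, 8, 9, 10])
    draw_x = rng.choice([lambda: rng.gauss(5.0, 1.0), lambda: rng.uniform(-10.0, 5.0)])
    X = [[draw_x() for _ in range(d)] for _ in range(n)]

    # Instrument: weights, noisy logits, standardization, Bernoulli draw.
    draw_w = rng.choice([lambda: rng.gauss(1.0, 2.0), lambda: rng.uniform(-2.0, 2.0)])
    # The difference of two Exp(1) draws is Laplace(0, 1).
    draw_eps = rng.choice([lambda: rng.gauss(0.0, 1.0),
                           lambda: rng.expovariate(1.0) - rng.expovariate(1.0)])
    w_z = [draw_w() for _ in range(d)]
    logits = [sum(x * w for x, w in zip(row, w_z)) + draw_eps() for row in X]
    mean, std = statistics.fmean(logits), statistics.pstdev(logits)
    Z = [int(rng.random() < 1.0 / (1.0 + math.exp(-(l - mean) / std))) for l in logits]

    # Strata probabilities: row-wise softmax of XW + E (standard-normal W, E assumed).
    W = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(d)]
    P = [softmax([sum(row[j] * W[j][k] for j in range(d)) + rng.gauss(0.0, 1.0)
                  for k in range(16)]) for row in X]

    # Observables: sample a stratum per unit, then read off (T, Y) given Z.
    T, Y = [], []
    for z, p_row in zip(Z, P):
        s = rng.choices(range(16), weights=p_row)[0]
        t = treatment_from(z, TREATMENT_TYPES[s // 4])
        T.append(t)
        Y.append(outcome_from(t, OUTCOME_TYPES[s % 4]))
    return X, Z, T, Y, P


def observational_probs(p_row):
    """p[(y, t, z)] = P(Y=y, T=t | Z=z) implied by one row of strata probabilities."""
    probs = {(y, t, z): 0.0 for y in (0, 1) for t in (0, 1) for z in (0, 1)}
    for s, mass in enumerate(p_row):
        for z in (0, 1):
            t = treatment_from(z, TREATMENT_TYPES[s // 4])
            probs[(outcome_from(t, OUTCOME_TYPES[s % 4]), t, z)] += mass
    return probs


rng = random.Random(0)
X, Z, T, Y, P = sample_dataset(50, rng)
p_first = observational_probs(P[0])
```

The `observational_probs` helper corresponds to the analytic computation of $p_{yt.z}(x_i)$ from the strata probabilities; applying the Balke-Pearl equations to its output is left out of this sketch.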