pith. sign in

arxiv: 2502.16156 · v1 · submitted 2025-02-22 · 📊 stat.ML · cs.LG

A Review of Causal Decision Making

Pith reviewed 2026-05-23 02:45 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords causal decision makingcausal structure learningcausal effect learningcausal policy learningcausal inferencedecision makingPython implementation
0
0 comments X

The pith

Causal decision making proceeds through structure learning, effect estimation, and policy application.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The review frames causal decision-making as a sequence of three linked stages. Causal structure learning identifies relationships among actions, environments, and outcomes. Causal effect learning then measures the size of those relationships. Causal policy learning applies the resulting knowledge to choose actions. The work also collects existing methods into one Python package and lists open challenges plus future directions for making the approach usable in practice.

Core claim

The review establishes that effective causal decision-making requires three sequential components: causal structure learning to discover relationships, causal effect learning to quantify impacts, and causal policy learning to select actions, all supported by a consolidated Python collection of methods along with identified challenges and research directions.

What carries the argument

The three-stage pipeline of causal structure learning, causal effect learning, and causal policy learning.

If this is right

  • Practitioners can follow the three stages in order to build decision systems from observational data.
  • Advances that solve the listed challenges will directly improve the pipeline's reliability.
  • The Python collection serves as a concrete implementation resource for the full framework.
  • Real-world applications can be developed by chaining methods across the three stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The staged structure could guide integration of causal methods with existing reinforcement-learning systems.
  • Testing the pipeline on domains with partial observability would reveal where the stages break down.
  • Future extensions might add feedback loops that let policy learning refine earlier structure estimates.

Load-bearing premise

The selected methods and challenges in the three stages accurately represent the current state of the field without major omissions or selection bias.

What would settle it

A widely used causal decision-making method or an important practical challenge that is absent from the review and cannot be fit into the three-stage structure.

Figures

Figures reproduced from arXiv: 2502.16156 by Hengrui Cai, Lin Ge, Rui Song, Runzhe Wan, Yang Xu.

Figure 1
Figure 1. Figure 1: Workflow of the Causal Decision Making. f1, f2, and f3 represent the impact sizes of the directed edges. Variables enclosed in solid circles are observed, while those in dashed circles are actionable. personal investor’s actions may have negligible impact on stock market dynamics, enabling potential outcomes of alternate investment decisions to be inferred from existing stock price time series (Angrist and… view at source ↗
Figure 2
Figure 2. Figure 2: Common data dependence structures (paradigms) in CDM. Detailed notations and explanations can be found [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Causal Policy Learning decision-making rely on a known causal structure, which provides the convenience of identifying the right variables to control (e.g., confounders), to intervene (e.g., treatments), and to optimize (e.g., rewards). However, such convenience is absent in many emerging real applications where the causal structure is unknown. Causal discovery thus has attracted more and more attention re… view at source ↗
Figure 4
Figure 4. Figure 4: An illustration of Simpson’s Paradox. Second, understanding the causal pathway is essential when estimating the impact of changes in one variable on another to inform decision-making. Interventions based on incomplete causal knowledge risks yielding biased outcomes, as depicted in [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Left: Illustration of the causal relationship between the customer being a new father or not, beer purchasing, and diaper purchasing, where solid lines represent the true model, and the dashed line corresponds to the spurious correlation between beer purchasing and diaper purchasing. Right: Relationship between various causal structures. Nodes A, B, and C belong to the necessary and sufficient causal graph… view at source ↗
Figure 6
Figure 6. Figure 6: The difference in coefficients approach approximates the mediated effect by calculating [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Estimated directed acyclic graph for the selected MIMIC-III data analysis. [PITH_FULL_IMAGE:figures/full_fig_p034_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Simulation results for movie recommendation. Shaded areas indicate the 95% confidence interval of the [PITH_FULL_IMAGE:figures/full_fig_p037_8.png] view at source ↗
read the original abstract

To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision-making through a causal lens: 1) the discovery of causal relationships through causal structure learning, 2) understanding the impacts of these relationships through causal effect learning, and 3) applying the knowledge gained from the first two aspects to support decision making via causal policy learning. Moreover, we identify challenges that hinder the broader utilization of causal decision-making and discuss recent advances in overcoming these challenges. Finally, we provide future research directions to address these challenges and to further enhance the implementation of causal decision-making in practice, with real-world applications illustrated based on the proposed causal decision-making. We aim to offer a comprehensive methodology and practical implementation framework by consolidating various methods in this area into a Python-based collection. URL: https://causaldm.github.io/Causal-Decision-Making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript reviews causal decision making by organizing the literature into three stages—causal structure learning, causal effect learning, and causal policy learning—while identifying challenges, discussing advances to overcome them, outlining future directions, and consolidating methods into a Python collection (with real-world application examples) hosted at causaldm.github.io/Causal-Decision-Making.

Significance. If the coverage is representative and the Python implementations faithful, the review would supply a useful organizing framework and accessible implementation resource for causal ML applied to decisions. The explicit consolidation of methods into a public Python collection is a concrete strength that supports reproducibility.

major comments (2)
  1. [Abstract] Abstract: the claim to offer a 'comprehensive methodology' and to surface the three stages without major omissions is load-bearing for the central contribution, yet no inclusion criteria, search protocol, date range, or explicit rationale for method selection appears in the abstract or introductory sections; this leaves open the possibility that recent advances (e.g., scalable discovery under latent confounding or instrument-based off-policy evaluation) are underrepresented.
  2. The manuscript's assessment of challenges and future directions rests on the chosen literature; without documented selection criteria it is impossible to evaluate whether the listed challenges are the most pressing or whether the cited advances are the most relevant.
minor comments (1)
  1. The repository URL is given but no statement confirms that the collection has been tested against the original papers' reported results or includes unit tests for the reviewed algorithms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater transparency in literature selection. We will revise the manuscript to include explicit selection criteria and rationale, which will strengthen the presentation of the three-stage framework, challenges, and future directions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim to offer a 'comprehensive methodology' and to surface the three stages without major omissions is load-bearing for the central contribution, yet no inclusion criteria, search protocol, date range, or explicit rationale for method selection appears in the abstract or introductory sections; this leaves open the possibility that recent advances (e.g., scalable discovery under latent confounding or instrument-based off-policy evaluation) are underrepresented.

    Authors: We agree that the abstract and introduction would benefit from an explicit statement of scope. The review organizes the literature around the three-stage conceptual framework (structure learning, effect learning, policy learning) and prioritizes methods with publicly available implementations that support the accompanying Python collection. We will add a dedicated paragraph in the Introduction describing this rationale, the emphasis on decision-oriented applications, and the time frame of literature considered. We will also assess the two example advances cited and incorporate brief discussion where they align with the stages. revision: yes

  2. Referee: The manuscript's assessment of challenges and future directions rests on the chosen literature; without documented selection criteria it is impossible to evaluate whether the listed challenges are the most pressing or whether the cited advances are the most relevant.

    Authors: We accept that documenting the selection process will allow readers to better judge the completeness of the identified challenges and proposed directions. The challenges are drawn directly from limitations observed across the methods reviewed in each stage, and the future directions target practical deployment gaps. Adding the selection rationale (as noted above) will make this linkage clearer without altering the core organization or the Python resource. revision: yes

Circularity Check

0 steps flagged

No circularity: review paper with no derivations or predictions

full rationale

This is a survey paper that organizes existing literature into three stages (causal structure learning, causal effect learning, causal policy learning), identifies challenges, and points to a Python collection. No original equations, derivations, fitted parameters, or predictions are presented. The central claims are descriptive syntheses of prior work rather than reductions of new results to inputs defined within the paper. None of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.) apply. The absence of explicit inclusion criteria for the review is a potential coverage limitation but does not constitute circularity in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a review paper, the contribution is synthesis of prior literature rather than new derivations; no free parameters, axioms, or invented entities are introduced by the authors.

pith-pipeline@v0.9.0 · 5695 in / 1017 out tokens · 35076 ms · 2026-05-23T02:45:06.124363+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

    cs.LG 2026-04 unverdicted novelty 6.0

    Kernel smoothing yields accurate value and gradient estimates for low-variance policy learning in LLM reasoning under tight per-prompt sampling budgets.

  2. Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

    cs.LG 2026-04 unverdicted novelty 6.0

    Kernel smoothing enables accurate low-variance value and gradient estimates for policy optimization in LLM reasoning under tight sampling constraints per prompt.

  3. Causal Additive Models with Unobserved Causal Paths and Backdoor Paths

    cs.LG 2025-02 unverdicted novelty 6.0

    Establishes sufficient conditions for causal direction identification in additive models with unobserved paths and introduces a sound, complete search algorithm.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 2 Pith papers · 6 internal anchors

  1. [1]

    The economic costs of conflict: A case study of the basque country

    Abadie A, Gardeazabal J (2003). The economic costs of conflict: A case study of the basque country. American economic review, 93(1): 113–132. Abadie A, Imbens GW (2006). Large sample properties of matching estimators for average treatment effects. econo- metrica, 74(1): 235–267. Abadie A, Imbens GW (2008). On the failure of the bootstrap for matching esti...

  2. [2]

    In: The economics of artificial intelligence, 507–552

    the impact of machine learning on economics. In: The economics of artificial intelligence, 507–552. University of Chicago Press. Athey S, Bayati M, Doudchenko N, Imbens G, Khosravi K (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, 116(536): 1716–1730. Athey S, Chetty R, Imbens G (2020). Comb...

  3. [3]

    Bai Y , Gao Y , Wan R, Zhang S, Song R (2024)

    Statistics in medicine, 27(12): 2037–2049. Bai Y , Gao Y , Wan R, Zhang S, Song R (2024). A review of reinforcement learning in financial applications.Annual Review of Statistics and Its Application,

  4. [4]

    Advantage updating

    Baird III LC (1993). Advantage updating. Technical report, WRIGHT LAB WRIGHT-PATTERSON AFB OH. Balakrishnan S, Bi J, Soh H (2022). Scales: From fairness principles to constrained decision-making. In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 46–55. Bannon J, Windsor B, Song W, Li T (2020). Causality and batch reinforcement le...

  5. [5]

    Heterogeneous treatment and spillover effects under clustered network interference

    Bargagli-Stoffi FJ, Tortù C, Forastiere L (2020). Heterogeneous treatment and spillover effects under clustered network interference. arXiv preprint arXiv:2008.00707. Bargiacchi E, Verstraeten T, Roijers D, Nowé A, Hasselt H (2018). Learning to coordinate with coordination graphs in repeated single-stage multi-agent decision problems. In: International co...

  6. [6]

    No regrets for learning the prior in bandits

    Basu S, Kveton B, Zaheer M, Szepesvári C (2021). No regrets for learning the prior in bandits. Advances in neural information processing systems, 34: 28029–28041. Baxter J, Bartlett PL (2001). Infinite-horizon policy-gradient estimation. journal of artificial intelligence research, 15: 319–350. Bellemare M, Srinivasan S, Ostrovski G, Schaul T, Saxton D, M...

  7. [7]

    Doubly robust nonparametric inference on the average treatment effect

    Benkeser D, Carone M, Laan MVD, Gilbert PB (2017). Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104(4): 863–880. Bibaut A, Dimakopoulou M, Kallus N, Chambaz A, van der Laan M (2021). Post-contextual-bandit inference.Advances in Neural Information Processing Systems,

  8. [8]

    A Survey on Practical Applications of Multi-Armed and Contextual Bandits

    Bietti A, Agarwal A, Langford J (2021). A contextual bandit bake-off. Journal of Machine Learning Research, 22(133): 1–49. Blyth CR (1972). On simpson’s paradox and the sure-thing principle. Journal of the American Statistical Association, 67(338): 364–366. Bollen KA (1987). Total, direct, and indirect effects in structural equation models. Sociological m...

  9. [9]

    Bühlmann P, Peters J, Ernest J, et al. (2014). Cam: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6): 2526–2556. Cai H, Song R, Lu W (2020). Anoce: Analysis of causal effects with multiple mediators via constrained structural learning. In: International Conference on Learning Representations. ...

  10. [10]

    Flexible and efficient contextual bandits with heterogeneous treatment effect oracles

    42 A Review of Causal Decision Making A PREPRINT Carranza AG, Krishnamurthy SK, Athey S (2023). Flexible and efficient contextual bandits with heterogeneous treatment effect oracles. In: International Conference on Artificial Intelligence and Statistics, 7190–7212. PMLR. Cesa-Bianchi N, Fischer P (1998). Finite-time regret bounds for the multiarmed bandit...

  11. [11]

    An empirical evaluation of thompson sampling

    Chapelle O, Li L (2011). An empirical evaluation of thompson sampling. Advances in neural information processing systems, 24: 2249–2257. Chen F (2009). Bayesian modeling using the mcmc procedure. In: Proceedings of the SAS Global Forum 2008 Conference, Cary NC: SAS Institute Inc. Citeseer. Chen G, Zeng D, Kosorok MR (2016). Personalized dose finding using...

  12. [12]

    Optimal structure identification with greedy search

    Chickering DM (2002). Optimal structure identification with greedy search. Journal of machine learning research, 3(Nov): 507–554. Chu W, Li L, Reyzin L, Schapire R (2011). Contextual bandits with linear payoff functions. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 208–214. JMLR Workshop and Confere...

  13. [13]

    Pessimistic off-policy optimization for learning to rank

    Cief M, Kveton B, Kompan M (2022). Pessimistic off-policy optimization for learning to rank. arXiv preprint arXiv:2206.02593. Crump RK, Hotz VJ, Imbens GW, Mitnik OA (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1): 187–199. Cui Y , Tchetgen Tchetgen E (2021). A semiparametric instrumental variable approac...

  14. [14]

    Dubey A, et al. (2020). Kernel methods for cooperative multi-agent contextual bandits. In: International Conference on Machine Learning, 2740–2750. PMLR. Dudik M, Hsu D, Kale S, Karampatziakis N, Langford J, Reyzin L, et al. (2011). Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369. Dudík M, Langford J, Li L (2011). Doubly ...

  15. [15]

    How effective is targeted advertising? In: Proceedings of the 21st international conference on World Wide Web, 111–120

    Farahat A, Bailey MC (2012). How effective is targeted advertising? In: Proceedings of the 21st international conference on World Wide Web, 111–120. Feder A, Keith KA, Manzoor E, Pryzant R, Sridhar D, Wood-Doughty Z, et al. (2021). Causal inference in natural language processing: Estimation, prediction, interpretation and beyond. arXiv preprint arXiv:2109...

  16. [16]

    Hu X, Zhang R, Tang K, Guo J, Yi Q, Chen R, et al

    Cambridge university press. Hu X, Zhang R, Tang K, Guo J, Yi Q, Chen R, et al. (2022). Causality-driven hierarchical structure discovery for reinforcement learning. Advances in Neural Information Processing Systems, 35: 20064–20076. Hu Y , Kallus N (2020). Dtr bandit: Learning to make response-adaptive decisions with low regret. arXiv preprint arXiv:2005....

  17. [17]

    without

    Imai K, Keele L, Yamamoto T (2010b). Identification, inference and sensitivity analysis for causal mediation effects. Imbens G, Kallus N, Mao X, Wang Y (2022). Long-term causal inference under persistent confounding via data combination. arXiv preprint arXiv:2202.07234. Imbens GW (2004). Nonparametric estimation of average treatment effects under exogenei...

  18. [18]

    Doubly robust thompson sampling with linear payoffs

    46 A Review of Causal Decision Making A PREPRINT Kim W, Kim GS, Paik MC (2021). Doubly robust thompson sampling with linear payoffs. Advances in Neural Information Processing Systems, 34: 15830–15840. Kim W, Lee K, Paik MC (2023). Double doubly robust thompson sampling for generalized linear contextual bandits. In: Proceedings of the AAAI Conference on Ar...

  19. [19]

    Stabilizing off-policy q-learning via bootstrapping error reduction

    Kumar A, Fu J, Soh M, Tucker G, Levine S (2019). Stabilizing off-policy q-learning via bootstrapping error reduction. Advances in Neural Information Processing Systems,

  20. [20]

    Gradient-based neural dag learning

    Künzel SR, Sekhon JS, Bickel PJ, Yu B (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10): 4156–4165. Kveton B, Konobeev M, Zaheer M, Hsu Cw, Mladenov M, Boutilier C, et al. (2021). Meta-thompson sampling. In: International Conference on Machine Learning, 5884...

  21. [21]

    Bandit algorithms

    Lattimore T, Szepesvári C (2020). Bandit algorithms. Cambridge University Press. Le H, V oloshin C, Yue Y (2019). Batch policy learning under constraints. In:International Conference on Machine Learning, 3703–3712. PMLR. Leacy FP, Stuart EA (2014). On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the tr...

  22. [22]

    Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    Lee S, Bareinboim E (2019). Structural causal bandits with non-manipulable variables. In: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 4164–4172. Lee S, Honavar V (2020). Towards robust relational causal discovery. In:Uncertainty in Artificial Intelligence, 345–355. PMLR. Leung MP (2022a). Causal inference under approximate ne...

  23. [23]

    Unifying offline causal inference and online bandit learning for data driven decision

    Li Y , Xie H, Lin Y , Lui JC (2021b). Unifying offline causal inference and online bandit learning for data driven decision. In: Proceedings of the Web Conference 2021, 2291–2303. Liang S, Lu W, Song R (2018). Deep advantage learning for optimal dynamic treatment regime. Statistical theory and related fields, 2(1): 80–88. Liao P, Qi Z, Wan R, Klasnja P, M...

  24. [24]

    Interventional approach for path-specific effects

    Lin SH, VanderWeele T (2017). Interventional approach for path-specific effects. Journal of Causal Inference, 5(1). Littman ML (2009). A tutorial on partially observable markov decision processes. Journal of Mathematical Psychology, 53(3): 119–125. Liu Q, Li L, Tang Z, Zhou D (2018a). Breaking the curse of horizon: Infinite-horizon off-policy estimation. ...

  25. [25]

    A map of bandits for e-commerce.arXiv preprint arXiv:2107.00680

    Liu Y , Li L (2021). A map of bandits for e-commerce.arXiv preprint arXiv:2107.00680. Liu Y , Wang Y , Kosorok MR, Zhao Y , Zeng D (2018b). Augmented outcome-weighted learning for estimating optimal dynamic treatment regimens. Statistics in medicine, 37(26): 3776–3788. Lopez-Paz D, Schölkopf B (2017). Discovering causal signals in images. In: Proceedings ...

  26. [26]

    Explainable reinforcement learning through a causal lens

    Madumal P, Miller T, Sonenberg L, Vetere F (2020). Explainable reinforcement learning through a causal lens. In: Proceedings of the AAAI conference on artificial intelligence, volume 34, 2493–2500. 48 A Review of Causal Decision Making A PREPRINT Manski CF (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16(...

  27. [27]

    Off-policy evaluation in markov decision processes

    Paduraru C (2013). Off-policy evaluation in markov decision processes. Pan L, Huang L, Ma T, Xu H (2022). Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification. In: International conference on machine learning, 17221–17237. PMLR. Panizza U, Presbitero AF (2014). Public debt and economic growth: is there a causa...

  28. [28]

    Pearl J (2003)

    Springer. Pearl J (2003). Statistics and causal inference: A review. Test, 12: 281–345. Pearl J (2010a). Causal inference. Causality: objectives and assessment, 39–58. Pearl J (2010b). The foundations of causal inference. Sociological Methodology, 40(1): 75–149. Pearl J (2022). Direct and indirect effects. In: Probabilistic and causal inference: the works...

  29. [29]

    Bayesian inference for causal effects: The role of randomization

    Rubin DB (1978). Bayesian inference for causal effects: The role of randomization. The Annals of statistics, 6: 34–58. Rubin DB (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics, 31(2): 161–170. Rummery GA, Niranjan M (1994). On-line Q-learning using connectionist systems , volume

  30. [30]

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    University of Cambridge, Department of Engineering Cambridge, UK. Runge J, Bathiany S, Bollt E, Camps-Valls G, Coumou D, Deyle E, et al. (2019). Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances, 5(11): eaau4996. Russo DJ, Van Roy B, Kazerouni A, Osband I, Wen Z, et al. (2018). A tutorial on thompson s...

  31. [31]

    Contextual Multi-Armed Bandits for Causal Marketing

    Sawant N, Namballa CB, Sadagopan N, Nassif H (2018). Contextual multi-armed bandits for causal marketing. arXiv preprint arXiv:1810.01859. Schölkopf B, Locatello F, Bauer S, Ke NR, Kalchbrenner N, Goyal A, et al. (2021). Toward causal representation learning. Proceedings of the IEEE, 109(5): 612–634. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (201...

  32. [32]

    Causal influence detection for improving efficiency in reinforcement learning

    Seitzer M, Schölkopf B, Martius G (2021). Causal influence detection for improving efficiency in reinforcement learning. Advances in Neural Information Processing Systems, 34: 22905–22918. Selig JP, Preacher KJ (2009). Mediation models for longitudinal data in developmental research. Research in human development, 6(2-3): 144–164. Sen R, Shanmugam K, Dima...

  33. [33]

    High-dimensional a-learning for optimal dynamic treatment regimes

    Shi C, Fan A, Song R, Lu W (2018). High-dimensional a-learning for optimal dynamic treatment regimes. Annals of statistics, 46(3):

  34. [34]

    Testing mediation effects using logic of boolean matrices

    Shi C, Li L (2021). Testing mediation effects using logic of boolean matrices. Journal of the American Statistical Association, 1–14. 51 A Review of Causal Decision Making A PREPRINT Shi C, Luo S, Le Y , Zhu H, Song R (2024). Statistically efficient advantage learning for offline reinforcement learning in infinite horizons. Journal of the American Statist...

  35. [35]

    Partially observable markov decision processes

    Spaan MT (2012). Partially observable markov decision processes. Reinforcement learning: State-of-the-art, 387–414. Spirtes P, Glymour C, Scheines R, Kauffman S, Aimale V , Wimberly F (2000a). Constructing bayesian network models of gene expression networks from microarray data. Spirtes P, Glymour CN, Scheines R, Heckerman D (2000b). Causation, prediction...

  36. [36]

    Modelling and estimation for optimal treatment decision with interference

    52 A Review of Causal Decision Making A PREPRINT Su L, Lu W, Song R (2019a). Modelling and estimation for optimal treatment decision with interference. Stat, 8(1): e219. Su Y , Dimakopoulou M, Krishnamurthy A, Dudík M (2019b). Doubly robust off-policy evaluation with shrinkage. arXiv preprint arXiv:1907.09623. Subramanian C, Ravindran B (2021). Causal con...

  37. [37]

    Swaminathan A, Krishnamurthy A, Agarwal A, Dudik M, Langford J, Jose D, et al. (2017). Off-policy evaluation for slate recommendation. In: Advances in Neural Information Processing Systems, 3632–3642. Tang K, Huang J, Zhang H (2020). Long-tailed classification by keeping the good and removing the bad momentum causal effect. arXiv preprint arXiv:2009.12991...

  38. [38]

    A review of causal estimation of effects in mediation analyses

    Ten Have TR, Joffe MM (2012). A review of causal estimation of effects in mediation analyses. Statistical Methods in Medical Research, 21(1): 77–107. Triantafyllou S, Singla A, Radanovic G (2022). Actual causality and responsibility attribution in decentralized partially observable markov decision processes. In: Proceedings of the 2022 AAAI/ACM Conference...

  39. [39]

    Conceptual issues concerning mediation, interventions and composition

    53 A Review of Causal Decision Making A PREPRINT VanderWeele TJ, Vansteelandt S (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2(4): 457–468. Vermeulen K, Vansteelandt S (2016). Data-adaptive bias-reduced doubly robust estimation. The international journal of biostatistics, 12(1): 253–282. Vers...

  40. [40]

    Handling limited overlap in observational studies with cardinality matching

    Visconti G, Zubizarreta JR (2018). Handling limited overlap in observational studies with cardinality matching. Observational Studies, 4(1): 217–249. Viviano D (2020). Experimental design under network interference. arXiv preprint arXiv:2003.08421. Viviano D (2024). Policy targeting under network interference. Review of Economic Studies, rdae041. Viviano ...

  41. [41]

    Behavior Regularized Offline Reinforcement Learning

    Macmillan. Wu L, Yang S (2022). Integrativer-learner of heterogeneous treatment effects combining experimental and observational studies. In: Conference on Causal Learning and Reasoning, 904–926. PMLR. Wu Y , Tucker G, Nachum O (2019). Behavior regularized offline reinforcement learning. arXiv preprint arXiv:1911.11361. Xiang X, Foo S (2021). Recent advan...

  42. [42]

    Non-parametric path analysis in structural causal models

    Zhang J, Bareinboim E (2018b). Non-parametric path analysis in structural causal models. In: Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence. Zhang K, Janson L, Murphy S (2021). Statistical inference with m-estimators on adaptively collected data. Advances in Neural Information Processing Systems, 34: 7460–7471. Zhang KW, Jans...