
arxiv: 1907.00399 · v1 · pith:6SBVWWLEnew · submitted 2019-06-30 · 🧮 math.ST · econ.EM · stat.TH

Bounding Causes of Effects with Mediators

Pith reviewed 2026-05-25 12:18 UTC · model grok-4.3

classification 🧮 math.ST · econ.EM · stat.TH
keywords probability of causation · causal bounds · mediators · binary variables · causal inference · bounding PC

The pith

Data on complete mediators between binary exposure and outcome yields improved bounds on the probability of causation, with two-step processes sufficient for the extremal bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether an observed positive outcome in an exposed individual was caused by the exposure, a question quantified by the probability of causation, PC. PC is typically not identified by the distribution of Y given X alone, but knowledge of the probabilistic structure of a sequence of complete mediators allows bounds on it to be derived and improved. The paper gives a general formula for these bounds under any observed pattern of mediator data, including no data. It shows that the largest and smallest upper and lower bounds achievable by any complete mediation process are already attained by processes with at most two steps. It also considers homogeneous processes with many mediators: PC can sometimes be identified as 0 with negative data, but it cannot be identified as 1 even with positive data on an infinite set of mediators.
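
As a point of reference, the bounds available from P(Y|X) alone can be sketched in a few lines. The sketch assumes the standard setting in which the exposure is unconfounded, so that the Fréchet inequalities applied to the two potential outcomes give max{0, 1 - p0/p1} ≤ PC ≤ min{1, (1 - p0)/p1}, with p1 = P(Y=1|X=1) and p0 = P(Y=1|X=0). The function name and symbols are illustrative and do not come from the paper; this is the no-mediator baseline, not the paper's general formula.

```python
from typing import Tuple


def simple_pc_bounds(p1: float, p0: float) -> Tuple[float, float]:
    """Bounds on PC from P(Y=1|X=1)=p1 and P(Y=1|X=0)=p0 alone.

    Assumption (not taken from the paper's text): exposure is unconfounded,
    so PC = P(Y(0)=0, Y(1)=1) / p1 and the joint probability is bounded
    by the Frechet inequalities.
    """
    if not (0.0 < p1 <= 1.0 and 0.0 <= p0 <= 1.0):
        raise ValueError("need 0 < p1 <= 1 and 0 <= p0 <= 1")
    lower = max(0.0, 1.0 - p0 / p1)      # max{0, p1 - p0} / p1
    upper = min(1.0, (1.0 - p0) / p1)    # min{p1, 1 - p0} / p1
    return lower, upper


if __name__ == "__main__":
    # Illustrative numbers only: lower bound is about 0.533, upper bound is 1.0.
    print(simple_pc_bounds(0.30, 0.14))
```

With these illustrative inputs the upper bound is uninformative, which is the situation in which mediator information has room to help.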

Core claim

For binary X and Y with known P(Y|X), and known probabilistic structure of a sequence of complete mediators, bounds on PC for a case with X=1, Y=1 can be calculated using a general formula that incorporates the mediator data pattern. The tightest possible upper and lower bounds over all possible complete mediation processes are achieved by processes with at most two steps. With negative data on mediators PC can sometimes be identified as zero, but identification at one is impossible even with positive data on infinitely many mediators.
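
To make the role of a complete mediator concrete, here is a minimal sketch for a single binary mediator M (a two-step chain X → M → Y) with no data on M for the case at hand. It is not the paper's general formula. It assumes that the potential responses of M to X are independent of the potential responses of Y to M, and it bounds each step's joint potential-response probability by the Fréchet inequalities. All names (`two_step_pc_bounds`, `a`, `b`, `c`, `d`, `xi`, `eta`) are hypothetical.

```python
from typing import Tuple


def two_step_pc_bounds(a: float, b: float, c: float, d: float) -> Tuple[float, float]:
    """PC bounds for a chain X -> M -> Y with no data on M.

    a = P(M=1|X=1), b = P(M=1|X=0), c = P(Y=1|M=1), d = P(Y=1|M=0).
    Assumption: potential responses of the two steps are independent.
    """
    p1 = a * c + (1.0 - a) * d  # P(Y=1|X=1) implied by the chain
    if p1 <= 0.0:
        raise ValueError("P(Y=1|X=1) must be positive")

    def numerator(xi: float, eta: float) -> float:
        # xi  = P(M(0)=0, M(1)=1), eta = P(Y(0)=0, Y(1)=1).
        # Y changes with X only if M changes and Y responds to that change:
        # either M goes 0 -> 1 and Y goes 0 -> 1, or M goes 1 -> 0 and Y goes 1 -> 0.
        return xi * eta + (b - a + xi) * (d - c + eta)

    # Frechet ranges for the two unidentified joint probabilities.
    xi_lo, xi_hi = max(0.0, a - b), min(a, 1.0 - b)
    eta_lo, eta_hi = max(0.0, c - d), min(c, 1.0 - d)

    # The numerator is nondecreasing in both xi and eta, so the extremes
    # sit at the corresponding endpoints.
    return numerator(xi_lo, eta_lo) / p1, numerator(xi_hi, eta_hi) / p1


if __name__ == "__main__":
    # Illustrative numbers only: about (0.533, 0.867).
    print(two_step_pc_bounds(0.5, 0.1, 0.5, 0.1))
```

For these inputs the chain implies P(Y=1|X=1) = 0.30 and P(Y=1|X=0) = 0.14. Without the mediator structure the upper bound on PC would be 1, whereas under the stated assumptions it falls to about 0.867 and the lower bound is unchanged.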

What carries the argument

The general bounding formula for PC under arbitrary mediator data patterns in complete mediation sequences, together with the two-step sufficiency theorem for extremal bounds.

If this is right

  • Improved bounds on PC are obtainable from any data pattern on the mediators.
  • The largest and smallest upper and lower bounds attainable by any complete mediation process are already realized by processes with at most two steps.
  • PC is identifiable as 0 under certain negative mediator data configurations.
  • PC cannot be identified as 1 even with positive data on an arbitrary number of mediators.
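
The asymmetry between negative and positive mediator data in the last two points can be illustrated with a toy two-step chain X → M → Y in which the mediator value m is observed for the exposed case with Y=1. This is a sketch under assumptions that are not taken from the paper's text: M is a single binary complete mediator, and the potential responses of the two steps are independent. The function and variable names are hypothetical.

```python
from typing import Tuple


def two_step_pc_bounds_given_m(a: float, b: float, c: float, d: float,
                               m: int) -> Tuple[float, float]:
    """PC bounds for a case with X=1, M=m, Y=1 in a chain X -> M -> Y.

    a = P(M=1|X=1), b = P(M=1|X=0), c = P(Y=1|M=1), d = P(Y=1|M=0).
    Assumption: potential responses of the two steps are independent.
    """
    if m not in (0, 1):
        raise ValueError("m must be 0 or 1")
    # Frechet ranges for xi = P(M(0)=0, M(1)=1) and eta = P(Y(0)=0, Y(1)=1).
    xi_lo, xi_hi = max(0.0, a - b), min(a, 1.0 - b)
    eta_lo, eta_hi = max(0.0, c - d), min(c, 1.0 - d)

    if m == 1:
        if a <= 0.0 or c <= 0.0:
            raise ValueError("the case M=1, Y=1 has probability zero")
        # PC = P(M(0)=0 | M(1)=1) * P(Y(0)=0 | Y(1)=1)
        return (xi_lo / a) * (eta_lo / c), (xi_hi / a) * (eta_hi / c)

    if a >= 1.0 or d <= 0.0:
        raise ValueError("the case M=0, Y=1 has probability zero")

    def pc(xi: float, eta: float) -> float:
        # PC = P(M(0)=1 | M(1)=0) * P(Y(1)=0 | Y(0)=1)
        return ((b - a + xi) / (1.0 - a)) * ((d - c + eta) / d)

    return pc(xi_lo, eta_lo), pc(xi_hi, eta_hi)


if __name__ == "__main__":
    # Illustrative numbers: the mediator never occurs without exposure (b = 0).
    print(two_step_pc_bounds_given_m(0.5, 0.0, 0.5, 0.1, m=0))  # both bounds are 0.0
    print(two_step_pc_bounds_given_m(0.5, 0.0, 0.5, 0.1, m=1))  # about (0.8, 1.0)
```

In this toy setting, observing M=0 pins PC at 0 because the mediator could not have differed under non-exposure. Observing M=1 raises the lower bound only to about 0.8, so PC is not identified at 1. The example is consistent with the points listed above but does not establish them in general.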

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In applied settings, knowledge of a two-step mediation process might already suffice to reach the most extreme bounds attainable, so modeling longer chains may add little.
  • The results suggest limits to how much process knowledge can resolve individual-level causation questions.
  • Extensions could involve relaxing the completeness assumption to partial mediators.

Load-bearing premise

That the mediators form a complete sequence capturing the entire causal effect from X to Y, with their joint probabilistic structure fully known.

What would settle it

Observing or constructing a complete mediation process with three or more steps that produces strictly tighter or wider bounds on PC than any two-step process would falsify the claim that two steps suffice for extremal bounds.

Figures

Figures reproduced from arXiv: 1907.00399 by Macartan Humphreys, Monica Musio, Philip Dawid.

Figure 1: Bounds from homogeneous decompositions of length …
Figure 2: Comparison of bounds on PC given different auxiliary information. Simple bounds are …
Original abstract

Suppose X and Y are binary exposure and outcome variables, and we have full knowledge of the distribution of Y, given application of X. From this we know the average causal effect of X on Y. We are now interested in assessing, for a case that was exposed and exhibited a positive outcome, whether it was the exposure that caused the outcome. The relevant "probability of causation", PC, typically is not identified by the distribution of Y given X, but bounds can be placed on it, and these bounds can be improved if we have further information about the causal process. Here we consider cases where we know the probabilistic structure for a sequence of complete mediators between X and Y. We derive a general formula for calculating bounds on PC for any pattern of data on the mediators (including the case with no data). We show that the largest and smallest upper and lower bounds that can result from any complete mediation process can be obtained in processes with at most two steps. We also consider homogeneous processes with many mediators. PC can sometimes be identified as 0 with negative data, but it cannot be identified at 1 even with positive data on an infinite set of mediators. The results have implications for learning about causation from knowledge of general processes and of data on cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript derives a general formula for bounding the probability of causation (PC) for binary exposure X and outcome Y, given P(Y|X) and the known probabilistic structure of any sequence of complete mediators, for any pattern of mediator data (including the no-data case). It proves that the extremal upper and lower bounds over all complete mediation processes are attained by processes with at most two steps. It further examines homogeneous processes, showing that PC can sometimes be identified as 0 under negative mediator data but cannot be identified as 1 even under positive data on an infinite mediator chain.

Significance. If the derivations hold, the work strengthens causal inference by providing explicit, computable bounds on individual-level causation that tighten with mediator information and by establishing a two-step sufficiency result that simplifies analysis of arbitrary-length mediation chains. The non-identification result for PC=1 is a clear, falsifiable contribution.

minor comments (2)
  1. The abstract states that a general formula is derived; the main text should write out every step of that derivation, with explicit conditioning on the mediator distributions, so that it can be verified directly.
  2. Notation for the sequence of mediators and the data patterns (positive/negative) should be introduced once in a dedicated preliminary section rather than inline.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The summary accurately captures the main results on bounds for the probability of causation under complete mediation sequences, the two-step sufficiency result, and the identification findings for homogeneous processes.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper derives a general formula for PC bounds given any observed pattern on complete mediators (including the null case) and proves that extremal bounds over all such processes are attained by processes with at most two steps. Both results are obtained by direct mathematical manipulation from the explicit setup assumptions: binary X and Y, full knowledge of P(Y|X), and complete mediators whose joint probabilistic structure is known exactly. These assumptions are stated as the modeling premise in the abstract and are not derived from the target bounds. No parameter is fitted to data and then relabeled as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The derivation chain therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard causal identification assumptions (consistency, no unmeasured confounding along the mediator chain, complete mediation) that are domain assumptions in causal inference; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Standard causal assumptions such as consistency, no unmeasured confounding for the mediators, and complete mediation.
    Invoked to ensure the mediator structure allows bounding PC from the known probabilistic relations.

pith-pipeline@v0.9.0 · 5752 in / 1196 out tokens · 25001 ms · 2026-05-25T12:18:24.970422+00:00 · methodology

