Optimal Experiments for Partial Causal Effect Identification

Jalal Etesami; Tobias Maringgele

arxiv: 2605.06993 · v1 · submitted 2026-05-07 · 💻 cs.AI · stat.ML

Optimal Experiments for Partial Causal Effect Identification

Tobias Maringgele , Jalal Etesami This is my paper

Pith reviewed 2026-05-11 01:41 UTC · model grok-4.3

classification 💻 cs.AI stat.ML

keywords partial causal identificationexperiment selectionepistemic potencymax-potency problemgraphical pruningID algorithmNP-hard optimizationcausal bounds

0 comments

The pith

Selecting a cost-limited set of experiments can maximize the guaranteed reduction in bound width on a partially identifiable causal query.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to choose which experiments to perform when observational data leaves a causal effect only partially identified and each experiment carries a cost. It defines epistemic potency as the worst-case narrowing of the bound interval that an experiment guarantees and casts the task as the max-potency problem of finding the best budget-respecting collection. The authors prove the selection problem is NP-hard by a direct reduction from the 0-1 knapsack problem and supply a general method to compute potency values in discrete settings by leveraging polynomial programming. Two graph-only pruning rules, one based on path interception through districts and one based on the ID algorithm, are shown to discard 50-88 percent of candidate experiments on average before any optimization is run. The resulting procedure is demonstrated end-to-end on NHANES data to decide which interventions best sharpen the estimated effect of physical activity on diabetes.

Core claim

We formalize the selection of a cost-constrained subset of experiments that maximally tightens bounds on a target causal query as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment. The problem is shown to be NP-hard via a reduction from 0-1 knapsack. A general procedure evaluates epistemic potency in discrete settings by building on the polynomial-programming framework of Duarte et al. (2023). Two graphical pruning criteria that depend only on the causal graph and the query, a novel path-interception rule and an identifiability check based on the ID algorithm, together prune 50-88 percent of candidate experiments on

What carries the argument

Epistemic potency, the worst-case reduction in the width of bounds on the target causal query that is guaranteed by performing a given experiment, which is computed via polynomial programming and pruned using path-interception and ID-algorithm checks on the known causal graph.

If this is right

The two graphical pruning criteria together certify zero potency for 50-88 percent of candidate experiments on average in Erdős-Rényi graphs and bnlearn networks without solving any polynomial program.
Experiments that fail the ID-algorithm identifiability check are combinatorially inert, yielding a super-exponential reduction in the number of subsets that must be evaluated.
Path interception through district structure certifies zero potency for many experiments in linear time using only the causal graph and query.
The method produces a concrete recommendation of experiments on observational NHANES data for the effect of physical activity on diabetes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the assumed causal graph contains errors, the computed potencies and the selected experiments may no longer be optimal.
The same potency definition could be used to rank experiments when the graph itself is uncertain by averaging over a posterior over graphs.
Approximation algorithms for the knapsack reduction could be combined with the pruning rules to handle larger numbers of candidate experiments.

Load-bearing premise

The causal graph is known and correctly specified, and the setting is discrete so the polynomial-programming procedure applies directly.

What would settle it

An exhaustive enumeration on a small causal graph and query where the subset returned by the pruned max-potency procedure differs from the true optimum found by checking every feasible subset.

Figures

Figures reproduced from arXiv: 2605.06993 by Jalal Etesami, Tobias Maringgele.

**Figure 1.** Figure 1: Two example ADMGs. where lp˜a and up˜a are the sharp observational bounds on p˜a , itself a causal query. We take the maximum because bounds must hold regardless of the experimental outcome; any smaller choice could yield a bound width that is exceeded in practice.2 We denote the observational bound width as Wθ := Wθ(∅) = Uθ − Lθ. For illustration, Figure 1b shows an ADMG with V = {A, B, X, M, Y, C, D} (al… view at source ↗

**Figure 2.** Figure 2: Erdos–Rényi graphs. For different sizes of ˝ V , we plot the fraction of pruned experiments against the density |U| |V | . The hatched orange area represents experiments pruned by the interception mechanism (Algorithm 4); the blue area represents those pruned by the ID check. In total, 1,596 simulations were performed. 7 Evaluation Synthetic evaluation. We evaluate Algorithm 1 on two types of graphs: rando… view at source ↗

**Figure 3.** Figure 3: bnlearn. For each graph, we plot the mean fraction of pruned experiments. The hatched orange area represents experiments pruned by the interception mechanism (Algorithm 4); the blue area represents those pruned by the ID check. Error bars indicate ±1 standard deviation of the total pruning rate across 100 simulations per graph. In total, 1,100 simulations were performed. The high variability in the interce… view at source ↗

**Figure 4.** Figure 4: A TP-cell Ci : a confounded treatment-outcome pair. The query of interest within a single cell is the probability of necessity and sufficiency, PNSi = P(yi,xi , y′ i,x′ i ). For a TP-cell, Tian and Pearl [2000] give the sharp observational bounds 0 ≤ PNSi ≤ αi + δi , so that the observational worst-case width is Wi(∅) = αi + δi . We write poti(ai) = Wi(∅)−Wi(ai) for the per-cell potency, in line with Defin… view at source ↗

**Figure 5.** Figure 5: Example 3.5. The realized bound width Wθ(a, pa ) := Uθ(a, pa ) − Lθ(a, pa ) is shown as a function of pa . Example D.1 (Cartesian product decomposition). Consider the chain A → B → C → D with confounders R1 on {A, B} and R2 on {C, D} ( [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Chain graph with two districts and confounders [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Figure 1b with all edges to/from C removed (i.e., after deleting Z = {C}). After initialization: visited = {R1, R2}, queue = (R1, R2). Iteration 1: current = R1. Not in W. Children in E ′ : X, M (unvisited). Update visited = {R1, R2, X, M}, queue = (R2, X, M). Iteration 2: current = R2. Unvisited child: Y . Update visited = {R1, R2, X, M, Y }, queue = (X, M, Y ). Iterations 3–5: X has no unvisited children… view at source ↗

**Figure 8.** Figure 8: bnlearn. The fraction of experiments pruned by the interception mechanism (Algorithm 4) against the relative size of R ∗ θ . The line represents the mean fraction of experiments pruned; the band around it represents the ±1 standard deviation [PITH_FULL_IMAGE:figures/full_fig_p026_8.png] view at source ↗

**Figure 9.** Figure 9: Assumed causal ADMG for the NHANES experiment ( [PITH_FULL_IMAGE:figures/full_fig_p028_9.png] view at source ↗

read the original abstract

Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem is NP-hard via a reduction from 0-1 knapsack. Building on the polynomial-programming framework of Duarte et al. (2023), we give a general procedure for evaluating epistemic potency in discrete settings. To control the super-exponential search space, we introduce two graphical pruning criteria that depend only on the causal graph and the query: a novel path-interception rule that exploits district structure to certify zero potency in linear time, and an identifiability check based on the ID algorithm. On Erdos-Renyi random graphs and 11 bnlearn benchmark networks, the two criteria together prune 50-88% of candidate experiments on average without solving a single polynomial program. For the general subset search, we show that ID-pruned experiments are combinatorially inert, yielding a super-exponential reduction in the number of subsets evaluated. We close with an end-to-end demonstration on observational NHANES data, selecting optimal experiments for estimating the effect of physical activity on diabetes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper formalizes max-potency for choosing experiments that best shrink bounds on partially identifiable causal queries and supplies graph-based pruning that cuts the search space substantially.

read the letter

This paper tackles how to pick a limited number of experiments ahead of time to tighten bounds on a causal effect when you only have partial identification from observational data. They define epistemic potency as the worst-case reduction in bound width an experiment can guarantee, turn the selection into the max-potency problem, and prove it NP-hard by reduction from 0-1 knapsack. They adapt the polynomial-programming method from Duarte et al. to compute potency in discrete cases and add two pruning rules that work from the graph and query alone: a path-interception check that uses district structure to flag zero-potency experiments quickly, plus an ID-algorithm check that marks some experiments as inert for the subset search. On Erdos-Renyi graphs and bnlearn benchmarks the rules remove 50-88% of candidates without running any programs, and the inertness claim shrinks the subset enumeration enough to make the approach usable. The NHANES demonstration shows the full pipeline on real data for the effect of physical activity on diabetes risk. The pruning and hardness result are the concrete new pieces; the rest builds directly on existing identification and optimization tools. The main limitations are the standing assumptions of a known, correctly specified graph and discrete variables so the Duarte procedure applies. If the graph is uncertain or variables are continuous the method does not transfer directly, and the real-data section is only an illustration rather than a stress test against misspecification. This is aimed at causal-inference researchers who already work with partial identification and need to allocate experimental resources. Anyone doing optimal design or experimental planning in graphical models will find the pruning rules and potency definition worth examining. It deserves a serious referee because the technical contributions are self-contained, the empirical pruning numbers are consistent, and the central claims hold together without obvious gaps.

Referee Report

2 major / 3 minor

Summary. The manuscript studies selection of a cost-constrained subset of experiments that maximally tightens bounds on a partially identifiable causal query. It formalizes the max-potency problem (epistemic potency as worst-case bound-width reduction), proves NP-hardness via reduction from 0-1 knapsack, adapts the Duarte et al. (2023) polynomial-programming framework to evaluate potency in discrete settings, introduces two graph-only pruning rules (path-interception exploiting district structure and an ID-algorithm identifiability check) that prune 50-88% of candidates on Erdos-Renyi and bnlearn graphs, shows that ID-pruned experiments are combinatorially inert (yielding super-exponential subset reduction), and demonstrates the pipeline on NHANES data for the effect of physical activity on diabetes.

Significance. If the results hold, the work offers a principled, computationally tractable approach to optimal experimental design for tightening partially identified causal effects. The NP-hardness result, the two sound graphical pruning criteria that require no polynomial-program solves, the combinatorial-inertness observation, and the end-to-end real-data demonstration are concrete strengths that advance both theory and practice in causal inference and experimental design.

major comments (2)

[NP-hardness reduction] The NP-hardness reduction (abstract and the section presenting the max-potency problem): the explicit mapping from knapsack instances to experiments and the correspondence between potency values and knapsack objective must be stated formally so that the reduction can be verified as polynomial-time and correct; without this the complexity claim remains uncheckable.
[Pruning criteria] Section introducing the path-interception pruning rule: the linear-time certification of zero potency via district structure is load-bearing for the reported 50-88% pruning rates; a self-contained soundness argument (or reference to a prior lemma) is required, as any counter-example would invalidate the pruning percentages and the subsequent combinatorial-inertness claim.

minor comments (3)

[Combinatorial inertness paragraph] The term 'combinatorial inertness' is used without an explicit definition; add a short formal definition or property when first introduced.
[Real-data demonstration] The NHANES demonstration would benefit from an explicit statement of the target causal query, the chosen cost budget, and the exact experiments selected by the algorithm.
[Empirical evaluation] Tables reporting pruning percentages on the 11 bnlearn networks should include per-network values or at least min/max ranges rather than only averages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment, the recommendation of minor revision, and the constructive comments that help clarify the presentation. We address each major comment below and will revise the manuscript to incorporate the requested formal details.

read point-by-point responses

Referee: [NP-hardness reduction] The NP-hardness reduction (abstract and the section presenting the max-potency problem): the explicit mapping from knapsack instances to experiments and the correspondence between potency values and knapsack objective must be stated formally so that the reduction can be verified as polynomial-time and correct; without this the complexity claim remains uncheckable.

Authors: We agree that an explicit, formal statement of the reduction is required for independent verification. In the revised manuscript we will add a dedicated subsection that defines the polynomial-time mapping: given any 0-1 knapsack instance with n items (weights w_i, values v_i, capacity W), we construct a causal graph G, a target query Q, and a set of candidate experiments such that each item i corresponds to one experiment e_i with cost exactly w_i, and the epistemic potency of any subset S of experiments equals the knapsack objective value of the corresponding items. We will prove that the mapping is computable in time polynomial in the knapsack input size and that an optimal max-potency solution corresponds exactly to an optimal knapsack solution. This will make the NP-hardness claim directly checkable. revision: yes
Referee: [Pruning criteria] Section introducing the path-interception pruning rule: the linear-time certification of zero potency via district structure is load-bearing for the reported 50-88% pruning rates; a self-contained soundness argument (or reference to a prior lemma) is required, as any counter-example would invalidate the pruning percentages and the subsequent combinatorial-inertness claim.

Authors: We thank the referee for emphasizing the centrality of this argument. The manuscript currently sketches the path-interception rule using district structure and the fact that experiments failing to intercept all relevant paths from intervention to query cannot tighten the bounds. To address the request for a self-contained argument, the revision will insert a new lemma immediately preceding the pruning section. The lemma states that, for a query Q whose identifying expression factors over districts, any experiment that does not intercept every path from its intervention node to the query nodes within the relevant districts has epistemic potency exactly zero; the proof relies only on the definition of epistemic potency, the district factorization, and the properties of the ID algorithm, without invoking polynomial-program solves. This will support the reported pruning rates and the combinatorial-inertness observation. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's core contributions—the formalization of the max-potency problem, the NP-hardness reduction from 0-1 knapsack, the two novel graphical pruning criteria (path-interception exploiting district structure and ID-algorithm identifiability check), and the combinatorial inertness argument—are self-contained and do not reduce to definitions or quantities internal to the paper. The polynomial-programming evaluation procedure is explicitly adapted from the external Duarte et al. (2023) framework, which is cited as an independent black-box tool rather than a self-referential premise. No self-definitional loops, fitted inputs renamed as predictions, load-bearing self-citations, or smuggled ansatzes appear in the derivation chain. The reported pruning rates and NHANES demonstration follow directly once the external framework and the soundness of the new pruning rules are granted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard causal-graph assumptions of the field plus the external Duarte et al. (2023) framework; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption The causal graph is known and correctly specified.
All pruning criteria and potency calculations operate directly on the given graph and query.
domain assumption The setting is discrete.
The general procedure for evaluating epistemic potency is stated only for discrete variables.

pith-pipeline@v0.9.0 · 5546 in / 1426 out tokens · 43266 ms · 2026-05-11T01:41:43.939348+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1]

doi: 10.1111/j.0006-341X.2002.00021.x. S. Greenland and J. M. Robins. Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology, 15(3):413–419, 09 1986. ISSN 0300-5771. doi: 10.1093/ ije/15.3.413. URLhttps://doi.org/10.1093/ije/15.3.413. Y . Kivva, E. Mokhtarian, J. Etesami, and N. Kiyavash. Revisiting the ge...

work page doi:10.1111/j.0006-341x.2002.00021.x 2002
[2]

Graph.Let G be the disjoint union of n TP-cells C1, . . . ,Cn. The observed variables are V={X 1, Y1, . . . , Xn, Yn}

work page
[3]

Constructive achievability is established in Lemma C.3 below; if necessary, rescale the values vi so that they lie in the achievable range

Observational distribution.Choose per-cell distributions Pi(Xi, Yi) such that poti(ai) = vi, where ai =P(y i,xi). Constructive achievability is established in Lemma C.3 below; if necessary, rescale the values vi so that they lie in the achievable range. Set P(V) =Qn i=1 Pi(Xi, Yi). 3.Query.Let θ= nX i=1 PNSi = nX i=1 P(y i,xi , y′ i,x′ i )

work page
[4]

,an} with ai =P(y i,xi)

Experiment set.Let A={a 1, . . . ,an} with ai =P(y i,xi). For any S⊆ {1, . . . , n} , we writea S :={a i :i∈S}

work page
[5]

, n} , and use the original budgetB

Cost function and budget.Set c(aS) =P i∈S wi for S⊆ {1, . . . , n} , and use the original budgetB. The construction is polynomial in n: each TP-cell is a constant-size object, and Lemma C.3 produces the requiredP i in time polynomial in the bit-length ofv i. 13 C.3 Additivity of potency across cells The key technical claim is that the potency of any subse...

work page 1996
[6]

PickY (1) uniformly fromV; pickX (1) uniformly fromanc(Y (1))\ {Y (1)}

work page
[7]

24 H.3 Candidate Experiment Sampling Each a∈A is sampled independently

For the second world, pick Y (2) uniformly from V subject to X(1) ∈anc(Y (2)), then set X(2) =X (1). 24 H.3 Candidate Experiment Sampling Each a∈A is sampled independently. With probability 0.3, a has two counterfactual worlds; otherwise one

work page
[8]

Sample|W (1)|,|Z (1)| ∼Unif{1,2,3}

work page
[9]

PickW (1) ⊆Vuniformly; pickZ (1) ⊆anc(W (1))\W (1) uniformly

work page
[10]

Remaining variables fill fromVandanc(W (2))\W (2), respectively

For two-world experiments, sample W (2),Z (2) analogously, but seed each with a shared element: pick w∈W (1) uniformly and require w∈W (2); pick z∈Z (1)∩(anc(W (2))\W (2)) uniformly and requirez∈Z (2). Remaining variables fill fromVandanc(W (2))\W (2), respectively. H.4 Erd ˝os–Rényi Graphs Each ER graph has a fixed topological order V1 <· · ·< V N . Dire...

work page
[11]

Add a forced confounder betweenX (1) andY (1)

work page
[12]

Draw a target ratior∼Unif(0.1,0.9), clamped below by1/|V|for achievability

work page
[13]

physical activity is protective

Add confounders between uniformly random pairs of observed variables until the total reaches ⌊r· |V|⌋. With100simulations per graph, this yields11×100 = 1,100simulations. H.6 Compute Used All experiments were executed on a single Microsoft Surface Laptop Studio with an 11th Gen Intel Core i7-11370H CPU (3.30 GHz base) and 32 GB RAM. No GPU was used and no...

work page 2017

[1] [1]

doi: 10.1111/j.0006-341X.2002.00021.x. S. Greenland and J. M. Robins. Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology, 15(3):413–419, 09 1986. ISSN 0300-5771. doi: 10.1093/ ije/15.3.413. URLhttps://doi.org/10.1093/ije/15.3.413. Y . Kivva, E. Mokhtarian, J. Etesami, and N. Kiyavash. Revisiting the ge...

work page doi:10.1111/j.0006-341x.2002.00021.x 2002

[2] [2]

Graph.Let G be the disjoint union of n TP-cells C1, . . . ,Cn. The observed variables are V={X 1, Y1, . . . , Xn, Yn}

work page

[3] [3]

Constructive achievability is established in Lemma C.3 below; if necessary, rescale the values vi so that they lie in the achievable range

Observational distribution.Choose per-cell distributions Pi(Xi, Yi) such that poti(ai) = vi, where ai =P(y i,xi). Constructive achievability is established in Lemma C.3 below; if necessary, rescale the values vi so that they lie in the achievable range. Set P(V) =Qn i=1 Pi(Xi, Yi). 3.Query.Let θ= nX i=1 PNSi = nX i=1 P(y i,xi , y′ i,x′ i )

work page

[4] [4]

,an} with ai =P(y i,xi)

Experiment set.Let A={a 1, . . . ,an} with ai =P(y i,xi). For any S⊆ {1, . . . , n} , we writea S :={a i :i∈S}

work page

[5] [5]

, n} , and use the original budgetB

Cost function and budget.Set c(aS) =P i∈S wi for S⊆ {1, . . . , n} , and use the original budgetB. The construction is polynomial in n: each TP-cell is a constant-size object, and Lemma C.3 produces the requiredP i in time polynomial in the bit-length ofv i. 13 C.3 Additivity of potency across cells The key technical claim is that the potency of any subse...

work page 1996

[6] [6]

PickY (1) uniformly fromV; pickX (1) uniformly fromanc(Y (1))\ {Y (1)}

work page

[7] [7]

24 H.3 Candidate Experiment Sampling Each a∈A is sampled independently

For the second world, pick Y (2) uniformly from V subject to X(1) ∈anc(Y (2)), then set X(2) =X (1). 24 H.3 Candidate Experiment Sampling Each a∈A is sampled independently. With probability 0.3, a has two counterfactual worlds; otherwise one

work page

[8] [8]

Sample|W (1)|,|Z (1)| ∼Unif{1,2,3}

work page

[9] [9]

PickW (1) ⊆Vuniformly; pickZ (1) ⊆anc(W (1))\W (1) uniformly

work page

[10] [10]

Remaining variables fill fromVandanc(W (2))\W (2), respectively

For two-world experiments, sample W (2),Z (2) analogously, but seed each with a shared element: pick w∈W (1) uniformly and require w∈W (2); pick z∈Z (1)∩(anc(W (2))\W (2)) uniformly and requirez∈Z (2). Remaining variables fill fromVandanc(W (2))\W (2), respectively. H.4 Erd ˝os–Rényi Graphs Each ER graph has a fixed topological order V1 <· · ·< V N . Dire...

work page

[11] [11]

Add a forced confounder betweenX (1) andY (1)

work page

[12] [12]

Draw a target ratior∼Unif(0.1,0.9), clamped below by1/|V|for achievability

work page

[13] [13]

physical activity is protective

Add confounders between uniformly random pairs of observed variables until the total reaches ⌊r· |V|⌋. With100simulations per graph, this yields11×100 = 1,100simulations. H.6 Compute Used All experiments were executed on a single Microsoft Surface Laptop Studio with an 11th Gen Intel Core i7-11370H CPU (3.30 GHz base) and 32 GB RAM. No GPU was used and no...

work page 2017