arXiv: 2409.01794 · v2 · submitted 2024-09-03 · 📊 stat.ME · cs.LG · stat.ML

Estimating Joint Interventional Distributions from Marginal Interventional Data

Pith reviewed 2026-05-23 20:56 UTC · model grok-4.3

keywords: causal inference · maximum entropy · interventional distributions · exponential family · Lagrange duality · causal feature selection

The pith

Marginal interventional distributions over variable subsets suffice to recover the joint interventional distribution over all variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the Causal Maximum Entropy principle to incorporate interventional data in addition to observational data. Using Lagrange duality, it establishes that the resulting optimization problem has a solution in the exponential family. This framework supports two concrete tasks when marginal interventional distributions are supplied for arbitrary subsets of variables: causal feature selection from a mixture of observational and single-variable interventional data, and direct recovery of the full joint interventional distribution. A sympathetic reader cares because the approach combines datasets collected under different interventions without requiring joint observations of every variable at once.

Core claim

Extending the Causal Maximum Entropy objective to include interventional constraints yields, by Lagrange duality, a solution in the exponential family. When marginal interventional distributions are provided for any subset of the variables, the same objective recovers the joint interventional distribution over the full set and also enables causal feature selection from mixed observational and single-variable interventional data.

What carries the argument

The extended Causal Maximum Entropy objective with interventional constraints, solved via its Lagrange dual to produce an exponential-family distribution.
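
For orientation, the textbook argument behind this step can be sketched as follows. The notation ($f_j$, $c_j$, $\lambda_j$, $\nu$, $Z$) is generic and assumed here for illustration. This is the standard maximum-entropy derivation over a finite state space with linear moment constraints, not the paper's exact Causal MaxEnt objective or its encoding of interventional constraints.

```latex
\begin{align*}
\max_{p}\; & H(p) = -\sum_{x} p(x)\log p(x)
  \quad\text{s.t.}\quad \sum_{x} p(x)\, f_j(x) = c_j \;\; (j = 1,\dots,k),
  \qquad \sum_{x} p(x) = 1, \\
\mathcal{L}(p,\lambda,\nu) &= -\sum_{x} p(x)\log p(x)
  + \sum_{j=1}^{k} \lambda_j \Big( \sum_{x} p(x)\, f_j(x) - c_j \Big)
  + \nu \Big( \sum_{x} p(x) - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial p(x)} &= -\log p(x) - 1 + \sum_{j=1}^{k} \lambda_j f_j(x) + \nu = 0
  \;\Longrightarrow\;
  p_\lambda(x) = \frac{1}{Z(\lambda)} \exp\Big( \sum_{j=1}^{k} \lambda_j f_j(x) \Big), \\
Z(\lambda) &= \sum_{x} \exp\Big( \sum_{j=1}^{k} \lambda_j f_j(x) \Big),
  \qquad
  \lambda^\star \in \arg\min_{\lambda}\; \log Z(\lambda) - \sum_{j=1}^{k} \lambda_j c_j .
\end{align*}
```

The gradient of the dual objective with respect to $\lambda_j$ is $\mathbb{E}_{p_\lambda}[f_j] - c_j$, so the constraints hold exactly at a dual minimizer. The sketch assumes the optimum has full support, so the non-negativity constraints on $p$ are inactive.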

If this is right

  • Causal feature selection can be performed from a mixture of observational data and single-variable interventional data, outperforming prior merging methods on synthetic examples.
  • On the same synthetic examples, causal feature selection yields results comparable to the KCI test, which requires joint observations of all variables.
  • The exponential-family form supplies an explicit parametric representation for any collection of marginal interventional constraints.
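
What an explicit parametric representation means in practice can be illustrated with a small numerical sketch. It assumes a finite state space of three binary variables and plain moment constraints rather than the paper's interventional constraints. The function name `fit_maxent`, the feature set, and the target values are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np


def fit_maxent(features, targets, steps=20000, lr=0.5):
    """Fit p(x) proportional to exp(lam . f(x)) on a finite state space.

    Runs plain gradient descent on the dual objective
    log Z(lam) - lam . targets, whose gradient is E_p[f] - targets.
    """
    lam = np.zeros(features.shape[1])
    for _ in range(steps):
        logits = features @ lam
        p = np.exp(logits - logits.max())  # subtract max for numerical stability
        p /= p.sum()
        lam -= lr * (features.T @ p - targets)
    logits = features @ lam
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return lam, p


# All 8 states of three binary variables (x1, x2, x3); illustrative only.
states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
x1, x2, x3 = states[:, 0], states[:, 1], states[:, 2]

# Hypothetical features: single-variable means plus two pairwise moments.
features = np.column_stack([x1, x2, x3, x1 * x2, x2 * x3])
# Hypothetical target moments E[x1], E[x2], E[x3], E[x1*x2], E[x2*x3].
targets = np.array([0.5, 0.6, 0.4, 0.4, 0.3])

lam, p = fit_maxent(features, targets)
print("multipliers:", np.round(lam, 3))
print("fitted moments:", np.round(features.T @ p, 4))
```

Because the constraints fix only the (x1, x2) and (x2, x3) marginals, the fitted distribution in this toy setup is the chain factorization p(x1, x2) p(x2, x3) / p(x2), with x1 and x3 conditionally independent given x2.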

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the causal graph is known, the same dual construction could be used to propagate constraints across unobserved interventions.
  • The method suggests a data-collection strategy in which separate experiments each intervene on only a few variables, with the joint recovered afterward.

Load-bearing premise

Marginal interventional distributions supplied for arbitrary subsets of variables are together sufficient to uniquely determine the joint interventional distribution over all variables.

What would settle it

A concrete data-generating process in which two different joint interventional distributions produce identical marginal interventional distributions on every proper subset, yet differ on the full joint.
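
A distribution-level analogue of such a counterexample is the classic XOR construction. The sketch below is illustrative only: it uses ordinary (non-interventional) marginals of three binary variables, and the names `p_uniform`, `p_xor`, `marginal`, and `entropy_bits` are assumptions introduced here, not objects from the paper.

```python
import itertools
import numpy as np

# States (a, b, c) in the same order as a C-ordered 2x2x2 table.
states = list(itertools.product([0, 1], repeat=3))

# Joint 1: three independent fair coins.
p_uniform = np.full(8, 1 / 8)
# Joint 2: a and b are fair coins, c is deterministically a XOR b.
p_xor = np.array([0.25 if (a ^ b) == c else 0.0 for a, b, c in states])


def marginal(p, keep):
    """Marginal table over the variable indices in `keep`."""
    drop = tuple(i for i in range(3) if i not in keep)
    return p.reshape(2, 2, 2).sum(axis=drop)


def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability states."""
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())


# Every proper, non-empty subset of the three variables.
proper_subsets = [s for r in (1, 2) for s in itertools.combinations(range(3), r)]
all_match = all(
    np.allclose(marginal(p_uniform, s), marginal(p_xor, s)) for s in proper_subsets
)

print("all proper-subset marginals agree:", all_match)
print("joints agree:", np.allclose(p_uniform, p_xor))
print("entropies (bits):", entropy_bits(p_uniform), entropy_bits(p_xor))
```

Both joints share every one- and two-variable marginal yet differ on the full joint. A maximum-entropy estimator given only those marginals would return the uniform joint (3 bits of entropy versus 2), so it would miss the XOR dependence if that were the true distribution.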

Figures

Figures reproduced from arXiv: 2409.01794 by Armin Kekić, Atalanti Mastakouri, Bernhard Schölkopf, Elke Kirschbaum, Sergio Hernan Garrido Mejia.

Figure 1: Results for causal feature selection. (a), (b), and (c) show the graph structures used for our synthetic experiments.
Figure 2: Residuals between true and estimated joint interventional distributions. The violin plots show the residuals between …

Original abstract

In this paper we show how to exploit interventional data to acquire the joint conditional distribution of all the variables using the Maximum Entropy principle. To this end, we extend the Causal Maximum Entropy method to make use of interventional data in addition to observational data. Using Lagrange duality, we prove that the solution to the Causal Maximum Entropy problem with interventional constraints lies in the exponential family, as in the Maximum Entropy solution. Our method allows us to perform two tasks of interest when marginal interventional distributions are provided for any subset of the variables. First, we show how to perform causal feature selection from a mixture of observational and single-variable interventional data, and, second, how to infer joint interventional distributions. For the former task, we show on synthetically generated data, that our proposed method outperforms the state-of-the-art method on merging datasets, and yields comparable results to the KCI-test which requires access to joint observations of all variables.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript extends the Causal Maximum Entropy framework to incorporate interventional data alongside observational data. Using Lagrange duality, it claims to prove that the optimizer under interventional constraints remains in the exponential family. The method is applied to two tasks: causal feature selection from a mixture of observational and single-variable interventional data, and recovery of joint interventional distributions from marginal interventional distributions supplied for arbitrary subsets of variables. Synthetic experiments indicate that the approach outperforms dataset-merging baselines for feature selection and performs comparably to the KCI test (which requires joint observations).

Significance. If the duality argument is rigorous and the supplied marginal interventional constraints suffice for unique recovery of the joint, the work supplies a principled maximum-entropy route for fusing observational and interventional data without requiring full joint observations. The exponential-family preservation result would be a clean theoretical contribution, and the feature-selection experiments on synthetic data provide concrete empirical grounding.

major comments (2)
  1. [Abstract / identifiability section] Abstract and the section presenting the identifiability claim: the statement that the method recovers the joint interventional distribution “when marginal interventional distributions are provided for any subset of the variables” is load-bearing for both claimed tasks. No explicit identifiability theorem or graph-dependent conditions are supplied showing that the marginal interventional constraints uniquely determine the joint; multiple joints can agree on the same do-marginals when intervened subsets leave paths or components unconstrained.
  2. [Theoretical development / duality argument] The Lagrange-duality proof (referenced in the abstract and presumably in the main theoretical section): the claim that the solution remains in the exponential family under interventional constraints is central, yet the manuscript provides neither the explicit dual derivation nor the encoding of the marginal interventional expectations as constraints. Without these details the preservation result cannot be verified.
minor comments (2)
  1. [Experiments] The synthetic-data section should report the precise data-generating process, the number of variables, the fraction of interventional samples, and the exact performance metrics (beyond “outperforms”) so that the feature-selection comparison can be reproduced.
  2. [Notation / method section] Notation for the interventional constraints (e.g., how P(V_S | do(V_T)) is written inside the extended Causal MaxEnt objective) should be introduced once and used consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key areas where the manuscript requires greater rigor and explicit detail. We address each major comment below and will incorporate the necessary revisions.

  1. Referee: [Abstract / identifiability section] Abstract and the section presenting the identifiability claim: the statement that the method recovers the joint interventional distribution “when marginal interventional distributions are provided for any subset of the variables” is load-bearing for both claimed tasks. No explicit identifiability theorem or graph-dependent conditions are supplied showing that the marginal interventional constraints uniquely determine the joint; multiple joints can agree on the same do-marginals when intervened subsets leave paths or components unconstrained.

    Authors: We acknowledge that the current version does not supply an explicit identifiability theorem with graph-dependent conditions. The claim in the abstract and introduction is intended to hold under the maximum-entropy principle when the supplied marginal interventional constraints are sufficient to pin down the joint, but we agree that uniqueness is not automatic for arbitrary subsets. In the revision we will add a dedicated identifiability subsection that states the precise conditions (e.g., when the collection of intervened variable sets covers all relevant causal paths or satisfies a covering criterion on the underlying DAG) under which the joint interventional distribution is uniquely recoverable from the given marginals. revision: yes

  2. Referee: [Theoretical development / duality argument] The Lagrange-duality proof (referenced in the abstract and presumably in the main theoretical section): the claim that the solution remains in the exponential family under interventional constraints is central, yet the manuscript provides neither the explicit dual derivation nor the encoding of the marginal interventional expectations as constraints. Without these details the preservation result cannot be verified.

    Authors: The manuscript sketches the Lagrange-duality argument but does not expand the full derivation or the precise encoding of interventional marginals. We will revise the theoretical section to include the complete steps: (i) formulation of the constrained optimization problem that augments the observational entropy objective with both observational and interventional moment-matching constraints, (ii) construction of the Lagrangian that incorporates the do-marginal expectations as linear constraints on the interventional distributions, (iii) derivation of the dual problem, and (iv) explicit verification that the resulting primal optimizer belongs to the exponential family with parameters that absorb the interventional Lagrange multipliers. This will make the preservation result directly verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation follows from standard Lagrange duality on extended MaxEnt

The abstract describes extending Causal MaxEnt with interventional constraints and applying Lagrange duality to obtain an exponential-family solution. This is a direct consequence of the optimization problem definition and does not reduce any target quantity (joint interventional distribution) to a fitted parameter or self-citation by construction. No load-bearing steps match the enumerated circularity patterns; the method is presented as building on the established MaxEnt principle with an independent duality argument. The identifiability of joints from marginals is an assumption whose validity is external to the derivation chain itself.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the maximum entropy principle being applicable once interventional marginals are added as constraints, plus the correctness of the Lagrange duality argument. No explicit free parameters or new entities are named in the abstract.

free parameters (1)
  • Lagrange multipliers for interventional constraints
    These multipliers are introduced to enforce the marginal interventional distributions; their values are determined by the optimization and are therefore fitted to the supplied data.
axioms (2)
  • domain assumption: The maximum entropy principle remains valid when observational and interventional marginal constraints are combined in a causal setting.
    Invoked when the abstract states that the Causal MaxEnt method is extended to interventional data.
  • standard math: Lagrange duality applies directly to the Causal MaxEnt objective with the added interventional constraints.
    Used to prove the solution lies in the exponential family.

pith-pipeline@v0.9.0 · 5708 in / 1422 out tokens · 55907 ms · 2026-05-23T20:56:10.253859+00:00 · methodology
