pith.

arxiv: 2507.13920 · v2 · submitted 2025-07-18 · 💻 cs.LG · Causal Reinforcement Learning Workshop at RLC 2025

Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem

Pith reviewed 2026-05-19 03:43 UTC · model grok-4.3

classification 💻 cs.LG
keywords causal process models · dynamic causal graphs · reinforcement learning · sparse graphs · physical prediction · world modeling · visual observations · multi-agent RL

The pith

Causal Process Models learn sparse time-varying causal graphs from visual observations by casting graph construction as a multi-agent reinforcement learning problem.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Causal Process Models that treat the construction of dynamic causal graphs as a reinforcement learning task rather than assuming static dense connections. Specialized agents decide at each timestep which objects are causally linked, using a factorization of object and force vectors into three learned dimensions. This produces sparse graphs that only include active interactions. A sympathetic reader would care because the approach promises more efficient and interpretable world models for physical scenes where interactions change over time and object counts vary.

Core claim

Causal Process Models reframe dynamic causal graph discovery as a multi-agent reinforcement learning problem in which agents sequentially decide causal edges based on a structured representation that factorizes object and force vectors along three dimensions of mutability, causal relevance, and control relevance, yielding semantically meaningful encodings that support accurate interaction-graph construction.

What carries the argument

Causal Process Models (CPMs), which use multi-agent reinforcement learning to decide causal connections at each timestep from factorized representations along mutability, causal relevance, and control relevance dimensions.
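To make the three-way factorization concrete, here is a minimal sketch in Python/NumPy. It assumes, purely for illustration, that each object embedding is a flat vector whose last axis is split into three contiguous slices named after the three dimensions. The `FactorizedObject` container, the `factorize` helper, and the equal slice sizes are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FactorizedObject:
    """Hypothetical container for one (or a batch of) object embeddings split along three axes."""
    mutability: np.ndarray         # intended to encode whether the object's state can change
    causal_relevance: np.ndarray   # intended to encode whether the object can affect others
    control_relevance: np.ndarray  # intended to encode whether the agent's actions reach it


def factorize(embedding: np.ndarray, sizes=(4, 4, 4)) -> FactorizedObject:
    """Split the last axis of `embedding` into three contiguous slices.

    The equal slice sizes are an assumption for illustration only.
    """
    if embedding.shape[-1] != sum(sizes):
        raise ValueError(f"expected {sum(sizes)} features, got {embedding.shape[-1]}")
    cut_points = np.cumsum(sizes)[:-1]  # e.g. [4, 8] for sizes (4, 4, 4)
    mutability, causal, control = np.split(embedding, cut_points, axis=-1)
    return FactorizedObject(mutability, causal, control)
```

For example, `factorize(np.arange(12.0)).causal_relevance` returns the middle four features, `[4., 5., 6., 7.]`. Keeping the slices explicit lets a downstream component, such as an edge-selection agent, read only the slice it needs, which is one plausible reading of how the factorization is meant to aid interpretability.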

If this is right

  • CPMs produce more accurate physical predictions than dense graph baselines, especially over longer horizons.
  • The models handle scenes with varying numbers of objects more effectively than dense baselines.
  • Causal graphs are constructed only for active interactions, improving computational efficiency and interpretability.
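The efficiency point in the third bullet can be illustrated with a toy message count: if messages are computed only along active edges, the cost per step scales with the number of interactions rather than with all N(N-1) ordered pairs. The sketch below assumes a simple linear, sum-aggregated message function; `message_passing` and `dense_edges` are hypothetical helpers, not the paper's transition model.

```python
import numpy as np


def message_passing(states, edges, weight):
    """One round of sum-aggregated messages along directed (sender, receiver) edges.

    states: (N, D) array of object states.
    edges:  iterable of (i, j) pairs meaning "object i acts on object j".
    weight: (D, D) matrix of a hypothetical linear message function.
    Returns the updated states and the number of messages computed.
    """
    states = np.asarray(states, dtype=float)
    aggregated = np.zeros_like(states)
    n_messages = 0
    for i, j in edges:
        aggregated[j] += states[i] @ weight  # message from sender i to receiver j
        n_messages += 1
    return states + aggregated, n_messages


def dense_edges(n):
    """All ordered pairs without self-loops, as a fully connected baseline would use."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]
```

With 5 objects the dense edge list costs 20 messages per step, while a timestep in which only object 0 acts on object 1 costs one; in this toy update, objects that receive no message keep their state unchanged.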
  • The three-dimensional factorization enables automatic discovery of meaningful causal encodings without explicit supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same factorization approach could be tested on non-physical domains such as video of social or biological interactions where causal links also appear and disappear.
  • Combining the RL decision process with other representation-learning objectives might improve robustness when visual observations contain noise or partial occlusions.
  • If the learned dimensions prove consistent across environments, they could serve as a general prior for causal discovery in other sequential decision tasks.

Load-bearing premise

The three learned dimensions automatically yield semantically meaningful encodings that allow the reinforcement learning agents to make accurate decisions about which causal edges to include.

What would settle it

Run the model on a controlled physical simulation with known, time-varying ground-truth causal structures and measure whether the learned sparse graphs match the true edges while outperforming dense baselines on long-horizon prediction error.
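The test proposed here has two measurable parts: edge recovery against the ground-truth graphs, and prediction error as a function of horizon. A minimal sketch of both follows, assuming graphs are stored as boolean adjacency tensors of shape (T, N, N) and using mean squared error as the prediction metric; the function names are hypothetical and the paper's own metric may differ.

```python
import numpy as np


def edge_scores(pred, true):
    """Precision, recall and F1 of predicted vs. ground-truth edges over all timesteps.

    pred, true: boolean arrays of shape (T, N, N); entry [t, i, j] is True when
    object i is taken to act on object j at step t. Self-edges are ignored.
    """
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    off_diag = ~np.eye(pred.shape[-1], dtype=bool)  # broadcasts over the time axis
    tp = int(np.sum(pred & true & off_diag))
    fp = int(np.sum(pred & ~true & off_diag))
    fn = int(np.sum(~pred & true & off_diag))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rollout_error(pred_states, true_states):
    """Mean squared error per prediction step; inputs have shape (T, N, D)."""
    diff = np.asarray(pred_states, dtype=float) - np.asarray(true_states, dtype=float)
    return np.mean(diff ** 2, axis=(1, 2))
```

A dense graph that switches on every off-diagonal edge always reaches recall 1.0, so precision (or F1) is the number that separates a sparse model from a dense one on edge recovery, while `rollout_error` covers the long-horizon half of the comparison.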

Figures

Figures reproduced from arXiv: 2507.13920 by Charley M Wu, Christian Gumbsch, Martin V. Butz, Turan Orujlu.

Figure 1: Our model has three components: a vision encoder, an action encoder, and the transition …
Figure 2: Prediction results for a synthetic physics environment in (a) observed and (b) unobserved settings (Ke et al., 2021). Top: description of the task. Bottom: prediction metric vs. prediction horizon for 5 objects (average of top 8 out of 10 seeds).
5.1 Comparison Baselines: We compare our model against 2 baselines, a graph neural network (GNN) (Scarselli et al., 2009) and a modular network which has a separate ML…
Figure 3: Prediction and downstream RL results over number of objects. (a) Prediction metric vs. number of objects for 5 steps. (b-c) Mean reward vs. number of objects. All results are the average of top 8 out of 10 seeds.
… setting and 7-object 10-step prediction setting (Fig. 3b). In the Unobserved setting, our model was consistently better than the baselines (Fig. 3c).
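Both result figures report the average of the top 8 out of 10 seeds. The sketch below shows what that aggregation does next to a plain mean over all seeds. Whether "top" means largest or smallest depends on whether the prediction metric is higher- or lower-is-better, which the captions do not state, so it is left as a parameter; `top_k_mean` is a hypothetical helper and the per-seed numbers in the usage note are toy values, not results from the paper.

```python
import numpy as np


def top_k_mean(scores, k, higher_is_better=True):
    """Mean of the k best per-seed scores (best = largest unless higher_is_better=False)."""
    scores = np.asarray(scores, dtype=float)
    if not 0 < k <= scores.size:
        raise ValueError("k must be between 1 and the number of seeds")
    ordered = np.sort(scores)  # ascending order
    best = ordered[-k:] if higher_is_better else ordered[:k]
    return float(best.mean())
```

With toy scores `[0.9] * 8 + [0.1, 0.2]`, `top_k_mean(scores, 8)` gives 0.9 while the mean over all ten seeds is 0.75. Applying the same rule to every model keeps the comparison internally consistent, but it can hide seed-level failures, so reporting the all-seed mean alongside it is cheap.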
Original abstract

Most neural models of causality assume static causal graphs, failing to capture the dynamic and sparse nature of physical interactions where causal relationships emerge and dissolve over time. We introduce the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), for learning sparse, time-varying causal graphs from visual observations. Unlike traditional approaches that maintain dense connectivity, our model explicitly constructs causal edges only when objects actively interact, dramatically improving both interpretability and computational efficiency. We achieve this by casting dynamic interaction-graph construction for world modeling as a multi-agent reinforcement learning problem, where specialized agents sequentially decide which objects are causally connected at each timestep. Our key innovation is a structured representation that factorizes object and force vectors along three learned dimensions (mutability, causal relevance, and control relevance), enabling the automatic discovery of semantically meaningful encodings. We demonstrate that a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts.
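The central move described in the abstract, deciding edges with an RL policy instead of keeping them all, can be sketched as a Bernoulli policy over candidate edges trained with a REINFORCE-style update (Williams, in the reference list below). Everything here is an assumption for illustration: a single logistic scorer over concatenated sender/receiver features stands in for the paper's specialized agents, and the reward is simply passed in. `EdgePolicy` is a hypothetical class, not the authors' implementation.

```python
import numpy as np


class EdgePolicy:
    """Hypothetical edge-selection agent: one Bernoulli decision per ordered pair (i, j)."""

    def __init__(self, feat_dim, lr=0.1, seed=0):
        self.w = np.zeros(2 * feat_dim)  # logistic weights over [sender, receiver] features
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def pair_features(x):
        """Concatenate sender and receiver features for every ordered pair i != j."""
        n, d = x.shape
        pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
        feats = np.array([np.concatenate([x[i], x[j]]) for i, j in pairs], dtype=float)
        return pairs, feats.reshape(len(pairs), 2 * d)

    def act(self, x):
        """Sample a sparse edge set for one timestep; x has shape (N, feat_dim)."""
        pairs, feats = self.pair_features(x)
        probs = 1.0 / (1.0 + np.exp(-(feats @ self.w)))  # P(edge) for each pair
        keep = self.rng.random(len(pairs)) < probs
        edges = [p for p, k in zip(pairs, keep) if k]
        return edges, (feats, probs, keep)

    def update(self, trajectory, reward, baseline=0.0):
        """REINFORCE step: w += lr * (reward - baseline) * grad log pi(sampled edges)."""
        feats, probs, keep = trajectory
        # For a Bernoulli policy with p = sigmoid(w . f), d log pi / dw = (a - p) * f.
        grad = ((keep.astype(float) - probs)[:, None] * feats).sum(axis=0)
        self.w += self.lr * (reward - baseline) * grad
```

In a full world model the reward would presumably be tied to how well the transition model predicts the next observation given the chosen edges, possibly minus a sparsity cost; `baseline` would typically be a running average of past rewards to reduce gradient variance.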

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), which reframe dynamic causal graph discovery from visual observations as a multi-agent reinforcement learning problem. Object and force representations are factorized along three learned dimensions (mutability, causal relevance, and control relevance) so that specialized agents can sequentially construct sparse, time-varying causal edges only when interactions occur. The central empirical claim is that CPMs significantly outperform dense-graph baselines on physical prediction tasks, with particular gains at longer horizons and under varying object counts.

Significance. If the performance gains are robust and the RL component is shown to be essential rather than incidental to the factorization, the work offers a novel route to interpretable, computationally efficient modeling of time-varying physical interactions. The structured latent dimensions could improve both prediction accuracy and causal insight in simulation or robotics settings. However, the current lack of detailed metrics, ablations, and isolation of the RL contribution limits the assessed significance.

major comments (3)
  1. [Abstract / Experimental Results] Abstract and Experimental Evaluation: The headline claim that 'a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts' supplies no quantitative metrics, baseline specifications, dataset details, or ablation evidence. Without these, it is impossible to judge whether the central performance result is supported or sensitive to post-hoc choices.
  2. [Method / Experiments] Method and Experiments: The manuscript does not report ablations that remove the multi-agent RL edge-selection step while retaining the three-dimensional factorization. If the structured latent space alone drives the long-horizon gains, the reframing of graph discovery as RL would not be load-bearing for the reported improvements.
  3. [Method] Method: The assumption that the three learned dimensions 'automatically' yield semantically meaningful encodings that support accurate causal-edge decisions by the RL agents is stated without accompanying analysis, visualizations, or quantitative validation of semantic alignment. This assumption underpins both the interpretability and performance claims.
minor comments (1)
  1. [Method] Notation for the three dimensions and the agent decision process could be introduced with explicit equations or pseudocode to improve reproducibility.
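Major comment 2 asks for an ablation that keeps the three-dimensional factorization but removes the multi-agent RL edge selection. One simple non-learned control, sketched below under stated assumptions, adds edge (i, j) whenever the dot product of the two objects' causal-relevance slices exceeds a fixed threshold. The pairwise score, the threshold value, and the `threshold_edges` helper are all hypothetical.

```python
import numpy as np


def threshold_edges(causal_rel, tau=0.5):
    """Deterministic (non-RL) edge rule for an ablation.

    causal_rel: (N, K) causal-relevance slices of the N object embeddings.
    Returns the ordered pairs (i, j), i != j, whose score <c_i, c_j> exceeds tau,
    in row-major order.
    """
    c = np.asarray(causal_rel, dtype=float)
    scores = c @ c.T
    np.fill_diagonal(scores, -np.inf)  # never connect an object to itself
    senders, receivers = np.nonzero(scores > tau)
    return list(zip(senders.tolist(), receivers.tolist()))
```

Because this score is symmetric, the rule cannot express a directed interaction in which i affects j but not the reverse; a gap between it and the RL variant would therefore mix the effect of the policy with the effect of directionality, which is worth keeping in mind when reading such an ablation.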

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below and indicate the changes made in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract / Experimental Results] Abstract and Experimental Evaluation: The headline claim that 'a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts' supplies no quantitative metrics, baseline specifications, dataset details, or ablation evidence. Without these, it is impossible to judge whether the central performance result is supported or sensitive to post-hoc choices.

    Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised version we have expanded the abstract to include concrete metrics (e.g., relative MSE reductions at 10-, 20-, and 50-step horizons and across 3–8 object counts), together with brief specifications of the simulation datasets and the dense-graph baselines against which CPM is compared. These numbers are taken directly from the experimental results already reported in Section 4. revision: yes

  2. Referee: [Method / Experiments] Method and Experiments: The manuscript does not report ablations that remove the multi-agent RL edge-selection step while retaining the three-dimensional factorization. If the structured latent space alone drives the long-horizon gains, the reframing of graph discovery as RL would not be load-bearing for the reported improvements.

    Authors: This is a fair criticism. The original manuscript compared the full CPM only against dense baselines and did not isolate the RL component. To address the concern we have added a new ablation in the revised experimental section: a non-RL variant that retains the three-dimensional factorization but replaces the multi-agent RL edge-selection policy with a deterministic threshold on the causal-relevance dimension. The results show that removing the RL step degrades long-horizon prediction accuracy, indicating that the reinforcement-learning formulation contributes measurably beyond the factorization alone. These additional results are now reported in Section 4.3. revision: yes

  3. Referee: [Method] Method: The assumption that the three learned dimensions 'automatically' yield semantically meaningful encodings that support accurate causal-edge decisions by the RL agents is stated without accompanying analysis, visualizations, or quantitative validation of semantic alignment. This assumption underpins both the interpretability and performance claims.

    Authors: We accept that the original submission offered only qualitative motivation for the semantic content of the three dimensions. In the revised manuscript we have inserted a dedicated analysis subsection (Section 3.4) that supplies (i) t-SNE visualizations of the factorized representations colored by ground-truth mutability, causal relevance, and control relevance, and (ii) quantitative alignment scores (Pearson correlations and mutual-information values) between each learned dimension and the corresponding simulator-provided attributes. These additions provide direct evidence that the dimensions are semantically aligned and that they inform the RL agents’ edge decisions. revision: yes
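The alignment scores described in response 3 (Pearson correlations and mutual information between each learned dimension and a simulator-provided attribute) can be sketched as follows. This assumes each learned dimension has already been reduced to one scalar per object and timestep, for example its norm; that reduction and the histogram-based mutual-information estimator are assumptions, not the procedure the simulated rebuttal commits to.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between a learned scalar code and a simulator attribute."""
    return float(np.corrcoef(x, y)[0, 1])


def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(X; Y) in nats.

    This estimator is biased upward for small samples and depends on `bins`;
    it is used here only as a simple illustration.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0  # skip empty cells, where the summand is 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

Pearson only captures linear alignment, while the mutual-information estimate also registers nonlinear dependence, which is presumably why both are reported; comparing each learned dimension against all three attributes, not just its namesake, would show whether the dimensions are actually disentangled.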

Circularity Check

0 steps flagged

No circularity: new RL reframing and factorization presented as independent construction

full rationale

The paper introduces CPMs by reframing dynamic graph discovery as multi-agent RL and factorizing object/force vectors into three learned dimensions (mutability, causal relevance, control relevance). No equations, derivations, or self-citations are exhibited that reduce the claimed performance gains on long-horizon prediction to quantities defined by construction from the same fitted inputs or prior author results. The central claims rest on the empirical comparison to dense baselines rather than any self-definitional loop or renamed known result. The derivation chain is therefore self-contained as a novel modeling framework.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract introduces the Causal Process Framework and the three-axis factorization without stating explicit free parameters or background axioms; the RL formulation and interaction-only edge construction are presented as novel modeling choices.

invented entities (2)
  • Causal Process Framework · no independent evidence
    purpose: To enable sparse time-varying causal graphs from visual observations
    New framework introduced to replace static dense graphs; no independent evidence is supplied in the abstract.
  • Three learned dimensions (mutability, causal relevance, control relevance) · no independent evidence
    purpose: To factorize object and force vectors for semantically meaningful encodings
    Invented structured representation whose utility is asserted but not independently verified in the provided abstract.

pith-pipeline@v0.9.0 · 5702 in / 1211 out tokens · 34676 ms · 2026-05-19T03:43:39.478412+00:00 · methodology



Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

  1. Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.
  2. Relational inductive biases, deep learning, and graph networks. URL http://arxiv.org/abs/1806.01261.
  3. Lars Buesing, Theophane Weber, Yori Zwols, Nicolas Heess, Sébastien Racanière, Arthur Guez, and Jean-Baptiste Lespiau. Woulda, coulda, shoulda: Counterfactually-guided policy search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
  4. … ISSN 0149-7634. DOI: 10.1016/j.neubiorev.2024.105948. URL https://www.sciencedirect.com/science/article/pii/S0149763424004172.
  5. Hanna M Dettki, Brenden M Lake, Charley M Wu, and Bob Rehder. Do large language models reason causally like us? Even better? arXiv preprint arXiv:2502.10215.
  6. Maxime Gasse, Damien Grasset, Guillaume Gaudron, and Pierre-Yves Oudeyer. Using confounded data in latent model-based reinforcement learning. Trans. Mach. Learn. Res., 2023. URL https://api.semanticscholar.org/CorpusID:235361720.
  7. Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Lio, and Michael M. Bronstein. On over-squashing in message passing neural networks: The impact of width, depth, and topology. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan … URL https://proceedings.mlr.press/v202/di-giovanni23a.html.
  8. Francesco Di Giovanni, T. Konstantin Rusch, Michael M. Bronstein, Andreea Deac, Marc Lackenby, Siddhartha Mishra, and Petar Velickovic. How does over-squashing affect the power of GNNs? Trans. Mach. Learn. Res., 2024.
  9. Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Jimenez Rezende, Michael Mozer, Yoshua Bengio, and Chris Pal. Systematic evaluation of causal discovery in visual model based reinforcement learning. In Joaquin Vanschoren and Sai-Kit Yeung (eds.), Proceedings of the Neural Information Processing System… URL https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/8f121ce07d74717e0b1f21d122e04521-Abstract-round2.html.
  10. Thomas N. Kipf, Elise van der Pol, and Max Welling. Contrastive learning of structured world models. In 8th International Conference on Learning Representations, ICLR 2020, Addis … URL https://openreview.net/forum?id=H1gax6VtDB.
  11. Richard D. Lange and Konrad P. Kording. Causality in the human niche: lessons for machine learning. CoRR, abs/2506.13803. DOI: 10.48550/ARXIV.2506.13803. URL https://doi.org/10.48550/arXiv.2506.13803.
  12. Anson Lei, Bernhard Schölkopf, and Ingmar Posner. Spartan: A sparse transformer learning local causation. arXiv preprint arXiv:2411.06890.
  13. Valentyn Melnychuk, Dennis Frauen, and Stefan Feuerriegel. Causal transformer for estimating counterfactual outcomes. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine… URL https://proceedings.mlr.press/v162/melnychuk22a.html.
  14. Eshaan Nichani, Alex Damian, and Jason D. Lee. How transformers learn causal structure with gradient descent. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27.
  15. … ISBN 978-1-58488-658-7. DOI: 10.1201/9781420011579.ch23.
  16. Raanan Y. Rohekar, Yaniv Gurwicz, and Shami Nisimov. Causal interpretation of self-attention in pre-trained transformers. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural Information Processing Systems 36: Annual Conference o… URL http://papers.nips.cc/paper_files/paper/2023/hash/642a321fba8a0f03765318e629cb93ea-Abstract-Conference.html.
  17. Donald B. Rubin. Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6(1):34–58. DOI: 10.1214/aos/1176344064. URL https://doi.org/10.1214/aos/1176344064.
  18. Bertrand Russell. Human Knowledge: Its Scope and Limits. Routledge, London and New York.
  19. … DOI: 10.1109/TNN.2008.2005605. URL https://doi.org/10.1109/TNN.2008.2005605.
  20. Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634.
  21. Xiao Shou, Debarun Bhattacharjya, Tian Gao, Dharmashankar Subramanian, Oktie Hassanzadeh, and Kristin P. Bennett. Pairwise causality guided transformers for event sequences. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural Information Processing Systems 36: Annual Conference on Neural … URL http://papers.nips.cc/paper_files/paper/2023/hash/91b047c5f5bd41ef56bfaf4ad0bd19e3-Abstract-Conference.html.
  22. Brian Skyrms. Causal necessity. Philosophy of Science, 48(2):329–335. DOI: 10.1086/289003.
  23. Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M. Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29. URL https://openreview.net/forum?id=7UmjRGzp-A.
  24. Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, and Georg Martius. Causal action influence aware counterfactual data augmentation. arXiv preprint arXiv:2405.18917, 2024.
  25. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 30: Annual Conference … URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  26. Christopher J. C. H. Watkins and Peter Dayan. Technical note: Q-learning. Mach. Learn., 8:279–292. DOI: 10.1007/BF00992698. URL https://doi.org/10.1007/BF00992698.
  27. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256. DOI: 10.1007/BF00992696. URL https://doi.org/10.1007/BF00992696.
  28. Moritz Willig, Tim Nelson Tobiasch, Florian Peter Busch, Jonas Seng, Devendra Singh Dhami, and Kristian Kersting. Systems with switching causal relations: A meta-causal perspective. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28. URL https://openreview.net/forum?id=J9VogDTa1W.
  29. Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, and Elias Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34: An…
  30. … URL https://arxiv.org/abs/2109.04173.