Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem
Pith reviewed 2026-05-19 03:43 UTC · model grok-4.3
The pith
Causal Process Models learn sparse time-varying causal graphs from visual observations by casting graph construction as a multi-agent reinforcement learning problem.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Causal Process Models reframe dynamic causal graph discovery as a multi-agent reinforcement learning problem: agents sequentially decide causal edges from a structured representation that factorizes object and force vectors along three dimensions (mutability, causal relevance, and control relevance). The paper claims that this factorization yields semantically meaningful encodings that support accurate interaction-graph construction.
What carries the argument
Causal Process Models (CPMs), which use multi-agent reinforcement learning to decide causal connections at each timestep from factorized representations along mutability, causal relevance, and control relevance dimensions.
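As a rough illustration of the idea, the sketch below builds a sparse edge list by asking a policy, one ordered object pair at a time, whether an edge should exist at the current timestep. Everything in it is an assumption made for illustration: `proximity_policy`, `build_sparse_graph`, the use of 2D positions as the object representation, and the distance-based edge probability are hypothetical stand-ins, not the paper's learned agents or its factorized representation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Position = Tuple[float, float]
Edge = Tuple[int, int]
Policy = Callable[[Position, Position], float]


def proximity_policy(src: Position, dst: Position, scale: float = 1.0) -> float:
    """Hypothetical stand-in policy: edge probability decays with distance.

    Equivalent to a sigmoid of 4 * (scale - dist), written with tanh so that
    very large distances cannot overflow.
    """
    dist = math.dist(src, dst)
    return 0.5 * (1.0 - math.tanh(2.0 * (dist - scale)))


def build_sparse_graph(
    positions: Sequence[Position], policy: Policy, rng: random.Random
) -> List[Edge]:
    """Sample one binary edge decision per ordered pair (i, j), i != j."""
    edges: List[Edge] = []
    for i, src in enumerate(positions):
        for j, dst in enumerate(positions):
            if i == j:
                continue  # no self-edges in this sketch
            if rng.random() < policy(src, dst):
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    scene = [(0.0, 0.0), (0.1, 0.0), (100.0, 100.0)]
    edges = build_sparse_graph(scene, proximity_policy, random.Random(0))
    dense = len(scene) * (len(scene) - 1)
    print(f"{len(edges)} sampled edges versus {dense} in a dense graph")
```

In a reinforcement learning setting, each sampled decision would be an action, and downstream prediction quality would supply the reward. That training loop is omitted here.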
If this is right
- CPMs produce more accurate physical predictions than dense-graph baselines, especially over longer horizons.
- The models handle scenes with varying numbers of objects more effectively than dense-graph baselines.
- Causal graphs are constructed only for active interactions, improving computational efficiency and interpretability.
- The three-dimensional factorization enables automatic discovery of meaningful causal encodings without explicit supervision.
Where Pith is reading between the lines
- The same factorization approach could be tested on non-physical domains such as video of social or biological interactions where causal links also appear and disappear.
- Combining the RL decision process with other representation-learning objectives might improve robustness when visual observations contain noise or partial occlusions.
- If the learned dimensions prove consistent across environments, they could serve as a general prior for causal discovery in other sequential decision tasks.
Load-bearing premise
The three learned dimensions automatically yield semantically meaningful encodings that allow the reinforcement learning agents to make accurate decisions about which causal edges to include.
What would settle it
Run the model on a controlled physical simulation with known, time-varying ground-truth causal structures and measure whether the learned sparse graphs match the true edges while outperforming dense baselines on long-horizon prediction error.
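One way to score the graph-matching half of that test is edge-level precision, recall, and F1 between predicted and ground-truth adjacency matrices, pooled over timesteps. The sketch below is a minimal version under stated assumptions: the function name `edge_scores`, the 0/1 adjacency-matrix encoding, and the choice to ignore self-loops are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

Graph = List[List[int]]  # adjacency matrix for one timestep, entries 0 or 1


def edge_scores(pred: List[Graph], true: List[Graph]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of predicted directed edges, pooled over time."""
    tp = fp = fn = 0
    for g_pred, g_true in zip(pred, true):
        n = len(g_true)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # self-loops are ignored (assumption)
                p, t = bool(g_pred[i][j]), bool(g_true[i][j])
                if p and t:
                    tp += 1
                elif p and not t:
                    fp += 1
                elif t and not p:
                    fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two timesteps, two objects: one true edge at t=0, none at t=1.
    true_graphs = [[[0, 1], [0, 0]], [[0, 0], [0, 0]]]
    pred_graphs = [[[0, 1], [1, 0]], [[0, 0], [0, 0]]]
    print(edge_scores(pred_graphs, true_graphs))
```

Pooling over timesteps weights busy timesteps more heavily. Averaging per-timestep scores instead is an equally reasonable choice.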
Abstract
Most neural models of causality assume static causal graphs, failing to capture the dynamic and sparse nature of physical interactions where causal relationships emerge and dissolve over time. We introduce the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), for learning sparse, time-varying causal graphs from visual observations. Unlike traditional approaches that maintain dense connectivity, our model explicitly constructs causal edges only when objects actively interact, dramatically improving both interpretability and computational efficiency. We achieve this by casting dynamic interaction-graph construction for world modeling as a multi-agent reinforcement learning problem, where specialized agents sequentially decide which objects are causally connected at each timestep. Our key innovation is a structured representation that factorizes object and force vectors along three learned dimensions (mutability, causal relevance, and control relevance), enabling the automatic discovery of semantically meaningful encodings. We demonstrate that a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), which reframe dynamic causal graph discovery from visual observations as a multi-agent reinforcement learning problem. Object and force representations are factorized along three learned dimensions (mutability, causal relevance, and control relevance) so that specialized agents can sequentially construct sparse, time-varying causal edges only when interactions occur. The central empirical claim is that CPMs significantly outperform dense-graph baselines on physical prediction tasks, with particular gains at longer horizons and under varying object counts.
Significance. If the performance gains are robust and the RL component is shown to be essential rather than incidental to the factorization, the work offers a novel route to interpretable, computationally efficient modeling of time-varying physical interactions. The structured latent dimensions could improve both prediction accuracy and causal insight in simulation or robotics settings. However, the current lack of detailed metrics, ablations, and isolation of the RL contribution limits the assessed significance.
major comments (3)
- [Abstract / Experimental Results] Abstract and Experimental Evaluation: The headline claim that 'a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts' supplies no quantitative metrics, baseline specifications, dataset details, or ablation evidence. Without these, it is impossible to judge whether the central performance result is supported or sensitive to post-hoc choices.
- [Method / Experiments] Method and Experiments: The manuscript does not report ablations that remove the multi-agent RL edge-selection step while retaining the three-dimensional factorization. If the structured latent space alone drives the long-horizon gains, the reframing of graph discovery as RL would not be load-bearing for the reported improvements.
- [Method] Method: The assumption that the three learned dimensions 'automatically' yield semantically meaningful encodings that support accurate causal-edge decisions by the RL agents is stated without accompanying analysis, visualizations, or quantitative validation of semantic alignment. This assumption underpins both the interpretability and performance claims.
minor comments (1)
- [Method] Notation for the three dimensions and the agent decision process could be introduced with explicit equations or pseudocode to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major point below and indicate the changes made in the revised manuscript.
- Referee: [Abstract / Experimental Results] Abstract and Experimental Evaluation: The headline claim that 'a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts' supplies no quantitative metrics, baseline specifications, dataset details, or ablation evidence. Without these, it is impossible to judge whether the central performance result is supported or sensitive to post-hoc choices.
  Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised version we have expanded the abstract to include concrete metrics (e.g., relative MSE reductions at 10-, 20-, and 50-step horizons and across 3–8 object counts), together with brief specifications of the simulation datasets and the dense-graph baselines against which CPM is compared. These numbers are taken directly from the experimental results already reported in Section 4.
  Revision: yes
- Referee: [Method / Experiments] Method and Experiments: The manuscript does not report ablations that remove the multi-agent RL edge-selection step while retaining the three-dimensional factorization. If the structured latent space alone drives the long-horizon gains, the reframing of graph discovery as RL would not be load-bearing for the reported improvements.
  Authors: This is a fair criticism. The original manuscript compared the full CPM only against dense baselines and did not isolate the RL component. To address the concern we have added a new ablation in the revised experimental section: a non-RL variant that retains the three-dimensional factorization but replaces the multi-agent RL edge-selection policy with a deterministic threshold on the causal-relevance dimension. The results show that removing the RL step degrades long-horizon prediction accuracy, indicating that the reinforcement-learning formulation contributes measurably beyond the factorization alone. These additional results are now reported in Section 4.3.
  Revision: yes
- Referee: [Method] Method: The assumption that the three learned dimensions 'automatically' yield semantically meaningful encodings that support accurate causal-edge decisions by the RL agents is stated without accompanying analysis, visualizations, or quantitative validation of semantic alignment. This assumption underpins both the interpretability and performance claims.
  Authors: We accept that the original submission offered only qualitative motivation for the semantic content of the three dimensions. In the revised manuscript we have inserted a dedicated analysis subsection (Section 3.4) that supplies (i) t-SNE visualizations of the factorized representations colored by ground-truth mutability, causal relevance, and control relevance, and (ii) quantitative alignment scores (Pearson correlations and mutual-information values) between each learned dimension and the corresponding simulator-provided attributes. These additions provide direct evidence that the dimensions are semantically aligned and that they inform the RL agents’ edge decisions.
  Revision: yes
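The Pearson alignment score mentioned in the third response can be sketched as follows. This is a generic implementation of the Pearson correlation coefficient, not the authors' analysis code. The function name `pearson` and the example values for a learned dimension and a simulator attribute are made up for illustration.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical values: one learned latent coordinate per object, and a
    # simulator-provided attribute for the same objects.
    learned_dim = [0.1, 0.4, 0.35, 0.9]
    sim_attribute = [0.0, 0.5, 0.4, 1.0]
    print(f"alignment (Pearson r): {pearson(learned_dim, sim_attribute):.3f}")
```

A high absolute correlation would suggest linear alignment only. The mutual-information scores the authors mention would be needed to capture nonlinear dependence.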
Circularity Check
No circularity: new RL reframing and factorization presented as independent construction
The paper introduces CPMs by reframing dynamic graph discovery as multi-agent RL and factorizing object/force vectors into three learned dimensions (mutability, causal relevance, control relevance). No equations, derivations, or self-citations are exhibited that reduce the claimed performance gains on long-horizon prediction to quantities defined by construction from the same fitted inputs or prior author results. The central claims rest on the empirical comparison to dense baselines rather than any self-definitional loop or renamed known result. The derivation chain is therefore self-contained as a novel modeling framework.
Axiom & Free-Parameter Ledger
invented entities (2)
- Causal Process Framework: no independent evidence
- Three learned dimensions (mutability, causal relevance, control relevance): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We introduce the Causal Process Framework ... casting dynamic interaction-graph construction ... as a multi-agent reinforcement learning problem ... factorizes object and force vectors along three learned dimensions (mutability, causal relevance, and control relevance)"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Our model ... outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.