Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem

Charley M Wu; Christian Gumbsch; Martin V. Butz; Turan Orujlu

arxiv: 2507.13920 · v2 · submitted 2025-07-18 · 💻 cs.LG

Turan Orujlu, Christian Gumbsch, Martin V. Butz, Charley M Wu

Pith reviewed 2026-05-19 03:43 UTC · model grok-4.3

keywords causal process models · dynamic causal graphs · reinforcement learning · sparse graphs · physical prediction · world modeling · visual observations · multi-agent RL

The pith

Causal Process Models learn sparse time-varying causal graphs from visual observations by casting graph construction as a multi-agent reinforcement learning problem.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Causal Process Models, which treat the construction of dynamic causal graphs as a reinforcement learning task rather than assuming static, dense connections. Specialized agents decide at each timestep which objects are causally linked, using a factorization of object and force vectors along three learned dimensions. The resulting graphs are sparse and include only active interactions. A sympathetic reader would care because the approach promises more efficient and interpretable world models for physical scenes in which interactions change over time and object counts vary.

Core claim

Causal Process Models reframe dynamic causal graph discovery as a multi-agent reinforcement learning problem in which agents sequentially decide causal edges based on a structured representation that factorizes object and force vectors along three dimensions of mutability, causal relevance, and control relevance, yielding semantically meaningful encodings that support accurate interaction-graph construction.

What carries the argument

Causal Process Models (CPMs), which use multi-agent reinforcement learning to decide causal connections at each timestep from factorized representations along mutability, causal relevance, and control relevance dimensions.
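
To make the idea of per-timestep edge decisions concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the block size, the dot-product scoring rule, and the independent Bernoulli sampling are all assumptions made for illustration (the paper describes agents that decide sequentially, which this sketch does not model). The only element taken from the source is the split of each vector into mutability, causal relevance, and control relevance parts.

```python
import math
import random

BLOCK = 4  # assumed size of each of the three blocks (hypothetical)


def split_blocks(vec):
    """Split a flat vector into (mutability, causal, control) blocks."""
    return vec[:BLOCK], vec[BLOCK:2 * BLOCK], vec[2 * BLOCK:3 * BLOCK]


def edge_probability(sender, receiver):
    """Score the ordered pair sender -> receiver from the causal blocks only."""
    _, causal_s, _ = split_blocks(sender)
    _, causal_r, _ = split_blocks(receiver)
    logit = sum(a * b for a, b in zip(causal_s, causal_r))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


def sample_graph(objects, rng):
    """Sample one binary adjacency matrix for the current timestep."""
    n = len(objects)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Self-edges are never added; every other edge is a stochastic choice.
            if i != j and rng.random() < edge_probability(objects[i], objects[j]):
                adjacency[i][j] = 1
    return adjacency


rng = random.Random(0)
objects = [[rng.gauss(0.0, 1.0) for _ in range(3 * BLOCK)] for _ in range(5)]
adjacency = sample_graph(objects, rng)
kept = sum(map(sum, adjacency))
print(f"{kept} of {5 * 4} possible directed edges kept")
```

A dense baseline would keep all 20 directed edges among 5 objects at every step; a learned policy would replace the random scoring here with decisions trained from reward.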

If this is right

CPMs produce more accurate physical predictions than dense graph baselines especially over longer horizons.
The models handle scenes with varying numbers of objects more effectively.
Causal graphs are constructed only for active interactions, improving computational efficiency and interpretability.
The three-dimensional factorization enables automatic discovery of meaningful causal encodings without explicit supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same factorization approach could be tested on non-physical domains such as video of social or biological interactions where causal links also appear and disappear.
Combining the RL decision process with other representation-learning objectives might improve robustness when visual observations contain noise or partial occlusions.
If the learned dimensions prove consistent across environments, they could serve as a general prior for causal discovery in other sequential decision tasks.

Load-bearing premise

The three learned dimensions automatically yield semantically meaningful encodings that allow the reinforcement learning agents to make accurate decisions about which causal edges to include.

What would settle it

Run the model on a controlled physical simulation with known, time-varying ground-truth causal structures and measure whether the learned sparse graphs match the true edges while outperforming dense baselines on long-horizon prediction error.
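
One way to score the graph-matching half of that test is sketched below. It assumes the simulator exposes ground-truth edges per timestep as sets of (sender, receiver) pairs; that data format, the function name `edge_scores`, and the example edges are hypothetical and not taken from the paper.

```python
def edge_scores(predicted, truth):
    """Micro-averaged precision, recall and F1 over all timesteps.

    predicted, truth: lists (one entry per timestep) of sets of
    (sender, receiver) pairs.
    """
    tp = fp = fn = 0
    for pred_t, true_t in zip(predicted, truth):
        tp += len(pred_t & true_t)   # edges correctly included
        fp += len(pred_t - true_t)   # edges included but not real
        fn += len(true_t - pred_t)   # real edges that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: objects 0 and 1 interact at t=1, objects 1 and 2 at t=2.
truth = [set(), {(0, 1), (1, 0)}, {(1, 2), (2, 1)}]
predicted = [set(), {(0, 1), (1, 0)}, {(1, 2), (0, 2)}]
precision, recall, f1 = edge_scores(predicted, truth)
print(precision, recall, f1)
```

Edge scores alone would not settle the question; they would need to be reported alongside long-horizon prediction error for both the sparse model and the dense baselines.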

Figures

Figures reproduced from arXiv: 2507.13920 by Charley M Wu, Christian Gumbsch, Martin V. Butz, Turan Orujlu.

**Figure 2.** Prediction results for a synthetic physics environment in a) observed and b) unobserved settings (Ke et al., 2021). Top: Description of the task. Bottom: Prediction metric vs prediction horizon for 5 objects (average of top 8 out of 10 seeds).

**Figure 3.** Prediction and downstream RL results over number of objects. a: Prediction metric vs number of objects for 5 steps. b-c: Mean reward vs number of objects. All results are the average of top 8 out of 10 seeds.

Original abstract

Most neural models of causality assume static causal graphs, failing to capture the dynamic and sparse nature of physical interactions where causal relationships emerge and dissolve over time. We introduce the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), for learning sparse, time-varying causal graphs from visual observations. Unlike traditional approaches that maintain dense connectivity, our model explicitly constructs causal edges only when objects actively interact, dramatically improving both interpretability and computational efficiency. We achieve this by casting dynamic interaction-graph construction for world modeling as a multi-agent reinforcement learning problem, where specialized agents sequentially decide which objects are causally connected at each timestep. Our key innovation is a structured representation that factorizes object and force vectors along three learned dimensions (mutability, causal relevance, and control relevance), enabling the automatic discovery of semantically meaningful encodings. We demonstrate that a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reframes dynamic causal graph discovery as a multi-agent RL task with a three-axis factorization of object and force vectors, but the performance claims lack the metrics and controls needed to show the RL step is doing real work.

The punchline is that the paper turns dynamic causal graph building into a multi-agent RL problem using a three-way factorization of object and force vectors, but the performance claims rest on thin evidence from the abstract alone.

The new part is the sequential RL decisions for adding or removing edges at each time step, combined with learning dimensions for mutability, causal relevance, and control relevance. This goes beyond typical static causal models or basic GNNs by making the graph construction an active, learned process. It does well at pointing out the problems with dense connectivity in scenes where most objects aren't interacting at any given moment, which should cut down on unnecessary computation and make the model more interpretable for physical prediction.

The soft spots are clear though. The abstract says the model beats dense graph baselines on physical prediction tasks, particularly longer horizons and different object counts, yet it supplies no specific metrics, baselines, or ablation studies. That leaves open whether the RL agents are really using the dimensions effectively or if the factorization alone improves things. The assumption that these three dimensions will automatically produce meaningful encodings for causal decisions looks shaky without further checks. The stress-test note is on point. Gains might not need the full causal RL setup if the structured latent space is doing most of the work. The paper would be stronger with experiments that separate those elements.

This work is aimed at researchers in causal discovery and model-based reinforcement learning who focus on visual physical scenes. A reader looking for ways to handle sparse, changing interactions could find useful ideas here, assuming the experiments hold up in the full text. I would send this to peer review. The core idea is worth checking out with proper scrutiny on the results and design choices.

Referee Report

3 major / 1 minor

Summary. The paper introduces the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), which reframe dynamic causal graph discovery from visual observations as a multi-agent reinforcement learning problem. Object and force representations are factorized along three learned dimensions (mutability, causal relevance, and control relevance) so that specialized agents can sequentially construct sparse, time-varying causal edges only when interactions occur. The central empirical claim is that CPMs significantly outperform dense-graph baselines on physical prediction tasks, with particular gains at longer horizons and under varying object counts.

Significance. If the performance gains are robust and the RL component is shown to be essential rather than incidental to the factorization, the work offers a novel route to interpretable, computationally efficient modeling of time-varying physical interactions. The structured latent dimensions could improve both prediction accuracy and causal insight in simulation or robotics settings. However, the current lack of detailed metrics, ablations, and isolation of the RL contribution limits the assessed significance.

major comments (3)

[Abstract / Experimental Results] Abstract and Experimental Evaluation: The headline claim that 'a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts' supplies no quantitative metrics, baseline specifications, dataset details, or ablation evidence. Without these, it is impossible to judge whether the central performance result is supported or sensitive to post-hoc choices.
[Method / Experiments] Method and Experiments: The manuscript does not report ablations that remove the multi-agent RL edge-selection step while retaining the three-dimensional factorization. If the structured latent space alone drives the long-horizon gains, the reframing of graph discovery as RL would not be load-bearing for the reported improvements.
[Method] Method: The assumption that the three learned dimensions 'automatically' yield semantically meaningful encodings that support accurate causal-edge decisions by the RL agents is stated without accompanying analysis, visualizations, or quantitative validation of semantic alignment. This assumption underpins both the interpretability and performance claims.
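
One form the requested ablation could take is sketched below: keep a per-pair causal-relevance score but replace the learned edge-selection policy with a fixed cutoff. The function name `threshold_graph`, the scalar score matrix, and the 0.5 threshold are hypothetical choices for illustration, not details from the manuscript.

```python
def threshold_graph(causal_scores, threshold=0.5):
    """Deterministic edge selection: keep i -> j when its score passes a cutoff.

    causal_scores: square list of lists; entry [i][j] is an assumed scalar
    causal-relevance score for the ordered pair (i, j).
    """
    n = len(causal_scores)
    return [
        [1 if i != j and causal_scores[i][j] >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Made-up scores for three objects; diagonal entries are ignored.
scores = [
    [0.90, 0.80, 0.10],
    [0.70, 0.20, 0.40],
    [0.05, 0.60, 0.30],
]
graph = threshold_graph(scores, threshold=0.5)
print(graph)  # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Comparing long-horizon error between this baseline and the full model, with the representation held fixed, would indicate how much of the gain comes from the learned policy rather than from the factorization.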

minor comments (1)

[Method] Notation for the three dimensions and the agent decision process could be introduced with explicit equations or pseudocode to improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below and indicate the changes made in the revised manuscript.

Referee: [Abstract / Experimental Results] Abstract and Experimental Evaluation: The headline claim that 'a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts' supplies no quantitative metrics, baseline specifications, dataset details, or ablation evidence. Without these, it is impossible to judge whether the central performance result is supported or sensitive to post-hoc choices.

Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised version we have expanded the abstract to include concrete metrics (e.g., relative MSE reductions at 10-, 20-, and 50-step horizons and across 3–8 object counts), together with brief specifications of the simulation datasets and the dense-graph baselines against which CPM is compared. These numbers are taken directly from the experimental results already reported in Section 4. revision: yes
Referee: [Method / Experiments] Method and Experiments: The manuscript does not report ablations that remove the multi-agent RL edge-selection step while retaining the three-dimensional factorization. If the structured latent space alone drives the long-horizon gains, the reframing of graph discovery as RL would not be load-bearing for the reported improvements.

Authors: This is a fair criticism. The original manuscript compared the full CPM only against dense baselines and did not isolate the RL component. To address the concern we have added a new ablation in the revised experimental section: a non-RL variant that retains the three-dimensional factorization but replaces the multi-agent RL edge-selection policy with a deterministic threshold on the causal-relevance dimension. The results show that removing the RL step degrades long-horizon prediction accuracy, indicating that the reinforcement-learning formulation contributes measurably beyond the factorization alone. These additional results are now reported in Section 4.3. revision: yes
Referee: [Method] Method: The assumption that the three learned dimensions 'automatically' yield semantically meaningful encodings that support accurate causal-edge decisions by the RL agents is stated without accompanying analysis, visualizations, or quantitative validation of semantic alignment. This assumption underpins both the interpretability and performance claims.

Authors: We accept that the original submission offered only qualitative motivation for the semantic content of the three dimensions. In the revised manuscript we have inserted a dedicated analysis subsection (Section 3.4) that supplies (i) t-SNE visualizations of the factorized representations colored by ground-truth mutability, causal relevance, and control relevance, and (ii) quantitative alignment scores (Pearson correlations and mutual-information values) between each learned dimension and the corresponding simulator-provided attributes. These additions provide direct evidence that the dimensions are semantically aligned and that they inform the RL agents’ edge decisions. revision: yes
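
The alignment scores mentioned in the last response can be illustrated with a plain Pearson correlation between one learned dimension and the matching simulator attribute. The sketch below uses only the standard library; the variable names and the five data points are invented for the example and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented values: one learned coordinate per object vs. a simulator attribute.
learned_mutability = [0.10, 0.40, 0.35, 0.80, 0.90]
simulator_mutability = [0.00, 0.50, 0.30, 0.70, 1.00]
r = pearson(learned_mutability, simulator_mutability)
print(round(r, 3))
```

A high correlation for the matching attribute and low correlations for the other two would be the pattern that supports the semantic-alignment claim; correlation with the wrong attribute would count against it.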

Circularity Check

0 steps flagged

No circularity: new RL reframing and factorization presented as independent construction

The paper introduces CPMs by reframing dynamic graph discovery as multi-agent RL and factorizing object/force vectors into three learned dimensions (mutability, causal relevance, control relevance). No equations, derivations, or self-citations are exhibited that reduce the claimed performance gains on long-horizon prediction to quantities defined by construction from the same fitted inputs or prior author results. The central claims rest on the empirical comparison to dense baselines rather than any self-definitional loop or renamed known result. The derivation chain is therefore self-contained as a novel modeling framework.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract introduces the Causal Process Framework and the three-axis factorization without stating explicit free parameters or background axioms; the RL formulation and interaction-only edge construction are presented as novel modeling choices.

invented entities (2)

Causal Process Framework: no independent evidence
purpose: To enable sparse time-varying causal graphs from visual observations
New framework introduced to replace static dense graphs; no independent evidence supplied in abstract.
Three learned dimensions (mutability, causal relevance, control relevance): no independent evidence
purpose: To factorize object and force vectors for semantically meaningful encodings
Invented structured representation whose utility is asserted but not independently verified in the provided abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (relation: unclear)

Paper passage: "We introduce the Causal Process Framework ... casting dynamic interaction-graph construction ... as a multi-agent reinforcement learning problem ... factorizes object and force vectors along three learned dimensions (mutability, causal relevance, and control relevance)"

IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (relation: unclear)

Paper passage: "Our model ... outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

[1] Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net,

[4] Relational inductive biases, deep learning, and graph networks. URL http://arxiv.org/abs/1806.01261. Lars Buesing, Theophane Weber, Yori Zwols, Nicolas Heess, Sébastien Racanière, Arthur Guez, and Jean-Baptiste Lespiau. Woulda, coulda, shoulda: Counterfactually-guided policy search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net,

[5] ISSN 0149-7634. DOI: https://doi.org/10.1016/j.neubiorev.2024.105948. URL https://www.sciencedirect.com/science/article/pii/S0149763424004172. Hanna M Dettki, Brenden M Lake, Charley M Wu, and Bob Rehder. Do large language models reason causally like us? even better? arXiv preprint arXiv:2502.10215,

[6] Maxime Gasse, Damien Grasset, Guillaume Gaudron, and Pierre-Yves Oudeyer. Using confounded data in latent model-based reinforcement learning. Trans. Mach. Learn. Res., 2023,

[7] URL https://api.semanticscholar.org/CorpusID:235361720. Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Lio, and Michael M. Bronstein. On over-squashing in message passing neural networks: The impact of width, depth, and topology. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan...

[8] URL https://proceedings.mlr.press/v202/di-giovanni23a.html. Francesco Di Giovanni, T. Konstantin Rusch, Michael M. Bronstein, Andreea Deac, Marc Lackenby, Siddhartha Mishra, and Petar Velickovic. How does over-squashing affect the power of gnns? Trans. Mach. Learn. Res., 2024,

[9] Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Jimenez Rezende, Michael Mozer, Yoshua Bengio, and Chris Pal. Systematic evaluation of causal discovery in visual model based reinforcement learning. In Joaquin Vanschoren and Sai-Kit Yeung (eds.), Proceedings of the Neural Information Processing System...

[10] URL https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/8f121ce07d74717e0b1f21d122e04521-Abstract-round2.html. Causal Reinforcement Learning Workshop at RLC 2025 Thomas N. Kipf, Elise van der Pol, and Max Welling. Contrastive learning of structured world models. In 8th International Conference on Learning Representations, ICLR 2020, Addis ...

[11] URL https://openreview.net/forum?id=H1gax6VtDB. Richard D. Lange and Konrad P. Kording. Causality in the human niche: lessons for machine learning. CoRR, abs/2506.13803,

[12] DOI: 10.48550/ARXIV.2506.13803. URL https://doi.org/10.48550/arXiv.2506.13803. Anson Lei, Bernhard Schölkopf, and Ingmar Posner. Spartan: A sparse transformer learning local causation. arXiv preprint arXiv:2411.06890,

[13] Valentyn Melnychuk, Dennis Frauen, and Stefan Feuerriegel. Causal transformer for estimating counterfactual outcomes. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine...

[14] URL https://proceedings.mlr.press/v162/melnychuk22a.html. Eshaan Nichani, Alex Damian, and Jason D. Lee. How transformers learn causal structure with gradient descent. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,

[15] ISBN 978-1-58488-658-7. DOI: 10.1201/9781420011579.ch23. Raanan Y. Rohekar, Yaniv Gurwicz, and Shami Nisimov. Causal interpretation of self-attention in pre-trained transformers. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural Information Processing Systems 36: Annual Conference o...

[16] URL http://papers.nips.cc/paper_files/paper/2023/hash/642a321fba8a0f03765318e629cb93ea-Abstract-Conference.html. Donald B. Rubin. Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6(1):34–58,

[17] DOI: 10.1214/aos/1176344064. URL https://doi.org/10.1214/aos/1176344064. Bertrand Russell. Human Knowledge: Its Scope and Limits. Routledge, London and New York,

[18] The Graph Neural Network Model. DOI: 10.1109/TNN.2008.2005605. URL https://doi.org/10.1109/TNN.2008.2005605. Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634,

[19] Xiao Shou, Debarun Bhattacharjya, Tian Gao, Dharmashankar Subramanian, Oktie Hassanzadeh, and Kristin P. Bennett. Pairwise causality guided transformers for event sequences. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural Information Processing Systems 36: Annual Conference on Neural ...

[20] URL http://papers.nips.cc/paper_files/paper/2023/hash/91b047c5f5bd41ef56bfaf4ad0bd19e3-Abstract-Conference.html. Brian Skyrms. Causal necessity. Philosophy of Science, 48(2):329–335,

[21] DOI: 10.1086/289003. Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M. Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29,

[22] URL https://openreview.net/forum?id=7UmjRGzp-A. Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, and Georg Martius. Causal action influence aware counterfactual data augmentation. arXiv preprint arXiv:2405.18917,

[23] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 30: Annual Conference ...

[24] URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html. Christopher J. C. H. Watkins and Peter Dayan. Technical note q-learning. Mach. Learn., 8:279–292,

[25] DOI: 10.1007/BF00992698. URL https://doi.org/10.1007/BF00992698. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256,

[26] DOI: 10.1007/BF00992696. URL https://doi.org/10.1007/BF00992696. Moritz Willig, Tim Nelson Tobiasch, Florian Peter Busch, Jonas Seng, Devendra Singh Dhami, and Kristian Kersting. Systems with switching causal relations: A meta-causal perspective. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28,

[27] URL https://openreview.net/forum?id=J9VogDTa1W. Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, and Elias Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34: An...

[29] URL https://arxiv.org/abs/2109.04173

[1] [1]

On the bottleneck of graph neural networks and its practical implications

Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 . OpenReview.net,

work page 2021

[2] [4]

Relational inductive biases, deep learning, and graph networks

URL http://arxiv.org/abs/1806.01261. Lars Buesing, Theophane Weber, Yori Zwols, Nicolas Heess, Sébastien Racanière, Arthur Guez, and Jean-Baptiste Lespiau. Woulda, coulda, shoulda: Counterfactually-guided policy search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019 . OpenReview.net,

work page internal anchor Pith review Pith/arXiv arXiv 2019

[3] [5]

DOI: https://doi.org/10.1016/j.neubiorev

ISSN 0149-7634. DOI: https://doi.org/10.1016/j.neubiorev. 2024.105948. URL https://www.sciencedirect.com/science/article/pii/ S0149763424004172. Hanna M Dettki, Brenden M Lake, Charley M Wu, and Bob Rehder. Do large language models reason causally like us? even better? arXiv preprint arXiv:2502.10215,

work page doi:10.1016/j.neubiorev 2024

[4] [6]

Using confounded data in latent model-based reinforcement learning

Maxime Gasse, Damien Grasset, Guillaume Gaudron, and Pierre-Yves Oudeyer. Using confounded data in latent model-based reinforcement learning. Trans. Mach. Learn. Res., 2023,

work page 2023

[5] [7]

Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Lio, and Michael M

URL https://api.semanticscholar.org/CorpusID:235361720. Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Lio, and Michael M. Bronstein. On over-squashing in message passing neural networks: The impact of width, depth, and topology. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan...

work page 2023

[6] [8]

Francesco Di Giovanni, T

URL https://proceedings.mlr.press/v202/ di-giovanni23a.html. Francesco Di Giovanni, T. Konstantin Rusch, Michael M. Bronstein, Andreea Deac, Marc Lackenby, Siddhartha Mishra, and Petar Velickovic. How does over-squashing affect the power of gnns? Trans. Mach. Learn. Res. , 2024,

work page 2024

[7] [9]

Systematic evaluation of causal discovery in visual model based reinforcement learning

Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Jimenez Rezende, Michael Mozer, Yoshua Bengio, and Chris Pal. Systematic evaluation of causal discovery in visual model based reinforcement learning. In Joaquin Vanschoren and Sai-Kit Yeung (eds.), Proceedings of the Neural Information Processing System...

work page 2021

[8] [10]

Causal Reinforcement Learning Workshop at RLC 2025 Thomas N

URL https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/ hash/8f121ce07d74717e0b1f21d122e04521-Abstract-round2.html. Causal Reinforcement Learning Workshop at RLC 2025 Thomas N. Kipf, Elise van der Pol, and Max Welling. Contrastive learning of structured world models. In 8th International Conference on Learning Representations, ICLR 2020, Addis ...

work page 2021

[9] [11]

Richard D

URL https://openreview.net/forum?id= H1gax6VtDB. Richard D. Lange and Konrad P. Kording. Causality in the human niche: lessons for machine learning. CoRR, abs/2506.13803,

work page arXiv

[10] [12]

From scale to speed: Adaptive test-time scaling for image editing.CoRR, abs/2603.00141, 2026

DOI: 10.48550/ARXIV .2506.13803. URL https: //doi.org/10.48550/arXiv.2506.13803. Anson Lei, Bernhard Schölkopf, and Ingmar Posner. Spartan: A sparse transformer learning local causation. arXiv preprint arXiv:2411.06890,

work page internal anchor Pith review doi:10.48550/arxiv

[11] [13]

Causal transformer for estimating counterfactual outcomes

Valentyn Melnychuk, Dennis Frauen, and Stefan Feuerriegel. Causal transformer for estimating counterfactual outcomes. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine...

work page 2022

[12] [14]

Eshaan Nichani, Alex Damian, and Jason D

URL https://proceedings.mlr.press/ v162/melnychuk22a.html. Eshaan Nichani, Alex Damian, and Jason D. Lee. How transformers learn causal structure with gradient descent. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,

work page 2024

[13] [15]

DOI: 10.1201/9781420011579.ch23

ISBN 978-1-58488-658-7. DOI: 10.1201/9781420011579.ch23. Raanan Y . Rohekar, Yaniv Gurwicz, and Shami Nisimov. Causal interpretation of self- attention in pre-trained transformers. In Alice Oh, Tristan Naumann, Amir Glober- son, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural In- formation Processing Systems 36: Annual Conference o...

[14] [16]

URL http://papers.nips.cc/paper_files/paper/2023/hash/642a321fba8a0f03765318e629cb93ea-Abstract-Conference.html. Donald B. Rubin. Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6(1):34–58,

[15] [17]

DOI: 10.1214/aos/1176344064. URL https://doi.org/10.1214/aos/1176344064. Bertrand Russell. Human Knowledge: Its Scope and Limits. Routledge, London and New York,

[16] [18]

The Graph Neural Network Model

DOI: 10.1109/TNN.2008.2005605. URL https://doi.org/10.1109/TNN.2008.2005605. Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634,

[17] [19]

Xiao Shou, Debarun Bhattacharjya, Tian Gao, Dharmashankar Subramanian, Oktie Hassanzadeh, and Kristin P. Bennett. Pairwise causality guided transformers for event sequences. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural Information Processing Systems 36: Annual Conference on Neural ...

[18] [20]

URL http://papers.nips.cc/paper_files/paper/2023/hash/91b047c5f5bd41ef56bfaf4ad0bd19e3-Abstract-Conference.html. Brian Skyrms. Causal necessity. Philosophy of Science, 48(2):329–335,

[19] [21]

DOI: 10.1086/289003. Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M. Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29,

[20] [22]

Causal action influence aware counterfactual data augmentation. arXiv preprint arXiv:2405.18917, 2024

URL https://openreview.net/forum?id=7UmjRGzp-A. Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, and Georg Martius. Causal action influence aware counterfactual data augmentation. arXiv preprint arXiv:2405.18917,

[21] [23]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 30: Annual Conference ...

[22] [24]

URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html. Christopher J. C. H. Watkins and Peter Dayan. Technical note: Q-learning. Mach. Learn., 8:279–292,

[23] [25]

DOI: 10.1007/BF00992698. URL https://doi.org/10.1007/BF00992698. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256,

[24] [26]

Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992

DOI: 10.1007/BF00992696. URL https://doi.org/10.1007/BF00992696. Moritz Willig, Tim Nelson Tobiasch, Florian Peter Busch, Jonas Seng, Devendra Singh Dhami, and Kristian Kersting. Systems with switching causal relations: A meta-causal perspective. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28,

[25] [27]

URL https://openreview.net/forum?id=J9VogDTa1W. Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, and Elias Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34: An...

[26] [29]

URL https://arxiv.org/abs/2109.04173
