
arxiv: 2606.21212 · v1 · pith:3TTE5IIXnew · submitted 2026-06-19 · 💻 cs.LG

DCD-PFN: A Decoupling-Aware Foundation Model for Causal Discovery

Pith reviewed 2026-06-26 14:23 UTC · model grok-4.3

classification 💻 cs.LG
keywords causal discovery · foundation model · Prior-Data Fitted Network · Markov boundary · zero-shot generalization · structural causal model · decoupling

The pith

DCD-PFN pre-trains on synthetic SCMs to learn sample-wise decoupling weights for Markov boundary identification and zero-shot global graph reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that a Prior-Data Fitted Network can handle causal discovery by shifting from direct global graph prediction to a local, decoupling-based approach. Pre-training across many synthetic structural causal models produces sample-wise decoupling weights from which the Markov boundary of each target variable can be identified. These local discoveries run in parallel and are then combined to recover the full graph. The practical payoff, if the claim holds, is reliable performance on nonlinear, noisy data without retraining for each new problem.

Core claim

Through pre-training on diverse synthetic Structural Causal Models, DCD-PFN learns sample-wise decoupling weights that enable Markov boundary identification and efficient reconstruction of global causal graphs while achieving robust zero-shot generalization.

What carries the argument

The decoupling-aware PFN that outputs sample-wise decoupling weights to identify Markov boundaries.
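
For readers less familiar with the target object: in a DAG, and under the usual faithfulness assumption, the Markov boundary of a variable consists of its parents, its children, and the other parents of its children (spouses). The sketch below computes that set from a known graph. The function name and the toy graph are illustrative assumptions, not taken from the paper, and the code says nothing about how DCD-PFN estimates the set from data.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Markov boundary of `target` in a DAG given as a child -> parents map."""
    pa = set(parents.get(target, set()))
    # Children are the nodes that list `target` among their parents.
    children = {v for v, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for c in children for p in parents[c]} - {target}
    return pa | children | spouses


# Toy DAG (illustrative only): A -> C <- B, C -> D
toy_dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(markov_boundary(toy_dag, "A"))  # B enters as a spouse via the common child C
```

Note that B belongs to the boundary of A even though the two are not adjacent, which is why a Markov boundary is not the same thing as a neighbourhood in the graph.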

If this is right

  • Parallel local discovery removes the computational bottleneck of global search methods.
  • Zero-shot operation removes the need to retrain or fine-tune on target data.
  • The method stays consistent with the theoretical guarantees of decoupling-based causal discovery.
  • Performance holds across highly nonlinear and noisy data-generating processes.
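
The parallelism in the first point can be sketched as one independent query per target variable, followed by a merge step. The sketch below is a minimal illustration under stated assumptions: `local_mb` is a hypothetical stand-in for whatever per-target output the model produces, and the symmetry check (keep a pair only if each variable appears in the other's boundary) is a common consistency rule from the Markov boundary literature, not necessarily the paper's merge procedure. The result is an undirected moral graph; how DCD-PFN orients edges is not described in the text available here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, FrozenSet, Iterable, Set


def reconstruct_moral_graph(
    variables: Iterable[str],
    local_mb: Callable[[str], Set[str]],
    max_workers: int = 4,
) -> Set[FrozenSet[str]]:
    """Run one local query per variable in parallel and keep symmetric pairs."""
    names = list(variables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each target is handled independently, so the queries can run concurrently.
        boundaries: Dict[str, Set[str]] = dict(zip(names, pool.map(local_mb, names)))
    edges: Set[FrozenSet[str]] = set()
    for x in names:
        for y in boundaries[x]:
            # Symmetry check: keep the pair only if each lies in the other's boundary.
            if y in boundaries and x in boundaries[y]:
                edges.add(frozenset((x, y)))
    return edges


# Hypothetical stand-in for the model's per-target output (a simple lookup table).
oracle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
edges = reconstruct_moral_graph(oracle, oracle.__getitem__)
print(sorted(tuple(sorted(e)) for e in edges))
```

Because the per-target calls share no state, the cost of the local stage scales with the number of variables divided by the number of workers, which is the sense in which a global search bottleneck is avoided.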

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the synthetic-to-real transfer holds, the same pre-training recipe could be applied to other local structure-learning tasks.
  • The approach suggests a route to amortize causal discovery across many scientific domains that currently lack large labeled graphs.
  • Future work could test whether the learned weights also support downstream tasks such as intervention prediction.

Load-bearing premise

Patterns learned from synthetic SCMs transfer to real-world data and the learned weights recover true Markov boundaries without further safeguards.

What would settle it

Run the trained model on real-world datasets that have independently verified ground-truth causal graphs and measure whether the recovered Markov boundaries match the known structure at rates exceeding standard baselines.
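
One way to make this test concrete is to score predicted Markov boundaries against the ground-truth ones with precision, recall, and F1. The sketch below micro-averages over all target variables; the function name, the choice of micro-averaging, and the toy sets are assumptions made for illustration, and the paper may report different metrics.

```python
from typing import Dict, Set, Tuple


def mb_scores(
    true_mb: Dict[str, Set[str]], pred_mb: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all target variables."""
    tp = fp = fn = 0
    for node, truth in true_mb.items():
        pred = pred_mb.get(node, set())
        tp += len(truth & pred)   # members correctly recovered
        fp += len(pred - truth)   # members predicted but not in the true boundary
        fn += len(truth - pred)   # true members that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (illustrative only): one missed member for A, one spurious member for D.
true_mb = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
pred_mb = {"A": {"C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C", "A"}}
precision, recall, f1 = mb_scores(true_mb, pred_mb)
print(precision, recall, f1)
```

Running the same scoring on each baseline's output gives the comparison the test calls for.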

Figures

Figures reproduced from arXiv: 2606.21212 by Fei Wu, Haoyuan Qian, Kun Kuang, Yi He, Yikang Chen, Yunze Tong, Zhengkang Guan, Zijing Hu.

Figure 1: Architecture of DCD-PFN. Input synthetic data is encoded into global representations via [...]

Original abstract

Causal discovery is critical for understanding complex data-generating mechanisms, yet traditional algorithms often struggle with highly non-linear and noisy systems, or suffer from severe computational bottlenecks. Recent tabular foundation models based on Prior-Data Fitted Networks (PFNs) have demonstrated remarkable zero-shot inference capabilities, but their potential for explicit structural causal discovery remains underexplored. To bridge this gap, we propose DCD-PFN, a decoupling-aware foundation model for causal discovery. Instead of directly amortizing global graph reconstruction, DCD-PFN focuses on local causal discovery through a decoupling-based paradigm. Through pre-training on diverse synthetic Structural Causal Models (SCMs), the model learns sample-wise decoupling weights that enable Markov boundary (MB) identification. Furthermore, by leveraging parallelized local discovery, DCD-PFN efficiently reconstructs global causal graphs while remaining grounded in the theoretical foundations of decoupling-based causal discovery. Experiments demonstrate that our foundation model achieves robust zero-shot generalization.
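
To give a sense of what pre-training on synthetic SCMs involves, the sketch below draws a random DAG and samples data from it with a nonlinear mechanism and additive Gaussian noise. Everything specific here is an assumption for illustration: the edge probability, the tanh mechanism, the weight range, and the function name are not taken from the paper, whose actual prior over SCMs is not described in the abstract.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def sample_scm_dataset(
    n_vars: int,
    n_samples: int,
    edge_prob: float = 0.3,
    noise_std: float = 0.1,
    seed: int = 0,
) -> Tuple[Dict[int, Set[int]], List[List[float]]]:
    """Draw a random DAG and data from a tanh additive-noise SCM (illustrative prior)."""
    rng = random.Random(seed)
    # Edges only go from lower to higher index, so the graph is acyclic by construction.
    parents = {j: {i for i in range(j) if rng.random() < edge_prob} for j in range(n_vars)}
    weights = {(i, j): rng.uniform(-2.0, 2.0) for j in parents for i in parents[j]}
    rows: List[List[float]] = []
    for _ in range(n_samples):
        x = [0.0] * n_vars
        for j in range(n_vars):  # index order is a valid topological order
            if parents[j]:
                signal = sum(weights[(i, j)] * x[i] for i in parents[j])
                x[j] = math.tanh(signal) + rng.gauss(0.0, noise_std)
            else:
                x[j] = rng.gauss(0.0, 1.0)  # root nodes are pure noise
        rows.append(x)
    return parents, rows


parents, rows = sample_scm_dataset(n_vars=5, n_samples=100)
print(parents)
```

Each call yields a (graph, dataset) pair; a pre-training loop in the PFN style would draw many such pairs and supervise the model with labels derived from the known graph.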

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes DCD-PFN, a decoupling-aware Prior-Data Fitted Network (PFN) foundation model for causal discovery. It pre-trains on diverse synthetic Structural Causal Models to learn sample-wise decoupling weights that enable Markov boundary identification, then leverages parallelized local discovery to reconstruct global causal graphs while remaining grounded in decoupling theory, claiming robust zero-shot generalization to non-linear and noisy systems.

Significance. If the zero-shot results hold, the work offers a promising direction for scalable causal discovery by combining PFN-style pre-training with established decoupling-based local discovery; the explicit grounding in decoupling theory and use of synthetic SCM diversity for pre-training are strengths that distinguish it from purely empirical tabular foundation models.

minor comments (2)
  1. [Abstract] The claim of 'robust zero-shot generalization' would be strengthened by including at least one concrete performance metric, dataset, or baseline comparison rather than leaving the empirical support implicit.
  2. [Method] The manuscript would benefit from a short explicit statement (e.g., in §3 or §4) of how the learned decoupling weights differ in form or training objective from a standard PFN, to make the 'decoupling-aware' contribution easier to isolate.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of DCD-PFN, recognition of its grounding in decoupling theory, and recommendation of minor revision. We appreciate the assessment that the combination of PFN pre-training with local discovery offers a promising direction.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript presents an empirical pre-training pipeline on synthetic SCMs to learn decoupling weights for local MB identification followed by parallel global reconstruction. No derivation chain, equations, or first-principles claims are exhibited that reduce any output to the inputs by construction. The work explicitly positions itself as grounded in prior decoupling theory rather than deriving that theory internally, and the zero-shot generalization results are reported as falsifiable empirical outcomes rather than tautological predictions. No self-citation load-bearing steps, fitted-input renamings, or ansatz smuggling appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.


