Interventional Time Series Priors for Causal Foundation Models

Dennis Thumm; Ying Chen

arxiv: 2603.11090 · v2 · submitted 2026-03-11 · 💻 cs.LG · stat.ME

Interventional Time Series Priors for Causal Foundation Models

Dennis Thumm , Ying Chen This is my paper

Pith reviewed 2026-05-15 12:43 UTC · model grok-4.3

classification 💻 cs.LG stat.ME

keywords causal inferencetime seriesfoundation modelssynthetic datastructural causal modelsprior-data fitted networksinterventional datain-context learning

0 comments

The pith

CausalTimePrior generates paired observational and interventional time series to train prior-data fitted networks for in-context causal effect estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes CausalTimePrior as a framework to create synthetic temporal structural causal models that include both observational time series and corresponding interventional data under various graph structures and dynamics. This fills a gap in existing benchmarks, which provide only observational data and ground-truth graphs but no interventional targets needed to train causal foundation models. PFNs trained on this prior can then perform in-context causal effect estimation on new, held-out synthetic time series models. A sympathetic reader would care because it opens a route to foundation models that handle causal queries in time series domains such as economics, biology, and climate without requiring task-specific retraining.

Core claim

CausalTimePrior supplies a configurable generator for synthetic TSCMs that produce paired observational and interventional time series, supporting nonlinear autoregressive mechanisms, regime switches, and multiple intervention types; prior-data fitted networks trained on data from this generator perform in-context causal effect estimation on held-out TSCMs.

What carries the argument

CausalTimePrior, the framework that samples temporal structural causal models and outputs matched observational and interventional time series for training.

If this is right

PFNs can answer causal queries on time series by conditioning only on observational context plus a query intervention.
The same trained network handles multiple intervention types without retraining.
Configurable graph structures and dynamics in the prior allow targeted coverage of different causal regimes.
The setup provides a scalable source of labeled training data for causal time-series tasks that previously lacked interventional labels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the synthetic models prove realistic, the same prior could be used to pre-train larger transformers that accept mixed observational and interventional prompts.
The approach suggests a general recipe: design a prior over causal mechanisms first, then fit a network to its generated data, rather than collecting scarce real interventional records.
One could test whether performance improves when the prior is tuned to match marginal statistics of a target domain before training.

Load-bearing premise

The synthetic time series generated by CausalTimePrior capture enough of the statistical and causal structure of real-world time series that models trained on them generalize to new data.

What would settle it

Train a PFN on CausalTimePrior data and test it on a collection of real-world time series with known interventions; if the estimated causal effects show large systematic error compared with the known effects, the generalization claim fails.

Figures

Figures reproduced from arXiv: 2603.11090 by Dennis Thumm, Ying Chen.

**Figure 2.** Figure 2: Four time-varying intervention profiles. Blue: observational trajectory. Red: interven [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: Paired observational and interventional time series for the intervention target variable. The [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

**Figure 4.** Figure 4: All variables in a sampled 6-variable TSCM with a hard intervention on Variable 4. The [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Distributions of prior properties across 100K sampled TSCMs from CausalTimePrior. The [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

read the original abstract

Prior-data fitted networks (PFNs) have emerged as powerful foundation models for tabular causal inference, yet their extension to time series remains limited by the absence of synthetic data generators that provide interventional targets. Existing time series benchmarks generate observational data with ground-truth causal graphs but lack the interventional data required for training causal foundation models. To address this, we propose \textbf{CausalTimePrior}, a principled framework for generating synthetic temporal structural causal models (TSCMs) with paired observational and interventional time series. Our prior supports configurable causal graph structures, nonlinear autoregressive mechanisms, regime-switching dynamics, and multiple intervention types (hard, soft, time-varying). We demonstrate that PFNs trained on CausalTimePrior can perform in-context causal effect estimation on held-out TSCMs, establishing a pathway toward foundation models for time series causal inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper supplies a new configurable generator for synthetic interventional time series SCMs that lets PFNs do in-context causal effect estimation inside that family, but provides no evidence of transfer beyond it.

read the letter

The paper introduces CausalTimePrior, a framework for generating synthetic temporal structural causal models that include paired observational and interventional time series with configurable nonlinear dynamics and interventions. This is the useful new piece, as it fills the gap for training data in causal time series modeling where real interventional data is rare. They extend prior-data fitted networks to this setting by training on data from the generator and testing in-context causal effect estimation on held-out examples from the same family. The setup looks independent enough that the results aren't just by construction, and the generator's flexibility with regime switching and intervention types is a clear step beyond observational-only benchmarks. The evaluation stays entirely within synthetic data generated from the same prior, with no tests on real time series or substantially different mechanisms. That limits how far the foundation model claim can go right now, since we don't know if the in-context learning transfers outside the training distribution. The abstract is also short on specific metrics or baselines, so the full paper needs to show those clearly to make the performance claims stick. This is worth attention for anyone building or using synthetic data for causal inference on time series. A reader focused on practical tools for causal foundation models would find the generator itself valuable to experiment with. I would send it to peer review. The generator is a concrete, usable advance that merits referee feedback on how to strengthen the generalization evidence.

Referee Report

2 major / 2 minor

Summary. The paper introduces CausalTimePrior, a framework for generating synthetic temporal structural causal models (TSCMs) with paired observational and interventional time series data supporting configurable graphs, nonlinear autoregressive mechanisms, regime-switching, and multiple intervention types. It claims that PFNs trained on data from this prior can perform in-context causal effect estimation on held-out TSCMs from the same family, establishing a pathway toward foundation models for time series causal inference.

Significance. If the central demonstration holds, the work supplies a scalable source of interventional targets that existing time-series benchmarks lack, enabling in-context learning for causal queries in dynamic systems. This could accelerate development of foundation models in causal time-series inference. The contribution is primarily methodological; its significance for downstream applications remains conditional on evidence of transfer beyond the synthetic family.

major comments (2)

[§4] §4 (Experiments): The evaluation is restricted to held-out TSCMs sampled from the identical CausalTimePrior family used for training. This only confirms that the PFN can interpolate within the training distribution; it does not test whether the learned in-context mechanism transfers to real-world time series whose marginals, noise spectra, or causal mechanisms lie outside the configurable nonlinear autoregressive regime-switching class.
[Abstract and §4] Abstract and §4: The manuscript states that trained PFNs 'perform in-context causal effect estimation' on held-out data yet reports no quantitative metrics, baselines, ablation studies, or error analysis. Without these, the strength of the central demonstration cannot be assessed.

minor comments (2)

[§3] Define the precise parameterization of the regime-switching dynamics and the intervention operators (hard/soft/time-varying) with explicit equations in §3 to allow exact reproduction.
[§3.2] Clarify whether the held-out TSCMs share the same hyper-parameter ranges as the training set or are drawn from a disjoint range; this affects the interpretation of 'held-out'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

read point-by-point responses

Referee: [§4] §4 (Experiments): The evaluation is restricted to held-out TSCMs sampled from the identical CausalTimePrior family used for training. This only confirms that the PFN can interpolate within the training distribution; it does not test whether the learned in-context mechanism transfers to real-world time series whose marginals, noise spectra, or causal mechanisms lie outside the configurable nonlinear autoregressive regime-switching class.

Authors: We agree that the experiments demonstrate performance only on held-out samples from the same prior family, confirming in-distribution interpolation rather than out-of-distribution transfer to real-world time series. This was a deliberate first step to validate the prior and PFN approach within a controlled setting where interventional targets are available. We will revise Section 4 and add a new limitations subsection explicitly stating this scope, discussing the challenges of real-world transfer, and outlining planned extensions such as domain adaptation or evaluation on existing observational time-series benchmarks with partial interventional data. revision: partial
Referee: [Abstract and §4] Abstract and §4: The manuscript states that trained PFNs 'perform in-context causal effect estimation' on held-out data yet reports no quantitative metrics, baselines, ablation studies, or error analysis. Without these, the strength of the central demonstration cannot be assessed.

Authors: We appreciate this observation. While Section 4 presents results showing that the PFN achieves lower estimation error than naive baselines on held-out TSCMs, we acknowledge that the reporting lacks sufficient detail. In the revised manuscript we will expand Section 4 to include: (i) explicit quantitative metrics such as mean absolute error and coverage rates for causal effect estimates across multiple regimes; (ii) comparisons against standard time-series causal inference baselines (e.g., Granger causality, VAR-based methods, and synthetic control); (iii) ablation studies removing individual prior components (regime-switching, nonlinear mechanisms); and (iv) error analysis stratified by graph density, intervention type, and sequence length. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper defines CausalTimePrior as an independent generative framework for synthetic TSCMs (configurable graphs, nonlinear autoregressive mechanisms, regime-switching, and intervention types). It then trains PFNs on samples drawn from this prior and reports in-context causal effect estimation performance on held-out samples from the identical family. This constitutes a standard within-distribution train/test split rather than any reduction of the result to fitted parameters by construction, self-definition of the target metric, or load-bearing self-citation. No equations or claims in the provided text equate the reported PFN performance to the generator inputs themselves; the empirical demonstration remains falsifiable against the held-out synthetic data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that synthetic data with the listed properties will transfer to real time series; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption Synthetic temporal structural causal models with configurable graphs and interventions can serve as effective training priors for causal foundation models
Invoked to justify training PFNs on generated data for generalization to held-out TSCMs

pith-pipeline@v0.9.0 · 5430 in / 1164 out tokens · 34728 ms · 2026-05-15T12:43:06.519585+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]

Curriculum learning, in: Proceedings of the 26th Annual International Conference on Machine Learning, Associa- tion for Computing Machinery, New York, NY, USA

Association for Computing Machinery. ISBN 9781605585161. doi: 10.1145/1553374.1553380. URLhttps://doi.org/10.1145/ 1553374.1553380. Ioana Bica, Ahmed M Alaa, James Jordon, and Mihaela van der Schaar. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. InInternational Conference on Learning Representations,

work page doi:10.1145/1553374.1553380
[2]

Philip Boeken and Joris M. Mooij. Dynamic structural causal models. InUAI 2024 Workshop on Causal Inference for Time Series (CI4TS),

work page 2024
[3]

CausalTime: Realistically gen- erated time-series for benchmarking of causal discovery

5 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) Yuxiao Cheng, Ziqian Yang, Xu Chen, Jiecheng Li, and Junchi Yan. CausalTime: Realistically gen- erated time-series for benchmarking of causal discovery. InInternational Conference on Learning Representations,

work page 2026
[4]

Foundation models for causal inference via prior-data fitted networks.arXiv preprint arXiv:2506.10914,

Yuchen Ma, Dennis Frauen, Emil Javurek, and Stefan Feuerriegel. Foundation models for causal inference via prior-data fitted networks.arXiv preprint arXiv:2506.10914,

work page arXiv
[5]

TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting

Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, and Frank Hutter. TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting. InNeurIPS 2025 Workshop on AI for Tabular Data,

work page 2025
[6]

Causal discovery with continuous additive noise models.Journal of Machine Learning Research,

6 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) Jonas Peters, Joris M Mooij, Dominik Janzing, and Bernhard Sch ¨olkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research,

work page 2026
[7]

Towards causal market simulators

Dennis Thumm and Luis Ontaneda Mijares. Towards causal market simulators. InICAIF 2025 Workshop on Rethinking Financial Time-Series, Singapore,

work page 2025
[8]

Causal Time Series Generation via Diffusion Models

URLhttps:// icaif-25-rtfs.github.io/. Yutong Xia, Chang Xu, Yuxuan Liang, Qingsong Wen, Roger Zimmermann, and Jiang Bian. Causal time series generation via diffusion models.arXiv preprint arXiv:2509.20846,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

48550/arXiv.2508.02879

URLhttps://doi.org/10. 48550/arXiv.2508.02879. 7 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) A CAUSALTIMEPRIORALGORITHM Algorithm 1 formalizes the CausalTimePrior sampling procedure for generating paired observational and interventional time series from TSCMs. Algorithm 1CausalTimePrior Sampling 1:Input:Prior hyperparametersΠ = (Π...

work page arXiv 2026
[11]

motivates extending CausalTimePrior to generate in- terventional data from SDE-based causal models. Consider a causal Ornstein-Uhlenbeck process dXt =θ(µ−X t)dt+σ w dWt; applying Euler-Maruyama with step∆tyields: xt+1 = (1−θ∆t)| {z } c2 xt +θµ∆t|{z} c1 +σ w √ ∆t| {z } c3 Zt, Z t ∼ N(0,1)(4) which is precisely the AR(1) form our mechanism prior generates. ...

work page 2025
[12]

what would anewdraw from the system look like under this intervention?

or Neural SDEs (Kidger et al., 2021)— and discretize at variable time steps, enabling the prior to generate irregularly-sampled interventional time series. B PRIORASSUMPTIONS ANDLIMITATIONS This section details the modeling assumptions underlying CausalTimePrior and their implications for identifiability and generalization. 8 ICLR 2026 Workshop on Time Se...

work page 2021
[13]

receive Gaussian noise with std∼ShiftedExp(rate= 1.0,shift= 0.1), providing larger driving noise. Non-root nodes receive noise from one of three variance-matched families (each with probability 1 3): Gaussian N(0,std 2), UniformU(−a, a)witha=std √ 3, or Laplace Lap(0, b)withb=std/ √ 2, where std∼ShiftedExp(rate= 10.0,shift= 0.01). The smaller non-root noi...

work page 2026
[14]

interventional (red) trajectories

Right column: resulting observational (blue) vs. interventional (red) trajectories. Yellow shading marks the intervention window[30,70). thequery encoderembeds the prediction query (which variable to predict, when). Theprediction headcombines these encodings to outputP(Y int τ |do(X(i) t =c),X obs 1:T )as a Gaussian distribution with predicted mean and st...

work page 2026
[15]

Implementation.CausalTimePrior is implemented from scratch, drawing conceptual inspi- ration from TempoPFN’s diverse generator design and intervention logic from Do-PFN

is an important next step. Implementation.CausalTimePrior is implemented from scratch, drawing conceptual inspi- ration from TempoPFN’s diverse generator design and intervention logic from Do-PFN. The core is aTemporalSCMclass that supports bothsample observational(T)and sample interventional(T, intervention)methods, enabling paired data generation from t...

work page 2026
[16]

Causal effects propagate through the graph structure, affecting downstream variables while leaving non-causally connected variables un- changed

The intervention target is highlighted with a yellow background. Causal effects propagate through the graph structure, affecting downstream variables while leaving non-causally connected variables un- changed. 3 4 5 6 7 8 9 10 Number of variables (N) 0 2000 4000 6000 8000 10000 12000 14000Count (a) Graph size Hard Soft Time-varying 0 10000 20000 30000 400...

work page 2000
[17]

SimpleCausalPFN achieves comparable RMSE to V AR-OLS (176.4 vs 176.5) while requiring no per-dataset fitting

which discovers causal graphs via conditional independence tests, and a mean pre- diction baseline (Table 3). SimpleCausalPFN achieves comparable RMSE to V AR-OLS (176.4 vs 176.5) while requiring no per-dataset fitting. PCMCI+ achieves lower overall RMSE (161.4) by leveraging discovered causal structure, but requires expensive per-sample graph discovery. ...

work page 2026

[1] [1]

Curriculum learning, in: Proceedings of the 26th Annual International Conference on Machine Learning, Associa- tion for Computing Machinery, New York, NY, USA

Association for Computing Machinery. ISBN 9781605585161. doi: 10.1145/1553374.1553380. URLhttps://doi.org/10.1145/ 1553374.1553380. Ioana Bica, Ahmed M Alaa, James Jordon, and Mihaela van der Schaar. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. InInternational Conference on Learning Representations,

work page doi:10.1145/1553374.1553380

[2] [2]

Philip Boeken and Joris M. Mooij. Dynamic structural causal models. InUAI 2024 Workshop on Causal Inference for Time Series (CI4TS),

work page 2024

[3] [3]

CausalTime: Realistically gen- erated time-series for benchmarking of causal discovery

5 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) Yuxiao Cheng, Ziqian Yang, Xu Chen, Jiecheng Li, and Junchi Yan. CausalTime: Realistically gen- erated time-series for benchmarking of causal discovery. InInternational Conference on Learning Representations,

work page 2026

[4] [4]

Foundation models for causal inference via prior-data fitted networks.arXiv preprint arXiv:2506.10914,

Yuchen Ma, Dennis Frauen, Emil Javurek, and Stefan Feuerriegel. Foundation models for causal inference via prior-data fitted networks.arXiv preprint arXiv:2506.10914,

work page arXiv

[5] [5]

TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting

Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, and Frank Hutter. TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting. InNeurIPS 2025 Workshop on AI for Tabular Data,

work page 2025

[6] [6]

Causal discovery with continuous additive noise models.Journal of Machine Learning Research,

6 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) Jonas Peters, Joris M Mooij, Dominik Janzing, and Bernhard Sch ¨olkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research,

work page 2026

[7] [7]

Towards causal market simulators

Dennis Thumm and Luis Ontaneda Mijares. Towards causal market simulators. InICAIF 2025 Workshop on Rethinking Financial Time-Series, Singapore,

work page 2025

[8] [8]

Causal Time Series Generation via Diffusion Models

URLhttps:// icaif-25-rtfs.github.io/. Yutong Xia, Chang Xu, Yuxuan Liang, Qingsong Wen, Roger Zimmermann, and Jiang Bian. Causal time series generation via diffusion models.arXiv preprint arXiv:2509.20846,

work page internal anchor Pith review Pith/arXiv arXiv

[9] [10]

48550/arXiv.2508.02879

URLhttps://doi.org/10. 48550/arXiv.2508.02879. 7 ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM) A CAUSALTIMEPRIORALGORITHM Algorithm 1 formalizes the CausalTimePrior sampling procedure for generating paired observational and interventional time series from TSCMs. Algorithm 1CausalTimePrior Sampling 1:Input:Prior hyperparametersΠ = (Π...

work page arXiv 2026

[10] [11]

motivates extending CausalTimePrior to generate in- terventional data from SDE-based causal models. Consider a causal Ornstein-Uhlenbeck process dXt =θ(µ−X t)dt+σ w dWt; applying Euler-Maruyama with step∆tyields: xt+1 = (1−θ∆t)| {z } c2 xt +θµ∆t|{z} c1 +σ w √ ∆t| {z } c3 Zt, Z t ∼ N(0,1)(4) which is precisely the AR(1) form our mechanism prior generates. ...

work page 2025

[11] [12]

what would anewdraw from the system look like under this intervention?

or Neural SDEs (Kidger et al., 2021)— and discretize at variable time steps, enabling the prior to generate irregularly-sampled interventional time series. B PRIORASSUMPTIONS ANDLIMITATIONS This section details the modeling assumptions underlying CausalTimePrior and their implications for identifiability and generalization. 8 ICLR 2026 Workshop on Time Se...

work page 2021

[12] [13]

receive Gaussian noise with std∼ShiftedExp(rate= 1.0,shift= 0.1), providing larger driving noise. Non-root nodes receive noise from one of three variance-matched families (each with probability 1 3): Gaussian N(0,std 2), UniformU(−a, a)witha=std √ 3, or Laplace Lap(0, b)withb=std/ √ 2, where std∼ShiftedExp(rate= 10.0,shift= 0.01). The smaller non-root noi...

work page 2026

[13] [14]

interventional (red) trajectories

Right column: resulting observational (blue) vs. interventional (red) trajectories. Yellow shading marks the intervention window[30,70). thequery encoderembeds the prediction query (which variable to predict, when). Theprediction headcombines these encodings to outputP(Y int τ |do(X(i) t =c),X obs 1:T )as a Gaussian distribution with predicted mean and st...

work page 2026

[14] [15]

Implementation.CausalTimePrior is implemented from scratch, drawing conceptual inspi- ration from TempoPFN’s diverse generator design and intervention logic from Do-PFN

is an important next step. Implementation.CausalTimePrior is implemented from scratch, drawing conceptual inspi- ration from TempoPFN’s diverse generator design and intervention logic from Do-PFN. The core is aTemporalSCMclass that supports bothsample observational(T)and sample interventional(T, intervention)methods, enabling paired data generation from t...

work page 2026

[15] [16]

Causal effects propagate through the graph structure, affecting downstream variables while leaving non-causally connected variables un- changed

The intervention target is highlighted with a yellow background. Causal effects propagate through the graph structure, affecting downstream variables while leaving non-causally connected variables un- changed. 3 4 5 6 7 8 9 10 Number of variables (N) 0 2000 4000 6000 8000 10000 12000 14000Count (a) Graph size Hard Soft Time-varying 0 10000 20000 30000 400...

work page 2000

[16] [17]

SimpleCausalPFN achieves comparable RMSE to V AR-OLS (176.4 vs 176.5) while requiring no per-dataset fitting

which discovers causal graphs via conditional independence tests, and a mean pre- diction baseline (Table 3). SimpleCausalPFN achieves comparable RMSE to V AR-OLS (176.4 vs 176.5) while requiring no per-dataset fitting. PCMCI+ achieves lower overall RMSE (161.4) by leveraging discovered causal structure, but requires expensive per-sample graph discovery. ...

work page 2026