Interventional Time Series Priors for Causal Foundation Models
Pith reviewed 2026-05-15 12:43 UTC · model grok-4.3
The pith
CausalTimePrior generates paired observational and interventional time series to train prior-data fitted networks for in-context causal effect estimation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CausalTimePrior supplies a configurable generator for synthetic TSCMs that produce paired observational and interventional time series, supporting nonlinear autoregressive mechanisms, regime switches, and multiple intervention types; prior-data fitted networks trained on data from this generator perform in-context causal effect estimation on held-out TSCMs.
What carries the argument
CausalTimePrior, the framework that samples temporal structural causal models and outputs matched observational and interventional time series for training.
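To make the idea of "matched observational and interventional time series" concrete, here is a minimal sketch. It is not the paper's generator: the function names `simulate` and `sample_pair`, the fixed three-variable chain graph, the tanh mechanism, and the choice to reuse the same noise draws in both rollouts are all assumptions made for illustration. The intervention window [30, 70) mirrors the one mentioned in the paper's figure caption.

```python
import math
import random

def simulate(weights, noise, intervention=None):
    """Roll out a lag-1 nonlinear autoregressive SCM (illustrative only).

    weights[i][j] is the lag-1 effect of variable j on variable i.
    noise[t][i] is the exogenous noise of variable i at step t.
    intervention is None or (target, start, stop, value), a hard do().
    """
    n = len(weights)
    x = [0.0] * n
    path = []
    for t, eps in enumerate(noise):
        nxt = [
            math.tanh(sum(weights[i][j] * x[j] for j in range(n))) + eps[i]
            for i in range(n)
        ]
        if intervention is not None:
            target, start, stop, value = intervention
            if start <= t < stop:
                nxt[target] = value  # hard intervention replaces the mechanism
        x = nxt
        path.append(x)
    return path

def sample_pair(seed, T=100):
    """Return (observational, interventional) rollouts of one hypothetical SCM."""
    rng = random.Random(seed)
    # Assumed chain graph X0 -> X1 -> X2 with self-dependence on the diagonal.
    weights = [
        [0.5, 0.0, 0.0],
        [0.8, 0.3, 0.0],
        [0.0, 0.8, 0.3],
    ]
    noise = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(T)]
    obs = simulate(weights, noise)
    # Assumption: both rollouts share the same noise; the paper may redraw it.
    itv = simulate(weights, noise, intervention=(1, 30, 70, 2.0))
    return obs, itv

obs, itv = sample_pair(seed=0)
```

In this sketch the two rollouts agree before the window opens, the non-descendant X0 is never affected, and the downstream variable X2 starts to diverge one step after the intervention on X1 begins.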
If this is right
- PFNs can answer causal queries on time series by conditioning only on observational context plus a query intervention.
- The same trained network handles multiple intervention types without retraining.
- Configurable graph structures and dynamics in the prior allow targeted coverage of different causal regimes.
- The setup provides a scalable source of labeled training data for causal time-series tasks that previously lacked interventional labels.
Where Pith is reading between the lines
- If the synthetic models prove realistic, the same prior could be used to pre-train larger transformers that accept mixed observational and interventional prompts.
- The approach suggests a general recipe: design a prior over causal mechanisms first, then fit a network to its generated data, rather than collecting scarce real interventional records.
- One could test whether performance improves when the prior is tuned to match marginal statistics of a target domain before training.
Load-bearing premise
The synthetic time series generated by CausalTimePrior capture enough of the statistical and causal structure of real-world time series that models trained on them generalize to new data.
What would settle it
Train a PFN on CausalTimePrior data and test it on a collection of real-world time series with known interventions; if the estimated causal effects show large systematic error compared with the known effects, the generalization claim fails.
Original abstract
Prior-data fitted networks (PFNs) have emerged as powerful foundation models for tabular causal inference, yet their extension to time series remains limited by the absence of synthetic data generators that provide interventional targets. Existing time series benchmarks generate observational data with ground-truth causal graphs but lack the interventional data required for training causal foundation models. To address this, we propose CausalTimePrior, a principled framework for generating synthetic temporal structural causal models (TSCMs) with paired observational and interventional time series. Our prior supports configurable causal graph structures, nonlinear autoregressive mechanisms, regime-switching dynamics, and multiple intervention types (hard, soft, time-varying). We demonstrate that PFNs trained on CausalTimePrior can perform in-context causal effect estimation on held-out TSCMs, establishing a pathway toward foundation models for time series causal inference.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CausalTimePrior, a framework for generating synthetic temporal structural causal models (TSCMs) with paired observational and interventional time series data supporting configurable graphs, nonlinear autoregressive mechanisms, regime-switching, and multiple intervention types. It claims that PFNs trained on data from this prior can perform in-context causal effect estimation on held-out TSCMs from the same family, establishing a pathway toward foundation models for time series causal inference.
Significance. If the central demonstration holds, the work supplies a scalable source of interventional targets that existing time-series benchmarks lack, enabling in-context learning for causal queries in dynamic systems. This could accelerate development of foundation models in causal time-series inference. The contribution is primarily methodological; its significance for downstream applications remains conditional on evidence of transfer beyond the synthetic family.
major comments (2)
- [§4, Experiments] The evaluation is restricted to held-out TSCMs sampled from the identical CausalTimePrior family used for training. This only confirms that the PFN can interpolate within the training distribution; it does not test whether the learned in-context mechanism transfers to real-world time series whose marginals, noise spectra, or causal mechanisms lie outside the configurable nonlinear autoregressive regime-switching class.
- [Abstract and §4] The manuscript states that trained PFNs 'perform in-context causal effect estimation' on held-out data yet reports no quantitative metrics, baselines, ablation studies, or error analysis. Without these, the strength of the central demonstration cannot be assessed.
minor comments (2)
- [§3] Define the precise parameterization of the regime-switching dynamics and the intervention operators (hard/soft/time-varying) with explicit equations in §3 to allow exact reproduction.
- [§3.2] Clarify whether the held-out TSCMs share the same hyper-parameter ranges as the training set or are drawn from a disjoint range; this affects the interpretation of 'held-out'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.
Point-by-point responses
- Referee: [§4, Experiments] The evaluation is restricted to held-out TSCMs sampled from the identical CausalTimePrior family used for training. This only confirms that the PFN can interpolate within the training distribution; it does not test whether the learned in-context mechanism transfers to real-world time series whose marginals, noise spectra, or causal mechanisms lie outside the configurable nonlinear autoregressive regime-switching class.
  Authors: We agree that the experiments demonstrate performance only on held-out samples from the same prior family, confirming in-distribution interpolation rather than out-of-distribution transfer to real-world time series. This was a deliberate first step to validate the prior and PFN approach within a controlled setting where interventional targets are available. We will revise Section 4 and add a new limitations subsection explicitly stating this scope, discussing the challenges of real-world transfer, and outlining planned extensions such as domain adaptation or evaluation on existing observational time-series benchmarks with partial interventional data.
  Revision: partial
- Referee: [Abstract and §4] The manuscript states that trained PFNs 'perform in-context causal effect estimation' on held-out data yet reports no quantitative metrics, baselines, ablation studies, or error analysis. Without these, the strength of the central demonstration cannot be assessed.
  Authors: We appreciate this observation. While Section 4 presents results showing that the PFN achieves lower estimation error than naive baselines on held-out TSCMs, we acknowledge that the reporting lacks sufficient detail. In the revised manuscript we will expand Section 4 to include: (i) explicit quantitative metrics such as mean absolute error and coverage rates for causal effect estimates across multiple regimes; (ii) comparisons against standard time-series causal inference baselines (e.g., Granger causality, VAR-based methods, and synthetic control); (iii) ablation studies removing individual prior components (regime-switching, nonlinear mechanisms); and (iv) error analysis stratified by graph density, intervention type, and sequence length.
  Revision: yes
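The two metrics named in item (i) of the rebuttal can be sketched as follows, assuming the model returns a Gaussian predictive mean and standard deviation per query (the paper excerpt further down describes the prediction head this way). The function names, the 95% interval, and the toy numbers are illustrative assumptions, not values from the paper.

```python
import math

def mean_absolute_error(pred_mean, truth):
    """Average absolute gap between predicted and true interventional values."""
    return sum(abs(p - y) for p, y in zip(pred_mean, truth)) / len(truth)

def coverage(pred_mean, pred_std, truth, z=1.959963984540054):
    """Fraction of true values inside the central Gaussian interval.

    z defaults to the 97.5% standard normal quantile, giving a nominal 95%
    interval; a well-calibrated model should score close to 0.95.
    """
    hits = sum(
        abs(y - m) <= z * s for m, s, y in zip(pred_mean, pred_std, truth)
    )
    return hits / len(truth)

# Toy numbers, made up for illustration.
truth = [0.0, 1.0, 2.0, 3.0]
pred_mean = [0.1, 0.9, 2.5, 3.0]
pred_std = [0.2, 0.2, 0.2, 0.2]

mae = mean_absolute_error(pred_mean, truth)
cov = coverage(pred_mean, pred_std, truth)
```

Reporting both matters because a model can have low error while being badly calibrated: coverage far below the nominal level signals overconfident intervals, and coverage near 1.0 signals intervals that are too wide to be informative.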
Circularity Check
No significant circularity; derivation is self-contained
The paper defines CausalTimePrior as an independent generative framework for synthetic TSCMs (configurable graphs, nonlinear autoregressive mechanisms, regime-switching, and intervention types). It then trains PFNs on samples drawn from this prior and reports in-context causal effect estimation performance on held-out samples from the identical family. This constitutes a standard within-distribution train/test split rather than any reduction of the result to fitted parameters by construction, self-definition of the target metric, or load-bearing self-citation. No equations or claims in the provided text equate the reported PFN performance to the generator inputs themselves; the empirical demonstration remains falsifiable against the held-out synthetic data.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Synthetic temporal structural causal models with configurable graphs and interventions can serve as effective training priors for causal foundation models.
Reference graph
Works this paper leans on
- [1] Association for Computing Machinery. ISBN 9781605585161. doi: 10.1145/1553374.1553380. URL https://doi.org/10.1145/1553374.1553380. Ioana Bica, Ahmed M Alaa, James Jordon, and Mihaela van der Schaar. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. In International Conference on Learning Representations,
- [2] Philip Boeken and Joris M. Mooij. Dynamic structural causal models. In UAI 2024 Workshop on Causal Inference for Time Series (CI4TS),
ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM)
- [3] Yuxiao Cheng, Ziqian Yang, Xu Chen, Jiecheng Li, and Junchi Yan. CausalTime: Realistically generated time-series for benchmarking of causal discovery. In International Conference on Learning Representations,
- [4] Yuchen Ma, Dennis Frauen, Emil Javurek, and Stefan Feuerriegel. Foundation models for causal inference via prior-data fitted networks. arXiv preprint arXiv:2506.10914,
- [5] Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, and Frank Hutter. TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting. In NeurIPS 2025 Workshop on AI for Tabular Data,
- [6] Jonas Peters, Joris M Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research,
- [7] Dennis Thumm and Luis Ontaneda Mijares. Towards causal market simulators. In ICAIF 2025 Workshop on Rethinking Financial Time-Series, Singapore,
- [8] URL https://icaif-25-rtfs.github.io/. Yutong Xia, Chang Xu, Yuxuan Liang, Qingsong Wen, Roger Zimmermann, and Jiang Bian. Causal time series generation via diffusion models. arXiv preprint arXiv:2509.20846,
- [10] URL https://doi.org/10.48550/arXiv.2508.02879. A. CausalTimePrior Algorithm. Algorithm 1 formalizes the CausalTimePrior sampling procedure for generating paired observational and interventional time series from TSCMs. Algorithm 1: CausalTimePrior Sampling. 1: Input: Prior hyperparameters Π = (Π...
- [11] motivates extending CausalTimePrior to generate interventional data from SDE-based causal models. Consider a causal Ornstein-Uhlenbeck process $dX_t = \theta(\mu - X_t)\,dt + \sigma_w\,dW_t$; applying Euler-Maruyama with step $\Delta t$ yields: $x_{t+1} = \underbrace{(1 - \theta\Delta t)}_{c_2}\, x_t + \underbrace{\theta\mu\Delta t}_{c_1} + \underbrace{\sigma_w\sqrt{\Delta t}}_{c_3}\, Z_t, \quad Z_t \sim \mathcal{N}(0, 1)$ (4), which is precisely the AR(1) form our mechanism prior generates. ...
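The mapping from the Ornstein-Uhlenbeck parameters to the AR(1) coefficients in Eq. (4) can be checked numerically. The sketch below is illustrative only: the function names and parameter values are assumptions, not taken from the paper. It uses the fact that an AR(1) process with |c2| < 1 has stationary mean c1 / (1 - c2), which here reduces to μ.

```python
import math
import random

def ou_to_ar1(theta, mu, sigma_w, dt):
    """Euler-Maruyama coefficients of x_{t+1} = c2 * x_t + c1 + c3 * Z_t."""
    c1 = theta * mu * dt
    c2 = 1.0 - theta * dt
    c3 = sigma_w * math.sqrt(dt)
    return c1, c2, c3

def simulate_ar1(c1, c2, c3, x0, steps, seed=0):
    """Roll out the discretized process with standard normal innovations."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        xs.append(c2 * xs[-1] + c1 + c3 * rng.gauss(0.0, 1.0))
    return xs

# Hypothetical parameter values chosen for illustration.
theta, mu, sigma_w, dt = 0.5, 2.0, 0.3, 0.1
c1, c2, c3 = ou_to_ar1(theta, mu, sigma_w, dt)
path = simulate_ar1(c1, c2, c3, x0=0.0, steps=5000)

# Long-run average after discarding a burn-in period.
tail_mean = sum(path[1000:]) / len(path[1000:])
```

With these values the discretization is stable (|c2| = 0.95 < 1), and the long-run average of the simulated path should sit near μ = 2.0.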
- [12] what would a new draw from the system look like under this intervention?
  or Neural SDEs (Kidger et al., 2021), and discretize at variable time steps, enabling the prior to generate irregularly-sampled interventional time series. B. Prior Assumptions and Limitations. This section details the modeling assumptions underlying CausalTimePrior and their implications for identifiability and generalization. ...
- [13] receive Gaussian noise with std ∼ ShiftedExp(rate = 1.0, shift = 0.1), providing larger driving noise. Non-root nodes receive noise from one of three variance-matched families (each with probability 1/3): Gaussian $\mathcal{N}(0, \mathrm{std}^2)$, Uniform $U(-a, a)$ with $a = \mathrm{std}\sqrt{3}$, or Laplace $\mathrm{Lap}(0, b)$ with $b = \mathrm{std}/\sqrt{2}$, where std ∼ ShiftedExp(rate = 10.0, shift = 0.01). The smaller non-root noi...
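The three non-root noise families are called variance-matched because the stated scale parameters give each one the same standard deviation: a uniform on (-a, a) has variance a²/3, and a Laplace with scale b has variance 2b². The sketch below checks this empirically. It assumes that ShiftedExp(rate, shift) means the shift plus an exponential draw with that rate, which the excerpt does not define; all function names are hypothetical.

```python
import math
import random

FAMILIES = ("gaussian", "uniform", "laplace")

def shifted_exp(rng, rate, shift):
    """Assumed meaning of ShiftedExp: shift + Exponential(rate)."""
    return shift + rng.expovariate(rate)

def sample_noise(rng, family, std):
    """Draw one zero-mean sample whose standard deviation equals std."""
    if family == "gaussian":
        return rng.gauss(0.0, std)
    if family == "uniform":
        a = std * math.sqrt(3.0)  # Var[U(-a, a)] = a^2 / 3 = std^2
        return rng.uniform(-a, a)
    if family == "laplace":
        b = std / math.sqrt(2.0)  # Var[Lap(0, b)] = 2 * b^2 = std^2
        # The difference of two unit exponentials is a standard Laplace draw.
        return b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    raise ValueError(f"unknown noise family: {family}")

def empirical_std(samples):
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

rng = random.Random(0)
std = shifted_exp(rng, rate=10.0, shift=0.01)  # non-root scale from the excerpt
stds = {
    family: empirical_std([sample_noise(rng, family, std) for _ in range(50000)])
    for family in FAMILIES
}
```

Matching the variances means the family choice changes only the shape of the noise (tail weight, boundedness), not its overall magnitude.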
- [14] Right column: resulting observational (blue) vs. interventional (red) trajectories. Yellow shading marks the intervention window [30, 70). the query encoder embeds the prediction query (which variable to predict, when). The prediction head combines these encodings to output $P(Y^{\mathrm{int}}_{\tau} \mid \mathrm{do}(X^{(i)}_t = c), X^{\mathrm{obs}}_{1:T})$ as a Gaussian distribution with predicted mean and st...
- [15] is an important next step. Implementation. CausalTimePrior is implemented from scratch, drawing conceptual inspiration from TempoPFN's diverse generator design and intervention logic from Do-PFN. The core is a TemporalSCM class that supports both sample_observational(T) and sample_interventional(T, intervention) methods, enabling paired data generation from t...
- [16] The intervention target is highlighted with a yellow background. Causal effects propagate through the graph structure, affecting downstream variables while leaving non-causally connected variables unchanged. [Figure: panel (a) Graph size, count versus number of variables (N); a further panel over intervention types (Hard, Soft, Time-varying) ...]
- [17] which discovers causal graphs via conditional independence tests, and a mean prediction baseline (Table 3). SimpleCausalPFN achieves comparable RMSE to VAR-OLS (176.4 vs 176.5) while requiring no per-dataset fitting. PCMCI+ achieves lower overall RMSE (161.4) by leveraging discovered causal structure, but requires expensive per-sample graph discovery. ...