Towards Continuous-time Causal Foundation Models

Dennis Thumm; Ruben Wiedemann; Ying Chen

arXiv: 2605.28880 · v1 · pith:FSXU7UAZnew · submitted 2026-05-26 · 💻 cs.LG · physics.data-an · stat.ME

Pith reviewed 2026-06-29 19:48 UTC · model grok-4.3

classification 💻 cs.LG · physics.data-an · stat.ME

keywords continuous-time causal models, trajectory-law invariance, observation schedules, stochastic differential equations, causal foundation models, irregular time series, interventions, prior-data fitted networks

The pith

Fine-grid integration with decoupled observation makes continuous-time causal models invariant to the observation schedule.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to extend discrete-time causal prior-data fitted networks into continuous time while preserving a key property: the distribution of trajectories must not change merely because observations occur at different times. It defines trajectory-law invariance to the observation schedule as the precise continuity criterion and introduces a three-tier taxonomy that distinguishes discrete models, naive grid integration, and fine-grid integration with decoupled observation. A concrete construction on random directed acyclic graphs driven by Ornstein-Uhlenbeck processes or small multilayer perceptrons is shown to reach the top tier, remaining invariant even under hard, soft, and time-varying interventions. This property matters because real time-series data arrive at irregular intervals; models whose laws shift with sampling frequency cannot be reliably deployed across different monitoring schedules. Ablation experiments on linear and nonlinear priors confirm that fine-grid integration consistently outperforms naive integration, with the advantage widening as the evaluation grid is refined.

Core claim

The central claim is that a random-DAG construction using OU or small-MLP nonlinear drifts, combined with fine-grid SDE integration and observation decoupled from the integration grid, realizes trajectory-law invariance to the observation schedule for irregular sampling and multiple intervention types.

What carries the argument

Fine-grid integration with decoupled observation: the SDE is integrated on a fine auxiliary grid independent of the observation times, after which states are read out only at the actual observation instants.
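The mechanism described above can be sketched for a single scalar Ornstein-Uhlenbeck node. This is a minimal illustration, not the paper's released prior: the function names `simulate_fine_grid` and `observe`, the Euler-Maruyama scheme, the parameter values, and the linear-interpolation readout are all assumptions made for the example.

```python
import numpy as np

def simulate_fine_grid(theta, sigma, x0, t_end, n_fine, rng):
    """Euler-Maruyama for dX = -theta*X dt + sigma dW on a uniform fine grid.

    The grid depends only on t_end and n_fine, never on observation times.
    """
    h = t_end / n_fine
    grid = np.linspace(0.0, t_end, n_fine + 1)
    x = np.empty(n_fine + 1)
    x[0] = x0
    noise = rng.normal(0.0, np.sqrt(h), size=n_fine)  # Brownian increments
    for k in range(n_fine):
        x[k + 1] = x[k] - theta * x[k] * h + sigma * noise[k]
    return grid, x

def observe(grid, path, obs_times):
    """Read the latent path at arbitrary times (assumed: linear interpolation)."""
    return np.interp(obs_times, grid, path)

rng = np.random.default_rng(0)
grid, path = simulate_fine_grid(theta=1.0, sigma=0.5, x0=1.0,
                                t_end=2.0, n_fine=2000, rng=rng)

# Two irregular schedules that share the times 0.7 and 1.3.
schedule_a = np.array([0.1, 0.7, 1.3, 2.0])
schedule_b = np.array([0.25, 0.7, 0.9, 1.3, 1.95])
obs_a = observe(grid, path, schedule_a)
obs_b = observe(grid, path, schedule_b)
```

Because both schedules read from the same latent path, the values at the shared times 0.7 and 1.3 coincide. The integrator never sees the schedule, which is the sense in which observation is decoupled from integration in this sketch.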

If this is right

Models built this way produce consistent trajectory distributions across arbitrary irregular sampling patterns.
The encoder component becomes non-critical once fine integration is used, whereas it matters under naive integration.
The released prior supports preliminary zero-shot evaluation on pharmacokinetic and physical-system data.
The performance advantage of fine integration grows as the test grid is refined.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same invariance property could allow a single model to be trained on mixed-frequency data without retraining for each new sampling regime.
If the fine-grid criterion holds beyond the tested OU and small-MLP drifts, larger nonlinear mechanisms might inherit the same schedule independence.
Causal discovery procedures that rely on these models would inherit robustness to changes in measurement frequency.

Load-bearing premise

That fine-grid integration with decoupled observation truly renders the trajectory law independent of the observation schedule for the chosen random-DAG priors, rather than merely improving performance on the tested grids.

What would settle it

Generate trajectories from the same underlying random-DAG process under two different irregular observation schedules and under a uniform fine grid, then test whether the empirical distributions of the observed paths are statistically indistinguishable across schedules for the fine-grid decoupled construction but not for naive integration.
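A toy version of such a check, for a single OU node rather than a random DAG, might look as follows. It is a sketch under stated assumptions: the parameters, the Euler-Maruyama integrator, the two schedules, and the hand-written `ks_statistic` helper are illustrative and are not taken from the paper. It compares the marginal at one shared time only, and reports a Kolmogorov-Smirnov statistic rather than a full hypothesis test.

```python
import numpy as np

THETA, SIGMA, X0 = 1.0, 0.5, 1.0  # illustrative OU parameters (assumed)

def endpoint_samples(step_sizes, n_paths, rng):
    """Euler-Maruyama for dX = -THETA*X dt + SIGMA dW; returns X at the final time."""
    x = np.full(n_paths, X0)
    for h in step_sizes:
        x = x - THETA * x * h + SIGMA * np.sqrt(h) * rng.normal(size=n_paths)
    return x

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
n_paths = 20_000

# Both schedules end with an observation at T = 1.
gaps_a = [1.0]          # schedule A: a single gap
gaps_b = [0.25] * 4     # schedule B: four equal gaps

# Naive tier: the integration steps are the observation gaps.
ks_naive = ks_statistic(endpoint_samples(gaps_a, n_paths, rng),
                        endpoint_samples(gaps_b, n_paths, rng))

# Fine-grid tier: the same 200-step grid is used whatever the schedule.
fine_steps = [0.005] * 200
ks_fine = ks_statistic(endpoint_samples(fine_steps, n_paths, rng),
                       endpoint_samples(fine_steps, n_paths, rng))

print(f"KS naive: {ks_naive:.3f}   KS fine-grid: {ks_fine:.3f}")
```

In this toy setting the naive statistic is large because the one-step and four-step schemes produce different Gaussian laws at T = 1, while the fine-grid statistic reflects sampling noise only. A full test of the paper's claim would need joint path distributions, interventions, and the actual random-DAG prior.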

Figures

Figures reproduced from arXiv: 2605.28880 by Dennis Thumm, Ruben Wiedemann, Ying Chen.

**Figure 1.** Canonical SCM structures used by the named-structure sampler. Each panel shows a back-door / front-door / IV-style template with the treatment A (left), outcome Y (right), and any mediators or confounders. The random-DAG sampler in Section 3.2 subsumes these as special cases.

Original abstract

Extending discrete-time causal Prior-data Fitted Networks for time series to continuous time invites writing the mechanism as a stochastic differential equation (SDE) -- but if the SDE is integrated \emph{once per observation gap}, the trajectory law depends on when it is observed, and the prior remains a discrete-time Markov model in SDE clothing. We propose a precise continuity criterion -- trajectory-law invariance to the observation schedule -- together with a three-tier taxonomy (discrete; naive observation-grid integration; fine-grid integration with decoupled observation) and a construction realising the top tier on a random DAG with OU or small-MLP nonlinear drifts, irregular observation schedules, and hard / soft / time-varying interventions. A $2 \times 2$ encoder $\times$ integrator ablation, run independently on a linear and a nonlinear prior, finds fine-grid integration beats naive on 8/8 cells (sign-consistency $p < 1/256$) with the gap growing as the eval grid refines; the encoder axis is null with fine integration but time-aware-leading with naive. We release the prior and a preliminary zero-shot protocol on pharmacokinetic and physical-system data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper introduces a trajectory-law invariance criterion for continuous-time causal PFNs and shows empirical gains from fine-grid decoupled integration over naive methods, but the ablation demonstrates performance improvements rather than exact schedule independence.

The main takeaway is that this work defines trajectory-law invariance to the observation schedule as the right continuity criterion for causal PFNs in continuous time, supplies a three-tier taxonomy, and gives a construction using fine-grid integration on random-DAG priors with OU or small-MLP drifts that handles irregular sampling and interventions.

What is new is the invariance framing itself and the decoupled fine-grid approach; it is not a direct lift from discrete PFN literature. The 2x2 ablation on linear and nonlinear priors is the strongest part: fine-grid integration beats naive on all eight cells with sign-consistency p < 1/256, the gap widens on refined evaluation grids, and the encoder axis becomes null once fine integration is used. Releasing the prior and a zero-shot protocol on pharmacokinetic and physical-system data is also concrete and helpful.

The soft spot is the distance between the reported results and the top-tier claim of exact trajectory-law invariance. The ablation shows consistent superiority and reduced schedule sensitivity, but finite-grid numerical integration plus decoupled observation can still leave discretization or intervention artifacts, especially under nonlinear drifts or time-varying interventions. No closed-form verification or distribution-matching argument appears even for the linear OU case, so the construction improves approximation quality without yet proving the law is identical across schedules.
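The discretization concern can be made concrete for the linear OU case, where the moments of an Euler-Maruyama scheme have a closed form. The sketch below assumes each observation gap is split into `n_steps` equal Euler-Maruyama substeps; whether the paper uses this scheme, or a single global grid, is not stated on this page, and the function names and parameter values are illustrative.

```python
import math

def em_moments(theta, sigma, x0, gap, n_steps):
    """Mean and variance of X(gap) for dX = -theta*X dt + sigma dW
    under Euler-Maruyama with n_steps equal substeps, started at x0."""
    h = gap / n_steps
    a = 1.0 - theta * h  # one-step contraction factor
    mean = x0 * a ** n_steps
    # Var_{k+1} = a^2 * Var_k + sigma^2 * h, unrolled from Var_0 = 0.
    var = sigma ** 2 * h * sum(a ** (2 * k) for k in range(n_steps))
    return mean, var

def exact_moments(theta, sigma, x0, gap):
    """Mean and variance of the exact OU transition over the same gap."""
    mean = x0 * math.exp(-theta * gap)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * theta * gap)) / (2.0 * theta)
    return mean, var

theta, sigma, x0, gap = 1.0, 0.5, 1.0, 1.0
m_exact, v_exact = exact_moments(theta, sigma, x0, gap)
for n in (1, 10, 100, 1000):
    m, v = em_moments(theta, sigma, x0, gap, n)
    print(f"n={n:5d}  mean error={abs(m - m_exact):.2e}  "
          f"var error={abs(v - v_exact):.2e}")
```

With one step per gap (the naive tier) the mean is 0 and the variance is 0.25 for these parameters, far from the exact transition. Refining the grid shrinks the error but leaves it nonzero at any finite `n_steps`, which is the gap between improved approximation and exact invariance that the note points to.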

This is for researchers extending causal foundation models to irregular scientific time series. A reader working on continuous-time priors or SDE-based causal models will find the taxonomy and ablation useful even if they want tighter theory.

It deserves a serious referee because the problem is real, the proposal is well-motivated, and the experiments are informative; the central empirical claim holds up, though the invariance argument would benefit from more rigorous support.

Referee Report

2 major / 1 minor

Summary. The paper extends discrete-time causal Prior-data Fitted Networks to continuous time by representing mechanisms as SDEs. It introduces the criterion of trajectory-law invariance to the observation schedule, a three-tier taxonomy (discrete; naive observation-grid integration; fine-grid integration with decoupled observation), and a construction claimed to realize the top tier on random-DAG priors with OU or small-MLP nonlinear drifts, irregular sampling, and hard/soft/time-varying interventions. A 2x2 encoder x integrator ablation on linear and nonlinear priors reports that fine-grid integration outperforms naive integration on all 8 cells (sign-consistency p < 1/256), with the gap increasing on refined evaluation grids; the encoder axis is null under fine integration. The prior and a preliminary zero-shot protocol on pharmacokinetic and physical-system data are released.

Significance. If the construction achieves exact trajectory-law invariance rather than merely improved empirical performance, the work would supply a principled route to continuous-time causal foundation models that remain consistent under arbitrary observation schedules. The release of the prior and zero-shot protocol is a concrete strength that supports reproducibility and downstream testing.

major comments (2)

[Abstract] Abstract: the claim that the construction 'realises the top tier' (exact trajectory-law invariance) is load-bearing for the central contribution, yet the only supporting evidence is the reported 2x2 ablation showing performance gains; no closed-form verification, exact distribution-matching argument, or proof that the generated measure is identical across schedules is supplied, even for the linear OU case.
[Abstract] Abstract (ablation description): the sign-consistency result (p < 1/256 across 8/8 cells) is presented without implementation details, data-generation code, or explicit checks that the fine-grid choice was not post-hoc tuned to the evaluation grids, leaving open the possibility that the observed superiority reflects reduced but nonzero schedule dependence rather than invariance.

minor comments (1)

[Abstract] Abstract: the three-tier taxonomy is named but the precise mathematical distinctions between 'naive observation-grid integration' and 'fine-grid integration with decoupled observation' are not stated; adding the governing equations or pseudocode would improve clarity.
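One plausible formalization of the two integration tiers, assuming an Euler-Maruyama scheme for a scalar SDE $dX_t = f(X_t)\,dt + \sigma\,dW_t$ observed at times $t_0 < t_1 < \dots < t_m$. The symbols $f$, $\sigma$, $h$, $Y_i$, and $k(i)$ are illustrative; the paper's own equations are not reproduced on this page.

```latex
% Naive observation-grid integration: one step per observation gap.
X_{t_{i+1}} = X_{t_i} + f(X_{t_i})\,\Delta_i + \sigma\sqrt{\Delta_i}\,\xi_i,
\qquad \Delta_i = t_{i+1} - t_i, \quad \xi_i \sim \mathcal{N}(0, 1).

% Fine-grid integration with decoupled observation:
% the step h is fixed independently of the schedule.
X_{s_{k+1}} = X_{s_k} + f(X_{s_k})\,h + \sigma\sqrt{h}\,\xi_k,
\qquad s_k = k h, \quad \xi_k \sim \mathcal{N}(0, 1),
\qquad Y_i = X_{s_{k(i)}}, \quad k(i) = \lfloor t_i / h \rceil.
```

In the first rule the step size $\Delta_i$ is set by the schedule, so the law of the simulated state changes with it. In the second, the schedule enters only through the readout index $k(i)$.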

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review and for highlighting the importance of trajectory-law invariance. We address the two major comments below. We agree that the abstract's claim of exact realisation of the top tier overstates the evidence, which is empirical rather than formal, and will revise accordingly. We will also expand the experimental details as requested.

Referee: [Abstract] Abstract: the claim that the construction 'realises the top tier' (exact trajectory-law invariance) is load-bearing for the central contribution, yet the only supporting evidence is the reported 2x2 ablation showing performance gains; no closed-form verification, exact distribution-matching argument, or proof that the generated measure is identical across schedules is supplied, even for the linear OU case.

Authors: We acknowledge that the manuscript asserts the construction realises the top tier while supplying only the 2x2 ablation as supporting evidence, without a closed-form proof or distribution-matching argument (including for the linear OU case). The fine-grid integration with decoupled observation is designed to drive the generated trajectories toward schedule invariance by refining the numerical integration independently of observation times; the ablation demonstrates that this yields statistically consistent performance across schedules where naive integration does not. We will revise the abstract to state that the construction is proposed to realise the top tier and is supported by strong empirical evidence from the ablation, rather than claiming exact invariance without qualification. revision: yes
Referee: [Abstract] Abstract (ablation description): the sign-consistency result (p < 1/256 across 8/8 cells) is presented without implementation details, data-generation code, or explicit checks that the fine-grid choice was not post-hoc tuned to the evaluation grids, leaving open the possibility that the observed superiority reflects reduced but nonzero schedule dependence rather than invariance.

Authors: We agree that the current description lacks sufficient implementation details. In the revision we will add the precise data-generation procedure (including random-DAG sampling, OU/MLP drift parameters, and irregular schedule generation), the exact fine-grid step count per observation interval, the repository link containing the full code, and an additional ablation confirming that the performance gap persists across a range of fine-grid resolutions chosen independently of the final evaluation grids. The sign-consistency test was obtained by executing the full 2x2 ablation eight times with distinct random seeds; we will report the per-cell outcomes and the binomial test details explicitly. revision: yes
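For reference, the arithmetic behind an 8/8 sign-consistency result can be reproduced with an exact sign test. This is a sketch under the assumption that the eight cells are independent and that each favours either integrator with probability 1/2 under the null; the helper name `sign_test_p_value` is hypothetical, and the test actually used in the paper is not specified on this page.

```python
from fractions import Fraction
from math import comb

def sign_test_p_value(n_wins, n_cells):
    """One-sided exact sign test: probability of at least n_wins successes
    in n_cells independent fair coin flips."""
    tail = sum(comb(n_cells, k) for k in range(n_wins, n_cells + 1))
    return Fraction(tail, 2 ** n_cells)

p_one_sided = sign_test_p_value(8, 8)
p_two_sided = min(Fraction(1), 2 * p_one_sided)
print(p_one_sided, p_two_sided)
```

Under these assumptions the one-sided value is exactly 1/256 and the two-sided value is 1/128, so the strict inequality p < 1/256 quoted from the abstract would need a different test or a different counting of cells. The promised per-cell outcomes and binomial test details would settle which.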

Circularity Check

0 steps flagged

No significant circularity detected

The paper defines a new continuity criterion (trajectory-law invariance) and a three-tier taxonomy, then presents an explicit construction (fine-grid integration with decoupled observation on random-DAG priors with OU/MLP drifts) that is claimed to realize the top tier. The 2x2 ablation reports empirical performance differences but does not treat any fitted parameter as a prediction or rely on self-citation for the load-bearing step. The derivation chain remains self-contained: the invariance property is asserted by the integration scheme itself rather than by redefinition, renaming, or reduction to prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the random-DAG prior and OU/MLP drifts are treated as standard building blocks.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Temporal Causal Prior-Data Fitted Networks for Panel Data with Learned Reliability Signals
cs.LG · 2026-06 · unverdicted · novelty 7.0

TCPFN is a zero-shot foundation model for temporal causal discovery on panel data that jointly predicts multiple causal aspects and reliability signals, with reported high AUROC on benchmarks and better scaling than P...
