JointFM-0.1: A Foundation Model for Multi-Target Joint Distributional Prediction
Pith reviewed 2026-05-15 11:37 UTC · model grok-4.3
The pith
JointFM trains on synthetic SDEs alone to predict joint distributions of coupled time series directly, cutting energy loss by 21.1% versus the strongest baseline in zero-shot tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
JointFM inverts the SDE paradigm by training a single generic model on infinite synthetic SDE trajectories so that it outputs future joint probability distributions directly from observed coupled time series. No per-task SDE fitting, calibration, or fine-tuning is performed. In zero-shot evaluation on oracle joint distributions generated by unseen synthetic SDEs, the model reduces energy loss by 21.1% relative to the strongest baseline.
What carries the argument
JointFM, the foundation model trained on streams of synthetic SDEs to map input coupled time series to predicted future joint distributions without calibration.
If this is right
- Any new coupled time series can receive distributional forecasts without running SDE calibration or simulation.
- Computational cost for uncertainty modeling drops because high-fidelity per-instance simulations are replaced by a single forward pass.
- Risk assessment in domains that rely on joint dynamics becomes independent of brittle SDE parameter estimation.
- Synthetic-data training alone suffices to outperform traditional baselines on the distributional recovery task.
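The cost asymmetry behind the second point can be made concrete. Below is a minimal Euler–Maruyama Monte Carlo sketch of the kind of per-instance, high-fidelity simulation a single JointFM forward pass would replace; the coupled Ornstein–Uhlenbeck system and all coefficient values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_ou_paths(x0, theta, mu, sigma, dt=0.01, n_steps=100,
                      n_paths=10_000, seed=0):
    """Euler-Maruyama Monte Carlo for a coupled Ornstein-Uhlenbeck process.

    dX_t = theta (mu - X_t) dt + sigma dW_t, with constant matrices.
    Returns terminal samples approximating the joint distribution at
    t = n_steps * dt; the per-instance cost is n_paths * n_steps steps.
    """
    rng = np.random.default_rng(seed)
    d = len(mu)
    x = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=(n_paths, d))
        x += (mu - x) @ theta.T * dt + dw @ sigma.T
    return x

# Illustrative (made-up) coefficients for a weakly coupled 2-D system.
theta = np.array([[1.0, 0.3], [0.3, 1.0]])
mu = np.array([0.0, 0.0])
sigma = np.array([[0.2, 0.0], [0.05, 0.2]])
samples = simulate_ou_paths([1.0, -1.0], theta, mu, sigma)
print(samples.shape)  # (10000, 2)
```

Each new system (or parameter change) requires rerunning the full simulation, which is the per-instance cost the amortized model avoids.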
Where Pith is reading between the lines
- If synthetic SDE families cover the dominant statistical features of real systems, the same zero-shot model could transfer to empirical datasets in finance or biology.
- The approach opens the door to hybrid pipelines that combine the frozen JointFM with lightweight real-data adapters rather than full retraining.
- Similar foundation-model strategies might replace per-instance calibration in other stochastic process classes beyond SDEs.
Load-bearing premise
A model trained exclusively on synthetic SDEs will produce useful distributional predictions for real-world coupled time series without any task-specific calibration or fine-tuning.
What would settle it
JointFM fails to reduce energy loss below the strongest baseline when recovering joint distributions from a fresh set of unseen synthetic SDEs or from real coupled time series data in zero-shot mode.
Original abstract
Despite the rapid advancements in Artificial Intelligence (AI), Stochastic Differential Equations (SDEs) remain the gold-standard formalism for modeling systems under uncertainty. However, applying SDEs in practice is fraught with challenges: modeling risk is high, calibration is often brittle, and high-fidelity simulations are computationally expensive. This technical report introduces JointFM, a foundation model that inverts this paradigm. Instead of fitting SDEs to data, we sample an infinite stream of synthetic SDEs to train a generic model to predict future joint probability distributions directly. This approach establishes JointFM as the first foundation model for distributional predictions of coupled time series - requiring no task-specific calibration or finetuning. Despite operating in a purely zero-shot setting, JointFM reduces the energy loss by 21.1% relative to the strongest baseline when recovering oracle joint distributions generated by unseen synthetic SDEs.
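To make the "infinite stream of synthetic SDEs" concrete, here is one hedged sketch of what a training-example generator could look like. The linear-SDE family, dimensions, and window lengths are illustrative assumptions; the report's actual generator families are not described in this summary.

```python
import numpy as np

def sample_synthetic_sde_task(rng, d=2, dt=0.02, n_obs=64, horizon=16):
    """One synthetic training example: random linear SDE -> (history, future).

    A random stable linear drift and random diffusion matrix stand in for
    the paper's (unspecified) SDE families. Each call yields a fresh SDE,
    so the stream of training tasks is effectively unlimited.
    """
    a = rng.normal(size=(d, d))
    drift = -(a @ a.T) - 0.1 * np.eye(d)      # negative definite => stable
    diffusion = 0.3 * rng.normal(size=(d, d))
    x = rng.normal(size=d)
    path = []
    for _ in range(n_obs + horizon):
        x = x + drift @ x * dt + diffusion @ rng.normal(scale=np.sqrt(dt), size=d)
        path.append(x.copy())
    path = np.array(path)
    return path[:n_obs], path[n_obs:]         # observed history, future target

rng = np.random.default_rng(0)
history, future = sample_synthetic_sde_task(rng)
print(history.shape, future.shape)  # (64, 2) (16, 2)
```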
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces JointFM-0.1, a foundation model trained on synthetic SDEs to directly predict joint probability distributions for coupled time series in a zero-shot setting without task-specific calibration or fine-tuning. It claims a 21.1% reduction in energy loss relative to the strongest baseline when recovering oracle joint distributions generated by unseen synthetic SDEs.
Significance. If the central empirical result can be verified with a properly defined energy loss metric, explicit baseline descriptions, and non-circular test distributions, the work would represent a notable step toward foundation models for multivariate distributional forecasting, potentially offering computational advantages over traditional SDE fitting. The zero-shot framing on synthetic data is promising but requires stronger evidence of generalization to be impactful.
major comments (3)
- [Abstract] The central claim of a 21.1% energy-loss reduction is presented without a mathematical definition of the energy loss (e.g., energy distance or a custom functional), without naming or describing the strongest baseline, and without error bars or details on how the unseen synthetic-SDE test distribution was generated and separated from the training data.
- [Experimental evaluation] Training and test SDEs are drawn from the same synthetic generator family, which undermines the independence required for the zero-shot generalization claim; no explicit parameter separation, held-out generator families, or real-world benchmarks are reported to address this circularity risk.
- [Results section] No information is supplied on the number of independent runs, statistical significance testing, or the precise form of the reported energy loss, rendering the quantitative improvement unverifiable and the comparison to baselines impossible to assess.
minor comments (2)
- [Introduction] The term 'foundation model' is used without clarifying how the architecture and training regime differ from standard sequence models or prior work on neural SDE solvers.
- [Methods] The network architecture, the input/output representations for joint distributions, and the training objective should be specified with equations or pseudocode.
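On the training objective the referee asks for: the report does not state it here, but a natural candidate for distributional forecasts is the energy score of Gneiting and Raftery [13], a strictly proper scoring rule. The sketch below is a plausible stand-in, not the paper's confirmed objective; function name and sample sizes are illustrative.

```python
import numpy as np

def energy_score(pred_samples, y):
    """Sample-based energy score: E||X - y|| - 0.5 E||X - X'||.

    pred_samples: (m, d) draws from the predicted joint distribution.
    y: (d,) realized outcome. Lower is better; strictly proper, so it
    is minimized in expectation by the true predictive distribution.
    """
    m = len(pred_samples)
    term1 = np.mean(np.linalg.norm(pred_samples - y, axis=1))
    diffs = pred_samples[:, None, :] - pred_samples[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=-1)) / (m * (m - 1))
    return term1 - 0.5 * term2

rng = np.random.default_rng(1)
y = np.array([0.0, 0.0])
good = rng.normal(0.0, 1.0, size=(256, 2))  # samples centred on the outcome
bad = rng.normal(3.0, 1.0, size=(256, 2))   # samples far from the outcome
print(energy_score(good, y) < energy_score(bad, y))  # True
```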
Simulated Author's Rebuttal
Thank you for the detailed review. We appreciate the feedback on improving the clarity of our claims regarding the energy loss metric, experimental setup, and results reporting. We will revise the manuscript to address these points as detailed below.
Point-by-point responses
-
Referee: [Abstract] The central claim of a 21.1% energy-loss reduction is presented without a mathematical definition of the energy loss (e.g., energy distance or a custom functional), without naming or describing the strongest baseline, and without error bars or details on how the unseen synthetic-SDE test distribution was generated and separated from the training data.
Authors: We agree that the abstract should be self-contained. We will include a brief mathematical definition of the energy loss (the energy distance between predicted and oracle joint distributions), name the strongest baseline, add error bars from multiple runs, and clarify how the unseen test SDEs were generated with parameter values held out from training. revision: yes
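The energy distance the authors commit to defining has a standard sample-based estimator (Székely–Rizzo): D(X, Y) = 2 E||X − Y|| − E||X − X'|| − E||Y − Y'||, zero exactly when the two distributions coincide. The sketch below is offered as a plausible reading of "energy loss", not the paper's confirmed formula.

```python
import numpy as np

def energy_distance(x, y):
    """Sample-based energy distance between two (n, d) sample sets.

    D(X, Y) = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||. Nonnegative in
    expectation, and zero iff the two distributions are equal.
    """
    def mean_pairwise(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.mean(np.linalg.norm(d, axis=-1))
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

rng = np.random.default_rng(2)
same = energy_distance(rng.normal(size=(500, 2)),
                       rng.normal(size=(500, 2)))
shifted = energy_distance(rng.normal(size=(500, 2)),
                          rng.normal(2.0, 1.0, size=(500, 2)))
print(same < shifted)  # True: matching distributions score near zero
```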
-
Referee: [Experimental evaluation] Training and test SDEs are drawn from the same synthetic generator family, which undermines the independence required for the zero-shot generalization claim; no explicit parameter separation, held-out generator families, or real-world benchmarks are reported to address this circularity risk.
Authors: We acknowledge the concern regarding potential circularity. While the SDEs are from the same broad family, we ensured independence by using completely disjoint parameter ranges for training and testing, as specified in the data generation procedure. We will add explicit details on the parameter separation in the revised experimental section. Real-world benchmarks are outside the scope of this initial technical report, which focuses on validating the approach with controllable synthetic oracles; future work will explore real data. revision: partial
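The disjoint-range scheme the authors describe is simple to state precisely. A toy sketch for a single mean-reversion parameter follows; the specific ranges are hypothetical, since the report's actual parameter grids are not given here.

```python
import numpy as np

# Illustrative parameter hold-out: train and test SDE coefficients come from
# disjoint ranges, so no test parameter value is ever seen during training.
TRAIN_RANGE = (0.1, 1.0)   # hypothetical mean-reversion speeds for training
TEST_RANGE = (1.2, 2.0)    # held-out speeds, disjoint from the training range

def sample_speed(rng, split):
    lo, hi = TRAIN_RANGE if split == "train" else TEST_RANGE
    return rng.uniform(lo, hi)

rng = np.random.default_rng(3)
train_speeds = [sample_speed(rng, "train") for _ in range(1000)]
test_speeds = [sample_speed(rng, "test") for _ in range(1000)]
# Disjointness check: every test parameter exceeds every training parameter.
assert max(train_speeds) < min(test_speeds)
```

Note that disjoint parameter ranges still leave the generator *family* shared between splits, which is the residual circularity the referee flags.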
-
Referee: [Results section] No information is supplied on the number of independent runs, statistical significance testing, or the precise form of the reported energy loss, rendering the quantitative improvement unverifiable and the comparison to baselines impossible to assess.
Authors: We will revise the results section to include the number of independent runs, results of statistical significance testing, and the precise mathematical form of the energy loss. This will make the reported improvement verifiable and the baseline comparisons clear. revision: yes
Circularity Check
No significant circularity detected in derivation or evaluation chain
full rationale
The paper trains JointFM on an infinite stream of synthetic SDEs and reports a 21.1% energy-loss reduction on oracle joint distributions from unseen synthetic SDEs. This setup follows standard machine-learning practice for testing generalization within a data-generating family; the reported performance number is not mathematically equivalent to the training inputs by construction, nor does any equation or claim reduce to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a circular manner. The evaluation therefore remains an independent empirical measurement relative to the stated training procedure.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Stochastic differential equations remain the appropriate formalism for modeling systems under uncertainty.
Reference graph
Works this paper leans on
- [1] Elif F. Acar, Christian Genest, and Johanna Nešlehová. Beyond simplified pair-copula constructions. Journal of Multivariate Analysis, 110:74–90, 2012. Special Issue on Copula Modeling and Dependence.
- [2] Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Mic... 2025.
- [3] Adelchi Azzalini and Antonella Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):367–389, 2003.
- [4] Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, St... 2021.
- [5] Massimiliano Caporin and Michael McAleer. Ten things you should know about DCC. Working Paper 16/2013, Department of Economics and Finance, College of Business and Economics, University of Canterbury, Christchurch, New Zealand, 2013.
- [6] Rama Cont and Peter Tankov. Financial Modelling with Jump Processes. Chapman & Hall/CRC, 2004.
- [7] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting, 2024.
- [8] Darrell Duffie, Jun Pan, and Kenneth Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376, 2000.
- [9] Paul Embrechts, Alexander J. McNeil, and Daniel Straumann. Correlation and dependence in risk management: Properties and pitfalls. In M. A. H. Dempster, editor, Risk Management: Value at Risk and Beyond, pages 176–223. Cambridge University Press, 2002.
- [10] Robert Engle. Dynamic conditional correlation. Journal of Business & Economic Statistics, 20(3):339–350, 2002.
- [11] Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. Volatility is rough. Quantitative Finance, 18(6):933–949, 2018.
- [12] Paul Glasserman. Monte Carlo Methods in Financial Engineering, volume 53 of Applications of Mathematics. Springer, 2003.
- [13] Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
- [14] Sidney Goldstein. On diffusion by discontinuous movements, and on the telegraph equation. The Quarterly Journal of Mechanics and Applied Mathematics, 4(2):129–156, 1951.
- [15] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [16] Stefan Hackmann. Instant portfolio optimization with JointFM. https://www.datarobot.com/blog/instant-portfolio-optimization-with-jointfm/, 2026.
- [17]
- [18] Darryll Hendricks. Evaluation of value-at-risk models using historical data. Economic Policy Review, 2(1):39–69, 1996.
- [19] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pages 6840–6851, 2020.
- [20] Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, and Terry Lyons. Neural SDEs as infinite-dimensional GANs, 2021.
- [21] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
- [22] Peter E. Kloeden and Eckhard Platen. Numerical Solution of Stochastic Differential Equations, volume 23 of Applications of Mathematics. Springer, 1992.
- [23] David Lando. On Cox processes and credit risky securities. Review of Derivatives Research, 2(2–3):99–120, 1998.
- [24] Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, and David Duvenaud. Scalable gradients for stochastic differential equations. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
- [25] Chenghao Liu, Taha Aksu, Juncheng Liu, Xu Liu, Hanshu Yan, Quang Pham, Silvio Savarese, Doyen Sahoo, Caiming Xiong, and Junnan Li. Moirai 2.0: When less is more for time series forecasting, 2025.
- [26] Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley, 2000.
- [27] Robert C. Merton. Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics, 3(1–2):125–144, 1976.
- [28] Ali Mesbah. Stochastic model predictive control: An overview and perspectives for future research. IEEE Control Systems Magazine, 36(6):30–44, 2016.
- [29] Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, and Frank Hutter. TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting, 2025.
- [30] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Universitext. Springer, 6th edition, 2003.
- [31] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 7(2):145–173, 1959.
- [32] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, pages 1530–1538, 2015.
- [33] Juan Carlos Rodriguez. Measuring financial contagion: A copula approach. Journal of Empirical Finance, 14(3):401–423, 2007.
- [34]
- [35] M. Sklar. Fonctions de répartition à N dimensions et leurs marges. Annales de l'ISUP, VIII(3):229–231, 1959.
- [36] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Physical Review, 36(5):823–841, 1930.
- [37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.