JointFM-0.1: A Foundation Model for Multi-Target Joint Distributional Prediction
Pith reviewed 2026-05-15 11:37 UTC · model grok-4.3
The pith
JointFM trains on synthetic SDEs alone to predict joint distributions of coupled time series directly, cutting energy loss by 21.1% versus the strongest baseline in zero-shot tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
JointFM inverts the SDE paradigm by training a single generic model on infinite synthetic SDE trajectories so that it outputs future joint probability distributions directly from observed coupled time series. No per-task SDE fitting, calibration, or fine-tuning is performed. In zero-shot evaluation on oracle joint distributions generated by unseen synthetic SDEs, the model reduces energy loss by 21.1% relative to the strongest baseline.
What carries the argument
JointFM, the foundation model trained on streams of synthetic SDEs to map input coupled time series to predicted future joint distributions without calibration.
If this is right
- Any new coupled time series can receive distributional forecasts without running SDE calibration or simulation.
- Computational cost for uncertainty modeling drops because high-fidelity per-instance simulations are replaced by a single forward pass.
- Risk assessment in domains that rely on joint dynamics becomes independent of brittle SDE parameter estimation.
- Synthetic-data training alone suffices to outperform traditional baselines on the distributional recovery task.
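The cost asymmetry behind the second point can be made concrete. Below is a minimal Euler–Maruyama Monte Carlo sketch of the kind of per-instance, high-fidelity simulation a single JointFM forward pass would replace; the coupled Ornstein–Uhlenbeck system and all coefficient values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_ou_paths(x0, theta, mu, sigma, dt=0.01, n_steps=100,
                      n_paths=10_000, seed=0):
    """Euler-Maruyama Monte Carlo for a coupled Ornstein-Uhlenbeck process.

    dX_t = theta (mu - X_t) dt + sigma dW_t, with constant matrices.
    Returns terminal samples approximating the joint distribution at
    t = n_steps * dt; the per-instance cost is n_paths * n_steps steps.
    """
    rng = np.random.default_rng(seed)
    d = len(mu)
    x = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=(n_paths, d))
        x += (mu - x) @ theta.T * dt + dw @ sigma.T
    return x

# Illustrative (made-up) coefficients for a weakly coupled 2-D system.
theta = np.array([[1.0, 0.3], [0.3, 1.0]])
mu = np.array([0.0, 0.0])
sigma = np.array([[0.2, 0.0], [0.05, 0.2]])
samples = simulate_ou_paths([1.0, -1.0], theta, mu, sigma)
print(samples.shape)  # (10000, 2)
```

Each new system (or parameter change) requires rerunning the full simulation, which is the per-instance cost the amortized model avoids.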
Where Pith is reading between the lines
- If synthetic SDE families cover the dominant statistical features of real systems, the same zero-shot model could transfer to empirical datasets in finance or biology.
- The approach opens the door to hybrid pipelines that combine the frozen JointFM with lightweight real-data adapters rather than full retraining.
- Similar foundation-model strategies might replace per-instance calibration in other stochastic process classes beyond SDEs.
Load-bearing premise
A model trained exclusively on synthetic SDEs will produce useful distributional predictions for real-world coupled time series without any task-specific calibration or fine-tuning.
What would settle it
JointFM fails to reduce energy loss below the strongest baseline when recovering joint distributions from a fresh set of unseen synthetic SDEs or from real coupled time series data in zero-shot mode.
Original abstract
Despite the rapid advancements in Artificial Intelligence (AI), Stochastic Differential Equations (SDEs) remain the gold-standard formalism for modeling systems under uncertainty. However, applying SDEs in practice is fraught with challenges: modeling risk is high, calibration is often brittle, and high-fidelity simulations are computationally expensive. This technical report introduces JointFM, a foundation model that inverts this paradigm. Instead of fitting SDEs to data, we sample an infinite stream of synthetic SDEs to train a generic model to predict future joint probability distributions directly. This approach establishes JointFM as the first foundation model for distributional predictions of coupled time series - requiring no task-specific calibration or finetuning. Despite operating in a purely zero-shot setting, JointFM reduces the energy loss by 21.1% relative to the strongest baseline when recovering oracle joint distributions generated by unseen synthetic SDEs.
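To make the "infinite stream of synthetic SDEs" concrete, here is one hedged sketch of what a training-example generator could look like. The linear-SDE family, dimensions, and window lengths are illustrative assumptions; the report's actual generator families are not described in this summary.

```python
import numpy as np

def sample_synthetic_sde_task(rng, d=2, dt=0.02, n_obs=64, horizon=16):
    """One synthetic training example: random linear SDE -> (history, future).

    A random stable linear drift and random diffusion matrix stand in for
    the paper's (unspecified) SDE families. Each call yields a fresh SDE,
    so the stream of training tasks is effectively unlimited.
    """
    a = rng.normal(size=(d, d))
    drift = -(a @ a.T) - 0.1 * np.eye(d)      # negative definite => stable
    diffusion = 0.3 * rng.normal(size=(d, d))
    x = rng.normal(size=d)
    path = []
    for _ in range(n_obs + horizon):
        x = x + drift @ x * dt + diffusion @ rng.normal(scale=np.sqrt(dt), size=d)
        path.append(x.copy())
    path = np.array(path)
    return path[:n_obs], path[n_obs:]         # observed history, future target

rng = np.random.default_rng(0)
history, future = sample_synthetic_sde_task(rng)
print(history.shape, future.shape)  # (64, 2) (16, 2)
```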
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces JointFM-0.1, a foundation model trained on synthetic SDEs to directly predict joint probability distributions for coupled time series in a zero-shot setting without task-specific calibration or fine-tuning. It claims a 21.1% reduction in energy loss relative to the strongest baseline when recovering oracle joint distributions generated by unseen synthetic SDEs.
Significance. If the central empirical result can be verified with a properly defined energy loss metric, explicit baseline descriptions, and non-circular test distributions, the work would represent a notable step toward foundation models for multivariate distributional forecasting, potentially offering computational advantages over traditional SDE fitting. The zero-shot framing on synthetic data is promising but requires stronger evidence of generalization to be impactful.
major comments (3)
- [Abstract] The central claim of a 21.1% energy-loss reduction is presented without a mathematical definition of the energy loss (e.g., energy distance or a custom functional), without naming or describing the strongest baseline, and without error bars or details on how the unseen synthetic-SDE test distribution was generated and separated from the training data.
- [Experimental evaluation] Training and test SDEs are drawn from the same synthetic generator family, which undermines the independence required for the zero-shot generalization claim; no explicit parameter separation, held-out generator families, or real-world benchmarks are reported to address this circularity risk.
- [Results section] No information is supplied on the number of independent runs, statistical significance testing, or the precise form of the reported energy loss, rendering the quantitative improvement unverifiable and the comparison to baselines impossible to assess.
minor comments (2)
- [Introduction] The term 'foundation model' is used without clarifying how the architecture and training regime differ from standard sequence models or prior work on neural SDE solvers.
- [Methods] The network architecture, the input/output representations for joint distributions, and the training objective should be specified with equations or pseudocode.
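On the training objective the referee asks for: the report does not state it here, but a natural candidate for distributional forecasts is the energy score of Gneiting and Raftery [13], a strictly proper scoring rule. The sketch below is a plausible stand-in, not the paper's confirmed objective; function name and sample sizes are illustrative.

```python
import numpy as np

def energy_score(pred_samples, y):
    """Sample-based energy score: E||X - y|| - 0.5 E||X - X'||.

    pred_samples: (m, d) draws from the predicted joint distribution.
    y: (d,) realized outcome. Lower is better; strictly proper, so it
    is minimized in expectation by the true predictive distribution.
    """
    m = len(pred_samples)
    term1 = np.mean(np.linalg.norm(pred_samples - y, axis=1))
    diffs = pred_samples[:, None, :] - pred_samples[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=-1)) / (m * (m - 1))
    return term1 - 0.5 * term2

rng = np.random.default_rng(1)
y = np.array([0.0, 0.0])
good = rng.normal(0.0, 1.0, size=(256, 2))  # samples centred on the outcome
bad = rng.normal(3.0, 1.0, size=(256, 2))   # samples far from the outcome
print(energy_score(good, y) < energy_score(bad, y))  # True
```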
Simulated Author's Rebuttal
Thank you for the detailed review. We appreciate the feedback on improving the clarity of our claims regarding the energy loss metric, experimental setup, and results reporting. We will revise the manuscript to address these points as detailed below.
Point-by-point responses
-
Referee: [Abstract] The central claim of a 21.1% energy-loss reduction is presented without a mathematical definition of the energy loss (e.g., energy distance or a custom functional), without naming or describing the strongest baseline, and without error bars or details on how the unseen synthetic-SDE test distribution was generated and separated from the training data.
Authors: We agree that the abstract should be self-contained. We will include a brief mathematical definition of the energy loss (the energy distance between predicted and oracle joint distributions), name the strongest baseline, add error bars from multiple runs, and clarify how the unseen test SDEs were generated with parameter values held out from training. revision: yes
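The energy distance the authors commit to defining has a standard sample-based estimator (Székely–Rizzo): D(X, Y) = 2 E||X − Y|| − E||X − X'|| − E||Y − Y'||, zero exactly when the two distributions coincide. The sketch below is offered as a plausible reading of "energy loss", not the paper's confirmed formula.

```python
import numpy as np

def energy_distance(x, y):
    """Sample-based energy distance between two (n, d) sample sets.

    D(X, Y) = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||. Nonnegative in
    expectation, and zero iff the two distributions are equal.
    """
    def mean_pairwise(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.mean(np.linalg.norm(d, axis=-1))
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

rng = np.random.default_rng(2)
same = energy_distance(rng.normal(size=(500, 2)),
                       rng.normal(size=(500, 2)))
shifted = energy_distance(rng.normal(size=(500, 2)),
                          rng.normal(2.0, 1.0, size=(500, 2)))
print(same < shifted)  # True: matching distributions score near zero
```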
-
Referee: [Experimental evaluation] Training and test SDEs are drawn from the same synthetic generator family, which undermines the independence required for the zero-shot generalization claim; no explicit parameter separation, held-out generator families, or real-world benchmarks are reported to address this circularity risk.
Authors: We acknowledge the concern regarding potential circularity. While the SDEs are from the same broad family, we ensured independence by using completely disjoint parameter ranges for training and testing, as specified in the data generation procedure. We will add explicit details on the parameter separation in the revised experimental section. Real-world benchmarks are outside the scope of this initial technical report, which focuses on validating the approach with controllable synthetic oracles; future work will explore real data. revision: partial
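The disjoint-range scheme the authors describe is simple to state precisely. A toy sketch for a single mean-reversion parameter follows; the specific ranges are hypothetical, since the report's actual parameter grids are not given here.

```python
import numpy as np

# Illustrative parameter hold-out: train and test SDE coefficients come from
# disjoint ranges, so no test parameter value is ever seen during training.
TRAIN_RANGE = (0.1, 1.0)   # hypothetical mean-reversion speeds for training
TEST_RANGE = (1.2, 2.0)    # held-out speeds, disjoint from the training range

def sample_speed(rng, split):
    lo, hi = TRAIN_RANGE if split == "train" else TEST_RANGE
    return rng.uniform(lo, hi)

rng = np.random.default_rng(3)
train_speeds = [sample_speed(rng, "train") for _ in range(1000)]
test_speeds = [sample_speed(rng, "test") for _ in range(1000)]
# Disjointness check: every test parameter exceeds every training parameter.
assert max(train_speeds) < min(test_speeds)
```

Note that disjoint parameter ranges still leave the generator *family* shared between splits, which is the residual circularity the referee flags.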
-
Referee: [Results section] No information is supplied on the number of independent runs, statistical significance testing, or the precise form of the reported energy loss, rendering the quantitative improvement unverifiable and the comparison to baselines impossible to assess.
Authors: We will revise the results section to include the number of independent runs, results of statistical significance testing, and the precise mathematical form of the energy loss. This will make the reported improvement verifiable and the baseline comparisons clear. revision: yes
Circularity Check
No significant circularity detected in derivation or evaluation chain
full rationale
The paper trains JointFM on an infinite stream of synthetic SDEs and reports a 21.1% energy-loss reduction on oracle joint distributions from unseen synthetic SDEs. This setup follows standard machine-learning practice for testing generalization within a data-generating family; the reported performance number is not mathematically equivalent to the training inputs by construction, nor does any equation or claim reduce to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a circular manner. The evaluation therefore remains an independent empirical measurement relative to the stated training procedure.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Stochastic differential equations remain the appropriate formalism for modeling systems under uncertainty.
Reference graph
Works this paper leans on
- [1] Elif F. Acar, Christian Genest, and Johanna Nešlehová. Beyond simplified pair-copula constructions. Journal of Multivariate Analysis, 110:74–90, 2012. Special Issue on Copula Modeling and Dependence.
- [2] Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Mic... 2025.
- [3] Adelchi Azzalini and Antonella Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):367–389, 2003.
- [4] Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, St... 2021.
- [5] Massimiliano Caporin and Michael McAleer. Ten things you should know about DCC. Working Paper 16/2013, Department of Economics and Finance, College of Business and Economics, University of Canterbury, Christchurch, New Zealand, 2013.
- [6] Rama Cont and Peter Tankov. Financial Modelling with Jump Processes. Chapman & Hall/CRC, 2004.
- [7] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting, 2024.
- [8] Darrell Duffie, Jun Pan, and Kenneth Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376, 2000.
- [9] Paul Embrechts, Alexander J. McNeil, and Daniel Straumann. Correlation and dependence in risk management: Properties and pitfalls. In M. A. H. Dempster, editor, Risk Management: Value at Risk and Beyond, pages 176–223. Cambridge University Press, 2002.
- [10] Robert Engle. Dynamic conditional correlation. Journal of Business & Economic Statistics, 20(3):339–350, 2002.
- [11] Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. Volatility is rough. Quantitative Finance, 18(6):933–949, 2018.
- [12] Paul Glasserman. Monte Carlo Methods in Financial Engineering, volume 53 of Applications of Mathematics. Springer, 2003.
- [13] Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
- [14] Sidney Goldstein. On diffusion by discontinuous movements, and on the telegraph equation. The Quarterly Journal of Mechanics and Applied Mathematics, 4(2):129–156, 1951.
- [15] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [16] Stefan Hackmann. Instant portfolio optimization with JointFM. https://www.datarobot.com/blog/instant-portfolio-optimization-with-jointfm/, 2026.
- [17]
- [18] Darryll Hendricks. Evaluation of value-at-risk models using historical data. Economic Policy Review, 2(1):39–69, 1996.
- [19] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pages 6840–6851, 2020.
- [20] Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, and Terry Lyons. Neural SDEs as infinite-dimensional GANs, 2021.
- [21] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
- [22] Peter E. Kloeden and Eckhard Platen. Numerical Solution of Stochastic Differential Equations, volume 23 of Applications of Mathematics. Springer, 1992.
- [23] David Lando. On Cox processes and credit risky securities. Review of Derivatives Research, 2(2–3):99–120, 1998.
- [24] Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, and David Duvenaud. Scalable gradients for stochastic differential equations. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
- [25] Chenghao Liu, Taha Aksu, Juncheng Liu, Xu Liu, Hanshu Yan, Quang Pham, Silvio Savarese, Doyen Sahoo, Caiming Xiong, and Junnan Li. Moirai 2.0: When less is more for time series forecasting, 2025.
- [26] Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley, 2000.
- [27] Robert C. Merton. Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics, 3(1–2):125–144, 1976.
- [28] Ali Mesbah. Stochastic model predictive control: An overview and perspectives for future research. IEEE Control Systems Magazine, 36(6):30–44, 2016.
- [29] Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, and Frank Hutter. TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting, 2025.
- [30] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Universitext. Springer, 6th edition, 2003.
- [31] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 7(2):145–173, 1959.
- [32] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, pages 1530–1538, 2015.
- [33] Juan Carlos Rodriguez. Measuring financial contagion: A copula approach. Journal of Empirical Finance, 14(3):401–423, 2007.
- [34]
- [35] M. Sklar. Fonctions de répartition à N dimensions et leurs marges. Annales de l'ISUP, VIII(3):229–231, 1959.
- [36] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Physical Review, 36(5):823–841, 1930.
- [37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.