pith.

arxiv: 2606.09912 · v1 · pith:3R7M2H47 · submitted 2026-06-06 · 💻 cs.LG · cs.AI

Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

Pith reviewed 2026-06-27 20:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords time series · synthetic data · foundation models · pretraining · data composition · forecasting · generator mixture · model architectures

The pith

Mixing all synthetic generators equally matches or beats the best single generator for time series foundation model pretraining and produces the strongest corpora when combined with real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that selecting the right synthetic data generator for pretraining time series models is unreliable because rankings change across different model architectures. Instead of picking one, an equal mixture of generators from all families performs at least as well as the top individual generator on both tested architectures. Adding real data to this mixture creates the overall best pretraining corpus. This shifts the focus from choosing generators to composing the right mix of synthetic and real data.

Core claim

The authors establish that synthetic pretraining for time series models is a corpus composition problem rather than a generator selection problem. A simple equal-weight mixture of all 11 generator families matches or beats the best individual generator for both architectures tested. Composing this mixture with real data yields the strongest pretraining corpora overall.

What carries the argument

The equal-weight mixture of all generator families, optionally combined with real data, which serves as the robust pretraining corpus independent of individual generator performance.
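A minimal sketch of how such a corpus could be assembled, assuming "equal weight" means every generator family contributes the same number of series and that real windows are mixed in at a fixed fraction. The names `build_mixture_corpus` and `real_fraction` and the three toy generators are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def build_mixture_corpus(generators, n_series, length, real_windows=None,
                         real_fraction=0.0, seed=0):
    """Hypothetical equal-weight mixture of synthetic generators, optionally with real data.

    generators: dict mapping a family name to a callable (rng, length) -> 1-D array.
    real_windows: optional 2-D array of real series, one row per window of size `length`.
    real_fraction: hypothetical share of the corpus drawn from real_windows.
    """
    rng = np.random.default_rng(seed)
    n_real = int(round(real_fraction * n_series)) if real_windows is not None else 0
    n_synth = n_series - n_real
    names = list(generators)
    series, labels = [], []
    # Equal weighting: deal the synthetic budget round-robin across families,
    # so family counts differ by at most one.
    for i in range(n_synth):
        name = names[i % len(names)]
        series.append(np.asarray(generators[name](rng, length), dtype=float))
        labels.append(name)
    if n_real > 0:
        # Draw distinct real windows (requires n_real <= len(real_windows)).
        idx = rng.choice(len(real_windows), size=n_real, replace=False)
        series.extend(np.asarray(real_windows, dtype=float)[idx])
        labels.extend(["real"] * n_real)
    order = rng.permutation(len(series))  # shuffle so sources are interleaved
    return np.stack(series)[order], [labels[j] for j in order]


# Usage with three toy generator families (stand-ins for the paper's eleven).
toy_generators = {
    "noise": lambda rng, n: rng.standard_normal(n),
    "sine": lambda rng, n: np.sin(np.linspace(0, 8 * np.pi, n) + rng.uniform(0, 2 * np.pi)),
    "walk": lambda rng, n: np.cumsum(rng.standard_normal(n)),
}
corpus, sources = build_mixture_corpus(toy_generators, n_series=12, length=64)
```

Round-robin assignment makes the per-family counts exact rather than equal only in expectation; sampling a family uniformly per series would be an equally plausible reading of "equal weight".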

If this is right

  • Generator rankings are not stable across model architectures, requiring per-family validation of composition choices.
  • Composition of multiple generators is more reliable than selecting any single one.
  • The best results come from combining the synthetic mixture with real data rather than using either alone.
  • Under identical training budgets, a poor generator choice can produce up to twice the forecasting error of the best choice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This suggests that similar mixture strategies could help in pretraining for other data types where generator choice is uncertain.
  • Practitioners might achieve better results by always using a broad mixture rather than trying to identify a single optimal generator.
  • Future work could test whether the optimal mixture weights differ from equal weighting or if certain subsets of generators suffice.

Load-bearing premise

The two model architectures and eleven generator families tested are representative enough that the observed instability in generator rankings will hold more generally.

What would settle it

Finding a new model architecture or set of generators where one generator consistently outperforms all others and the mixture does not match it would challenge the claim that composition is always preferable to selection.

Figures

Figures reproduced from arXiv: 2606.09912 by Aaryan Nagpal, Debdeep Sanyal, Dhruv Kumar, Murari Mandal, Saurabh Deshpande.

Figure 1. Per-generator PCA projections in the audit feature space. Each panel overlays one synthetic generator with the real reference corpus using the same PCA coordinate system. The figure is a qualitative diagnostic showing that generators occupy different regions of the audit feature space.
Original abstract

Choosing the wrong synthetic generator for time-series foundation model pretraining is costly: under identical training budgets, the best and worst generators produce up to a 2× gap in forecasting error, yet the field has no principled way to make this choice. The problem is compounded by the fact that generator rankings are not stable across architectures: across 11 generator families evaluated on Chronos-T5-Mini and Moirai-Small trained from scratch, we find that which generators are useful depends on the model architecture. Rather than solving the generator selection problem, we sidestep it: a simple equal-weight mixture of all generators matches or beats the best individual generator for both architectures, and composing this mixture with real data yields the strongest pretraining corpora overall. Synthetic pretraining is therefore a corpus composition problem, not a generator selection problem, and composition choices should be validated per model family rather than assumed to transfer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript claims that selecting the appropriate synthetic data generator for pretraining time series foundation models is critical, as the best and worst generators can lead to up to a 2× gap in forecasting error under the same training budget. Through experiments with 11 generator families on two architectures (Chronos-T5-Mini and Moirai-Small), it shows that generator rankings are not stable across architectures. Instead of selecting a single generator, the authors demonstrate that an equal-weight mixture of all generators performs as well as or better than the best individual generator for both architectures. Furthermore, combining this synthetic mixture with real data produces the strongest pretraining corpora. The paper concludes that synthetic pretraining should be viewed as a corpus composition problem rather than a generator selection problem, and that composition choices need to be validated per model family.

Significance. If the empirical results hold under fuller scrutiny, the work provides a practical, low-effort strategy for constructing effective pretraining corpora via equal-weight mixing of synthetic generators, which could reduce the need for costly generator selection in time series foundation model development. It also supplies concrete evidence that data composition, rather than individual generator quality, drives performance, potentially shifting community practice toward systematic mixing protocols.

major comments (1)
  1. [Abstract] The conclusion that generator rankings are unstable across architectures and that 'composition choices should be validated per model family rather than assumed to transfer' rests exclusively on the contrast between Chronos-T5-Mini and Moirai-Small. With only two architectures tested, the observed instability may be idiosyncratic rather than a general property of architectural families, which is load-bearing for the paper's central recommendation to treat composition as model-family-specific.
minor comments (1)
  1. [Abstract] The abstract states an 'up to a 2× gap in forecasting error' without specifying the exact metric, error bars, training budgets, or data exclusion rules; these details should be stated explicitly in §3 or §4 to allow verification of the motivating claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We respond to the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The conclusion that generator rankings are unstable across architectures and that 'composition choices should be validated per model family rather than assumed to transfer' rests exclusively on the contrast between Chronos-T5-Mini and Moirai-Small. With only two architectures tested, the observed instability may be idiosyncratic rather than a general property of architectural families, which is load-bearing for the paper's central recommendation to treat composition as model-family-specific.

    Authors: We agree that the instability claim is based on only two architectures and that this limits how broadly the finding can be stated. Chronos-T5-Mini and Moirai-Small belong to recognizably different architectural families (T5-style encoder-decoder versus Moirai's distinct time-series transformer design), and the reversal in generator rankings between them is large enough to motivate caution about assuming transfer. Nevertheless, we accept that two examples do not establish a general property of architectural families. We will revise the abstract (and the corresponding discussion) to present the instability as an empirical observation from the two tested families rather than a universal claim, while retaining the practical recommendation to validate composition choices per model family. This change directly incorporates the referee's point. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical corpus comparison

full rationale

The paper reports direct experimental results: 11 generator families are used to create synthetic corpora, models (Chronos-T5-Mini and Moirai-Small) are trained from scratch on each, and forecasting errors are measured. The mixture claim is the observed outcome of those runs, not a fitted parameter renamed as a prediction or a self-referential definition. No equations, ansatzes, uniqueness theorems, or self-citations are invoked to derive the ranking instability or mixture superiority; both are presented as measured facts from the reported training runs. The derivation chain is therefore self-contained against external benchmarks (held-out forecasting error) and does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on empirical results from two model architectures and eleven generator families. No free parameters, invented entities, or additional axioms are stated.

axioms (1)
  • domain assumption The eleven generator families constitute a representative sample of synthetic time-series sources.
    Invoked to support the mixture result and the claim that rankings are unstable.

pith-pipeline@v0.9.1-grok · 5703 in / 1155 out tokens · 19512 ms · 2026-06-27T20:30:12.384361+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 3 canonical work pages

  1. [1]

    Aksu, T., Woo, G., Liu, J., Liu, X., Liu, C., Savarese, S., Xiong, C., and Sahoo, D. GIFT-Eval: A benchmark for general time series forecasting model evaluation. arXiv preprint arXiv:2410.10393, 2024.

  2. [2]

    Gillespie, D. T. Exact numerical simulation of the Ornstein–Uhlenbeck process and its integral. Physical Review E, 54(2):2084–2091, 1996.

  3. [3]

    Strictly proper scoring rules, prediction, and estimation. doi: 10.1198/016214506000001437.

    Hamilton, J. D. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2):357–384, 1989.

  4. [4]

    Bergmeir, R

    doi: 10.1016/j.ijforecast.2006.03.001.

    Hyndman, R. J., Koehler, A. B., Ord, J. K., and Snyder, R. D. Forecasting with Exponential Smoothing: The State Space Approach. Springer, 2008.

  5. [5]

    doi: 10.1007/s10618-019-00647-x.

    Maat, R., Malali, A., and Protopapas, P. TimeSynth: A multipurpose library for synthetic time series generation in Python. https://github.com/TimeSynth/TimeSynth, 2017.

  6. [6]

    Moroshan, V., Siems, J., Zela, A., Carstensen, T., and Hutter, F. TempoPFN: Synthetic pre-training of linear RNNs for zero-shot time series forecasting. arXiv preprint arXiv:2510.25502, 2025.

  7. [7]

    Hyperparameter sampling distributions for all generators are reported in Appendix B.

    ARIMA. The ARIMA generator samples a randomized SARIMA-like process with non-seasonal ARMA dynamics, optional ordinary integration, and optional seasonal lag structure (Box & Jenkins, 1970; Hamilton, 1994). It first simulates an ARMA(p, q) process via $y_t = \sum_{j=1}^{p} \phi_j y_{t-j} + \ldots$

  8. [8]

    ...and the Mackey–Glass delay differential equation (Mackey & Glass, 1977). The Lorenz system is integrated via fourth-order Runge–Kutta,

    $\frac{dx}{dt} = \sigma(y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = xy - \beta_L z, \qquad (2)$

    and one coordinate is extracted as the observed series. The Mackey–Glass equation is integrated via forward Euler,

    $\frac{dx}{dt} = \beta_M \frac{x(t-\tau)}{1 + x(t-\tau)^n} - \gamma_M x(t). \qquad (3)$

    A burn-in of...

  9. [9]

    ETS. The ETS generator implements the full error–trend–seasonality state-space family (Hyndman et al., 2008), with error type $\in \{A, M\}$, trend $\in \{N, A, A_d\}$, and seasonality $\in \{N, A, M\}$ drawn uniformly. The observation and state-update equations for the multiplicative-error, damped-trend, multiplicative-seasonality variant are $y_t = (\ell_{t-1} + \phi b_{t-1})\, s_{t-m}\, (1 + \varepsilon_t)$, ...

  10. [10]

    fBm. The fBm generator produces either fractional Brownian motion or its stationary increments (fractional Gaussian noise), governed by the Hurst exponent $H \in (0.1, 0.9)$ (Mandelbrot & Van Ness, 1968). The covariance structure is

    $C(i, j) = \tfrac{1}{2}\left(|i|^{2H} + |j|^{2H} - |i - j|^{2H}\right). \qquad (5)$

    Samples are drawn via the ...

  11. [11]

    GARCH. The GARCH generator draws from three conditional heteroskedasticity models: GARCH(1,1) (Bollerslev, 1986), GJR-GARCH (Glosten et al., 1993), and EGARCH (Nelson, 1991). All share the mean equation $r_t = \mu + \phi(r_{t-1} - \mu) + \varepsilon_t$ with $\varepsilon_t = \sigma_t z_t$. The variance equations are respectively

    $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \qquad (6)$

    $\sigma_t^2 = \omega + (\alpha + \gamma\, \mathbb{1}[\varepsilon_{t-1} < 0])\, \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2,$ ...

  12. [12]

    KernelSynth. The KernelSynth generator draws samples from a Gaussian process prior (Ansari et al., 2024; Rasmussen & Williams, 2006),

    $y \sim \mathcal{GP}(m(x), k(x, x')), \qquad (7)$

    where the kernel $k$ is constructed by randomly composing 1–5 base kernels from a bank of 41, including RBF, Matérn ($\nu \in \{0.5, 1.5, 2.5\}$), rational quadratic, periodic (ExpSineSquared), dot-product, a...

    The mean function $m(x)$ is sampled with probability 0.5 from {linear, quadratic, sinusoidal, log-linear, random walk}.

  13. [13]

    SDE. The SDE generator implements a regime-switching Ornstein–Uhlenbeck process (Uhlenbeck & Ornstein, 1930; Hamilton, 1989), with $K \in \{2, 3, 4\}$ regimes governed by a Markov transition matrix $P$. Within regime $k$, the exact discrete-time transition (Gillespie, ...

  14. [14]

    TimeSynth. The TimeSynth generator mixes 1–3 signal components, drawn using the TimeSynth library (Maat et al., 2017) from a library of five types: sinusoidal, continuous autoregressive (CAR), NARMA, pseudoperiodic, and autoregressive. Components are combined additively (probability 0.8) or ...

  15. [15]

    TSI. The TSI generator constructs series by combining trend ($T$), seasonality ($S$), and irregularity ($N$) components (Bahrpeyma et al., 2021), with component types drawn uniformly from parametric families including a null option for each. Eight combination models are used; the fully multiplicative variant is

    $y_t = T_t\,(1 + 0.1 S_t)\,(1 + 0.1 N_t), \qquad (10)$

    with additive and mixed forms following analogously.

  16. [16]

    Waveform. The Waveform generator produces mixtures of 1–3 non-smooth periodic signals (Moroshan et al., 2025), targeting asymmetric and discontinuous dynamics absent from GP-based generators. Each component is

    $w_t = A \cdot f(2\pi \nu t + \varphi), \qquad (11)$

    where $f \in \{\text{sawtooth}, \text{square}, \text{triangle}\}$, amplitude $A \sim U(0.3, 3.0)$, frequency $\nu \sim U(1, 50)$, and phase $\varphi \sim U(0, 2\pi)$. An amplitude modula...

  17. [17]

    For real–synthetic mixture conditions, three model initialisations are trained per corpus; all single-generator and Mixed11 conditions use a single run.

    D. Real Reference Corpus: Sampling Details. The real reference corpus is drawn from the GIFT-Eval pretraining dataset (Aksu et al., 2024). We sample 1,000,000 univariate windows of length 1024, stratified ...
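The ARIMA and ETS descriptions in entries [7] and [9] break off mid-equation. As a minimal sketch, assuming the textbook forms, the code below simulates an ARMA(p, q) process with optional ordinary integration, and the ETS(M, A_d, M) variant whose observation equation is the one quoted in [9]; its state updates follow the standard Hyndman et al. (2008) parameterisation. The seasonal-lag part of the SARIMA-like generator and the Appendix B hyperparameter distributions are omitted. All function names, defaults, and initial states are hypothetical.

```python
import numpy as np


def simulate_arma(phi, theta, n, sigma=1.0, burn_in=200, seed=None):
    """ARMA(p, q): y_t = sum_j phi_j y_{t-j} + eps_t + sum_k theta_k eps_{t-k}."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, total)
    y = np.zeros(total)
    for t in range(total):
        ar = sum(phi[j] * y[t - 1 - j] for j in range(len(phi)) if t - 1 - j >= 0)
        ma = sum(theta[k] * eps[t - 1 - k] for k in range(len(theta)) if t - 1 - k >= 0)
        y[t] = ar + eps[t] + ma
    return y[burn_in:]  # discard the warm-up so the start-up transient is dropped


def integrate(y, d):
    """Optional ordinary integration of order d (the 'I' in ARIMA)."""
    for _ in range(d):
        y = np.cumsum(y)
    return y


def simulate_ets_madm(n, m=12, alpha=0.3, beta=0.05, gamma=0.1, phi=0.95,
                      level0=10.0, trend0=0.1, sigma=0.05, seed=None):
    """ETS(M, Ad, M): multiplicative error, damped additive trend, multiplicative season."""
    rng = np.random.default_rng(seed)
    # Hypothetical initial seasonal indices: a sine pattern around 1.0.
    seasons = list(1.0 + 0.2 * np.sin(2 * np.pi * np.arange(m) / m))
    level, trend = level0, trend0
    y = np.empty(n)
    for t in range(n):
        eps = rng.normal(0.0, sigma)
        base = level + phi * trend               # l_{t-1} + phi * b_{t-1}
        s_old = seasons[-m]                      # s_{t-m}
        y[t] = base * s_old * (1.0 + eps)        # observation equation
        level = base * (1.0 + alpha * eps)
        trend = phi * trend + beta * base * eps
        seasons.append(s_old * (1.0 + gamma * eps))
    return y
```

For example, `integrate(simulate_arma([0.5, -0.2], [0.3], 512), 1)` would give an ARIMA(2, 1, 1)-style path.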
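Entries [11] and [13] describe the volatility and regime-switching generators. The following is a minimal sketch, assuming Gaussian innovations $z_t$ and a variance initialised at its unconditional value. `simulate_gjr_garch` covers GARCH(1,1) (`gamma=0`) and GJR-GARCH (`gamma>0`) with the shared AR(1) mean equation; EGARCH is omitted. `simulate_regime_ou` applies the exact Ornstein–Uhlenbeck transition $x_{t+\Delta} = \mu_k + (x_t - \mu_k)e^{-\theta_k \Delta} + \sigma_k \sqrt{(1 - e^{-2\theta_k \Delta})/(2\theta_k)}\, z$ inside the current regime $k$, then switches regimes with the Markov matrix $P$. Regime-specific $\mu_k, \theta_k, \sigma_k$, the function names, and all defaults are assumptions, because the source text breaks off before specifying them.

```python
import numpy as np


def simulate_gjr_garch(n, mu=0.0, phi=0.1, omega=0.05, alpha=0.08, gamma=0.0,
                       beta=0.9, seed=None):
    """AR(1) mean; GARCH(1,1) variance when gamma == 0, GJR-GARCH when gamma > 0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    # Start at the unconditional variance (needs alpha + gamma / 2 + beta < 1).
    sigma2 = omega / (1.0 - alpha - 0.5 * gamma - beta)
    eps_prev, r_prev = 0.0, mu
    for t in range(n):
        if t > 0:
            leverage = gamma if eps_prev < 0 else 0.0   # indicator 1[eps_{t-1} < 0]
            sigma2 = omega + (alpha + leverage) * eps_prev ** 2 + beta * sigma2
        eps = np.sqrt(sigma2) * z[t]                     # eps_t = sigma_t z_t
        r[t] = mu + phi * (r_prev - mu) + eps            # shared mean equation
        eps_prev, r_prev = eps, r[t]
    return r


def simulate_regime_ou(n, P, mu, theta, sigma, dt=1.0, x0=0.0, seed=None):
    """Regime-switching OU process using the exact discrete-time OU transition."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    mu, theta, sigma = (np.asarray(v, dtype=float) for v in (mu, theta, sigma))
    n_regimes = P.shape[0]
    x = np.empty(n)
    regimes = np.empty(n, dtype=int)
    k = rng.integers(n_regimes)
    prev = x0
    for t in range(n):
        if t > 0:
            k = rng.choice(n_regimes, p=P[k])            # Markov regime transition
        decay = np.exp(-theta[k] * dt)
        sd = sigma[k] * np.sqrt((1.0 - decay ** 2) / (2.0 * theta[k]))
        prev = mu[k] + (prev - mu[k]) * decay + sd * rng.standard_normal()
        x[t], regimes[t] = prev, k
    return x, regimes
```

The exact transition avoids the step-size bias of an Euler–Maruyama update, which is presumably why the paper cites Gillespie's scheme.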
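Entry [10] is cut off before naming the fBm sampling method, so this sketch swaps in exact Cholesky factorisation of the covariance in equation (5); that costs O(n³) and is only practical for short windows. The Lorenz function integrates equation (2) with classical fourth-order Runge–Kutta, and the Mackey–Glass function integrates equation (3) with forward Euler from a constant pre-history, as entry [8] describes. The classic parameter values ($\sigma=10$, $\rho=28$, $\beta_L=8/3$; $\beta_M=0.2$, $\gamma_M=0.1$, $n=10$, $\tau=17$), step sizes, burn-in lengths, and function names are assumptions, not values from the paper.

```python
import numpy as np


def fbm_covariance(n, hurst):
    """Equation (5) on the grid i, j = 1..n (the path starts from B_H(0) = 0)."""
    idx = np.arange(1, n + 1, dtype=float)
    i, j = np.meshgrid(idx, idx, indexing="ij")
    return 0.5 * (i ** (2 * hurst) + j ** (2 * hurst) - np.abs(i - j) ** (2 * hurst))


def sample_fbm(n, hurst, increments=False, seed=None):
    rng = np.random.default_rng(seed)
    cov = fbm_covariance(n, hurst) + 1e-10 * np.eye(n)   # tiny jitter for Cholesky
    path = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    return np.diff(path, prepend=0.0) if increments else path  # fGn = increments


def lorenz_rk4(n, dt=0.01, sigma=10.0, rho=28.0, beta_l=8.0 / 3.0,
               start=(1.0, 1.0, 1.0), burn_in=1000, coord=0):
    """Integrate equation (2) with classical RK4 and return one coordinate."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta_l * z])

    s = np.array(start, dtype=float)
    out = np.empty(n)
    for step in range(burn_in + n):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if step >= burn_in:
            out[step - burn_in] = s[coord]
    return out


def mackey_glass_euler(n, dt=0.1, beta_m=0.2, gamma_m=0.1, power=10, tau=17.0,
                       x0=1.2, burn_in=1000):
    """Integrate equation (3) with forward Euler and a constant pre-history x0."""
    lag = int(round(tau / dt))                 # delay expressed in integration steps
    x = np.full(burn_in + n + lag, x0)
    for t in range(lag, len(x) - 1):
        x_tau = x[t - lag]
        x[t + 1] = x[t] + dt * (beta_m * x_tau / (1.0 + x_tau ** power) - gamma_m * x[t])
    return x[lag + burn_in:]
```

With $H = 0.5$ the covariance reduces to $\min(i, j)$, i.e. ordinary Brownian motion, which is a quick sanity check on equation (5).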
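Entry [12] describes KernelSynth as sampling from a GP prior whose kernel is a random composition of 1–5 base kernels. This is a minimal sketch under several assumptions: a four-kernel bank (RBF, ExpSineSquared-style periodic, rational quadratic, dot-product) stands in for the paper's 41; kernels are combined with a randomly chosen sum or elementwise product (both preserve positive semi-definiteness); the mean function is zero rather than the sampled $m(x)$; and the draw uses a Cholesky factor with a small jitter. Hyperparameter ranges, candidate periods, and all function names are hypothetical.

```python
import numpy as np


def rbf(x, ls):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)


def periodic(x, ls, period):
    # ExpSineSquared: exp(-2 sin^2(pi |x - x'| / p) / l^2)
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls ** 2)


def rational_quadratic(x, ls, a):
    d = x[:, None] - x[None, :]
    return (1.0 + d ** 2 / (2.0 * a * ls ** 2)) ** (-a)


def dot_product(x, c):
    return c + np.outer(x, x)


def sample_kernel_synth(n, max_kernels=5, seed=None):
    """Draw one series from a zero-mean GP whose kernel is a random composition."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    bank = [  # hypothetical hyperparameter ranges
        lambda: rbf(x, rng.uniform(0.05, 0.5)),
        lambda: periodic(x, rng.uniform(0.5, 2.0), rng.choice([1 / 24, 1 / 12, 1 / 7, 1 / 4])),
        lambda: rational_quadratic(x, rng.uniform(0.05, 0.5), rng.uniform(0.5, 2.0)),
        lambda: dot_product(x, rng.uniform(0.0, 1.0)),
    ]
    kernel = bank[rng.integers(len(bank))]()
    for _ in range(rng.integers(0, max_kernels)):   # 1..max_kernels kernels in total
        other = bank[rng.integers(len(bank))]()
        # Sums and elementwise products of PSD kernels remain PSD.
        kernel = kernel + other if rng.random() < 0.5 else kernel * other
    kernel = kernel + 1e-6 * np.eye(n)              # jitter for a stable Cholesky
    return np.linalg.cholesky(kernel) @ rng.standard_normal(n)
```

Multiplying a periodic kernel by an RBF kernel, for instance, yields locally periodic series whose seasonal shape drifts over the window.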
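Entries [15] and [16] describe two component-based generators. In this minimal sketch, `combine_tsi_multiplicative` applies equation (10) to given components, while `sample_tsi` draws a hypothetical linear trend, sinusoidal seasonality, and Gaussian irregular term (the paper instead draws each from parametric families that include a null option). `sample_waveform` follows equation (11) with the stated distributions for $A$, $\nu$, and $\varphi$, assuming components are summed and that $t$ is normalised to [0, 1) so that $\nu$ counts cycles per window; the amplitude-modulation step, truncated in the source, is omitted. All function names are hypothetical.

```python
import numpy as np


def combine_tsi_multiplicative(trend, season, noise):
    """Fully multiplicative TSI model: y_t = T_t (1 + 0.1 S_t)(1 + 0.1 N_t)."""
    return trend * (1.0 + 0.1 * season) * (1.0 + 0.1 * noise)


def sample_tsi(n, seed=None):
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    trend = 1.0 + rng.uniform(0.0, 0.01) * t              # hypothetical linear trend T_t
    season = np.sin(2 * np.pi * t / rng.integers(4, 64))  # hypothetical seasonality S_t
    noise = rng.standard_normal(n)                        # hypothetical irregular N_t
    return combine_tsi_multiplicative(trend, season, noise)


def unit_wave(kind, phase):
    """Non-smooth periodic shapes with period 2*pi and range [-1, 1]."""
    frac = np.mod(phase / (2 * np.pi), 1.0)
    if kind == "sawtooth":
        return 2.0 * frac - 1.0
    if kind == "square":
        return np.where(frac < 0.5, 1.0, -1.0)
    if kind == "triangle":
        return 1.0 - 4.0 * np.abs(frac - 0.5)
    raise ValueError(f"unknown waveform: {kind}")


def sample_waveform(n, seed=None):
    rng = np.random.default_rng(seed)
    t = np.arange(n) / n                         # assumed: time normalised to [0, 1)
    y = np.zeros(n)
    for _ in range(rng.integers(1, 4)):          # 1-3 components
        kind = rng.choice(["sawtooth", "square", "triangle"])
        amp = rng.uniform(0.3, 3.0)              # A ~ U(0.3, 3.0)
        freq = rng.uniform(1.0, 50.0)            # nu ~ U(1, 50)
        phase = rng.uniform(0.0, 2 * np.pi)      # phase ~ U(0, 2*pi)
        y += amp * unit_wave(kind, 2 * np.pi * freq * t + phase)
    return y
```

Because each component lies in $[-A, A]$ with $A \le 3$, a three-component waveform stays within $[-9, 9]$.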