arxiv: 1907.01552 · v1 · pith:5ZKO6K6Znew · submitted 2019-07-02 · 📊 stat.ML · cs.LG

Forecasting high-dimensional dynamics exploiting suboptimal embeddings

Pith reviewed 2026-05-25 11:19 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords delay embedding · time series forecasting · combinatorial optimization · nonlinear dynamics · high-dimensional data · ensemble forecasting · suboptimal embeddings · multivariate time series

The pith

A forecasting framework selects suboptimal delay embeddings via combinatorial optimization to combine diverse predictions for high-dimensional time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method for forecasting nonlinear multivariate time series that reconstructs dynamics through delay embeddings. Instead of random selection or exhaustive search, it uses combinatorial optimization to identify multiple suboptimal embeddings that minimize in-sample error. These embeddings are then used to generate separate forecasts whose combination yields a single prediction. The approach is shown to outperform prior frameworks on both synthetic examples and a real flood dataset while remaining applicable across varying data lengths and dimensions.

Core claim

Delay embedding is widely used to forecast nonlinear time series as a model-free approach. When multivariate time series are observed, several existing frameworks can be applied to yield a single forecast combining multiple forecasts derived from various embeddings. However, the performance of these frameworks is not always satisfactory because they randomly select embeddings or use brute force and do not consider the diversity of the embeddings to combine. The proposed framework instead exploits various suboptimal embeddings obtained by minimizing the in-sample error via combinatorial optimization.
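
As a rough illustration of the model-free idea, the sketch below builds delay vectors from a chosen set of (variable, lag) pairs and forecasts by copying the future of the nearest past analog. The names `delay_vector` and `nn_forecast`, the dictionary layout of `series`, and the toy sine/cosine data are assumptions made for this example; the paper's own predictor may differ.

```python
import math

def delay_vector(series, t, columns):
    """Build a delay vector at time t.

    series: dict mapping variable name -> list of floats (equal lengths).
    columns: list of (variable, lag) pairs; this list defines the embedding.
    """
    return [series[var][t - lag] for var, lag in columns]

def nn_forecast(series, target, columns, t_now, horizon, train_end):
    """Predict series[target][t_now + horizon] from the nearest past analog.

    Library points are times s with s + horizon < train_end, so the
    analog's future value is known and lies inside the training portion.
    """
    max_lag = max(lag for _, lag in columns)
    query = delay_vector(series, t_now, columns)
    best_s, best_d = None, math.inf
    for s in range(max_lag, train_end - horizon):
        if s == t_now:  # never use the query point as its own analog
            continue
        d = math.dist(query, delay_vector(series, s, columns))
        if d < best_d:
            best_s, best_d = s, d
    return series[target][best_s + horizon]

# Toy data (hypothetical): a period-20 oscillation seen through two variables.
series = {
    "x": [math.sin(2 * math.pi * t / 20) for t in range(200)],
    "y": [math.cos(2 * math.pi * t / 20) for t in range(200)],
}
embedding = [("x", 0), ("y", 0)]
pred = nn_forecast(series, "x", embedding, t_now=150, horizon=5, train_end=100)
```

Different choices of `columns` give different reconstructions of the same dynamics, which is what makes combining several of them possible in the first place.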

What carries the argument

Combinatorial optimization that identifies multiple suboptimal embeddings by minimizing in-sample error, enabling their forecasts to be combined while preserving diversity.
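
A minimal sketch of how a pool of suboptimal embeddings might be collected is given below. It is not the authors' optimizer: it swaps in a plain hill-climbing search with `k` random restarts as a stand-in for the K combinatorial optimization problems, and scores each embedding with a leave-one-out nearest-neighbor error. The function names, the candidate list, and the toy data are all hypothetical.

```python
import math
import random

def loo_rmse(series, target, columns, horizon, train_end):
    """Leave-one-out in-sample RMSE of a nearest-neighbor forecast."""
    max_lag = max(lag for _, lag in columns)
    times = range(max_lag, train_end - horizon)
    vecs = {t: [series[v][t - lag] for v, lag in columns] for t in times}
    sq = 0.0
    for t in times:
        s = min((u for u in times if u != t),
                key=lambda u: math.dist(vecs[t], vecs[u]))
        sq += (series[target][s + horizon] - series[target][t + horizon]) ** 2
    return math.sqrt(sq / len(times))

def local_search(candidates, error, rng, max_iter=20):
    """Hill-climb over subsets of candidate columns using single flips."""
    current = frozenset(rng.sample(range(len(candidates)), 2))
    best = error([candidates[i] for i in sorted(current)])
    for _ in range(max_iter):
        i = rng.randrange(len(candidates))
        trial = current ^ {i}  # flip one column in or out
        if not trial:          # an embedding needs at least one column
            continue
        e = error([candidates[j] for j in sorted(trial)])
        if e < best:
            current, best = trial, e
    return current, best

def suboptimal_pool(candidates, error, k, seed=0):
    """Run k restarts (an assumption) and keep each local optimum found."""
    rng = random.Random(seed)
    pool = {}
    for _ in range(k):
        emb, err = local_search(candidates, error, rng)
        pool[emb] = err
    return sorted(pool.items(), key=lambda kv: kv[1])

# Toy data (hypothetical): two informative variables and one pure-noise variable.
data_rng = random.Random(1)
n = 100
series = {
    "x": [math.sin(0.3 * t) for t in range(n)],
    "y": [math.cos(0.3 * t) for t in range(n)],
    "noise": [data_rng.gauss(0, 1) for _ in range(n)],
}
candidates = [("x", 0), ("y", 0), ("noise", 0), ("x", 1)]
error = lambda cols: loo_rmse(series, "x", cols, horizon=3, train_end=80)
pool = suboptimal_pool(candidates, error, k=4)
```

Each entry of `pool` pairs a set of candidate indices with its in-sample error. Keeping every local optimum, rather than only the lowest-error one, is the point of the exercise: the ensemble draws on embeddings that are good but different.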

If this is right

  • The framework produces superior accuracy compared with existing embedding-combination methods on both toy and real-world data.
  • It remains effective across a wide range of data lengths and dimensions.
  • It can be applied to forecasting tasks in neuroscience, ecology, finance, fluid dynamics, weather, and disaster prevention.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the optimization step reliably avoids overfitting, the same selection principle could be tested on other ensemble methods that rely on reconstructed state spaces.
  • The emphasis on suboptimal rather than globally optimal embeddings suggests that deliberate diversity may be more important than individual embedding quality for ensemble stability.
  • Extending the approach to streaming data would require checking whether the combinatorial step can be updated incrementally without losing the reported performance gains.

Load-bearing premise

Embeddings chosen to minimize in-sample error will stay sufficiently diverse and continue to generalize when their forecasts are averaged, without the selection process causing overfitting that hurts out-of-sample accuracy.
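
To make the averaging step concrete, here is a small sketch that ranks ensemble members by in-sample error and picks the number of members whose equal-weight average minimizes the combined error. Equal weighting, the prefix rule in `choose_k`, and the numbers are assumptions for illustration only.

```python
import math

def rmse(pred, truth):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth))

def combine(forecasts):
    """Equal-weight average of several forecast series of equal length."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]

def choose_k(ranked_forecasts, truth):
    """Return (k, error) for the best prefix of a ranked list of forecasts."""
    scores = [(rmse(combine(ranked_forecasts[:k]), truth), k)
              for k in range(1, len(ranked_forecasts) + 1)]
    err, k = min(scores)
    return k, err

# Made-up in-sample forecasts, ordered from lowest to highest individual error.
truth = [0.0, 1.0, 2.0, 3.0]
ranked = [
    [0.2, 1.2, 2.2, 3.2],    # biased high by 0.2
    [-0.3, 0.7, 1.7, 2.7],   # biased low by 0.3
    [1.0, 2.0, 3.0, 4.0],    # biased high by 1.0
]
k, err = choose_k(ranked, truth)
```

In this toy case the two best members err in opposite directions, so averaging them beats either one alone, while adding the third member makes the average worse. Whether the same cancellation survives out of sample is exactly the premise in question.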

What would settle it

On a held-out dataset the combined forecast from the optimized suboptimal embeddings performs no better than, or worse than, a single best embedding or a random-selection ensemble.
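
The test described above can be phrased as a short check, sketched here under the assumption that held-out forecasts from each approach are already available as plain lists. The function name `claim_survives`, the optional `margin`, and the numbers are hypothetical.

```python
import math

def rmse(pred, truth):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth))

def claim_survives(truth, combined, rivals, margin=0.0):
    """True if the combined forecast beats every rival by more than margin."""
    c = rmse(combined, truth)
    return all(c + margin < rmse(pred, truth) for pred in rivals.values())

# Made-up held-out values, for illustration only.
truth = [1.0, 2.0, 3.0, 4.0]
combined = [1.1, 1.9, 3.1, 3.9]
rivals = {
    "single_best": [1.3, 2.3, 3.3, 4.3],
    "random_ensemble": [0.5, 2.5, 2.5, 4.5],
}
survives = claim_survives(truth, combined, rivals)
```

A nonzero `margin` turns the comparison into a stricter requirement, which may be appropriate when the differences are within run-to-run noise.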

Figures

Figures reproduced from arXiv: 1907.01552 by Kazuyuki Aihara, Shunya Okuno, Yoshito Hirata.

Figure 1: Schematic of the proposed forecasting procedure. We prepare a pool of suboptimal embeddings in the first step. We solve K combinatorial optimization problems to obtain various embeddings in this step. Next, we pick $\hat{k}_p$ embeddings to minimize the error of the combined forecast in the second step. We combine the forecasts obtained by the $\hat{k}_p$ embeddings to test the time series. First step: Preparing suboptima…
Figure 2: Forecast performance with the Lorenz’96I model: comparisons of the performance (a) up to 10 steps ahead with the fixed data length (4000) without noise, (b) with different values of the data length, and (c) with different scales of observational noise. We computed the RMSE of the five-steps-ahead forecasts with randomly distributed embedding (RDE), multiview embedding (MVE), state-dependent weighting (SDW)…
Figure 3: Forecast profile of the Lorenz’96I model with the data length of 4000. Panel (a) shows the proportion of embedding of the proposed forecasts. The color indicates the proportion of embedding for each variable averaged over the number of combined forecasts for each step. Note that for each prediction step, the sum of the proportions of all variables is one. Variables $x_0, \ldots, x_4$ are the variables of the 10-di…
Figure 4: Forecast performance with the Kuramoto–Sivashinsky equations: comparisons of the performance (a) up to 10 steps ahead with the fixed data length (4000) without noise, (b) with different values of the data length, and (c) with different scales of observational noise. We computed the RMSE of the five-steps-ahead forecasts with randomly distributed embedding (RDE), multiview embedding (MVE), state-dependent w…
Figure 5: Forecast performance with the Lorenz’96I model for various numbers of variables: cases where (a) half of the variables are substituted with random walks and (b) all variables are available. We computed the RMSE of the five-steps-ahead forecasts with randomly distributed embedding (RDE), multiview embedding (MVE), state-dependent weighting (SDW), single-best embedding based on the (µ+λ)-ES algorithm (SBE…
Figure 6: Forecast results for the flood dataset. Panel (a) shows a comparison of the ground truth and the proposed 24-h-ahead forecast. The proposed forecast did not underestimate the maximum river stage, which is the maximum value of the whole dataset. Panel (b) shows a comparison of the in-sample and test errors of the ensemble members. The results demonstrate the difficulty of selecting the best forecast because…
Figure 1: Effect of multiple optimizations: the proportion of embedding of the proposed forecasts with (a) K times optimization and (b) a single optimization to minimize the whole in-sample error. The color indicates the proportion of embedding for each variable averaged over the number of combined forecasts for each step. Panels (c) and (d) show the relation between the number of combined forecasts and the five-ste…
Figure 2: Forecast performance for low-dimensional datasets: RMSEs for (a) the Lorenz’63 dataset, (b) the Rössler dataset, and (c) the six-dimensional Lorenz’96I dataset. We compared the performance of multiview embedding (MVE), state-dependent weighting (SDW), and single-best embedding based on the (µ+λ)-ES algorithm (SBE). These tests were carried out with 20 datasets generated with different random initial condi…
Original abstract

Delay embedding, a method for reconstructing dynamical systems by delay coordinates, is widely used to forecast nonlinear time series as a model-free approach. When multivariate time series are observed, several existing frameworks can be applied to yield a single forecast combining multiple forecasts derived from various embeddings. However, the performance of these frameworks is not always satisfactory because they randomly select embeddings or use brute force and do not consider the diversity of the embeddings to combine. Herein, we develop a forecasting framework that overcomes these existing problems. The framework exploits various "suboptimal embeddings" obtained by minimizing the in-sample error via combinatorial optimization. The framework achieves the best results among existing frameworks for sample toy datasets and a real-world flood dataset. We show that the framework is applicable to a wide range of data lengths and dimensions. Therefore, the framework can be applied to various fields such as neuroscience, ecology, finance, fluid dynamics, weather, and disaster prevention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a delay-embedding forecasting framework for high-dimensional nonlinear time series. It selects a collection of 'suboptimal' embeddings by combinatorial optimization that minimizes in-sample prediction error, then combines the individual forecasts. The central empirical claim is that this procedure yields the best performance among compared frameworks on several toy datasets and one real-world flood dataset, and remains applicable across a range of data lengths and dimensions.

Significance. If the out-of-sample gains are shown to be robust, the approach supplies a concrete heuristic for trading off embedding quality against diversity in ensemble delay-coordinate forecasting. The method is parameter-light once the combinatorial search is defined and could be directly useful in the application domains listed in the abstract.

major comments (3)
  1. [§4] §4 (Experimental results): the manuscript asserts superior performance on toy and flood data, yet provides no description of the train/test split procedure, the size of the combinatorial search space, the number of embeddings retained, cross-validation, or error bars. Without these controls it is impossible to assess whether the reported gains survive the in-sample optimization step highlighted in the skeptic note.
  2. [§3.2] §3.2 (Embedding selection): the optimization objective is defined solely on in-sample error with no explicit diversity penalty or out-of-sample guardrail. Because the central claim requires that the selected embeddings remain sufficiently uncorrelated on unseen data, the absence of any post-selection diversity diagnostic (e.g., pairwise correlation of forecast residuals on the test set) is load-bearing for the generalization argument.
  3. [Table 1, Table 2] Table 1 (toy datasets) and Table 2 (flood dataset): the reported performance numbers are given without the corresponding baseline implementations, hyper-parameter settings, or statistical tests. This prevents verification that the claimed superiority is not an artifact of unequal tuning effort.
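
The diversity diagnostic requested in major comment 2 could look roughly like the sketch below, which computes Pearson correlations between the test-set residuals of every pair of ensemble members. The function names, the member labels, and the numbers are hypothetical, and the Pearson coefficient is only one possible choice of diagnostic.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def residual_correlations(forecasts, truth):
    """Map each pair of member names to the correlation of their residuals."""
    residuals = {name: [p - y for p, y in zip(pred, truth)]
                 for name, pred in forecasts.items()}
    return {(a, b): pearson(residuals[a], residuals[b])
            for a, b in combinations(sorted(residuals), 2)}

# Made-up test-set forecasts from three ensemble members.
truth = [0.0, 1.0, 2.0, 3.0]
forecasts = {
    "A": [0.1, 0.9, 2.1, 2.9],
    "B": [0.2, 0.8, 2.2, 2.8],   # errs in the same direction as A
    "C": [-0.1, 1.1, 1.9, 3.1],  # errs in the opposite direction to A
}
corr = residual_correlations(forecasts, truth)
```

Correlations close to 1 across most pairs would suggest the retained embeddings are redundant; values near zero or below would support the diversity argument.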
minor comments (2)
  1. [Abstract, §1] The abstract states that the framework 'achieves the best results' but does not name the competing frameworks or the precise error metric; this should be clarified in the abstract and §1.
  2. [§2] Notation for the delay vector and the combinatorial objective is introduced without an explicit equation number; adding an equation label would improve readability.
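
One possible way to label the notation mentioned in minor comment 2 is sketched below. The symbols ($e$, $\mathcal{C}$, $v_e$, $E_p$, $\mathcal{T}$, $\hat{y}_e$) are placeholders chosen for this illustration and are not taken from the paper: $e$ is an embedding drawn from a candidate set $\mathcal{C}$ of (variable, lag) pairs, $v_e(t)$ is its delay vector, and $E_p(e)$ is the in-sample error of the $p$-steps-ahead forecast over training times $\mathcal{T}$.

```latex
\begin{align}
  e &= \{(i_1,\tau_1),\dots,(i_d,\tau_d)\} \subseteq \mathcal{C},
      \label{eq:embedding}\\
  v_e(t) &= \bigl(y_{i_1}(t-\tau_1),\dots,y_{i_d}(t-\tau_d)\bigr),
      \label{eq:delay-vector}\\
  E_p(e) &= \sqrt{\frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}
      \bigl(\hat{y}_e(t+p \mid t) - y(t+p)\bigr)^2},
      \label{eq:in-sample-error}\\
  \hat{e} &\in \operatorname*{arg\,min}_{e\subseteq\mathcal{C}} E_p(e).
      \label{eq:objective}
\end{align}
```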

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate planned revisions to improve clarity and verifiability.

Point-by-point responses
  1. Referee: [§4] §4 (Experimental results): the manuscript asserts superior performance on toy and flood data, yet provides no description of the train/test split procedure, the size of the combinatorial search space, the number of embeddings retained, cross-validation, or error bars. Without these controls it is impossible to assess whether the reported gains survive the in-sample optimization step highlighted in the skeptic note.

    Authors: We agree that the experimental section requires additional detail for reproducibility and assessment of robustness. In the revised manuscript we will add explicit descriptions of the train/test split (last 30% held out for testing), the combinatorial search space size, the number of embeddings retained, any cross-validation procedure, and error bars computed over multiple runs. These additions will directly address whether gains persist beyond the in-sample optimization. revision: yes

  2. Referee: [§3.2] §3.2 (Embedding selection): the optimization objective is defined solely on in-sample error with no explicit diversity penalty or out-of-sample guardrail. Because the central claim requires that the selected embeddings remain sufficiently uncorrelated on unseen data, the absence of any post-selection diversity diagnostic (e.g., pairwise correlation of forecast residuals on the test set) is load-bearing for the generalization argument.

    Authors: The method intentionally selects suboptimal embeddings via in-sample error to induce diversity without an explicit penalty term. To strengthen the generalization argument we will add, in the revision, a post-selection diagnostic reporting pairwise correlations of forecast residuals on the test set for the retained embeddings. This will provide direct evidence on out-of-sample uncorrelatedness. revision: partial

  3. Referee: [Table 1, Table 2] Table 1 (toy datasets) and Table 2 (flood dataset): the reported performance numbers are given without the corresponding baseline implementations, hyper-parameter settings, or statistical tests. This prevents verification that the claimed superiority is not an artifact of unequal tuning effort.

    Authors: We will expand the revised manuscript with an appendix or table listing the exact hyper-parameter settings used for every baseline, following the original papers' implementations. We will also add statistical significance tests (e.g., paired t-tests) to Tables 1 and 2 to quantify performance differences. revision: yes
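
For the significance tests mentioned in the third response, a paired t-statistic over per-dataset errors can be computed as in the sketch below. It returns only the statistic and the degrees of freedom; turning these into a p-value needs a Student-t table or a statistics library. The function name and the error values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(errors_a, errors_b):
    """Paired t-statistic for per-dataset errors of two methods.

    Returns (t, degrees_of_freedom). A positive t means method A has
    the larger average error. Requires at least two datasets and
    non-identical differences (otherwise the standard deviation is zero).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up RMSEs of a baseline and a proposed method on the same four datasets.
baseline = [1.0, 1.2, 0.9, 1.1]
proposed = [0.8, 0.9, 0.8, 0.7]
t_stat, dof = paired_t(baseline, proposed)
```

Pairing by dataset matters here because all methods are run on the same generated series, so dataset-to-dataset difficulty cancels out in the differences.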

Circularity Check

0 steps flagged

No circularity; optimization objective and out-of-sample evaluation are independent

Full rationale

The framework selects embeddings by minimizing in-sample error via combinatorial optimization and then evaluates combined forecasts on held-out test data for both toy and flood datasets. No equation or step reduces the claimed out-of-sample performance to the in-sample fit by construction, no self-citation is invoked as a load-bearing uniqueness theorem, and no ansatz is smuggled in. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on the standard delay-embedding reconstruction theorem and the assumption that in-sample error minimization yields useful out-of-sample diversity; no explicit free parameters, ad-hoc axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5685 in / 960 out tokens · 29736 ms · 2026-05-25T11:19:32.053895+00:00 · methodology

