
arxiv: 2605.16793 · v2 · pith:7ZLTME4Snew · submitted 2026-05-16 · 💻 cs.LG

PULSE: Generative Phase Evolution for Non-Stationary Time Series Forecasting

Pith reviewed 2026-05-21 07:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords non-stationary time series · forecasting · phase evolution · physics-informed inductive bias · Wold decomposition · generative phase · distribution shift · MLP

The pith

Formalizing non-stationary dynamics with physical hypotheses enables a simple MLP to achieve state-of-the-art forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome phase amnesia in time series forecasting, where models lose track of evolving global patterns under distribution shifts. It does so by introducing three physical hypotheses that describe how non-stationary series decompose, evolve in phase, and generate varying statistics. These lead to the PULSE framework, which disentangles phases, routes future phase changes, and mixes statistics for robustness. If correct, this means the choice of inductive bias matters more than model complexity for handling real-world shifting data.

Core claim

PULSE resolves the tension in non-stationary forecasting by translating three physical hypotheses into a Disentangle-Evolve-Simulate architecture. Phase-anchored disentanglement prevents optimization interference from trends, the Phase Router generates future trajectories, and Statistic-Aware Mixup ensures robustness to volatility. This allows a plain MLP backbone to deliver state-of-the-art or highly competitive results on twelve benchmarks, demonstrating the value of physics-informed design over architectural sophistication.
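The idea of separating a deterministic, phase-indexed component from a stochastic residual can be illustrated with a classical stand-in. The sketch below is not the paper's architecture: it assumes a fixed, known period and uses a per-phase mean as a hypothetical "anchor". The function names `split_anchor_residual` and `extend_anchor` are invented for this illustration.

```python
from statistics import fmean

def split_anchor_residual(series, period):
    """Split a series into a per-phase mean ("anchor") and a residual.

    Hypothetical stand-in: the anchor at phase p is the mean of all
    observations whose index is congruent to p modulo `period`.
    """
    if len(series) < period:
        raise ValueError("need at least one full period of data")
    phase_means = [fmean(series[p::period]) for p in range(period)]
    anchor = [phase_means[t % period] for t in range(len(series))]
    residual = [x - a for x, a in zip(series, anchor)]
    return phase_means, anchor, residual

def extend_anchor(phase_means, start, horizon):
    """Generate the anchor for future steps by continuing the phase index."""
    period = len(phase_means)
    return [phase_means[(start + h) % period] for h in range(horizon)]

# Toy data with an assumed period of 3 (illustrative numbers only).
history = [1.0, 3.0, 2.0, 1.2, 3.1, 2.1, 0.8, 2.9, 1.9]
phase_means, anchor, residual = split_anchor_residual(history, period=3)
future_anchor = extend_anchor(phase_means, start=len(history), horizon=4)
```

A forecaster then only has to model the residual, which is the sense in which a dominant trend or cycle no longer interferes with optimization. A static per-phase mean copies history forward, which is precisely what the paper argues against; its Phase Router is described as generating future phases rather than repeating past ones.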

What carries the argument

The Phase Router that generates future phase trajectories according to dynamical phase evolution, within the Disentangle-Evolve-Simulate framework.

If this is right

  • A simple multilayer perceptron can match complex models when equipped with the right physical inductive bias.
  • Forecasting systems gain robustness to out-of-distribution changes without added architectural layers.
  • Optimization interference from dominant trends is mitigated by separating phase components.
  • The approach generalizes across diverse real-world time series domains.
  • Training efficiency improves as less complexity is needed for competitive performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This suggests that similar hypothesis-driven designs could benefit other non-stationary tasks like online learning or adaptive control.
  • Testing the framework on synthetic data generated from known phase evolution models would validate the hypotheses directly.
  • Extensions might incorporate additional physical principles, such as conservation laws, to further constrain the generative process.
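The synthetic-data test suggested above could start from a generator like the following. It is a minimal sketch under stated assumptions: a single sinusoid whose instantaneous frequency drifts linearly, plus Gaussian noise. The function name and every parameter (`base_period`, `drift`, `noise_std`) are hypothetical choices, not taken from the paper.

```python
import math
import random

def synthetic_phase_series(n, base_period=24.0, drift=0.002, noise_std=0.1, seed=0):
    """Generate a noisy sinusoid whose instantaneous frequency drifts linearly.

    The phase is accumulated step by step, so the ground-truth phase at every
    time index is known and can be compared with a model's generated phase.
    """
    rng = random.Random(seed)
    phase = 0.0
    phases, values = [], []
    for t in range(n):
        freq = (1.0 / base_period) * (1.0 + drift * t)  # cycles per time step
        phase += 2.0 * math.pi * freq
        phases.append(phase)
        values.append(math.sin(phase) + rng.gauss(0.0, noise_std))
    return phases, values

phases, values = synthetic_phase_series(n=500)
```

Because the true phase trajectory is returned alongside the observations, any phase a model generates for the forecast horizon can be scored against it directly.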

Load-bearing premise

The three physical hypotheses provide an accurate and sufficient description of non-stationary dynamics that can be directly implemented without creating new optimization problems.

What would settle it

Running PULSE on a dataset where the generated phases from the Phase Router show no correlation with actual observed shifts in the series would disprove the core modeling assumption.
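One way to make this check concrete is a plain Pearson correlation between a per-window measure of the generated phase change and a per-window measure of the observed shift. How those two quantities are extracted is left open here; the sketch only shows the correlation step, and the two input lists are placeholder numbers, not data from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder per-window measurements (hypothetical values).
generated_shift = [0.1, 0.4, 0.2, 0.9, 0.5]
observed_shift = [0.2, 0.5, 0.1, 1.0, 0.6]
r = pearson(generated_shift, observed_shift)
```

A value of `r` near zero across many windows and datasets would count against the modeling assumption. Since phases are circular quantities, a circular correlation measure may be more appropriate when raw phase angles are compared.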

Figures

Figures reproduced from arXiv: 2605.16793 by Fei Wang, Hu Chen, Xinyu Chen, Yangyou Liu, Yuankai Wu, Zezhi Shao.

Figure 1. Motivation and Efficiency of PULSE.
… disentangle these components to optimize them in their own subspaces. Hypothesis II: Dynamical Phase Evolution. In contrast to classical Fourier assumptions of global, static periodicity (Brigham, 1988), we approach time series from a nonlinear dynamical systems perspective. Real-world signals inherently exhibit non-stationary behaviors, characterized by shifting insta…
Figure 2. Overall architecture of the proposed PULSE framework.
Multivariate Time Series Forecasting (MTSF) aims to predict future sequences Y ∈ R^{H×C} over a look-ahead horizon H across C variates, given historical observations X ∈ R^{T×C} of look-back window T. Following the Generalized Decomposition Principle (Hypothesis I), we assume the underlying multivariate process is a superposition of a deterministic stru…
Figure 3. Visual analysis of learned Phase Anchors versus raw data. The data are sourced from the Electricity dataset.
…ble disentanglement basis is necessary for forecasting under non-stationarity. The Phase Router provides a 2.9% gain by dynamically dispatching temporal patterns, confirming that future structures should be generated rather than copied from history. Finally, SAM and the Statistic-Aware mechanism yi…
Figure 4. Plug-and-Play Efficiency of PULSE on the Solar Dataset. We illustrate the performance shift of four backbone models after being equipped with PULSE. The comparison covers MSE, Parameters, and FLOPs, averaged over prediction horizons H ∈ {96, 192, 336, 720}. Arrows indicate clear accuracy improvements with modest additional computational overhead.
… 36 share similar historical patterns and cluster closely in…
Figure 5. Impact of the global period W on model performance across Electricity, ETTh1, ETTh2, and Solar.
[Plot residue following this caption: panels Electricity, ETTh1, ETTh2, Solar; x-axis Codebook Size L; y-axis MSE; curves H=96, H=192, H=336, H=720]
Figure 6. Sensitivity analysis regarding codebook size L across Electricity, ETTh1, ETTh2, and Solar.
[Plot residue following this caption: panels Electricity, ETTh1, ETTh2, Solar; x-axis Patch size P; y-axis MSE; curves H=96, H=192, H=336, H=720]
Figure 7. Performance evaluation under varying router patch sizes P across Electricity, ETTh1, ETTh2, and Solar.
… trajectory geometry. This creates a localized “Phase Amnesia,” depriving the router of the context needed to extrapolate trends. Conversely, a larger patch size (e.g., P = 24) encapsulates a complete structural unit (e.g., a full diurnal cycle), enabling the router to operate on information-rich Phase A…
Figure 8. Sensitivity analysis of α on the ETTh2 dataset. The curve illustrates a general phenomenon: a stable robustness plateau in the U-shaped region (α ∈ [0.05, 0.15]), followed by distinct degradation as the prior shifts towards a Uniform distribution (α = 1.0).
• The Robustness Plateau (α ≤ 0.25): We observe a wide performance trough where the forecasting error remains minimal and invariant. Within the U-shape…
Figure 9. Frequency-domain visualization of future-side disentanglement. Panel (a) compares the spectra of the future target Y and the deterministic future anchor Ay, while Panel (b) shows the spectrum of the stochastic residual Ry. Ay extracts a low-amplitude deterministic spectral backbone, whereas Ry captures the remaining stochastic variations.
Figure 10. Visual analysis of learned Phase Anchors versus raw data on the Electricity dataset via t-SNE. The visualization illustrates that PULSE successfully captures the underlying structural similarities between distinct channels, effectively mapping input sequences X and their corresponding future ground truth Y into a coherent representation space.
Figure 11. Visualization of the forecasting results on four representative datasets. From top to bottom, the results are shown on Electricity, Solar, Traffic, and Weather. For each dataset, two representative forecasting cases are displayed side by side, comparing the predicted sequence with the corresponding ground truth.
Figure 12. Visualization of the plug-and-play forecasting results (input-96-predict-96) on the Electricity dataset. Each row compares the original performance of DLinear, PatchTST, TimesNet, and iTransformer (left) against the enhanced results after integrating our PULSE framework (right). It is evident that PULSE consistently rectifies the “Phase Amnesia” issue, leading to more precise alignment with the ground tru…
Original abstract

Time series forecasting under non-stationarity faces a fundamental tension between capturing stable representations and adapting to distribution shifts. Existing methods implicitly rely on static historical assumptions, leading to a critical failure mode we term Phase Amnesia, where models become blind to the evolving global context. To resolve this, we formalize non-stationary dynamics through three physical hypotheses: Wold decomposition, dynamical phase evolution, and heteroscedastic manifold generation. These principles inspire PULSE, a physics-informed, plug-and-play framework adopting a Disentangle–Evolve–Simulate design philosophy. Specifically, PULSE utilizes phase-anchored disentanglement to resolve optimization interference caused by dominant trends, employs a Phase Router to actively generate future trajectories, and introduces Statistic-Aware Mixup (SAM) to ensure robustness against out-of-distribution volatility. Empirically, PULSE enables a simple MLP backbone to achieve state-of-the-art or highly competitive performance across 12 real-world benchmarks. This validates that a correct physics-informed inductive bias is far more critical than raw architectural complexity for non-stationary forecasting. The code is available at: https://github.com/Gemost/PULSE.
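The abstract does not spell out how Statistic-Aware Mixup works, so the sketch below shows only standard mixup ("mixup: Beyond Empirical Risk Minimization", listed among the references), which SAM presumably builds on. The reading that the α in Figure 8 parameterizes a Beta(α, α) mixing prior is an assumption; it is consistent with the caption, since Beta(α, α) is U-shaped for α < 1 and uniform at α = 1. The function name `mixup_pair` is hypothetical.

```python
import random

def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.1, rng=random):
    """Standard mixup: a convex combination of two (input, target) pairs.

    The mixing weight is drawn from Beta(alpha, alpha). This is plain mixup,
    not the paper's Statistic-Aware Mixup, whose details are not given here.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return lam, x_mix, y_mix

# Two toy look-back windows and their targets (illustrative values only).
rng = random.Random(0)
lam, x_mix, y_mix = mixup_pair([1.0, 2.0, 3.0], [4.0], [3.0, 2.0, 1.0], [0.0], rng=rng)
```

With a small α most draws of `lam` fall near 0 or 1, so the mixed samples stay close to one of the two originals.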

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes PULSE, a physics-informed plug-and-play framework for non-stationary time series forecasting. It formalizes the problem via three hypotheses (Wold decomposition, dynamical phase evolution, and heteroscedastic manifold generation) and implements a Disentangle–Evolve–Simulate design with phase-anchored disentanglement, a Phase Router that generates future trajectories, and Statistic-Aware Mixup (SAM) for volatility robustness. The central empirical claim is that these components allow a simple MLP backbone to reach state-of-the-art or highly competitive results on 12 real-world benchmarks, supporting the broader thesis that a correct physics-informed inductive bias outweighs architectural complexity.

Significance. If the performance gains are shown to survive standard controls, ablations, and statistical testing, and if the model components can be demonstrated to follow directly from the stated hypotheses without dominant auxiliary degrees of freedom, the work would provide a valuable existence proof that targeted inductive biases can enable lightweight models to handle distribution shifts effectively. The public code release is a positive factor for reproducibility.

major comments (3)
  1. [§3] §3 (Hypotheses formalization): The extension of classical Wold decomposition (originally for stationary linear processes) to non-stationary dynamical phase evolution is asserted but lacks an explicit derivation showing that the Phase Router’s trajectory-generation mechanism follows necessarily from the hypothesis rather than from an auxiliary generative modeling choice; this weakens the claim that the router supplies a parameter-free physics-informed bias.
  2. [Experiments] Experiments section, main results table: The manuscript states SOTA or competitive performance across 12 benchmarks, yet the provided description supplies no quantitative error bars, statistical significance tests, or full ablation tables isolating the contribution of phase-anchored disentanglement, the Phase Router, and SAM; without these controls it is impossible to rule out that reported gains arise from post-hoc component tuning rather than the hypothesized inductive bias.
  3. [§4.2] §4.2 (Phase Router): The router is described as actively generating future trajectories, but the text does not clarify whether its outputs are produced by a parameter-free procedure derived from the phase-evolution hypothesis or by a learned module whose capacity dominates the performance; this distinction is load-bearing for the central “inductive bias over architecture” argument.
minor comments (2)
  1. [§4.3] Notation for the Statistic-Aware Mixup (SAM) mixing weights is introduced without an explicit equation linking them to the heteroscedastic manifold hypothesis; a short derivation or reference to the relevant equation would improve clarity.
  2. [Figure 2] Figure 2 (architecture diagram) uses several acronyms (Phase Router, SAM, etc.) without a legend; adding a compact legend would aid readers unfamiliar with the new terminology.
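The controls requested in major comment 2 are inexpensive to produce. The sketch below computes a mean and standard deviation over seeds and a paired t-statistic between a model and a baseline evaluated on the same seeds. The MSE values are placeholders invented for illustration, not results from the paper, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return mean(scores), stdev(scores)

def paired_t_statistic(model_scores, baseline_scores):
    """Paired t-statistic and degrees of freedom for matched per-seed scores."""
    if len(model_scores) != len(baseline_scores):
        raise ValueError("scores must be paired seed by seed")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired runs")
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

# Placeholder MSE values over five seeds (hypothetical numbers).
model_mse = [0.381, 0.379, 0.384, 0.380, 0.382]
baseline_mse = [0.392, 0.388, 0.395, 0.387, 0.393]
model_mean, model_std = summarize(model_mse)
t_stat, dof = paired_t_statistic(model_mse, baseline_mse)
```

The statistic is then compared against a Student t distribution with `dof` degrees of freedom. With twelve benchmarks and several horizons, a correction for multiple comparisons would also be appropriate.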

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment below and describe the revisions we intend to make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (Hypotheses formalization): The extension of classical Wold decomposition (originally for stationary linear processes) to non-stationary dynamical phase evolution is asserted but lacks an explicit derivation showing that the Phase Router’s trajectory-generation mechanism follows necessarily from the hypothesis rather than from an auxiliary generative modeling choice; this weakens the claim that the router supplies a parameter-free physics-informed bias.

    Authors: We agree that an explicit derivation would make the connection between the extended Wold hypothesis and the Phase Router more rigorous. The router’s trajectory generation is intended to follow directly from the dynamical phase evolution hypothesis, but the current text presents this link at a high level. In the revision we will add a step-by-step derivation in §3 (or a dedicated appendix) that starts from the non-stationary Wold-style decomposition and shows how the router’s phase-anchored prediction rule is obtained with only the minimal auxiliary assumptions required by the hypothesis. revision: yes

  2. Referee: [Experiments] Experiments section, main results table: The manuscript states SOTA or competitive performance across 12 benchmarks, yet the provided description supplies no quantitative error bars, statistical significance tests, or full ablation tables isolating the contribution of phase-anchored disentanglement, the Phase Router, and SAM; without these controls it is impossible to rule out that reported gains arise from post-hoc component tuning rather than the hypothesized inductive bias.

    Authors: We concur that the absence of error bars, statistical tests, and component-wise ablations limits the strength of the empirical claims. We will expand the Experiments section to report mean and standard deviation over multiple random seeds, include paired statistical significance tests against baselines, and provide full ablation tables that isolate phase-anchored disentanglement, the Phase Router, and SAM. These additions will be placed in the main text or a clearly referenced supplementary table. revision: yes

  3. Referee: [§4.2] §4.2 (Phase Router): The router is described as actively generating future trajectories, but the text does not clarify whether its outputs are produced by a parameter-free procedure derived from the phase-evolution hypothesis or by a learned module whose capacity dominates the performance; this distinction is load-bearing for the central “inductive bias over architecture” argument.

    Authors: The Phase Router is a learned module, yet its architecture and forward pass are deliberately constrained to implement the phase-evolution rule with a small number of parameters that do not dominate the overall model capacity. We will revise §4.2 to state this explicitly, report the exact parameter count of the router relative to the backbone, and include a short argument showing that performance remains competitive even when the router is replaced by a simpler non-learned phase extrapolation, thereby clarifying that the inductive bias, rather than raw capacity, drives the gains. revision: yes
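The parameter-count comparison promised in the third response amounts to simple bookkeeping for fully connected layers. The sketch below assumes both the router and the backbone are plain dense stacks with biases; the layer widths are hypothetical and chosen only to show the calculation.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected stack, e.g. [96, 256, 96]."""
    return sum(
        (n_in + 1) * n_out  # n_in weights and one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Hypothetical widths; the paper's actual configuration is not stated here.
router_params = dense_param_count([24, 32, 24])
backbone_params = dense_param_count([96, 512, 96])
router_share = router_params / (router_params + backbone_params)
```

Reporting `router_share` next to the ablation that swaps in a non-learned extrapolation would let readers judge whether capacity or inductive bias accounts for the gain.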

Circularity Check

0 steps flagged

No significant circularity in the derivation from physical hypotheses to PULSE components.

full rationale

The paper asserts that three physical hypotheses (Wold decomposition, dynamical phase evolution, heteroscedastic manifold generation) formalize non-stationary dynamics and inspire the Disentangle–Evolve–Simulate design of PULSE, including phase-anchored disentanglement, Phase Router, and SAM. No equations, self-definitions, or fitted-parameter renamings are supplied in the abstract that would make any claimed prediction or component equivalent to its inputs by construction. The central claim is empirical (MLP backbone reaches SOTA on 12 benchmarks), which is independent of the motivational hypotheses. No self-citations, uniqueness theorems, or ansatzes smuggled via prior work appear. The framework is therefore self-contained with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 2 invented entities

The central claim rests on three domain assumptions drawn from physics that are introduced without independent empirical or theoretical support in the abstract; no free parameters or new entities with external falsifiability handles are enumerated.

axioms (3)
  • domain assumption: Wold decomposition applies to non-stationary time series and separates them into deterministic and stochastic parts without loss of forecasting information.
    Invoked to formalize non-stationary dynamics in the abstract.
  • domain assumption: Dynamical phase evolution governs how global context shifts over time in non-stationary processes.
    Second of the three physical hypotheses used to inspire the framework.
  • domain assumption: Heteroscedastic manifold generation accurately models the creation of varying volatility patterns.
    Third hypothesis listed as foundational for the approach.
invented entities (2)
  • Phase Amnesia (no independent evidence)
    purpose: To name the failure mode in which models lose awareness of evolving global context.
    Term coined in the abstract to describe the core problem.
  • Phase Router (no independent evidence)
    purpose: To actively generate future phase trajectories.
    New architectural component introduced by the framework.
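
For reference, the classical Wold decomposition that the first axiom extends can be stated as follows. This is the textbook form for a zero-mean, covariance-stationary process; the symbols \psi_j, \varepsilon_t, and \eta_t are notation chosen here and are not taken from the paper.

```latex
X_t \;=\; \sum_{j=0}^{\infty} \psi_j \,\varepsilon_{t-j} \;+\; \eta_t,
\qquad \psi_0 = 1, \qquad \sum_{j=0}^{\infty} \psi_j^{2} < \infty ,
```

Here \varepsilon_t is an uncorrelated (white-noise) innovation sequence and \eta_t is a deterministic component that is perfectly linearly predictable from its own past and uncorrelated with \varepsilon_s for all s. The theorem assumes stationarity, which is why the referee asks for an explicit argument when the paper applies it to non-stationary series.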

pith-pipeline@v0.9.0 · 5745 in / 1709 out tokens · 52007 ms · 2026-05-21T07:57:15.068710+00:00 · methodology



Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

  1. Reversible instance normalization for accurate time-series forecasting against distribution shift. International Conference on Learning Representations.

  2. Exploring progress in multivariate time series forecasting: Comprehensive benchmarking and heterogeneity analysis. IEEE Transactions on Knowledge and Data Engineering.

  3. FreDF: Learning to Forecast in the Frequency Domain. The Thirteenth International Conference on Learning Representations.

  4. Adarnn: Adaptive learning and forecasting of time series. Proceedings of the 30th ACM International Conference on Information & Knowledge Management.

  5. Information geometry and its applications. 2016.

  6. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 1986.

  7. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 2016.

  8. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 1998.

  9. E. Oran Brigham. 1988.

  10. A study in the analysis of stationary time series. 1938.

  11. The spectral analysis of time series. 1995.

  12. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 1979.

  13. Dish-ts: a general paradigm for alleviating distribution shift in time series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence.

  14. Adaptive normalization for non-stationary time series forecasting: A temporal slice perspective. Advances in Neural Information Processing Systems.

  15. Frequency adaptive normalization for non-stationary time series forecasting. Advances in Neural Information Processing Systems.

  16. DDN: Dual-domain dynamic normalization for non-stationary time series forecasting. Advances in Neural Information Processing Systems.

  17. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. International Conference on Machine Learning, 2022.

  18. Micn: Multi-scale local and global context modeling for long-term series forecasting. The Eleventh International Conference on Learning Representations.

  19. Are transformers effective for time series forecasting? Proceedings of the AAAI Conference on Artificial Intelligence.

  20. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. The Twelfth International Conference on Learning Representations.

  21. Parsimony or capability? decomposition delivers both in long-term time series forecasting. Advances in Neural Information Processing Systems.

  22. TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis. ICLR 2025: The Thirteenth International Conference on Learning Representations.

  23. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. The Eleventh International Conference on Learning Representations.

  24. Periodicity decoupling framework for long-term series forecasting. The Twelfth International Conference on Learning Representations.

  25. Cyclenet: Enhancing time series forecasting through modeling periodic patterns. Advances in Neural Information Processing Systems.

  26. Temporal Query Network for Efficient Multivariate Time Series Forecasting. 2025.

  27. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. The Eleventh International Conference on Learning Representations.

  28. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. The Twelfth International Conference on Learning Representations.

  29. Msgnet: Learning multi-scale inter-series correlations for multivariate time series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence.

  30. Timexer: Empowering transformers for time series forecasting with exogenous variables. Advances in Neural Information Processing Systems.

  31. CFPT: Empowering Time Series Forecasting through Cross-Frequency Interaction and Periodic-Aware Timestamp Modeling. Forty-second International Conference on Machine Learning.

  32. Non-stationary transformers: Exploring the stationarity in time series forecasting. Advances in Neural Information Processing Systems.

  33. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. The Eleventh International Conference on Learning Representations.

  34. Time series analysis. 2007.

  35. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence.

  36. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems.

  37. Scinet: Time series modeling and forecasting with sample convolution and interaction. Advances in Neural Information Processing Systems.

  38. Vilfredo Pareto. Cours d'. 1964.

  39. Visualizing data using t-SNE. Journal of Machine Learning Research.

  40. Modeling long-and short-term temporal patterns with deep neural networks. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

  41. Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems.

  42. Adam: A Method for Stochastic Optimization. CoRR.

  43. Chronos: Learning the Language of Time Series. Transactions on Machine Learning Research.

  44. TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting. The Thirty-ninth Annual Conference on Neural Information Processing Systems.

  45. TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting. Forty-second International Conference on Machine Learning.

  46. Attention is all you need. Advances in Neural Information Processing Systems.

  47. mixup: Beyond Empirical Risk Minimization. International Conference on Learning Representations.

  48. Manifold mixup: Better representations by interpolating hidden states. International Conference on Machine Learning, 2019.

  49. PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting. The Fourteenth International Conference on Learning Representations.

  50. Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. Large. 2023.

  51. Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, and Bin Yang. Proc. VLDB Endow., May 2024. doi:10.14778/3665844.3665863