Neural CDEs as Correctors for Learned Time Series Models
Pith reviewed 2026-05-16 23:20 UTC · model grok-4.3
The pith
Neural controlled differential equations correct forecast errors accumulated by learned time-series models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Predictor-Corrector framework pairs a learned time-series model that generates multi-step forecasts with a Neural CDE corrector that mitigates error accumulation. The corrector operates on irregularly sampled data, remains compatible with both continuous- and discrete-time predictors, incorporates regularization that improves extrapolation and speeds up training, and is supported by stability and convergence guarantees.
What carries the argument
Neural CDE corrector that integrates the residual dynamics between predicted and observed states as a controlled differential equation to adjust the forecast trajectory.
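To make the mechanism concrete, a minimal self-contained sketch follows. It is not the paper's implementation: the hidden-state size, the piecewise-linear control path built from the Predictor's forecast, the explicit Euler integration, and all class and function names are illustrative assumptions; a real implementation would typically use a dedicated CDE library and an adaptive solver.

```python
# Hedged sketch of a Neural CDE corrector; architecture and names are illustrative.
import torch
import torch.nn as nn


class CDECorrector(nn.Module):
    """Integrates dz = f_theta(z) dX along a control path X built from the
    Predictor's forecast, then maps the hidden state to an additive correction."""

    def __init__(self, obs_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden_dim)
        # f_theta maps the hidden state to a (hidden_dim, obs_dim + 1) matrix so it
        # can be contracted against dX (forecast increments plus the time increment).
        self.vector_field = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.Tanh(),
            nn.Linear(64, hidden_dim * (obs_dim + 1)), nn.Tanh(),
        )
        self.readout = nn.Linear(hidden_dim, obs_dim)

    def forward(self, times: torch.Tensor, forecast: torch.Tensor) -> torch.Tensor:
        """times: (T,), forecast: (T, obs_dim). Returns a corrected forecast (T, obs_dim)."""
        obs_dim = forecast.shape[-1]
        # Control path: time stacked with the Predictor's forecast (piecewise linear).
        control = torch.cat([times.unsqueeze(-1), forecast], dim=-1)  # (T, obs_dim + 1)
        z = torch.tanh(self.embed(forecast[0]))
        corrected = [forecast[0] + self.readout(z)]
        for k in range(1, len(times)):
            dX = control[k] - control[k - 1]                  # (obs_dim + 1,)
            f = self.vector_field(z).view(-1, obs_dim + 1)    # (hidden_dim, obs_dim + 1)
            z = z + f @ dX                                    # explicit Euler step of the CDE
            corrected.append(forecast[k] + self.readout(z))
        return torch.stack(corrected)


if __name__ == "__main__":
    T, d = 50, 3
    times = torch.linspace(0.0, 5.0, T)
    predictor_forecast = torch.randn(T, d).cumsum(0) * 0.1  # stand-in for a learned Predictor
    corrected = CDECorrector(obs_dim=d)(times, predictor_forecast)
    print(corrected.shape)  # torch.Size([50, 3])
```

The design choice mirrored here is that the Corrector never inspects the Predictor's internals: it only reads the forecast trajectory as a control signal and emits an additive correction, which is what makes the framework predictor-agnostic.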
If this is right
- Forecasting accuracy improves consistently across diverse base models without requiring predictor-specific modifications.
- The framework handles irregularly sampled observations while preserving compatibility with both continuous and discrete predictors.
- Regularization yields stable extrapolation beyond the training horizon.
- Theoretical guarantees ensure the combined system remains stable and convergent.
Where Pith is reading between the lines
- The same corrector structure could be attached to existing deployed forecasting pipelines to extend usable forecast length without retraining the base model.
- Hybrid systems that combine the corrector with physics-based simulators may reduce the need for purely data-driven long-horizon modeling.
- The approach suggests a general template for adding learned residual dynamics to any sequential predictor that suffers from compounding error.
Load-bearing premise
Once regularized and trained, the Neural CDE corrector will continue to reduce errors on data drawn from distributions different from the training set without introducing new instabilities.
What would settle it
On a new dataset with dynamics outside the training distribution, compare long-horizon forecast error of the base predictor alone against the same predictor paired with the trained Neural CDE corrector; if the corrected version shows equal or higher error, the framework does not deliver the claimed improvement.
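A hedged sketch of that comparison is below. It assumes access to a trained Predictor, a trained Corrector, and ground truth from the shifted distribution; the function names and the plain MSE criterion are illustrative choices, not the paper's evaluation code.

```python
# Hedged sketch of the falsification test described above; names are illustrative.
import torch


def long_horizon_mse(pred: torch.Tensor, truth: torch.Tensor) -> float:
    """Mean squared error accumulated over the full forecast horizon."""
    return torch.mean((pred - truth) ** 2).item()


def settle_it(predictor, corrector, times, y_true) -> bool:
    """Compare the base Predictor against Predictor + Corrector on shifted data.

    predictor(times) -> (T, d) forecast; corrector(times, forecast) -> (T, d).
    Returns True only if the corrected forecast has strictly lower long-horizon MSE.
    """
    with torch.no_grad():
        forecast = predictor(times)
        corrected = corrector(times, forecast)
    base_err = long_horizon_mse(forecast, y_true)
    corr_err = long_horizon_mse(corrected, y_true)
    print(f"base MSE = {base_err:.4f}, corrected MSE = {corr_err:.4f}")
    return corr_err < base_err
```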
Original abstract
Learned time-series models, whether continuous or discrete, are widely used for forecasting the states of dynamical systems but suffer from error accumulation in multi-step forecasts. To address this issue, we propose a Predictor-Corrector framework in which the Predictor is a learned time-series model that generates multi-step forecasts and the Corrector is a neural controlled differential equation that corrects the forecast errors. The Corrector works with irregularly sampled time series and is compatible with both continuous- and discrete-time Predictors. We further introduce two regularization strategies that improve the Corrector's extrapolation performance and accelerate its training. We also provide theoretical guarantees on the stability and convergence of the proposed framework. Experiments on synthetic, physics-based, and real-world datasets show that the proposed framework consistently improves forecasting performance across diverse Predictors, including neural ordinary differential equations, ContiFormer, and DLinear, demonstrating its predictor-agnostic nature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Predictor-Corrector framework for multi-step time series forecasting. A learned model serves as the Predictor to generate forecasts, while a Neural Controlled Differential Equation acts as the Corrector to mitigate accumulated errors. The approach handles irregularly sampled data and is compatible with both continuous- and discrete-time predictors. Two regularization strategies are introduced to improve extrapolation and training efficiency. Theoretical guarantees on stability and convergence are provided, and experiments on synthetic, physics-based, and real-world datasets demonstrate consistent forecasting improvements across predictors including Neural ODEs, ContiFormer, and DLinear, establishing the framework's predictor-agnostic nature.
Significance. If the stability guarantees and empirical gains hold under the stated conditions, the work offers a general, modular method to reduce error accumulation in learned dynamical models without retraining or altering the base predictor. This could meaningfully improve reliability for long-horizon forecasting in scientific and engineering applications, with the combination of theory and broad empirical validation across model classes adding to its potential utility.
major comments (2)
- [Theoretical Guarantees] Theoretical Guarantees section: The stability and convergence claims rest on the regularization ensuring the Neural CDE corrector remains well-behaved, yet the analysis does not explicitly address whether these bounds continue to hold for long-horizon forecasts when the predictor's error dynamics deviate from the training distribution (as required for the predictor-agnostic claim). A concrete counter-example or extended proof under distribution shift would strengthen this load-bearing point.
- [Experiments] Experiments section (synthetic/physics/real-world results): While consistent gains are reported across predictors, the evaluation does not include targeted long-horizon tests under strong distribution shift; the regularization's effectiveness in preventing new instabilities therefore remains only partially verified, directly impacting the central robustness claim.
minor comments (2)
- [Abstract] Abstract: The two regularization strategies are referenced but not named or briefly characterized; adding one sentence describing their form (e.g., Lipschitz penalty or boundedness term) would improve immediate readability.
- [Method] Notation and setup: The interface between the Predictor output and the Neural CDE control signal should be formalized with an explicit equation early in the manuscript to avoid ambiguity when readers compare to standard Neural CDE formulations.
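For concreteness, one way such an interface equation could be written, following the standard Neural CDE formulation, is sketched below; the symbols, the initial-state map, and the additive-correction readout are assumptions about notation, not taken from the manuscript.

```latex
% Hedged sketch of the Predictor-Corrector interface in standard Neural CDE form.
z_{t_0} = \zeta_\theta\big(\hat{y}_{t_0}\big), \qquad
z_t = z_{t_0} + \int_{t_0}^{t} f_\theta(z_s)\, \mathrm{d}X_s, \qquad
y^{\mathrm{corr}}_t = \hat{y}_t + g_\theta(z_t)
```

Here $\hat{y}$ denotes the Predictor's forecast, $X$ is a continuous interpolation (for example a cubic spline) of $(t, \hat{y}_t)$ at the possibly irregular observation times, and $g_\theta$ maps the CDE hidden state to an additive correction.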
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our Predictor-Corrector framework. We address each major point below, clarifying the scope of our theoretical results and outlining planned revisions to strengthen the empirical validation.
Point-by-point responses
- Referee: [Theoretical Guarantees] Theoretical Guarantees section: The stability and convergence claims rest on the regularization ensuring the Neural CDE corrector remains well-behaved, yet the analysis does not explicitly address whether these bounds continue to hold for long-horizon forecasts when the predictor's error dynamics deviate from the training distribution (as required for the predictor-agnostic claim). A concrete counter-example or extended proof under distribution shift would strengthen this load-bearing point.
Authors: Our stability and convergence analysis (Section 4) derives bounds under the assumption that the Neural CDE corrector, regularized for Lipschitz continuity and contraction, keeps the corrected trajectory within a neighborhood where the error dynamics remain controlled. The predictor-agnostic claim holds in the sense that the corrector operates on the observed error signal without requiring knowledge of the predictor's internal structure, provided the regularization prevents instability. We agree that explicit treatment of strong distribution shifts for arbitrarily long horizons would benefit from additional discussion of the assumptions. In the revision we will add a paragraph clarifying these conditions and their relation to the empirical robustness observed across predictors. (Revision: partial)
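As one concrete illustration of a regularizer of the kind this response alludes to, a spectral-norm penalty on the vector field's weight matrices bounds its Lipschitz constant. This is a plausible form only; it is not claimed to be either of the manuscript's two regularization strategies.

```python
# Hedged sketch of a Lipschitz-style penalty on a Neural CDE vector field.
import torch
import torch.nn as nn


def lipschitz_penalty(vector_field: nn.Module, weight: float = 1e-3) -> torch.Tensor:
    """Sum of squared spectral norms of all Linear layers in the vector field."""
    penalty = torch.zeros(())
    for module in vector_field.modules():
        if isinstance(module, nn.Linear):
            sigma = torch.linalg.matrix_norm(module.weight, ord=2)  # largest singular value
            penalty = penalty + sigma ** 2
    return weight * penalty

# Usage (hypothetical): loss = forecast_mse + lipschitz_penalty(corrector.vector_field)
```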
- Referee: [Experiments] Experiments section (synthetic/physics/real-world results): While consistent gains are reported across predictors, the evaluation does not include targeted long-horizon tests under strong distribution shift; the regularization's effectiveness in preventing new instabilities therefore remains only partially verified, directly impacting the central robustness claim.
Authors: The current experiments already span multiple horizons on synthetic, physics, and real-world data with varying noise and sampling irregularities, showing consistent gains. We acknowledge that dedicated long-horizon tests under controlled strong distribution shifts (e.g., predictors trained on disjoint regimes) would provide more direct verification of the regularization. In the revised manuscript we will include such targeted experiments, using increased noise levels and out-of-distribution predictor variants, to further substantiate the robustness claims. (Revision: yes)
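A hedged sketch of how such a shifted test split could be constructed is below; the noise level, subsampling rate, and additive-noise form of the shift are illustrative assumptions, not the authors' revised experimental protocol.

```python
# Hedged sketch of a distribution-shift stress split; parameters are illustrative.
import torch


def make_shifted_split(times: torch.Tensor, y: torch.Tensor,
                       noise_std: float = 0.1, keep_frac: float = 0.6):
    """Corrupt a clean trajectory into an out-of-distribution, irregularly sampled
    test split: add observation noise and drop a random subset of timesteps."""
    noisy = y + noise_std * torch.randn_like(y)
    keep = torch.rand(len(times)) < keep_frac
    keep[0] = True  # always keep the initial condition
    return times[keep], noisy[keep]

# Usage (hypothetical): evaluate the Predictor alone and with the Corrector on the
# split returned here, sweeping noise_std to probe increasingly strong shifts.
```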
Circularity Check
Predictor-Corrector framework adds independent Neural CDE corrector without definitional circularity
full rationale
The paper proposes a separate Corrector (a Neural CDE) trained to mitigate error accumulation from an existing Predictor (a learned time-series model). No equation reduces the claimed forecasting improvements, stability guarantees, or regularization benefits to quantities defined by the same fitted parameters. The framework is explicitly predictor-agnostic and introduces external components rather than re-expressing the Predictor's outputs. The theoretical guarantees and the experiments on diverse datasets provide independent content; the only residual circularity risk is the minor, routine one of building on standard Neural CDE foundations.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Neural CDEs can be trained to act as stable correctors for arbitrary learned predictors.