DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting

Guoqing Wang; Pengwei Yang; Tianyu Li; Yeran Zhang

arxiv: 2604.07393 · v3 · pith:3YKV7I2Hnew · submitted 2026-04-08 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-21 09:54 UTC · model grok-4.3

keywords industrial time series forecasting · physics-guided neural networks · dynamic graphs · regime shifts · transport delays · residual learning · physical plausibility · trustworthy forecasting

The pith

Dual-stream networks separate stable patterns from physics-guided residuals to forecast industrial time series with high physical consistency under regime shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DSPR to handle forecasting in industrial settings where operating conditions change and physical rules must hold. It splits the model into two streams. The first captures the ordinary statistical evolution of each variable over time. The second isolates the leftover (residual) dynamics, using an adaptive window to detect transport delays and a physics-guided graph to track changing interactions between variables. This separation aims to cut spurious links while keeping conservation laws intact. A sympathetic reader would care because forecasts that ignore delays or physical structure can lead to unsafe control decisions in real industrial systems.

Core claim

The central claim is that explicitly decoupling the statistical temporal evolution of individual variables from regime-dependent residual dynamics produces state-of-the-art forecasting accuracy and robustness on four industrial benchmarks, with Mean Conservation Accuracy above 99 percent and Total Variation Ratio up to 97.2 percent. The decoupling is implemented through an Adaptive Window module that estimates flow-dependent transport delays and a Physics-Guided Dynamic Graph that uses physical priors to learn time-varying interaction structures while suppressing spurious correlations.

What carries the argument

The dual-stream architecture in which the physics-residual stream uses an Adaptive Window to estimate transport delays and a Physics-Guided Dynamic Graph to model time-varying physical interactions from priors.
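
As a rough illustration of how two streams might be combined, the sketch below adds a gated residual forecast to a statistical forecast. Everything in it is an assumption made for illustration: the function names, the moving-average stand-in for the statistical stream, the fixed `delay` and `coupling` (which DSPR would learn), and the additive gate `alpha`. Starting the gate at zero mirrors the gating initialization of 0 quoted from the paper's appendix, but the paper's actual gating mechanism is not described on this page.

```python
def trend_forecast(history, horizon, window=4):
    """Stand-in for the statistical stream: hold the recent mean level."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def residual_forecast(driver, delay, horizon, coupling=0.5):
    """Stand-in for the residual stream: a delayed, scaled driver signal.

    The value forecast for step h depends on the driver observed `delay`
    steps earlier, so only already-observed driver values are used.
    """
    if not horizon <= delay <= len(driver):
        raise ValueError("need horizon <= delay <= len(driver)")
    start = len(driver) - delay
    return [coupling * driver[start + h] for h in range(horizon)]


def dual_stream_forecast(history, driver, delay, horizon, alpha=0.0):
    """Combine both streams; alpha = 0 reduces to the statistical stream."""
    trend = trend_forecast(history, horizon)
    resid = residual_forecast(driver, delay, horizon)
    return [t + alpha * r for t, r in zip(trend, resid)]


history = [1.0, 2.0, 3.0, 4.0]   # target variable, observed so far
driver = [0.0, 0.0, 2.0, 4.0]    # hypothetical upstream variable, observed so far
print(dual_stream_forecast(history, driver, delay=2, horizon=2))             # gate closed
print(dual_stream_forecast(history, driver, delay=2, horizon=2, alpha=1.0))  # gate open
```

With the gate closed the output is the statistical forecast alone, so any benefit attributed to the residual stream has to show up as the gate opens.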

Load-bearing premise

The approach assumes that physical priors can be turned into a dynamic graph that accurately learns real interaction structures and transport delays from data without adding new errors or biases.

What would settle it

If retraining DSPR on the same four industrial benchmarks yields conservation accuracy below 95 percent or higher forecast error than a standard recurrent network on at least two regimes, the decoupling benefit would be refuted.

Figures

Figures reproduced from arXiv: 2604.07393 by Guoqing Wang, Pengwei Yang, Tianyu Li, Yeran Zhang.

**Figure 2.** Overview of the proposed DSPR framework. a) Overall Architecture: Decouples dynamics into statistical patterns and...

**Figure 3.** Regime adaptation visualization (𝐿 = 24, 𝐻 = 24). Under High-Load transients (c), statistical baselines (TimeMixer, PatchTST) exhibit significant phase lag. DSPR (red) aligns tightly with ground truth, demonstrating that the Physics-Residual stream successfully adapts effective transport delays

**Figure 4.** Fidelity validation on SCR dataset. DSPR (red) main...

**Figure 5.** Mechanism recovery in SCR. Distributions of...

**Figure 6.** Mechanism identification map. In the SDWPF tur...

**Figure 7.** Closed-loop response comparison over 4-hour win...

Original abstract

Accurate forecasting of industrial time series requires balancing predictive accuracy with physical plausibility under non-stationary operating conditions. Existing data-driven models often achieve strong statistical performance but struggle to respect regime-dependent interaction structures and transport delays inherent in real-world systems. To address this challenge, we propose DSPR (Dual-Stream Physics-Residual Networks), a forecasting framework that explicitly decouples stable temporal patterns from regime-dependent residual dynamics. The first stream models the statistical temporal evolution of individual variables. The second stream focuses on residual dynamics through two key mechanisms: an Adaptive Window module that estimates flow-dependent transport delays, and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures while suppressing spurious correlations. Experiments on four industrial benchmarks spanning heterogeneous regimes demonstrate that DSPR consistently improves forecasting accuracy and robustness under regime shifts while maintaining strong physical plausibility. It achieves state-of-the-art predictive performance, with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio reaching up to 97.2%. Beyond forecasting, the learned interaction structures and adaptive lags provide interpretable insights that are consistent with known domain mechanisms, such as flow-dependent transport delays and wind-to-power scaling behaviors. These results suggest that architectural decoupling with physics-consistent inductive biases offers an effective path toward trustworthy industrial time-series forecasting. Furthermore, DSPR's demonstrated robust performance in long-term industrial deployment bridges the gap between advanced forecasting models and trustworthy autonomous control systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DSPR splits forecasting into a stable statistical stream and a residual stream with an adaptive window plus physics-guided dynamic graph, but the high plausibility scores do not directly confirm those modules recover real physical structures.

The main thing to know is that this paper puts forward a dual-stream architecture for industrial time series forecasting that keeps a standard statistical stream separate from a residual stream equipped with an adaptive window for transport delays and a physics-guided dynamic graph for time-varying interactions. It reports good accuracy and very high physical plausibility scores on industrial benchmarks, but the evidence that the new modules are truly embedding physical priors is indirect at best.

Referee Report

3 major / 2 minor

Summary. The paper introduces DSPR, a dual-stream architecture for industrial time series forecasting. The first stream models statistical temporal evolution of variables, while the second stream addresses residual dynamics via an Adaptive Window that estimates flow-dependent transport delays and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures and suppress spurious correlations. Experiments on four industrial benchmarks under heterogeneous regimes claim state-of-the-art predictive performance, robustness to regime shifts, and high physical plausibility with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio up to 97.2%. The learned structures are said to yield interpretable insights consistent with known domain mechanisms such as flow-dependent delays and wind-to-power scaling.

Significance. If the results and physical consistency hold, the work could meaningfully advance trustworthy forecasting in industrial settings by combining data-driven accuracy with physics-based inductive biases. The explicit decoupling of streams and the focus on regime-dependent residuals address real limitations of black-box models in non-stationary environments, with potential benefits for long-term deployment and autonomous control systems. The interpretability of learned interactions is a positive feature if independently validated.

major comments (3)

[Experiments] The reported Mean Conservation Accuracy (>99%) and Total Variation Ratio (up to 97.2%) are aggregate plausibility scores, but the manuscript provides no ground-truth interaction matrices, known physical delay values, or controlled ablation isolating the Physics-Guided Dynamic Graph and Adaptive Window from the dual-stream architecture alone. This leaves open whether the metrics validate the physics components or can be satisfied without them matching actual mechanisms.
[§3.2, Physics-Guided Dynamic Graph] The mechanism for incorporating physical priors to learn time-varying structures while suppressing spurious correlations is described conceptually but lacks an explicit formulation showing how priors are enforced independently of the conservation metrics. Without this or an ablation demonstrating its isolated contribution, the claim that it reliably embeds physical priors rests on indirect evidence.
[Adaptive Window module] The assertion that this component accurately recovers flow-dependent transport delays from data alone is central to the residual stream but is supported only by overall forecasting metrics rather than direct recovery tests against known physical lags in the benchmarks.

minor comments (2)

[Abstract] The abstract would benefit from briefly naming the four industrial benchmarks and the specific baseline models used for comparison.
[Methods] Notation for the dynamic graph weights and adaptive window parameters should be introduced with explicit equations in the methods section for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the insightful comments on our work. We address each of the major comments in detail below, providing clarifications and indicating where we will make revisions to strengthen the manuscript.

Referee: Experiments section: The reported Mean Conservation Accuracy (>99%) and Total Variation Ratio (up to 97.2%) are aggregate plausibility scores, but the manuscript provides no ground-truth interaction matrices, known physical delay values, or controlled ablation isolating the Physics-Guided Dynamic Graph and Adaptive Window from the dual-stream architecture alone. This leaves open whether the metrics validate the physics components or can be satisfied without them matching actual mechanisms.

Authors: We appreciate this observation. The industrial benchmarks used in our experiments are real-world datasets that do not come with ground-truth interaction matrices or precise physical delay annotations. To demonstrate the contribution of the physics components, we include ablation studies in Section 4.3 comparing the full DSPR model against variants without the Physics-Guided Dynamic Graph and without the Adaptive Window. These results show that removing these components leads to degraded performance in both forecasting accuracy and plausibility metrics. Additionally, we provide qualitative analysis showing that the learned structures align with domain knowledge. In the revised manuscript, we will expand the ablation studies with more controlled experiments and include a new subsection on synthetic data with known ground-truth to further validate the physics modules.

revision: partial

Referee: §3.2 (Physics-Guided Dynamic Graph): The mechanism for incorporating physical priors to learn time-varying structures while suppressing spurious correlations is described conceptually but lacks an explicit formulation showing how priors are enforced independently of the conservation metrics. Without this or an ablation demonstrating its isolated contribution, the claim that it reliably embeds physical priors rests on indirect evidence.

Authors: We agree that a more explicit formulation would enhance clarity. The Physics-Guided Dynamic Graph incorporates physical priors through a combination of graph regularization terms and constraints derived from conservation laws, which are enforced via additional loss components separate from the main conservation accuracy metric. We will add the mathematical formulation in the revised §3.2, detailing the prior enforcement mechanism. We will also include a dedicated ablation study isolating this module's contribution to the overall performance.

revision: yes

Referee: Adaptive Window module: The assertion that this component accurately recovers flow-dependent transport delays from data alone is central to the residual stream but is supported only by overall forecasting metrics rather than direct recovery tests against known physical lags in the benchmarks.

Authors: We acknowledge the need for more direct validation. Since the benchmarks lack explicit known physical lags, direct recovery tests against ground truth are not feasible for these datasets. However, we support the claim through case studies where the estimated delays correspond to expected physical behaviors, such as longer delays at lower flow rates. To provide stronger evidence, we will add experiments using synthetic time series with injected known transport delays to directly evaluate the Adaptive Window's recovery accuracy.

revision: partial
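
A synthetic recovery test of the kind proposed in the last response could look roughly like the sketch below: inject a known lag between two series, then check whether an estimator finds it. The estimator here is plain lagged Pearson correlation, chosen only as a simple reference point. It is not the Adaptive Window module, and the function names, noise level, and lag values are illustrative assumptions.

```python
import math
import random


def make_delayed_pair(n, true_lag, noise=0.1, seed=0):
    """Upstream white noise and a downstream copy delayed by true_lag steps."""
    rng = random.Random(seed)
    upstream = [rng.gauss(0.0, 1.0) for _ in range(n)]
    downstream = []
    for t in range(n):
        signal = upstream[t - true_lag] if t >= true_lag else 0.0
        downstream.append(signal + rng.gauss(0.0, noise))
    return upstream, downstream


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def estimate_lag(upstream, downstream, max_lag):
    """Return the lag in [0, max_lag] with the highest correlation."""
    n = len(upstream)
    scores = [
        pearson(upstream[: n - lag], downstream[lag:])
        for lag in range(max_lag + 1)
    ]
    return max(range(max_lag + 1), key=lambda lag: scores[lag])


# Two hypothetical regimes: a short delay (high flow) and a long one (low flow).
for regime, true_lag in [("high flow", 3), ("low flow", 9)]:
    up, down = make_delayed_pair(n=500, true_lag=true_lag)
    estimate = estimate_lag(up, down, max_lag=20)
    print(regime, "true lag:", true_lag, "estimated:", estimate)
```

On real benchmarks without annotated lags only the estimation step is available, which is why a synthetic check with a known answer is the more direct form of evidence.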

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained with empirical support

The abstract describes a dual-stream architecture incorporating an Adaptive Window and Physics-Guided Dynamic Graph to model residuals and interactions, with results reported via custom metrics on four benchmarks. No equations, definitions, or self-citations are provided that reduce the claimed predictions or plausibility scores to inputs by construction. The metrics (Mean Conservation Accuracy, Total Variation Ratio) are presented as evaluation outcomes rather than tautological redefinitions of the model's inductive biases. The central claims rest on experimental improvements under regime shifts, which are falsifiable against external data and do not rely on load-bearing self-citations or ansatzes imported from prior author work. This is the expected outcome for an architecture paper whose value is in the proposed decoupling rather than a closed mathematical derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on several learned neural components and domain assumptions about physical priors being sufficient to guide graph structure without additional validation.

free parameters (2)

Adaptive Window parameters: parameters estimating flow-dependent transport delays, learned from data.
Dynamic graph weights: weights in the Physics-Guided Dynamic Graph, fitted during training.

axioms (1)

domain assumption: Physical priors can be incorporated via a dynamic graph to learn time-varying interaction structures while suppressing spurious correlations. Invoked in the description of the Physics-Guided Dynamic Graph module.
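
One simple way to picture this assumption is a soft penalty on graph edges that a physical prior does not allow. The sketch below is a hypothetical illustration, not DSPR's formulation, which this page does not give: `row_softmax`, `off_prior_penalty`, the binary `prior` matrix, and the example scores are all invented here. The default weight of 1e-4 is borrowed from the sparsity loss weight quoted from the paper's appendix, purely as an example value.

```python
import math


def row_softmax(scores):
    """Turn raw edge scores into a row-normalised adjacency matrix."""
    adjacency = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        adjacency.append([e / total for e in exps])
    return adjacency


def off_prior_penalty(adjacency, prior, weight=1e-4):
    """Penalise attention mass placed on edges the prior marks as disallowed."""
    mass = sum(
        a
        for adj_row, prior_row in zip(adjacency, prior)
        for a, allowed in zip(adj_row, prior_row)
        if not allowed
    )
    return weight * mass


# prior[i][j] == 1 means "variable j may physically influence variable i".
prior = [[1, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]

uninformed = [[0.0, 0.0, 0.0] for _ in range(3)]   # uniform attention
informed = [[0.0 if allowed else -10.0 for allowed in row] for row in prior]

for name, scores in [("uninformed", uninformed), ("informed", informed)]:
    adjacency = row_softmax(scores)
    print(name, off_prior_penalty(adjacency, prior))
```

In a time-varying setting the scores would be recomputed at each step from the current inputs, so the adjacency can change with the regime while the prior stays fixed. Whether such a penalty introduces its own bias is exactly the question the ledger entry raises.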

pith-pipeline@v0.9.0 · 5791 in / 1263 out tokens · 45253 ms · 2026-05-21T09:54:55.867345+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

[1] Tom Beucler, Michael Pritchard, Stephan Rasp, Jordan Ott, Pierre Baldi, and Pierre Gentine. 2021. Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems. Physical Review Letters 126, 9 (2021). http://dx.doi.org/10.1103/PhysRevLett.126.098302

[2] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. 2016. Sparse Identification of Nonlinear Dynamics with Control (SINDYc). IFAC-PapersOnLine 49, 18 (2016), 710–715. doi:10.1016/j.ifacol.2016.10.249. 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016.

[3] Shengze Cai, Zhiping Mao, Zhicheng Wang, Minglang Yin, and George Em Karniadakis. 2021. Physics-informed neural networks (PINNs) for fluid mechanics: A review. arXiv:2105.09506 [physics.flu-dyn] https://arxiv.org/abs/2105.09506

[4] Wanlin Cai, Yuxuan Liang, Xianggen Liu, Jianshuai Feng, and Yuankai Wu. 2023. MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting. arXiv preprint arXiv:2401.00423 (2023).

[5] E.F. Camacho and C. Bordons. 2007. Model Predictive Control (2nd ed.). Springer.

[6] Peng Han, Jin Wang, Di Yao, Shuo Shang, and Xiangliang Zhang. 2021. A Graph-based Approach for Trajectory Similarity Computation in Spatial Networks. In KDD ’21. 556–564. https://doi.org/10.1145/3447548.3467337

[7] Yifan Hu, Guibin Zhang, Peiyuan Liu, Disen Lan, Naiqi Li, Dawei Cheng, Tao Dai, Shu-Tao Xia, and Shirui Pan. 2025. TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting. In Forty-second International Conference on Machine Learning. https://openreview.net/forum?id=490VcNtjh7

[8] George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. Physics-informed machine learning. Nature Reviews Physics 3, 6 (2021), 422–440.

[9] Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. 2025. Deep learning for time series forecasting: a survey. International Journal of Machine Learning and Cybernetics 16, 7 (2025), 5079–5112.

[10] Nathan P. Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, and R. Bhushan Gopaluni. 2024. Machine learning for industrial sensing and control: A survey and practical perspective. Control Engineering Practice 145 (2024), 105841.

[11] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations (ICLR ’18).

[12] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. 2023. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv preprint arXiv:2310.06625 (2023).

[13] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. 2021. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3, 3 (2021), 218–229.

[14] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations.

[15] M. Hashem Pesaran and Allan Timmermann. 1992. A Simple Nonparametric Test of Predictive Performance. Journal of Business & Economic Statistics 10, 4 (1992), 461–465. http://www.jstor.org/stable/1391822

[16] S. Joe Qin and Thomas A. Badgwell. 2003. A survey of industrial model predictive control technology. Control Engineering Practice 11, 7 (2003), 733–764.

[17] Abdur Rahman and Md Mahmudul Hasan. 2017. Modeling and Forecasting of Carbon Dioxide Emissions in Bangladesh Using Autoregressive Integrated Moving Average (ARIMA) Models. Open Journal of Statistics 7, 4 (July 2017), 560–566. doi:10.4236/ojs.2017.74038

[18] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378 (2019), 686–707.

[19] Cory A. Rieth, Ben D. Amsel, Randy Tran, and Maia B. Cook. 2017. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation. doi:10.7910/DVN/6C3JR1

[20] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. 1992. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60, 1 (1992), 259–268. https://www.sciencedirect.com/science/article/pii/016727899290242F

[21] Christopher A. Sims. 1980. Macroeconomics and Reality. Econometrica 48, 1 (1980), 1–48. http://www.jstor.org/stable/1912017

[22] Z. Skaf, T. Aliyev, L. Shead, and T. Steffen. 2014. The State of the Art in Selective Catalytic Reduction Control. In SAE 2014 World Congress and Exhibition.

[23] Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. 2024. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In The Twelfth International Conference on Learning Representations.

[24] Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, and Vipin Kumar. 2022. Integrating scientific knowledge with machine learning for engineering and environmental systems. Comput. Surveys 55, 4 (2022), 1–37.

[25] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations.

[26] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems.

[27] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

[28] Z. Yang, P. Liu, W. Zhou, and Q. Wang. 2022. Deep learning-enhanced NMPC for DeNOx systems. IEEE Transactions on Control Systems Technology 30, 2 (2022), 589–603.

[29] Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). AAAI Press, 3634–3640.

[30] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, Vol. 35. 11106–11115.

[31] Jingbo Zhou, Xinjiang Lu, Yixiong Xiao, Jiantao Su, Junfu Lyu, Yanjun Ma, and Dejing Dou. 2022. SDWPF: A Dataset for Spatial Dynamic Wind Power Forecasting Challenge at KDD Cup 2022. arXiv preprint arXiv:2208.04360 (2022).

[32] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proc. 39th International Conference on Machine Learning (ICML 2022) (Baltimore, Maryland).

A Implementation Details

A.1 Dataset Descriptions

To comprehensively evaluate DSPR across diverse phy...

Classical Methods. Linear MPC (ARX) [16]: The industrial standard ARX model for process control, serving as a robustness baseline limited by linearity.

Transformer Variants. Informer [30]: Uses ProbSparse attention for efficient long-sequence forecasting. PatchTST [14]: Applies channel-independent patching to capture local semantics. iTransformer [12]: Inverts attention to embed variates as tokens for multivariate correlations. TimeMixer [23]: Uses multi-scale MLP mixing. Note: This serves as our Trend St...

CNN-based Methods. TimesNet [25]: Transforms 1D series into 2D tensors to apply convolutions for intra- and inter-period variations.

Spectral & Graph Methods. MSGNet [4]: Leverages frequency-domain graph convolutions for multi-scale inter-series correlations. TimeFilter [7]: Uses learnable frequency filters to decompose temporal dynamics efficiently.

Physics-Informed Methods. Physics-Guided NN (PG-NN): To compare "loss-level" vs. "architecture-level" integration, we augment the TimeMixer with a soft physical regularization term. The total loss is $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{MSE}} + \lambda_{\text{phy}} \lVert \hat{y} - f_{\text{cons}}(x) \rVert_2^2$, where $f_{\text{cons}}(\cdot)$ represents conservation laws and $\lambda_{\text{phy}}$ balances data fit with physical consistency.

A.3 Experimenta...

The Physics-Residual Stream uses $d_{\text{emb}} = 64$, adaptive window range $\omega_{t,c} \in [0, 20]$, and gating initialization $\alpha_{\text{init}} = 0$. Loss weights are set to $\lambda_{\text{phys}} = 10^{-2}$ and $\lambda_{\text{sparse}} = 10^{-4}$. Baseline models were reproduced following the Time-Series Library framework (https://github.com/thuml/Time-Series-Library). To facilitate reproducibility, the complete DSPR imp...
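
The loss-level PG-NN baseline described above adds a squared penalty between the prediction and a conservation-law output to the MSE. A minimal sketch of that total loss follows, assuming a toy mass balance (outflow equals the sum of two inflows) for `f_cons`. The function names, the toy law, and the weight 0.01 are illustrative assumptions rather than the paper's settings.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def f_cons(x):
    """Hypothetical conservation law: outflow = inflow_a + inflow_b."""
    return [a + b for a, b in x]


def pgnn_loss(pred, target, x, lam_phy=0.01):
    """L_total = L_MSE + lam_phy * ||pred - f_cons(x)||_2^2."""
    physics = sum((p - c) ** 2 for p, c in zip(pred, f_cons(x)))
    return mse(pred, target) + lam_phy * physics


x = [(1.0, 2.0), (2.0, 2.0)]   # two time steps of inflows
target = [3.0, 4.0]            # measured outflow
pred = [3.0, 5.0]              # model output, wrong at the second step
print(pgnn_loss(pred, target, x))
```

Because the penalty is soft, a prediction can still violate the conservation law when doing so lowers the data term enough. That is the contrast with architecture-level integration that the baseline is meant to expose.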

[1] [1]

Tom Beucler, Michael Pritchard, Stephan Rasp, Jordan Ott, Pierre Baldi, and Pierre Gentine. 2021. Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems.Physical Review Letters126, 9 (2021). http://dx.doi.org/10. 1103/PhysRevLett.126.098302

work page 2021

[2] [2]

Brunton, Joshua L

Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. 2016. Sparse Identi- fication of Nonlinear Dynamics with Control (SINDYc).IFAC-PapersOnLine49, 18 (2016), 710–715. doi:10.1016/j.ifacol.2016.10.249 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016

work page doi:10.1016/j.ifacol.2016.10.249 2016

[3] [3]

Shengze Cai, Zhiping Mao, Zhicheng Wang, Minglang Yin, and George Em Karniadakis. 2021. Physics-informed neural networks (PINNs) for fluid mechanics: A review. arXiv:2105.09506 [physics.flu-dyn] https://arxiv.org/abs/2105.09506

work page arXiv 2021

[4] [4]

Wanlin Cai, Yuxuan Liang, Xianggen Liu, Jianshuai Feng, and Yuankai Wu. 2023. MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting.arXiv preprint arXiv:2401.00423(2023)

work page arXiv 2023

[5] [5]

Camacho and C

E.F. Camacho and C. Bordons. 2007.Model Predictive Control(2nd ed.). Springer

work page 2007

[6] [6]

Peng Han, Jin Wang, Di Yao, Shuo Shang, and Xiangliang Zhang. 2021. InA Graph-based Approach for Trajectory Similarity Computation in Spatial Networks (KDD ’21). 556–564. https://doi.org/10.1145/3447548.3467337

work page doi:10.1145/3447548.3467337 2021

[7] [7]

Yifan Hu, Guibin Zhang, Peiyuan Liu, Disen Lan, Naiqi Li, Dawei Cheng, Tao Dai, Shu-Tao Xia, and Shirui Pan. 2025. TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting. InForty-second International Con- ference on Machine Learning. https://openreview.net/forum?id=490VcNtjh7

work page 2025

[8] [8]

Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang

George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. Physics-informed machine learning.Nature Reviews Physics3, 6 (2021), 422–440

work page 2021

[9] [9]

Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. 2025. Deep learning for time series forecasting: a survey.International Journal of Machine Learning and Cybernetics16, 7 (2025), 5079–5112

work page 2025

[10] [10]

Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, and R

Nathan P. Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, and R. Bhushan Gopaluni. 2024. Machine learning for industrial sensing and control: A survey and practical perspective.Control Engineering Practice145 (2024), 105841

work page 2024

[11] [11]

Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. InInternational Conference on Learning Representations (ICLR ’18)

work page 2018

[12] [12]

Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. 2023. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting.arXiv preprint arXiv:2310.06625(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[13] [13]

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karni- adakis. 2021. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators.Nature Machine Intelligence3, 3 (2021), 218–229

work page 2021

[14] [14]

Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam

Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations

work page 2023

[15] [15]

Hashem Pesaran and Allan Timmermann

M. Hashem Pesaran and Allan Timmermann. 1992. A Simple Nonparametric Test of Predictive Performance.Journal of Business & Economic Statistics10, 4 (1992), 461–465. http://www.jstor.org/stable/1391822

work page arXiv 1992

[16] [16]

Badgwell

S.Joe Qin and Thomas A. Badgwell. 2003. A survey of industrial model predictive control technology.Control Engineering Practice11, 7 (2003), 733–764

work page 2003

[17] [17]

Abdur Rahman and Md Mahmudul Hasan. 2017. Modeling and Forecasting of Carbon Dioxide Emissions in Bangladesh Using Autoregressive Integrated Moving Average (ARIMA) Models.Open Journal of Statistics7, 4 (July 2017), 560–566. doi:10.4236/ojs.2017.74038

work page doi:10.4236/ojs.2017.74038 2017

[18] [18]

Maziar Raissi, Paris Perdikaris, and George E Karniadakis. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.J. Comput. Phys.378 (2019), 686–707

work page 2019

[19] [19]

Rieth, Ben D

Cory A. Rieth, Ben D. Amsel, Randy Tran, and Maia B. Cook. 2017. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation. doi:10.7910/DVN/6C3JR1

work page doi:10.7910/dvn/6c3jr1 2017

[20]

Leonid I. Rudin, Stanley Osher, and Emad Fatemi. 1992. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60, 1 (1992), 259–268. https://www.sciencedirect.com/science/article/pii/016727899290242F

[21]

Christopher A. Sims. 1980. Macroeconomics and Reality. Econometrica 48, 1 (1980), 1–48. http://www.jstor.org/stable/1912017

[22]

Z. Skaf, T. Aliyev, L. Shead, and T. Steffen. 2014. The State of the Art in Selective Catalytic Reduction Control. In SAE 2014 World Congress and Exhibition.

[23]

Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. 2024. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In The Twelfth International Conference on Learning Representations.

[24]

Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, and Vipin Kumar. 2022. Integrating scientific knowledge with machine learning for engineering and environmental systems. Comput. Surveys 55, 4 (2022), 1–37.

[25]

Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations.

[26]

Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems.

[27]

Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

[28]

Z. Yang, P. Liu, W. Zhou, and Q. Wang. 2022. Deep learning-enhanced NMPC for DeNOx systems. IEEE Transactions on Control Systems Technology 30, 2 (2022), 589–603.

[29]

Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). AAAI Press, 3634–3640.

[30]

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, Vol. 35. 11106–11115.

[31]

Jingbo Zhou, Xinjiang Lu, Yixiong Xiao, Jiantao Su, Junfu Lyu, Yanjun Ma, and Dejing Dou. 2022. SDWPF: A Dataset for Spatial Dynamic Wind Power Forecasting Challenge at KDD Cup 2022. arXiv preprint arXiv:2208.04360 (2022).

[32]

Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proc. 39th International Conference on Machine Learning (ICML 2022) (Baltimore, Maryland).

A Implementation Details

A.1 Dataset Descriptions

To comprehensively evaluate DSPR across diverse phy...


Classical Methods. Linear MPC (ARX) [16]: The industrial standard ARX model for process control, serving as a robustness baseline limited by linearity.
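To make the ARX baseline concrete, the following is a minimal sketch of fitting an ARX model by ordinary least squares and producing a one-step-ahead prediction. It is an illustration under stated assumptions, not the configuration used in the experiments: the model orders `na` and `nb`, the single-input single-output setting, and the function names `fit_arx` and `predict_one_step` are all hypothetical, and the MPC optimization layer is not shown.

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Least-squares fit of y[t] = sum_i a_i*y[t-i] + sum_j b_j*u[t-j].

    na and nb are illustrative model orders (assumed, not from the paper).
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_y = y[t - na:t][::-1]  # y[t-1], ..., y[t-na]
        past_u = u[t - nb:t][::-1]  # u[t-1], ..., u[t-nb]
        rows.append(np.concatenate([past_y, past_u]))
    phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(phi, y[start:], rcond=None)
    return theta[:na], theta[na:]

def predict_one_step(y_hist, u_hist, a, b):
    """Predict the next output from the most recent outputs and inputs."""
    y_hist = np.asarray(y_hist, dtype=float)
    u_hist = np.asarray(u_hist, dtype=float)
    na, nb = len(a), len(b)
    return float(a @ y_hist[-na:][::-1] + b @ u_hist[-nb:][::-1])
```

Because the model is linear in its parameters, the fit reduces to a single least-squares solve, which is part of why such models serve as a robust but linearity-limited reference point.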


Transformer Variants. Informer [30]: Uses ProbSparse attention for efficient long-sequence forecasting. PatchTST [14]: Applies channel-independent patching to capture local semantics. iTransformer [12]: Inverts attention to embed variates as tokens for multivariate correlations. TimeMixer [23]: Uses multi-scale MLP mixing. Note: This serves as our Trend St...


CNN-based Methods. TimesNet [25]: Transforms 1D series into 2D tensors to apply convolutions for intra- and inter-period variations.


Spectral & Graph Methods. MSGNet [4]: Leverages frequency-domain graph convolutions for multi-scale inter-series correlations. TimeFilter [7]: Uses learnable frequency filters to decompose temporal dynamics efficiently.


Physics-Informed Methods. Physics-Guided NN (PG-NN): To compare "loss-level" vs. "architecture-level" integration, we augment TimeMixer with a soft physical regularization term. The total loss is $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{MSE}} + \lambda_{\text{phy}} \lVert \hat{\mathbf{y}} - \boldsymbol{f}_{\text{cons}}(\mathbf{x}) \rVert_2^2$, where $\boldsymbol{f}_{\text{cons}}(\cdot)$ represents conservation laws and $\lambda_{\text{phy}}$ balances data fit with physical consistency.

A.3 Experimenta...
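The loss-level integration used for the PG-NN baseline in the Physics-Informed Methods paragraph can be sketched as follows. This is a minimal NumPy illustration of the stated formula only: the function name `pgnn_loss` is hypothetical, the conservation map `f_cons` is passed in by the caller because its concrete form is not given here, and no value of `lam_phy` is assumed for the baseline.

```python
import numpy as np

def pgnn_loss(y_pred, y_true, x, f_cons, lam_phy):
    """Data-fit MSE plus a soft conservation penalty.

    Implements L_total = L_MSE + lam_phy * ||y_pred - f_cons(x)||_2^2.
    f_cons is a caller-supplied callable (hypothetical placeholder for
    the conservation laws); lam_phy weights the physics term.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mse = np.mean((y_pred - y_true) ** 2)
    residual = y_pred - f_cons(np.asarray(x, dtype=float))
    penalty = np.sum(residual ** 2)  # squared L2 norm of the violation
    return float(mse + lam_phy * penalty)

def toy_mass_balance(x):
    """Toy conservation law (assumed for illustration): outflow equals
    the sum of the inflows stored along the last axis of x."""
    return x.sum(axis=-1, keepdims=True)
```

With a penalty of this kind the physical constraint only shapes the training objective, so a trained model can still violate it at inference time, which is the contrast with architecture-level integration that the comparison is meant to expose.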


The Physics-Residual Stream uses $d_{\text{emb}} = 64$, adaptive window range $\omega_{t,c} \in [0, 20]$, and gating initialization $\alpha_{\text{init}} = 0$. Loss weights are set to $\lambda_{\text{phys}} = 10^{-2}$ and $\lambda_{\text{sparse}} = 10^{-4}$. Baseline models were reproduced following the Time-Series Library framework (https://github.com/thuml/Time-Series-Library). To facilitate reproducibility, the complete DSPR imp...
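For readers reproducing these settings, the Physics-Residual Stream hyperparameters listed above could be collected in a small configuration object such as the one below. The numeric values are the ones reported in this section; the class name, field names, and the `clamp_window` helper are hypothetical, and clamping is only one assumed way of keeping a window value inside the stated range.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicsResidualConfig:
    """Hypothetical container for the reported hyperparameters."""
    d_emb: int = 64               # embedding dimension d_emb
    window_min: float = 0.0       # lower bound of adaptive window range
    window_max: float = 20.0      # upper bound of adaptive window range
    alpha_init: float = 0.0       # gating initialization alpha_init
    lambda_phys: float = 1e-2     # physics loss weight
    lambda_sparse: float = 1e-4   # sparsity loss weight

    def clamp_window(self, omega: float) -> float:
        """Clip a window value into [window_min, window_max].

        Clamping is an assumption made for this sketch; the section
        states only the range, not how it is enforced.
        """
        return min(max(omega, self.window_min), self.window_max)
```

A frozen dataclass keeps the reported values immutable, so an experiment script cannot silently overwrite them between runs.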
