DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting
Pith reviewed 2026-05-21 09:54 UTC · model grok-4.3
The pith
Dual-stream networks separate stable patterns from physics-guided residuals to forecast industrial time series with high physical consistency under regime shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that explicitly decoupling the statistical temporal evolution of individual variables from regime-dependent residual dynamics yields state-of-the-art forecasting accuracy and robustness on four industrial benchmarks, while delivering Mean Conservation Accuracy above 99 percent and Total Variation Ratio up to 97.2 percent. The decoupling is implemented through two components. An Adaptive Window module estimates flow-dependent transport delays. A Physics-Guided Dynamic Graph incorporates physical priors to learn time-varying interaction structures while suppressing spurious correlations.
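The review does not reproduce the formulas behind Mean Conservation Accuracy (MCA) or Total Variation Ratio (TVR). The Python sketch below shows one plausible reading, and it is purely an assumption:

- MCA is taken as one minus the relative error of the summed (conserved) quantity in each forecast window.
- TVR is taken as the ratio of the smaller to the larger discrete total variation of prediction and ground truth, in the sense of Rudin et al. [20].

The function names are hypothetical, and the paper's actual definitions may differ.

```python
from typing import Sequence


# Hypothetical metric definitions; the source does not give the formulas.
def mean_conservation_accuracy(pred: Sequence[Sequence[float]],
                               true: Sequence[Sequence[float]],
                               eps: float = 1e-8) -> float:
    """Average over windows of 1 - relative error of the summed quantity."""
    scores = []
    for p, t in zip(pred, true):
        total_pred, total_true = sum(p), sum(t)
        rel_err = abs(total_pred - total_true) / max(abs(total_true), eps)
        scores.append(max(0.0, 1.0 - rel_err))  # clip so accuracy stays in [0, 1]
    return sum(scores) / len(scores)


def total_variation(x: Sequence[float]) -> float:
    """Discrete 1-D total variation: sum of absolute first differences."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def total_variation_ratio(pred: Sequence[float], true: Sequence[float]) -> float:
    """Assumed TVR in [0, 1]: 1.0 when prediction and truth are equally rough."""
    tv_pred, tv_true = total_variation(pred), total_variation(true)
    if tv_pred == tv_true:
        return 1.0  # includes the case where both series are flat
    return min(tv_pred, tv_true) / max(tv_pred, tv_true)
```

Under this reading, an over-smoothed forecast (less total variation than the truth) and a jittery one (more total variation) are penalised symmetrically.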
What carries the argument
The dual-stream architecture in which the physics-residual stream uses an Adaptive Window to estimate transport delays and a Physics-Guided Dynamic Graph to model time-varying physical interactions from priors.
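One way to picture the dual-stream split is an additive fusion: the physics-residual stream corrects the statistical (trend) stream through a learnable scalar gate. The appendix reports a gating initialization of $\alpha_{\text{init}} = 0$ but not the gate's form. The ReZero-style rule $\hat{y} = \hat{y}_{\text{trend}} + \alpha\,\hat{y}_{\text{res}}$ used below is therefore an assumption, and `fuse_streams` is a hypothetical name.

```python
from typing import List, Sequence


def fuse_streams(trend: Sequence[float], residual: Sequence[float],
                 alpha: float) -> List[float]:
    """Hypothetical dual-stream fusion: y = trend + alpha * residual.

    With alpha = 0 (the reported initialization) the model starts as the pure
    statistical stream. The physics-residual stream is blended in only as
    training moves alpha away from zero.
    """
    if len(trend) != len(residual):
        raise ValueError("both streams must forecast the same horizon")
    return [t + alpha * r for t, r in zip(trend, residual)]
```

For example, `fuse_streams([1.0, 2.0], [4.0, -4.0], 0.0)` returns the trend forecast `[1.0, 2.0]` unchanged. A plausible motivation for zero initialization is that the residual stream cannot degrade the baseline forecast early in training.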
Load-bearing premise
The approach assumes that physical priors can be turned into a dynamic graph that accurately learns real interaction structures and transport delays from data without adding new errors or biases.
What would settle it
If retraining DSPR on the same four industrial benchmarks yields conservation accuracy below 95 percent or higher forecast error than a standard recurrent network on at least two regimes, the decoupling benefit would be refuted.
Original abstract
Accurate forecasting of industrial time series requires balancing predictive accuracy with physical plausibility under non-stationary operating conditions. Existing data-driven models often achieve strong statistical performance but struggle to respect regime-dependent interaction structures and transport delays inherent in real-world systems. To address this challenge, we propose DSPR (Dual-Stream Physics-Residual Networks), a forecasting framework that explicitly decouples stable temporal patterns from regime-dependent residual dynamics. The first stream models the statistical temporal evolution of individual variables. The second stream focuses on residual dynamics through two key mechanisms: an Adaptive Window module that estimates flow-dependent transport delays, and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures while suppressing spurious correlations. Experiments on four industrial benchmarks spanning heterogeneous regimes demonstrate that DSPR consistently improves forecasting accuracy and robustness under regime shifts while maintaining strong physical plausibility. It achieves state-of-the-art predictive performance, with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio reaching up to 97.2%. Beyond forecasting, the learned interaction structures and adaptive lags provide interpretable insights that are consistent with known domain mechanisms, such as flow-dependent transport delays and wind-to-power scaling behaviors. These results suggest that architectural decoupling with physics-consistent inductive biases offers an effective path toward trustworthy industrial time-series forecasting. Furthermore, DSPR's demonstrated robust performance in long-term industrial deployment bridges the gap between advanced forecasting models and trustworthy autonomous control systems.
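The abstract attributes transport delays to flow. A common first-principles model for such delays is plug flow: a fluid parcel needs $\tau = V / Q$ seconds to traverse a pipe or reactor of volume $V$ at volumetric flow rate $Q$. The sketch below converts that delay into sampling steps and clamps it to the window range $[0, 20]$ reported in the appendix.

This is only an illustration of the physics the Adaptive Window is meant to capture, since the paper's module learns the delay from data. The plug-flow formula, the function names, and the units are assumptions.

```python
from typing import Sequence


def transport_delay_steps(volume_m3: float, flow_m3_per_s: float,
                          dt_s: float, max_lag: int = 20) -> int:
    """Plug-flow delay tau = V / Q, converted to sampling steps and clamped."""
    if flow_m3_per_s <= 0:
        return max_lag  # stagnant flow: treat as the longest admissible delay
    steps = round(volume_m3 / flow_m3_per_s / dt_s)
    return max(0, min(max_lag, steps))


def lagged_value(series: Sequence[float], t: int, lag: int) -> float:
    """Upstream value `lag` steps before time t, clipped at the series start."""
    return series[max(0, t - lag)]
```

Take a 10 m³ vessel sampled every 2 s. A flow of 1.0 m³/s gives a 5-step delay, and 0.5 m³/s gives 10 steps. This reproduces the "longer delays at lower flow rates" behaviour the authors cite.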
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DSPR, a dual-stream architecture for industrial time series forecasting. The first stream models statistical temporal evolution of variables, while the second stream addresses residual dynamics via an Adaptive Window that estimates flow-dependent transport delays and a Physics-Guided Dynamic Graph that incorporates physical priors to learn time-varying interaction structures and suppress spurious correlations. Experiments on four industrial benchmarks under heterogeneous regimes claim state-of-the-art predictive performance, robustness to regime shifts, and high physical plausibility with Mean Conservation Accuracy exceeding 99% and Total Variation Ratio up to 97.2%. The learned structures are said to yield interpretable insights consistent with known domain mechanisms such as flow-dependent delays and wind-to-power scaling.
Significance. If the results and physical consistency hold, the work could meaningfully advance trustworthy forecasting in industrial settings by combining data-driven accuracy with physics-based inductive biases. The explicit decoupling of streams and the focus on regime-dependent residuals address real limitations of black-box models in non-stationary environments, with potential benefits for long-term deployment and autonomous control systems. The interpretability of learned interactions is a positive feature if independently validated.
major comments (3)
- [Experiments] Experiments section: The reported Mean Conservation Accuracy (>99%) and Total Variation Ratio (up to 97.2%) are aggregate plausibility scores, but the manuscript provides no ground-truth interaction matrices, known physical delay values, or controlled ablation isolating the Physics-Guided Dynamic Graph and Adaptive Window from the dual-stream architecture alone. This leaves open whether the metrics validate the physics components or can be satisfied without them matching actual mechanisms.
- [§3.2] §3.2 (Physics-Guided Dynamic Graph): The mechanism for incorporating physical priors to learn time-varying structures while suppressing spurious correlations is described conceptually but lacks an explicit formulation showing how priors are enforced independently of the conservation metrics. Without this or an ablation demonstrating its isolated contribution, the claim that it reliably embeds physical priors rests on indirect evidence.
- [Adaptive Window module] Adaptive Window module: The assertion that this component accurately recovers flow-dependent transport delays from data alone is central to the residual stream but is supported only by overall forecasting metrics rather than direct recovery tests against known physical lags in the benchmarks.
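The comment on §3.2 asks how physical priors are enforced in the dynamic graph. One common pattern is sketched below, offered only as an assumption about what such a formulation could look like. A binary prior mask $M$ (1 where process topology permits an interaction) gates a learned score matrix $S_t$, and normalisation runs only over permitted edges:

$A_t[i,j] = M[i,j]\,e^{S_t[i,j]} / \sum_k M[i,k]\,e^{S_t[i,k]}$

Edges the prior rules out then receive exactly zero weight, however strongly the data correlate them. The function name is hypothetical. In a full model, $S_t$ would be recomputed at every step from time-varying node embeddings.

```python
import math
from typing import List, Sequence


def masked_softmax_adjacency(scores: Sequence[Sequence[float]],
                             prior_mask: Sequence[Sequence[int]]) -> List[List[float]]:
    """Row-wise softmax of learned scores restricted to prior-permitted edges."""
    adjacency = []
    for row_scores, row_mask in zip(scores, prior_mask):
        permitted = [s for s, m in zip(row_scores, row_mask) if m]
        if not permitted:
            adjacency.append([0.0] * len(row_scores))  # node with no allowed inputs
            continue
        peak = max(permitted)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) if m else 0.0
                   for s, m in zip(row_scores, row_mask)]
        total = sum(weights)
        adjacency.append([w / total for w in weights])
    return adjacency
```

Consider scores `[[0.0, 5.0, 1.0]]` with mask `[[1, 0, 1]]`. The edge to node 1 scores highest but is physically disallowed, so it is zeroed, and the remaining weight is split between nodes 0 and 2.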
minor comments (2)
- [Abstract] The abstract would benefit from briefly naming the four industrial benchmarks and the specific baseline models used for comparison.
- [Methods] Notation for the dynamic graph weights and adaptive window parameters should be introduced with explicit equations in the methods section for clarity.
Simulated Author's Rebuttal
We thank the referee for the insightful comments on our work. We address each of the major comments in detail below, providing clarifications and indicating where we will make revisions to strengthen the manuscript.
Referee: Experiments section: The reported Mean Conservation Accuracy (>99%) and Total Variation Ratio (up to 97.2%) are aggregate plausibility scores, but the manuscript provides no ground-truth interaction matrices, known physical delay values, or controlled ablation isolating the Physics-Guided Dynamic Graph and Adaptive Window from the dual-stream architecture alone. This leaves open whether the metrics validate the physics components or can be satisfied without them matching actual mechanisms.
Authors: We appreciate this observation. The industrial benchmarks used in our experiments are real-world datasets that do not come with ground-truth interaction matrices or precise physical delay annotations. To demonstrate the contribution of the physics components, we include ablation studies in Section 4.3 comparing the full DSPR model against variants without the Physics-Guided Dynamic Graph and without the Adaptive Window. These results show that removing these components leads to degraded performance in both forecasting accuracy and plausibility metrics. Additionally, we provide qualitative analysis showing that the learned structures align with domain knowledge. In the revised manuscript, we will expand the ablation studies with more controlled experiments and include a new subsection on synthetic data with known ground-truth to further validate the physics modules. revision: partial
Referee: §3.2 (Physics-Guided Dynamic Graph): The mechanism for incorporating physical priors to learn time-varying structures while suppressing spurious correlations is described conceptually but lacks an explicit formulation showing how priors are enforced independently of the conservation metrics. Without this or an ablation demonstrating its isolated contribution, the claim that it reliably embeds physical priors rests on indirect evidence.
Authors: We agree that a more explicit formulation would enhance clarity. The Physics-Guided Dynamic Graph incorporates physical priors through a combination of graph regularization terms and constraints derived from conservation laws, which are enforced via additional loss components separate from the main conservation accuracy metric. We will add the mathematical formulation in the revised §3.2, detailing the prior enforcement mechanism. We will also include a dedicated ablation study isolating this module's contribution to the overall performance. revision: yes
Referee: Adaptive Window module: The assertion that this component accurately recovers flow-dependent transport delays from data alone is central to the residual stream but is supported only by overall forecasting metrics rather than direct recovery tests against known physical lags in the benchmarks.
Authors: We acknowledge the need for more direct validation. Since the benchmarks lack explicit known physical lags, direct recovery tests against ground truth are not feasible for these datasets. However, we support the claim through case studies where the estimated delays correspond to expected physical behaviors, such as longer delays at lower flow rates. To provide stronger evidence, we will add experiments using synthetic time series with injected known transport delays to directly evaluate the Adaptive Window's recovery accuracy. revision: partial
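The authors propose synthetic series with injected, known transport delays. A minimal version of such a check is sketched below. It assumes a constant delay and uses plain cross-correlation as the reference estimator, not the Adaptive Window itself. The helper names are hypothetical, and the maximum lag of 20 mirrors the reported window range.

```python
import random
from typing import List, Tuple


def make_delayed_pair(n: int, delay: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """White-noise input x and output y with y[t] = x[t - delay] (zeros before)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] * delay + x[: n - delay]
    return x, y


def estimate_delay(x: List[float], y: List[float], max_lag: int = 20) -> int:
    """Return the lag in [0, max_lag] maximising the mean product x[t - lag] * y[t]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(x[: len(x) - lag], y[lag:]))
        score = sum(a * b for a, b in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A recovery test then checks that `estimate_delay(*make_delayed_pair(500, d))` returns `d` for several values of `d`. Extending it to flow-dependent delays would require a time-varying `d(t)`.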
Circularity Check
No significant circularity; derivation is self-contained with empirical support
full rationale
The abstract describes a dual-stream architecture incorporating an Adaptive Window and Physics-Guided Dynamic Graph to model residuals and interactions, with results reported via custom metrics on four benchmarks. No equations, definitions, or self-citations are provided that reduce the claimed predictions or plausibility scores to inputs by construction. The metrics (Mean Conservation Accuracy, Total Variation Ratio) are presented as evaluation outcomes rather than tautological redefinitions of the model's inductive biases. The central claims rest on experimental improvements under regime shifts, which are falsifiable against external data and do not rely on load-bearing self-citations or ansatzes imported from prior author work. This is the expected outcome for an architecture paper whose value is in the proposed decoupling rather than a closed mathematical derivation.
Axiom & Free-Parameter Ledger
free parameters (2)
- Adaptive Window parameters
- Dynamic graph weights
axioms (1)
- Domain assumption: physical priors can be incorporated via a dynamic graph to learn time-varying interaction structures while suppressing spurious correlations.
Reference graph
Works this paper leans on
[1] Tom Beucler, Michael Pritchard, Stephan Rasp, Jordan Ott, Pierre Baldi, and Pierre Gentine. 2021. Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems. Physical Review Letters 126, 9 (2021). http://dx.doi.org/10.1103/PhysRevLett.126.098302
[2] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. 2016. Sparse Identification of Nonlinear Dynamics with Control (SINDYc). IFAC-PapersOnLine 49, 18 (2016), 710–715. 10th IFAC Symposium on Nonlinear Control Systems (NOLCOS 2016). doi:10.1016/j.ifacol.2016.10.249
[3]
[4]
[5] E. F. Camacho and C. Bordons. 2007. Model Predictive Control (2nd ed.). Springer.
[6] Peng Han, Jin Wang, Di Yao, Shuo Shang, and Xiangliang Zhang. 2021. A Graph-based Approach for Trajectory Similarity Computation in Spatial Networks. In KDD ’21. 556–564. https://doi.org/10.1145/3447548.3467337
[7] Yifan Hu, Guibin Zhang, Peiyuan Liu, Disen Lan, Naiqi Li, Dawei Cheng, Tao Dai, Shu-Tao Xia, and Shirui Pan. 2025. TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting. In Forty-second International Conference on Machine Learning. https://openreview.net/forum?id=490VcNtjh7
[8] George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. Physics-informed machine learning. Nature Reviews Physics 3, 6 (2021), 422–440.
[9] Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. 2025. Deep learning for time series forecasting: a survey. International Journal of Machine Learning and Cybernetics 16, 7 (2025), 5079–5112.
[10] Nathan P. Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, and R. Bhushan Gopaluni. 2024. Machine learning for industrial sensing and control: A survey and practical perspective. Control Engineering Practice 145 (2024), 105841.
[11] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations (ICLR ’18).
[12] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. 2023. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv preprint arXiv:2310.06625 (2023).
[13] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. 2021. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3, 3 (2021), 218–229.
[14] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations.
[15] M. Hashem Pesaran and Allan Timmermann. 1992. A Simple Nonparametric Test of Predictive Performance. Journal of Business & Economic Statistics 10, 4 (1992), 461–465. http://www.jstor.org/stable/1391822
[16]
[17] Abdur Rahman and Md Mahmudul Hasan. 2017. Modeling and Forecasting of Carbon Dioxide Emissions in Bangladesh Using Autoregressive Integrated Moving Average (ARIMA) Models. Open Journal of Statistics 7, 4 (July 2017), 560–566. doi:10.4236/ojs.2017.74038
[18] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378 (2019), 686–707.
[19] Cory A. Rieth, Ben D. Amsel, Randy Tran, and Maia B. Cook. 2017. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation. doi:10.7910/DVN/6C3JR1
[20] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. 1992. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60, 1 (1992), 259–268. https://www.sciencedirect.com/science/article/pii/016727899290242F
[21]
[22] Z. Skaf, T. Aliyev, L. Shead, and T. Steffen. 2014. The State of the Art in Selective Catalytic Reduction Control. In SAE 2014 World Congress and Exhibition.
[23] Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. 2024. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In The Twelfth International Conference on Learning Representations.
[24] Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, and Vipin Kumar. 2022. Integrating scientific knowledge with machine learning for engineering and environmental systems. Comput. Surveys 55, 4 (2022), 1–37.
[25] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations.
[26] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems.
[27] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[28] Z. Yang, P. Liu, W. Zhou, and Q. Wang. 2022. Deep learning-enhanced NMPC for DeNOx systems. IEEE Transactions on Control Systems Technology 30, 2 (2022), 589–603.
[29] Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). AAAI Press, 3634–3640.
[30] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, Vol. 35. 11106–11115.
[31]
[32] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proc. 39th International Conference on Machine Learning (ICML 2022) (Baltimore, Maryland).
A Implementation Details
A.1 Dataset Descriptions
To comprehensively evaluate DSPR across diverse phy...
- Classical Methods. Linear MPC (ARX) [16]: The industrial-standard ARX model for process control, serving as a robustness baseline limited by linearity.
- Transformer Variants. Informer [30]: Uses ProbSparse attention for efficient long-sequence forecasting. PatchTST [14]: Applies channel-independent patching to capture local semantics. iTransformer [12]: Inverts attention to embed variates as tokens for multivariate correlations. TimeMixer [23]: Uses multi-scale MLP mixing. Note: This serves as our Trend St...
- CNN-based Methods. TimesNet [25]: Transforms 1D series into 2D tensors to apply convolutions for intra- and inter-period variations.
- Spectral & Graph Methods. MSGNet [4]: Leverages frequency-domain graph convolutions for multi-scale inter-series correlations. TimeFilter [7]: Uses learnable frequency filters to decompose temporal dynamics efficiently.
- Physics-Informed Methods. Physics-Guided NN (PG-NN): To compare "loss-level" vs. "architecture-level" integration, we augment the TimeMixer with a soft physical regularization term. The total loss is $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{MSE}} + \lambda_{\text{phy}} \lVert \hat{y} - f_{\text{cons}}(x) \rVert_2^2$, where $f_{\text{cons}}(\cdot)$ represents conservation laws and $\lambda_{\text{phy}}$ balances data fit with physical consistency.
A.3 Experimenta...
The Physics-Residual Stream uses $d_{\text{emb}} = 64$, adaptive window range $\omega_{t,c} \in [0, 20]$, and gating initialization $\alpha_{\text{init}} = 0$. Loss weights are set to $\lambda_{\text{phys}} = 10^{-2}$ and $\lambda_{\text{sparse}} = 10^{-4}$. Baseline models were reproduced following the Time-Series Library framework (https://github.com/thuml/Time-Series-Library). To facilitate reproducibility, the complete DSPR imp...
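The appendix gives the PG-NN baseline objective explicitly. It also reports DSPR's loss weights ($\lambda_{\text{phys}} = 10^{-2}$, $\lambda_{\text{sparse}} = 10^{-4}$), but not the terms they multiply. The sketch below does two things:

- It implements the PG-NN formula as written, with $f_{\text{cons}}(x)$ passed in precomputed.
- It pairs the DSPR weights with two assumed terms, in line with the authors' description of graph regularization and conservation-derived loss components. The first is a discrete mass-balance residual (predicted storage change should equal inflow minus outflow). The second is an L1 penalty on graph weights.

Those DSPR terms and all function names are hypothetical.

```python
from typing import Sequence

LAMBDA_PHYS = 1e-2    # reported DSPR weight on the physics term
LAMBDA_SPARSE = 1e-4  # reported DSPR weight on the sparsity term


def mse(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error over the forecast horizon."""
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)


def pgnn_loss(y_hat: Sequence[float], y: Sequence[float],
              f_cons_x: Sequence[float], lambda_phy: float) -> float:
    """PG-NN baseline: L_MSE + lambda_phy * ||y_hat - f_cons(x)||_2^2."""
    penalty = sum((a - c) ** 2 for a, c in zip(y_hat, f_cons_x))  # squared L2 norm
    return mse(y_hat, y) + lambda_phy * penalty


def dspr_style_loss(y_hat: Sequence[float], y: Sequence[float],
                    inflow: Sequence[float], outflow: Sequence[float],
                    adjacency: Sequence[Sequence[float]],
                    lambda_phys: float = LAMBDA_PHYS,
                    lambda_sparse: float = LAMBDA_SPARSE) -> float:
    """Hypothetical DSPR objective: MSE + mass-balance residual + L1 on graph."""
    # Assumed conservation law: y_hat[t+1] - y_hat[t] = inflow[t] - outflow[t].
    residuals = [(y_hat[t + 1] - y_hat[t]) - (inflow[t] - outflow[t])
                 for t in range(len(y_hat) - 1)]
    balance = sum(r ** 2 for r in residuals) / len(residuals)
    sparsity = sum(abs(w) for row in adjacency for w in row)  # L1 on edge weights
    return mse(y_hat, y) + lambda_phys * balance + lambda_sparse * sparsity
```

As written, the PG-NN physics term is a squared norm summed over the horizon rather than averaged. Its effective strength relative to the MSE term therefore grows with horizon length.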