Recognition: unknown
CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models
Pith reviewed 2026-05-09 15:17 UTC · model grok-4.3
The pith
With a well-designed embedding, a parameter-free identity encoder often matches complex time-series forecasting models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is the Identity Paradox: once the data view provided by the Embedding module is well-designed, a parameter-free Identity Encoder often matches or outperforms complex backbones. Additionally, explicit structural priors introduced via Input Transformations yield a more favorable performance-stability trade-off than increasing Encoder complexity.
What carries the argument
CombinationTS decomposes models into Input Transformation, Embedding, Encoder, Decoder, and Output Transformation modules, evaluated under shared conditions using marginalized performance μ and stability σ to attribute contributions.
Load-bearing premise
The five modules are orthogonal and can be fairly compared under a shared evaluation condition space without hidden interactions or biases from how the modules are combined.
What would settle it
A large-scale paired evaluation on new datasets where complex encoders still yield statistically significant gains in both performance and stability even after optimizing the embedding and input transformations would falsify the Identity Paradox.
Figures
read the original abstract
Recent progress in time-series forecasting has led to rapidly increasing architectural complexity, yet many reported State-of-the-Art gains are statistically fragile or misattributed. We argue that progress requires a shift from model selection to modular attribution, identifying which components truly drive performance. We propose CombinationTS, a self-contained probabilistic evaluation framework that decomposes forecasting models into orthogonal modules--Input Transformation, Embedding, Encoder, Decoder, and Output Transformation--and evaluates them under a shared evaluation condition space. By quantifying each component via marginalized performance ($\mu$) and stability ($\sigma$), CombinationTS enables robust attribution beyond fragile point estimates. Through large-scale paired evaluation, we uncover the Identity Paradox: once the data view (Embedding) is well-designed, a parameter-free Identity Encoder often matches or outperforms complex backbones. We further show that explicit structural priors introduced via Input Transformations yield a more favorable performance-stability trade-off than increasing Encoder complexity, establishing a principled baseline for architectural necessity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CombinationTS, a self-contained probabilistic evaluation framework that decomposes time-series forecasting models into five modules (Input Transformation, Embedding, Encoder, Decoder, Output Transformation) evaluated under a shared condition space. It quantifies component contributions via marginalized performance (μ) and stability (σ) from large-scale paired evaluations, claiming an 'Identity Paradox' in which a parameter-free Identity Encoder often matches or outperforms complex encoders once the Embedding is well-designed, and that explicit structural priors from Input Transformations yield superior performance-stability trade-offs compared to increasing Encoder complexity.
Significance. If the modular orthogonality and evaluation controls hold, the framework offers a principled shift from architecture selection to component attribution in time-series forecasting. The stability metric (σ) alongside performance and the modular baseline are potentially high-impact contributions that could reduce over-reliance on complex models and guide more interpretable designs.
major comments (2)
- [Evaluation Methodology] The orthogonality assumption across the five modules is load-bearing for all attribution claims, including the Identity Paradox and Input Transformation results. The paired evaluation design (described in the framework and experiments sections) lacks explicit controls for cross-module interactions, such as distributional or dimensional mismatches when substituting an Identity Encoder after a learned Embedding; this risks misattributing observed μ/σ differences.
- [Experiments and Results] No details are provided on the datasets, number of series, number of random seeds or runs used to compute σ, statistical tests for paired comparisons, or how the shared evaluation condition space is constructed to ensure fair marginalization. These omissions prevent assessment of whether the reported Identity Paradox and trade-off findings are robust or statistically supported.
minor comments (1)
- [Abstract] The abstract refers to 'large-scale paired evaluation' without quantifying scale or conditions; adding a brief parenthetical on the number of models, datasets, or total evaluations would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments, which highlight important aspects of our evaluation framework. We respond to each major comment below, providing clarifications based on the manuscript and committing to revisions that address the concerns without altering the core claims.
read point-by-point responses
-
Referee: [Evaluation Methodology] The orthogonality assumption across the five modules is load-bearing for all attribution claims, including the Identity Paradox and Input Transformation results. The paired evaluation design (described in the framework and experiments sections) lacks explicit controls for cross-module interactions, such as distributional or dimensional mismatches when substituting an Identity Encoder after a learned Embedding; this risks misattributing observed μ/σ differences.
Authors: We agree that the orthogonality assumption is central to the attribution results, including the Identity Paradox. The framework constructs a shared evaluation condition space (Section 3) that enforces identical input distributions, fixed output dimensionalities, and consistent preprocessing across all module substitutions, including normalization of embedding outputs to match encoder expectations. This design is intended to isolate module contributions. However, we acknowledge that the manuscript could more explicitly demonstrate the absence of residual interactions. In the revision, we will add a dedicated subsection in the methodology detailing these controls and include a new ablation quantifying cross-module effects under controlled substitutions. revision: partial
-
Referee: [Experiments and Results] No details are provided on the datasets, number of series, number of random seeds or runs used to compute σ, statistical tests for paired comparisons, or how the shared evaluation condition space is constructed to ensure fair marginalization. These omissions prevent assessment of whether the reported Identity Paradox and trade-off findings are robust or statistically supported.
Authors: The referee correctly notes that these experimental details are necessary for evaluating robustness and reproducibility. While the manuscript describes large-scale paired evaluations under a shared condition space, specific implementation details were not included in the main text. We will revise the Experiments section to add: a summary table of the datasets (including number of series and characteristics), the number of random seeds used to compute σ (10 seeds), the statistical tests for paired comparisons (paired Wilcoxon signed-rank tests), and an expanded description of the shared condition space construction (fixed data splits, hyperparameter ranges, and marginalization procedure). These details will also be cross-referenced in the appendix. revision: yes
Circularity Check
No circularity detected in the modular framework proposal
full rationale
This paper proposes an empirical framework for decomposing time-series forecasting models into modules and evaluating them via paired experiments. There are no mathematical derivations, self-referential definitions, or fitted parameters presented that reduce predictions to inputs by construction. The central claims, such as the Identity Paradox, are outcomes of described large-scale evaluations rather than tautological. The orthogonality assumption is posited for the framework but does not create circularity in the reported results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Forecasting models can be decomposed into the orthogonal modules of Input Transformation, Embedding, Encoder, Decoder, and Output Transformation
invented entities (1)
-
CombinationTS evaluation framework
no independent evidence
Reference graph
Works this paper leans on
-
[1]
arXiv preprint arXiv:2502.14045 , year=
Brigato, L., Morand, R., Strømmen, K., Panagiotou, M., Schmidt, M., and Mougiakakou, S. Position: There are no champions in long-term time series forecasting.arXiv preprint arXiv:2502.14045,
-
[2]
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
9 CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models Liu, X., Qiu, X., Wu, X., Li, Z., Guo, C., Hu, J., and Yang, B. Rethinking irregular time series forecasting: A simple yet effective baseline. InAAAI, 2026a. Liu, Y ., Hu, T., Zhang, H., Wu, H., Wang, S., Ma, L., and Long, M. itransformer: Inverted transformers are effec...
work page internal anchor Pith review arXiv
-
[3]
Liu, Z., Duan, P., Chu, Q., Kuhlmann, L., Zhang, C., Yue, W., Tang, X., and Zhang, B. An attributed mul- tiplex network enabled GNN-based stock predictor with observable and non-observable information.Expert Systems with Applications, 296:129018, 2025a. doi: 10.1016/j.eswa.2025.129018. Liu, Z., Duan, P., Wang, B., Tang, X., Chu, Q., Zhang, C., Huang, Y .,...
-
[4]
Liu, Z., Duan, P., Tang, X., Li, B., Huang, Y ., Geng, M., Zhang, C., Zhang, B., and Wang, B. TimeFormer: Trans- former with attention modulation empowered by tempo- ral characteristics for time series forecasting.Expert Systems with Applications, 307:131040, 2026c. doi: 10.1016/j.eswa.2025.131040. Nie, Y . A time series is worth 64words: Long-term foreca...
-
[5]
S., and Yang, B
Qiu, X., Li, Z., Qiu, W., Hu, S., Zhou, L., Wu, X., Li, Z., Guo, C., Zhou, A., Sheng, Z., Hu, J., Jensen, C. S., and Yang, B. TAB: Unified benchmarking of time series anomaly detection methods.Proceedings of the VLDB Endowment, 18(9):2775–2789, 2025a. Qiu, X., Wu, X., Cheng, H., Liu, X., Guo, C., Hu, J., and Yang, B. DBLoss: Decomposition-based loss funct...
2005
-
[6]
arXiv preprint arXiv:2405.14616 , year=
Wang, S., Wu, H., Shi, X., Hu, T., Luo, H., Ma, L., Zhang, J. Y ., and Zhou, J. Timemixer: Decomposable multi- scale mixing for time series forecasting.arXiv preprint arXiv:2405.14616, 2024a. 10 CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models Wang, Y ., Wu, H., Dong, J., Liu, Y ., Qiu, Y ., Zhang, H., Wang, J., and Long...
-
[7]
ETTh1 and ETTh2 are sampled at an hourly frequency, while ETTm1 and ETTm2 are collected at 15-minute intervals
Specifically, the datasets are described as follows: • Electricity Transformer Temperature (ETT)(Zhou et al., 2021): This benchmark consists of four subsets. ETTh1 and ETTh2 are sampled at an hourly frequency, while ETTm1 and ETTm2 are collected at 15-minute intervals. All time series are measured from two electricity transformers. • Weather(Wu et al., 20...
2021
-
[8]
Dataset H Point wise Patch wise Variate wise Identity L to dim C to dim ˆµˆσ min ˆµˆσ min ˆµˆσ min ˆµˆσ min ˆµˆσ min ˆµˆσ min ETTh1 96 0.3898 0.0143 0.3651 0.37720.00580.3645 0.3834 0.00800.3641 0.5890 0.1537 0.3721 0.3937 0.0156 0.3721 0.4602 0.0357 0.4013 192 0.4324 0.0166 0.3992 0.42400.01330.3984 0.4247 0.01280.3983 0.6092 0.1408 0.4028 0.4343 0.0161 ...
-
[9]
Highlighting follows the convention in Table
Dataset H MLP Transformer Identity ˆµˆσ min ˆµˆσ min ˆµˆσ min ETTh1 96 0.4371 0.08160.3700 0.4518 0.0930 0.3721 0.39070.02310.3641 192 0.4870 0.08270.4043 0.5044 0.0988 0.4091 0.43340.03060.3983 336 0.5186 0.07870.4335 0.5239 0.0795 0.4406 0.46840.02680.4196 720 0.5538 0.1039 0.4533 0.5679 0.10140.4504 0.47220.04060.4248 avg 0.4960 0.0938 - 0.5095 0.1004 ...
-
[10]
Highlighting follows the convention in Table
Dataset Channel Point ˆµˆσ min ˆµˆσ min ETTh1 0.4028 0.03690.3743 0.38660.0090 0.3747 ETTh2 0.30100.01390.2818 0.3028 0.0253 0.2829 ETTm1 0.3462 0.0186 0.3160 0.33620.01730.3137 ETTm2 0.1830 0.00350.1738 0.18220.0018 0.1778 Electricity 0.2018 0.02910.1580 0.19760.0102 0.1840 Weather 0.19030.00960.1599 0.1913 0.0038 0.1816 Table 10.Experiment 2 (Auditing I...
-
[11]
Highlighting follows the convention in Table
Dataset Identity MLP Transformer ˆµˆσ min ˆµˆσ min ˆµˆσ min ETTh1 0.39320.02530.3743 0.3971 0.03340.3749 0.3984 0.0330 0.3773 ETTh2 0.29080.00910.2829 0.3052 0.01230.2818 0.3097 0.0287 0.2873 ETTm1 0.3613 0.0072 0.3522 0.3314 0.01230.3159 0.33080.01520.3137 ETTm2 0.1827 0.0017 0.1804 0.18050.00420.1738 0.1843 0.00340.1770 Electricity 0.2078 0.0152 0.1981 ...
-
[12]
Highlighting follows the convention in Table 2, where red denotes MSE and blue denotes MAE
Dataset H iTransformer SimpleTM ˆµˆσ min ˆµˆσ min ETTh1 96 0.3934 0.0133 0.3789 0.38120.00760.3730 192 0.4354 0.0099 0.4171 0.42870.01050.4148 336 0.4656 0.0127 0.4434 0.46010.01320.4314 720 0.4711 0.0155 0.4504 0.46460.01510.4473 avg 0.4401 0.0335 - 0.43240.0354 - ETTh2 96 0.3214 0.0228 0.2964 0.29450.00650.2840 192 0.3887 0.0167 0.3599 0.37090.01100.352...
-
[13]
(a) Performance across 2 Embeddings. Dataset Channel Point ˆµ ci low ci upp ˆσ B ˆµ ci low ci upp ˆσ B ETTh1 0.4028 0.3943 0.4114 0.00140.3743 0.38660.3845 0.3887 0.0001 0.3747 ETTh2 0.30100.2987 0.3033 0.00020.2818 0.3028 0.2987 0.3069 0.0006 0.2829 ETTm1 0.3462 0.3431 0.3492 0.0003 0.3160 0.33620.3334 0.3390 0.00030.3137 ETTm2 0.1830 0.1822 0.1838 0.000...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.