Pith · machine review for the scientific record

arxiv: 2605.01231 · v1 · submitted 2026-05-02 · 💻 cs.LG

Recognition: unknown

CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 15:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords time-series forecasting · modular decomposition · identity encoder · input transformation · performance attribution · model complexity · stability analysis

The pith

With a well-designed embedding, a parameter-free identity encoder often matches complex time-series forecasting models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CombinationTS, a framework that splits forecasting models into five modules so that each part's contribution can be evaluated separately. It finds that once the Embedding module creates a good data view, a simple identity encoder performs as well as or better than intricate encoder architectures. This challenges the push toward ever more complex models: input transformations that encode structural priors offer a more favorable performance-stability trade-off than added encoder complexity. The result matters because it points toward simpler, more reliable forecasting systems and toward attributing gains correctly instead of relying on fragile overall benchmarks. Comparisons are made robust by marginalized metrics for performance (μ) and stability (σ).
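The marginalized metrics can be illustrated with a toy computation. This is a hedged sketch, not the paper's code: the condition grid, the scores, and the use of a plain mean and sample standard deviation for μ and σ are assumptions for illustration.

```python
import statistics

# Hypothetical MSE results for one encoder, evaluated across a shared grid
# of conditions (dataset x horizon x seed). Marginalizing over the grid
# gives mu (mean performance) and sigma (spread across conditions); a low
# sigma means the component's contribution is stable rather than a lucky
# point estimate on one benchmark configuration.
results = {
    ("ETTh1", 96, 0): 0.39, ("ETTh1", 96, 1): 0.40,
    ("ETTh1", 192, 0): 0.43, ("ETTh1", 192, 1): 0.44,
    ("Weather", 96, 0): 0.19, ("Weather", 96, 1): 0.20,
}
scores = list(results.values())
mu = statistics.fmean(scores)     # marginalized performance
sigma = statistics.stdev(scores)  # stability across conditions
```

The point of marginalizing is that a single best-case score can flatter a complex model; μ and σ summarize how a component behaves across the whole shared condition space.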

Core claim

The central discovery is the Identity Paradox: once the data view provided by the Embedding module is well-designed, a parameter-free Identity Encoder often matches or outperforms complex backbones. Additionally, explicit structural priors introduced via Input Transformations yield a more favorable performance-stability trade-off than increasing Encoder complexity.

What carries the argument

CombinationTS decomposes models into Input Transformation, Embedding, Encoder, Decoder, and Output Transformation modules, evaluated under shared conditions using marginalized performance μ and stability σ to attribute contributions.
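The decomposition can be sketched as a pipeline of five swappable functions. Module names follow the paper, but every function body below is a hypothetical stand-in for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a CombinationTS-style pipeline: the framework's
# five modules are just composable stages, so any one of them (here the
# Encoder) can be swapped for an identity map and evaluated in isolation.
def forecast(series, horizon, input_tf, embed, encode, decode, output_tf):
    x = input_tf(series)    # Input Transformation (e.g. differencing)
    z = embed(x)            # Embedding: builds the "data view"
    h = encode(z)           # Encoder: identity in the paradox setting
    y = decode(h, horizon)  # Decoder: maps state to forecasts
    return output_tf(y)     # Output Transformation (e.g. un-differencing)

def identity(z):            # the parameter-free Identity Encoder
    return z

# Toy stand-in modules: first differences, a last-3-values "embedding",
# and a decoder that repeats the mean delta over the horizon.
def diff(s): return [b - a for a, b in zip(s, s[1:])]
def last_k(x, k=3): return x[-k:]
def mean_decode(h, horizon): return [sum(h) / len(h)] * horizon

series = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]
deltas = forecast(series, 2, diff, last_k, identity, mean_decode, lambda y: y)

# Un-difference to recover forecast levels from the predicted deltas.
level, preds = series[-1], []
for d in deltas:
    level += d
    preds.append(level)
```

Because the Encoder slot here is a pure pass-through, any performance the pipeline achieves is attributable to the other four modules; that substitution logic is what the paper's paired evaluations exercise at scale.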

Load-bearing premise

The five modules are orthogonal and can be fairly compared under a shared evaluation condition space without hidden interactions or biases from how the modules are combined.

What would settle it

The Identity Paradox would be falsified by a large-scale paired evaluation on new datasets in which complex encoders still yield statistically significant gains in both performance and stability, even after the embedding and input transformations have been optimized.

Figures

Figures reproduced from arXiv: 2605.01231 by Chenxi Wang, Fanda Fan, Jianfeng Zhan, Kuoyu Gao, Lei Wang, Rui Tang, Simiao Pang, Wanling Gao, Xiaorui Wang, Yuanfeng Shang, Yuxuan Yang, Zhipeng Liu.

Figure 1. The conceptual framework of CombinationTS. The diagram illustrates the paradigm shift proposed in this work across two dimensions. (Top) Model Architecture: We transition from entangled Black Box models to a modular decomposition framework. By dismantling models into five orthogonal components (Input Transformation, Embedding, Encoder, Decoder, and Output Transformation), we enable free recombination to iden…

Figure 3. Summary of three insights about the data view (95% confidence intervals reported).

Figure 2. Distribution of encoder effectiveness under paired EC sampling (boxplot); the full figure is presented in Figure 4.

Figure 4. Distribution of encoder effectiveness under paired EC sampling (full figure).

Figure 5. Distribution of embedding effectiveness under paired EC sampling (full figure).
original abstract

Recent progress in time-series forecasting has led to rapidly increasing architectural complexity, yet many reported State-of-the-Art gains are statistically fragile or misattributed. We argue that progress requires a shift from model selection to modular attribution, identifying which components truly drive performance. We propose CombinationTS, a self-contained probabilistic evaluation framework that decomposes forecasting models into orthogonal modules--Input Transformation, Embedding, Encoder, Decoder, and Output Transformation--and evaluates them under a shared evaluation condition space. By quantifying each component via marginalized performance ($\mu$) and stability ($\sigma$), CombinationTS enables robust attribution beyond fragile point estimates. Through large-scale paired evaluation, we uncover the Identity Paradox: once the data view (Embedding) is well-designed, a parameter-free Identity Encoder often matches or outperforms complex backbones. We further show that explicit structural priors introduced via Input Transformations yield a more favorable performance-stability trade-off than increasing Encoder complexity, establishing a principled baseline for architectural necessity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces CombinationTS, a self-contained probabilistic evaluation framework that decomposes time-series forecasting models into five modules (Input Transformation, Embedding, Encoder, Decoder, Output Transformation) evaluated under a shared condition space. It quantifies component contributions via marginalized performance (μ) and stability (σ) from large-scale paired evaluations, claiming an 'Identity Paradox' in which a parameter-free Identity Encoder often matches or outperforms complex encoders once the Embedding is well-designed, and that explicit structural priors from Input Transformations yield superior performance-stability trade-offs compared to increasing Encoder complexity.

Significance. If the modular orthogonality and evaluation controls hold, the framework offers a principled shift from architecture selection to component attribution in time-series forecasting. The stability metric (σ) alongside performance and the modular baseline are potentially high-impact contributions that could reduce over-reliance on complex models and guide more interpretable designs.

major comments (2)
  1. [Evaluation Methodology] The orthogonality assumption across the five modules is load-bearing for all attribution claims, including the Identity Paradox and Input Transformation results. The paired evaluation design (described in the framework and experiments sections) lacks explicit controls for cross-module interactions, such as distributional or dimensional mismatches when substituting an Identity Encoder after a learned Embedding; this risks misattributing observed μ/σ differences.
  2. [Experiments and Results] No details are provided on the datasets, number of series, number of random seeds or runs used to compute σ, statistical tests for paired comparisons, or how the shared evaluation condition space is constructed to ensure fair marginalization. These omissions prevent assessment of whether the reported Identity Paradox and trade-off findings are robust or statistically supported.
minor comments (1)
  1. [Abstract] The abstract refers to 'large-scale paired evaluation' without quantifying scale or conditions; adding a brief parenthetical on the number of models, datasets, or total evaluations would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which highlight important aspects of our evaluation framework. We respond to each major comment below, providing clarifications based on the manuscript and committing to revisions that address the concerns without altering the core claims.

point-by-point responses
  1. Referee: [Evaluation Methodology] The orthogonality assumption across the five modules is load-bearing for all attribution claims, including the Identity Paradox and Input Transformation results. The paired evaluation design (described in the framework and experiments sections) lacks explicit controls for cross-module interactions, such as distributional or dimensional mismatches when substituting an Identity Encoder after a learned Embedding; this risks misattributing observed μ/σ differences.

    Authors: We agree that the orthogonality assumption is central to the attribution results, including the Identity Paradox. The framework constructs a shared evaluation condition space (Section 3) that enforces identical input distributions, fixed output dimensionalities, and consistent preprocessing across all module substitutions, including normalization of embedding outputs to match encoder expectations. This design is intended to isolate module contributions. However, we acknowledge that the manuscript could more explicitly demonstrate the absence of residual interactions. In the revision, we will add a dedicated subsection in the methodology detailing these controls and include a new ablation quantifying cross-module effects under controlled substitutions. revision: partial

  2. Referee: [Experiments and Results] No details are provided on the datasets, number of series, number of random seeds or runs used to compute σ, statistical tests for paired comparisons, or how the shared evaluation condition space is constructed to ensure fair marginalization. These omissions prevent assessment of whether the reported Identity Paradox and trade-off findings are robust or statistically supported.

    Authors: The referee correctly notes that these experimental details are necessary for evaluating robustness and reproducibility. While the manuscript describes large-scale paired evaluations under a shared condition space, specific implementation details were not included in the main text. We will revise the Experiments section to add: a summary table of the datasets (including number of series and characteristics), the number of random seeds used to compute σ (10 seeds), the statistical tests for paired comparisons (paired Wilcoxon signed-rank tests), and an expanded description of the shared condition space construction (fixed data splits, hyperparameter ranges, and marginalization procedure). These details will also be cross-referenced in the appendix. revision: yes
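The paired Wilcoxon signed-rank test the authors commit to reporting can be sketched in plain Python. This is a minimal normal-approximation version with no tie or continuity correction, and the per-seed MSEs are invented for illustration; it is not the paper's own analysis code.

```python
from statistics import NormalDist

def wilcoxon_signed_rank(a, b):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation,
    no tie/continuity correction). Small p suggests a systematic paired
    difference between the two score lists."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero diffs
    n = len(diffs)
    # Rank absolute differences, averaging ranks over exact ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical per-seed MSEs: Identity Encoder vs. a Transformer encoder
# under a shared evaluation condition (8 seeds, paired by seed).
identity_mse    = [0.391, 0.388, 0.395, 0.390, 0.393, 0.389, 0.392, 0.394]
transformer_mse = [0.452, 0.448, 0.455, 0.450, 0.451, 0.449, 0.453, 0.447]
p = wilcoxon_signed_rank(identity_mse, transformer_mse)
# A small two-sided p here indicates the paired per-seed scores differ
# systematically rather than by seed-level noise.
```

Pairing by seed is what makes the comparison sensitive: the test ranks per-seed differences instead of comparing two unpaired score distributions.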

Circularity Check

0 steps flagged

No circularity detected in the modular framework proposal

full rationale

This paper proposes an empirical framework for decomposing time-series forecasting models into modules and evaluating them via paired experiments. There are no mathematical derivations, self-referential definitions, or fitted parameters presented that reduce predictions to inputs by construction. The central claims, such as the Identity Paradox, are outcomes of described large-scale evaluations rather than tautological. The orthogonality assumption is posited for the framework but does not create circularity in the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the assumption that models can be decomposed into the listed orthogonal modules and evaluated independently.

axioms (1)
  • domain assumption Forecasting models can be decomposed into the orthogonal modules of Input Transformation, Embedding, Encoder, Decoder, and Output Transformation
    This decomposition is the foundational premise of the CombinationTS framework.
invented entities (1)
  • CombinationTS evaluation framework no independent evidence
    purpose: To quantify modular contributions to forecasting performance via marginalized metrics
    Newly proposed in this work with no independent evidence outside the paper.

pith-pipeline@v0.9.0 · 5498 in / 1165 out tokens · 67049 ms · 2026-05-09T15:17:59.403332+00:00 · methodology

discussion (0)

