DecompKAN: Decomposed Patch-KAN for Long-Term Time Series Forecasting
Pith reviewed 2026-05-08 04:42 UTC · model grok-4.3
The pith
DecompKAN delivers best or tied-best MSE on 15 of 32 benchmark cases for long-term time series forecasting while exposing its learned functions for inspection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DecompKAN combines trend-residual decomposition, channel-wise patching, learned instance normalization, and B-spline KAN edge functions into a lightweight model. Each KAN edge learns an explicit one-dimensional scalar function over the patch embeddings that can be plotted directly. On standard benchmarks it records best or tied-best MSE on 15 of 32 dataset-horizon combinations among selected baselines and on 20 of 36 comparisons under a controlled same-recipe protocol across nine datasets, including physiological PPG-DaLiA data. Ablation results indicate that the decomposition-patching-normalization pipeline contributes more to performance than the choice of nonlinear layer, while the KAN (K
What carries the argument
B-spline Kolmogorov-Arnold Network edge functions that learn explicit, inspectable 1D scalar transformations over learned patch-embedding coordinates.
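To make "explicit, inspectable 1D scalar transformation" concrete, here is a minimal sketch of a single B-spline edge function evaluated with the Cox-de Boor recursion. The uniform knot grid, the helper names (`make_uniform_knots`, `bspline_basis`, `edge_function`), and the grid size and degree are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def make_uniform_knots(lo: float, hi: float, grid_size: int, degree: int) -> List[float]:
    """Uniform knots on [lo, hi], extended by `degree` knots on each side (assumed layout)."""
    h = (hi - lo) / grid_size
    return [lo + (i - degree) * h for i in range(grid_size + 2 * degree + 1)]


def bspline_basis(x: float, knots: Sequence[float], degree: int) -> List[float]:
    """Evaluate every B-spline basis function of `degree` at x via Cox-de Boor."""
    n_intervals = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_intervals)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(n_intervals - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        basis = nxt
    return basis


def edge_function(x: float, coeffs: Sequence[float], knots: Sequence[float], degree: int) -> float:
    """One edge: a learned linear combination of fixed B-spline basis functions."""
    return sum(c * b for c, b in zip(coeffs, bspline_basis(x, knots, degree)))


# Hypothetical settings: 5 grid intervals on [-1, 1], cubic splines -> 5 + 3 = 8 coefficients.
knots = make_uniform_knots(-1.0, 1.0, grid_size=5, degree=3)
coeffs = [0.0, 0.1, 0.4, 0.9, 0.4, 0.1, 0.0, 0.0]  # stand-in for trained weights
curve = [(x / 10, edge_function(x / 10, coeffs, knots, 3)) for x in range(-10, 10)]
```

Because the edge is a function of one scalar, plotting `curve` is all the "inspection" requires: the trained coefficients fully determine the shape on the grid interval.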
If this is right
- The model records particular gains on datasets with smooth temporal dynamics such as Solar, ECL, and Weather.
- It shows competitive results on physiological time series including the PPG-DaLiA benchmark.
- Ablations indicate the decomposition, patching, and normalization steps matter more for accuracy than the specific nonlinear layer.
- Visualization of the edge functions reveals qualitatively different latent nonlinearities across domains.
Where Pith is reading between the lines
- The explicit functions could support debugging and trust in high-stakes forecasting applications such as energy or health monitoring.
- The same decomposition-plus-patching recipe might be tested with other function approximators to isolate whether KAN adds unique value beyond interpretability.
- If the pipeline dominates performance, hybrid architectures that keep the front-end decomposition but swap the backend could be explored systematically.
- The reported domain-specific nonlinearities suggest that pre-training or meta-learning across domains might further improve generalization.
Load-bearing premise
The premise is that the chosen baselines and the controlled same-recipe evaluation fairly represent current methods, and that the reported MSE gains are not artifacts of hyperparameter choices or of preprocessing details omitted from the experiments.
What would settle it
A re-run of the 32 and 36 comparisons in which stronger hyperparameter tuning or additional published baselines close or reverse the reported MSE advantages on the majority of the winning cases.
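A re-run of this kind comes down to recounting "best or tied-best" cases from a table of MSE values. The sketch below shows one way to do that count. The tolerance parameter, the function name, and the toy numbers are all hypothetical placeholders and are not results from the paper.

```python
from typing import Dict, Tuple

Case = Tuple[str, int]  # (dataset name, forecast horizon)


def count_best_or_tied(results: Dict[Case, Dict[str, float]], model: str, tol: float = 0.0) -> int:
    """Count cases where `model` has the lowest MSE, or is within `tol` of the lowest."""
    wins = 0
    for mse_by_model in results.values():
        best = min(mse_by_model.values())
        if mse_by_model[model] <= best + tol:
            wins += 1
    return wins


# Toy placeholder values only, chosen to show a win, a tie, and a loss.
toy = {
    ("toy_a", 96): {"ours": 0.20, "baseline": 0.25},
    ("toy_a", 192): {"ours": 0.30, "baseline": 0.30},
    ("toy_b", 96): {"ours": 0.41, "baseline": 0.40},
}
strict = count_best_or_tied(toy, "ours")
lenient = count_best_or_tied(toy, "ours", tol=0.02)
```

How ties are defined matters: a count made at reported precision and a count made with a tolerance can differ, so a re-run should state which rule it uses.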
Original abstract
Accurate time series forecasting in scientific domains such as climate modeling, physiological monitoring, and energy systems benefits from both competitive predictions and model transparency. This work proposes DecompKAN, a lightweight attention-free architecture that combines trend-residual decomposition, channel-wise patching, learned instance normalization, and B-spline Kolmogorov-Arnold Network (KAN) edge functions. Each KAN edge learns an explicit, inspectable 1D scalar function over learned patch-embedding coordinates that can be directly visualized. On standard benchmarks, DecompKAN achieves best or tied-best MSE on 15 of 32 dataset-horizon combinations among selected published baselines, and achieves best or tied-best MSE on 20 of 36 comparisons under a controlled same-recipe evaluation across 9 datasets including the physiological PPG-DaLiA benchmark. The architecture shows particular strength on datasets with smooth temporal dynamics (Solar -17%, ECL -10% vs. iTransformer, Weather) and physiological time series. Visualization of learned edge functions reveals qualitatively different latent nonlinearities across domains. Ablation analysis shows that the architectural pipeline (decomposition, patching, normalization) drives performance more than the choice of nonlinear layer, while the KAN formulation enables inspection of learned latent transformations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DecompKAN, a lightweight attention-free architecture for long-term time series forecasting. It integrates trend-residual decomposition, channel-wise patching, learned instance normalization, and B-spline Kolmogorov-Arnold Network (KAN) edge functions. The authors claim that DecompKAN achieves best or tied-best MSE on 15 of 32 dataset-horizon combinations against selected published baselines, and on 20 of 36 comparisons under a controlled same-recipe evaluation across 9 datasets (including PPG-DaLiA). They highlight stronger performance on smooth-dynamics datasets (e.g., Solar, ECL, Weather) and physiological series, provide visualizations of learned KAN edge functions, and report ablation results indicating that the overall pipeline contributes more to performance than the choice of nonlinear layer.
Significance. If the performance claims hold under rigorously controlled conditions, the work offers a useful interpretable alternative to attention-based forecasters, with particular relevance for scientific domains requiring model inspection. The explicit visualization of learned 1D edge functions and the ablation separating pipeline effects from nonlinearity choice are constructive contributions. The approach is lightweight and avoids attention, which could be valuable for resource-constrained or transparency-focused applications.
major comments (2)
- [§4.2] §4.2 (Controlled same-recipe evaluation) and Table 3: The claim that DecompKAN outperforms iTransformer by 17% on Solar (and similar deltas elsewhere) under identical conditions is load-bearing for the central performance claim, yet the section does not provide explicit confirmation or a supplementary table listing the precise train/val/test splits, instance normalization procedure, patching parameters, optimizer, epoch count, and early-stopping rule applied uniformly to every baseline (including iTransformer and others). Without this, the reported improvements cannot be unambiguously attributed to the DecompKAN components rather than unequal experimental protocols.
- [§5.3] §5.3 (Ablation study): The conclusion that 'the architectural pipeline drives performance more than the nonlinear layer' rests on comparisons that replace KAN with other nonlinearities, but the text does not report the exact MSE deltas or statistical tests when the full pipeline (decomposition + patching + normalization) is held fixed while only swapping the edge functions. This makes it difficult to quantify the incremental contribution of the B-spline KAN formulation versus the preprocessing steps.
minor comments (3)
- [Tables 2-3] The reported MSE values in Tables 2 and 3 lack error bars, standard deviations across random seeds, or results from statistical significance tests (e.g., paired t-tests), which is standard for claiming 'best or tied-best' rankings.
- [Figure 4] Figure 4 (learned edge function visualizations): The caption and surrounding text could more explicitly state the input coordinate ranges, the number of samples visualized per domain, and whether the functions are shown after training on the full dataset or a subset.
- [§1 and §2] The abstract and §1 mention 'learned instance normalization' but the precise formulation (e.g., whether it is affine or includes learnable parameters per channel) is not contrasted with standard RevIN or other normalization baselines in the related-work section.
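For context on the last comment, the following is a minimal sketch of reversible instance normalization with a learnable per-channel scale and shift, in the spirit of RevIN. Whether DecompKAN's "learned instance normalization" takes this form is exactly what the comment asks the authors to state, so the formulation and the names `gamma` and `beta` are assumptions.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def normalize(window: Sequence[float], gamma: float = 1.0, beta: float = 0.0,
              eps: float = 1e-5) -> Tuple[List[float], Tuple[float, float]]:
    """Standardize one channel's input window, then apply a learnable affine map."""
    mu = fmean(window)
    sigma = (pstdev(window) ** 2 + eps) ** 0.5  # eps guards against constant windows
    normed = [gamma * (v - mu) / sigma + beta for v in window]
    return normed, (mu, sigma)


def denormalize(values: Sequence[float], stats: Tuple[float, float],
                gamma: float = 1.0, beta: float = 0.0) -> List[float]:
    """Invert the affine map and restore the window's own mean and scale."""
    mu, sigma = stats
    return [(v - beta) / gamma * sigma + mu for v in values]


window = [1.0, 2.0, 3.0, 4.0]
normed, stats = normalize(window, gamma=2.0, beta=0.5)
restored = denormalize(normed, stats, gamma=2.0, beta=0.5)
```

In a forecaster the statistics come from the input window and `denormalize` is applied to the model's output, so the prediction returns to the original scale of that instance.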
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to strengthen the experimental documentation and ablation reporting.
- Referee: [§4.2] §4.2 (Controlled same-recipe evaluation) and Table 3: The claim that DecompKAN outperforms iTransformer by 17% on Solar (and similar deltas elsewhere) under identical conditions is load-bearing for the central performance claim, yet the section does not provide explicit confirmation or a supplementary table listing the precise train/val/test splits, instance normalization procedure, patching parameters, optimizer, epoch count, and early-stopping rule applied uniformly to every baseline (including iTransformer and others). Without this, the reported improvements cannot be unambiguously attributed to the DecompKAN components rather than unequal experimental protocols.
Authors: We agree that explicit protocol documentation is necessary to substantiate the controlled-evaluation claims. In the revised manuscript we have added Supplementary Table S1, which lists the exact train/val/test splits, instance-normalization procedure, patching parameters, optimizer, maximum epoch count, and early-stopping rule applied uniformly to all models (including re-implemented baselines) in the same-recipe experiments of §4.2. These settings were enforced identically across models while following the original baseline papers’ data splits where they exist; the table makes the uniformity verifiable and supports attribution of the reported gains to the DecompKAN components. revision: yes
- Referee: [§5.3] §5.3 (Ablation study): The conclusion that 'the architectural pipeline drives performance more than the nonlinear layer' rests on comparisons that replace KAN with other nonlinearities, but the text does not report the exact MSE deltas or statistical tests when the full pipeline (decomposition + patching + normalization) is held fixed while only swapping the edge functions. This makes it difficult to quantify the incremental contribution of the B-spline KAN formulation versus the preprocessing steps.
Authors: We acknowledge that quantitative deltas and statistical tests would make the ablation more precise. The revised §5.3 now includes an expanded table that reports the exact MSE values for each nonlinearity (KAN, MLP, ReLU, GELU) under the fixed full pipeline, together with the corresponding percentage changes relative to the KAN baseline and results of paired statistical tests (Wilcoxon signed-rank) across the nine datasets. These additions allow readers to directly assess the incremental contribution of the B-spline formulation versus the preprocessing pipeline. revision: yes
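The paired test named in this response can be sketched in a few lines. The version below computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign assignments, which is cheap for nine paired datasets (2^9 = 512 cases). The rebuttal does not say whether an exact or approximate variant was used, so this choice, the function names, and the example values are assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, exact two-sided p-value) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - center)
    hits = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - center) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / total


# Placeholder paired MSE values (not from the paper): variant_a is worse on every pair.
variant_a = [2.0, 4.0, 6.0, 8.0, 10.0]
variant_b = [1.0, 2.0, 3.0, 4.0, 5.0]
w_plus, p_value = wilcoxon_signed_rank_exact(variant_a, variant_b)
```

With only nine pairs the smallest attainable two-sided p-value is 2/512, so the expanded table is most informative when it reports the per-dataset deltas alongside the test result.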
Circularity Check
No circularity: empirical performance claims rest on external benchmarks
The paper defines DecompKAN as an explicit composition of trend-residual decomposition, channel-wise patching, instance normalization, and B-spline KAN layers; none of these components is defined in terms of the final performance metric or any predicted quantity. Reported results consist of direct MSE comparisons against published baselines and a controlled re-evaluation on fixed datasets, with no equations that rename fitted parameters as predictions, no self-citation chains invoked as uniqueness theorems, and no ansatz smuggled through prior work. Ablation statements compare architectural choices without reducing one to the other by construction. The derivation chain is therefore self-contained against external data and does not collapse to its inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- patch size and stride
- B-spline order and grid size
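As a sketch of how the first of these parameters acts, the snippet below cuts one channel's look-back window into overlapping patches. The function name and the example values are hypothetical, and the end-padding that some patch-based models apply is deliberately left out.

```python
from typing import List, Sequence


def make_patches(series: Sequence[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split one channel into patches of length `patch_len`, starting every `stride` steps."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    if len(series) < patch_len:
        return []
    n_patches = (len(series) - patch_len) // stride + 1
    return [list(series[i * stride: i * stride + patch_len]) for i in range(n_patches)]


# Hypothetical setting: a 10-step window, patch length 4, stride 2 -> 4 patches.
patches = make_patches(list(range(10)), patch_len=4, stride=2)
```

Channel-wise patching applies the same split to each variable independently, so patch size and stride jointly fix how many tokens each channel contributes.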
axioms (1)
- Domain assumption: trend-residual decomposition yields additive components that are easier to model separately.
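The additivity in this assumption can be illustrated with a moving-average split, a common choice in earlier decomposition-based forecasters. The operator DecompKAN actually uses is not specified in this summary, so the moving average, the edge-replication padding, and the kernel size below are assumptions.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], kernel: int) -> List[float]:
    """Centered moving average; edges are padded by repeating the first and last values."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(series))]


def decompose(series: Sequence[float], kernel: int = 25) -> Tuple[List[float], List[float]]:
    """Return (trend, residual) such that trend + residual reproduces the input."""
    trend = moving_average(series, kernel)
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual


# A linear ramp: away from the edges the trend equals the signal and the residual is zero.
signal = [float(i) for i in range(20)]
trend, residual = decompose(signal, kernel=5)
```

Since the residual is defined as the input minus the trend, the two branches can be modeled separately and their forecasts summed without losing information.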
Reference graph
Works this paper leans on
- [1] Branch specialization. The trend branch learns predominantly smooth, slowly varying functions (gradual slopes, soft thresholds), consistent with its role in capturing low-frequency dynamics. The residual branch learns sharper, more oscillatory functions with steeper gradients, appropriate for modeling higher-frequency seasonal and irregular components
- [2] Functional diversity. Even within a single layer, edges learn qualitatively different shapes: smooth monotone mappings, sharp threshold transitions, oscillatory patterns, and near-identity functions. This diversity arises naturally from training without any explicit regularization on edge function shape, suggesting that B-spline KAN layers discover a he...