KANMixer: a minimal KAN-centered mixer for long-term time series forecasting

Lingyu Jiang , Dengzhe Hou , Yuping Wang , Yao Su , Shuo Xing , Wenjing Chen , Xin Zhang , Zhengzhong Tu

show 4 more authors

Ziming Zhang Fangzhou Lin Michael Zielewski Kazunori D Yamada

Authors on Pith no claims yet

classification 💻 cs.LG

keywords ltsfkanmixerpredictionbackbonebestchoicesdesignforecasting

0 comments

read the original abstract

Long-term time series forecasting (LTSF) underpins critical applications from energy management to weather prediction, yet achieving reliable multi-step-ahead accuracy remains challenging. Existing LTSF approaches, dominated by MLP- and Transformer-based architectures, either rely on simple linear mappings or introduce increasingly complex hand-crafted inductive biases, raising the question of whether a more expressive and principled nonlinear core could offer a better alternative. Therefore, we investigate whether Kolmogorov-Arnold Networks (KANs), a recently proposed model featuring adaptive basis functions capable of granular modulation of nonlinearities, can improve LTSF performance, and under which design choices they are most effective. Specifically, we propose KANMixer, a minimal KAN-centered architecture consisting of a multi-scale pooling frontend, a KAN-based temporal mixing backbone, and prediction heads. By avoiding heavy auxiliary modules, KANMixer enables a clear assessment of KAN components in LTSF. Across 28 benchmark-horizon settings against nine baselines, KANMixer achieves the best MSE in 16 settings and the best MAE in 11. Furthermore, extensive ablations on three representative datasets show that KAN effectiveness depends strongly on the choice of edge function; B-spline bases outperform Fourier and Wavelet alternatives; the prediction head contributes most to the gains; moderate depth is preferred over deeper unstable stacks; and decomposition priors help MLP but harm KAN. Beyond practical guidance for integrating KAN into LTSF, these results reveal an underexplored dependency between structural priors and backbone nonlinearity: design choices that benefit MLP can degrade KAN.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
cs.LG 2025-11 unverdicted novelty 5.0

TimePre unifies MLP speed and MCL distributional power via Stabilized Instance Normalization to deliver SOTA probabilistic accuracy, orders-of-magnitude faster inference, and improved stability over prior MCL methods.