
arxiv: 2605.05685 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI · stat.ML

Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting

Pith reviewed 2026-05-08 14:52 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords Kolmogorov-Arnold Networks · KAN · time-series forecasting · interpretability · B-spline · gated residual · edge functions · attribution

The pith

A gated residual KAN decomposes forecasts into a linear base plus sparse spline corrections whose edge functions map to input lags and prove predictive via removal tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Kolmogorov-Arnold Networks can deliver mechanistic explanations for time-series forecasts by exposing every edge as an explicit learnable function rather than a hidden weight. It introduces Temporal Functional Circuits on top of a gated residual architecture that splits each prediction into a fixed linear term and a selectively activated KAN term. Each edge is then linked to specific past input lags through output-aware attribution, ranked by how much its activation range contributes, and tested for faithfulness by zeroing the edge or stripping out its learned B-spline while leaving the base SiLU intact. The key result is that removing the spline component measurably worsens accuracy, indicating the learned curve shape itself encodes useful temporal structure beyond the base activation. On synthetic regime-switching data the gate widens with complexity and the full model cuts MSE by 59 percent compared with linear baselines, while staying competitive on eight real benchmarks.
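The decomposition described above can be sketched in a few lines. This is a minimal illustration, assuming the forecast has the form base + gate × correction with a sigmoid gate; the names `linear_base`, `gated_forecast`, `correction_fn`, and `gate_fn`, along with all weights, are hypothetical and do not come from the paper.

```python
import math

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def linear_base(window, weights, bias=0.0):
    """Linear term: weighted sum over the input lags."""
    return sum(w * x for w, x in zip(weights, window)) + bias

def gated_forecast(window, weights, correction_fn, gate_fn):
    """Return (forecast, base, gate, correction) for one input window."""
    base = linear_base(window, weights)
    gate = sigmoid(gate_fn(window))      # in (0, 1); near 0 means "linear only"
    correction = correction_fn(window)   # stands in for the KAN branch
    return base + gate * correction, base, gate, correction

# Hypothetical toy components, chosen only to make the sketch runnable.
weights = [0.5, 0.3, 0.2]
correction_fn = lambda w: math.tanh(w[-1] ** 2)
closed_gate = lambda w: -50.0   # sigmoid(-50) is effectively 0
open_gate = lambda w: 50.0      # sigmoid(50) is effectively 1

window = [1.0, 2.0, 3.0]
y_closed, base, _, _ = gated_forecast(window, weights, correction_fn, closed_gate)
y_open, _, _, corr = gated_forecast(window, weights, correction_fn, open_gate)
```

With the gate closed the forecast collapses to the linear term, and with the gate open the full correction is added. That is the behavior the summary describes when it says the gate widens with signal complexity.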

Core claim

The spline component inside each KAN edge function carries predictive information independent of the base activation; removing the learned B-spline while retaining the SiLU term degrades forecast accuracy, and output-aware attribution can reliably map those edges to the input lags they depend on inside a gated residual KAN.
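A sketch of what spline removal means at the level of a single edge may help. It assumes the common KAN parameterization in which an edge function is a weighted SiLU term plus a weighted B-spline; the knot grid, coefficients, and the `keep_spline` flag are illustrative assumptions rather than the paper's settings.

```python
import math

def silu(z: float) -> float:
    return z / (1.0 + math.exp(-z))

def bspline_basis(i: int, k: int, z: float, knots) -> float:
    """Cox-de Boor recursion: i-th B-spline basis of degree k evaluated at z."""
    if k == 0:
        return 1.0 if knots[i] <= z < knots[i + 1] else 0.0
    value = 0.0
    left_den = knots[i + k] - knots[i]
    if left_den > 0:
        value += (z - knots[i]) / left_den * bspline_basis(i, k - 1, z, knots)
    right_den = knots[i + k + 1] - knots[i + 1]
    if right_den > 0:
        value += (knots[i + k + 1] - z) / right_den * bspline_basis(i + 1, k - 1, z, knots)
    return value

def edge_function(z, coeffs, knots, degree=3, w_base=1.0, w_spline=1.0, keep_spline=True):
    """phi(z) = w_base * SiLU(z) + w_spline * sum_i c_i B_i(z); keep_spline=False drops the spline."""
    base = w_base * silu(z)
    if not keep_spline:
        return base
    spline = sum(c * bspline_basis(i, degree, z, knots) for i, c in enumerate(coeffs))
    return base + w_spline * spline

# Hypothetical uniform knot grid and coefficients (6 cubic basis functions need 10 knots).
knots = [-4.5 + j for j in range(10)]
coeffs = [0.0, 0.4, -0.3, 0.8, -0.5, 0.1]

full = edge_function(0.3, coeffs, knots)
ablated = edge_function(0.3, coeffs, knots, keep_spline=False)
```

The difference `full - ablated` is exactly the spline's contribution at that input. This is the quantity the intervention removes while the SiLU base stays in place.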

What carries the argument

Temporal Functional Circuits, which performs output-aware attribution to link KAN edge functions to input time lags, ranks them by activation range, and validates them through edge-level interventions such as zeroing or spline removal inside a gated residual KAN.
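The ranking step can be illustrated as follows. The summary does not define the activation-range score, so this sketch assumes it is the spread (maximum minus minimum) of an edge function's output over the inputs that edge actually sees. The edge functions and input grid are hypothetical stand-ins.

```python
import math

def activation_range(edge_fn, observed_inputs):
    """Assumed score: spread of the edge output over its observed inputs."""
    values = [edge_fn(z) for z in observed_inputs]
    return max(values) - min(values)

def rank_edges(edges, observed_inputs):
    """edges: name -> callable. Returns [(name, score)] sorted from largest to smallest."""
    scored = [
        (name, activation_range(fn, observed_inputs[name]))
        for name, fn in edges.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical edge functions standing in for learned splines.
edges = {
    "edge_a": lambda z: 0.1 * z,            # nearly flat
    "edge_b": lambda z: math.sin(3.0 * z),  # strongly nonlinear
    "edge_c": lambda z: 0.5 * z * z,
}
grid = [i / 10.0 for i in range(-10, 11)]   # inputs from -1.0 to 1.0
observed = {name: grid for name in edges}
ranking = rank_edges(edges, observed)
```

Under this assumed definition, an edge whose output barely moves across its inputs ranks last regardless of how elaborate its curve looks when plotted over a wider domain.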

If this is right

  • On regime-switching synthetic signals the gated KAN records 59 percent lower MSE than linear-only models.
  • The learned gate opens progressively wider as the complexity of the underlying signal increases across four synthetic regimes.
  • The architecture matches or exceeds linear, attention, and MLP baselines across eight forecasting benchmarks while exposing explicit edge functions.
  • MLP-based corrections cannot supply the same per-edge interpretability because they lack the explicit spline parameterization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the attribution mapping holds, forecasters could inspect which past lags drive a given prediction and adjust data collection accordingly.
  • The same intervention protocol might be applied to other spline-based or piecewise models to test whether their nonlinear pieces add value.
  • Wider adoption could let practitioners replace black-box corrections with traceable spline shapes that reveal recurring temporal motifs in their data.

Load-bearing premise

That output-aware attribution correctly identifies which input lags each edge function depends on, and that intervening on individual edges isolates their contribution without interference from the gating or residual connections.

What would settle it

Finding that forecast error stays the same or improves after removing the B-spline component from edges that attribution ranked as important, or that zeroing an attributed edge produces no change in the output.
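The falsification criterion above amounts to a simple comparison of forecast error before and after an intervention. The sketch below is one way to encode it; the function name `intervention_verdict`, the tolerance parameter, and the toy numbers are hypothetical and are not results from the paper.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def intervention_verdict(y_true, pred_full, pred_ablated, tol=0.0):
    """Compare forecast error before and after an edge intervention.

    supports_claim is True only when the ablated model is worse by more than tol.
    """
    full = mse(y_true, pred_full)
    ablated = mse(y_true, pred_ablated)
    delta = ablated - full
    return {
        "mse_full": full,
        "mse_ablated": ablated,
        "delta": delta,
        "supports_claim": delta > tol,
    }

# Toy numbers for illustration only.
y_true = [1.0, 2.0, 3.0]
pred_full = [1.1, 1.9, 3.0]
degraded = intervention_verdict(y_true, pred_full, [1.5, 1.5, 3.5])
unchanged = intervention_verdict(y_true, pred_full, pred_full)
```

A verdict like `unchanged`, obtained on edges that attribution ranked as important, is the outcome that would count against the paper's claim.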

Figures

Figures reproduced from arXiv: 2605.05685 by Naveen Mysore.

Figure 1: Circuit walkthrough on Weather (channel 10). (a) Input window with the top-attributed …
Figure 2: Synthetic regime experiments. (a) Gate utilization …
Figure 3: Edge deletion curves (residual branch, ranked by …
Figure 4: DecompKAN ablation study (H=96). On Weather, the full KAN pipeline is best. On ETTh1, replacing KAN with a linear layer improves performance. The choice of nonlinear core is dataset-dependent, motivating the gated residual design.
Figure 5: U_KAN heatmap across datasets and horizons.
  Plot annotations extracted alongside this caption: axes S_nonlin and U_KAN; Spearman ρ = 0.456 (p = 0.015, n = 28); legend: Weather, ETTh1, ETTh2, ETTm1, ETTm2, PPG, Solar.
Figure 6: S_nonlin vs. U_KAN scatter. Spearman ρ = 0.46, p = 0.015.
  Adjacent appendix text captured with this caption (C Published Baselines; D Basis Function Ablation): ∗SinCos on Solar diverged during training (learnable frequency initialization issue). The choice of basis function is dataset-dependent: B-spline is strongest on Solar (where smooth seasonal splines match the signal), Fourier and SinCos excel on ETTh2 (periodic structure), and SinCos achieves competitive…
Figure 7: Top-5 learned KAN edge functions ϕ_e(z) ranked by R_e for the residual branch. Weather edges (top) show complex nonlinear shapes concentrated on the most recent patch (lags 320–336). ETTm1 edges (bottom) are smoother with lower R_e, consistent with lower U_KAN.
  Adjacent appendix text captured with this caption (H Temporal Grounding Validation): Synthetic lag recovery and A_e validation. A synthetic signal x(t) = 0.6 x(t−3) + 0.25 x(t−12) + ϵ (ϵ ∼ N(0, 0.1²)) is c…
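The synthetic signal quoted in the appendix excerpt, x(t) = 0.6 x(t−3) + 0.25 x(t−12) + ϵ with ϵ ∼ N(0, 0.1²), is easy to reproduce. The sketch below simulates it and checks that the two lag coefficients can be recovered by ordinary least squares. The least-squares fit is a stand-in chosen for illustration; it is not the paper's attribution method, and the sample size, seed, and burn-in are assumptions.

```python
import random

def simulate(n, seed=0, burn_in=200):
    """Generate n samples of x(t) = 0.6 x(t-3) + 0.25 x(t-12) + eps, eps ~ N(0, 0.1^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 0.1) for _ in range(12)]   # arbitrary start-up values
    for _ in range(n + burn_in):
        x.append(0.6 * x[-3] + 0.25 * x[-12] + rng.gauss(0.0, 0.1))
    return x[-n:]                                   # drop the burn-in segment

def fit_two_lags(x, lag_a=3, lag_b=12):
    """Least-squares fit of x[t] ~ a*x[t-lag_a] + b*x[t-lag_b] via 2x2 normal equations."""
    start = max(lag_a, lag_b)
    saa = sab = sbb = say = sby = 0.0
    for t in range(start, len(x)):
        xa, xb, y = x[t - lag_a], x[t - lag_b], x[t]
        saa += xa * xa
        sab += xa * xb
        sbb += xb * xb
        say += xa * y
        sby += xb * y
    det = saa * sbb - sab * sab
    a = (say * sbb - sby * sab) / det
    b = (sby * saa - say * sab) / det
    return a, b

series = simulate(20000)
coef_lag3, coef_lag12 = fit_two_lags(series)
```

Because the ground-truth lags are known by construction, a signal like this gives any lag-attribution method a target it can be scored against.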
Original abstract

Unlike MLPs, Kolmogorov-Arnold Networks (KANs) expose explicit learnable edge functions on every connection, enabling mechanistic explanation in time-series forecasting. This paper introduces Temporal Functional Circuits, a framework that transforms KAN edge functions from latent visualizations into faithful, temporally grounded explanations. Built on a gated residual KAN that decomposes forecasts into a linear base and a sparsely activated KAN correction, the framework (i) maps each edge to input lags via output-aware attribution, (ii) ranks edges by learned activation range, and (iii) validates faithfulness through edge-level interventions including zeroing and spline removal. Removing the learned B-spline component while retaining the base SiLU term degrades forecasts, providing evidence that the spline shape itself carries predictive value beyond the base activation. On four synthetic regimes of increasing complexity, the learned gate opens progressively wider as signal complexity grows. On regime-switching signals, gated KAN achieves 59% lower MSE than linear-only models. Across eight benchmarks, the gated architecture is competitive with linear, attention, and MLP alternatives, while providing interpretable edge functions that MLP-based corrections cannot offer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Temporal Functional Circuits, a framework for transforming KAN edge functions into temporally grounded explanations for time-series forecasting. It employs a gated residual KAN that decomposes forecasts into a linear base plus sparsely activated KAN correction, maps edges to input lags via output-aware attribution, ranks them by learned activation range, and validates faithfulness via edge interventions (zeroing and spline removal). Key results include forecast degradation when the B-spline is removed while retaining the SiLU base, progressively wider gate opening with signal complexity, 59% lower MSE than linear models on regime-switching signals, and competitive performance against linear, attention, and MLP baselines on eight benchmarks.

Significance. If the faithfulness claims hold after addressing potential confounding, the work offers a concrete advance in mechanistic interpretability for forecasting by leveraging KANs' explicit edge functions rather than post-hoc methods. The intervention-based validation approach (spline removal while retaining base activation) is a strength worth preserving, as is the empirical demonstration of gate behavior scaling with complexity and the competitive benchmark results. These elements could support more trustworthy KAN-based forecasters if the isolation of spline contributions is rigorously established.

major comments (1)
  1. [edge-level interventions and spline-removal validation] The central evidence that 'the spline shape itself carries predictive value beyond the base activation' (abstract) rests on the spline-removal intervention. However, in the gated residual architecture, zeroing only the learned B-spline while keeping the SiLU term can alter the input to the learned gate or permit compensation along the residual linear path, so the observed MSE increase does not necessarily isolate the spline's explanatory contribution. This is load-bearing for the faithfulness claim and the 'Temporal Functional Circuits' framework; additional controls (e.g., gate-ablation or input-normalized interventions) are needed to rule out compensation, especially given the reported wider gate opening on complex regimes.
minor comments (2)
  1. [abstract and experimental results] The abstract reports concrete performance numbers (59% MSE reduction, competitive benchmarks) but provides no details on experimental setup, number of runs, statistical tests, or hyperparameter selection; these should be added to support reproducibility and the soundness of the empirical claims.
  2. [output-aware attribution] The output-aware attribution method for mapping edges to specific input lags is central to the temporal grounding claim but is described at a high level; a precise definition or pseudocode would clarify how it avoids confounding from the gated architecture.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review, which identifies a key area for strengthening our validation of the spline contributions. We address the major comment below and will incorporate additional controls to bolster the faithfulness claims.

  1. Referee: [edge-level interventions and spline-removal validation] The central evidence that 'the spline shape itself carries predictive value beyond the base activation' (abstract) rests on the spline-removal intervention. However, in the gated residual architecture, zeroing only the learned B-spline while keeping the SiLU term can alter the input to the learned gate or permit compensation along the residual linear path, so the observed MSE increase does not necessarily isolate the spline's explanatory contribution. This is load-bearing for the faithfulness claim and the 'Temporal Functional Circuits' framework; additional controls (e.g., gate-ablation or input-normalized interventions) are needed to rule out compensation, especially given the reported wider gate opening on complex regimes.

    Authors: We acknowledge that the referee's concern is valid and merits additional controls to ensure the spline-removal intervention cleanly isolates the B-spline's contribution. In the current gated residual design, the KAN correction (which includes the per-edge SiLU base plus learned B-spline) is modulated by the gate before being added to the linear residual path; thus, removing the spline term could in principle influence gate behavior or allow the linear path to partially compensate. To address this directly, we will add new experiments in the revised manuscript: (1) gate-ablation studies in which the gate is fixed to a constant value of 1 while performing spline removal, and (2) input-normalized interventions that hold the gate's input statistics fixed across conditions. These results will be presented alongside the existing intervention tables. We expect the degradation to persist under fixed gating, thereby confirming that the spline shapes carry independent predictive value beyond the base activation and the residual path. revision: yes
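The proposed gate-ablation control can be pictured as a 2×2 design: gate learned or fixed to 1, crossed with spline kept or removed. The toy model below is a hypothetical illustration of that design only. In particular, making the gate depend on the magnitude of the correction is an assumption introduced here to mimic the confound the referee describes, and none of the constants come from the paper.

```python
import math

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def silu(z: float) -> float:
    return z * sigmoid(z)

def toy_forecast(x, gate_mode="learned", keep_spline=True):
    base = 0.8 * x                                        # linear residual path
    spline = math.sin(3.0 * x) if keep_spline else 0.0    # stand-in for a learned spline
    correction = 0.2 * silu(x) + spline
    if gate_mode == "fixed":
        gate = 1.0                                        # gate ablation: constant 1
    else:
        gate = sigmoid(4.0 * abs(correction) - 1.0)       # hypothetical input-dependent gate
    return base + gate * correction

def target(x):
    return 0.8 * x + (0.2 * silu(x) + math.sin(3.0 * x))

def mse(xs, **kwargs):
    return sum((target(x) - toy_forecast(x, **kwargs)) ** 2 for x in xs) / len(xs)

xs = [i / 50.0 - 2.0 for i in range(201)]
results = {
    (mode, keep): mse(xs, gate_mode=mode, keep_spline=keep)
    for mode in ("learned", "fixed")
    for keep in (True, False)
}
fixed_gate_effect = results[("fixed", False)] - results[("fixed", True)]
learned_gate_effect = results[("learned", False)] - results[("learned", True)]
```

Comparing `fixed_gate_effect` with `learned_gate_effect` separates the error attributable to the spline itself from any change routed through the gate, which is the distinction the additional experiments are meant to establish.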

Circularity Check

0 steps flagged

No circularity: empirical interventions are independent of fitting process

full rationale

The paper's key evidence consists of post-training edge interventions (zeroing, spline removal) on a trained gated residual KAN model, showing MSE degradation when the B-spline component is removed while retaining the SiLU base. This is an external test performed after optimization and does not reduce any claimed quantity to its own fitted inputs by construction. No equations are presented that define a target result in terms of itself, no fitted parameters are relabeled as independent predictions, and no load-bearing claims rely on self-citations whose content is unverified or tautological. The architecture description (linear base plus sparsely activated KAN correction) and attribution/ranking steps are standard post-hoc analyses on an already-trained network. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review yields limited visibility into parameters or assumptions; framework appears to rely on standard neural training practices without explicit new axioms or free parameters stated.

axioms (1)
  • domain assumption: standard neural network training assumptions, including differentiability of activations and suitability of MSE loss for forecasting
    Implicit in any KAN training setup for time series
invented entities (1)
  • Temporal Functional Circuits (no independent evidence)
    purpose: Framework to convert KAN edge functions into temporally grounded explanations
    Newly introduced construct in the paper

pith-pipeline@v0.9.0 · 5496 in / 1210 out tokens · 44794 ms · 2026-05-08T14:52:24.266914+00:00 · methodology

