
arxiv: 2509.14640 · v2 · submitted 2025-09-18 · 💻 cs.LG

DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers

Pith reviewed 2026-05-18 15:31 UTC · model grok-4.3

keywords positional encoding · time series · transformers · wavelet transform · signal processing · machine learning · DyWPE

The pith

DyWPE generates transformer positional embeddings directly from the time series signal by applying the Discrete Wavelet Transform.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional positional encodings in transformers rely only on sequence indices and remain blind to the actual values in the time series. DyWPE instead decomposes the raw input signal with the Discrete Wavelet Transform to produce embeddings that carry both order and multi-scale signal features. The approach targets time series where patterns shift across temporal scales, such as biomedical recordings. Experiments on ten datasets show consistent gains over prior positional methods, with larger benefits on longer sequences and complex signals. If the method holds, transformers could handle non-stationary time series more effectively without extra dataset tuning.

Core claim

Existing positional encodings are signal-agnostic because they derive information solely from sequence indices. DyWPE replaces this with a dynamic construction that feeds the raw time series through the Discrete Wavelet Transform, yielding embeddings that encode both positional order and the signal's own multi-resolution characteristics.
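
To make "signal-agnostic" concrete, the sketch below computes the standard fixed sinusoidal encoding and shows that two very different signals of the same length receive identical encodings. The function name `sinusoidal_pe` and the toy signals are illustrative assumptions, not taken from the paper.

```python
import math

def sinusoidal_pe(length, d_model):
    """Fixed sinusoidal positional encoding: one d_model-vector per index.

    The only inputs are the sequence length and the model width, so the
    signal values never enter the computation.
    """
    pe = []
    for pos in range(length):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        pe.append(row[:d_model])
    return pe

# Two toy signals with the same length but very different content.
flat = [0.0] * 8
spiky = [0.0, 5.0, -3.0, 0.0, 9.0, 0.0, -7.0, 1.0]

pe_flat = sinusoidal_pe(len(flat), d_model=4)
pe_spiky = sinusoidal_pe(len(spiky), d_model=4)

# Only len(signal) was used, so the encodings are identical.
print(pe_flat == pe_spiky)  # True
```

A signal-aware encoding, by contrast, would have to take the values of `flat` and `spiky` as input.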

What carries the argument

Dynamic Wavelet Positional Encoding (DyWPE), which applies the Discrete Wavelet Transform directly to the input time series to create signal-dependent positional embeddings.
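
As a rough illustration of what "signal-dependent" can mean here, the following sketch builds one feature row per time step from a multi-level Haar DWT. It is a minimal sketch under stated assumptions, not the authors' construction: the Haar wavelet, the helper names `haar_step` and `haar_features`, and the choice to read off the coefficient covering each time step are all assumptions made for illustration. The paper's conclusion also mentions learnable scale embeddings, which are not modeled here.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT on an even-length list."""
    s = math.sqrt(2.0)
    pairs = [(x[2 * k], x[2 * k + 1]) for k in range(len(x) // 2)]
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail

def haar_features(x, levels):
    """Return one feature row per time step t: the detail coefficient whose
    support covers t at each level, then the coarsest approximation."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    rows = []
    for t in range(len(x)):
        # At level j (0-based) each coefficient spans 2**(j + 1) samples.
        row = [details[j][t >> (j + 1)] for j in range(levels)]
        row.append(approx[t >> levels])
        rows.append(row)
    return rows

# Toy univariate signal (illustrative values only).
signal = [0.0, 1.0, 4.0, 2.0, 3.0, 3.0, 8.0, 0.0]
features = haar_features(signal, levels=2)
for t, row in enumerate(features):
    print(t, [round(v, 3) for v in row])
```

In a transformer, rows like these would still need to be mapped to the model dimension, for example by a learned projection; that step is omitted here.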

If this is right

  • Time series transformers gain accuracy on longer sequences without changing the underlying architecture.
  • Complex non-stationary signals such as biomedical recordings receive larger performance lifts than simpler datasets.
  • The same wavelet-based encoding can be applied across multiple datasets without per-dataset hyperparameter search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to other sequence models that currently use index-based positions, such as state-space models.
  • Wavelet scales chosen at inference time might allow adaptive resolution for different forecasting horizons.
  • Similar signal-aware encodings might reduce the need for heavy data augmentation in small time series regimes.

Load-bearing premise

That the Discrete Wavelet Transform applied to the raw input time series will produce positional embeddings that preserve necessary order information while adding useful signal-dependent features without introducing artifacts.
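
One simple way to probe this premise is to ask what wavelet-derived features look like when the signal itself carries no variation. The sketch below uses a hypothetical Haar-based feature map (an assumption for illustration, not the paper's mapping) and counts how many distinct feature rows a constant signal and a ramp produce.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT on an even-length list."""
    s = math.sqrt(2.0)
    approx = [(x[2 * k] + x[2 * k + 1]) / s for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / s for k in range(len(x) // 2)]
    return approx, detail

def haar_features(x, levels):
    """Per-time-step rows of covering detail coefficients plus the
    coarsest approximation coefficient (illustrative feature map)."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [
        tuple(details[j][t >> (j + 1)] for j in range(levels))
        + (approx[t >> levels],)
        for t in range(len(x))
    ]

constant = [2.0] * 8                     # no variation at any scale
ramp = [float(t) for t in range(8)]      # steadily increasing signal

distinct_constant = len(set(haar_features(constant, levels=2)))
distinct_ramp = len(set(haar_features(ramp, levels=2)))

# A constant signal yields one repeated row: every detail coefficient is
# zero and every approximation coefficient is the same.
print("distinct rows, constant signal:", distinct_constant)
print("distinct rows, ramp signal:", distinct_ramp)
```

Under this naive mapping a constant input gives every time step the same row, so the features alone cannot distinguish positions. Whether the paper's full construction avoids this is exactly what the premise asks.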

What would settle it

An experiment on a long biomedical time series dataset where DyWPE yields lower accuracy or higher error than standard sinusoidal or learned positional encodings.

Original abstract

Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly problematic for time series analysis, where signals exhibit complex, non-stationary dynamics across multiple temporal scales. We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT). Comprehensive experiments on ten diverse time series datasets demonstrate that DyWPE consistently outperforms state-of-the-art positional encoding methods, with particularly significant improvements on longer sequences and complex biomedical signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Dynamic Wavelet Positional Encoding (DyWPE), a signal-aware positional encoding for time series transformers. It generates embeddings by applying the Discrete Wavelet Transform (DWT) directly to the raw input time series, in contrast to index-based methods that ignore signal characteristics. Experiments across ten diverse datasets show consistent outperformance versus state-of-the-art positional encodings, with larger gains reported on longer sequences and complex biomedical signals.

Significance. If the central construction is shown to preserve temporal order while adding signal-dependent features, the approach could meaningfully extend transformer applicability to non-stationary time series. The multi-dataset empirical evaluation and focus on longer sequences constitute a concrete strength that would support broader adoption if the order-preservation mechanism is clarified.

major comments (2)
  1. §3.2 (DyWPE construction): the mapping from DWT approximation and detail coefficients to embeddings is described without an auxiliary absolute or relative position-index injection step. Standard DWT coefficients are indexed by scale and local support rather than global sequence position; if this mapping is used directly, the resulting vectors function primarily as dynamic feature augmentation. This directly affects the central claim that DyWPE supplies positional information necessary for attention on longer sequences.
  2. Table 4 (long-sequence results): the reported accuracy gains on sequences longer than 512 steps are presented without per-run standard deviations or paired statistical tests against the strongest baseline. Because the paper emphasizes particular improvements on longer sequences, the absence of these controls leaves open whether the gains are robust or sensitive to post-hoc hyperparameter choices.
minor comments (2)
  1. Figure 2: the wavelet coefficient visualization would benefit from explicit annotation of the original time indices to illustrate how order is retained after the DWT step.
  2. §4.1 (dataset description): the biomedical datasets are listed without reference to their sampling rates or non-stationarity measures, which would help readers assess the claimed suitability of DyWPE for such signals.
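
The phrase "indexed by scale and local support" in the first major comment, and the time-index annotation requested for Figure 2, can be illustrated with a small table of which samples each coefficient covers. The sketch assumes the Haar wavelet and a sequence length divisible by 2**levels; the names `haar_support` and `support_table` are hypothetical.

```python
def haar_support(level, k):
    """Half-open sample range [start, stop) covered by Haar coefficient k
    at the given decomposition level (level 1 is the finest)."""
    width = 2 ** level
    return k * width, (k + 1) * width

def support_table(length, levels):
    """Map each level to the list of sample ranges of its coefficients."""
    table = {}
    for level in range(1, levels + 1):
        n_coeffs = length // (2 ** level)
        table[level] = [haar_support(level, k) for k in range(n_coeffs)]
    return table

table = support_table(length=8, levels=3)
for level, spans in table.items():
    print(f"level {level}: {spans}")
# level 1: [(0, 2), (2, 4), (4, 6), (6, 8)]
# level 2: [(0, 4), (4, 8)]
# level 3: [(0, 8)]
```

Each coefficient is tied to a block of samples rather than to a single index, and coarser levels cover wider blocks. Longer wavelets such as Daubechies filters have overlapping supports and need boundary handling, so their tables would be less tidy than this Haar case.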

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and the opportunity to clarify aspects of our work on DyWPE. We address each major comment below and describe the planned revisions.

Point-by-point responses
  1. Referee: §3.2 (DyWPE construction): the mapping from DWT approximation and detail coefficients to embeddings is described without an auxiliary absolute or relative position-index injection step. Standard DWT coefficients are indexed by scale and local support rather than global sequence position; if this mapping is used directly, the resulting vectors function primarily as dynamic feature augmentation. This directly affects the central claim that DyWPE supplies positional information necessary for attention on longer sequences.

    Authors: We appreciate the referee's careful reading of §3.2. The DWT is applied directly to the full input sequence, and the resulting approximation and detail coefficients retain explicit temporal localization due to the time-frequency properties of the wavelet basis. Each coefficient corresponds to a specific time support within the original sequence, so the mapping to embeddings encodes both scale-specific signal content and the underlying temporal positioning without requiring a separate index-based injection. This design ensures that the embeddings remain sensitive to sequence order while incorporating signal-dependent information. To strengthen the presentation, we will revise §3.2 to include a formal argument based on the invertibility of the DWT and add a brief illustrative diagram showing coefficient alignment with original time indices. revision: yes

  2. Referee: Table 4 (long-sequence results): the reported accuracy gains on sequences longer than 512 steps are presented without per-run standard deviations or paired statistical tests against the strongest baseline. Because the paper emphasizes particular improvements on longer sequences, the absence of these controls leaves open whether the gains are robust or sensitive to post-hoc hyperparameter choices.

    Authors: We agree that the long-sequence results in Table 4 would benefit from additional statistical controls. In the revised manuscript we will report per-run standard deviations (computed over the same number of independent runs used elsewhere in the paper) and include paired statistical tests (e.g., Wilcoxon signed-rank test) against the strongest baseline for each long-sequence dataset. These additions will directly address concerns about robustness. revision: yes
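
The statistical controls discussed in the second response can be sketched with the standard library alone. The example below reports per-run standard deviations and runs an exact two-sided Wilcoxon signed-rank test on paired runs. All accuracy values are made-up placeholders, not results from the paper, and the assumption that runs are paired (for example by random seed) is ours. SciPy's `scipy.stats.wilcoxon` provides a library implementation of the same test.

```python
import itertools
import statistics

def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| share average
    ranks. Enumerates all 2**n sign patterns, so keep n small.
    Returns (W+, p_value).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    as_extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            as_extreme += 1
    return w_plus, as_extreme / 2 ** n

# PLACEHOLDER accuracies (percent) for six paired runs; synthetic values
# chosen only to exercise the code, not taken from any experiment.
baseline = [80.1, 82.4, 79.0, 81.3, 83.2, 80.6]
candidate = [82.0, 83.1, 79.4, 84.9, 86.0, 85.5]

print("baseline  mean/std:", statistics.mean(baseline), statistics.stdev(baseline))
print("candidate mean/std:", statistics.mean(candidate), statistics.stdev(candidate))

w_plus, p_value = wilcoxon_signed_rank_exact(candidate, baseline)
print("W+ =", w_plus, "p =", p_value)
```

With only a handful of runs the exact test has a coarse set of attainable p-values, which is one reason to report the number of runs alongside the test.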

Circularity Check

0 steps flagged

No circularity: empirical proposal validated by experiments

Full rationale

The paper proposes DyWPE as a new signal-aware positional encoding using DWT applied to raw time series inputs, then reports empirical outperformance on ten datasets. No derivation chain, equations, or predictions are presented that reduce by construction to fitted parameters, self-definitions, or self-citations. The central claims rest on experimental results against external benchmarks rather than on tautological mappings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that DWT decompositions provide suitable multi-scale features for positional information; no free parameters or invented entities are identifiable from the abstract alone.

axioms (1)
  • domain assumption: Discrete Wavelet Transform applied to the input time series produces embeddings that encode both position and signal characteristics effectively.
    This premise is required for the method to generate useful positional encodings.
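
Two checkable ingredients of this assumption are that the transform loses no information and that it reacts to the order of the samples. The sketch below verifies both for a single-level Haar transform. The Haar choice and the function names are assumptions for illustration, and passing these checks does not by itself establish that the resulting embeddings work as positional encodings.

```python
import math

def haar_forward(x):
    """Single-level orthonormal Haar DWT of an even-length list."""
    s = math.sqrt(2.0)
    approx = [(x[2 * k] + x[2 * k + 1]) / s for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / s for k in range(len(x) // 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

# Toy signal (illustrative values only).
signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

# 1) Invertibility: forward then inverse recovers the signal up to
#    floating-point rounding.
approx, detail = haar_forward(signal)
rebuilt = haar_inverse(approx, detail)
max_err = max(abs(u - v) for u, v in zip(signal, rebuilt))
print("max reconstruction error:", max_err)

# 2) Order sensitivity: swapping two samples changes the coefficients.
swapped = list(signal)
swapped[0], swapped[2] = swapped[2], swapped[0]
print("coefficients changed:", haar_forward(swapped) != haar_forward(signal))
```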



Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 6 internal anchors

  1. [1]

    INTRODUCTION The transformer architecture has revolutionized sequential data modeling across diverse domains, from natural language processing to time series analysis [1]. A fundamental component enabling transformers to process sequential data is positional encoding, which addresses the inherent permutation invariance of self-attention mechanisms by ...

  2. [2]

    Early work in sequence-to-sequence learning demonstrated the power of attention in encoder-decoder frameworks [7]

    BACKGROUND AND RELATED WORK The application of attention in time series analysis has evolved from augmenting recurrent models to forming the core of modern transformers. Early work in sequence-to-sequence learning demonstrated the power of attention in encoder-decoder frameworks [7]. For time series, this led to methods like dual-stage attention-based RN...

  3. [3]

    prototypes

    DYNAMIC WAVELET POSITIONAL ENCODING 3.1. Problem Formulation and Overview Given a time series dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $n$ samples, where each sample $x_i \in \mathbb{R}^{L \times d_x}$ represents a $d_x$-dimensional time series of length $L$, and corresponding labels $Y = \{y_1, y_2, \ldots, y_n\}$ where $y_i \in \{1, 2, \ldots, c\}$, our objective is to learn a positional encoding that captures sign...

  4. [4]

    Rel. Imp

    EXPERIMENTAL EVALUATION 4.1. Experimental Setup We conduct comprehensive experiments across ten diverse time series datasets spanning multiple domains such as Human Activity Recognition, Device, and EEG classification, with eight datasets from the UEA archive [15] and two additional datasets [16, 17], as shown in Table 1. We evaluated DyWPE against e...

  5. [5]

    CONCLUSION We introduced Dynamic Wavelet Positional Encoding (DyWPE), the first signal-aware positional encoding framework for transformer-based time series models. By analyzing actual signal content through multi-scale wavelet decomposition and dynamically modulating learnable scale embeddings, DyWPE creates rich positional representations that ada...

  6. [6]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017

  7. [7]

    A transformer-based framework for multivariate time series representation learning

    G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2114–2124

  8. [8]

    Temporal fusion transformers for interpretable multi-horizon time series forecasting

    B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021

  9. [9]

    Self-Attention with Relative Position Representations

    P. Shaw, J. Uszkoreit, and A. Vaswani, “Self-attention with relative position representations,” arXiv preprint arXiv:1803.02155, 2018

  10. [10]

    Rethinking positional encoding in language pre-training

    G. Ke, D. He, and T.-Y. Liu, “Rethinking positional encoding in language pre-training,” arXiv preprint arXiv:2006.15595, 2020

  11. [11]

    Improving position encoding of transformers for multivariate time series classification

    N. M. Foumani, C. W. Tan, G. I. Webb, and M. Salehi, “Improving position encoding of transformers for multivariate time series classification,” Data Mining and Knowledge Discovery, vol. 38, no. 1, pp. 22–48, 2024

  12. [12]

    Neural Machine Translation by Jointly Learning to Align and Translate

    D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014

  13. [13]

    A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

    Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, “A dual-stage attention-based recurrent neural network for time series prediction,” arXiv preprint arXiv:1704.02971, 2017

  14. [14]

    Gated transformer networks for multivariate time series classification

    M. Liu, S. Ren, S. Ma, J. Jiao, Y. Chen, Z. Wang, and W. Song, “Gated transformer networks for multivariate time series classification,” arXiv preprint arXiv:2103.14438, 2021

  15. [15]

    Differentiable patch selection for image recognition

    J.-B. Cordonnier, A. Mahendran, A. Dosovitskiy, D. Weissenborn, J. Uszkoreit, and T. Unterthiner, “Differentiable patch selection for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2351–2360

  16. [16]

    Roformer: Enhanced transformer with rotary position embedding

    J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu, “Roformer: Enhanced transformer with rotary position embedding,” Neurocomputing, vol. 568, p. 127063, 2024

  17. [17]

    Positional Encoding in Transformer-Based Time Series Models: A Survey

    H. Irani and V. Metsis, “Positional encoding in transformer-based time series models: a survey,” arXiv preprint arXiv:2502.12370, 2025

  18. [18]

    Adaptive positional encoding mechanisms for dynamic time series

    A. Bell, M. Gray, and R. King, “Adaptive positional encoding mechanisms for dynamic time series,” Expert Systems with Applications, vol. 220, p. 119678, 2023

  19. [19]

    Wavelet-based positional representation for long context

    Y. Oka, T. Hasegawa, K. Nishida, and K. Saito, “Wavelet-based positional representation for long context,” arXiv preprint arXiv:2502.02004, 2025

  20. [20]

    The UEA multivariate time series classification archive, 2018

    A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The UEA multivariate time series classification archive, 2018,” arXiv preprint arXiv:1811.00075, 2018

  21. [21]

    UniMiB SHAR: A dataset for human activity recognition using acceleration data from smartphones

    D. Micucci, M. Mobilio, and P. Napoletano, “UniMiB SHAR: A dataset for human activity recognition using acceleration data from smartphones,” Applied Sciences, vol. 7, no. 10, p. 1101, 2017

  22. [22]

    Room Occupancy Estimation

    A. P. Singh and S. Chaudhari, “Room Occupancy Estimation,” UCI Machine Learning Repository, 2018, DOI: https://doi.org/10.24432/C5P605