DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers

Habib Irani; Vangelis Metsis

arxiv: 2509.14640 · v2 · submitted 2025-09-18 · 💻 cs.LG


Pith reviewed 2026-05-18 15:31 UTC · model grok-4.3

classification 💻 cs.LG

keywords: positional encoding, time series, transformers, wavelet transform, signal processing, machine learning, DyWPE


The pith

DyWPE generates transformer positional embeddings directly from the time series signal by applying the Discrete Wavelet Transform.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional positional encodings in transformers rely only on sequence indices and remain blind to the actual values in the time series. DyWPE instead decomposes the raw input signal with the Discrete Wavelet Transform to produce embeddings that carry both order and multi-scale signal features. The approach targets time series where patterns shift across temporal scales, such as biomedical recordings. Experiments on ten datasets show consistent gains over prior positional methods, with larger benefits on longer sequences and complex signals. If the method holds, transformers could handle non-stationary time series more effectively without extra dataset tuning.

Core claim

Existing positional encodings are signal-agnostic because they derive information solely from indices. DyWPE replaces this with a dynamic construction that feeds the raw time series through the Discrete Wavelet Transform, yielding embeddings that capture both positional order and the signal's own multi-resolution characteristics.

What carries the argument

Dynamic Wavelet Positional Encoding (DyWPE), which applies the Discrete Wavelet Transform directly to the input time series to create signal-dependent positional embeddings.
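
To make the idea concrete, here is a minimal sketch of a signal-dependent positional embedding built from a multi-level DWT. It is not the authors' implementation. The Haar wavelet, the function names `haar_dwt` and `signal_aware_embedding`, the random per-scale vectors (standing in for the learnable scale embeddings mentioned in the paper's conclusion), and the rule that each time step reads the coefficient whose support covers it are all assumptions chosen for brevity.

```python
import math
import random


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (approx, details), where details[j] holds the level j+1 detail
    coefficients. Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def signal_aware_embedding(signal, levels, d_model, seed=0):
    """Hypothetical DyWPE-style embedding: one d_model vector per time step.

    Each scale owns one vector (random here, learnable in a real model).
    The coefficient covering time step t scales that vector, and the
    contributions of all scales are summed.
    """
    rng = random.Random(seed)
    approx, details = haar_dwt(signal, levels)
    bands = details + [approx]  # finest detail first, coarsest approximation last
    strides = [2 ** (j + 1) for j in range(levels)] + [2 ** levels]
    scale_vecs = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in bands]

    out = []
    for t in range(len(signal)):
        vec = [0.0] * d_model
        for band, stride, sv in zip(bands, strides, scale_vecs):
            coeff = band[t // stride]  # coefficient whose support covers t
            for k in range(d_model):
                vec[k] += coeff * sv[k]
        out.append(vec)
    return out


# Toy input: a sinusoid with a level shift halfway through.
x = [math.sin(0.3 * t) + (1.0 if t >= 8 else 0.0) for t in range(16)]
pe = signal_aware_embedding(x, levels=2, d_model=4)
print(len(pe), len(pe[0]))  # 16 4
```

In practice a library such as PyWavelets would replace the hand-written transform; the pure-Python version is used here only to keep the sketch self-contained.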

If this is right

Time series transformers gain accuracy on longer sequences without changing the underlying architecture.
Complex non-stationary signals such as biomedical recordings receive larger performance lifts than simpler datasets.
The same wavelet-derived embeddings can be used across multiple datasets without per-dataset hyperparameter search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could extend to other sequence models that currently use index-based positions, such as state-space models.
Wavelet scales chosen at inference time might allow adaptive resolution for different forecasting horizons.
Similar signal-aware encodings might reduce the need for heavy data augmentation in small time series regimes.

Load-bearing premise

That the Discrete Wavelet Transform applied to the raw input time series will produce positional embeddings that preserve necessary order information while adding useful signal-dependent features without introducing artifacts.

What would settle it

An experiment on a long biomedical time series dataset where DyWPE yields lower accuracy or higher error than standard sinusoidal or learned positional encodings.

Original abstract

Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly problematic for time series analysis, where signals exhibit complex, non-stationary dynamics across multiple temporal scales. We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT). Comprehensive experiments on ten diverse time series datasets demonstrate that DyWPE consistently outperforms state-of-the-art positional encoding methods, with particularly significant improvements on longer sequences and complex biomedical signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DyWPE applies DWT to raw inputs for signal-aware positional embeddings, but the construction may lean more toward feature augmentation than strict order encoding.


The main thing to know is that this paper takes the Discrete Wavelet Transform of the input time series itself to build dynamic positional embeddings for transformers. That specific combination looks new compared to prior work on fixed or learned positional encodings. They back it with tests on ten datasets and report steady gains over existing methods, with bigger lifts on longer sequences and biomedical signals. That empirical spread is the part that actually lands as useful evidence for people working in this corner of time series modeling.

The DWT math is standard and reproducible, so no issue there on the technical foundation. The citation pattern stays within the usual positional encoding and wavelet literature without obvious gaps or over-claiming.

The soft spot sits in the central construction. The stress-test concern holds up on a first read: DWT coefficients are indexed by scale and local support within the signal, not by sequence position per se. If the method feeds those coefficients straight into the embedding without an auxiliary position-index step, the attention mechanism could be picking up signal-dependent features rather than improved temporal order. That would still explain gains on complex or non-stationary data, but it undercuts the claim that this fixes positional encoding specifically for longer sequences. A clear diagram or ablation isolating the positional component would tighten this.

This paper is for researchers building or tuning transformers on time series, especially those handling multi-scale or biomedical signals. A reader already working on positional variants would pull practical implementation ideas from the results. It has enough concrete experiments to deserve peer review so the embedding mapping and robustness checks can be examined in detail.

Referee Report

2 major / 2 minor

Summary. The paper introduces Dynamic Wavelet Positional Encoding (DyWPE), a signal-aware positional encoding for time series transformers. It generates embeddings by applying the Discrete Wavelet Transform (DWT) directly to the raw input time series, in contrast to index-based methods that ignore signal characteristics. Experiments across ten diverse datasets show consistent outperformance versus state-of-the-art positional encodings, with larger gains reported on longer sequences and complex biomedical signals.

Significance. If the central construction is shown to preserve temporal order while adding signal-dependent features, the approach could meaningfully extend transformer applicability to non-stationary time series. The multi-dataset empirical evaluation and focus on longer sequences constitute a concrete strength that would support broader adoption if the order-preservation mechanism is clarified.

major comments (2)

§3.2 (DyWPE construction): the mapping from DWT approximation and detail coefficients to embeddings is described without an auxiliary absolute or relative position-index injection step. Standard DWT coefficients are indexed by scale and local support rather than global sequence position; if this mapping is used directly, the resulting vectors function primarily as dynamic feature augmentation. This directly affects the central claim that DyWPE supplies positional information necessary for attention on longer sequences.
Table 4 (long-sequence results): the reported accuracy gains on sequences longer than 512 steps are presented without per-run standard deviations or paired statistical tests against the strongest baseline. Because the paper emphasizes particular improvements on longer sequences, the absence of these controls leaves open whether the gains are robust or sensitive to post-hoc hyperparameter choices.
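
The §3.2 concern can be checked on a toy case. The sketch below assumes a one-level orthonormal Haar DWT (the wavelet family used in the paper is not stated on this page) and a hypothetical helper `haar_step`. For a constant signal every detail coefficient is zero and every approximation coefficient is identical, so an embedding driven only by coefficient values would assign the same vector to every time step. A signal with a step, by contrast, produces a coefficient pattern that depends on where the step sits.

```python
import math


def haar_step(values):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    approx = [(values[i] + values[i + 1]) / math.sqrt(2) for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / math.sqrt(2) for i in range(0, len(values), 2)]
    return approx, detail


# Constant signal: coefficients carry no information about position.
approx_const, detail_const = haar_step([3.0] * 8)
print(detail_const)            # [0.0, 0.0, 0.0, 0.0]
print(len(set(approx_const)))  # 1

# Step signal: only the pair straddling the step has a non-zero detail.
approx_step, detail_step = haar_step([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])
print([d != 0.0 for d in detail_step])  # [False, True, False, False]
```

Whether the paper's construction shares this behaviour depends on details of §3.2 that are not reproduced here; the toy case only illustrates why an ablation isolating the positional component would be informative.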

minor comments (2)

Figure 2: the wavelet coefficient visualization would benefit from explicit annotation of the original time indices to illustrate how order is retained after the DWT step.
§4.1 (dataset description): the biomedical datasets are listed without reference to their sampling rates or non-stationarity measures, which would help readers assess the claimed suitability of DyWPE for such signals.
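
The annotation requested for Figure 2 amounts to an index mapping from coefficients back to time steps. As a sketch, assuming a decimated Haar DWT (longer filters would widen each window and make neighbouring windows overlap), coefficient k at level j depends on input samples k·2^j through (k+1)·2^j − 1. The helper name `haar_support` is hypothetical.

```python
def haar_support(level, k):
    """Half-open range [start, stop) of input samples behind Haar
    coefficient k at the given decomposition level (levels start at 1)."""
    width = 2 ** level
    return k * width, (k + 1) * width


length = 8
for level in (1, 2, 3):
    n_coeffs = length // 2 ** level
    print(level, [haar_support(level, k) for k in range(n_coeffs)])
# 1 [(0, 2), (2, 4), (4, 6), (6, 8)]
# 2 [(0, 4), (4, 8)]
# 3 [(0, 8)]
```

The coarser the level, the wider the window each coefficient summarizes, which is why a figure annotated this way would show at a glance how much temporal resolution each scale retains.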

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and the opportunity to clarify aspects of our work on DyWPE. We address each major comment below and describe the planned revisions.


Referee: §3.2 (DyWPE construction): the mapping from DWT approximation and detail coefficients to embeddings is described without an auxiliary absolute or relative position-index injection step. Standard DWT coefficients are indexed by scale and local support rather than global sequence position; if this mapping is used directly, the resulting vectors function primarily as dynamic feature augmentation. This directly affects the central claim that DyWPE supplies positional information necessary for attention on longer sequences.

Authors: We appreciate the referee's careful reading of §3.2. The DWT is applied directly to the full input sequence, and the resulting approximation and detail coefficients retain explicit temporal localization due to the time-frequency properties of the wavelet basis. Each coefficient corresponds to a specific time support within the original sequence, so the mapping to embeddings encodes both scale-specific signal content and the underlying temporal positioning without requiring a separate index-based injection. This design ensures that the embeddings remain sensitive to sequence order while incorporating signal-dependent information. To strengthen the presentation, we will revise §3.2 to include a formal argument based on the invertibility of the DWT and add a brief illustrative diagram showing coefficient alignment with original time indices.

Revision: yes

Referee: Table 4 (long-sequence results): the reported accuracy gains on sequences longer than 512 steps are presented without per-run standard deviations or paired statistical tests against the strongest baseline. Because the paper emphasizes particular improvements on longer sequences, the absence of these controls leaves open whether the gains are robust or sensitive to post-hoc hyperparameter choices.

Authors: We agree that the long-sequence results in Table 4 would benefit from additional statistical controls. In the revised manuscript we will report per-run standard deviations (computed over the same number of independent runs used elsewhere in the paper) and include paired statistical tests (e.g., Wilcoxon signed-rank test) against the strongest baseline for each long-sequence dataset. These additions will directly address concerns about robustness.

Revision: yes
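
The controls promised for Table 4 are straightforward to compute. The sketch below reports mean and standard deviation per method and an exact two-sided Wilcoxon signed-rank p-value for paired runs. The function name `wilcoxon_exact` is hypothetical, and the accuracies are placeholder numbers invented for illustration; they are not results from the paper.

```python
from itertools import product
from statistics import mean, stdev


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| share average ranks.
    All 2**n sign patterns are enumerated, so this is only for small n.
    Returns (W+, p-value).
    """
    diffs = [round(a - b, 12) for a, b in zip(x, y)]  # rounding guards float noise
    diffs = [d for d in diffs if d != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    hits = sum(
        abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= observed - 1e-12
        for signs in product((0, 1), repeat=len(ranks))
    )
    return w_plus, hits / 2 ** len(ranks)


# Placeholder per-seed accuracies, NOT results from the paper.
dywpe = [0.81, 0.83, 0.80, 0.84, 0.82, 0.85]
baseline = [0.79, 0.80, 0.81, 0.80, 0.79, 0.80]

w, p = wilcoxon_exact(dywpe, baseline)
print(f"DyWPE    {mean(dywpe):.3f} +/- {stdev(dywpe):.3f}")
print(f"baseline {mean(baseline):.3f} +/- {stdev(baseline):.3f}")
print(f"W+ = {w}, exact two-sided p = {p:.4f}")
```

With only a handful of runs the smallest attainable p-value is bounded by the number of sign patterns, which is one reason the number of independent runs should be reported alongside the test. For larger samples, SciPy's `scipy.stats.wilcoxon` would be the usual choice.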

Circularity Check

0 steps flagged

No circularity: empirical proposal validated by experiments


The paper proposes DyWPE as a new signal-aware positional encoding using DWT applied to raw time series inputs, then reports empirical outperformance on ten datasets. No derivation chain, equations, or predictions are presented that reduce by construction to fitted parameters, self-definitions, or self-citations. The central claims rest on experimental results rather than tautological mappings, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that DWT decompositions provide suitable multi-scale features for positional information; no free parameters or invented entities are identifiable from the abstract alone.

axioms (1)

Domain assumption: the Discrete Wavelet Transform applied to the input time series produces embeddings that encode both position and signal characteristics effectively.
This premise is required for the method to generate useful positional encodings.



Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 6 internal anchors

[1] INTRODUCTION. The transformer architecture has revolutionized sequential data modeling across diverse domains, from natural language processing to time series analysis [1]. A fundamental component enabling transformers to process sequential data is positional encoding, which addresses the inherent permutation invariance of self-attention mechanisms by ...

[2] BACKGROUND AND RELATED WORK. The application of attention in time series analysis has evolved from augmenting recurrent models to forming the core of modern transformers. Early work in sequence-to-sequence learning demonstrated the power of attention in encoder-decoder frameworks [7]. For time series, this led to methods like dual-stage attention-based RN...

[3] DYNAMIC WAVELET POSITIONAL ENCODING. 3.1. Problem Formulation and Overview. Given a time series dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $n$ samples, where each sample $x_i \in \mathbb{R}^{L \times d_x}$ represents a $d_x$-dimensional time series of length $L$, and corresponding labels $Y = \{y_1, y_2, \ldots, y_n\}$ where $y_i \in \{1, 2, \ldots, c\}$, our objective is to learn a positional encoding that captures sign...

[4] EXPERIMENTAL EVALUATION. 4.1. Experimental Setup. We conduct comprehensive experiments across ten diverse time series datasets spanning multiple domains such as Human Activity Recognition, Device, and EEG classification, with eight datasets from the UEA archive [15] and two additional datasets [16, 17], as shown in Table 1. We evaluated DyWPE against e...

[5] CONCLUSION. We introduced Dynamic Wavelet Positional Encoding (DyWPE), the first signal-aware positional encoding framework for transformer-based time series models. By analyzing actual signal content through multi-scale wavelet decomposition and dynamically modulating learnable scale embeddings, DyWPE creates rich positional representations that ada...

[6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.

[7] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2114–2124.

[8] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.

[9] P. Shaw, J. Uszkoreit, and A. Vaswani, “Self-attention with relative position representations,” arXiv preprint arXiv:1803.02155, 2018.

[10] G. Ke, D. He, and T.-Y. Liu, “Rethinking positional encoding in language pre-training,” arXiv preprint arXiv:2006.15595, 2020.

[11] N. M. Foumani, C. W. Tan, G. I. Webb, and M. Salehi, “Improving position encoding of transformers for multivariate time series classification,” Data Mining and Knowledge Discovery, vol. 38, no. 1, pp. 22–48, 2024.

[12] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.

[13] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, “A dual-stage attention-based recurrent neural network for time series prediction,” arXiv preprint arXiv:1704.02971, 2017.

[14] M. Liu, S. Ren, S. Ma, J. Jiao, Y. Chen, Z. Wang, and W. Song, “Gated transformer networks for multivariate time series classification,” arXiv preprint arXiv:2103.14438, 2021.

[15] J.-B. Cordonnier, A. Mahendran, A. Dosovitskiy, D. Weissenborn, J. Uszkoreit, and T. Unterthiner, “Differentiable patch selection for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2351–2360.

[16] J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu, “Roformer: Enhanced transformer with rotary position embedding,” Neurocomputing, vol. 568, p. 127063, 2024.

[17] H. Irani and V. Metsis, “Positional encoding in transformer-based time series models: a survey,” arXiv preprint arXiv:2502.12370, 2025.

[18] A. Bell, M. Gray, and R. King, “Adaptive positional encoding mechanisms for dynamic time series,” Expert Systems with Applications, vol. 220, p. 119678, 2023.

[19] Y. Oka, T. Hasegawa, K. Nishida, and K. Saito, “Wavelet-based positional representation for long context,” arXiv preprint arXiv:2502.02004, 2025.

[20] A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The UEA multivariate time series classification archive, 2018,” arXiv preprint arXiv:1811.00075, 2018.

[21] D. Micucci, M. Mobilio, and P. Napoletano, “Unimib shar: A dataset for human activity recognition using acceleration data from smartphones,” Applied Sciences, vol. 7, no. 10, p. 1101, 2017.

[22] A. P. Singh and S. Chaudhari, “Room Occupancy Estimation,” UCI Machine Learning Repository, 2018, DOI: https://doi.org/10.24432/C5P605
