DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
Pith reviewed 2026-05-18 15:31 UTC · model grok-4.3
The pith
DyWPE generates transformer positional embeddings directly from the time series signal by applying the Discrete Wavelet Transform.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing positional encodings are signal-agnostic because they derive information solely from indices. DyWPE replaces this with a dynamic construction that feeds the raw time series through the Discrete Wavelet Transform, yielding embeddings that encode both positional order and the signal's own multi-resolution characteristics.
What carries the argument
Dynamic Wavelet Positional Encoding (DyWPE), which applies the Discrete Wavelet Transform directly to the input time series to create signal-dependent positional embeddings.
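To make the construction concrete, here is a minimal sketch. It assumes a single-channel signal, an orthonormal Haar wavelet, and a length divisible by 2^J. The names `haar_dwt`, `spread_to_timesteps`, `signal_aware_pe`, and `scale_emb` are hypothetical. The rule of weighting one learnable vector per scale by the coefficient that covers each time step is also an assumption, loosely modeled on the paper's description of "dynamically modulating learnable scale embeddings". It is not the authors' implementation.

```python
import math
import random


def haar_step(x):
    """One level of the orthonormal Haar DWT on an even-length list."""
    s = 1 / math.sqrt(2)
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) * s for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) * s for k in range(half)]
    return approx, detail


def haar_dwt(x, levels):
    """Multi-level Haar DWT: returns (final approximation, [detail_1, ..., detail_J])."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def spread_to_timesteps(coeffs, level, length):
    """Give time index t the level-`level` coefficient whose support contains t."""
    width = 2 ** level
    return [coeffs[t // width] for t in range(length)]


def signal_aware_pe(x, levels, scale_emb):
    """Hypothetical DyWPE-style encoding for a 1-D signal x.

    scale_emb holds levels + 1 vectors (one per detail scale plus one for the
    coarsest approximation). Each time step's encoding is the sum of those
    vectors, each weighted by the coefficient that covers that time step.
    """
    length = len(x)
    approx, details = haar_dwt(x, levels)
    per_scale = [spread_to_timesteps(d, j + 1, length) for j, d in enumerate(details)]
    per_scale.append(spread_to_timesteps(approx, levels, length))
    dim = len(scale_emb[0])
    pe = []
    for t in range(length):
        vec = [0.0] * dim
        for emb, coeffs in zip(scale_emb, per_scale):
            for i in range(dim):
                vec[i] += coeffs[t] * emb[i]
        pe.append(vec)
    return pe


random.seed(0)
levels, dim = 2, 4
scale_emb = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(levels + 1)]
x = [math.sin(0.5 * t) for t in range(8)]
pe = signal_aware_pe(x, levels, scale_emb)
print(len(pe), len(pe[0]))  # 8 4
```

In an actual model, `scale_emb` would be trainable parameters, and the computation would run per channel and per batch. Fixed random vectors are used here only to show the shapes involved.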
If this is right
- Time series transformers gain accuracy on longer sequences without changing the underlying architecture.
- Complex non-stationary signals such as biomedical recordings receive larger performance lifts than simpler datasets.
- The same wavelet-derived embeddings can be used across multiple datasets without per-dataset hyperparameter search.
Where Pith is reading between the lines
- The method could extend to other sequence models that currently use index-based positions, such as state-space models.
- Wavelet scales chosen at inference time might allow adaptive resolution for different forecasting horizons.
- Similar signal-aware encodings might reduce the need for heavy data augmentation in small time series regimes.
Load-bearing premise
That the Discrete Wavelet Transform applied to the raw input time series will produce positional embeddings that preserve necessary order information while adding useful signal-dependent features without introducing artifacts.
What would settle it
An experiment on a long biomedical time series dataset where DyWPE yields lower accuracy or higher error than standard sinusoidal or learned positional encodings.
Original abstract
Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly problematic for time series analysis, where signals exhibit complex, non-stationary dynamics across multiple temporal scales. We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT). Comprehensive experiments on ten diverse time series datasets demonstrate that DyWPE consistently outperforms state-of-the-art positional encoding methods, with particularly significant improvements on longer sequences and complex biomedical signals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Dynamic Wavelet Positional Encoding (DyWPE), a signal-aware positional encoding for time series transformers. It generates embeddings by applying the Discrete Wavelet Transform (DWT) directly to the raw input time series, in contrast to index-based methods that ignore signal characteristics. Experiments across ten diverse datasets show consistent outperformance versus state-of-the-art positional encodings, with larger gains reported on longer sequences and complex biomedical signals.
Significance. If the central construction is shown to preserve temporal order while adding signal-dependent features, the approach could meaningfully extend transformer applicability to non-stationary time series. The multi-dataset empirical evaluation and focus on longer sequences constitute a concrete strength that would support broader adoption if the order-preservation mechanism is clarified.
major comments (2)
- [§3.2] DyWPE construction: the mapping from DWT approximation and detail coefficients to embeddings is described without an auxiliary absolute or relative position-index injection step. Standard DWT coefficients are indexed by scale and local support rather than global sequence position; if this mapping is used directly, the resulting vectors function primarily as dynamic feature augmentation. This directly affects the central claim that DyWPE supplies positional information necessary for attention on longer sequences.
- [Table 4] Long-sequence results: the reported accuracy gains on sequences longer than 512 steps are presented without per-run standard deviations or paired statistical tests against the strongest baseline. Because the paper emphasizes particular improvements on longer sequences, the absence of these controls leaves open whether the gains are robust or sensitive to post-hoc hyperparameter choices.
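The first major comment can be probed with a toy version of this kind of mapping. The sketch assumes an orthonormal Haar DWT and a hypothetical rule that each time step reads the coefficients whose support contains it. Under those assumptions, a constant input gives every position an identical vector. The two time steps that share a finest-scale support also always coincide, whatever the signal. Whether either effect applies to DyWPE itself depends on details of §3.2 that this report does not reproduce.

```python
import math


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT: returns (approximation, [detail_1, ..., detail_J])."""
    s = 1 / math.sqrt(2)
    approx, details = list(x), []
    for _ in range(levels):
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        details.append([(a - b) * s for a, b in pairs])
        approx = [(a + b) * s for a, b in pairs]
    return approx, details


def support(level, k):
    """Original time indices covered by coefficient k at the given level."""
    width = 2 ** level
    return range(k * width, (k + 1) * width)


def features_at(x, levels, t):
    """Coefficients whose support contains time t: details for levels 1..J, then the approximation."""
    approx, details = haar_dwt(x, levels)
    feats = [d[t // 2 ** (j + 1)] for j, d in enumerate(details)]
    feats.append(approx[t // 2 ** levels])
    return feats


print(list(support(2, 1)))  # [4, 5, 6, 7]

# Constant input: every detail coefficient is zero, so all positions get the same vector.
const = [3.0] * 8
rows = [features_at(const, 3, t) for t in range(8)]
print(all(r == rows[0] for r in rows))  # True

# Varying input: t = 0 and t = 1 still share every coefficient under this mapping.
varied = [3.0, -1.0, 2.0, 5.0, 0.5, 4.0, -2.0, 1.0]
print(features_at(varied, 3, 0) == features_at(varied, 3, 1))  # True
```

Under this toy mapping, order information would have to come from somewhere else, such as an added index-based encoding. That is the gap the referee asks the authors to close.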
minor comments (2)
- [Figure 2] The wavelet coefficient visualization would benefit from explicit annotation of the original time indices to illustrate how order is retained after the DWT step.
- [§4.1] Dataset description: the biomedical datasets are listed without reference to their sampling rates or non-stationarity measures, which would help readers assess the claimed suitability of DyWPE for such signals.
Simulated Author's Rebuttal
We thank the referee for their constructive comments and the opportunity to clarify aspects of our work on DyWPE. We address each major comment below and describe the planned revisions.
Point-by-point responses
- Referee: [§3.2] DyWPE construction: the mapping from DWT approximation and detail coefficients to embeddings is described without an auxiliary absolute or relative position-index injection step. Standard DWT coefficients are indexed by scale and local support rather than global sequence position; if this mapping is used directly, the resulting vectors function primarily as dynamic feature augmentation. This directly affects the central claim that DyWPE supplies positional information necessary for attention on longer sequences.
Authors: We appreciate the referee's careful reading of §3.2. The DWT is applied directly to the full input sequence, and the resulting approximation and detail coefficients retain explicit temporal localization due to the time-frequency properties of the wavelet basis. Each coefficient corresponds to a specific time support within the original sequence, so the mapping to embeddings encodes both scale-specific signal content and the underlying temporal positioning without requiring a separate index-based injection. This design ensures that the embeddings remain sensitive to sequence order while incorporating signal-dependent information. To strengthen the presentation, we will revise §3.2 to include a formal argument based on the invertibility of the DWT and add a brief illustrative diagram showing coefficient alignment with original time indices. revision: yes
- Referee: [Table 4] Long-sequence results: the reported accuracy gains on sequences longer than 512 steps are presented without per-run standard deviations or paired statistical tests against the strongest baseline. Because the paper emphasizes particular improvements on longer sequences, the absence of these controls leaves open whether the gains are robust or sensitive to post-hoc hyperparameter choices.
Authors: We agree that the long-sequence results in Table 4 would benefit from additional statistical controls. In the revised manuscript we will report per-run standard deviations (computed over the same number of independent runs used elsewhere in the paper) and include paired statistical tests (e.g., Wilcoxon signed-rank test) against the strongest baseline for each long-sequence dataset. These additions will directly address concerns about robustness. revision: yes
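The promised paired test is straightforward to run on per-seed results. Below is a sketch of a two-sided exact (permutation) Wilcoxon signed-rank test in plain Python. It drops zero differences and gives ties their average rank. The accuracy numbers are invented for illustration and are not results from the paper. In practice, `scipy.stats.wilcoxon` is the usual choice.

```python
from itertools import product


def signed_ranks(diffs):
    """Drop zero differences, then rank |d| from 1..n, giving tied values their average rank."""
    d = [v for v in diffs if v != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return d, ranks


def wilcoxon_exact(a, b):
    """Two-sided exact (permutation) Wilcoxon signed-rank test for paired samples.

    Enumerates all 2**n sign assignments of the ranks, so it is only meant for
    small n, such as a handful of independent runs. Returns (W+, p-value).
    """
    d, ranks = signed_ranks([x - y for x, y in zip(a, b)])
    total = sum(ranks)
    w_plus = sum(r for v, r in zip(d, ranks) if v > 0)
    observed = abs(w_plus - total / 2)
    extreme = 0
    for signs in product((False, True), repeat=len(d)):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(d)


# Hypothetical per-run accuracies (%) on one dataset, paired by seed; not results from the paper.
dywpe = [81.0, 83.5, 80.0, 84.0, 82.0]
baseline = [79.0, 80.0, 78.5, 81.0, 80.5]
w_plus, p = wilcoxon_exact(dywpe, baseline)
print(w_plus, p)  # 15.0 0.0625
```

With five paired runs, even a result where one method wins every run gives p = 0.0625 under this test. Per-dataset significance at the 0.05 level would therefore need more runs or a test pooled across datasets.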
Circularity Check
No circularity: empirical proposal validated by experiments
full rationale
The paper proposes DyWPE as a new signal-aware positional encoding using DWT applied to raw time series inputs, then reports empirical outperformance on ten datasets. No derivation chain, equations, or predictions are presented that reduce by construction to fitted parameters, self-definitions, or self-citations. The central claims rest on experimental results rather than tautological mappings, making the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the Discrete Wavelet Transform applied to the input time series produces embeddings that effectively encode both position and signal characteristics.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION. The transformer architecture has revolutionized sequential data modeling across diverse domains, from natural language processing to time series analysis [1]. A fundamental component enabling transformers to process sequential data is positional encoding, which addresses the inherent permutation invariance of self-attention mechanisms by ...
-
[2]
BACKGROUND AND RELATED WORK. The application of attention in time series analysis has evolved from augmenting recurrent models to forming the core of modern transformers. Early work in sequence-to-sequence learning demonstrated the power of attention in encoder-decoder frameworks [7]. For time series, this led to methods like dual-stage attention-based RN...
-
[3]
DYNAMIC WAVELET POSITIONAL ENCODING. 3.1. Problem Formulation and Overview. Given a time series dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $n$ samples, where each sample $x_i \in \mathbb{R}^{L \times d_x}$ represents a $d_x$-dimensional time series of length $L$, and corresponding labels $Y = \{y_1, y_2, \ldots, y_n\}$ where $y_i \in \{1, 2, \ldots, c\}$, our objective is to learn a positional encoding that captures sign...
-
[4]
EXPERIMENTAL EVALUATION. 4.1. Experimental Setup. We conduct comprehensive experiments across ten diverse time series datasets spanning multiple domains such as Human Activity Recognition, Device, and EEG classification, with eight datasets from the UEA archive [15] and two additional datasets [16, 17], as shown in Table 1. We evaluated DyWPE against e...
-
[5]
CONCLUSION. We introduced Dynamic Wavelet Positional Encoding (DyWPE), the first signal-aware positional encoding framework for transformer-based time series models. By analyzing actual signal content through multi-scale wavelet decomposition and dynamically modulating learnable scale embeddings, DyWPE creates rich positional representations that ada...
-
[6]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017
-
[7]
G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2114–2124
-
[8]
B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021
-
[9]
P. Shaw, J. Uszkoreit, and A. Vaswani, “Self-attention with relative position representations,” arXiv preprint arXiv:1803.02155, 2018
-
[10]
G. Ke, D. He, and T.-Y. Liu, “Rethinking positional encoding in language pre-training,” arXiv preprint arXiv:2006.15595, 2020
-
[11]
N. M. Foumani, C. W. Tan, G. I. Webb, and M. Salehi, “Improving position encoding of transformers for multivariate time series classification,” Data Mining and Knowledge Discovery, vol. 38, no. 1, pp. 22–48, 2024
-
[12]
D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014
-
[13]
Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, “A dual-stage attention-based recurrent neural network for time series prediction,” arXiv preprint arXiv:1704.02971, 2017
-
[14]
M. Liu, S. Ren, S. Ma, J. Jiao, Y. Chen, Z. Wang, and W. Song, “Gated transformer networks for multivariate time series classification,” arXiv preprint arXiv:2103.14438, 2021
-
[15]
J.-B. Cordonnier, A. Mahendran, A. Dosovitskiy, D. Weissenborn, J. Uszkoreit, and T. Unterthiner, “Differentiable patch selection for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2351–2360
-
[16]
J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu, “Roformer: Enhanced transformer with rotary position embedding,” Neurocomputing, vol. 568, p. 127063, 2024
-
[17]
H. Irani and V. Metsis, “Positional encoding in transformer-based time series models: a survey,” arXiv preprint arXiv:2502.12370, 2025
-
[18]
A. Bell, M. Gray, and R. King, “Adaptive positional encoding mechanisms for dynamic time series,” Expert Systems with Applications, vol. 220, p. 119678, 2023
-
[19]
Y. Oka, T. Hasegawa, K. Nishida, and K. Saito, “Wavelet-based positional representation for long context,” arXiv preprint arXiv:2502.02004, 2025
-
[20]
A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The UEA multivariate time series classification archive, 2018,” arXiv preprint arXiv:1811.00075, 2018
-
[21]
D. Micucci, M. Mobilio, and P. Napoletano, “UniMiB SHAR: A dataset for human activity recognition using acceleration data from smartphones,” Applied Sciences, vol. 7, no. 10, p. 1101, 2017
-
[22]
A. P. Singh and S. Chaudhari, “Room Occupancy Estimation,” UCI Machine Learning Repository, 2018, DOI: https://doi.org/10.24432/C5P605