A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

Dongjin Song; Garrison Cottrell; Guofei Jiang; Haifeng Chen; Wei Cheng; Yao Qin

arxiv: 1704.02971 · v4 · pith:W4GEIWOJnew · submitted 2017-04-07 · 💻 cs.LG · stat.ML

Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, Garrison Cottrell

keywords: series, time, attention, driving, dual-stage, relevant, attention-based, been

The nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series from its previous values together with the current and past values of multiple driving (exogenous) series, has been studied for decades. Although various NARX models have been developed, few of them can both capture long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism that adaptively extracts relevant driving series (i.e., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively but can also be easily interpreted. Thorough empirical studies on the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
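
The two attention stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes an additive (tanh) scoring function, uses random weights, and omits the LSTM encoder and decoder entirely. The function names, parameter names, and shapes (`input_attention`, `temporal_attention`, `W_h`, `W_x`, `v`, `U_d`, `U_h`, `u`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def input_attention(X, h_prev, W_h, W_x, v):
    """Stage 1 (sketch): weight each driving series using the previous
    encoder hidden state.

    X: (T, n) window of n driving series, h_prev: (m,)
    W_h: (T, m), W_x: (T, T), v: (T,)  -- assumed shapes
    Returns attention weights of shape (n,) that sum to 1.
    """
    # Assumed additive score for series k: v . tanh(W_h h_prev + W_x X[:, k])
    hidden = np.tanh((W_h @ h_prev)[:, None] + W_x @ X)  # (T, n)
    return softmax(hidden.T @ v)                         # (n,)

def temporal_attention(H, d_prev, U_d, U_h, u):
    """Stage 2 (sketch): weight encoder hidden states across time steps.

    H: (T, m) encoder hidden states, d_prev: (p,) previous decoder state
    U_d: (m, p), U_h: (m, m), u: (m,)  -- assumed shapes
    Returns (weights of shape (T,), context vector of shape (m,)).
    """
    scores = np.tanh(U_d @ d_prev + H @ U_h.T) @ u  # (T,)
    beta = softmax(scores)
    return beta, beta @ H  # context is the weighted sum of hidden states

# Toy usage with random data; all sizes are arbitrary.
rng = np.random.default_rng(0)
T, n, m, p = 10, 4, 8, 6
X = rng.normal(size=(T, n))
alpha = input_attention(X, rng.normal(size=m), rng.normal(size=(T, m)),
                        rng.normal(size=(T, T)), rng.normal(size=T))
x_weighted = alpha * X[0]  # reweighted driving series at one time step
H = rng.normal(size=(T, m))
beta, context = temporal_attention(H, rng.normal(size=p),
                                   rng.normal(size=(m, p)),
                                   rng.normal(size=(m, m)),
                                   rng.normal(size=m))
```

Because both weight vectors are normalized to sum to 1, they can be inspected directly, which is one way to read the abstract's claim that the model is easy to interpret: `alpha` indicates which driving series are emphasized and `beta` indicates which time steps are.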

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting
cs.AI · 2026-04 · unverdicted · novelty 7.0
AdaMamba adds input-dependent frequency bases and a unified time-frequency forgetting gate to Mamba, yielding higher forecasting accuracy than prior methods on standard long-term time series benchmarks.

Deep Time Series Models: A Comprehensive Survey and Benchmark
cs.LG · 2024-07 · unverdicted · novelty 7.0
This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.

DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
cs.LG · 2025-09 · unverdicted · novelty 6.0
DyWPE generates positional embeddings for time series transformers from the input signal via Discrete Wavelet Transform and outperforms standard positional encodings on ten datasets, especially longer sequences and bi...

CASE-NET: Deep Spatio-Temporal Representation Learning via Causal Attention and Channel Recalibration for Multivariate Time Series Classification
cs.LG · 2026-05 · unverdicted · novelty 4.0
CASE-NET combines a causal temporal encoder with adaptive channel recalibration and reports new state-of-the-art accuracy on four of six evaluated multivariate time series tasks.

Hermes: A Multi-Scale Spatial-Temporal Hypergraph Network for Stock Time Series Forecasting
cs.LG · 2025-09 · unverdicted · novelty 4.0
Hermes is a multi-scale spatial-temporal hypergraph network that improves stock forecasting accuracy by capturing inter-industry lead-lag dependencies and fusing information across scales.

Machine Learning and Deep Learning Models for Short Term Electricity Price Forecasting in Australia's National Electricity Market
cs.LG · 2026-04 · conditional · novelty 3.0
GBRT reaches R² 0.88 on price forecasting and 0.96 on demand but every model exceeds 90% MAPE for prices, underscoring the difficulty of the task.

Deep Learning for Electricity Price Forecasting: A Review of Day-Ahead, Intraday, and Balancing Electricity Markets
q-fin.CP · 2026-02 · unverdicted · novelty 3.0
A structured review organizes deep learning models for electricity price forecasting via a backbone-head-loss taxonomy and identifies gaps in intraday and balancing market applications.

Positional Encoding in Transformer-Based Time Series Models: A Survey
cs.LG · 2025-02 · unverdicted · novelty 3.0
A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.