Recency Biased Causal Attention for Time-series Forecasting

2025 · cs.LG · arXiv 2502.06151

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Recency bias is a useful inductive prior for sequential modeling: it emphasizes nearby observations while still allowing longer-range dependencies. Standard Transformer attention lacks this property, relying on all-to-all interactions that overlook the causal and often local structure of temporal data. We propose a simple mechanism to introduce recency bias by reweighting attention scores with a smooth heavy-tailed decay. This adjustment strengthens local temporal dependencies without sacrificing the flexibility to capture broader and data-specific correlations. We show that recency-biased attention consistently improves sequential modeling, aligning the Transformer more closely with the read, ignore, and write operations of RNNs. Finally, we demonstrate that our approach achieves competitive and often superior performance on challenging time-series forecasting benchmarks.
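The reweighting idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not give the decay function or say where it is applied. The sketch assumes a power-law decay `(1 + lag) ** (-alpha)` multiplied into the unnormalized causal softmax weights, which is equivalent to adding the log of the decay to the logits. The function name and the `alpha` parameter are hypothetical.

```python
import numpy as np

def recency_biased_causal_attention(q, k, v, alpha=1.0):
    """Single-head causal attention with an assumed power-law recency decay.

    q, k, v: arrays of shape (T, d). alpha is a hypothetical decay exponent;
    alpha = 0 recovers ordinary causal attention.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)              # (T, T) scaled dot-product scores

    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :]          # lag[i, j] = i - j
    causal = lag >= 0                          # query i may attend to keys j <= i

    # Numerically stable exponentiation; masked (future) positions become 0.
    scores = np.where(causal, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)

    # Assumed heavy-tailed decay: polynomial in the lag, so distant positions
    # are down-weighted but never cut off entirely.
    decay = np.where(causal, (1.0 + np.maximum(lag, 0)) ** (-alpha), 0.0)
    weights = weights * decay
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ v, weights
```

When all scores are equal and `alpha=1.0`, the last row of a length-3 sequence gets weights proportional to 1/3, 1/2, and 1, so the most recent position dominates while older positions keep nonzero weight.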

representative citing papers

Neural equilibria for long-term prediction of nonlinear conservation laws

cs.LG · 2025-01-12 · unverdicted · novelty 6.0

NeurDE learns the equilibrium closure within a kinetic solver to outperform larger neural models on long-term predictions of nonlinear conservation laws including shocks.

citing papers explorer

Showing 1 of 1 citing paper.

Neural equilibria for long-term prediction of nonlinear conservation laws

cs.LG · 2025-01-12 · unverdicted · none · ref 33 · internal anchor

NeurDE learns the equilibrium closure within a kinetic solver to outperform larger neural models on long-term predictions of nonlinear conservation laws including shocks.
