arxiv: 1907.09207 · v1 · pith:7YWL43PLnew · submitted 2019-07-22 · 💻 cs.LG · stat.ML

Deep Learning for Time Series Forecasting: The Electric Load Case

Pith reviewed 2026-05-24 17:56 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords deep learning · time series forecasting · electric load forecasting · neural networks · recurrent neural networks · sequence to sequence models · temporal convolutional networks · smart grids

The pith

Deep learning architectures for one-day-ahead electric load forecasting are reviewed and compared on two real datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the lack of comprehensive comparisons among deep learning models for electric load forecasting, a nonlinear task critical for smart grid operations. It reviews and tests feedforward neural networks, recurrent neural networks, sequence-to-sequence models, and temporal convolutional networks for short-term predictions. Experiments use two real-world datasets to contrast these families and their variants. A sympathetic reader would care because the evaluation aims to guide model selection where accurate forecasts improve infrastructure efficiency. The work fills a literature gap by focusing on architectures novel to load forecasting but established in signal processing.

Core claim

By reviewing recent trends and running experiments on two real-world datasets, the paper shows that contrasting feedforward and recurrent neural networks, sequence-to-sequence models, and temporal convolutional neural networks provides a basis for selecting deep learning approaches suited to one-day-ahead electric load forecasting.

What carries the argument

Experimental contrast of feedforward neural networks, recurrent neural networks, sequence-to-sequence models, and temporal convolutional neural networks on short-term electric load data.

If this is right

  • Accurate short-term load forecasts become more achievable by choosing among the contrasted neural network families.
  • Smart grid management gains practical guidance from the side-by-side evaluation on real data.
  • Sequence-to-sequence and temporal convolutional approaches, already used in signal processing, receive direct testing in the load forecasting setting.
  • Further work can build on the identified performance patterns across the two datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same comparison method could be applied to other time series domains such as traffic or weather prediction.
  • If one architecture family consistently leads on the tested data, practitioners might prioritize it for similar forecasting tasks without exhaustive re-tuning.
  • Extending the evaluation to longer horizons or additional datasets would test whether the observed differences persist.

Load-bearing premise

The two selected real-world datasets and the chosen architectural variants are representative enough to support conclusions about which deep learning families work best for electric load forecasting in general.

What would settle it

A new study using different real-world load datasets or additional architectural variants that produces reversed performance rankings among the model families would undermine the comparison's broader applicability.

Figures

Figures reproduced from arXiv: 1907.09207 by Alberto Gasparin, Cesare Alippi, Slobodan Lukovic.

Figure 1: A sliding windowed approach is used to frame the forecasting problem into a supervised machine learning …
Figure 2: (Left) A simple RNN with a single input. The black box represents the delay operator which leads to Equation …
Figure 3: A simple ERNN block with one cell implementing Equation 78 once rewritten as matrix concatenation: …
Figure 4: Long-Short Term Memory block with one cell.
Figure 5: Gated Recurrent Unit memory block with one cell.
Figure 6: seq2seq (Encoder-Decoder) architecture with a general Recurrent Neural network both for the encoder and …
Figure 7: (Left) decoder with ground-truth inputs (Teacher Forcing). (Right) Decoder with self-generated inputs.
Figure 8: A 3 layers CNN with causal convolution (no dilation), the receptive field r is 4
Figure 10: TCN Architecture. x_t is the vector of historical loads along with the exogenous features for the time window indexes from 0 to n_T, z_t is the vector of exogenous variables related to the last n_O indexes of the time window (when available), ŷ_t is the output vector. Residual Blocks are composed by a 1D Dilated Causal Convolution, a ReLU activation and Dropout. The square box represents a concatenation betw…
Figure 11: Weekly statistics for the electric load in the whole IHEPC (left) and GEFCom2014 datasets (right). The bold …
Figure 12: (Right) Predictive performance of all the models on a single day for IHEPC dataset. The left portion of the …
Figure 13: (Right) Predictive performance of all the models on a single day for GEFCom2014 dataset. The left portion …
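
The Figure 8 caption reports a receptive field of r = 4 for three causal convolution layers without dilation. The arithmetic behind that number can be sketched as follows, assuming stride-1 layers and a kernel size of 2. The kernel size is not stated in the caption; it is an assumption chosen here because it reproduces r = 4. Each layer with kernel size k and dilation d extends the receptive field by (k - 1) * d samples. The function name `receptive_field` and the dilation schedule 1, 2, 4 are illustrative and not taken from the paper.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack of stride-1 causal convolutions.

    kernel_size : taps per layer (assumed identical across layers)
    dilations   : one dilation factor per layer; 1 means no dilation
    """
    if kernel_size < 1 or any(d < 1 for d in dilations):
        raise ValueError("kernel_size and all dilations must be >= 1")
    # Every layer lets the output see (kernel_size - 1) * dilation
    # additional past samples on top of the current one.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Three layers, no dilation, assumed kernel size 2 (matches the caption's r = 4).
r_plain = receptive_field(2, [1, 1, 1])

# Same depth with an illustrative doubling dilation schedule (assumption).
r_dilated = receptive_field(2, [1, 2, 4])

print(r_plain, r_dilated)
```

Under these assumptions the undilated stack grows its receptive field linearly with depth, while the doubling schedule grows it exponentially, which is the usual motivation for dilated causal convolutions in a TCN.
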
Original abstract

Management and efficient operations in critical infrastructure such as Smart Grids take huge advantage of accurate power load forecasting which, due to its nonlinear nature, remains a challenging task. Recently, deep learning has emerged in the machine learning field achieving impressive performance in a vast range of tasks, from image classification to machine translation. Applications of deep learning models to the electric load forecasting problem are gaining interest among researchers as well as the industry, but a comprehensive and sound comparison among different architectures is not yet available in the literature. This work aims at filling the gap by reviewing and experimentally evaluating on two real-world datasets the most recent trends in electric load forecasting, by contrasting deep learning architectures on short term forecast (one day ahead prediction). Specifically, we focus on feedforward and recurrent neural networks, sequence to sequence models and temporal convolutional neural networks along with architectural variants, which are known in the signal processing community but are novel to the load forecasting one.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper reviews recent deep learning approaches for electric load forecasting and performs an experimental comparison of feedforward, recurrent, sequence-to-sequence, and temporal convolutional architectures (with variants) for one-day-ahead prediction on two real-world datasets, aiming to identify preferable families for this task.

Significance. If the experimental ranking is robust, the work supplies a needed benchmark contrasting DL families on load data and could inform smart-grid applications; the review component also consolidates recent trends. The limited dataset count, however, restricts the strength of any architectural preference claims beyond the specific traces examined.

major comments (1)
  1. [Abstract and experimental evaluation section] The central claim of contrasting architectures to identify preferable DL families for electric load forecasting rests on results from exactly two real-world datasets. No discussion is provided of how these traces differ in seasonality, resolution, geographic origin, or exogenous drivers; if they share similar statistical regimes, the observed ranking may be an artifact rather than a general preference.
minor comments (1)
  1. [Abstract] The abstract supplies no information on the concrete metrics, statistical tests, or preprocessing steps used in the evaluation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The concern about dataset diversity and characterization is valid and will be addressed through revisions that add explicit discussion of the traces while preserving the paper's core experimental contribution.

Point-by-point responses
  1. Referee: [Abstract and experimental evaluation section] The central claim of contrasting architectures to identify preferable DL families for electric load forecasting rests on results from exactly two real-world datasets. No discussion is provided of how these traces differ in seasonality, resolution, geographic origin, or exogenous drivers; if they share similar statistical regimes, the observed ranking may be an artifact rather than a general preference.

    Authors: We agree that the manuscript would benefit from explicit characterization of the two datasets. In the revised version we will expand the experimental evaluation section with a new subsection describing each trace's seasonality, sampling resolution, geographic origin, and exogenous drivers (where available). We will also add a limitations paragraph in the conclusions that qualifies the architectural preferences as observed on these specific traces and notes that broader validation across additional regimes remains future work. These changes directly respond to the risk that the ranking could be an artifact of similar statistical properties.

    revision: yes

Circularity Check

0 steps flagged

No circularity: empirical model comparison with no derivation chain

full rationale

The paper is an empirical review and experimental evaluation of deep learning architectures for short-term electric load forecasting on two real-world datasets. It contrasts feedforward, recurrent, seq2seq and TCN variants but contains no claimed first-principles derivation, no fitted parameters renamed as predictions, and no load-bearing self-citation chains that reduce the central claim to its own inputs. The reader's assessment of score 1.0 is consistent with the absence of any self-definitional, fitted-input, or uniqueness-imported circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the standard machine-learning assumption that sufficiently expressive neural networks can approximate the nonlinear mapping from historical load and exogenous variables to future load; no free parameters or invented entities are introduced because the work is a comparative evaluation.

axioms (1)
  • domain assumption: Deep learning architectures are suitable for modeling the nonlinear dynamics of electric load time series.
    Invoked in the opening motivation for applying DL models to the forecasting task.
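
The mapping from historical load to future load mentioned in this ledger is usually learned from input/target pairs cut out of the series with a sliding window, which is also what the Figure 1 caption describes. A minimal sketch under stated assumptions: the function `make_windows` and its parameters `n_in`, `n_out`, and `step` are hypothetical names introduced for illustration, the toy series is synthetic, and exogenous variables are left out for brevity. It is not the authors' preprocessing code.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_in: int, n_out: int, step: int = 1
) -> Tuple[List[List[float]], List[List[float]]]:
    """Cut a 1-D series into (input window, target window) pairs.

    n_in  : number of past samples given to the model
    n_out : number of future samples the model must predict
    step  : stride between the starts of consecutive windows
    """
    if n_in <= 0 or n_out <= 0 or step <= 0:
        raise ValueError("n_in, n_out and step must be positive")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    # Last start index that still leaves room for a full input and target.
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, step):
        split = start + n_in
        inputs.append(list(series[start:split]))
        targets.append(list(series[split:split + n_out]))
    return inputs, targets


# Synthetic toy series of 10 samples: 4 past values predict the next 2.
load = [float(v) for v in range(10)]
X, y = make_windows(load, n_in=4, n_out=2)
print(len(X), X[0], y[0])
```

For a one-day-ahead setting on hourly data, `n_out` would be 24 under this framing; the appropriate `n_in` is a modelling choice that this sketch does not prescribe.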

pith-pipeline@v0.9.0 · 5689 in / 1187 out tokens · 28284 ms · 2026-05-24T17:56:39.318497+00:00 · methodology
