Deep Learning for Time Series Forecasting: The Electric Load Case

Alberto Gasparin; Cesare Alippi; Slobodan Lukovic

REVIEW · 1 major objection · 1 minor · 95 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.

T0 review · grok-4.3

Deep learning architectures for one-day-ahead electric load forecasting are reviewed and compared on two real datasets.

2026-05-24 17:56 UTC pith:7YWL43PL

load-bearing objection: This paper runs a side-by-side comparison of existing deep learning architectures for one-day-ahead load forecasting on two real datasets.

arxiv 1907.09207 v1 pith:7YWL43PL submitted 2019-07-22 cs.LG stat.ML

classification cs.LG stat.ML

keywords deep learning, time series forecasting, electric load forecasting, neural networks, recurrent neural networks, sequence to sequence models, temporal convolutional networks, smart grids

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the lack of comprehensive comparisons among deep learning models for electric load forecasting, a nonlinear task critical for smart grid operations. It reviews and tests feedforward neural networks, recurrent neural networks, sequence-to-sequence models, and temporal convolutional networks for short-term predictions. Experiments use two real-world datasets to contrast these families and their variants. A sympathetic reader would care because the evaluation aims to guide model selection where accurate forecasts improve infrastructure efficiency. The work fills a literature gap by focusing on architectures novel to load forecasting but established in signal processing.

Core claim

By reviewing recent trends and running experiments on two real-world datasets, the paper shows that contrasting feedforward and recurrent neural networks, sequence-to-sequence models, and temporal convolutional neural networks provides a basis for selecting deep learning approaches suited to one-day-ahead electric load forecasting.

What carries the argument

Experimental contrast of feedforward neural networks, recurrent neural networks, sequence-to-sequence models, and temporal convolutional neural networks on short-term electric load data.

Load-bearing premise

The two selected real-world datasets and the chosen architectural variants are representative enough to support conclusions about which deep learning families work best for electric load forecasting in general.

What would settle it

A new study using different real-world load datasets or additional architectural variants that produces reversed performance rankings among the model families would undermine the comparison's broader applicability.
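One way such a reversal could be checked is with a rank correlation between the model orderings obtained in two studies. The sketch below is a minimal illustration, not the paper's procedure: the function names and the error values are hypothetical placeholders (they are not results from the paper), and it assumes lower error is better and that there are no ties.

```python
def ranks(scores):
    """Map each model name to its rank by ascending error (1 = lowest error).

    Assumes no ties among the error values.
    """
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman(scores_a, scores_b):
    """Spearman rank correlation between two studies scoring the same models.

    Uses the no-ties formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((ra[m] - rb[m]) ** 2 for m in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical placeholder errors, NOT values reported in the paper.
original = {"FNN": 0.40, "RNN": 0.35, "seq2seq": 0.30, "TCN": 0.25}
replication = {"FNN": 0.20, "RNN": 0.28, "seq2seq": 0.33, "TCN": 0.41}

rho = spearman(original, replication)
print(f"rank correlation between studies: {rho:+.2f}")
```

A correlation near +1 would indicate that the ordering of families carries over to the new data, while a value near -1 would correspond to the reversed ranking described above.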

If this is right

Accurate short-term load forecasts become more achievable by choosing among the contrasted neural network families.
Smart grid management gains practical guidance from the side-by-side evaluation on real data.
Sequence-to-sequence and temporal convolutional approaches, already used in signal processing, receive direct testing in the load forecasting setting.
Further work can build on the identified performance patterns across the two datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same comparison method could be applied to other time series domains such as traffic or weather prediction.
If one architecture family consistently leads on the tested data, practitioners might prioritize it for similar forecasting tasks without exhaustive re-tuning.
Extending the evaluation to longer horizons or additional datasets would test whether the observed differences persist.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

This paper runs a side-by-side comparison of existing deep learning architectures for one-day-ahead load forecasting on two real datasets.

The paper's core move is to review and test feedforward nets, RNNs, seq2seq models, and temporal CNNs on short-term electric load prediction using two real-world traces. It does not introduce new methods or derivations; the architectures come from prior work. The stated goal is to fill a gap in direct comparisons for this task, which the abstract says was missing from the literature. That is the main thing a reader gets: an applied evaluation rather than a methodological advance.

If the results section shows consistent differences with proper controls, it supplies a practical reference point for people choosing among these families in grid operations. The experiments stay focused on one-day-ahead forecasts, which matches a common operational need. The authors also note that some variants are familiar in signal processing but less tested in load forecasting, so the side-by-side application is the incremental step.

The clearest limitation is the data scope. Two datasets are a narrow base for claims about which architecture family is generally preferable. Load series differ in seasonality, resolution, and exogenous factors; if the chosen traces share similar properties, the ranking could be specific to that regime. The abstract gives no detail on metrics, preprocessing, or statistical tests, so any conclusions rest on how the full results handle those elements.

This paper is for practitioners or applied researchers in smart-grid forecasting who want an empirical overview of DL options on real data. Readers looking for new theory or broad methodological claims will not find them here. It is worth sending to peer review because the experimental comparison addresses a stated practical gap and the setup is straightforward enough that referees can assess the controls and generalizability directly.

Referee Report

1 major / 1 minor

Summary. The paper reviews recent deep learning approaches for electric load forecasting and performs an experimental comparison of feedforward, recurrent, sequence-to-sequence, and temporal convolutional architectures (with variants) for one-day-ahead prediction on two real-world datasets, aiming to identify preferable families for this task.

Significance. If the experimental ranking is robust, the work supplies a needed benchmark contrasting DL families on load data and could inform smart-grid applications; the review component also consolidates recent trends. The limited dataset count, however, restricts the strength of any architectural preference claims beyond the specific traces examined.

major comments (1)

[Abstract and experimental evaluation section] The central claim of contrasting architectures to identify preferable DL families for electric load forecasting rests on results from exactly two real-world datasets. No discussion is provided of how these traces differ in seasonality, resolution, geographic origin, or exogenous drivers; if they share similar statistical regimes, the observed ranking may be an artifact rather than a general preference.

minor comments (1)

[Abstract] The abstract supplies no information on the concrete metrics, statistical tests, or preprocessing steps used in the evaluation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The concern about dataset diversity and characterization is valid and will be addressed through revisions that add explicit discussion of the traces while preserving the paper's core experimental contribution.

Referee: [Abstract and experimental evaluation section] The central claim of contrasting architectures to identify preferable DL families for electric load forecasting rests on results from exactly two real-world datasets. No discussion is provided of how these traces differ in seasonality, resolution, geographic origin, or exogenous drivers; if they share similar statistical regimes, the observed ranking may be an artifact rather than a general preference.

Authors: We agree that the manuscript would benefit from explicit characterization of the two datasets. In the revised version we will expand the experimental evaluation section with a new subsection describing each trace's seasonality, sampling resolution, geographic origin, and exogenous drivers (where available). We will also add a limitations paragraph in the conclusions that qualifies the architectural preferences as observed on these specific traces and notes that broader validation across additional regimes remains future work. These changes directly respond to the risk that the ranking could be an artifact of similar statistical properties.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical model comparison with no derivation chain

The paper is an empirical review and experimental evaluation of deep learning architectures for short-term electric load forecasting on two real-world datasets. It contrasts feedforward, recurrent, seq2seq and TCN variants but contains no claimed first-principles derivation, no fitted parameters renamed as predictions, and no load-bearing self-citation chains that reduce the central claim to its own inputs. The reader's assessment of score 1.0 is consistent with the absence of any self-definitional, fitted-input, or uniqueness-imported circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the standard machine-learning assumption that sufficiently expressive neural networks can approximate the nonlinear mapping from historical load and exogenous variables to future load; no free parameters or invented entities are introduced because the work is a comparative evaluation.

axioms (1)

domain assumption Deep learning architectures are suitable for modeling the nonlinear dynamics of electric load time series.
Invoked in the opening motivation for applying DL models to the forecasting task.

pith-pipeline@v0.9.0 · 5689 in / 1187 out tokens · 28284 ms · 2026-05-24T17:56:39.318497+00:00 · methodology

Original abstract

Management and efficient operations in critical infrastructure such as Smart Grids take huge advantage of accurate power load forecasting which, due to its nonlinear nature, remains a challenging task. Recently, deep learning has emerged in the machine learning field achieving impressive performance in a vast range of tasks, from image classification to machine translation. Applications of deep learning models to the electric load forecasting problem are gaining interest among researchers as well as the industry, but a comprehensive and sound comparison among different architectures is not yet available in the literature. This work aims at filling the gap by reviewing and experimentally evaluating on two real-world datasets the most recent trends in electric load forecasting, by contrasting deep learning architectures on short term forecast (one day ahead prediction). Specifically, we focus on feedforward and recurrent neural networks, sequence to sequence models and temporal convolutional neural networks along with architectural variants, which are known in the signal processing community but are novel to the load forecasting one.
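The Figure 1 caption reproduced on this page says the forecasting problem is framed as supervised learning through a sliding window. A minimal sketch of that framing for a one-day-ahead task is given below. The function name `make_windows`, the hourly resolution, the seven-day input window, and the placeholder series are all assumptions made for illustration; the paper's actual window lengths and preprocessing are not stated in the text reviewed here.

```python
import numpy as np


def make_windows(series, n_in, n_out):
    """Slice a 1-D series into (input window, output window) training pairs.

    Each input holds n_in consecutive values; its target holds the n_out
    values that immediately follow. Windows advance one step at a time.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_in - n_out + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([series[i:i + n_in] for i in range(n_samples)])
    Y = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, Y


# Placeholder data: ten days of hourly values (0, 1, 2, ...), not real load.
load = np.arange(24 * 10)

# Assumed setup: one week of history in, the next 24 hours out.
X, Y = make_windows(load, n_in=24 * 7, n_out=24)
print(X.shape, Y.shape)
```

Every model family contrasted in the paper could in principle be trained on pairs of this shape, which is what makes a like-for-like comparison possible.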

Figures

Figures reproduced from arXiv: 1907.09207 by Alberto Gasparin, Cesare Alippi, Slobodan Lukovic.

**Figure 1.** A sliding windowed approach is used to frame the forecasting problem into a supervised machine learning …

**Figure 2.** (Left) A simple RNN with a single input. The black box represents the delay operator which leads to Equation …

**Figure 3.** A simple ERNN block with one cell implementing Equation 78 once rewritten as matrix concatenation: …

**Figure 4.** Long-Short Term Memory block with one cell.

**Figure 5.** Gated Recurrent Unit memory block with one cell.

**Figure 6.** seq2seq (Encoder-Decoder) architecture with a general Recurrent Neural network both for the encoder and …

**Figure 7.** (Left) Decoder with ground-truth inputs (Teacher Forcing). (Right) Decoder with self-generated inputs.

**Figure 8.** A 3 layers CNN with causal convolution (no dilation), the receptive field r is 4

**Figure 10.** TCN Architecture. x_t is the vector of historical loads along with the exogenous features for the time window indexes from 0 to n_T, z_t is the vector of exogenous variables related to the last n_O indexes of the time window (when available), ŷ_t is the output vector. Residual Blocks are composed by a 1D Dilated Causal Convolution, a ReLU activation and Dropout. The square box represents a concatenation betw…

**Figure 11.** Weekly statistics for the electric load in the whole IHEPC (left) and GEFCom2014 datasets (right). The bold …

**Figure 12.** (Right) Predictive performance of all the models on a single day for IHEPC dataset. The left portion of the …

**Figure 13.** (Right) Predictive performance of all the models on a single day for GEFCom2014 dataset. The left portion …
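The Figure 8 caption gives a receptive field of 4 for three causal convolution layers without dilation. That number can be reproduced with the usual receptive-field formula for stacked stride-1 causal convolutions, r = 1 + Σ (k − 1)·d over the layers. The sketch below assumes a kernel size of k = 2, which is consistent with the caption but not stated in it, and one convolution per layer; the function name `receptive_field` and the dilated example are illustrative only.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 causal 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d
    time steps; a single input step counts as a field of 1.
    """
    r = 1
    for d in dilations:
        r += (kernel_size - 1) * d
    return r


# Three layers, no dilation (dilation 1 everywhere), assumed kernel size 2.
plain = receptive_field(kernel_size=2, dilations=[1, 1, 1])

# Same depth with dilations doubling per layer, as dilated TCNs commonly do.
dilated = receptive_field(kernel_size=2, dilations=[1, 2, 4])

print(plain, dilated)
```

Under these assumptions the undilated stack covers 4 time steps, matching the caption, while the dilated stack of the same depth covers 8, which illustrates why dilation is used to reach long histories with few layers.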

Reference graph

Works this paper leans on

95 extracted references · 95 canonical work pages · 15 internal anchors

[1]

X. Fang, S. Misra, G. Xue, and D. Yang. Smart grid — the new and improved power grid: A survey. IEEE Communications Surveys Tutorials, 14(4):944–980, Fourth 2012

work page 2012
[2]

A methodology for electric power load forecasting

Eisa Almeshaiei and Hassan Soltan. A methodology for electric power load forecasting. Alexandria Engineering Journal, 50(2):137 – 144, 2011

work page 2011
[3]

H. S. Hippert, C. E. Pedreira, and R. C. Souza. Neural networks for short-term load forecasting: a review and evaluation. IEEE Transactions on Power Systems, 16(1):44–55, Feb 2001. 21 A PREPRINT

work page 2001
[4]

Analysis of an adaptive time-series autoregressive moving-average (arma) model for short-term load forecasting

Jiann-Fuh Chen, Wei-Ming Wang, and Chao-Ming Huang. Analysis of an adaptive time-series autoregressive moving-average (arma) model for short-term load forecasting. Electric Power Systems Research, 34(3):187–196, 1995

work page 1995
[5]

Short-term load forecasting via arma model identiﬁcation including non-gaussian process considerations

Shyh-Jier Huang and Kuang-Rong Shih. Short-term load forecasting via arma model identiﬁcation including non-gaussian process considerations. IEEE Transactions on power systems, 18(2):673–679, 2003

work page 2003
[6]

The time series approach to short term load forecasting

Martin T Hagan and Suzanne M Behr. The time series approach to short term load forecasting. IEEE Transactions on Power Systems, 2(3):785–791, 1987

work page 1987
[7]

A particle swarm optimization to identifying the armax model for short-term load forecasting

Chao-Ming Huang, Chi-Jen Huang, and Ming-Li Wang. A particle swarm optimization to identifying the armax model for short-term load forecasting. IEEE Transactions on Power Systems, 20(2):1126–1133, 2005

work page 2005
[8]

Identiﬁcation of armax model for short term load forecasting: An evolutionary programming approach

Hong-Tzer Yang, Chao-Ming Huang, and Ching-Lien Huang. Identiﬁcation of armax model for short term load forecasting: An evolutionary programming approach. In Power Industry Computer Application Conference, 1995. Conference Proceedings., 1995 IEEE, pages 325–330. IEEE, 1995

work page 1995
[9]

Building-level occupancy data to improve arima-based electricity use forecasts

Guy R Newsham and Benjamin J Birt. Building-level occupancy data to improve arima-based electricity use forecasts. In Proceedings of the 2nd ACM workshop on embedded sensing systems for energy-efﬁciency in building, pages 13–18. ACM, 2010

work page 2010
[10]

K. Y . Lee, Y . T. Cha, and J. H. Park. Short-term load forecasting using an artiﬁcial neural network. IEEE Transactions on Power Systems, 7(1):124–132, Feb 1992

work page 1992
[11]

D. C. Park, M. A. El-Sharkawi, R. J. Marks, L. E. Atlas, and M. J. Damborg. Electric load forecasting using an artiﬁcial neural network. IEEE Transactions on Power Systems, 6(2):442–449, May 1991

work page 1991
[12]

Liew, and C.S

Dipti Srinivasan, A.C. Liew, and C.S. Chang. A neural network short-term load forecaster.Electric Power Systems Research, 28(3):227 – 234, 1994

work page 1994
[13]

Drezga and S

I. Drezga and S. Rahman. Short-term load forecasting with local ann predictors. IEEE Transactions on Power Systems, 14(3):844–850, Aug 1999

work page 1999
[14]

K. Chen, K. Chen, Q. Wang, Z. He, J. Hu, and J. He. Short-term load forecasting with deep residual networks. IEEE Transactions on Smart Grid, pages 1–1, 2018

work page 2018
[15]

A high precision artiﬁcial neural networks model for short-term energy load forecasting

Ping-Huan Kuo and Chiou-Jye Huang. A high precision artiﬁcial neural networks model for short-term energy load forecasting. Energies, 11(1), 2018

work page 2018
[16]

Amarasinghe, D

K. Amarasinghe, D. L. Marino, and M. Manic. Deep neural networks for energy load forecasting. In 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE), pages 1483–1488, June 2017

work page 2017
[17]

Electricity short term load forecasting using elman recurrent neural network

Siddarameshwara Nayaka, Anup Yelamali, and Kshitiz Byahatti. Electricity short term load forecasting using elman recurrent neural network. pages 351 – 354, 11 2010

work page 2010
[18]

An overview and comparative analysis of Recurrent Neural Networks for Short Term Load Forecasting

Filippo Maria Bianchi, Enrico Maiorino, Michael C. Kampffmeyer, Antonello Rizzi, and Robert Jenssen. An overview and comparative analysis of recurrent neural networks for short term load forecasting. CoRR, abs/1705.04378, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

Short-term electric load forecasting using echo state networks and pca decomposition

Filippo Maria Bianchi, Enrico De Santis, Antonello Rizzi, and Alireza Sadeghian. Short-term electric load forecasting using echo state networks and pca decomposition. IEEE Access, 3:1931–1943, 2015

work page 1931
[20]

Deep learning for estimating building energy consumption

Elena Mocanu, Phuong H Nguyen, Madeleine Gibescu, and Wil L Kling. Deep learning for estimating building energy consumption. Sustainable Energy, Grids and Networks, 6:91–99, 2016

work page 2016
[21]

Electric load forecasting in smart grids using long-short- term-memory based recurrent neural network

Jian Zheng, Cencen Xu, Ziang Zhang, and Xiaohua Li. Electric load forecasting in smart grids using long-short- term-memory based recurrent neural network. In Information Sciences and Systems (CISS), 2017 51st Annual Conference on, pages 1–6. IEEE, 2017

work page 2017
[22]

Short-term residential load forecasting based on lstm recurrent neural network

Weicong Kong, Zhao Yang Dong, Youwei Jia, David J Hill, Yan Xu, and Yuan Zhang. Short-term residential load forecasting based on lstm recurrent neural network. IEEE Transactions on Smart Grid, 2017

work page 2017
[23]

Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches

Salah Bouktif, Ali Fiaz, Ali Ouni, and Mohamed Serhani. Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies, 11(7):1636, 2018

work page 2018
[24]

Short-term load forecasting with multi-source data using gated recurrent unit neural networks

Yixing Wang, Meiqin Liu, Zhejing Bao, and Senlin Zhang. Short-term load forecasting with multi-source data using gated recurrent unit neural networks. Energies, 11:1138, 05 2018

work page 2018
[25]

Load forecasting via deep neural networks

Wan He. Load forecasting via deep neural networks. Procedia Computer Science, 122:308 – 314, 2017. 5th International Conference on Information Technology and Quantitative Management, ITQM 2017. 22 A PREPRINT

work page 2017
[26]

A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network

Chujie Tian, Jian Ma, Chunhong Zhang, and Panpan Zhan. A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies, 11:3493, 12 2018

work page 2018
[27]

Global energy forecasting competition 2012

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357 – 363, 2014

work page 2012
[28]

Using recurrent artiﬁcial neural networks to forecast household electricity consumption

Antonino Marvuglia and Antonio Messineo. Using recurrent artiﬁcial neural networks to forecast household electricity consumption. Energy Procedia, 14:45 – 55, 2012. 2011 2nd International Conference on Advances in Energy Engineering (ICAEE)

work page 2012
[29]

UCI machine learning repository, 2017

Dua Dheeru and Eﬁ Karra Taniskidou. UCI machine learning repository, 2017

work page 2017
[30]

Smart grid, smart city, australian govern., australia, canberray

work page
[31]

Yao Cheng, Chang Xu, Daisuke Mashima, Vrizlynn L. L. Thing, and Yongdong Wu. Powerlstm: Power demand forecasting using long short-term memory neural network. In Gao Cong, Wen-Chih Peng, Wei Emma Zhang, Chengliang Li, and Aixin Sun, editors, Advanced Data Mining and Applications, pages 727–740, Cham, 2017. Springer International Publishing

work page 2017
[32]

http://traces.cs.umass.edu/index.php/Smart/Smart, 2017

Umass smart dataset. http://traces.cs.umass.edu/index.php/Smart/Smart, 2017

work page 2017
[33]

Building energy load forecasting using deep neural networks

Daniel L Marino, Kasun Amarasinghe, and Milos Manic. Building energy load forecasting using deep neural networks. In Industrial Electronics Society, IECON 2016-42nd Annual Conference of the IEEE, pages 7046–7051. IEEE, 2016

work page 2016
[34]

Combining auto-regression with exogenous variables in sequence-to-sequence recurrent neural networks for short-term load forecasting

Henning Wilms, Marco Cupelli, and Antonello Monti. Combining auto-regression with exogenous variables in sequence-to-sequence recurrent neural networks for short-term load forecasting. In 2018 IEEE 16th International Conference on Industrial Informatics (INDIN), pages 673–679. IEEE, 2018

work page 2018
[35]

Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli, and Rob J. Hyndman. Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond. International Journal of Forecasting, 32(3):896 – 913, 2016

work page 2014
[36]

Almalaq and G

A. Almalaq and G. Edwards. A review of deep learning methods applied on load forecasting. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 511–516, Dec 2017

work page 2017
[37]

Approximation with artiﬁcial neural networks

Balázs Csanád Csáji. Approximation with artiﬁcial neural networks

work page
[38]

G. Hinton. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

work page
[39]

Matthew D. Zeiler. Adadelta: An adaptive learning rate method. 1212, 12 2012

work page 2012
[40]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[41]

Chen, D.C

S.T. Chen, D.C. Yu, and A.R. Moghaddamjo. Weather sensitive short-term load forecasting using nonfully connected artiﬁcial neural network. IEEE Transactions on Power Systems (Institute of Electrical and Electronics Engineers); (United States), (3), 8 1992

work page 1992
[42]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV , USA, June 27-30, 2016, pages 770–778, 2016

work page 2016
[43]

Jeffrey L. Elman. Finding structure in time. COGNITIVE SCIENCE, 14(2):179–211, 1990

work page 1990
[44]

Long short-term memory

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997

work page 1997
[45]

Learning phrase representations using RNN encoder-decoder for statistical machine translation

Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A ...

work page 2014
[46]

Paul J. Werbos. Backpropagation through time: What it does and how to do it. 1990

work page 1990
[47]

D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Parallel distributed processing: Explorations in the microstruc- ture of cognition, vol. 1. chapter Learning Internal Representations by Error Propagation, pages 318–362. MIT Press, Cambridge, MA, USA, 1986

work page 1986
[48]

Williams and Jing Peng

Ronald J. Williams and Jing Peng. An efﬁcient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2, 09 1998

work page 1998
[49]

Bengio, P

Y . Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difﬁcult.Trans. Neur. Netw., 5(2):157–166, March 1994. 23 A PREPRINT

work page 1994
[50]

On the difﬁculty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difﬁculty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML’13, pages III–1310–III–1318. JMLR.org, 2013

work page 2013
[51]

Greff, R

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber. Lstm: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, Oct 2017

work page 2017
[52]

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Junyoung Chung, Çaglar Gülçehre, Kyunghyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[53]

Comparative Study of CNN and RNN for Natural Language Processing

Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Schütze. Comparative study of cnn and rnn for natural language processing. arXiv preprint arXiv:1702.01923, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[54]

How to Construct Deep Recurrent Neural Networks

Razvan Pascanu, Çaglar Gülçehre, Kyunghyun Cho, and Yoshua Bengio. How to construct deep recurrent neural networks. CoRR, abs/1312.6026, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[55]

Learning complex, extended sequences using the principle of history compression

Jürgen Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Comput., 4(2):234–242, March 1992

work page 1992
[56]

Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. Speech recognition with deep recurrent neural networks. CoRR, abs/1303.5778, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[57]

Training and analysing deep recurrent neural networks

Michiel Hermans and Benjamin Schrauwen. Training and analysing deep recurrent neural networks. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors,Advances in Neural Information Processing Systems 26, pages 190–198. Curran Associates, Inc., 2013

work page 2013
[58]

Atiya, and Antti Sorjamaa

Souhaib Ben Taieb, Gianluca Bontempi, Amir F. Atiya, and Antti Sorjamaa. A review and comparison of strategies for multi-step ahead time series forecasting based on the nn5 forecasting competition. Expert Systems with Applications, 39(8):7067 – 7083, 2012

work page 2012
[59]

Time series prediction using dirrec strategy

Antti Sorjamaa and Amaury Lendasse. Time series prediction using dirrec strategy. volume 6, pages 143–148, 01 2006

work page 2006
[60]

Long term time series prediction with multi-input multi-output local learning

Gianluca Bontempi. Long term time series prediction with multi-input multi-output local learning. Proceedings of the 2nd European Symposium on Time Series Prediction (TSP), ESTSP08, 01 2008

work page 2008
[61]

Long-term prediction of time series by combining direct and mimo strategies

Souhaib Ben Taieb, Gianluca Bontempi, Antti Sorjamaa, and Amaury Lendasse. Long-term prediction of time series by combining direct and mimo strategies. 2009 International Joint Conference on Neural Networks, pages 3054–3061, 2009

work page 2009
[62]

F. M. Bianchi, E. De Santis, A. Rizzi, and A. Sadeghian. Short-term electric load forecasting using echo state networks and pca decomposition. IEEE Access, 3:1931–1943, 2015

work page 1931
[63]

Ilya Sutskever, Oriol Vinyals, and Quoc V . Le. Sequence to sequence learning with neural networks. InAdvances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 3104–3112, 2014

work page 2014
[64]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[65]

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V . Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, ukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, and Jeffrey Dean. Google’s neural machine translation system: Bridging the gap between h...

work page 2016
[66]

Graves, A

A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649, May 2013

work page 2013
[67]

Attention-based models for speech recognition

Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 577–585. Curran Associates, Inc., 2015

[68]

D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio. End-to-end attention-based large vocabulary speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4945–4949, March 2016

[69]

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Resear...

[70]

Ronald J. Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1:270–280, 1989

[71]

Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. Sequence level training with recurrent neural networks. CoRR, abs/1511.06732, 2015

[72]

Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, pages 1171–1179, Cambridge, MA, USA, 2015. MIT Press

[73]

Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. Professor forcing: A new algorithm for training recurrent networks. In NIPS, 2016

[74]

Yaser Keneshloo, Tian Shi, Naren Ramakrishnan, and Chandan K. Reddy. Deep reinforcement learning for sequence to sequence models. CoRR, abs/1805.09461, 2018

[75]

Henning Wilms, Marco Cupelli, and A Monti. Combining auto-regression with exogenous variables in sequence-to-sequence recurrent neural networks for short-term load forecasting. pages 673–679, 07 2018

[76]

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998

[77]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012

[78]

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014

[79]

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015

[80]

Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015


Showing first 80 references.
