Streaming Adaptation of Deep Forecasting Models using Adaptive Recurrent Units

Prathamesh Deshpande; Sunita Sarawagi

arxiv: 1906.09926 · v2 · pith:YLWBM2NQnew · submitted 2019-06-24 · 💻 cs.LG · cs.AI · stat.ML

Pith reviewed 2026-05-25 17:39 UTC · model grok-4.3

keywords: time series forecasting · streaming adaptation · adaptive recurrent unit · deep learning · local adaptation · conditional Gaussian · online learning · global models

The pith

Adaptive Recurrent Units embed closed-form local linear models inside deep global time-series forecasters for streaming adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ARU to let a single deep model trained across many time series adapt its predictions to each individual series as new data arrives. It does this by maintaining a compact set of sufficient statistics for conditional Gaussian distributions and using them to derive per-series linear parameters in closed form. The unit plugs into the global network so that training remains end-to-end while inference uses only a fixed-size state and an RNN-style update. Experiments across datasets show this approach outperforms prior local-adaptation techniques that require extra computation from the global network. If the approach holds, global forecasting models can personalize to new or drifting series without storing per-series parameters or retraining.

Core claim

ARU maintains sufficient statistics of conditional Gaussian distributions inside a globally trained deep network and uses them to compute local linear parameters in closed form; this embedding permits both end-to-end training of the global model and lightweight RNN-like updates that adapt forecasts to streaming per-series data.
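The mechanism in the core claim can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a scalar target `y` and a feature vector `x` (in ARU the features would presumably come from the global network, but that wiring is not shown in the text here). The class name `RunningGaussianStats`, the `ridge` term, and the synthetic data are hypothetical choices for this sketch, not the paper's equations.

```python
import numpy as np


class RunningGaussianStats:
    """Fixed-size running sums for a joint Gaussian over (x, y).

    Hypothetical helper for illustration only; not the paper's ARU cell.
    """

    def __init__(self, dim, ridge=1e-3):
        self.n = 0.0
        self.sum_x = np.zeros(dim)
        self.sum_y = 0.0
        self.sum_xx = np.zeros((dim, dim))
        self.sum_xy = np.zeros(dim)
        self.ridge = ridge  # assumed regularizer to keep the solve well posed

    def update(self, x, y):
        # RNN-like step: new state = old state + function of the new observation.
        x = np.asarray(x, dtype=float)
        self.n += 1.0
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += np.outer(x, x)
        self.sum_xy += x * y

    def local_params(self):
        # Closed-form slope and intercept of E[y | x] from first and second moments.
        if self.n < 1:
            raise ValueError("need at least one observation")
        mean_x = self.sum_x / self.n
        mean_y = self.sum_y / self.n
        cov_xx = self.sum_xx / self.n - np.outer(mean_x, mean_x)
        cov_xy = self.sum_xy / self.n - mean_x * mean_y
        cov_xx = cov_xx + self.ridge * np.eye(len(mean_x))
        w = np.linalg.solve(cov_xx, cov_xy)
        b = mean_y - w @ mean_x
        return w, b


# Usage on a noise-free synthetic series (assumed data, for illustration).
rng = np.random.default_rng(0)
true_w, true_b = np.array([2.0, -1.0]), 0.5
stats = RunningGaussianStats(dim=2, ridge=1e-8)
for _ in range(500):
    x = rng.normal(size=2)
    stats.update(x, true_w @ x + true_b)
w, b = stats.local_params()
```

The state holds one scalar count, two vectors, one matrix, and one scalar sum, so its size depends only on the feature dimension and not on how many observations have streamed in.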

What carries the argument

The Adaptive Recurrent Unit (ARU), which stores fixed-size sufficient statistics for conditional Gaussians to obtain local parameters without taxing the global network.

If this is right

A single global network can serve many time series while still producing series-specific forecasts at test time.
Memory per series stays constant no matter how much data has streamed in, because only fixed-size statistics are kept.
Adaptation happens online with a simple update rule, without requiring gradient steps on the global weights.
End-to-end training remains possible because the local-parameter computation is differentiable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sufficient-statistic trick could be tried in other sequence models where global and local structure must coexist.
If the local parameters prove stable, periodic global retraining might be needed less often.
The method assumes the conditional distributions stay approximately Gaussian; non-Gaussian extensions would require new sufficient statistics.
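For reference, the textbook conditional-Gaussian identity behind the last remark is given below. The notation (a feature vector $x$, a target $y$, and the block covariance symbols) is assumed for this note and is not taken from the paper.

```latex
\[
\begin{aligned}
\begin{pmatrix} x \\ y \end{pmatrix} &\sim \mathcal{N}\!\left(
  \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},
  \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}
\right), \\
\mathbb{E}[y \mid x] &= \mu_y + \Sigma_{yx}\,\Sigma_{xx}^{-1}\,(x - \mu_x), \\
\operatorname{Cov}[y \mid x] &= \Sigma_{yy} - \Sigma_{yx}\,\Sigma_{xx}^{-1}\,\Sigma_{xy}.
\end{aligned}
\]
```

The conditional mean is linear in $x$, and every quantity in it is a first or second moment, which is why running sums of $x$, $y$, $xx^\top$, and $xy$ suffice. A non-Gaussian conditional is in general not summarized by these moments alone, so the fixed-size state would have to be redesigned.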

Load-bearing premise

The closed-form local parameters derived from the maintained statistics can be inserted into the deep global model without degrading its learned representations or needing extra tuning steps.

What would settle it

On a held-out streaming dataset with distribution drift, the ARU-adapted forecasts show no improvement over a global model that receives no per-series updates.
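The shape of such a falsification test can be sketched on a toy stream. Everything here is an assumption made for illustration: `global_forecast` is a hypothetical stand-in for a frozen global model, the series has an offset that shifts halfway through, and the local model uses exponentially decayed statistics (the `decay` factor is a choice made for this sketch and is not described in the text above). Each point is predicted before it is observed, then used to update the state.

```python
import numpy as np


def global_forecast(x):
    # Hypothetical stand-in for a frozen, globally trained forecaster.
    return 1.0 * x


def prequential_mae(xs, ys, decay=0.9, ridge=1e-3):
    """Predict each point before seeing it, then update the local statistics."""
    n = s_g = s_y = s_gg = s_gy = 0.0
    err_global, err_adapted = [], []
    for x, y in zip(xs, ys):
        g = global_forecast(x)
        if n > 0:
            # Closed-form local linear correction y ~ a * g + c from decayed moments.
            mean_g, mean_y = s_g / n, s_y / n
            var_g = s_gg / n - mean_g ** 2 + ridge
            cov_gy = s_gy / n - mean_g * mean_y
            a = cov_gy / var_g
            c = mean_y - a * mean_g
            pred = a * g + c
        else:
            pred = g  # no local evidence yet, fall back to the global forecast
        err_global.append(abs(y - g))
        err_adapted.append(abs(y - pred))
        # Fixed-size state update with forgetting (assumed, to track drift).
        n = decay * n + 1.0
        s_g = decay * s_g + g
        s_y = decay * s_y + y
        s_gg = decay * s_gg + g * g
        s_gy = decay * s_gy + g * y
    return float(np.mean(err_global)), float(np.mean(err_adapted))


# Toy series: the global model captures the shape but not the drifting offset.
t = np.arange(400)
xs = np.sin(0.3 * t)
offset = np.where(t < 200, 2.0, -1.0)
ys = xs + offset
mae_global, mae_adapted = prequential_mae(xs, ys)
```

On this toy stream the adapted error comes out below the global-only error. The settling condition described above corresponds to the opposite outcome on real held-out data with drift, that is, `mae_adapted` no better than `mae_global`.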

Figures

Figures reproduced from arXiv: 1906.09926 by Prathamesh Deshpande, Sunita Sarawagi.

**Figure 2.** ARU cell combined with the decoder of the global … (caption truncated in source)

**Figure 3.** Performance of ARU and DeepState on synthetic data

**Figure 4.** Examples of four time series from the Rossman datas… (caption truncated in source)

Original abstract

We present ARU, an Adaptive Recurrent Unit for streaming adaptation of deep globally trained time-series forecasting models. The ARU combines the advantages of learning complex data transformations across multiple time series from deep global models, with per-series localization offered by closed-form linear models. Unlike existing methods of adaptation that are either memory-intensive or non-responsive after training, ARUs require only fixed sized state and adapt to streaming data via an easy RNN-like update operation. The core principle driving ARU is simple --- maintain sufficient statistics of conditional Gaussian distributions and use them to compute local parameters in closed form. Our contribution is in embedding such local linear models in globally trained deep models while allowing end-to-end training on the one hand, and easy RNN-like updates on the other. Across several datasets we show that ARU is more effective than recently proposed local adaptation methods that tax the global network to compute local parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

ARU's core is a clean way to keep Gaussian sufficient stats inside an RNN unit for closed-form local params on top of a global deep forecaster, but the integration with gradients needs explicit verification.

The paper's main contribution is ARU, an RNN-style unit that tracks sufficient statistics of conditional Gaussians so local linear parameters can be solved in closed form for each series while the whole system stays inside a globally trained deep model. The state stays fixed size and updates look like a standard RNN step, which addresses the practical need for streaming adaptation without retraining the global network or storing per-series models. This sits between fully global and fully local approaches and claims better results than recent local-adaptation baselines that call the global network repeatedly. The idea is straightforward and targets a real deployment constraint in time-series systems. The abstract does not show the equations, so it is not possible to check whether the closed-form step is fully inside the gradient flow or whether it requires any detached computation or extra regularization. If the local parameters end up being computed outside backprop, the global representation would be unchanged from a non-adaptive baseline and the reported gains would need another explanation. The experiments are described only at a high level with no dataset names, error bars, or exact baseline implementations visible here. A reader working on production forecasting pipelines that must handle new or drifting series would find the architectural sketch useful. The work shows clear engagement with the adaptation literature and the central claim is internally consistent on its own terms, so it is worth sending to referees even if the math and experimental details will need expansion.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce the Adaptive Recurrent Unit (ARU) for streaming adaptation of deep globally trained time-series forecasting models. ARU maintains sufficient statistics of conditional Gaussian distributions to compute local parameters in closed form, enabling per-series localization while supporting end-to-end training of the global deep model and fixed-size RNN-style updates. It asserts superior effectiveness over recently proposed local adaptation methods across several datasets.

Significance. If the claimed integration of closed-form local adaptation into end-to-end trainable deep models holds without degrading global representations or requiring post-hoc tuning, the result would offer a practical, memory-efficient solution for adapting forecasting models to individual streaming time series, addressing key limitations of existing methods.

major comments (2)

[Abstract] Abstract: The core principle is stated and superior results are claimed, but no equations, experimental details, error bars, or dataset descriptions are provided, so the central effectiveness claim cannot be verified from the given text.
[Method] Method (core principle description): The claim that sufficient statistics of conditional Gaussians can be maintained and inverted in closed form inside a globally trained deep network while permitting end-to-end gradient flow and fixed-size updates is load-bearing for attributing gains to ARU; the manuscript must demonstrate this integration explicitly rather than assuming it avoids detachment from backprop or extra tuning.

minor comments (1)

[Abstract] Abstract: The acronym ARU is introduced without an initial definition or citation to prior related work on adaptive units.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review. We address the two major comments below, clarifying where the manuscript provides the requested details and offering revisions where appropriate.

Referee: [Abstract] Abstract: The core principle is stated and superior results are claimed, but no equations, experimental details, error bars, or dataset descriptions are provided, so the central effectiveness claim cannot be verified from the given text.

Authors: Abstracts are concise by design and do not include equations or full experimental details, which appear in the body. Section 3 derives the sufficient statistics and closed-form updates; Section 4.1 describes the datasets; Sections 4.2–4.3 report results with error bars and comparisons. We can add one sentence to the abstract referencing these sections if the editor requests, but the current form follows standard practice.

revision: no

Referee: [Method] Method (core principle description): The claim that sufficient statistics of conditional Gaussians can be maintained and inverted in closed form inside a globally trained deep network while permitting end-to-end gradient flow and fixed-size updates is load-bearing for attributing gains to ARU; the manuscript must demonstrate this integration explicitly rather than assuming it avoids detachment from backprop or extra tuning.

Authors: Sections 3.2–3.3 explicitly derive the maintenance of conditional Gaussian sufficient statistics, the closed-form local parameter computation, and the fixed-size RNN-style update. We show that the local parameters are differentiable functions of the statistics, enabling direct gradient flow to global parameters during end-to-end training with no post-hoc tuning. We will add an algorithm box and a short backpropagation derivation in the revision to make the integration path fully explicit.

revision: partial
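Whether a closed-form solve can sit inside the gradient path is easy to check numerically in the generic case. The sketch below assumes the local parameters take the form w = A⁻¹b, with A a regularized second-moment matrix and b a cross-moment vector (an assumed form, not the paper's stated equations). It compares the standard matrix-calculus identity dw = A⁻¹(db − dA·w) against central finite differences. The function names and random test matrices are hypothetical.

```python
import numpy as np


def solve_local(A, b):
    # Closed-form local parameters w = A^{-1} b (assumed form).
    return np.linalg.solve(A, b)


def directional_derivative(A, b, dA, db):
    # Analytic derivative of w along the perturbation (dA, db):
    # dw = A^{-1} (db - dA @ w), which follows from d(A^{-1}) = -A^{-1} dA A^{-1}.
    w = solve_local(A, b)
    return np.linalg.solve(A, db - dA @ w)


rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
A = M @ M.T + 3.0 * np.eye(3)        # symmetric positive definite, well conditioned
b = rng.normal(size=3)
dA = rng.normal(size=(3, 3))
dA = dA + dA.T                       # keep the perturbation symmetric
db = rng.normal(size=3)

# Central finite difference of w along (dA, db).
eps = 1e-6
w_plus = solve_local(A + eps * dA, b + eps * db)
w_minus = solve_local(A - eps * dA, b - eps * db)
finite_diff = (w_plus - w_minus) / (2 * eps)

analytic = directional_derivative(A, b, dA, db)
gradients_agree = bool(np.allclose(finite_diff, analytic, atol=1e-6))
```

Agreement here only shows that the solve itself is differentiable wherever A is invertible. It does not establish how the paper routes gradients through the unit, which is the point the referee asks the authors to make explicit.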

Circularity Check

0 steps flagged

No circularity detected; derivation remains self-contained without reductions to inputs

The provided abstract and context contain no equations, derivations, or self-citations that reduce any claimed prediction or result to a fitted quantity or definitional equivalence by construction. The core idea of maintaining sufficient statistics for closed-form local parameters is presented as an embedding principle into a global deep model, with effectiveness asserted via empirical comparison on datasets rather than by algebraic identity or self-referential fitting. No load-bearing step matches any of the enumerated circularity patterns, as no specific reductions (e.g., Eq. X defined as Eq. Y) can be exhibited from the text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the assumption that conditional Gaussian distributions adequately capture the local linear structure needed for adaptation; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)

domain assumption: Sufficient statistics of conditional Gaussian distributions can be maintained in fixed size and used to compute local linear parameters in closed form that improve forecasting accuracy.
Stated as the core principle driving ARU in the abstract.

invented entities (1)

Adaptive Recurrent Unit (ARU) · no independent evidence
purpose: To enable streaming per-series adaptation inside a global deep model via RNN-like updates.
New component introduced by the paper.

pith-pipeline@v0.9.0 · 5682 in / 1213 out tokens · 20274 ms · 2026-05-25T17:39:13.940552+00:00 · methodology

