
arxiv: 1906.08812 · v1 · pith:KZ4Z3OXY · submitted 2019-06-20 · 📡 eess.SP

Cache-Aided NOMA Mobile Edge Computing: A Reinforcement Learning Approach

Pith reviewed 2026-05-25 19:10 UTC · model grok-4.3

classification 📡 eess.SP
keywords NOMA · mobile edge computing · caching · reinforcement learning · Q-learning · LSTM · task offloading · resource allocation

The pith

A NOMA cache-aided mobile edge computing system uses LSTM task popularity prediction and Bayesian learning automata Q-learning to maximize long-term rewards through joint offloading, resource, and caching decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that pairs non-orthogonal multiple access (NOMA) with edge caching to handle users' computation requests more efficiently. An LSTM network forecasts which tasks will be popular, and the forecast feeds a long-term reward maximization problem that jointly decides which tasks to offload, how to allocate computing resources, and which tasks to cache. Single-agent Q-learning solves the resource allocation part, while a multi-agent Q-learning variant equipped with Bayesian learning automata (BLA) handles the offloading decisions. The authors prove that the BLA action selector is instantaneously self-correcting and always picks an optimal action for the current state. Simulations show that prediction error falls as the learning rate increases and that the framework clearly outperforms all-local computing, all-offloading computing, and computing without caching.
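Both the single-agent and multi-agent stages rest on the standard tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. The sketch below illustrates that update with epsilon-greedy exploration; it is not the authors' SAQ-learning algorithm. The two load states, the `local`/`offload` actions, the reward values, the learning rate `alpha`, and the discount `gamma` are hypothetical choices made only for the example.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise act greedily on Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def train_toy(steps=20000, seed=0):
    """Hypothetical toy problem: 'low'/'high' load states, 'local'/'offload' actions,
    invented rewards, and a next state drawn uniformly at random."""
    rewards = {("low", "local"): 1.0, ("low", "offload"): 0.5,
               ("high", "local"): 0.2, ("high", "offload"): 1.0}
    states, actions = ["low", "high"], ["local", "offload"]
    q, rng, state = defaultdict(float), random.Random(seed), "low"
    for _ in range(steps):
        action = epsilon_greedy(q, state, actions, 0.1, rng)
        next_state = rng.choice(states)
        q_update(q, state, action, rewards[(state, action)], next_state, actions)
        state = next_state
    # Greedy policy read off the learned Q-table.
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


print(train_toy())
```

Because the invented rewards favour local computing under low load and offloading under high load, the learned greedy policy should recover that split; a real MEC formulation would instead derive rewards from energy and delay models.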

Core claim

We prove that the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state. Extensive simulation results demonstrate that the proposed framework significantly outperforms the benchmarks like all local computing, all offloading computing, and non-cache computing.

What carries the argument

Bayesian learning automata based action selection scheme inside multi-agent Q-learning, which selects the optimal offloading action for every state.
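The BLA idea traces back to Granmo's Bayesian learning automaton for two-armed Bernoulli bandits: each action keeps a Beta posterior over its probability of being rewarded, one sample is drawn from every posterior, and the action with the largest sample is chosen. The sketch below assumes binary reward/penalty feedback and invented success probabilities. How the paper maps Q-values or rewards into this feedback is not reproduced here, and the sketch does not demonstrate the claimed self-correcting property.

```python
import random


class BayesianLearningAutomaton:
    """Beta-Bernoulli BLA: one Beta(a, b) posterior per action; select by sampling."""

    def __init__(self, n_actions, rng=None):
        self.a = [1.0] * n_actions  # 1 + rewards observed per action (uniform prior)
        self.b = [1.0] * n_actions  # 1 + penalties observed per action
        self.rng = rng if rng is not None else random.Random()

    def select(self):
        # Draw one sample from each posterior and act on the largest draw.
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.a, self.b)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, action, rewarded):
        # Conjugate update: a reward raises a, a penalty raises b.
        if rewarded:
            self.a[action] += 1.0
        else:
            self.b[action] += 1.0

    def posterior_means(self):
        return [a / (a + b) for a, b in zip(self.a, self.b)]


def run_demo(success_probs=(0.3, 0.7), rounds=2000, seed=1):
    """Hypothetical environment: action k is rewarded with probability success_probs[k]."""
    rng = random.Random(seed)
    bla = BayesianLearningAutomaton(len(success_probs), rng)
    picks = []
    for _ in range(rounds):
        k = bla.select()
        bla.update(k, rng.random() < success_probs[k])
        picks.append(k)
    return bla, picks


bla, picks = run_demo()
print("share of action 1 in last 500 rounds:", picks[-500:].count(1) / 500)
```

Sampling from the posteriors, rather than always taking the highest posterior mean, is what keeps occasional exploration alive: a poorly estimated action still has some chance of producing the largest draw.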

If this is right

  • LSTM prediction error decreases as the learning rate increases.
  • The framework outperforms all-local, all-offload, and non-cache baselines.
  • BLA-based multi-agent Q-learning improves on conventional reinforcement learning methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The self-correcting property of the action selector could transfer to other multi-agent reinforcement learning problems that require fast adaptation in wireless networks.
  • Adding mobility or location data to the LSTM input might improve caching accuracy when users move between edge servers.
  • Testing the joint optimization under sudden changes in the number of active users would reveal how well the claimed optimality scales.

Load-bearing premise

Task popularity follows temporal patterns that an LSTM can forecast accurately enough to support effective long-term reward maximization.
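To make the premise concrete, the sketch below computes empirical task popularity per time slot, forecasts the next slot, scores the forecast, and turns it into a cache decision. Two substitutions are deliberate and should not be read as the paper's method: a moving-average forecaster stands in for the LSTM, and a greedy fill-by-popularity rule stands in for the learned caching decision. Task ids, task sizes, the cache capacity, and the RMSE score are hypothetical choices for illustration.

```python
from collections import Counter


def popularity(requests, n_tasks):
    """Empirical popularity in one slot: share of requests hitting each task id."""
    counts = Counter(requests)
    total = len(requests)
    return [counts[k] / total if total else 0.0 for k in range(n_tasks)]


def moving_average_forecast(history, window=3):
    """Placeholder forecaster (not an LSTM): mean of the last `window` popularity vectors."""
    recent = history[-window:]
    return [sum(col) / len(recent) for col in zip(*recent)]


def cache_top_tasks(predicted, sizes, capacity):
    """Greedy heuristic: scan tasks by predicted popularity, caching each one that still fits."""
    order = sorted(range(len(predicted)), key=lambda k: predicted[k], reverse=True)
    cached, used = [], 0
    for k in order:
        if used + sizes[k] <= capacity:
            cached.append(k)
            used += sizes[k]
    return sorted(cached)


def rmse(predicted, actual):
    """Root-mean-square forecast error, one common way to score popularity prediction."""
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)) ** 0.5


# Hypothetical request traces (task ids) for three past slots and one future slot.
past_slots = [[0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 0, 3]]
history = [popularity(slot, 4) for slot in past_slots]
forecast = moving_average_forecast(history)
print("cached tasks:", cache_top_tasks(forecast, sizes=[2, 1, 1, 1], capacity=3))
print("forecast RMSE:", rmse(forecast, popularity([0, 0, 1, 3], 4)))
```

If popularity has no exploitable temporal structure, any forecaster, LSTM included, degrades toward this kind of averaging, and the cache decision inherits the forecast error.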

What would settle it

Execute the full system on real user task request traces and check whether performance remains superior to the three benchmarks or whether the BLA selector ever fails to pick the optimal action in a visited state.

Figures

Figures reproduced from arXiv: 1906.08812 by Naofal Al-Dhahir, Yuanwei Liu, Yue Chen, Zhong Yang.

Figure 1: An illustration of multi-users cache-aided mobile edge computing networks.
Figure 2: An illustration of cache-aided MEC networks.
Figure 3: Flow chart of LSTMs for task popularity prediction.
Figure 4: An illustration of Bayesian learning automata based multi-agent Q-learning in cache-aided
Figure 5: Training loss of the proposed LSTM for task popularity prediction.
Figure 6: Simulation results of task popularity prediction using LSTMs.
Figure 7: A larger task input size requires more computing energy both for the mobile users and
Figure 7: Total transmit energy consumption vs. task input size.
Figure 8: Total energy consumption vs. the computation capacity of the AP.
Figure 9: Total transmit energy consumption vs. cache capacity of the AP.
Figure 10: The convergence of the proposed algorithm.
Original abstract

A novel non-orthogonal multiple access (NOMA) based cache-aided mobile edge computing (MEC) framework is proposed. For the purpose of efficiently allocating communication and computation resources to users' computation task requests, we propose a long-short-term memory (LSTM) network to predict the task popularity. Based on the predicted task popularity, a long-term reward maximization problem is formulated that involves a joint optimization of the task offloading decisions, computation resource allocation, and caching decisions. To tackle this challenging problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy. Furthermore, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for task offloading decisions. More specifically, a BLA based action selection scheme is proposed for the agents in MAQ-learning to select the optimal action in every state. We prove that the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state. Extensive simulation results demonstrate that: 1) The prediction error of the proposed LSTMs based task popularity prediction decreases with increasing learning rate. 2) The proposed framework significantly outperforms the benchmarks like all local computing, all offloading computing, and non-cache computing. 3) The proposed BLA based MAQ-learning achieves an improved performance compared to conventional reinforcement learning algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a NOMA-based cache-aided MEC framework that uses an LSTM network to predict task popularity, formulates a long-term reward maximization problem jointly optimizing task offloading, computation resource allocation, and caching decisions, solves it via single-agent Q-learning (SAQ-learning) for resource allocation and a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) for offloading decisions, and claims to prove that the BLA action selection is instantaneously self-correcting and yields an optimal action per state. Extensive simulations are said to show that the LSTM prediction error decreases with learning rate, the framework outperforms benchmarks (all-local, all-offloading, non-cache), and BLA-MAQ-learning outperforms conventional RL.

Significance. If the claimed optimality proof for the BLA scheme holds under the non-stationary popularity induced by the LSTM predictor and the joint optimization, and if the simulation results are statistically robust, the work would contribute a concrete RL-based approach to dynamic resource allocation in cache-aided NOMA MEC systems, potentially improving long-term performance over static or myopic baselines.

major comments (2)
  1. [Abstract / proof section] Abstract (and the section containing the BLA optimality proof): The central claim that 'the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state' underpins the asserted superiority of MAQ-learning over conventional RL; however, the provided text gives no derivation, no statement of assumptions (e.g., reward stationarity, perfect Q-value knowledge, or handling of LSTM-induced non-stationarity), and no explicit reduction showing how the self-correcting property survives the joint offloading-caching-NOMA optimization. This must be supplied with the precise conditions under which optimality holds.
  2. [Abstract / simulation section] Abstract (simulation claims): The statements that the framework 'significantly outperforms' the three benchmarks and that BLA-MAQ-learning 'achieves an improved performance' are load-bearing for the practical contribution, yet no error bars, number of independent runs, confidence intervals, or data-exclusion rules are mentioned; without these, the quantitative superiority cannot be assessed.
minor comments (1)
  1. [Abstract] The abstract refers to 'the proposed LSTMs based task popularity prediction' (plural) but earlier mentions 'a long-short-term memory (LSTM) network' (singular); notation should be consistent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of the optimality claim and the statistical reporting of results. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract / proof section] Abstract (and the section containing the BLA optimality proof): The central claim that 'the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state' underpins the asserted superiority of MAQ-learning over conventional RL; however, the provided text gives no derivation, no statement of assumptions (e.g., reward stationarity, perfect Q-value knowledge, or handling of LSTM-induced non-stationarity), and no explicit reduction showing how the self-correcting property survives the joint offloading-caching-NOMA optimization. This must be supplied with the precise conditions under which optimality holds.

    Authors: We agree the current text states the claim without a full derivation or explicit assumptions. In revision we will add a dedicated appendix containing the complete proof of the instantaneously self-correcting property, together with the precise assumptions (stationary rewards within each prediction window, known Q-values at convergence, and the reduction showing the property holds for the joint offloading-caching-NOMA action space). We will also clarify that the LSTM predictor is used to generate a fixed popularity vector for each optimization epoch, thereby restoring stationarity for the subsequent RL stage. revision: yes

  2. Referee: [Abstract / simulation section] Abstract (simulation claims): The statements that the framework 'significantly outperforms' the three benchmarks and that BLA-MAQ-learning 'achieves an improved performance' are load-bearing for the practical contribution, yet no error bars, number of independent runs, confidence intervals, or data-exclusion rules are mentioned; without these, the quantitative superiority cannot be assessed.

    Authors: We concur that statistical details are required for rigorous assessment. The revised manuscript will report the number of independent Monte-Carlo runs (100), include error bars as one standard deviation, and add 95% confidence intervals for all key performance curves. Data-exclusion rules (none applied) will also be stated. revision: yes
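The reporting promised in the second response can be sketched as follows: for each point on a performance curve, compute the mean over independent runs, a one-standard-deviation error bar, and a 95% confidence interval for the mean. This sketch assumes a normal approximation (z ≈ 1.96); with 100 runs, a Student-t critical value (about 1.98) would give a marginally wider interval. The per-run values in the usage line are made up.

```python
import statistics


def summarize_runs(values, confidence=0.95):
    """Return (mean, sample standard deviation, normal-approximation CI for the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 in the denominator
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sd / n ** 0.5  # z times the standard error of the mean
    return mean, sd, (mean - half_width, mean + half_width)


# Made-up per-run values (e.g., total energy from five independent runs).
mean, sd, ci = summarize_runs([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"mean={mean:.3f}, error bar=+/-{sd:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Note the two quantities answer different questions: the standard-deviation bar describes run-to-run spread, while the confidence interval shrinks with the number of runs and describes uncertainty in the mean.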

Circularity Check

0 steps flagged

No significant circularity; derivations remain independent of inputs

Full rationale

The paper's core chain—LSTM-based task popularity prediction feeding a long-term reward maximization, followed by SAQ-learning and BLA-augmented MAQ-learning with an asserted proof of instantaneous self-correction and per-state optimality—does not reduce any claimed result to its fitted inputs by construction. The LSTM component is a standard supervised predictor whose outputs are treated as exogenous inputs to the subsequent optimization; the BLA proof is presented as a standalone mathematical argument rather than a tautology or renamed fit. No self-citation chains, ansatz smuggling, or uniqueness theorems imported from prior author work appear as load-bearing elements. Simulations are reported separately as empirical validation and do not substitute for the claimed proof. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit equations or sections to enumerate free parameters, axioms, or invented entities; the framework implicitly relies on standard wireless channel and task arrival models plus RL convergence assumptions that are not detailed here.

pith-pipeline@v0.9.0 · 5787 in / 1301 out tokens · 34138 ms · 2026-05-25T19:10:25.528960+00:00 · methodology

