Cache-Aided NOMA Mobile Edge Computing: A Reinforcement Learning Approach

Naofal Al-Dhahir; Yuanwei Liu; Yue Chen; Zhong Yang

arxiv: 1906.08812 · v1 · pith:KZ4Z3OXYnew · submitted 2019-06-20 · 📡 eess.SP

Pith reviewed 2026-05-25 19:10 UTC · model grok-4.3

classification 📡 eess.SP

keywords: NOMA, mobile edge computing, caching, reinforcement learning, Q-learning, LSTM, task offloading, resource allocation

The pith

A NOMA-based, cache-aided mobile edge computing system uses LSTM task popularity prediction and Q-learning with Bayesian learning automata to maximize long-term reward through joint offloading, resource-allocation, and caching decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that pairs non-orthogonal multiple access with edge caching to handle users' computation requests more efficiently. An LSTM network forecasts which tasks will be popular so that a long-term reward maximization problem can jointly decide which tasks to offload, how to allocate computing power, and what to store locally. Single-agent Q-learning solves the resource allocation part while a multi-agent Q-learning version equipped with Bayesian learning automata handles the offloading choices. The authors prove that the Bayesian automata action selector is self-correcting and always picks an optimal action for the current state. Simulations confirm lower prediction error with higher learning rates and clear gains over doing everything locally, offloading everything, or skipping caching.
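
The single-agent Q-learning stage described above relies on the standard tabular update. A minimal Python sketch of that update follows; the state and action names, the reward value, and the step sizes are hypothetical placeholders, not the paper's state space or reward definition.

```python
from collections import defaultdict


def q_learning_step(q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated long-term reward.
    alpha is the learning rate and gamma the discount factor.
    """
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example (names are illustrative only).
actions = ["local", "offload"]
q = defaultdict(float)
q[("cached", "offload")] = 2.0
new_value = q_learning_step(q, "uncached", "offload", reward=1.0,
                            next_state="cached", actions=actions,
                            alpha=0.5, gamma=0.5)
```

In this example the target is 1.0 + 0.5 × 2.0 = 2.0, and the estimate moves halfway from 0 toward it. A larger `gamma` weights future reward more heavily, which is what makes the objective long-term rather than myopic.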

Core claim

We prove that the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state. Extensive simulation results demonstrate that the proposed framework significantly outperforms the benchmarks like all local computing, all offloading computing, and non-cache computing.

What carries the argument

Bayesian learning automata based action selection scheme inside multi-agent Q-learning, which selects the optimal offloading action for every state.
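
For readers unfamiliar with Bayesian learning automata, the sketch below shows the generic scheme for Bernoulli feedback: keep a Beta posterior per action, draw one sample from each, and play the action with the largest draw. How the paper turns Q-learning rewards into reward/penalty feedback is not visible from the abstract, so the `rewarded` flag, the class name, the uniform prior, and the success probabilities are assumptions for illustration only.

```python
import random


class BayesianLearningAutomaton:
    """Beta-posterior action selector for Bernoulli (reward/penalty) feedback."""

    def __init__(self, n_actions, seed=None):
        # One Beta(alpha, beta) posterior per action, starting from Beta(1, 1).
        self.alpha = [1.0] * n_actions
        self.beta = [1.0] * n_actions
        self.rng = random.Random(seed)

    def select(self):
        # Sample each posterior once and pick the action with the largest draw.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, rewarded):
        # A reward increments alpha; a penalty increments beta.
        if rewarded:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0

    def posterior_mean(self, action):
        return self.alpha[action] / (self.alpha[action] + self.beta[action])


# Hypothetical two-action environment with made-up success probabilities.
true_success = [0.2, 0.8]
env_rng = random.Random(1)
bla = BayesianLearningAutomaton(n_actions=2, seed=0)
for _ in range(200):
    action = bla.select()
    bla.update(action, env_rng.random() < true_success[action])
```

Because selection is driven by posterior samples rather than a fixed exploration rate, an action that was favored by early luck loses its advantage as soon as penalties accumulate. Whether that matches the paper's notion of "instantaneously self-correcting" depends on the proof, which is not reproduced here.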

If this is right

LSTM prediction error decreases as the learning rate increases.
The framework outperforms all-local, all-offload, and non-cache baselines.
BLA-based multi-agent Q-learning improves on conventional reinforcement learning methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The self-correcting property of the action selector could transfer to other multi-agent reinforcement learning problems that require fast adaptation in wireless networks.
Adding mobility or location data to the LSTM input might improve caching accuracy when users move between edge servers.
Testing the joint optimization under sudden changes in the number of active users would reveal how well the claimed optimality scales.

Load-bearing premise

Task popularity follows temporal patterns that an LSTM can forecast accurately enough to support effective long-term reward maximization.

What would settle it

Execute the full system on real user task request traces and check whether performance remains superior to the three benchmarks or whether the BLA selector ever fails to pick the optimal action in a visited state.
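
One way to run the second half of this check is to log every (state, action) pair the selector produces and compare it against a reference Q-table. The helper below is a hypothetical sketch: the nested-dictionary layout, the tolerance, the state and action names, and the idea of treating the largest reference Q-value as "optimal" are all assumptions, not the paper's definition.

```python
def find_suboptimal_selections(q_table, visits, tol=1e-9):
    """Return logged (state, action) pairs whose Q-value is below the state's best.

    q_table: dict mapping state -> dict mapping action -> reference Q-value.
    visits:  iterable of (state, action) pairs chosen by the selector.
    """
    failures = []
    for state, action in visits:
        best = max(q_table[state].values())
        # Ties within the tolerance count as optimal.
        if q_table[state][action] < best - tol:
            failures.append((state, action))
    return failures


# Hypothetical reference values and selection log.
reference_q = {
    "s0": {"local": 0.4, "offload": 0.9},
    "s1": {"local": 0.7, "offload": 0.7},
}
log = [("s0", "offload"), ("s0", "local"), ("s1", "local")]
failures = find_suboptimal_selections(reference_q, log)
```

An empty result over a long trace would be consistent with the optimality claim; a single entry would be a counterexample under the chosen reference values.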

Figures

Figures reproduced from arXiv: 1906.08812 by Naofal Al-Dhahir, Yuanwei Liu, Yue Chen, Zhong Yang.

**Figure 2.** An illustration of cache-aided MEC networks.

**Figure 3.** Flow chart of LSTMs for task popularity prediction.

**Figure 4.** An illustration of Bayesian learning automata based multi-agent Q-learning in cache-aided

**Figure 5.** Training loss of the proposed LSTM for task popularity prediction.

**Figure 6.** Simulation results of task popularity prediction using LSTMs.

**Figure 7.** A larger task input size requires more computing energy both for the mobile users and

**Figure 7.** Total transmit energy consumption vs. task input size.

**Figure 8.** Total energy consumption vs. the computation capacity of the AP.

**Figure 9.** Total transmit energy consumption vs. cache capacity of the AP.

**Figure 10.** The convergence of the proposed algorithm.

read the original abstract

A novel non-orthogonal multiple access (NOMA) based cache-aided mobile edge computing (MEC) framework is proposed. For the purpose of efficiently allocating communication and computation resources to users' computation tasks requests, we propose a long-short-term memory (LSTM) network to predict the task popularity. Based on the predicted task popularity, a long-term reward maximization problem is formulated that involves a joint optimization of the task offloading decisions, computation resource allocation, and caching decisions. To tackle this challenging problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy. Furthermore, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for task offloading decisions. More specifically, a BLA based action select scheme is proposed for the agents in MAQ-learning to select the optimal action in every state. We prove that the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state. Extensive simulation results demonstrate that: 1) The prediction error of the proposed LSTMs based task popularity prediction decreases with increasing learning rate. 2) The proposed framework significantly outperforms the benchmarks like all local computing, all offloading computing, and non-cache computing. 3) The proposed BLA based MAQ-learning achieves an improved performance compared to conventional reinforcement learning algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper combines LSTM task prediction with a BLA-based multi-agent Q-learning scheme for joint NOMA-MEC decisions and claims an optimality proof for the action selector, but that proof is the part that needs checking.

The main thing to know is that this work builds a framework linking LSTM popularity forecasts to a long-term optimization of offloading, caching, and NOMA resource splits, solved via single-agent Q-learning plus a Bayesian learning automata multi-agent variant. The authors state they prove the BLA selector is instantaneously self-correcting and picks the optimal action in every state, then show simulation gains over local-only, full-offload, and no-cache baselines.

Referee Report

2 major / 1 minor

Summary. The paper proposes a NOMA-based cache-aided MEC framework that uses an LSTM network to predict task popularity. It formulates a long-term reward maximization problem that jointly optimizes task offloading, computation resource allocation, and caching decisions. The problem is solved with single-agent Q-learning (SAQ-learning) for resource allocation and Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) for offloading decisions. The authors claim to prove that the BLA action selection is instantaneously self-correcting and yields an optimal action per state. Extensive simulations are said to show that the LSTM prediction error decreases as the learning rate increases, that the framework outperforms the benchmarks (all-local, all-offloading, non-cache), and that BLA-based MAQ-learning outperforms conventional RL.

Significance. If the claimed optimality proof for the BLA scheme holds under the non-stationary popularity induced by the LSTM predictor and the joint optimization, and if the simulation results are statistically robust, the work would contribute a concrete RL-based approach to dynamic resource allocation in cache-aided NOMA MEC systems, potentially improving long-term performance over static or myopic baselines.

major comments (2)

[Abstract / proof section] Abstract (and the section containing the BLA optimality proof): The central claim that 'the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state' underpins the asserted superiority of MAQ-learning over conventional RL; however, the provided text gives no derivation, no statement of assumptions (e.g., reward stationarity, perfect Q-value knowledge, or handling of LSTM-induced non-stationarity), and no explicit reduction showing how the self-correcting property survives the joint offloading-caching-NOMA optimization. This must be supplied with the precise conditions under which optimality holds.
[Abstract / simulation section] Abstract (simulation claims): The statements that the framework 'significantly outperforms' the three benchmarks and that BLA-MAQ-learning 'achieves an improved performance' are load-bearing for the practical contribution, yet no error bars, number of independent runs, confidence intervals, or data-exclusion rules are mentioned; without these, the quantitative superiority cannot be assessed.

minor comments (1)

[Abstract] The abstract refers to 'the proposed LSTMs based task popularity prediction' (plural) but earlier mentions 'a long-short-term memory (LSTM) network' (singular); notation should be consistent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of the optimality claim and the statistical reporting of results. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract / proof section] Abstract (and the section containing the BLA optimality proof): The central claim that 'the BLA based action selection scheme is instantaneously self-correcting and the selected action is an optimal solution for each state' underpins the asserted superiority of MAQ-learning over conventional RL; however, the provided text gives no derivation, no statement of assumptions (e.g., reward stationarity, perfect Q-value knowledge, or handling of LSTM-induced non-stationarity), and no explicit reduction showing how the self-correcting property survives the joint offloading-caching-NOMA optimization. This must be supplied with the precise conditions under which optimality holds.

Authors: We agree the current text states the claim without a full derivation or explicit assumptions. In revision we will add a dedicated appendix containing the complete proof of the instantaneously self-correcting property, together with the precise assumptions (stationary rewards within each prediction window and known Q-values at convergence) and the reduction showing that the property holds for the joint offloading-caching-NOMA action space. We will also clarify that the LSTM predictor is used to generate a fixed popularity vector for each optimization epoch, thereby restoring stationarity for the subsequent RL stage. revision: yes
Referee: [Abstract / simulation section] Abstract (simulation claims): The statements that the framework 'significantly outperforms' the three benchmarks and that BLA-MAQ-learning 'achieves an improved performance' are load-bearing for the practical contribution, yet no error bars, number of independent runs, confidence intervals, or data-exclusion rules are mentioned; without these, the quantitative superiority cannot be assessed.

Authors: We concur that statistical details are required for rigorous assessment. The revised manuscript will report the number of independent Monte-Carlo runs (100), show error bars of one standard deviation, and add 95% confidence intervals for all key performance curves. It will also state the data-exclusion rules (none applied). revision: yes
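
The reporting promised in this response can be computed with a few lines of standard-library Python. The sketch below assumes one scalar metric per independent run and uses the normal approximation (z = 1.96) for the 95% interval, which is reasonable at 100 runs; the function name and the sample values are illustrative only, not results from the paper.

```python
import math
import statistics


def summarize_runs(values, z=1.96):
    """Return mean, sample standard deviation, and a normal-approximation 95% CI."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = z * std / math.sqrt(n)  # z times the standard error of the mean
    return {"mean": mean, "std": std,
            "ci95": (mean - half_width, mean + half_width)}


# Made-up per-run metric values, for illustration only.
summary = summarize_runs([1.0, 2.0, 3.0, 4.0, 5.0])
```

Note that the one-standard-deviation error bar describes the spread across runs, while the confidence interval describes uncertainty in the mean and shrinks with the square root of the number of runs. For small run counts a Student-t quantile would be the more careful choice than 1.96.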

Circularity Check

0 steps flagged

No significant circularity; derivations remain independent of inputs

full rationale

The paper's core chain—LSTM-based task popularity prediction feeding a long-term reward maximization, followed by SAQ-learning and BLA-augmented MAQ-learning with an asserted proof of instantaneous self-correction and per-state optimality—does not reduce any claimed result to its fitted inputs by construction. The LSTM component is a standard supervised predictor whose outputs are treated as exogenous inputs to the subsequent optimization; the BLA proof is presented as a standalone mathematical argument rather than a tautology or renamed fit. No self-citation chains, ansatz smuggling, or uniqueness theorems imported from prior author work appear as load-bearing elements. Simulations are reported separately as empirical validation and do not substitute for the claimed proof. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit equations or sections to enumerate free parameters, axioms, or invented entities; the framework implicitly relies on standard wireless channel and task arrival models plus RL convergence assumptions that are not detailed here.

[1] [1]

Deep reinforcement learning in cache-aided MEC networks,

Z. Yang, Y . Liu, Y . Chen, and G. Tyson, “Deep reinforcement learning in cache-aided MEC networks,” in IEEE Proc. of International Commun. Conf. (ICC) , Shanghai, China, May. 2019

work page 2019

[2] [2]

Edge-cocaco: Toward joint optimization of computation, caching, and communication on edge cloud,

M. Chen, Y . Hao, L. Hu, M. S. Hossain, and A. Ghoneim, “Edge-cocaco: Toward joint optimization of computation, caching, and communication on edge cloud,” IEEE Wireless Commun., vol. 25, no. 3, pp. 21–27, Jun. 2018

work page 2018

[3] [3]

Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,

M. Liu, F. R. Yu, Y . Teng, V . C. M. Leung, and M. Song, “Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 695–708, Jan. 2019

work page 2019

[4] [4]

A survey on mobile edge computing: The communication perspective,

Y . Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surv. Tut. , vol. 19, no. 4, pp. 2322–2358, four quarter 2017

work page 2017

[5] [5]

User association and resource allocation in uniﬁed noma enabled heterogeneous ultra dense networks,

Z. Qin, X. Yue, Y . Liu, Z. Ding, and A. Nallanathan, “User association and resource allocation in uniﬁed noma enabled heterogeneous ultra dense networks,” IEEE Commun. Mag. , vol. 56, no. 6, pp. 86–92, Jun. 2018

work page 2018

[6] [6]

Nonorthogonal multiple access for 5G and beyond,

Y . Liu, Z. Qin, M. Elkashlan, Z. Ding, A. Nallanathan, and L. Hanzo, “Nonorthogonal multiple access for 5G and beyond,” Proc. IEEE, vol. 105, no. 12, pp. 2347–2381, Dec. 2017

work page 2017

[7] [7]

Application of non-orthogonal multiple access in LTE and 5G networks,

Z. Ding, Y . Liu, J. Choi, Q. Sun, M. Elkashlan, C. I, and H. V . Poor, “Application of non-orthogonal multiple access in LTE and 5G networks,” IEEE Commun. Mag. , vol. 55, no. 2, pp. 185–191, Feb. 2017

work page 2017

[8] [8]

Cache-aided non-orthogonal multiple access: The two-user case,

L. Xiang, D. W. K. Ng, X. Ge, Z. Ding, V . W. S. Wong, and R. Schober, “Cache-aided non-orthogonal multiple access: The two-user case,” IEEE J. Sel. Topic Signal Processing. , pp. 1–1, 2019

work page 2019

[9] [9]

Energy-efﬁcient resource allocation for cache-assisted mobile edge computing,

Y . Cui, W. He, C. Ni, C. Guo, and Z. Liu, “Energy-efﬁcient resource allocation for cache-assisted mobile edge computing,” in IEEE Proc. of Loc. Com. Netw. (LCN) , Oct. 2017, pp. 640–648

work page 2017

[10] [10]

Dynamic computation ofﬂoading for mobile-edge computing with energy harvesting devices,

Y . Mao, J. Zhang, and K. B. Letaief, “Dynamic computation ofﬂoading for mobile-edge computing with energy harvesting devices,” IEEE J. Sel. Areas Commun. , vol. 34, no. 12, pp. 3590–3605, Dec. 2016

work page 2016

[11] [11]

Energy-efﬁcient resource allocation for mobile-edge computation ofﬂoading,

C. You, K. Huang, H. Chae, and B. H. Kim, “Energy-efﬁcient resource allocation for mobile-edge computation ofﬂoading,” IEEE Trans. Wireless Commun. , vol. 16, no. 3, pp. 1397–1411, Mar. 2017

work page 2017

[12] [12]

Optimized Computation Offloading Performance in Virtual Edge Computing Systems via Deep Reinforcement Learning

X. Chen, H. Zhang, C. Wu, S. Mao, Y . Ji, and M. Bennis, “Optimized computation ofﬂoading performance in virtual edge computing systems via deep reinforcement learning,” ArXiv, May 2018. [Online]. Available: http://arxiv.org/abs/1805.06146

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Impact of non-orthogonal multiple access on the ofﬂoading of mobile edge computing,

Z. Ding, P. Fan, and H. V . Poor, “Impact of non-orthogonal multiple access on the ofﬂoading of mobile edge computing,” IEEE Trans. Commun. , vol. 67, no. 1, pp. 375–390, Jan. 2019

work page 2019

[14] [14]

Multi-antenna NOMA for computation ofﬂoading in multiuser mobile edge computing systems,

F. Wang, J. Xu, and Z. Ding, “Multi-antenna NOMA for computation ofﬂoading in multiuser mobile edge computing systems,” IEEE Trans. Commun. , vol. 67, no. 3, pp. 2450–2463, Mar. 2019

work page 2019

[15] [15]

Edge computing aware NOMA for 5G networks,

A. Kiani and N. Ansari, “Edge computing aware NOMA for 5G networks,” IEEE Int. of Things , vol. 5, no. 2, pp. 1299–1306, Apr. 2018

work page 2018

[16] [16]

Energy-efﬁcient NOMA-based mobile edge computing ofﬂoading,

Y . Pan, M. Chen, Z. Yang, N. Huang, and M. Shikh-Bahaei, “Energy-efﬁcient NOMA-based mobile edge computing ofﬂoading,” IEEE Commun. Lett. , vol. 23, no. 2, pp. 310–313, Feb. 2019

work page 2019

[17] [17]

Delay minimization for NOMA-MEC ofﬂoading,

Z. Ding, D. W. K. Ng, R. Schober, and H. V . Poor, “Delay minimization for NOMA-MEC ofﬂoading,” IEEE Signal Process. Lett., vol. 25, no. 12, pp. 1875–1879, Dec. 2018

work page 2018

[18] Z. Song, Y. Liu, and X. Sun, “Joint radio and computational resource allocation for NOMA-based mobile edge computing in heterogeneous networks,” IEEE Commun. Lett., vol. 22, no. 12, pp. 2559–2562, Dec. 2018.

[19] M. Vaezi, G. Amarasuriya, Y. Liu, A. Arafa, F. Fang, and Z. Ding, “Interplay between NOMA and other emerging technologies: A survey,” arXiv, 2019. [Online]. Available: https://arxiv.org/abs/1903.10489

[20] Z. Ding, P. Fan, G. K. Karagiannidis, R. Schober, and H. V. Poor, “NOMA assisted wireless caching: Strategies and performance analysis,” IEEE Trans. Commun., vol. 66, no. 10, pp. 4854–4876, Oct. 2018.

[21] Z. Zhao, M. Xu, W. Xie, Z. Ding, and G. K. Karagiannidis, “Coverage performance of NOMA in wireless caching networks,” IEEE Commun. Lett., vol. 22, no. 7, pp. 1458–1461, Jul. 2018.

[22] L. Xiang, D. W. K. Ng, X. Ge, and Z. Ding, “Cache-aided non-orthogonal multiple access,” arXiv, 2018. [Online]. Available: https://arxiv.org/abs/1712.09557

[23] J. A. Oviedo and H. R. Sadjadpour, “Leveraging edge caching in NOMA systems with QoS requirements,” arXiv, 2019. [Online]. Available: https://arxiv.org/abs/1801.07430

[24] Y. Fu, Y. Liu, H. Wang, Z. Shi, and Y. Liu, “Mode selection between index coding and superposition coding in cache-based NOMA networks,” IEEE Commun. Lett., vol. 23, no. 3, pp. 478–481, Mar. 2019.

[25] L. Xiao, X. Wan, C. Dai, X. Du, X. Chen, and M. Guizani, “Security in mobile edge caching with reinforcement learning,” IEEE Wireless Commun., vol. 25, no. 3, pp. 116–122, Jun. 2018.

[26] Y. Zhou, F. R. Yu, J. Chen, and Y. Kuo, “Communications, caching, and computing for next generation HetNets,” IEEE Wireless Commun., vol. 25, no. 4, pp. 104–111, Aug. 2018.

[27] Narendra and Fukunaga, “A branch and bound algorithm for feature subset selection,” IEEE Trans. Comput., vol. C-26, no. 9, pp. 917–922, Sep. 1977.

[28] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 2005, vol. 1, no. 3.

[29] Z. Qin, H. Ye, G. Y. Li, and B. F. Juang, “Deep learning in physical layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, Apr. 2019.

[30] A. Bayati, K. Khoa Nguyen, and M. Cheriet, “Multiple-step-ahead traffic prediction in high-speed networks,” IEEE Commun. Lett., vol. 22, no. 12, pp. 2447–2450, Dec. 2018.

[31] K. Pichotta and R. J. Mooney, “Learning statistical scripts with LSTM recurrent neural networks,” The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 2016.

[32] Y. Wang, M. Long, J. Wang, Z. Gao, and P. S. Yu, “PredRNN: Recurrent neural networks for predictive learning using spatiotemporal LSTMs,” Advances in Neural Information Processing Systems 30 (NIPS-17), 2017.

[33] B. Du, C. Wu, and Z. Huang, “Learning resource allocation and pricing for cloud profit maximization,” The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019.

[34] S. Ghoorchian and S. Maghsudi, “Multi-armed bandit for energy-efficient and delay-sensitive edge computing in dynamic networks with uncertainty,” arXiv, 2019. [Online]. Available: https://arxiv.org/abs/1904.06258

[35] A. Asheralieva, “Bayesian reinforcement learning-based coalition formation for distributed resource sharing by device-to-device users in heterogeneous cellular networks,” IEEE Trans. Wireless Commun., vol. 16, no. 8, pp. 5016–5032, Aug. 2017.

[36] Z. Yang, Y. Liu, Y. Chen, and L. Jiao, “Learning automata based Q-learning for content placement in cooperative caching,” arXiv, 2019. [Online]. Available: https://arxiv.org/abs/1903.06235

[37] F. Wang, J. Xu, X. Wang, and S. Cui, “Joint offloading and computing optimization in wireless powered mobile-edge computing systems,” IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 1784–1797, Mar. 2018.

[38] C. You, K. Huang, and H. Chae, “Energy efficient mobile cloud computing powered by wireless energy transfer,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1757–1771, May 2016.

[39] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.

[40] R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural Comput., vol. 1, no. 2, pp. 270–280, Jun. 1989.

[41] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2016.

[42] O. Granmo, “A Bayesian learning automaton for solving two-armed Bernoulli bandit problems,” in International Conference on Machine Learning and Applications, Dec. 2008, pp. 23–30.

[43] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for proactive caching in cloud-based radio access networks with mobile users,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3520–3535, Jun. 2017.

[44] O.-C. Granmo, “Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton,” International Journal of Intelligent Computing and Cybernetics, vol. 3, no. 2, pp. 207–234, 2010.