Service Placement in Small Cell Networks Using Distributed Best Arm Identification in Linear Bandits

Aydin Sezgin; Mariam Yahya; Setareh Maghsudi

arxiv: 2506.22480 · v2 · submitted 2025-06-22 · 💻 cs.NI · cs.DC· cs.LG

Service Placement in Small Cell Networks Using Distributed Best Arm Identification in Linear Bandits

Mariam Yahya , Aydin Sezgin , Setareh Maghsudi This is my paper

Pith reviewed 2026-05-19 08:18 UTC · model grok-4.3

classification 💻 cs.NI cs.DCcs.LG

keywords service placementsmall cell networkslinear banditsbest arm identificationmulti-agent learningedge computingdistributed algorithms

0 comments

The pith

Distributed multi-agent best-arm identification lets small base stations collaboratively find the service that most cuts user delay, with learning rounds falling in proportion to the number of stations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models unknown service demand in small cell networks as a linear function of service attributes and casts the placement choice as a linear bandit problem in which services are arms and small base stations are agents. It develops a distributed adaptive multi-agent best-arm identification procedure that lets the stations pool information to reach a fixed-confidence decision on the single best local service. If the procedure works, networks can respond to changing demand without a central coordinator and without exhaustive trials, because each added station shortens the total rounds needed. The authors supply both simulation evidence of the speedup and theoretical bounds on sample complexity together with communication cost.

Core claim

We propose a distributed and adaptive multi-agent best-arm identification algorithm under a fixed-confidence setting for linear bandits, in which small base stations act as agents and services as arms; the stations collaborate to identify the service that yields the largest reduction in total user delay relative to cloud placement, achieving near-optimal speedup in which the number of learning rounds decreases proportionally with the number of stations, accompanied by analysis of sample complexity and communication overhead.

What carries the argument

Distributed adaptive multi-agent best-arm identification algorithm operating on a linear bandit model of service demand, in which stations share observations to accelerate identification of the arm that maximizes delay reduction.

If this is right

The algorithm identifies the optimal service with the prescribed confidence level.
Learning rounds decrease proportionally with the number of small base stations.
Sample complexity remains bounded while communication overhead stays manageable.
Simulations confirm near-optimal speedup under dynamic network conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same collaboration pattern could be tested on other edge-resource decisions such as caching or CPU allocation.
If the linear model is only approximate, the method may still serve as a fast initialization step before a more general learner takes over.
Asynchronous updates among stations would be a natural next test to reduce coordination latency further.

Load-bearing premise

Service demand is accurately captured by a linear function of service attributes and stations can share the necessary information under dynamic conditions without prohibitive communication costs.

What would settle it

Deploy the algorithm in a live small-cell testbed and measure whether the rounds required to reach the target confidence level shrink proportionally as the number of participating base stations is increased while holding the linear demand model fixed.

Figures

Figures reproduced from arXiv: 2506.22480 by Aydin Sezgin, Mariam Yahya, Setareh Maghsudi.

**Figure 1.** Figure 1: System model A service provider manages K distinct services that can be deployed either at the cloud or the SBSs, depending on the SBSs’ decision. Each service k ∈ [K] is characterized by a ddimensional context vector xk ∈ R d . This vector can represent various factors such as the service type, computational and memory requirements, and geographic or time-based demand. Here, our optimization problem depe… view at source ↗

**Figure 2.** Figure 2: a shows the confidence set at time t. Since it overlaps with C(1) and C(3), arms 1 and 3 can be optimal. To shrink the confidence set into one cone only, the next arm pull must be in the direction x3 − x2. Fig. 2a shows the confidence set after the arm pull. Since the confidence set resides fully in cone 1, arm 1 is identified as the optimal. (a) The confidence set at time t. (b) The confidence set at t + … view at source ↗

**Figure 3.** Figure 3: The sample complexity performance for different [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: The normalized demand for the K = 10 services over 8 periods of time. To study the performance of the network under different numbers of collaborating SBSs, we consider a network consisting of six small cells, each with a radius of 200 m and an SBS located at its center. The cells are arranged in a circular layout such that their edges touch, forming an outer circle with a radius of 600 m. An MBS is posit… view at source ↗

**Figure 6.** Figure 6: The communication threshold D vs. the number of communication rounds and the total sample complexity for a four-SBS network with K = 10 services. VIII. CONCLUSION AND FUTURE WORK This work addresses the problem of service placement in small cell networks, considering the unknown service demand and the randomness of the environment. Given the limited resources at the SBSs, the goal was to identify the servi… view at source ↗

read the original abstract

As users in small cell networks increasingly rely on computation-intensive services, cloud-based access often results in high latency. Multi-access edge computing (MEC) mitigates this by bringing computational resources closer to end users, with small base stations (SBSs) serving as edge servers to enable low-latency service delivery. However, limited edge capacity makes it challenging to decide which services to deploy locally versus in the cloud, especially under unknown service demand and dynamic network conditions. To tackle this problem, we model service demand as a linear function of service attributes and formulate the service placement task as a linear bandit problem, where SBSs act as agents and services as arms. The goal is to identify the service that, when placed at the edge, offers the greatest reduction in total user delay compared to cloud deployment. We propose a distributed and adaptive multi-agent best-arm identification (BAI) algorithm under a fixed-confidence setting, where SBSs collaborate to accelerate learning. Simulations show that our algorithm identifies the optimal service with the desired confidence and achieves near-optimal speedup, as the number of learning rounds decreases proportionally with the number of SBSs. We also provide theoretical analysis of the algorithm's sample complexity and communication overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper casts service placement as distributed linear-bandit best-arm identification and reports near-optimal speedup with more SBSs, but the speedup rests on communication costs staying negligible.

read the letter

This paper maps the service placement decision in small-cell MEC networks to a linear bandit where each SBS is an agent and services are arms. The goal is to find the service that cuts total user delay the most when run at the edge instead of the cloud. They give a distributed adaptive BAI algorithm under fixed confidence, derive sample-complexity bounds, and run simulations that show the number of rounds drops roughly in proportion to the number of SBSs while still hitting the target confidence level. They also track communication overhead in the analysis.

Referee Report

2 major / 1 minor

Summary. The manuscript models service demand as a linear function of service attributes and casts service placement in small-cell networks as a linear bandit problem in which small base stations (SBSs) act as collaborating agents. It introduces a distributed adaptive multi-agent best-arm identification algorithm under a fixed-confidence criterion whose goal is to identify the service that yields the largest reduction in total user delay relative to cloud placement. The paper supplies a theoretical analysis of sample complexity and communication overhead together with simulation results asserting that the algorithm meets the target confidence while achieving near-optimal speedup (learning rounds scaling inversely with the number of SBSs).

Significance. If the sample-complexity bounds remain valid once communication latency is explicitly incorporated and the simulations prove robust, the work would usefully connect linear bandits to practical MEC service-placement decisions, supplying both provable fixed-confidence guarantees and a scalable collaborative mechanism. The explicit derivation of sample-complexity bounds and the reporting of simulation outcomes constitute clear strengths.

major comments (2)

[Theoretical analysis] Theoretical analysis: the claim that sample complexity decreases proportionally with the number of SBSs is load-bearing for the central speedup result, yet the analysis does not appear to include an explicit additive term for communication latency or topology dynamics inside the stopping-time bound; without this term the proportionality may hold only under an idealized synchronous model rather than the stated dynamic network setting.
[Simulations] Simulations section: the abstract asserts that simulations confirm fixed-confidence identification and near-optimal speedup, but no concrete performance tables, error bars, or exclusion criteria for the linear demand model are referenced, leaving the empirical support for the proportionality claim difficult to evaluate.

minor comments (1)

[Abstract] Abstract: a single sentence summarizing the form of the derived sample-complexity bound would help readers assess the theoretical contribution at a glance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the work.

read point-by-point responses

Referee: [Theoretical analysis] Theoretical analysis: the claim that sample complexity decreases proportionally with the number of SBSs is load-bearing for the central speedup result, yet the analysis does not appear to include an explicit additive term for communication latency or topology dynamics inside the stopping-time bound; without this term the proportionality may hold only under an idealized synchronous model rather than the stated dynamic network setting.

Authors: We agree that the current stopping-time analysis assumes synchronous rounds and does not explicitly bound communication latency or topology changes. In the revision we will augment the sample-complexity bound with an additive term that accounts for per-round communication latency under a bounded-delay model, and we will add a corollary clarifying the conditions (e.g., latency bounded by a constant fraction of the per-round computation time) under which the inverse scaling with the number of SBSs continues to hold in dynamic networks. revision: yes
Referee: [Simulations] Simulations section: the abstract asserts that simulations confirm fixed-confidence identification and near-optimal speedup, but no concrete performance tables, error bars, or exclusion criteria for the linear demand model are referenced, leaving the empirical support for the proportionality claim difficult to evaluate.

Authors: We concur that additional empirical detail is needed. The revised simulations section will include a table of mean rounds-to-identification and standard deviations over 50 independent runs, error bars on all speedup plots, the precise linear demand model parameters (feature dimension, noise variance, service attribute ranges), and the criteria used to generate and filter demand instances. These additions will make the empirical support for the proportionality claim fully reproducible and transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained with independent theoretical analysis

full rationale

The paper proposes a new distributed adaptive multi-agent BAI algorithm for the linear bandit formulation of service placement. Sample complexity and near-optimal speedup (rounds scaling with number of SBSs) are derived from the algorithm design, collaboration protocol, and fixed-confidence guarantees, with separate analysis of communication overhead. The linear demand model is explicitly an assumption, not a derived output. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The central claims rest on stated theoretical bounds rather than reducing to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Limited information available from abstract only; no explicit free parameters, axioms, or invented entities are stated. The linear demand model and collaboration assumptions are implicit but not quantified.

pith-pipeline@v0.9.0 · 5757 in / 993 out tokens · 29404 ms · 2026-05-19T08:18:00.344241+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Service placement using distributed pure exploration in linear bandits

M. Yahya, A. Sezgin, and S. Maghsudi, “Service placement using distributed pure exploration in linear bandits.” to be published, 2025

work page 2025
[2]

A survey on mobile edge computing for video stream- ing: Opportunities and challenges,

M. A. Khan, E. Baccour, Z. Chkirbene, A. Erbad, R. Hamila, M. Hamdi, and M. Gabbouj, “A survey on mobile edge computing for video stream- ing: Opportunities and challenges,” IEEE Access, vol. 10, pp. 120514– 120550, 2022

work page 2022
[3]

Energy- efficient and privacy-preserved incentive mechanism for mobile edge computing-assisted federated learning in healthcare system,

J. Liu, Z. Chang, K. Wang, Z. Zhao, and T. H ¨am¨al¨ainen, “Energy- efficient and privacy-preserved incentive mechanism for mobile edge computing-assisted federated learning in healthcare system,” IEEE Trans. on Network and Service Management , 2024

work page 2024
[4]

Joint edge caching and computation offloading for heterogeneous tasks in MEC-enabled vehicular networks,

Y . Li, L. Li, and Z. Zhou, “Joint edge caching and computation offloading for heterogeneous tasks in MEC-enabled vehicular networks,” Veh. Commun., vol. 50, p. 100860, 2024

work page 2024
[5]

Ericsson mobility report

Ericsson, “Ericsson mobility report.” https://www.ericsson.com/en/ reports-and-papers/mobility-report/reports/november-2024, 2024. Ac- cessed: Jan. 2025

work page 2024
[6]

Mobile edge computing,

B. Liang, V . Wong, R. Schober, D. Ng, and L. Wang, “Mobile edge computing,” Key Technol. for 5G Wireless Sys., vol. 16, no. 3, pp. 1397– 1411, 2017

work page 2017
[7]

Dynamic service placement in multi-access edge computing: A systematic literature review,

H. T. Malazi, S. R. Chaudhry, A. Kazmi, A. Palade, C. Cabrera, G. White, and S. Clarke, “Dynamic service placement in multi-access edge computing: A systematic literature review,” IEEE Access, vol. 10, pp. 32639–32688, 2022

work page 2022
[8]

Adaptive user- managed service placement for mobile edge computing: An online learning approach,

T. Ouyang, R. Li, X. Chen, Z. Zhou, and X. Tang, “Adaptive user- managed service placement for mobile edge computing: An online learning approach,” in IEEE Conf. on Comput. Commun. (INFOCOM) , pp. 1468–1476, IEEE, 2019

work page 2019
[9]

Budget-constrained edge service provisioning with demand estimation via bandit learning,

L. Chen and J. Xu, “Budget-constrained edge service provisioning with demand estimation via bandit learning,” IEEE J. Sel. Areas in Commun., vol. 37, no. 10, pp. 2364–2376, 2019

work page 2019
[10]

Multi-time scale service caching and pricing in MEC systems with dynamic program popularity,

Y . Chen, X. Hu, S. Gong, Z. Su, and B. Gu, “Multi-time scale service caching and pricing in MEC systems with dynamic program popularity,” IEEE Internet of Things J. , 2025

work page 2025
[11]

Service placement and request schedul- ing for data-intensive applications in edge clouds,

V . Farhadi, F. Mehmeti, T. He, T. F. La Porta, H. Khamfroush, S. Wang, K. S. Chan, and K. Poularakis, “Service placement and request schedul- ing for data-intensive applications in edge clouds,” IEEE/ACM Trans. on Netw., vol. 29, no. 2, pp. 779–792, 2021

work page 2021
[12]

Cloudlet placement and task allocation in mobile edge computing,

S. Yang, F. Li, M. Shen, X. Chen, X. Fu, and Y . Wang, “Cloudlet placement and task allocation in mobile edge computing,” IEEE Internet of Things J. , vol. 6, no. 3, pp. 5853–5863, 2019

work page 2019
[13]

Spatio–temporal edge service placement: A bandit learning approach,

L. Chen, J. Xu, S. Ren, and P. Zhou, “Spatio–temporal edge service placement: A bandit learning approach,” IEEE Trans. on Wireless Commun., vol. 17, no. 12, pp. 8388–8401, 2018

work page 2018
[14]

Bandit learning-based service placement and resource allocation for mobile edge computing,

W. He, D. He, Y . Huang, Y . Zhang, Y . Xu, G. Yun-feng, and W. Zhang, “Bandit learning-based service placement and resource allocation for mobile edge computing,” in IEEE 31st Annual Int. Symp. on Pers., Indoor and Mobile Radio Commun. , pp. 1–6, IEEE, 2020

work page 2020
[15]

Lattimore and C

T. Lattimore and C. Szepesv ´ari, Bandit algorithms. Cambridge Univer- sity Press, 2020

work page 2020
[16]

Pure exploration in multi-armed bandits problems,

S. Bubeck, R. Munos, and G. Stoltz, “Pure exploration in multi-armed bandits problems,” in 20th Int. Conf. Algorithmic Learn. Theory (ALT), Porto, Portugal,, pp. 23–37, 2009

work page 2009
[17]

Online learning aided decentralized multi-user task offloading for mobile edge computing,

X. Wang, J. Ye, and J. C. Lui, “Online learning aided decentralized multi-user task offloading for mobile edge computing,” IEEE Trans. on Mobile Comput., vol. 23, no. 4, pp. 3328–3342, 2023

work page 2023
[18]

Improved algorithms for linear stochastic bandits,

Y . Abbasi-Yadkori, D. P´al, and C. Szepesv ´ari, “Improved algorithms for linear stochastic bandits,” Advances in Neural Inf. Proc. Sys. , vol. 24, 2011

work page 2011
[19]

Best arm identification in multi-armed bandits.,

J.-Y . Audibert, S. Bubeck, and R. Munos, “Best arm identification in multi-armed bandits.,” in COLT, pp. 41–53, 2010. 13

work page 2010
[20]

On the complexity of best- arm identification in multi-armed bandit models,

E. Kaufmann, O. Capp ´e, and A. Garivier, “On the complexity of best- arm identification in multi-armed bandit models,”The J. of Mach. Learn. Res., vol. 17, no. 1, pp. 1–42, 2016

work page 2016
[21]

On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning,

M. Hoffman, B. Shahriari, and N. Freitas, “On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning,” in Artif. Intell. and Statist. , pp. 365–374, PMLR, 2014

work page 2014
[22]

Best-arm identification in linear bandits,

M. Soare, A. Lazaric, and R. Munos, “Best-arm identification in linear bandits,” Advances in Neural Inf. Proc. Sys. , vol. 27, 2014

work page 2014
[23]

A fully adaptive algorithm for pure exploration in linear bandits,

L. Xu, J. Honda, and M. Sugiyama, “A fully adaptive algorithm for pure exploration in linear bandits,” in Int.l Conf. Artif. Intell. and Statist. , pp. 843–851, PMLR, 2018

work page 2018
[24]

Distributed bandit learning: Near-optimal regret with efficient communication,

Y . Wang, J. Hu, X. Chen, and L. Wang, “Distributed bandit learning: Near-optimal regret with efficient communication,” in Int. Conf. on Learn. Representations (ICLR) 2020 , 2019

work page 2020
[25]

Federated multi-armed bandits,

C. Shi and C. Shen, “Federated multi-armed bandits,” in Proc. of the AAAI Conf. on Artif. Intell. , vol. 35, pp. 9603–9611, 2021

work page 2021
[26]

Federated best arm identification with heterogeneous clients,

Z. Chen, P. Karthik, V . Y . Tan, and Y . M. Chee, “Federated best arm identification with heterogeneous clients,” IEEE Trans. on Inf. Theory , 2023

work page 2023
[27]

A simple and provably efficient al- gorithm for asynchronous federated contextual linear bandits,

J. He, T. Wang, Y . Min, and Q. Gu, “A simple and provably efficient al- gorithm for asynchronous federated contextual linear bandits,” Advances in Neural Inf. Proc. Sys. , vol. 35, pp. 4762–4775, 2022

work page 2022
[28]

Exploiting heterogeneity in robust federated best-arm identification,

A. Mitra, H. Hassani, and G. Pappas, “Exploiting heterogeneity in robust federated best-arm identification,” arXiv preprint arXiv:2109.05700 , 2021

work page arXiv 2021
[29]

Collaborative learning with limited interaction: Tight bounds for distributed exploration in multi-armed bandits,

C. Tao, Q. Zhang, and Y . Zhou, “Collaborative learning with limited interaction: Tight bounds for distributed exploration in multi-armed bandits,” in 2019 IEEE 60th Annu. Symp. on Found. of Comput. Sci. (FOCS), pp. 126–146, IEEE, 2019

work page 2019
[30]

Federated linear contextual bandits,

R. Huang, W. Wu, J. Yang, and C. Shen, “Federated linear contextual bandits,” Advances in Neural Inf. Process. Sys. , vol. 34, pp. 27057– 27068, 2021

work page 2021
[31]

Distributed contextual linear bandits with minimax optimal communication cost,

S. Amani, T. Lattimore, A. Gy ¨orgy, and L. Yang, “Distributed contextual linear bandits with minimax optimal communication cost,” in Int. Conf. on Mach. Learn. , pp. 691–717, PMLR, 2023

work page 2023
[32]

Decentralized task offloading and load-balancing for mobile edge computing in dense networks,

M. Yahya, A. Conzelmann, and S. Maghsudi, “Decentralized task offloading and load-balancing for mobile edge computing in dense networks,” IEEE Commun. Lett. , 2024

work page 2024
[33]

A contextual multi-armed bandit approach to caching in wireless small cell network,

C. Zhang, P. Ren, and Q. Du, “A contextual multi-armed bandit approach to caching in wireless small cell network,” in 2017 9th Int. Conf. on Wireless Commun. and Signal Proc. (WCSP) , pp. 1–6, IEEE, 2017

work page 2017
[34]

Two time-scale joint service caching and task offloading for uav-assisted mobile edge computing,

R. Zhou, X. Wu, H. Tan, and R. Zhang, “Two time-scale joint service caching and task offloading for uav-assisted mobile edge computing,” in IEEE Conf. on Comput. Commun. (INFOCOM) , pp. 1189–1198, 2022

work page 2022
[35]

Content popularity prediction towards location-aware mobile edge caching,

P. Yang, N. Zhang, S. Zhang, L. Yu, J. Zhang, and X. Shen, “Content popularity prediction towards location-aware mobile edge caching,” IEEE Trans. on Multimedia , vol. 21, no. 4, pp. 915–929, 2018

work page 2018

[1] [1]

Service placement using distributed pure exploration in linear bandits

M. Yahya, A. Sezgin, and S. Maghsudi, “Service placement using distributed pure exploration in linear bandits.” to be published, 2025

work page 2025

[2] [2]

A survey on mobile edge computing for video stream- ing: Opportunities and challenges,

M. A. Khan, E. Baccour, Z. Chkirbene, A. Erbad, R. Hamila, M. Hamdi, and M. Gabbouj, “A survey on mobile edge computing for video stream- ing: Opportunities and challenges,” IEEE Access, vol. 10, pp. 120514– 120550, 2022

work page 2022

[3] [3]

Energy- efficient and privacy-preserved incentive mechanism for mobile edge computing-assisted federated learning in healthcare system,

J. Liu, Z. Chang, K. Wang, Z. Zhao, and T. H ¨am¨al¨ainen, “Energy- efficient and privacy-preserved incentive mechanism for mobile edge computing-assisted federated learning in healthcare system,” IEEE Trans. on Network and Service Management , 2024

work page 2024

[4] [4]

Joint edge caching and computation offloading for heterogeneous tasks in MEC-enabled vehicular networks,

Y . Li, L. Li, and Z. Zhou, “Joint edge caching and computation offloading for heterogeneous tasks in MEC-enabled vehicular networks,” Veh. Commun., vol. 50, p. 100860, 2024

work page 2024

[5] [5]

Ericsson mobility report

Ericsson, “Ericsson mobility report.” https://www.ericsson.com/en/ reports-and-papers/mobility-report/reports/november-2024, 2024. Ac- cessed: Jan. 2025

work page 2024

[6] [6]

Mobile edge computing,

B. Liang, V . Wong, R. Schober, D. Ng, and L. Wang, “Mobile edge computing,” Key Technol. for 5G Wireless Sys., vol. 16, no. 3, pp. 1397– 1411, 2017

work page 2017

[7] [7]

Dynamic service placement in multi-access edge computing: A systematic literature review,

H. T. Malazi, S. R. Chaudhry, A. Kazmi, A. Palade, C. Cabrera, G. White, and S. Clarke, “Dynamic service placement in multi-access edge computing: A systematic literature review,” IEEE Access, vol. 10, pp. 32639–32688, 2022

work page 2022

[8] [8]

Adaptive user- managed service placement for mobile edge computing: An online learning approach,

T. Ouyang, R. Li, X. Chen, Z. Zhou, and X. Tang, “Adaptive user- managed service placement for mobile edge computing: An online learning approach,” in IEEE Conf. on Comput. Commun. (INFOCOM) , pp. 1468–1476, IEEE, 2019

work page 2019

[9] [9]

Budget-constrained edge service provisioning with demand estimation via bandit learning,

L. Chen and J. Xu, “Budget-constrained edge service provisioning with demand estimation via bandit learning,” IEEE J. Sel. Areas in Commun., vol. 37, no. 10, pp. 2364–2376, 2019

work page 2019

[10] [10]

Multi-time scale service caching and pricing in MEC systems with dynamic program popularity,

Y . Chen, X. Hu, S. Gong, Z. Su, and B. Gu, “Multi-time scale service caching and pricing in MEC systems with dynamic program popularity,” IEEE Internet of Things J. , 2025

work page 2025

[11] [11]

Service placement and request schedul- ing for data-intensive applications in edge clouds,

V . Farhadi, F. Mehmeti, T. He, T. F. La Porta, H. Khamfroush, S. Wang, K. S. Chan, and K. Poularakis, “Service placement and request schedul- ing for data-intensive applications in edge clouds,” IEEE/ACM Trans. on Netw., vol. 29, no. 2, pp. 779–792, 2021

work page 2021

[12] [12]

Cloudlet placement and task allocation in mobile edge computing,

S. Yang, F. Li, M. Shen, X. Chen, X. Fu, and Y . Wang, “Cloudlet placement and task allocation in mobile edge computing,” IEEE Internet of Things J. , vol. 6, no. 3, pp. 5853–5863, 2019

work page 2019

[13] [13]

Spatio–temporal edge service placement: A bandit learning approach,

L. Chen, J. Xu, S. Ren, and P. Zhou, “Spatio–temporal edge service placement: A bandit learning approach,” IEEE Trans. on Wireless Commun., vol. 17, no. 12, pp. 8388–8401, 2018

work page 2018

[14] [14]

Bandit learning-based service placement and resource allocation for mobile edge computing,

W. He, D. He, Y . Huang, Y . Zhang, Y . Xu, G. Yun-feng, and W. Zhang, “Bandit learning-based service placement and resource allocation for mobile edge computing,” in IEEE 31st Annual Int. Symp. on Pers., Indoor and Mobile Radio Commun. , pp. 1–6, IEEE, 2020

work page 2020

[15] [15]

Lattimore and C

T. Lattimore and C. Szepesv ´ari, Bandit algorithms. Cambridge Univer- sity Press, 2020

work page 2020

[16] [16]

Pure exploration in multi-armed bandits problems,

S. Bubeck, R. Munos, and G. Stoltz, “Pure exploration in multi-armed bandits problems,” in 20th Int. Conf. Algorithmic Learn. Theory (ALT), Porto, Portugal,, pp. 23–37, 2009

work page 2009

[17] [17]

Online learning aided decentralized multi-user task offloading for mobile edge computing,

X. Wang, J. Ye, and J. C. Lui, “Online learning aided decentralized multi-user task offloading for mobile edge computing,” IEEE Trans. on Mobile Comput., vol. 23, no. 4, pp. 3328–3342, 2023

work page 2023

[18] [18]

Improved algorithms for linear stochastic bandits,

Y . Abbasi-Yadkori, D. P´al, and C. Szepesv ´ari, “Improved algorithms for linear stochastic bandits,” Advances in Neural Inf. Proc. Sys. , vol. 24, 2011

work page 2011

[19] [19]

Best arm identification in multi-armed bandits.,

J.-Y . Audibert, S. Bubeck, and R. Munos, “Best arm identification in multi-armed bandits.,” in COLT, pp. 41–53, 2010. 13

work page 2010

[20] [20]

On the complexity of best- arm identification in multi-armed bandit models,

E. Kaufmann, O. Capp ´e, and A. Garivier, “On the complexity of best- arm identification in multi-armed bandit models,”The J. of Mach. Learn. Res., vol. 17, no. 1, pp. 1–42, 2016

work page 2016

[21] [21]

On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning,

M. Hoffman, B. Shahriari, and N. Freitas, “On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning,” in Artif. Intell. and Statist. , pp. 365–374, PMLR, 2014

work page 2014

[22] [22]

Best-arm identification in linear bandits,

M. Soare, A. Lazaric, and R. Munos, “Best-arm identification in linear bandits,” Advances in Neural Inf. Proc. Sys. , vol. 27, 2014

work page 2014

[23] [23]

A fully adaptive algorithm for pure exploration in linear bandits,

L. Xu, J. Honda, and M. Sugiyama, “A fully adaptive algorithm for pure exploration in linear bandits,” in Int.l Conf. Artif. Intell. and Statist. , pp. 843–851, PMLR, 2018

work page 2018

[24] [24]

Distributed bandit learning: Near-optimal regret with efficient communication,

Y . Wang, J. Hu, X. Chen, and L. Wang, “Distributed bandit learning: Near-optimal regret with efficient communication,” in Int. Conf. on Learn. Representations (ICLR) 2020 , 2019

work page 2020

[25] [25]

Federated multi-armed bandits,

C. Shi and C. Shen, “Federated multi-armed bandits,” in Proc. of the AAAI Conf. on Artif. Intell. , vol. 35, pp. 9603–9611, 2021

work page 2021

[26] [26]

Federated best arm identification with heterogeneous clients,

Z. Chen, P. Karthik, V . Y . Tan, and Y . M. Chee, “Federated best arm identification with heterogeneous clients,” IEEE Trans. on Inf. Theory , 2023

work page 2023

[27] [27]

A simple and provably efficient al- gorithm for asynchronous federated contextual linear bandits,

J. He, T. Wang, Y . Min, and Q. Gu, “A simple and provably efficient al- gorithm for asynchronous federated contextual linear bandits,” Advances in Neural Inf. Proc. Sys. , vol. 35, pp. 4762–4775, 2022

work page 2022

[28] [28]

Exploiting heterogeneity in robust federated best-arm identification,

A. Mitra, H. Hassani, and G. Pappas, “Exploiting heterogeneity in robust federated best-arm identification,” arXiv preprint arXiv:2109.05700 , 2021

work page arXiv 2021

[29] [29]

Collaborative learning with limited interaction: Tight bounds for distributed exploration in multi-armed bandits,

C. Tao, Q. Zhang, and Y . Zhou, “Collaborative learning with limited interaction: Tight bounds for distributed exploration in multi-armed bandits,” in 2019 IEEE 60th Annu. Symp. on Found. of Comput. Sci. (FOCS), pp. 126–146, IEEE, 2019

work page 2019

[30] [30]

Federated linear contextual bandits,

R. Huang, W. Wu, J. Yang, and C. Shen, “Federated linear contextual bandits,” Advances in Neural Inf. Process. Sys. , vol. 34, pp. 27057– 27068, 2021

work page 2021

[31] [31]

Distributed contextual linear bandits with minimax optimal communication cost,

S. Amani, T. Lattimore, A. Gy ¨orgy, and L. Yang, “Distributed contextual linear bandits with minimax optimal communication cost,” in Int. Conf. on Mach. Learn. , pp. 691–717, PMLR, 2023

work page 2023

[32] [32]

Decentralized task offloading and load-balancing for mobile edge computing in dense networks,

M. Yahya, A. Conzelmann, and S. Maghsudi, “Decentralized task offloading and load-balancing for mobile edge computing in dense networks,” IEEE Commun. Lett. , 2024

work page 2024

[33] [33]

A contextual multi-armed bandit approach to caching in wireless small cell network,

C. Zhang, P. Ren, and Q. Du, “A contextual multi-armed bandit approach to caching in wireless small cell network,” in 2017 9th Int. Conf. on Wireless Commun. and Signal Proc. (WCSP) , pp. 1–6, IEEE, 2017

work page 2017

[34] [34]

Two time-scale joint service caching and task offloading for uav-assisted mobile edge computing,

R. Zhou, X. Wu, H. Tan, and R. Zhang, “Two time-scale joint service caching and task offloading for uav-assisted mobile edge computing,” in IEEE Conf. on Comput. Commun. (INFOCOM) , pp. 1189–1198, 2022

work page 2022

[35] [35]

Content popularity prediction towards location-aware mobile edge caching,

P. Yang, N. Zhang, S. Zhang, L. Yu, J. Zhang, and X. Shen, “Content popularity prediction towards location-aware mobile edge caching,” IEEE Trans. on Multimedia , vol. 21, no. 4, pp. 915–929, 2018

work page 2018