NL-CPS: Reinforcement Learning-Based Kubernetes Control Plane Placement in Multi-Region Clusters

Amjad Ullah; Sajid Alam; Ze Wang

arxiv: 2604.08434 · v1 · submitted 2026-04-09 · 💻 cs.DC


Pith reviewed 2026-05-10 16:49 UTC · model grok-4.3

classification 💻 cs.DC

keywords Kubernetes · control plane placement · reinforcement learning · neural contextual bandits · multi-region clusters · cloud-edge computing · container orchestration


The pith

A neural contextual bandit reinforcement learning system learns optimal Kubernetes control-plane node placements from infrastructure characteristics and performance observations in multi-region clusters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Kubernetes control-plane placement affects cluster reliability and performance, yet current methods often pick nodes arbitrarily without regard to resources or network topology. The paper proposes a reinforcement learning framework that uses neural contextual bandits to observe real operational metrics and derive better placement policies automatically for clusters spanning multiple regions and Cloud-Edge resources. If the approach works, container orchestration could move from static initialization to adaptive selection that improves scalability and resilience without manual intervention. Experiments across geographically distributed regions and varied cluster sizes report measurable gains against baseline strategies.

Core claim

The authors present NL-CPS, a reinforcement learning framework based on neural contextual bandits that observes operational performance and infrastructure characteristics to learn policies for placing control-plane nodes across dynamically selected multi-region Cloud-Edge resources, with experimental results showing substantial improvements over baseline approaches in several cluster configurations.

What carries the argument

Neural contextual bandits that map infrastructure features and performance observations to placement actions and learn from resulting cluster performance rewards.
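
To make the mechanism concrete, here is a minimal sketch of a neural contextual bandit in Python. It is an illustration under stated assumptions, not the authors' implementation: the three features (CPU, memory, latency, each scaled to [0, 1]), the synthetic reward, the network size, and the epsilon-greedy exploration rule are all hypothetical choices. Only the input and output shape, a network f_θ : ℝ³ → ℝ that predicts a reward per candidate node, comes from the Figure 1 caption.

```python
import numpy as np


class NeuralBandit:
    """One-hidden-layer reward model f_theta: R^3 -> R with epsilon-greedy choice."""

    def __init__(self, n_features=3, hidden=16, lr=0.05, epsilon=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w1 = self.rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = self.rng.normal(0.0, 0.5, size=hidden)
        self.b2 = 0.0
        self.lr = lr
        self.epsilon = epsilon

    def predict(self, x):
        """Predicted reward for one feature vector or a batch of them."""
        h = np.tanh(np.atleast_2d(x) @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

    def select(self, candidates):
        """Pick a node index: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(candidates)))
        return int(np.argmax(self.predict(candidates)))

    def update(self, x, reward):
        """One SGD step on the squared error between prediction and reward."""
        x = np.asarray(x, dtype=float)
        h = np.tanh(x @ self.w1 + self.b1)
        err = (h @ self.w2 + self.b2) - reward
        # Backpropagate through the hidden layer before touching w2.
        grad_h = err * self.w2 * (1.0 - h ** 2)
        self.w2 -= self.lr * err * h
        self.b2 -= self.lr * err
        self.w1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h


def synthetic_reward(x, rng):
    # Hypothetical ground truth (an assumption, not from the paper):
    # reward rises with CPU and memory and falls with network latency.
    return 0.5 * x[0] + 0.3 * x[1] - 0.8 * x[2] + rng.normal(0.0, 0.01)


def train(bandit, episodes=3000, n_nodes=8, seed=1):
    """Each episode: observe candidate nodes, place on one, learn from the reward."""
    env = np.random.default_rng(seed)
    for _ in range(episodes):
        nodes = env.random((n_nodes, 3))  # one feature row per candidate node
        i = bandit.select(nodes)
        bandit.update(nodes[i], synthetic_reward(nodes[i], env))
    return bandit
```

Calling `train(NeuralBandit())` and then `predict` on a batch of candidate nodes gives one score per node, and a placement would take the highest-scoring candidates. Epsilon-greedy is used here only because it is short; an upper-confidence-bound rule would be a natural alternative.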

If this is right

Control-plane node selection becomes an automated, data-driven process rather than an arbitrary choice at cluster initialization.
Clusters gain improved reliability and scalability when operating across heterogeneous multi-region infrastructures.
The learned policies extend to dynamic Cloud-Edge resource selection within an automated orchestration system.
Overall cluster performance metrics improve relative to conventional initialization procedures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the bandit learns generalizable policies, similar contextual reinforcement learning could be applied to related orchestration decisions such as worker-node scheduling.
Production deployments would benefit from testing how quickly the system adapts when regions or node capacities change in real time.
The framework might integrate into existing Kubernetes operators to reduce manual tuning for large-scale multi-region workloads.

Load-bearing premise

The neural contextual bandit can learn stable and effective placement policies from the available infrastructure and performance data without excessive exploration costs or instability in changing multi-region environments.

What would settle it

Apply the learned policy to a new, unseen multi-region Kubernetes cluster configuration and check whether measured metrics such as control-plane latency, pod scheduling time, or availability fail to exceed those achieved by arbitrary or baseline placements.
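
One hedged way to score such a check, for a lower-is-better metric like pod scheduling time, is a permutation test on repeated measurements from the two placements. The sketch below uses only the Python standard library; the function name and any sample values passed to it are hypothetical, and the paper may use a different statistical procedure.

```python
import random
from statistics import mean


def permutation_p_value(learned, baseline, n_perm=2000, seed=0):
    """One-sided p-value for 'learned has a lower mean than baseline'."""
    rng = random.Random(seed)
    observed = mean(baseline) - mean(learned)
    pooled = list(learned) + list(baseline)
    k = len(learned)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign measurements to the two groups.
        rng.shuffle(pooled)
        if mean(pooled[k:]) - mean(pooled[:k]) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests the learned placement is genuinely faster on that metric. A large one means the measurements cannot distinguish it from the baseline, which is the outcome that would count against the claim.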

Figures

Figures reproduced from arXiv: 2604.08434 by Amjad Ullah, Sajid Alam, Ze Wang.

**Figure 1.** NL-CPS Architecture. … step 5, and returns an immediate reward signal in step 6, which the agent uses to update its network parameters. This cycle repeats across successive training episodes, enabling the agent to progressively learn which node characteristics contribute to optimal control-plane performance. NL-CPS maintains a neural network f_θ : ℝ³ → ℝ that predicts expected rewards for the features of the …

**Figure 2.** NL-CPS training convergence for 5, 8, 10, and 12-node …

**Figure 3.** 12-node cluster: 40-pod deployment throughput

**Figure 4.** 12-node cluster: Pod creation latency (40-pod deploy…

**Figure 5.** 12-node cluster: CRUD operation latencies. NL-CPS …

**Figure 6.** 18-node cluster: Throughput and latency across place…

Original abstract

The placement of Kubernetes control-plane nodes is critical to ensuring cluster reliability, scalability, and performance, and therefore represents a significant deployment challenge in heterogeneous, multi-region environments. Existing initialisation procedures typically select control-plane hosts arbitrarily, without considering node resource capacity or network topology, often leading to suboptimal cluster performance and reduced resilience. Given Kubernetes's status as the de facto standard for container orchestration, there is a need to rigorously evaluate how control-plane node placement influences the overall performance of the cluster operating across multiple regions. This paper advances this goal by introducing an intelligent methodology for selecting control-plane node placement across dynamically selected Cloud-Edge resources spanning multiple regions, as part of an automated orchestration system. More specifically, we propose a reinforcement learning framework based on neural contextual bandits that observes operational performance and learns optimal control-plane placement policies from infrastructure characteristics. Experimental evaluation across several geographically distributed regions and multiple cluster configurations demonstrates substantial performance improvements over several baseline approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper maps neural contextual bandits onto Kubernetes control-plane placement and backs the mapping with multi-region experiments that beat baselines.


This paper takes neural contextual bandits and applies them to deciding where to put Kubernetes control-plane nodes across regions. The authors treat infrastructure metrics and topology as context, define a reward from cluster performance, and show the bandit learns policies that improve on several baselines in their tests. That is the core contribution: a concrete, working formulation for this placement task rather than a new algorithm. The experiments cover multiple geographic regions and cluster sizes, which gives the results some practical grounding. The formulation stays consistent with standard contextual bandit setups, and the reported outcomes line up with the learning step they describe. No load-bearing gaps appear in how they connect the bandit to the placement decisions.

The main limitation is that the work stays incremental. The technique itself is established, so the advance is the domain mapping and the empirical comparison rather than a new theoretical result. The abstract calls the gains substantial, but the value depends on how large those gains are in absolute terms and whether they survive more aggressive baselines or production noise. Exploration cost during learning also needs clear handling if the method is meant for live clusters.

Readers working on automated orchestration or cloud-edge systems will find the setup and numbers useful as a reference point. The paper is coherent on its own terms and supplies enough detail to evaluate the claims, so it belongs in peer review rather than a desk reject.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces NL-CPS, a reinforcement learning framework based on neural contextual bandits for selecting optimal Kubernetes control-plane node placements in heterogeneous multi-region clusters. It observes infrastructure characteristics and operational performance metrics to learn placement policies, with experimental evaluation across geographically distributed regions and multiple cluster configurations claiming substantial improvements over baseline approaches.

Significance. If the reported empirical gains are reproducible, the work offers a practical, data-driven method for improving reliability and performance in distributed container orchestration systems, which are foundational to cloud-edge computing. The explicit formulation of context features, reward based on cluster performance, and comparison to baselines provides a falsifiable contribution that could inform automated deployment tools.

minor comments (2)

Abstract: the phrase 'substantial performance improvements' is not accompanied by any quantitative metrics or effect sizes; adding one or two key numbers (e.g., latency reduction or throughput gain) would make the summary more informative without altering the technical content.
The weakest assumption noted in the stress-test (potential instability or exploration overhead of the neural contextual bandit in dynamic multi-region settings) is addressed by the reported experiments, but a brief paragraph quantifying convergence time or regret bounds would further strengthen the claim.
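
For reference, the regret mentioned in the second comment has a standard definition in the contextual bandit literature. The notation below is an assumption for illustration, not the paper's own: in round t the policy sees a set of candidate nodes A_t with feature vectors x_{t,a}, picks node a_t, and r(·) is the expected reward.

```latex
R_T = \sum_{t=1}^{T} \left( \max_{a \in \mathcal{A}_t} r(x_{t,a}) - r(x_{t,a_t}) \right)
```

A policy is usually said to learn when R_T grows sublinearly in T, so that the average per-round regret R_T / T tends to zero.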

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, significance assessment, and recommendation for minor revision. The review correctly identifies the core contribution of NL-CPS as a neural contextual bandit formulation for control-plane placement that incorporates infrastructure context and cluster performance rewards. We appreciate the recognition of its potential applicability to automated deployment tools in cloud-edge environments.

Circularity Check

0 steps flagged

No significant circularity


The manuscript proposes a neural contextual bandit RL framework for Kubernetes control-plane placement and reports experimental gains over baselines across multi-region setups. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided text. The approach is self-contained as an empirical systems contribution whose validity rests on the reported performance comparisons rather than any reduction of outputs to inputs by definition or construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach relies on standard reinforcement-learning assumptions about observable states and reward signals; no free parameters, new axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Infrastructure characteristics and operational performance metrics are observable and can serve as context and reward signals for the bandit algorithm.
Implicit in the description of the RL framework that learns from these observations.
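
As a rough illustration of what this assumption requires in practice, the sketch below turns raw measurements into a bounded context vector and a scalar reward. Everything in it is hypothetical: the feature names, the scaling constants, and the reward formula are placeholders chosen for readability, and the paper's actual features and reward are not given in the text above. Throughput and pod creation latency are used only because the figure captions show the evaluation measures them.

```python
from dataclasses import dataclass


@dataclass
class NodeObservation:
    cpu_cores: float   # hypothetical feature
    memory_gib: float  # hypothetical feature
    rtt_ms: float      # hypothetical feature: mean round-trip time to peers


def _clip01(value):
    """Clamp a value into the interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def context_vector(obs, max_cpu=64.0, max_mem=256.0, max_rtt=300.0):
    """Scale raw measurements into [0, 1] so no single feature dominates."""
    return [
        _clip01(obs.cpu_cores / max_cpu),
        _clip01(obs.memory_gib / max_mem),
        _clip01(obs.rtt_ms / max_rtt),
    ]


def reward(throughput_pods_per_s, latency_ms,
           max_throughput=50.0, max_latency=5000.0):
    """Scalar in [-1, 1]: throughput raises it, latency lowers it."""
    return (_clip01(throughput_pods_per_s / max_throughput)
            - _clip01(latency_ms / max_latency))
```

Bounding both the context and the reward is a common design choice because it keeps gradient steps on a small network at a comparable scale across heterogeneous nodes.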



