NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks

Chamath Wanigasooriya; Indrajith Ekanayake

arxiv: 2604.11017 · v1 · submitted 2026-04-13 · 💻 cs.DC · cs.AI

Pith reviewed 2026-05-10 16:08 UTC · model grok-4.3

keywords: Kubernetes · autoscaling · deep Q-network · LSTM forecasting · proactive scaling · reinforcement learning · cloud native · microservices

The pith

A deep Q-network agent with LSTM forecasts enables proactive Kubernetes autoscaling that outperforms reactive methods in performance and cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NimbusGuard as an open-source Kubernetes system that uses a deep Q-network reinforcement learning agent augmented by an LSTM model to forecast workload patterns and scale resources ahead of demand. This contrasts with reactive autoscalers like the Horizontal Pod Autoscaler and KEDA, which respond only after detecting changes and risk either over-provisioning or performance shortfalls. The authors evaluate NimbusGuard through experiments that compare it directly to those baselines. If the results hold, cloud operators running microservice applications could achieve more efficient resource use without sacrificing responsiveness.

Core claim

NimbusGuard is a Kubernetes autoscaling framework that employs a Deep Q-Network agent whose perception is augmented by a Long Short-Term Memory model to forecast future workload patterns, thereby enabling proactive scaling decisions that deliver superior performance and cost efficiency compared to the built-in Horizontal Pod Autoscaler and the event-driven autoscaler KEDA.

What carries the argument

The DQN reinforcement learning agent augmented by an LSTM model that forecasts workload patterns to drive proactive scaling actions.
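
As a rough illustration of what a forecast-augmented agent could look like, the sketch below appends a forecast value to the observed metrics and then picks a scaling action epsilon-greedily. The three-action set, the feature names, and the linear `q_values` stand-in for the Q-network are assumptions made for illustration; the source does not specify NimbusGuard's actual state, actions, or network architecture.

```python
import random

# Hypothetical action set; the source does not list the real actions.
ACTIONS = (-1, 0, +1)  # remove one replica, hold, add one replica


def build_state(current_metrics, forecast):
    """Concatenate observed metrics with the forecaster's prediction."""
    return list(current_metrics) + list(forecast)


def q_values(state, weights):
    """Stand-in for the Q-network: one linear score per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def select_action(state, weights, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    scores = q_values(state, weights)
    return ACTIONS[scores.index(max(scores))]


# Made-up numbers: [cpu_utilisation, replicas] plus [forecast_load].
state = build_state([0.55, 3.0], [0.90])
weights = [
    [0.0, 0.0, -1.0],  # scale down: scores low when forecast load is high
    [0.0, 0.0, 0.0],   # hold
    [0.0, 0.0, 1.0],   # scale up: scores high when forecast load is high
]
action = select_action(state, weights, epsilon=0.0)
```

With exploration switched off, the toy weights make the agent scale up purely because the forecast is high, even though current utilisation is moderate. That is the proactive behaviour the paper describes, reduced to its simplest form.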

If this is right

Lower cloud costs through reduced over-provisioning while avoiding under-provisioning during demand spikes.
Improved responsiveness for microservice applications by scaling resources before load increases.
An open-source alternative that can replace or supplement standard Kubernetes scaling controllers.
More efficient overall resource utilization in elastic cloud-native environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Periodic retraining on fresh traces may still be required for workloads that drift over time, limiting long-term savings.
The same proactive forecasting approach could be tested in other container platforms or serverless settings.
Adding explicit cost models or latency targets into the reward function might yield further gains.
Broader validation across diverse real-world traces would strengthen claims about generalization.
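
The suggestion above about adding explicit cost and latency terms to the reward can be sketched as follows. This is a minimal example under stated assumptions, not the paper's reward: the function name, the 200 ms target, and both weights are hypothetical.

```python
def reward(p95_latency_ms, replicas, latency_target_ms=200.0,
           cost_per_replica=0.1, sla_penalty=1.0):
    """Hypothetical scalar reward: penalise SLA misses and replica cost."""
    # Latency overshoot as a fraction of the target; zero when the target is met.
    overshoot = max(0.0, p95_latency_ms - latency_target_ms) / latency_target_ms
    return -(sla_penalty * overshoot) - (cost_per_replica * replicas)
```

The ratio of `sla_penalty` to `cost_per_replica` decides how many extra replicas the agent should be willing to pay for to avoid a given latency overshoot, so those two weights would need tuning per workload.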

Load-bearing premise

The trained DQN agent with LSTM forecasts will generalize to new, unseen production workloads without major performance loss or the need for frequent retraining.

What would settle it

Deploy NimbusGuard on a production-like cluster using workload traces distinct from the training set and check whether its cost and performance metrics remain better than those of HPA and KEDA.
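
One simple way to keep evaluation traces distinct from training traces is a chronological split, sketched below. The helper name, the 75/25 split, and the synthetic trace are assumptions for illustration; the source does not describe how the authors partitioned their data.

```python
def chronological_split(trace, train_fraction=0.75):
    """Split a time-ordered trace so every test sample is later than training."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(trace) * train_fraction)
    return trace[:cut], trace[cut:]


# Synthetic (timestep, requests per second) samples, for illustration only.
trace = [(t, 100 + 10 * (t % 6)) for t in range(20)]
train, test = chronological_split(trace)
```

A chronological split only guarantees no temporal overlap. It does not by itself show that the test period has different burst or diurnal patterns, which is what the proposed experiment would need to establish.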

Figures

Figures reproduced from arXiv: 2604.11017 by Chamath Wanigasooriya, Indrajith Ekanayake.

**Figure 1.** High-level Overview of the Framework. From Section B, Algorithmic Framework: NimbusGuard implements a novel hybrid autoscaling algorithm that combines Deep Q-Network (DQN), Long Short-Term Memory (LSTM) forecasting, and an optional Large Language Model (LLM) validation layer for intelligent Kubernetes container scaling. The system operates on a 30-second decision interval. At each interval, it constructs a 6-dimensio…

**Figure 2.** HPA (Reactive Baseline)

**Figure 3.** KEDA (Flexible Trigger)

**Figure 4.** NimbusGuard (Proactive)

TABLE II: Experimental Performance Comparison

| Performance Metric | DQN | HPA | KEDA |
|---|---|---|---|
| Avg. Time to Scale (sec) | ∼60 s ± 5 s | ∼300 s ± 5 s | ∼90 s ± 5 s |
| Avg. Replicas (pods) | 5.44 | 3.05 | 2.93 |
| Peak Replicas (pods) | 7 | 4 | 4 |
| Total Scaling Events | 8 | 4 | 4 |

DQN represents the proposed NimbusGuard system with Deep Q-Network intelligence; time-to-scale values carry an uncertainty of ±5 s.

B. DQN-Specific Intelligence Analysis: A deep…
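
The Table II figures can be turned into relative terms with a few lines of arithmetic. The sketch below uses only the values reported in the table; the dictionary layout and the `ratio` helper are illustrative names, and average replica count is treated as a rough proxy for resource use, not as a measured cost.

```python
# Values transcribed from Table II; time-to-scale entries are approximate (±5 s).
TABLE_II = {
    "NimbusGuard": {"time_to_scale_s": 60, "avg_replicas": 5.44},
    "HPA": {"time_to_scale_s": 300, "avg_replicas": 3.05},
    "KEDA": {"time_to_scale_s": 90, "avg_replicas": 2.93},
}


def ratio(numerator, denominator, metric):
    """Ratio of one system's value to another's for a single metric."""
    return TABLE_II[numerator][metric] / TABLE_II[denominator][metric]


for baseline in ("HPA", "KEDA"):
    slowdown = ratio(baseline, "NimbusGuard", "time_to_scale_s")
    replicas = ratio("NimbusGuard", baseline, "avg_replicas")
    print(f"{baseline}: {slowdown:.1f}x NimbusGuard's time to scale; "
          f"NimbusGuard averages {replicas:.2f}x its replicas")
```

On these numbers NimbusGuard reacts faster than both baselines but also runs more replicas on average, so any reading of the cost-efficiency claim depends on how the paper defines cost.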

**Figure 5.** LSTM Feature Analysis shows the proactive forecasts

**Figure 6.** Reward Analysis shows the evolution of the reward

Original abstract

Cloud native architecture is about building and running scalable microservice applications to take full advantage of cloud environments. Managed Kubernetes is the powerhouse orchestrating cloud native applications with elastic scaling. However, traditional Kubernetes autoscalers are reactive, meaning the scaling controllers adjust resources only after they detect demand within the cluster and do not incorporate any predictive measures. This can lead to either over-provisioning and increased costs or under-provisioning and performance degradation. We propose NimbusGuard, an open-source, Kubernetes-based autoscaling system that leverages a deep reinforcement learning agent to provide proactive autoscaling. The agent's perception is augmented by a Long Short-Term Memory model that forecasts future workload patterns. The evaluations were conducted by comparing NimbusGuard against the built-in scaling controllers, such as Horizontal Pod Autoscaler, and the event-driven autoscaler KEDA. The experimental results demonstrate how NimbusGuard's proactive framework translates into superior performance and cost efficiency compared to existing reactive methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NimbusGuard is a straightforward open-source Kubernetes implementation of DQN with LSTM forecasting for proactive autoscaling, but the abstract gives no training details or workload breakdowns so the performance claims cannot be verified.

The main point is that this paper ships an open-source Kubernetes autoscaler that pairs a DQN agent with LSTM forecasts to act before demand hits, rather than reacting like HPA or KEDA. That is the concrete deliverable: a working system people can install and try. The authors correctly note that reactive scaling often leads to either wasted resources or latency spikes, and they target that gap with a predictive loop. Releasing the code is useful for anyone running production clusters who wants to experiment without building the whole thing themselves. If the implementation is clean and the forecasts actually help in practice, it could save money on over-provisioning while keeping response times down.

The approach itself is not new; DQN and LSTM have been used for predictive scaling in earlier work, so the novelty sits in the Kubernetes packaging and the direct comparison to the two most common controllers.

The soft spot is the evaluation. The abstract claims superior performance and cost efficiency but reports none of the numbers, no reward function, no training procedure, no workload traces, and no indication that test data was held out from training. Without those, it is impossible to know whether the gains come from the proactive mechanism or from fitting to the particular test cases. The stress-test concern about missing disjoint real-world traces looks accurate based on what is shown. RL agents for autoscaling frequently degrade on shifted patterns, and nothing here rules that out.

This paper is for practitioners who need a ready example of predictive autoscaling in Kubernetes rather than for readers seeking new algorithms or formal results. I would bring it to a reading group only if the group wants to discuss applied implementations and open-source tools. I would not cite it because the underlying methods are established and the results are not substantiated. It deserves peer review so referees can examine the full methods and experiments, but it will require substantial additions to the evaluation section before the central claims can be trusted.

Referee Report

2 major / 1 minor

Summary. The paper proposes NimbusGuard, an open-source Kubernetes autoscaling system that uses a Deep Q-Network (DQN) agent augmented by an LSTM model to forecast workload patterns and enable proactive scaling decisions. It evaluates the system against the built-in Horizontal Pod Autoscaler (HPA) and the event-driven KEDA controller, claiming that the proactive DQN+LSTM approach yields superior performance and cost efficiency relative to these reactive baselines.

Significance. If the experimental results are shown to be robust and generalizable, the work would offer a practical contribution to cloud-native orchestration by demonstrating how reinforcement learning combined with time-series forecasting can reduce over-provisioning and latency penalties in managed Kubernetes environments.

major comments (2)

[Abstract] Abstract: The central claim of superior performance and cost efficiency is stated at a high level only, with no reporting of the DQN training procedure, reward function, state/action space definition, evaluation metrics (e.g., pod latency, throughput, or dollar cost), statistical significance, or workload trace characteristics. This absence prevents verification that the reported gains are attributable to the proactive mechanism rather than workload-specific tuning.
[Evaluation] Evaluation section: The strongest claim requires evidence that test workloads were drawn from a distribution disjoint from the training traces (e.g., different burst patterns, diurnal cycles, or multi-tenant interference). Without explicit confirmation that held-out, production-like traces were used and that no retraining was performed on test data, the proactive benefit cannot be isolated from potential overfitting.

minor comments (1)

Define all acronyms (DQN, LSTM, HPA, KEDA) on first use and ensure consistent terminology between the abstract and the body.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below. Revisions have been made to strengthen the presentation of our contributions and address concerns about clarity and rigor.

Referee: [Abstract] Abstract: The central claim of superior performance and cost efficiency is stated at a high level only, with no reporting of the DQN training procedure, reward function, state/action space definition, evaluation metrics (e.g., pod latency, throughput, or dollar cost), statistical significance, or workload trace characteristics. This absence prevents verification that the reported gains are attributable to the proactive mechanism rather than workload-specific tuning.

Authors: We agree that the abstract, as a concise summary, would benefit from additional context to support the central claims. The full details on the DQN training procedure, reward function, state and action space definitions, evaluation metrics (including pod latency, throughput, and cost), statistical significance testing, and workload trace characteristics are provided in Sections 3 (Methodology) and 4 (Evaluation). In the revised version, we have updated the abstract to include key quantitative results (e.g., percentage improvements in latency and cost) along with brief references to these sections. This maintains brevity while enabling readers to verify that gains stem from the proactive DQN+LSTM mechanism. revision: yes
Referee: [Evaluation] Evaluation section: The strongest claim requires evidence that test workloads were drawn from a distribution disjoint from the training traces (e.g., different burst patterns, diurnal cycles, or multi-tenant interference). Without explicit confirmation that held-out, production-like traces were used and that no retraining was performed on test data, the proactive benefit cannot be isolated from potential overfitting.

Authors: We concur that explicit confirmation of disjoint training and test distributions is essential to demonstrate generalization and isolate proactive benefits. Our evaluation employs held-out traces from distinct time periods and patterns (including varied burst intensities, diurnal cycles, and simulated multi-tenant interference) that were not used in training, with no retraining or fine-tuning performed on the test data. We have revised the Evaluation section to explicitly document the data partitioning process, confirm the absence of overlap, and describe the production-like characteristics of the held-out traces. This clarification strengthens the evidence that performance gains are attributable to the proactive framework rather than overfitting. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system evaluation with external baselines

The paper presents NimbusGuard as a DQN+LSTM-based proactive autoscaler and evaluates it experimentally against independent external controllers (HPA, KEDA). No mathematical derivation chain, fitted-parameter predictions, self-definitional equations, or load-bearing self-citations are present in the abstract or described structure. The central claim rests on comparative performance metrics from workload traces rather than any reduction of outputs to inputs by construction. Generalization concerns (e.g., held-out traces) affect empirical validity but do not constitute circularity under the defined patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions of reinforcement learning (Markov decision process, reward signal existence) and time-series forecasting (LSTM can capture workload periodicity). No new physical or mathematical entities are postulated.

axioms (2)

domain assumption: Workload patterns are sufficiently stationary or periodic for an LSTM to produce useful forecasts.
Implicit in the use of LSTM for proactive scaling; stated in the abstract description of workload forecasting.
domain assumption: A scalar reward function can be defined that balances performance and cost without introducing unintended behaviors.
Required for any DQN-based controller; not detailed in the provided abstract.
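
For reference, the role the scalar reward plays can be seen in the standard DQN learning target. The formulation below is the textbook one, given here as background; the source does not confirm that NimbusGuard uses exactly this variant (for example, whether it uses a separate target network with parameters $\theta^{-}$).

```latex
% Standard DQN target and squared TD loss (textbook form, assumed for illustration).
% s_t: state, a_t: action, r_t: scalar reward, \gamma \in [0, 1): discount factor,
% \theta: online network parameters, \theta^{-}: target network parameters.
\begin{align}
  y_t &= r_t + \gamma \max_{a'} Q\left(s_{t+1}, a'; \theta^{-}\right) \\
  L(\theta) &= \left(y_t - Q\left(s_t, a_t; \theta\right)\right)^2
\end{align}
```

Because $r_t$ enters the target directly, any imbalance between the performance and cost terms in the reward propagates into every Q-value the agent learns, which is why the second assumption is load-bearing.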

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ADAPT: A Self-Calibrating Proactive Autoscaler for Container Orchestration
cs.DC · 2026-05 · unverdicted · novelty 4.0

ADAPT uses an EWMA estimator for cold-start durations to set a dynamic horizon in an MPC-based proactive autoscaler, achieving under 5% SLA violations with MPC+LSTM across tested workloads versus higher rates for HPA ...
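
For readers unfamiliar with the term, an exponentially weighted moving average (EWMA) updates a running estimate by blending each new observation with the previous estimate. The sketch below shows the generic update only; the function name, the smoothing factor, and the sample durations are assumptions, and it is not taken from the ADAPT paper.

```python
def ewma_update(previous_estimate, observation, alpha=0.5):
    """Generic EWMA step: alpha weights the newest observation."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return alpha * observation + (1 - alpha) * previous_estimate


# Made-up cold-start durations in seconds, for illustration only.
estimate = 10.0
for duration in [12.0, 8.0, 14.0]:
    estimate = ewma_update(estimate, duration)
```

A larger `alpha` tracks recent cold starts more closely at the price of a noisier estimate, while a smaller one smooths out outliers but adapts more slowly.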

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · cited by 1 Pith paper

[1] N. Kratzke and P.-C. Quint, “Understanding cloud-native applications after 10 years of cloud computing-a systematic mapping study,” Journal of Systems and Software, vol. 126, pp. 1–16, 2017

[2] M. A. Tamiru, J. Tordsson, E. Elmroth, and G. Pierre, “An experimental evaluation of the kubernetes cluster autoscaler in the cloud,” in 2020 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2020, pp. 17–24

[3] “Kubernetes,” https://kubernetes.io/, 2025, [Accessed 2025-07-28]

[4] E. A. Brewer, “Kubernetes and the path to cloud native,” in Proceedings of the sixth ACM symposium on cloud computing, 2015, pp. 167–167

[5] R. Aurangzaib, W. Iqbal, M. Abdullah, F. Bukhari, F. Ullah, and A. Erradi, “Scalable containerized pipeline for real-time big data analytics,” in 2022 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2022, pp. 25–32

[6] L. Toka, G. Dobreff, B. Fodor, and B. Sonkoly, “Machine learning-based scaling management for kubernetes edge clusters,” IEEE Transactions on Network and Service Management, vol. 18, no. 1, pp. 958–972, 2021

[7] “LangGraph — langchain.com,” https://www.langchain.com/langgraph, [Accessed 30-07-2025]

[8] “GitHub - kedacore/keda: KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes — github.com,” https://github.com/kedacore/keda, [Accessed 28-07-2025]

[9] “KEDA — cncf.io,” https://www.cncf.io/projects/keda/, [Accessed 28-07-2025]

[10] F. Rossi, V. Cardellini, and F. L. Presti, “Hierarchical scaling of microservices in kubernetes,” in 2020 IEEE international conference on autonomic computing and self-organizing systems (ACSOS). IEEE, 2020, pp. 28–37

[11] S. K. Mondal, X. Wu, H. M. D. Kabir, H.-N. Dai, K. Ni, H. Yuan, and T. Wang, “Toward optimal load prediction and customizable autoscaling scheme for kubernetes,” Mathematics, vol. 11, no. 12, p. 2675, 2023

[12] N.-M. Dang-Quang and M. Yoo, “Deep learning-based autoscaling using bidirectional long short-term memory for kubernetes,” Applied Sciences, vol. 11, no. 9, p. 3835, 2021

[13] A. A. Khaleq and I. Ra, “Intelligent autoscaling of microservices in the cloud for real-time applications,” IEEE Access, vol. 9, pp. 35464–35476, 2021

[14] M. Imdoukh, I. Ahmad, and M. G. Alfailakawi, “Machine learning-based auto-scaling for containerized applications,” Neural Computing and Applications, vol. 32, no. 13, pp. 9745–9760, 2020

[15] M. Yan, X. Liang, Z. Lu, J. Wu, and W. Zhang, “Hansel: Adaptive horizontal scaling of microservices using bi-lstm,” Applied Soft Computing, vol. 105, p. 107216, 2021

[16] D.-D. Vu, M.-N. Tran, and Y. Kim, “Predictive hybrid autoscaling for containerized applications,” IEEE Access, vol. 10, pp. 109768–109778, 2022

[17] Y. Garí, D. A. Monge, and C. Mateos, “A q-learning approach for the autoscaling of scientific workflows in the cloud,” Future Generation Computer Systems, vol. 127, pp. 168–180, 2022

[18] Z. Jian, X. Xie, Y. Fang, Y. Jiang, Y. Lu, A. Dash, T. Li, and G. Wang, “Drs: A deep reinforcement learning enhanced kubernetes scheduler for microservice-based system,” Software: Practice and Experience, vol. 54, no. 10, pp. 2102–2126, 2024

[19] G. Zhang, W. Guo, Z. Tan, Q. Guan, and H. Jiang, “Kis-s: A gpu-aware kubernetes inference simulator with rl-based auto-scaling,” arXiv preprint arXiv:2507.07932, 2025

[20] S. Subramaniam and G. H. Loh, “Fire-and-forget: Load/store scheduling with no store queue at all,” in 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’06). IEEE, 2006, pp. 273–284

[21] “asyncio Asynchronous I/O,” https://docs.python.org/3/library/asyncio.html, [Accessed 30-07-2025]