A Stackelberg Game Framework with Drainability Guardrails for Pricing and Scaling in Multi-Tenant GPU Cloud Platforms

Asrin Efe Yorulmaz; Hanchen Zhou; Junji Yan; Tamer Başar

arXiv: 2604.16802 · v1 · submitted 2026-04-18 · cs.GT · cs.SY · eess.SY · math.OC

Pith reviewed 2026-05-10 07:35 UTC · model grok-4.3

classification cs.GT · cs.SY · eess.SY · math.OC

keywords Stackelberg game · GPU cloud pricing · drainability guardrail · multi-tenant systems · demand equilibrium map · convergence analysis · reinforcement learning safety · dynamic scaling

The pith

A computable drainability guardrail certifies unique convergence to an operating point for any fixed price-capacity pair in Stackelberg GPU-cloud pricing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates joint pricing and scaling as a large-population Stackelberg game between a cloud provider and heterogeneous tenants whose demand responds endogenously to prices and congestion. From the resulting equilibrium demand map the authors identify a structural failure: delay-insensitive workloads create a residual demand floor that leaves backlog undrainable under bounded price and capacity. They introduce a guardrail condition on price-capacity pairs that certifies uniformly negative drift in this regime. For any pair meeting the guardrail they prove existence of a unique operating point and global convergence to it when the step size satisfies a checkable bound. They then build an optimizer-agnostic action shield that uses the same guardrail to keep model-free reinforcement learning safe during dynamic operation.

Core claim

Deriving an explicit equilibrium demand map from the Stackelberg game reveals that delay-insensitive tenants sustain a residual demand floor, rendering backlog undrainable. The drainability guardrail is a computable condition on price and service capacity that guarantees uniformly negative drift whenever residual demand appears. For every fixed price-capacity pair satisfying the guardrail there exists a unique operating point, and the closed-loop dynamics converge globally to that point under a verifiable step-size restriction. The fixed-pair result directly supports an optimizer-agnostic action shield for the full dynamic pricing-and-scaling problem.
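To make the mechanism concrete, here is a toy instance worked by hand. It is an illustrative assumption, not the demand map derived in the paper: tenant values are taken to be uniform, and the symbols \Lambda_0, \Lambda_1, \bar v, \theta, \eta, and \varepsilon are introduced only for this example.

```latex
% Toy demand map (assumed): tenant values uniform on [0, \bar v]; delay is q / \mu.
\lambda(p, q) = \underbrace{\Lambda_0 \Bigl(1 - \tfrac{p}{\bar v}\Bigr)^{+}}_{\text{residual floor } \underline{\lambda}(p)}
              + \Lambda_1 \Bigl(1 - \tfrac{p + \theta q / \mu}{\bar v}\Bigr)^{+}

% Projected backlog recursion with step size \eta > 0.
q_{k+1} = \bigl[\, q_k + \eta \bigl(\lambda(p, q_k) - \mu\bigr) \bigr]^{+}

% For q \ge \mu (\bar v - p) / \theta the delay-sensitive term vanishes, so
\lambda(p, q) - \mu = \underline{\lambda}(p) - \mu .

% Toy guardrail: drift in that regime is uniformly negative when
\underline{\lambda}(p) \le \mu - \varepsilon \quad \text{for some } \varepsilon > 0 .
```

In this toy, if the floor at the largest admissible price still exceeds the largest admissible capacity, no admissible pair meets the condition, which mirrors the undrainable-backlog failure mode described above.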

What carries the argument

The drainability guardrail, a computable condition on price-capacity pairs that certifies uniformly negative drift in the residual-demand regime of the Stackelberg equilibrium demand map.
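As a rough illustration of what a computable condition on a price-capacity pair can look like, the sketch below checks a guardrail in a toy demand model. The model is an assumption made here, not the paper's equilibrium demand map: delay-insensitive tenants with values uniform on [0, v_max] submit whenever their value exceeds the price, so their aggregate rate is a floor that backlog cannot reduce. The names `ToyMarket`, `residual_floor`, `satisfies_guardrail`, and the `margin` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMarket:
    """Hypothetical toy market; not the paper's model."""
    rate_insensitive: float  # total arrival rate of delay-insensitive tenants
    v_max: float             # tenant values assumed uniform on [0, v_max]


def residual_floor(market: ToyMarket, price: float) -> float:
    """Demand that persists however large the backlog grows (toy assumption)."""
    return market.rate_insensitive * max(0.0, 1.0 - price / market.v_max)


def satisfies_guardrail(market: ToyMarket, price: float, capacity: float,
                        margin: float = 0.05) -> bool:
    """True if capacity exceeds the residual floor by at least `margin`,
    so drift is at most -margin once only floor demand remains."""
    return residual_floor(market, price) <= capacity - margin


market = ToyMarket(rate_insensitive=10.0, v_max=10.0)
print(satisfies_guardrail(market, price=5.0, capacity=6.0))  # floor 5.0 vs 5.95
print(satisfies_guardrail(market, price=0.0, capacity=8.0))  # floor 10.0 vs 7.95
```

The check needs only the pair and the floor, not the backlog trajectory, which is the sense in which such a condition is cheap to evaluate before an action is applied.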

If this is right

Unique operating point exists for every fixed price-capacity pair satisfying the guardrail.
Global convergence to that point holds whenever the step size meets the checkable condition.
The optimizer-agnostic action shield improves safety and robustness of model-free RL for the dynamic joint problem.
Backlog remains drainable even when delay-insensitive workloads are present.
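The fixed-pair behaviour listed above can be sketched numerically in a toy model. Everything below is an assumption for illustration: the demand map `toy_demand`, its parameters, and the projected recursion in `run_backlog` are hypothetical stand-ins, not the paper's dynamics or its step-size condition.

```python
def toy_demand(price, backlog, capacity,
               rate_ins=10.0, rate_sens=20.0, v_max=10.0, theta=2.0):
    """Toy demand map (an assumption): a floor from delay-insensitive tenants
    plus a delay-sensitive part that shrinks as delay = backlog / capacity grows."""
    delay = backlog / capacity
    floor = rate_ins * max(0.0, 1.0 - price / v_max)
    elastic = rate_sens * max(0.0, 1.0 - (price + theta * delay) / v_max)
    return floor + elastic


def run_backlog(price, capacity, q0, step, n_steps=500):
    """Projected backlog recursion q <- max(0, q + step * (demand - capacity))."""
    q = q0
    for _ in range(n_steps):
        q = max(0.0, q + step * (toy_demand(price, q, capacity) - capacity))
    return q


# Pair (price=5, capacity=8): floor 5 < 8, so the toy guardrail holds.
# The elastic slope in backlog is rate_sens * theta / (capacity * v_max) = 0.5,
# and step * slope = 0.5 < 2 serves as this toy's step-size check.
for q0 in (0.0, 100.0):
    print(q0, run_backlog(price=5.0, capacity=8.0, q0=q0, step=1.0))

# Pair (price=0, capacity=8): floor 10 > 8, guardrail violated, backlog keeps growing.
print(run_backlog(price=0.0, capacity=8.0, q0=0.0, step=1.0))
```

In this toy, both initial backlogs settle at the same operating point (backlog 14, where demand equals capacity), while the violating pair gains two units of backlog per step once the delay-sensitive demand has been priced out by congestion.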

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Providers could embed the guardrail as a hard safety constraint when tuning prices or capacities in production.
The same guardrail-plus-shield structure might transfer to other multi-tenant resources such as CPU or storage pools.
Combining the shield with existing RL algorithms could shrink the set of unsafe actions encountered during online learning.
Trace-driven experiments comparing large-population predictions against measured tenant responses would test how well the model matches practice.
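One way to picture an optimizer-agnostic shield is as a filter that sees only the proposed action, never the learner that produced it. The sketch below is a generic projection-style shield under a toy guardrail. The paper's actual shield construction is not described in this text, so `toy_guardrail`, `shield`, the action grid `ACTIONS`, and the distance used are all hypothetical.

```python
from itertools import product


def toy_guardrail(price, capacity, rate_ins=10.0, v_max=10.0, margin=0.25):
    """Toy safety check (an assumption): residual floor must sit below capacity."""
    floor = rate_ins * max(0.0, 1.0 - price / v_max)
    return floor <= capacity - margin


def shield(proposed, candidates, is_safe=toy_guardrail):
    """Pass a safe action through; otherwise return the closest safe candidate.

    'Closest' uses plain Euclidean distance on (price, capacity), which mixes
    units; a real system would need a deliberate choice of scaling.
    """
    if is_safe(*proposed):
        return proposed
    safe = [a for a in candidates if is_safe(*a)]
    if not safe:
        raise ValueError("no safe action in the candidate set")
    return min(safe, key=lambda a: (a[0] - proposed[0]) ** 2
                                   + (a[1] - proposed[1]) ** 2)


# Hypothetical discrete action grid of (price, capacity) pairs.
ACTIONS = list(product((0.0, 2.5, 5.0, 7.5), (4.0, 6.0, 8.0)))

print(shield((5.0, 8.0), ACTIONS))  # already safe, returned unchanged
print(shield((0.0, 4.0), ACTIONS))  # unsafe, replaced by a safe grid action
```

Because `shield` takes an action and returns an action, it can sit between any policy (tabular Q-learning, a policy-gradient agent, a hand-tuned rule) and the environment without changes to the learner.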

Load-bearing premise

The large-population limit and the explicit equilibrium demand map derived from the Stackelberg game accurately capture the endogenous, heterogeneous tenant behavior in real GPU clouds.

What would settle it

A simulation or trace-driven run in which a price-capacity pair meets both the drainability guardrail and the step-size bound, yet residual demand exhibits persistent non-negative drift or the trajectory fails to converge to a single operating point.

Figures

Figures reproduced from arXiv: 2604.16802 by Asrin Efe Yorulmaz, Hanchen Zhou, Junji Yan, Tamer Başar.

**Figure 2.** (a) Relative off-grid return gap versus planning horizon …

**Figure 3.** Guardrail ablation in tabular Q-learning under off-grid dynamics.

**Figure 4.** Burst demand shift test. (a) Backlog response under shielded and …

Original abstract

Modern Graphics Processing Unit (GPU)-backed services must satisfy strict latency service-level objectives (SLOs) while controlling spare-capacity cost. In multi-tenant GPU cloud platforms, this trade-off is inherently dynamic because workload demand is endogenous; specifically, pricing shapes the submissions of heterogeneous tenants, which subsequently impact congestion and delay. We formulate the joint pricing-and-scaling problem as a large-population Stackelberg game problem, and we derive an explicit equilibrium demand map. The resulting closed-loop model reveals a structural failure mode in which delay-insensitive workloads sustain a residual demand floor, making the backlog undrainable under bounded price and service capacity. This observation motivates a computable drainability guardrail that certifies uniformly negative drift in the residual-demand regime. For any fixed price-capacity pair satisfying the drainability guardrail, we establish a unique operating point and global convergence towards it under a checkable step-size condition. Building on this fixed-pair analysis, we further develop an optimizer-agnostic action shield for the full dynamic problem and show empirically that it improves safety and robustness for model-free reinforcement learning (RL) in this setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a guardrail from a Stackelberg model to stop undrainable backlogs in GPU clouds and uses it as a safety shield for model-free RL on pricing and scaling.

The paper's main point is that you can use a Stackelberg game to derive a guardrail that stops GPU cloud systems from developing undrainable backlogs, and then apply that guardrail as a shield when running reinforcement learning for pricing and scaling decisions. They model the joint problem as a large-population Stackelberg game and extract an explicit equilibrium demand map. This reveals a structural issue: delay-insensitive workloads can maintain a residual demand floor, so the backlog never drains under bounded price and capacity. The drainability guardrail is a computable condition that ensures negative drift in the residual-demand case. For fixed price-capacity pairs inside the guardrail, they establish a unique operating point with global convergence under a checkable step-size condition. They build an optimizer-agnostic action shield from this and test it with model-free RL, finding better safety and robustness.

The work does a good job of turning the game analysis into a practical tool for RL. The failure mode is a real concern in these platforms, and the guardrail addresses it without requiring the RL agent to know the full model. The convergence result is standard fixed-point stuff but fits the setting well.

One soft spot is the large-population limit used for the demand map. Real multi-tenant setups have finite, heterogeneous users, so the map is an approximation that might not hold precisely. This could make the guardrail either too loose or too conservative in practice. The empirical claims for the RL shield are only high-level in the abstract, so the actual gains over unprotected RL need to be verified with the numbers.

This paper targets people working on cloud resource management, especially those using game theory or RL for dynamic pricing and scaling in shared GPU environments. It would interest readers focused on safety in learned controllers for systems with feedback from user behavior.

It deserves serious referee attention because the modeling is consistent and the guardrail idea is a useful addition to the literature on these platforms. I recommend sending it for peer review.

Referee Report

1 major / 3 minor

Summary. The manuscript formulates the joint pricing-and-scaling problem in multi-tenant GPU cloud platforms as a large-population Stackelberg game and derives an explicit equilibrium demand map. It identifies a structural failure mode in which delay-insensitive workloads sustain a residual demand floor that renders the backlog undrainable under bounded price and capacity. This motivates a computable drainability guardrail that certifies uniformly negative drift. For any fixed price-capacity pair satisfying the guardrail, the paper establishes a unique operating point and proves global convergence of the closed-loop dynamics under a checkable step-size condition. It further develops an optimizer-agnostic action shield for the full dynamic problem and reports empirical improvements in safety and robustness when the shield is applied to model-free reinforcement learning.

Significance. If the derivations and convergence results hold, the work supplies a rigorous, checkable framework for stable pricing and scaling that directly addresses endogenous demand in GPU clouds. The explicit demand map, negative-drift guardrail, and global-convergence theorem under a verifiable step-size condition constitute clear strengths, providing falsifiable predictions and a foundation for safe RL deployment. These elements could influence both theoretical mechanism design and practical cloud resource management.

major comments (1)

The central convergence claim (unique operating point and global convergence under the drainability guardrail) rests on the large-population limit used to obtain the equilibrium demand map. The manuscript should supply a concrete error bound or finite-N simulation comparison showing how closely the limit approximates heterogeneous tenant behavior; without this, the guardrail's practical certification power remains an unverified modeling assumption rather than a proven property.

minor comments (3)

The abstract and introduction would benefit from a short table or bullet list explicitly contrasting the proposed guardrail with standard Lyapunov-drift or capacity-constraint approaches in the cloud-computing literature.
Notation for the demand map, residual-demand regime, and step-size condition should be standardized and cross-referenced between the fixed-pair analysis and the dynamic action-shield section to improve readability.
The empirical evaluation section should report the precise RL algorithm, number of runs, and statistical significance tests used to claim improved safety and robustness; the current description is too high-level for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and the constructive comment on the large-population approximation. We address the point directly below and will revise the manuscript to incorporate additional validation.

Referee: The central convergence claim (unique operating point and global convergence under the drainability guardrail) rests on the large-population limit used to obtain the equilibrium demand map. The manuscript should supply a concrete error bound or finite-N simulation comparison showing how closely the limit approximates heterogeneous tenant behavior; without this, the guardrail's practical certification power remains an unverified modeling assumption rather than a proven property.

Authors: We agree that explicit validation of the mean-field approximation for finite but large tenant populations would strengthen the practical interpretation of the drainability guardrail. In the revised version we will add a dedicated subsection containing Monte Carlo simulations of finite-N heterogeneous tenant populations drawn from the same type distribution used in the analysis. These experiments will quantify the L1 distance between the finite-N aggregate demand trajectory and the equilibrium demand map, demonstrate that the error vanishes with growing N, and confirm that the guardrail continues to enforce uniformly negative drift and convergence for N in the range 50–200, which is representative of realistic multi-tenant GPU clusters. We will also include a brief discussion of the modeling conditions under which the large-population limit remains a conservative and useful certification tool.

Revision: yes
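The kind of finite-N comparison promised here can be sketched in a few lines. This is a simplified stand-in, not the authors' experiment: it compares demand at a single fixed delay rather than along a trajectory, and the threshold rule, the uniform value distribution, and the functions `limit_demand`, `finite_n_demand`, and `mean_abs_gap` are assumptions made for illustration.

```python
import random


def limit_demand(price, delay, total_rate=20.0, v_max=10.0, theta=2.0):
    """Large-population demand in a toy model: tenants with values uniform on
    [0, v_max] submit when value >= price + theta * delay (an assumption)."""
    return total_rate * max(0.0, 1.0 - (price + theta * delay) / v_max)


def finite_n_demand(n, price, delay, rng, total_rate=20.0, v_max=10.0, theta=2.0):
    """Aggregate demand of n sampled tenants, each carrying rate total_rate / n."""
    threshold = price + theta * delay
    submitting = sum(1 for _ in range(n) if rng.uniform(0.0, v_max) >= threshold)
    return total_rate * submitting / n


def mean_abs_gap(n, price=3.0, delay=1.0, trials=100, seed=0):
    """Average |finite-N demand - limit demand| over independent populations."""
    rng = random.Random(seed)
    target = limit_demand(price, delay)
    gaps = [abs(finite_n_demand(n, price, delay, rng) - target)
            for _ in range(trials)]
    return sum(gaps) / trials


for n in (50, 200, 5000):
    print(n, round(mean_abs_gap(n), 3))
```

For independent tenants the gap should shrink roughly like one over the square root of N, so populations of 50 to 200 tenants can still deviate noticeably from the limit. A guardrail margin sized against that deviation is one place where the finite-N analysis would matter.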

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper derives an explicit equilibrium demand map from a large-population Stackelberg game formulation, identifies a structural failure mode (residual demand floor under bounded price/capacity), and introduces a computable drainability guardrail motivated by that mode to certify negative drift. For fixed pairs satisfying the guardrail it then proves uniqueness and global convergence under a step-size condition, followed by an action shield for the dynamic case. None of these steps reduce by construction to their inputs: the guardrail is not defined in terms of the convergence it certifies, the demand map is obtained from the game rather than fitted to the target quantities, and no self-citation chain or ansatz smuggling is required for the central fixed-point argument. The pipeline is internally consistent on its own mathematical terms without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on standard large-population game-theoretic assumptions and introduces the drainability guardrail as a new derived condition; no free parameters or invented physical entities are mentioned in the abstract.

axioms (2)

domain assumption: Large-population limit allows derivation of an explicit equilibrium demand map.
Invoked to obtain closed-form demand response from heterogeneous tenants.
domain assumption: Tenant demand is endogenous and shaped by price and observed delay.
Core modeling choice that creates the dynamic feedback loop.

invented entities (1)

Drainability guardrail (no independent evidence)
purpose: Condition on price-capacity pair that certifies uniformly negative drift in the residual-demand regime
New construct introduced to eliminate the undrainable-backlog failure mode identified in the model.

pith-pipeline@v0.9.0 · 5529 in / 1547 out tokens · 48963 ms · 2026-05-10T07:35:54.200916+00:00 · methodology

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] A. Gandhi, M. Harchol-Balter, R. Raghunathan, and M. A. Kozuch, “Autoscale: Dynamic, robust capacity management for multi-tier data centers,” ACM Transactions on Computer Systems (TOCS), vol. 30, no. 4, pp. 1–26, 2012.
[2] M. Golec, G. K. Walia, M. Kumar, F. Cuadrado, S. S. Gill, and S. Uhlig, “Cold start latency in serverless computing: A systematic review, taxonomy, and future directions,” ACM Computing Surveys, vol. 57, no. 3, pp. 1–36, 2024.
[3] B. Li, Y. Jiang, V. Gadepally, and D. Tiwari, “LLM inference serving: Survey of recent advances and opportunities,” in 2024 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2024, pp. 1–8.
[4] C. Zhang, M. Yu, W. Wang, and F. Yan, “Enabling cost-effective, SLO-aware machine learning inference serving on public cloud,” IEEE Transactions on Cloud Computing, vol. 10, no. 3, pp. 1765–1779, 2020.
[5] A. Patke, D. Reddy, S. Jha, H. Qiu, C. Pinto, C. Narayanaswami, Z. Kalbarczyk, and R. Iyer, “Queue management for SLO-oriented large language model serving,” in Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024, pp. 18–35.
[6] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, ser. SIAM Series in Classics in Applied Mathematics. Philadelphia: SIAM, 1998.
[7] T. Başar, B. Djehiche, and H. Tembine, Mean-Field-Type Game Theory I: Foundations and New Directions. Switzerland: Springer International Publishing AG, Feb. 2026.
[8] J. San Juan and B. Wong, “Reducing the cost of GPU cold starts in serverless deep learning inference serving,” in 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops). IEEE, 2023, pp. 225–230.
[9] A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf, “Exact analysis of the M/M/k/setup class of Markov chains via recursive renewal reward,” Queueing Systems, vol. 77, no. 2, pp. 177–209, 2014.
[10] T. Lorido-Botran, J. Miguel-Alonso, and J. A. Lozano, “A review of auto-scaling techniques for elastic applications in cloud environments,” Journal of Grid Computing, vol. 12, no. 4, pp. 559–592, 2014.
[11] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[12] M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu, “Safe reinforcement learning via shielding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[13] I. McDougall, N. Scott, J. Huh, K. Kandasamy, and K. Sankaralingam, “Agora: Bridging the GPU cloud resource-price disconnect,” arXiv preprint arXiv:2510.05111, 2025.
[14] Y. Masuda and S. Whang, “Dynamic pricing for network service: Equilibrium and stability,” Management Science, vol. 45, no. 6, pp. 857–869, 1999.
[15] S. Çelik and C. Maglaras, “Dynamic pricing and lead-time quotation for a multiclass make-to-order queue,” Management Science, vol. 54, no. 6, pp. 1132–1146, 2008.
[16] A.-K. Katta and J. Sethuraman, “Pricing strategies and service differentiation in queues—a profit maximization perspective,” Department of Industrial Engineering and Operations Research, Columbia University, 2005.
[17] T. Başar and R. Srikant, “A Stackelberg network game with a large number of followers,” Journal of Optimization Theory and Applications, vol. 115, no. 3, pp. 479–490, Dec. 2002. [Online]. Available: https://doi.org/10.1023/A:1021294828483
[18] S. Stidham, “Pricing and congestion management in a network with heterogeneous users,” IEEE Transactions on Automatic Control, vol. 49, no. 6, pp. 976–981, 2004.
[19] V. D. Valerio, V. Cardellini, and F. L. Presti, “Optimal pricing and service provisioning strategies in cloud systems: A Stackelberg Game Approach,” in 2013 IEEE Sixth International Conference on Cloud Computing, 2013, pp. 115–122.
[20] A. Gera and C. H. Xia, “Learning curves and stochastic models for pricing and provisioning cloud computing services,” Service Science, vol. 3, no. 1, pp. 99–109, 2011.
[21] J. Kim and R. S. Randhawa, “The value of dynamic pricing in large queueing systems,” Operations Research, vol. 66, no. 2, pp. 409–425, 2018.
[22] T. Başar and R. Srikant, “Revenue-maximizing pricing and capacity expansion in a many-users regime,” in Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1. IEEE, 2002, pp. 294–301.
[23] M. Herlich, N. Bredenbals, and H. Karl, “Delayed (de-) activation in servers with a sleep mode,” Sustainable Computing: Informatics and Systems, vol. 10, pp. 48–55, 2016.
[24] D. P. Bertsekas, Dynamic Programming and Optimal Control: Volume I. Athena Scientific, 2012.
[25] D. P. Bertsekas and J. N. Tsitsiklis, “Neuro-dynamic programming: an overview,” in Proceedings of 1995 34th IEEE Conference on Decision and Control, vol. 1. IEEE, 1995, pp. 560–564.
[26] T. Fujimoto, J. Suetterlein, S. Chatterjee, and A. Ganguly, “Assessing the impact of distribution shift on reinforcement learning performance,” arXiv preprint arXiv:2402.03590, 2024.
