KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances
Pith reviewed 2026-05-08 01:37 UTC · model grok-4.3
The pith
KubePACS picks Kubernetes spot instances by jointly optimizing real-time prices, workload performance benchmarks, and multi-node availability scores.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KubePACS formulates instance-type selection as a multi-objective optimization that incorporates spot prices, performance benchmarks, and multi-node Spot Placement Scores, solves the problem efficiently with an Integer Linear Programming model guided by Golden Section Search, and integrates the outcome with Karpenter to jointly decide instance types and scaling while preserving availability.
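To make the shape of such a selection problem concrete, here is a minimal sketch under stated assumptions; it is not the paper's formulation. It treats the multi-objective problem in an epsilon-constraint style: minimize hourly cost subject to a vCPU demand, a benchmark-score floor, and a minimum Spot Placement Score (AWS reports SPS on a 1 to 10 scale). The catalog, the thresholds, and the name `select_pool` are hypothetical, and brute-force enumeration stands in for a real ILP solver.

```python
from itertools import product

# Hypothetical catalog rows: (name, hourly spot price in USD,
# benchmark score per node, vCPUs per node, Spot Placement Score 1-10).
CATALOG = [
    ("type-a", 0.10, 100.0, 4, 9),
    ("type-b", 0.16, 200.0, 4, 8),
    ("type-c", 0.07, 90.0, 4, 3),
]


def select_pool(catalog, vcpu_demand, score_target, min_sps=5, max_per_type=4):
    """Enumerate integer node counts and return the cheapest feasible pool.

    Objective and constraints are linear in the counts, so this is a tiny
    integer linear program solved by exhaustive search (toy sizes only).
    Returns (hourly_cost, {type_name: count}) or None if nothing is feasible.
    """
    # Availability constraint: drop types whose SPS is below the threshold.
    eligible = [t for t in catalog if t[4] >= min_sps]
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(eligible)):
        vcpus = sum(n * t[3] for n, t in zip(counts, eligible))
        score = sum(n * t[2] for n, t in zip(counts, eligible))
        if vcpus < vcpu_demand or score < score_target:
            continue  # violates the capacity or performance constraint
        cost = sum(n * t[1] for n, t in zip(counts, eligible))
        if best is None or cost < best[0]:
            best = (cost, {t[0]: n for n, t in zip(counts, eligible) if n})
    return best


cost, pool = select_pool(CATALOG, vcpu_demand=8, score_target=300)
```

In this toy instance the cheapest type is excluded by the SPS floor, and the solver mixes the two remaining types rather than buying more of the cheaper one, which is the kind of behavior a price-only selector would miss.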
What carries the argument
Multi-objective Integer Linear Programming model guided by Golden Section Search that balances cost, performance, and availability using real-time spot prices, benchmarks, and multi-node SPS data.
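For readers unfamiliar with Golden Section Search, the sketch below shows the generic algorithm for maximizing a unimodal function on an interval. The summary does not say which scalar KubePACS searches over, so the quadratic objective here is only a stand-in; in a system like the one described, each evaluation of `f` would presumably involve solving an ILP for one candidate value. The function name `golden_section_max` is illustrative.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, roughly 0.618


def golden_section_max(f, lo, hi, tol=1e-6):
    """Return an approximate maximizer of a unimodal f on [lo, hi].

    Each iteration shrinks the bracket by a factor of 1/phi and reuses one
    of the two interior evaluations, so f is called once per iteration.
    """
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # Maximum lies in [lo, d]; old c becomes the new right probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Maximum lies in [c, hi]; old d becomes the new left probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


# Stand-in objective with a single peak at x = 0.3 (assumption, not from the paper).
best_x = golden_section_max(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```

The appeal of this search in an outer loop is that it needs only function evaluations, no derivatives, which suits an objective defined by the output of an integer solver. It does assume unimodality; on a multi-peaked objective it can settle on a local maximum.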
If this is right
- Kubernetes operators can run the same workloads on spot instances with materially higher throughput per dollar spent.
- The Karpenter integration lets existing clusters adopt the new selection logic without changing their scaling workflow.
- Workload-specific scaling of performance metrics lets the same system handle both general and specialized instance preferences.
- Clusters stay available because the optimization explicitly includes multi-node placement scores rather than treating availability as an afterthought.
Where Pith is reading between the lines
- The same optimization structure could be applied to other container platforms or to on-demand instances when performance data is available.
- Cloud providers might begin publishing richer, workload-aware benchmark data if systems like this demonstrate consistent value.
- Longer-running experiments on GPU or memory-intensive jobs would test whether the reported gains generalize beyond the evaluated workloads.
Load-bearing premise
That current spot prices, performance benchmarks, and Spot Placement Scores remain reliable predictors of long-term cost, speed, and interruption risk once instances are actually running.
What would settle it
Deploy KubePACS and a price-only baseline on identical production workloads for multiple weeks, then compare measured total cost of ownership, actual throughput, and interruption frequency against the predictions made at provisioning time.
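The bookkeeping for such an experiment could look like the following sketch. It assumes each run is summarized by total cost, work completed, and interruption count, and it compares provisioning-time predictions against measurements. The `Outcome` type, the function names, and all numbers are hypothetical; none are taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    cost_usd: float      # total spend over the run
    throughput: float    # work units completed over the run
    interruptions: int   # spot interruptions observed (or predicted)


def perf_per_dollar(o: Outcome) -> float:
    return o.throughput / o.cost_usd


def prediction_gap(predicted: Outcome, measured: Outcome) -> dict:
    """Relative gap (measured - predicted) / predicted for each ratio metric.

    Interruptions are reported as an absolute difference because the
    predicted count may be zero.
    """
    def rel(m, p):
        return (m - p) / p

    return {
        "cost": rel(measured.cost_usd, predicted.cost_usd),
        "throughput": rel(measured.throughput, predicted.throughput),
        "perf_per_dollar": rel(perf_per_dollar(measured), perf_per_dollar(predicted)),
        "interruptions": measured.interruptions - predicted.interruptions,
    }


def gain_pct(candidate: Outcome, baseline: Outcome) -> float:
    """Percent improvement in performance per dollar over a baseline."""
    return 100.0 * (perf_per_dollar(candidate) / perf_per_dollar(baseline) - 1.0)


# Illustrative numbers only.
predicted = Outcome(cost_usd=100.0, throughput=5000.0, interruptions=0)
measured = Outcome(cost_usd=125.0, throughput=5000.0, interruptions=3)
gap = prediction_gap(predicted, measured)
```

A large negative `perf_per_dollar` gap for one system but not the other would indicate that its selection-time inputs were poor predictors, which is exactly the failure mode the load-bearing premise leaves open.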
Original abstract
Cloud users aim to minimize cost while maximizing performance by selecting the most suitable instance types for their workloads. To reduce expenses, spot instances have been widely adopted due to their steep discounts compared to on-demand pricing. However, their use introduces reliability risks due to potential interruptions, and existing research has primarily focused on mitigating this trade-off from a cost or availability perspective alone. Despite the diversity in hardware capabilities among instance types, current provisioning systems tend to ignore performance variation, selecting nodes solely based on minimum resource requirements. In this paper, we present KubePACS, a Kubernetes-native spot instance provisioning system that constructs node pools optimized for both cost and performance while guaranteeing high availability. KubePACS formulates the node selection process as a multi-objective optimization problem, incorporating real-time data such as spot prices, performance benchmarks, and availability scores, including the multi-node Spot Placement Score (SPS). It solves this problem efficiently using an Integer Linear Programming (ILP) approach guided by the Golden Section Search (GSS) algorithm to find the optimal configuration. By integrating with the Karpenter node autoscaler, KubePACS jointly optimizes instance-type selection and node scaling decisions within a standard provisioning workflow. KubePACS also adopts a novel heuristic to support workload-specific preferences by scaling performance metrics for specialized instances. Through extensive evaluation across synthetic and real-world workloads, KubePACS demonstrates on average 55.09% and up to 81.06% higher performance per dollar over state-of-the-art solutions such as Karpenter, SpotVerse, and SpotKube, which only reference the spot instance prices and limited availability data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents KubePACS, a Kubernetes-native spot instance provisioning system that formulates node selection as a multi-objective optimization problem and solves it with an Integer Linear Programming (ILP) model guided by the Golden Section Search (GSS) algorithm. It incorporates real-time spot prices, performance benchmarks, and multi-node Spot Placement Scores (SPS) to optimize for cost, performance, and availability, integrates with Karpenter, and uses a heuristic for workload-specific preferences. In evaluations on synthetic and real-world workloads, it claims on average 55.09% and up to 81.06% higher performance per dollar than baselines such as Karpenter, SpotVerse, and SpotKube.
Significance. If the claimed gains prove robust, KubePACS could meaningfully advance practical cost-performance optimization for spot-based Kubernetes deployments by jointly handling instance-type selection and scaling. The ILP+GSS formulation and integration with an existing autoscaler provide a concrete, deployable approach that goes beyond price-only or availability-only methods used in baselines. The explicit quantitative comparison to three prior systems is a strength, but only if the evaluation captures sustained behavior rather than point-in-time selection.
major comments (1)
- [Evaluation (abstract claims and implied experimental section)] The headline result (55.09% average and 81.06% maximum performance-per-dollar improvement) rests on the claim that real-time inputs (spot prices, benchmarks, SPS) produce node pools whose measured cost, throughput, and uptime match the selection-time predictions. The abstract states that baselines use only prices and limited availability data, yet provides no indication that the KubePACS evaluation includes post-provisioning interruption modeling, price fluctuation during workload runs, or replacement overhead applied uniformly to all systems. If workloads are short or interruptions are omitted, the reported gains do not demonstrate production-relevant superiority.
minor comments (1)
- [Abstract] The abstract refers to 'extensive evaluation' and 'real-world workloads' without defining workload durations, interruption rates, statistical tests, or error bars; adding these details would strengthen verifiability of the 55.09%/81.06% figures.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment on the evaluation below and clarify the methodology while strengthening the presentation of results where appropriate.
point-by-point responses
Referee: [Evaluation (abstract claims and implied experimental section)] The headline result (55.09% average and 81.06% maximum performance-per-dollar improvement) rests on the claim that real-time inputs (spot prices, benchmarks, SPS) produce node pools whose measured cost, throughput, and uptime match the selection-time predictions. The abstract states that baselines use only prices and limited availability data, yet provides no indication that the KubePACS evaluation includes post-provisioning interruption modeling, price fluctuation during workload runs, or replacement overhead applied uniformly to all systems. If workloads are short or interruptions are omitted, the reported gains do not demonstrate production-relevant superiority.
Authors: We thank the referee for highlighting the importance of validating that selection-time predictions translate to measured outcomes under realistic conditions. Our evaluation deployed the node pools chosen by KubePACS and each baseline (Karpenter, SpotVerse, SpotKube) on actual AWS spot instances and executed both the synthetic benchmarks and real-world workloads on those live clusters. The reported performance-per-dollar values are derived from measured throughput and actual incurred costs during these runs, which therefore incorporate any interruptions, price changes, and replacement effects that occurred. The multi-node SPS component was specifically intended to improve uptime, and observed uptime contributed to the metrics. We acknowledge, however, that the manuscript does not explicitly document workload durations, the uniform modeling of replacement overhead, or simulated price fluctuations applied identically to all systems. In the revised manuscript we will expand the experimental section with a dedicated subsection describing the evaluation protocol, workload runtimes, observed interruption rates, and how replacement costs were factored uniformly into the performance-per-dollar calculations for every compared system. This addition will make the production relevance of the results more transparent.
Revision: partial
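One way replacement overhead could be folded uniformly into a performance-per-dollar figure is sketched below. It is an illustration under explicit assumptions, not the authors' protocol: performance per dollar is taken to mean work units completed per dollar billed, the pool keeps billing at the same hourly rate while replacements start, and every system is charged the same fixed replacement delay per interruption. The function name, the system labels, and the numbers are hypothetical.

```python
def work_per_dollar(work_units, run_hours, hourly_usd,
                    interruptions=0, replacement_hours=0.0):
    """Work completed per dollar billed, including replacement downtime.

    Assumption: each interruption adds `replacement_hours` of billed time
    during which no additional work is completed.
    """
    billed_hours = run_hours + interruptions * replacement_hours
    return work_units / (hourly_usd * billed_hours)


# The same replacement delay is applied to every compared system.
REPLACEMENT_HOURS = 0.5  # hypothetical value

# Hypothetical run records: (work units, run hours, hourly cost, interruptions).
runs = {
    "system-a": (1000, 10, 2.0, 5),
    "system-b": (1000, 10, 2.0, 0),
}

adjusted = {
    name: work_per_dollar(w, h, usd, n, REPLACEMENT_HOURS)
    for name, (w, h, usd, n) in runs.items()
}
```

Holding `REPLACEMENT_HOURS` constant across systems means any difference in the adjusted figure comes from interruption counts and prices rather than from how overhead was measured for each system.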
Circularity Check
No circularity: standard ILP+GSS on external inputs with empirical evaluation
The paper's core derivation formulates node-pool selection as a multi-objective ILP incorporating external real-time spot prices, performance benchmarks, and multi-node SPS, then solves it with the standard GSS algorithm before integrating with Karpenter. The reported 55.09% average (up to 81.06%) performance-per-dollar gains are obtained from post-deployment measurements on synthetic and real workloads, not by algebraic reduction of the objective to its own fitted parameters or self-citations. No equation or step equates the claimed superiority to a tautological renaming or input-only prediction; the baselines are simply described as using fewer data sources. The chain is therefore grounded in external cloud APIs and benchmarks rather than in its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Real-time spot prices, performance benchmarks, and multi-node Spot Placement Scores are reliable and stable enough to drive provisioning decisions.