
arxiv: 2605.15788 · v1 · pith:S4JFLVXNnew · submitted 2026-05-15 · 💻 cs.DC · cs.LG

ADAPT: A Self-Calibrating Proactive Autoscaler for Container Orchestration

Pith reviewed 2026-05-19 19:22 UTC · model grok-4.3

classification 💻 cs.DC cs.LG
keywords autoscaling · container orchestration · model predictive control · cold start · proactive scaling · EWMA estimator · SLA violation · provisioning delay

The pith

An online EWMA estimator of varying cold-start durations lets an MPC paired with an LSTM predictor hold SLA violations below 5 percent across all tested workloads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that provisioning delays in container orchestration are not fixed and can change even between consecutive scale-out events. It introduces a runtime estimator that measures these delays on the fly and uses the measurements to set a moving planning window for a predictive controller. This closed loop replaces fixed-horizon assumptions with measured adaptation. If the approach holds, operators gain a proactive scaler that needs no manual calibration or extra sensors yet still beats both reactive baselines and other predictive combinations on service-level violations.

Core claim

ADAPT maintains an exponentially weighted moving average of observed cold-start durations at runtime and supplies the resulting value as a dynamic planning horizon (FH-OPT) to a Model Predictive Controller. The MPC then optimizes replica counts over a rolling window whose length is set by the current horizon estimate. The combination produces a self-calibrating proactive autoscaler whose lookahead automatically tracks the environment's actual provisioning delay.
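The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name `ColdStartEstimator`, the smoothing factor, the 30-second control interval, and the ceiling-based conversion from seconds to planning steps are all assumptions, since this page does not reproduce the paper's update rule or parameter values.

```python
import math
from typing import Optional


class ColdStartEstimator:
    """EWMA of observed cold-start durations (hypothetical sketch)."""

    def __init__(self, alpha: float = 0.3, initial_s: Optional[float] = None):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha            # assumed smoothing factor
        self.estimate_s = initial_s   # None until the first observation

    def observe(self, duration_s: float) -> float:
        """Fold one measured cold-start duration into the running estimate."""
        if self.estimate_s is None:
            # Seed the average with the first measurement.
            self.estimate_s = duration_s
        else:
            self.estimate_s = (
                self.alpha * duration_s + (1.0 - self.alpha) * self.estimate_s
            )
        return self.estimate_s

    def horizon_steps(self, control_interval_s: float = 30.0) -> int:
        """Planning horizon in control steps, long enough to cover the delay."""
        if self.estimate_s is None:
            raise RuntimeError("no cold-start observation yet")
        return max(1, math.ceil(self.estimate_s / control_interval_s))


est = ColdStartEstimator(alpha=0.5)
for measured in (60.0, 120.0, 120.0):   # illustrative measurements, in seconds
    est.observe(measured)
print(est.estimate_s, est.horizon_steps())  # 105.0 4
```

Rounding the horizon up rather than down is a deliberate choice in this sketch: a window shorter than the provisioning delay would end before newly requested replicas could serve any traffic.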

What carries the argument

ADAPT, the online EWMA estimator that tracks cold-start duration at runtime and supplies a dynamic planning horizon to the MPC optimizer.

If this is right

  • MPC paired with an LSTM predictor keeps SLA violations under 5 percent on every workload archetype tested.
  • The same MPC with a Prophet predictor reaches up to 28.7 percent violations on bimodal traffic.
  • Reactive HPA produces 7 to 19 percent violations on the same workloads.
  • The closed-loop design removes the need for a static lookahead chosen in advance.
  • Adaptation occurs solely from observed provisioning events inside the running system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same estimator could be attached to other prediction or control loops that currently assume fixed startup costs.
  • In multi-cloud or heterogeneous node pools the online measurement might expose systematic differences that static models miss.
  • If cold-start variance proves higher in production than in the six archetypes, the EWMA smoothing factor itself may need environment-specific tuning.

Load-bearing premise

An online EWMA can reliably follow changes in cold-start duration across environments and successive scale-outs without extra sensors or manual recalibration.

What would settle it

A production trace in which measured cold-start times shift abruptly and the EWMA-based horizon produces SLA violations above the 5 percent threshold reported for MPC+LSTM would falsify the central claim.
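To make the abrupt-shift scenario concrete, the lag of an EWMA after a step change can be worked out in closed form. The recursion below is the textbook EWMA and is assumed here; this page does not reproduce the paper's exact update rule. The symbols are introduced for illustration: \(d_k\) is the k-th measured cold-start duration, \(\alpha\) the smoothing factor, \(D_0\) the delay the estimate had settled on, and \(D_1\) the new delay, observed without noise from event 1 onward.

```latex
\begin{align*}
\hat{\Delta}_k &= \alpha\, d_k + (1-\alpha)\,\hat{\Delta}_{k-1},
  \qquad 0 < \alpha < 1 \\
\hat{\Delta}_k - D_1 &= (1-\alpha)^{k}\,(D_0 - D_1)
  \qquad \text{with } \hat{\Delta}_0 = D_0,\; d_k = D_1 \text{ for } k \ge 1 \\
(1-\alpha)^{k} \le \varepsilon
  \;&\Longleftrightarrow\;
  k \ge \frac{\ln \varepsilon}{\ln(1-\alpha)}
\end{align*}
```

For example, with an assumed \(\alpha = 0.3\) and \(\varepsilon = 0.05\), the bound gives \(k \ge 8.4\), so nine scale-out events pass before the remaining gap falls to 5 percent of the original jump. During those events the planning horizon is still based on a stale delay, which is the window in which a trace of this kind could push violations above the reported threshold.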

Figures

Figures reproduced from arXiv: 2605.15788 by Himanshu Singh Baghel.

Figure 1: SLA violation rate (%) per policy across all six …

Figure 2: ADAPT estimate Δ̂(t) converging to the ground-truth cold-start of 120 s on a diurnal-burst trace. Shaded region shows ±1 standard deviation from Welford online variance.

Figure 4: SLA violation rate (%) as a function of cold-start …

Figure 5: Total cost (replica-minutes) vs. mean SLA violation …

Original abstract

Proactive autoscaling for containerized workloads depends on knowing the provisioning delay, i.e., the time between a scaling decision and the moment new capacity is ready to serve traffic. In practice, this cold-start duration can vary substantially across environments and even across consecutive scale-out events. We present ADAPT (Adaptive Duration Approximation for Predictive Timing), an online EWMA estimator that tracks cold-start duration at runtime. ADAPT feeds a dynamic planning horizon, FH-OPT, into a Model Predictive Controller (MPC) that optimizes replica counts over a rolling window. Together, these components form a closed-loop proactive autoscaling design that adapts its lookahead based on measured provisioning delay. Evaluated across three policies (MPC+LSTM, MPC+Prophet, HPA) and six workload archetypes with five random seeds, MPC+LSTM achieves below 5% SLA violation on all workloads, compared with 7-19% for reactive HPA and up to 28.7% for MPC+Prophet on bimodal traffic.
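The interaction between the planning window and the provisioning delay can be illustrated with a deliberately simplified planner. This is a sketch under stated assumptions, not the paper's MPC formulation: the function `plan_replicas`, its cost terms, the per-replica capacity, and the single-target search (one replica count held for the whole window, instead of a full control sequence) are all hypothetical. The forecast would come from a predictor such as LSTM or Prophet; here it is a plain list of made-up values.

```python
from typing import Sequence


def plan_replicas(
    forecast_rps: Sequence[float],    # predicted load for each step in the window
    current_replicas: int,
    delay_steps: int,                 # cold-start delay expressed in control steps
    rps_per_replica: float = 100.0,   # assumed per-replica capacity
    replica_cost: float = 1.0,        # assumed cost per replica per step
    shortfall_penalty: float = 10.0,  # assumed cost per replica of unmet demand
    max_replicas: int = 50,
) -> int:
    """Pick the replica target with the lowest cost over the planning window."""
    best_target, best_cost = current_replicas, float("inf")
    for target in range(1, max_replicas + 1):
        cost = 0.0
        for step, demand in enumerate(forecast_rps):
            # New replicas only serve traffic once the cold start has elapsed;
            # scale-in is treated as immediate.
            scaling_out = target > current_replicas
            ready = current_replicas if scaling_out and step < delay_steps else target
            shortfall = max(0.0, demand / rps_per_replica - ready)
            # Replicas are billed from the moment they are requested.
            cost += replica_cost * target + shortfall_penalty * shortfall
        if cost < best_cost:
            best_target, best_cost = target, cost
    return best_target


forecast = [200.0, 200.0, 600.0, 600.0]  # hypothetical forecast, requests/s
print(plan_replicas(forecast, current_replicas=2, delay_steps=2))  # 6
print(plan_replicas(forecast, current_replicas=2, delay_steps=4))  # 2
```

In the second call the delay is as long as the window, so the planner sees only the cost of extra replicas and none of their benefit, and it stays at 2. A window that adapts to the measured delay avoids this blind spot. In a receding-horizon loop, only the first decision would be applied and the plan recomputed at the next control step.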

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents ADAPT, an online EWMA-based estimator that tracks variable cold-start durations at runtime to dynamically set the planning horizon FH-OPT for a Model Predictive Controller (MPC). Combined with LSTM or Prophet predictors, the closed-loop system is evaluated against reactive HPA across six workload archetypes and five random seeds, with the central claim that MPC+LSTM keeps SLA violations below 5% on all workloads, while HPA ranges from 7% to 19% and MPC+Prophet reaches up to 28.7% on bimodal traffic.

Significance. If the performance claims hold under rigorous validation, ADAPT would provide a practical, self-calibrating mechanism for handling environment-dependent provisioning delays in container orchestration, reducing reliance on static assumptions in proactive autoscalers. The online data-driven estimator and its integration into MPC represent a clear engineering contribution for dynamic cloud workloads.

major comments (2)
  1. [Evaluation] Evaluation section: the headline result that MPC+LSTM achieves below 5% SLA violation across all workloads is stated without any description of experimental setup, workload generation method, definition or measurement of SLA violations, or statistical significance testing. This directly undermines the ability to assess support for the central performance claims.
  2. [Estimator design] Estimator design (around the EWMA update rule): no quantification is given for the estimator's accuracy (e.g., tracking error), adaptation speed, or convergence time after changes in cold-start duration across consecutive scale-outs or environments. Because FH-OPT is derived from this estimator and fed to the MPC, the absence of these metrics leaves the robustness of the closed-loop adaptation unverified.
minor comments (1)
  1. [Abstract] The abstract mentions 'five random seeds' but does not indicate what was randomized (e.g., workload parameters, initial conditions) or how variability across seeds is summarized in the reported percentages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, acknowledging where additional detail is needed and outlining the specific revisions we will incorporate.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the headline result that MPC+LSTM achieves below 5% SLA violation across all workloads is stated without any description of experimental setup, workload generation method, definition or measurement of SLA violations, or statistical significance testing. This directly undermines the ability to assess support for the central performance claims.

    Authors: We agree that the Evaluation section in the submitted manuscript lacks sufficient methodological detail to allow full assessment of the reported results. In the revised version we will expand this section with: a complete description of the experimental testbed and container orchestration environment; the precise procedure used to synthesize the six workload archetypes; the formal definition of SLA violations together with how they are measured and aggregated; and statistical significance testing (including p-values from paired t-tests or ANOVA) across the five random seeds. These additions will directly support evaluation of the central performance claims.
    revision: yes

  2. Referee: [Estimator design] Estimator design (around the EWMA update rule): no quantification is given for the estimator's accuracy (e.g., tracking error), adaptation speed, or convergence time after changes in cold-start duration across consecutive scale-outs or environments. Because FH-OPT is derived from this estimator and fed to the MPC, the absence of these metrics leaves the robustness of the closed-loop adaptation unverified.

    Authors: We concur that explicit quantification of the EWMA estimator would strengthen verification of the closed-loop design. We will add a dedicated subsection (or appendix) in the revision that reports the estimator's mean tracking error, adaptation speed (measured in scale-out events), and convergence time under both gradual and abrupt changes in cold-start duration. These metrics will be derived from the same experimental traces used for the end-to-end evaluation, thereby demonstrating the robustness of the dynamic FH-OPT input to the MPC.
    revision: yes
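The metrics named in the second response could be computed along the following lines. This is a hedged sketch, not the authors' evaluation code: the function `ewma_tracking_metrics`, the 5 percent tolerance band, the smoothing factor, and the noise-free synthetic step trace are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def ewma_tracking_metrics(
    observed_s: Sequence[float],  # measured cold-start durations, one per scale-out
    true_s: Sequence[float],      # ground-truth durations for the same events
    change_index: int,            # index of the first event after the shift
    alpha: float = 0.3,           # assumed smoothing factor
    tolerance: float = 0.05,      # assumed band: within 5% of the true value
) -> dict:
    """Replay an EWMA over a trace and score how well it tracks the truth."""
    estimate = observed_s[0]
    abs_errors = []
    last_violation = change_index - 1
    for k, (obs, truth) in enumerate(zip(observed_s, true_s)):
        if k > 0:
            estimate = alpha * obs + (1.0 - alpha) * estimate
        err = abs(estimate - truth)
        abs_errors.append(err)
        # Remember the latest post-change event that is still outside the band.
        if k >= change_index and err > tolerance * truth:
            last_violation = k
    converged = last_violation < len(abs_errors) - 1
    events: Optional[int] = last_violation - change_index + 2 if converged else None
    return {
        "mean_abs_error_s": sum(abs_errors) / len(abs_errors),
        "events_to_converge": events,  # post-change events needed; None if never
    }


# Synthetic, noise-free step from 60 s to 120 s (illustrative values only).
truth = [60.0] * 5 + [120.0] * 15
metrics = ewma_tracking_metrics(truth, truth, change_index=5)
print(metrics["events_to_converge"])  # 7
```

Counting convergence in scale-out events rather than seconds matches the unit used in the response, since the estimator only updates when a provisioning event is observed.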

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

Full rationale

The paper describes ADAPT as an online EWMA estimator that tracks cold-start duration at runtime in a data-driven manner and feeds the resulting dynamic FH-OPT into an MPC optimizer. The central performance claims (MPC+LSTM below 5% SLA violation across workloads) are presented as outcomes of empirical evaluation on six workload archetypes with comparisons to HPA and MPC+Prophet baselines. No load-bearing step reduces by construction to its inputs: there is no self-definitional loop where a fitted parameter is renamed as a prediction, no self-citation chain invoked as a uniqueness theorem, and no ansatz smuggled via prior work. The estimator is explicitly online and adaptive rather than tuned to the final SLA metric, leaving the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The design implicitly assumes that cold-start duration is the dominant variable delay and that EWMA suffices to track it without external inputs.



