
arxiv: 2602.13789 · v2 · submitted 2026-02-14 · 💻 cs.DC

Laminar: A Probe-First Scheduling Paradigm with Deterministic Runtime Survival

Pith reviewed 2026-05-15 22:20 UTC · model grok-4.3

classification 💻 cs.DC
keywords decentralized scheduling · GPU clusters · runtime survival · memory pressure · probe-first · exascale computing · Airlock layer

The pith

Laminar keeps control-plane work near constant time in fragmented GPU clusters by using probe-first decisions and a local survival layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Laminar as a decentralized scheduling approach for exascale GPU clusters where finished jobs leave behind unstable leftover capacity that new tasks must share with long-running work. Existing methods either centralize decisions and pay rising coordination costs or rely on crude memory-kill rules once execution starts. Laminar instead probes first through zone-level probabilistic splitting and local agents so that hot-path overhead stays near constant, then uses its Airlock layer to turn memory pressure into an ordered sequence of suspension, recovery, or reclamation that protects high-value tasks. A sympathetic reader would care because this combination would let clusters run closer to full capacity while giving deterministic guarantees on which workloads survive. The core shift is moving survival logic from host-level heuristics into the scheduler itself.

Core claim

Laminar is a decentralized probe-first, execute-later scheduling paradigm that keeps hot-path control-plane work near O(1) through Zone-level probabilistic flow splitting, bounded in-Zone probing by persistent lightweight agents, and node-local arbitration. It further introduces Airlock, a bounded node-local runtime-survival layer that converts severe memory pressure into an ordered policy of suspension, in-situ recovery, bounded secondary re-addressing, or reclamation. By enforcing priority-ordered survival under pressure, Laminar enables lifecycle-aware scheduling that preserves high-value long-resident work and operates closer to physical saturation without relying on protocol-level overcommitment.

What carries the argument

The probe-first scheduling paradigm with Zone-level probabilistic flow splitting, bounded probing by persistent lightweight agents, node-local arbitration, and the Airlock layer that turns memory pressure into priority-ordered suspension or recovery.
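The three placement steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Zone`, `Node`, `place`, `weight`, and `probe_budget` names are hypothetical, the splitting weights are assumed to be given, and memory is the only resource modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    free_mem: int  # residual memory units, known only to the node itself

    def try_admit(self, demand: int) -> bool:
        # Node-local arbitration: the node decides alone, from local state.
        if self.free_mem >= demand:
            self.free_mem -= demand
            return True
        return False


@dataclass
class Zone:
    nodes: list
    weight: float  # hypothetical splitting weight (assumed to be supplied)


def place(zones, demand, probe_budget, rng):
    """Return (zone_index, node_index, probes_used), or None if the budget runs out."""
    # Step 1: Zone-level probabilistic flow splitting (one weighted draw,
    # no scan over individual nodes).
    zi = rng.choices(range(len(zones)), weights=[z.weight for z in zones], k=1)[0]
    zone = zones[zi]
    # Step 2: bounded in-Zone probing, contacting at most probe_budget nodes.
    candidates = rng.sample(range(len(zone.nodes)), min(probe_budget, len(zone.nodes)))
    for probes, ni in enumerate(candidates, start=1):
        # Step 3: the task is committed only after a node accepts the probe.
        if zone.nodes[ni].try_admit(demand):
            return zi, ni, probes
    return None


rng = random.Random(0)
zones = [Zone([Node(4) for _ in range(8)], weight=1.0),
         Zone([Node(16) for _ in range(8)], weight=3.0)]
result = place(zones, demand=8, probe_budget=3, rng=rng)
```

In this sketch the per-task cost is capped by `probe_budget` regardless of how many nodes a Zone holds. What it does not model is the cost of a `None` result, which is exactly the retry behavior the referee report asks the authors to bound.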

If this is right

  • Control-plane overhead stays near O(1) as fragmentation grows instead of scaling with the number of competing tasks.
  • High-value long-resident workloads receive priority preservation when memory pressure triggers the survival policies.
  • Scheduling decisions can push utilization closer to physical saturation without protocol-level overcommitment.
  • Runtime survival becomes an explicit, ordered part of the scheduler rather than an after-the-fact host heuristic.
  • Decentralized operation avoids the coordination and retry amplification costs of centralized or hierarchical alternatives.
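To make "ordered survival policy" concrete, the following sketch walks one node through the four actions the abstract names. The abstract does not specify the transition rules, so everything here is an assumption made for illustration: lowest-priority tasks are suspended first, suspended tasks are settled from highest priority down, and `Task`, `relieve_pressure`, `resolve_suspended`, `try_readdress`, and `max_readdress` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # higher value = more valuable (assumed convention)
    mem: int
    state: str = "running"


def relieve_pressure(tasks, capacity):
    """Suspend lowest-priority running tasks until resident memory fits capacity."""
    used = sum(t.mem for t in tasks if t.state == "running")
    for t in sorted(tasks, key=lambda t: t.priority):
        if used <= capacity:
            break
        if t.state == "running":
            t.state = "suspended"
            used -= t.mem
    return used


def resolve_suspended(tasks, capacity, used, try_readdress, max_readdress=2):
    """Settle each suspended task, highest priority first, in a fixed order."""
    for t in sorted(tasks, key=lambda t: -t.priority):
        if t.state != "suspended":
            continue
        if used + t.mem <= capacity:
            t.state = "running"        # in-situ recovery
            used += t.mem
        elif any(try_readdress(t) for _ in range(max_readdress)):
            t.state = "readdressed"    # bounded secondary re-addressing
        else:
            t.state = "reclaimed"      # last resort
    return used
```

Because both functions iterate in a fixed priority order and the re-addressing loop is capped by `max_readdress`, the outcome for a given input is deterministic and the number of steps per task is bounded. Whether the paper's Airlock layer orders its actions this way is not stated in the abstract.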

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same probe-first structure could reduce restart overhead in other distributed systems that mix long-running and bursty workloads on shared hardware.
  • If the local arbitration proves stable, it might allow schedulers to expose explicit survival priorities to users without changing application code.
  • Extending the zone-splitting probabilities with workload-type hints could further reduce probe collisions in heterogeneous clusters.
  • Measuring the actual memory-recovery latency of the in-situ recovery path would show whether the bounded guarantees hold under realistic pressure.

Load-bearing premise

Zone-level probabilistic flow splitting together with persistent lightweight agents and node-local arbitration can keep control-plane work near constant and deliver deterministic survival without hidden coordination or retry costs in fragmented exascale GPU clusters.

What would settle it

A trace showing control-plane latency growing faster than constant with increasing numbers of transient tasks, or high-priority long-resident workloads being terminated under memory pressure despite the ordered Airlock policies.
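One simple way to read such a trace is to fit the slope of log(cost) against log(load). The sketch below is an assumed analysis method, not one taken from the paper, and the numbers in the usage lines are synthetic placeholders rather than measured results.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    A slope near 0 is consistent with near-constant cost; a slope near 1
    indicates cost growing linearly with load.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic placeholder values, for illustration only (not from the paper).
transient_tasks = [1000, 2000, 4000, 8000]
control_work = [3.0, 3.1, 2.9, 3.0]
slope = loglog_slope(transient_tasks, control_work)
```

A fitted slope alone cannot establish a worst-case bound, so a check like this would complement, not replace, the complexity analysis requested in the referee report.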

Figures

Figures reproduced from arXiv: 2602.13789 by Zhengyan Chu.

Figure 1. Overall architecture and lifecycle state machine of Laminar. The …
Figure 2. Start success ratio (left) and p99 arrival-to-start latency (right) for four …
Figure 3. Start success ratio (left) and p99 arrival-to-start latency (right) for …
Figure 4. Control work per successful execution start for Laminar under mixed …
Figure 5. Start success ratio (left) and p99 arrival-to-start latency (right) for …
Figure 7. Exp5 runtime-survival behavior at 5,000 nodes and …
Original abstract

In exascale-oriented GPU clusters, rigid-topology jobs leave behind a fragmented post-landing ecology in which long-resident workloads and highly transient tasks compete for unstable residual capacity. Existing centralized, hierarchical, and local-first decentralized schedulers incur growing coordination and retry-amplification costs in this regime and typically stop their explicit responsibility at execution start, leaving runtime survival to indiscriminate host-level OOM heuristics. We present Laminar, a decentralized probe-first, execute-later scheduling paradigm that keeps hot-path control-plane work near $\mathcal{O}(1)$ through Zone-level probabilistic flow splitting, bounded in-Zone probing by persistent lightweight agents, and node-local arbitration. Laminar further introduces Airlock, a bounded node-local runtime-survival layer that converts severe memory pressure into an ordered policy of suspension, in-situ recovery, bounded secondary re-addressing, or reclamation. By enforcing priority-ordered survival under pressure, Laminar enables lifecycle-aware scheduling that preserves high-value long-resident work and operates closer to physical saturation without relying on protocol-level overcommitment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Laminar, a decentralized probe-first, execute-later scheduling paradigm for exascale GPU clusters. It claims to keep hot-path control-plane work near O(1) via Zone-level probabilistic flow splitting, bounded in-Zone probing by persistent lightweight agents, and node-local arbitration. It further introduces Airlock, a bounded node-local runtime-survival layer that converts severe memory pressure into an ordered policy of suspension, in-situ recovery, bounded secondary re-addressing, or reclamation to enable priority-ordered survival of high-value long-resident workloads in fragmented post-landing ecologies.

Significance. If the central claims hold, the work could meaningfully advance scheduling for exascale GPU clusters by reducing coordination and retry costs while providing deterministic survival without protocol-level overcommitment. The probe-first paradigm and lifecycle-aware Airlock policy address a relevant gap in handling transient vs. long-resident workloads. The design is internally consistent at the descriptive level and introduces no obvious circularity, but the absence of quantitative bounds or measurements leaves the O(1) and survival guarantees as untested proposals.

major comments (2)
  1. [Abstract] The claim that control-plane work remains near O(1) is load-bearing for the central contribution, yet no complexity derivation, worst-case bound, or analysis of hidden coordination/retry costs under probabilistic splitting is supplied; the description of zone-level mechanisms and local arbitration remains qualitative.
  2. [Abstract] The deterministic survival guarantees of Airlock rest on the ordered policy under memory pressure, but no correctness argument, recovery-time bounds, or evaluation of failure modes (e.g., when in-situ recovery or secondary re-addressing cannot complete) is provided, leaving the claim unsupported.
minor comments (2)
  1. The term 'post-landing ecology' is introduced without a formal definition or reference to prior usage, reducing clarity for readers outside the immediate subfield.
  2. The manuscript would benefit from a dedicated section outlining the system model, assumptions on cluster topology, and failure semantics before presenting the mechanisms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and commit to revisions that strengthen the supporting arguments for the central claims.

Point-by-point responses
  1. Referee: [Abstract] The claim that control-plane work remains near O(1) is load-bearing for the central contribution, yet no complexity derivation, worst-case bound, or analysis of hidden coordination/retry costs under probabilistic splitting is supplied; the description of zone-level mechanisms and local arbitration remains qualitative.

    Authors: We agree that the O(1) claim requires formal support. The manuscript describes the mechanisms (Zone-level probabilistic flow splitting, bounded probing by lightweight agents, and node-local arbitration) but provides only a qualitative argument without explicit complexity derivation or worst-case analysis of coordination/retry costs. In the revision we will add a dedicated analysis subsection deriving expected and worst-case control-plane overhead under the probabilistic model. revision: yes

  2. Referee: [Abstract] The deterministic survival guarantees of Airlock rest on the ordered policy under memory pressure, but no correctness argument, recovery-time bounds, or evaluation of failure modes (e.g., when in-situ recovery or secondary re-addressing cannot complete) is provided, leaving the claim unsupported.

    Authors: We acknowledge that the Airlock survival guarantees are currently unsupported by formal arguments. The manuscript outlines the ordered policy (suspension, in-situ recovery, bounded secondary re-addressing, or reclamation) but supplies neither a correctness argument nor recovery-time bounds nor failure-mode analysis. We will revise the relevant section to include a correctness sketch, analytic bounds on recovery steps where possible, and discussion of cases where in-situ recovery or re-addressing cannot complete. revision: yes
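As an illustration of the kind of bound the first response promises, consider a deliberately simple model that is not taken from the paper: each in-Zone probe succeeds independently with probability $p$, the probe budget is $k$, and a failed round is retried independently. The symbols $p$, $k$, $N$, and $T$ are introduced here for the sketch only.

```latex
% N = probes used in one round, truncated at the budget k
\mathbb{E}[N] \;=\; \sum_{i=1}^{k} (1-p)^{\,i-1}
            \;=\; \frac{1-(1-p)^{k}}{p}
            \;\le\; \min\!\left(k,\ \frac{1}{p}\right)

% Probability that one round fails to place the task
\Pr[\text{round fails}] \;=\; (1-p)^{k}

% T = total probes across independent retries until success (Wald's identity)
\mathbb{E}[T] \;=\; \frac{\mathbb{E}[N]}{1-(1-p)^{k}} \;=\; \frac{1}{p}
```

Under these assumptions the per-round cost is bounded by $k$ independently of cluster size, but the total cost including retries scales as $1/p$. A near-constant claim would therefore also need an argument that $p$ stays bounded away from zero as fragmentation grows, which is the hidden retry cost the referee raises.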

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is a high-level descriptive design proposal for a decentralized scheduling paradigm and node-local survival layer. It asserts that O(1) control-plane work follows from zone-level probabilistic flow splitting, bounded in-Zone probing by persistent agents, and node-local arbitration, and that deterministic survival follows from the ordered Airlock policy under memory pressure. No equations, quantitative derivations, fitted parameters, or self-citations appear in the text that reduce any claim to its own inputs by construction. The argument remains internally consistent as a mechanism proposal without load-bearing reductions or hidden circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on domain assumptions about cluster fragmentation and the effectiveness of the proposed mechanisms; no free parameters are visible in the abstract, and neither invented entity comes with independent evidence.

axioms (1)
  • domain assumption Exascale GPU clusters leave behind a fragmented post-landing ecology in which long-resident workloads and transient tasks compete for unstable residual capacity.
    Stated directly in the abstract as the motivating regime.
invented entities (2)
  • Laminar no independent evidence
    purpose: Decentralized probe-first scheduling paradigm
    New system name and architecture introduced to solve the stated problem.
  • Airlock no independent evidence
    purpose: Bounded node-local runtime-survival layer with ordered policy
    New component for converting memory pressure into deterministic survival actions.

pith-pipeline@v0.9.0 · 5471 in / 1338 out tokens · 41243 ms · 2026-05-15T22:20:50.925285+00:00 · methodology

