pith. sign in

arxiv: 2606.13329 · v1 · pith:7LLZIVGMnew · submitted 2026-06-11 · 💻 cs.DC

Work Stealing for the 2D-Mesh Topology of Satellite Constellations in Low Earth Orbit

Pith reviewed 2026-06-27 05:42 UTC · model grok-4.3

classification 💻 cs.DC
keywords work stealingsatellite constellationslow earth orbitmesh topologyasynchronous many-taskspace edge computingload balancing
0
0 comments X

The pith

Neighbor-only work stealing in LEO satellite meshes reduces per-attempt latency with growing advantage as size increases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that in the sparse mesh networks formed by LEO satellite constellations, work stealing limited to direct neighbors avoids multi-hop delays and provides a latency benefit that scales with the number of satellites. It shows through modeling that this local strategy becomes increasingly advantageous at larger scales compared to selecting any worker globally. Emulation experiments on an HPC cluster indicate that load balancing quality stays nearly identical, within 2.2 percent of global stealing, for both balanced and irregular workloads. A sympathetic reader would care because this suggests a practical way to adapt high-performance computing task schedulers to emerging space-based compute clusters without needing full connectivity.

Core claim

Restricting work stealing to directly connected neighbors in the 2D-mesh topology of satellite constellations yields a per-attempt latency advantage that grows with constellation size. The neighbor-only strategy performs within approximately 2.2% of global stealing on both balanced and irregular workloads in emulation, indicating that restricting the victim set does not harm load balancing.

What carries the argument

The neighbor-only work stealing strategy, which limits victim selection to directly connected satellites via inter-satellite links.

Load-bearing premise

The HPC cluster emulation with uniform low-latency links accurately represents the load-balancing outcomes from victim selection in real LEO mesh networks that have variable multi-hop latencies.

What would settle it

Direct comparison of task throughput and steal latencies in a real or high-fidelity simulated LEO satellite network with actual inter-satellite link delays would confirm if neighbor-only stealing maintains its performance parity.

Figures

Figures reproduced from arXiv: 2606.13329 by Dorian Chenet, Jonas Posner, Mia Reitz.

Figure 1
Figure 1. Figure 1: 2D mesh topology formed by optical ISLs in a LEO constellation. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of global (left) and neighbor-only (right) stealing on a [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Strong-scaling execution time (seconds, lower is better) for FIB (left) [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Relative performance of neighbor-only versus global stealing, calculated [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
read the original abstract

Asynchronous Many-Task (AMT) is a parallel programming model used in High Performance Computing (HPC). An AMT runtime can distribute fine-grained tasks across processing units called workers, through work stealing: when a worker has no tasks left to process, it tries to steal tasks from other workers. Workers are not restricted to a single compute node but can also be distributed across multiple nodes of an HPC cluster. Existing AMT runtimes assume a fully connected network with low, uniform latency and perform global work stealing, selecting another worker at random from all workers in the system. Space Edge Computing (SEC) uses constellations of satellites in Low Earth Orbit (LEO) as distributed compute clusters. Unlike HPC clusters, LEO satellites communicate through inter-satellite links that form a sparse mesh topology. Reaching a distant satellite requires multiple hops, each adding latency. As a step toward adapting AMT to SEC, this paper proposes a neighbor-only work stealing strategy in which workers steal exclusively from directly connected neighbors, avoiding multi-hop communication. An analytical model shows that restricting stealing this way yields a per-attempt latency advantage that grows with constellation size. Preliminary experiments on an HPC cluster with an emulated mesh over uniform low-latency links isolate the effect of victim selection: the neighbor-only strategy performs within ~2.2% of global stealing on both balanced and irregular workloads, indicating that restricting the victim set does not harm load balancing in this setting. Taken together, the experiments suggest that neighbor-only stealing can be on a par with global stealing, and the model suggests that neighbor-only stealing becomes preferable at scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a neighbor-only work stealing strategy for Asynchronous Many-Task (AMT) runtimes on the 2D-mesh topology of LEO satellite constellations. An analytical model is claimed to show a per-attempt latency advantage that grows with constellation size when restricting steals to direct neighbors. Preliminary emulation on an HPC cluster with an emulated mesh using uniform low-latency links reports that the neighbor-only strategy performs within ~2.2% of global stealing on both balanced and irregular workloads, suggesting the restriction does not harm load balancing.

Significance. If the latency advantage and performance parity hold, the work provides a concrete step toward adapting HPC AMT models to space edge computing environments with sparse, multi-hop topologies. The separation of the analytical model from the emulation experiments and the direct measurement approach are positive features that could support falsifiable predictions at larger scales.

major comments (2)
  1. [preliminary experiments / emulation results] Emulation results: the setup applies uniform low-latency links to all pairs, which isolates the victim-selection effect but does not model the multi-hop delays that global steals would incur in a real 2D mesh; this leaves untested whether longer steal completion times would change the rate of imbalance correction and thus the observed ~2.2% gap, which is load-bearing for the claim that neighbor-only performs on a par with global stealing.
  2. [analytical model] Analytical model description: the model is stated to be independent of the experiments and to quantify a latency advantage that grows with size, but the equations, workload assumptions, and derivation are not provided, preventing verification that the advantage is not an artifact of the model construction.
minor comments (1)
  1. Workload definitions and error bars are not reported for the emulation results, which would be needed to assess the statistical significance of the ~2.2% figure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point by point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: Emulation results: the setup applies uniform low-latency links to all pairs, which isolates the victim-selection effect but does not model the multi-hop delays that global steals would incur in a real 2D mesh; this leaves untested whether longer steal completion times would change the rate of imbalance correction and thus the observed ~2.2% gap, which is load-bearing for the claim that neighbor-only performs on a par with global stealing.

    Authors: We agree that the emulation deliberately uses uniform low-latency links across all pairs and therefore does not capture the multi-hop latency penalty that global steals would experience on a real 2D mesh. The experimental design was chosen to isolate the effect of restricting the victim set, as stated in the manuscript. This choice means the observed performance parity is demonstrated only under uniform latency; whether the ~2.2% gap would widen, shrink, or reverse when global steals incur additional hops remains untested. In the revised manuscript we will add an explicit limitations paragraph that qualifies the parity claim accordingly and will outline a concrete plan for follow-on experiments that inject per-hop delays. revision: partial

  2. Referee: Analytical model description: the model is stated to be independent of the experiments and to quantify a latency advantage that grows with size, but the equations, workload assumptions, and derivation are not provided, preventing verification that the advantage is not an artifact of the model construction.

    Authors: The manuscript states that an analytical model exists and is independent of the experiments, yet the referee is correct that the full equations, workload assumptions, and derivation steps are not supplied in sufficient detail. We will revise the model section to include the complete mathematical formulation, the precise workload and topology assumptions, and the step-by-step derivation so that readers can reproduce and verify the claimed latency advantage. revision: yes

Circularity Check

0 steps flagged

No circularity: model and measurements are independent of inputs

full rationale

The paper's central claims rest on an analytical model quantifying per-attempt latency advantage (growing with size) and direct emulation measurements showing neighbor-only stealing within 2.2% of global on balanced/irregular workloads. No equations, fitted parameters, or self-citations are shown that would reduce any prediction to its inputs by construction. The emulation isolates victim selection via uniform links and reports measured outcomes rather than fitted results. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract relies on standard assumptions from AMT literature and the domain premise of mesh topology in LEO constellations; no free parameters or invented entities are described.

axioms (1)
  • domain assumption LEO satellite constellations communicate via inter-satellite links forming a sparse 2D mesh with multi-hop latency
    Invoked to motivate the neighbor-only restriction and the latency model

pith-pipeline@v0.9.1-grok · 5830 in / 1189 out tokens · 27015 ms · 2026-06-27T05:42:52.772871+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 18 canonical work pages

  1. [1]

    IEEE Communications Magazine pp

    Garcia-Cabeza, J., Albert-Smet, J., Frias, Z., Mendo, L., Azcoitia, S.A., Yraola, E.: Direct-to-Cell: A First Look into Starlink’s Direct Satellite-to-Device Radio Access Network through Crowdsourced Measurements. IEEE Communications Magazine pp. 1–7 (2025), https://doi.org/10.1109/MCOM.001.2500350 14 Reitz et al

  2. [2]

    In: Euro-Par 2024: Parallel Processing Workshops

    Chenet, D., Lorandel, J., Moy, C., Ramirez-Gallego, S., Alvarez, R.R.: Space Edge Computing for Satellite Systems: Definition and Key Enabling Technolo- gies. In: Euro-Par 2024: Parallel Processing Workshops. Lecture Notes in Com- puter Science, vol. 15386, pp. 399–404. Springer (2024), https://doi.org/10.1007/ 978-3-031-90203-1_48

  3. [3]

    org/10.48550/arXiv.2505.22128

    Mousist, A.D.: Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach (2025), arXiv preprint arXiv:2505.22128, https://doi. org/10.48550/arXiv.2505.22128

  4. [4]

    Tuto- rial at European Network on High-performance Embedded Architecture and Com- pilation Conference (HiPEAC) (2020), https://doi.org/10.5281/zenodo.8425959

    Fohry, C.: TaPP: An Overview of Task-Based Parallel Programming Models. Tuto- rial at European Network on High-performance Embedded Architecture and Com- pilation Conference (HiPEAC) (2020), https://doi.org/10.5281/zenodo.8425959

  5. [5]

    Journal of the ACM (JACM) 46(5), 720–748 (1999), https://doi.org/10

    Blumofe, R.D., Leiserson, C.E.: Scheduling Multithreaded Computations by Work Stealing. Journal of the ACM (JACM) 46(5), 720–748 (1999), https://doi.org/10. 1145/324133.324234

  6. [6]

    In: Euro-Par 2024: Parallel Processing Workshops

    Reitz, M., Gerhards, B., Hundhausen, J., Fohry, C.: Investigating the Performance Difference of Task Communication via Futures or Side Effects. In: Euro-Par 2024: Parallel Processing Workshops. pp. 255–267. Lecture Notes in Computer Science, Springer (2024), https://doi.org/10.1007/978-3-031-90200-0_21

  7. [7]

    In: International Conference for High Performance Com- puting, Networking, Storage and Analysis (SC)

    Shiina, S., Taura, K.: Itoyori: Reconciling Global Address Space and Global Fork- Join Task Parallelism. In: International Conference for High Performance Com- puting, Networking, Storage and Analysis (SC). pp. 1–15. ACM (2023), https: //doi.org/10.1145/3581784.3607049

  8. [8]

    IEEE Communications Surveys & Tu- torials 18(4), 2442–2473 (2016), https://doi.org/10.1109/COMST.2016.2564990

    Radhakrishnan, R., Edmonson, W.W., Afghah, F., Rodriguez-Osorio, R.M., Pinto, F., Burleigh, S.C.: Survey of inter-satellite communication for small satellite sys- tems: Physical layer to network layer view. IEEE Communications Surveys & Tu- torials 18(4), 2442–2473 (2016), https://doi.org/10.1109/COMST.2016.2564990

  9. [9]

    Indian Journal of Science and Technology 13(6), 712–724 (2020), https://doi.org/10.17485/ijst/2020/v13i06/147998

    Tiwari, G., Chauhan, R.C.S.: A review on inter-satellite links free space optical communication. Indian Journal of Science and Technology 13(6), 712–724 (2020), https://doi.org/10.17485/ijst/2020/v13i06/147998

  10. [10]

    In: IEEE International Conference on Communications (ICC)

    Soret, B., Ravikanti, S., Popovski, P.: Latency and timeliness in multi-hop satellite networks. In: IEEE International Conference on Communications (ICC). pp. 1–6. IEEE (2020), https://doi.org/10.1109/ICC40277.2020.9149009

  11. [11]

    IEEE Transactions on Network and Service Management 20(4), 4815–4830 (2023), https://doi.org/10.1109/TNSM.2023.3274638

    Yang, H., Guo, B., Xue, X., Deng, X., Zhao, Y., Cui, X., Pang, C., Ren, H., Huang, S.: Interruption tolerance strategy for LEO constellation with optical inter-satellite link. IEEE Transactions on Network and Service Management 20(4), 4815–4830 (2023), https://doi.org/10.1109/TNSM.2023.3274638

  12. [12]

    https://www

    MPI Forum: MPI: A Message-Passing Interface Standard Version 5.0. https://www. mpi-forum.org/docs/mpi-5.0/mpi50-report.pdf (2025), accessed: 2026-04-21

  13. [13]

    11354328

    Reitz, M.: Source Code of paper "Work Stealing for the 2D-Mesh Topology of Satel- lite Constellations in Low Earth Orbit" (2026), https://doi.org/10.5281/zenodo. 20643665

  14. [14]

    https://www.top500.org/system/179588 (2018), accessed: 2026- 04-21

    TOP500.org: GOETHE-HLR cluster (MEGW ARE MiriQuid, Xeon Gold 6148, In- finiBand EDR). https://www.top500.org/system/179588 (2018), accessed: 2026- 04-21

  15. [15]

    In: Languages and Compilers for Parallel Computing (LCPC), Lecture Notes in Computer Science, vol

    Olivier, S., Huan, J., Liu, J., Prins, J., Dinan, J., Sadayappan, P., Tseng, C.W.: UTS: An Unbalanced Tree Search Benchmark. In: Languages and Compilers for Parallel Computing (LCPC), Lecture Notes in Computer Science, vol. 4382, pp. 235–250. Springer (2007), https://doi.org/10.1007/978-3-540-72521-3_18

  16. [16]

    IEEE Access 10, 100584–100593 (2022), https://doi.org/10.1109/ ACCESS.2022.3207293

    Wang, Y., Che, J., Wang, N., Liu, L., Wu, N., Zhong, X., Han, X.: Load-Balancing Method for LEO Satellite Edge-Computing Networks Based on the Maximum Flow WS for the 2D-Mesh Topology in SEC 15 of Virtual Links. IEEE Access 10, 100584–100593 (2022), https://doi.org/10.1109/ ACCESS.2022.3207293

  17. [17]

    In: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

    Denby, B., Lucia, B.: Orbital Edge Computing: Nanosatellite Constellations as a New Class of Computer System. In: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). pp. 939–

  18. [18]

    ACM (2020), https://doi.org/10.1145/3373376.3378473

  19. [19]

    In: Conference on Partitioned Global Address Space Programming Models (PGAS)

    Min, S.J., Iancu, C., Yelick, K.: Hierarchical Work Stealing on Manycore Clusters. In: Conference on Partitioned Global Address Space Programming Models (PGAS). ACM (2011)

  20. [20]

    In: ACM Symposium on Principles and Practice of Parallel Programming (PPoPP)

    Saraswat, V.A., Kambadur, P., Kodali, S., Grove, D., Krishnamoorthy, S.: Lifeline- based Global Load Balancing. In: ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). pp. 201–212. ACM (2011), https://doi.org/10. 1145/1941553.1941582

  21. [21]

    IEEE Internet of Things Journal 3(5), 637–646 (2016), https://doi.org/10.1109/ JIOT.2016.2579198

    Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3(5), 637–646 (2016), https://doi.org/10.1109/ JIOT.2016.2579198

  22. [22]

    IEEE Communications Surveys & Tutorials 19(4), 2322–2358 (2017), https://doi.org/10.1109/COMST.2017.2745201

    Mao, Y., You, C., Zhang, J., Huang, K., Letaief, K.B.: A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Communications Surveys & Tutorials 19(4), 2322–2358 (2017), https://doi.org/10.1109/COMST.2017.2745201

  23. [23]

    Par- allel Processing Letters (PPL) 23(4), 1340011 (2013), https://doi.org/10.1142/ s0129626413400112

    Shahzad, F., Wittmann, M., Kreutzer, M., Zeiser, T., Hager, G., Wellein, G.: A survey of checkpoint/restart techniques on distributed memory systems. Par- allel Processing Letters (PPL) 23(4), 1340011 (2013), https://doi.org/10.1142/ s0129626413400112

  24. [24]

    Future Generation Computer Systems (FGCS) 105, 119–134 (2020), https://doi.org/10.1016/j.future.2019.11.031

    Posner, J., Reitz, M., Fohry, C.: A Comparison of Application-Level Fault Toler- ance Schemes for Task Pools. Future Generation Computer Systems (FGCS) 105, 119–134 (2020), https://doi.org/10.1016/j.future.2019.11.031

  25. [25]

    Super- vision

    Posner, J., Reitz, M., Fohry, C.: Task-Level Resilience: Checkpointing vs. Super- vision. International Journal of Networking and Computing (IJNC) 12(1), 47–72 (2022), https://doi.org/10.15803/ijnc.12.1_47

  26. [26]

    Journal of Software Engineering and Applications (JSEA) 10(13), 925– 958 (2017), https://doi.org/10.4236/jsea.2017.1013053

    Fohry, C., Bungart, M., Plock, P.: Fault Tolerance for Lifeline-Based Global Load Balancing. Journal of Software Engineering and Applications (JSEA) 10(13), 925– 958 (2017), https://doi.org/10.4236/jsea.2017.1013053

  27. [27]

    In: IEEE International Parallel and Distributed Processing Symposium (IPDPS)

    Kestor, G., Krishnamoorthy, S., Ma, W.: Localized Fault Recovery for Nested Fork-Join Programs. In: IEEE International Parallel and Distributed Processing Symposium (IPDPS). pp. 397–408. IEEE (2017), https://doi.org/10.1109/ipdps. 2017.75

  28. [28]

    In: International Conference on Parallel Processing Workshops (ICPP Workshops)

    Posner, J., Fohry, C.: Transparent Resource Elasticity for Task-Based Cluster Envi- ronments with Work Stealing. In: International Conference on Parallel Processing Workshops (ICPP Workshops). pp. 1–10. ACM (2021), https://doi.org/10.1145/ 3458744.3473361

  29. [29]

    SN Computer Science 6(8) (2025), https://doi.org/10.1007/s42979-025-04405-3

    Posner, J., Ellersiek, T., Bietendorf, N., Huber, D., Schreiber, M., Schulz, M.: Toward Dynamic Resource Management: An Asynchronous Many-Task (AMT) Runtime System leveraging Dynamic Processes with PSets (DPP). SN Computer Science 6(8) (2025), https://doi.org/10.1007/s42979-025-04405-3