Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks

Liang Li; Songge Zhang; Wen Wu; Xuemin (Sherman) Shen; Ye Wang

arxiv: 2604.04654 · v1 · submitted 2026-04-06 · 💻 cs.DC


Songge Zhang, Wen Wu, Liang Li, Ye Wang, Xuemin (Sherman) Shen

Pith reviewed 2026-05-10 19:05 UTC · model grok-4.3

classification 💻 cs.DC

keywords LEO satellite networks · collaborative LLM inference · model splitting · activation compression · pipeline parallelism · inference delay · communication overhead · A-star optimization


The pith

Splitting large language models across multiple LEO satellites with adaptive compression and pipeline parallelism cuts inference delay by up to 42% and communication overhead by up to 71% while keeping accuracy loss under 1%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that large language models become practical for LEO satellite networks by dividing each model into sub-models placed on different satellites that exchange intermediate activations during inference. Pipeline parallelism overlaps sub-model computation with activation transmission to shrink overall delay, and an adaptive compression method limits error buildup across the multiple split stages. The authors turn the problem of choosing split points and compression ratios into a shortest-path search on a directed acyclic graph and solve it with a modified A* algorithm under constraints on memory and accuracy. A reader would care because single satellites lack the memory and speed for full LLMs yet AI on LEO constellations is needed for Earth observation.

Core claim

The authors establish that collaborative LLM inference, achieved by splitting the model across satellites, applying pipeline parallelism to overlap inference and transmission, and using adaptive activation compression to control cumulative errors, when optimized jointly via a graph-based search, delivers up to 42% lower inference delay and 71% lower communication overhead than existing benchmarks while holding accuracy loss below 1%.

What carries the argument

The transformation of joint model splitting and compression-ratio selection into a shortest-path problem on a directed acyclic graph whose edges carry explicit delay costs from each split-compression choice, solved by a modified A* algorithm.
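
To make the reduction concrete, here is a minimal Python sketch of A* over a split-and-compress graph. It is not the authors' modified A* algorithm. The state layout, the cost model (segment compute time plus compressed activation size divided by a single link rate), the integer accuracy-penalty budget, and every name (`plan_splits`, `compute`, `mem`, `act_bits`, `penalty`, `budget`) are assumptions made for illustration.

```python
import heapq
import itertools


def plan_splits(compute, mem, act_bits, mem_cap, rate, ratios, penalty, budget):
    """A* over states (layer boundary, accuracy penalty used).

    An edge i -> j with ratio r places layers i..j-1 on one satellite and
    sends the activation after layer j-1 compressed to a fraction r.
    Returns (total_delay, [(start, end, ratio), ...]) or None if infeasible.
    All inputs are hypothetical toy quantities, not values from the paper.
    """
    n = len(compute)
    # Admissible heuristic: compute time still to be paid, ignoring transmission.
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + compute[i]

    tie = itertools.count()  # tie-breaker so the heap never compares paths
    start = (0, 0)
    best = {start: 0.0}
    frontier = [(remaining[0], next(tie), 0.0, start, [])]
    while frontier:
        _, _, g, (i, used), path = heapq.heappop(frontier)
        if i == n:
            return g, path
        if g > best.get((i, used), float("inf")):
            continue  # stale queue entry
        seg_compute = seg_mem = 0.0
        for j in range(i + 1, n + 1):
            seg_compute += compute[j - 1]
            seg_mem += mem[j - 1]
            if seg_mem > mem_cap:
                break  # onboard memory constraint violated
            # The final segment sends nothing onward in this toy model.
            options = [(None, 0)] if j == n else [(r, penalty[r]) for r in ratios]
            for r, p in options:
                if used + p > budget:
                    continue  # accuracy constraint violated
                tx = 0.0 if r is None else act_bits[j - 1] * r / rate
                state = (j, used + p)
                cost = g + seg_compute + tx
                if cost < best.get(state, float("inf")):
                    best[state] = cost
                    heapq.heappush(
                        frontier,
                        (cost + remaining[j], next(tie), cost, state, path + [(i, j, r)]),
                    )
    return None
```

On a toy four-layer model with unit compute and memory per layer, a memory cap of 2, 8-unit activations, a link rate of 4, ratios `[1.0, 0.5]` with penalties `{1.0: 0, 0.5: 1}`, and a budget of 1, the search returns a delay of 5.0 with one split after layer 2 at ratio 0.5. Because the heuristic never exceeds the true remaining cost, the first goal state popped is optimal under this cost model.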

If this is right

LLMs become usable for onboard intelligent Earth observation on memory-limited LEO satellites by distributing computation across the constellation.
Pipeline parallelism reduces total latency by hiding transmission time behind computation when activations are exchanged between satellites.
The graph-search optimizer finds split points and compression ratios that respect onboard memory limits while meeting accuracy targets.
Communication volume drops substantially because only compressed activations, not full model parameters or raw data, travel between satellites.
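
The latency-hiding effect of pipeline parallelism can be sketched with a simple timing model. The Python below is an illustration under stated assumptions, not the paper's delay model: each satellite and each link is treated as a station that handles one micro-batch at a time, micro-batches are identical, buffering is unlimited, and the function names are hypothetical.

```python
def sequential_delay(stage_times, num_microbatches):
    """No overlap: each micro-batch traverses every station before the next starts."""
    return num_microbatches * sum(stage_times)


def pipelined_delay(stage_times, num_microbatches):
    """Overlap: a station starts a micro-batch once it is free and the input has arrived."""
    finish = [0.0] * len(stage_times)  # finish time of the previous micro-batch per station
    for _ in range(num_microbatches):
        ready = 0.0  # time this micro-batch leaves the previous station
        for s, t in enumerate(stage_times):
            start = max(ready, finish[s])
            finish[s] = start + t
            ready = finish[s]
    return finish[-1] if stage_times else 0.0


def pipelined_closed_form(stage_times, num_microbatches):
    """For identical micro-batches: fill the pipeline once, then the slowest station paces it."""
    return sum(stage_times) + (num_microbatches - 1) * max(stage_times)
```

With toy station times `[4, 2, 4, 2, 4]` (compute, transmit, compute, transmit, compute) and four micro-batches, the sequential delay is 64 time units and the pipelined delay is 28. These numbers are illustrative only and unrelated to the paper's reported results. The closed form also shows why compression helps only until transmission stops being the slowest station.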

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same splitting-plus-compression pattern could support LLM inference on other distributed platforms with tight per-node memory, such as drone swarms or remote sensor arrays.
Hardware validation on actual satellites would reveal whether modeled delays and topologies hold under real orbital motion and link variability.
Combining the adaptive compression with quantization or other LLM-specific reductions might yield further overhead savings.
The approach implies that model parallelism across space networks can turn a constellation into a single logical inference engine for time-sensitive tasks.

Load-bearing premise

The adaptive activation compression scheme keeps cumulative errors from multi-stage splitting small enough to preserve accuracy, and the modeled communication delays and satellite topologies match real LEO network behavior.
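
To see why this premise is load-bearing, consider a rough sketch of how compression error can compound across split stages. The Python below uses plain uniform quantization as a stand-in for the paper's compression scheme, which is not reproduced here, and a worst-case recursion e_k = L_k · e_(k-1) + h_k. The per-stage Lipschitz constants `lipschitz` and half-steps `half_steps` are hypothetical inputs chosen for illustration.

```python
def uniform_quantize(values, bits):
    """Uniformly quantize values onto 2**bits levels spanning [min, max].

    Returns (quantized_values, step). Each value moves by at most step / 2.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo or levels == 0:
        return [lo for _ in values], 0.0 if hi == lo else hi - lo
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values], step


def accumulated_error_bound(half_steps, lipschitz):
    """Worst-case activation error after each stage.

    Stage k amplifies the incoming error by at most lipschitz[k], then
    quantizing its output adds at most half_steps[k].
    """
    bound, bounds = 0.0, []
    for h, l in zip(half_steps, lipschitz):
        bound = l * bound + h
        bounds.append(bound)
    return bounds
```

For example, `accumulated_error_bound([0.5, 0.5, 0.5], [1.0, 2.0, 2.0])` gives `[0.5, 1.5, 3.5]`: with equal per-stage quantization error, the bound grows faster than linearly once later stages amplify what they receive. This is one way to read the motivation for choosing compression ratios per stage rather than uniformly.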

What would settle it

Running the proposed splitting, compression, and pipelining strategy on a real LEO satellite testbed or high-fidelity emulator and checking whether measured delay reductions reach 42%, overhead reductions reach 71%, and accuracy loss stays below 1% or whether errors accumulate faster than predicted.

Figures

Figures reproduced from arXiv: 2604.04654 by Liang Li, Songge Zhang, Wen Wu, Xuemin (Sherman) Shen, Ye Wang.

**Figure 2.** Parallel computing process for model splitting. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 5.** Inference delay under different satellite numbers. [PITH_FULL_IMAGE:figures/full_fig_p010_5.png]

**Figure 4.** Inference delay with respect to S2G rates. [PITH_FULL_IMAGE:figures/full_fig_p010_4.png]

**Figure 9.** Accuracy performance of LLM on the EuroSAT and [PITH_FULL_IMAGE:figures/full_fig_p011_9.png]

**Figure 8.** Ablation Study of Compression Methods. [PITH_FULL_IMAGE:figures/full_fig_p011_8.png]

**Figure 10.** Validation accuracy under various LLM splitting [PITH_FULL_IMAGE:figures/full_fig_p012_10.png]

**Figure 12.** Total inference delay under different split strategies. [PITH_FULL_IMAGE:figures/full_fig_p012_12.png]

Original abstract

Low Earth orbit (LEO) satellites play an essential role in intelligent Earth observation by leveraging artificial intelligence models. However, limited onboard memory and excessive inference delay prevent the practical deployment of large language models (LLMs) on a single satellite. In this paper, we propose a communication-efficient collaborative LLM inference scheme for LEO satellite networks. Specifically, the entire LLM is split into multiple sub-models, with each deployed on a satellite, thereby enabling collaborative LLM inference via exchanging intermediate activations between satellites. The proposed scheme also leverages the pipeline parallelism mechanism that overlaps sub-model inference with intermediate activation transmission, thereby reducing LLM inference delay. An adaptive activation compression scheme is designed to mitigate cumulative errors from multi-stage model splitting while preserving inference accuracy. Furthermore, we formulate the LLM inference delay minimization problem by jointly optimizing model splitting and compression ratios under onboard memory and inference accuracy constraints. The problem is transformed into a shortest-path search problem over a directed acyclic graph whose edge weights explicitly quantify the inference delay induced by model splitting and compression strategies, which is solved via a modified A*-based search algorithm. Extensive simulation results indicate that the proposed solution can reduce inference delay by up to 42% and communication overhead by up to 71% compared to state-of-the-art benchmarks, while keeping the inference accuracy loss below 1%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper ports model splitting, pipeline overlap, and adaptive activation compression to LLM inference across LEO satellites, solved via A* on a DAG, and reports 42% delay and 71% overhead cuts in simulation, but the network delay model looks too static.


The main thing to know is that the authors split an LLM into sub-models placed on separate LEO satellites, overlap sub-model inference with activation transmission through pipeline parallelism, add adaptive compression to control error accumulation across stages, and jointly optimize split points plus compression ratios by turning the delay minimization into a shortest-path problem on a DAG solved with modified A* search. Simulations show up to 42% lower inference delay and 71% less communication overhead versus benchmarks while keeping accuracy loss below 1% under memory and accuracy constraints.

They do a clean job of the formulation. The edge weights directly encode inference time plus compressed transmission cost for each choice, and the A* approach finds feasible configurations efficiently without needing to enumerate everything. The adaptive compression step is a practical response to the cumulative quantization error that multi-stage splitting creates.

The soft spot is the evaluation. The underlying per-link delay and connectivity model does not appear to incorporate time-varying inter-satellite distances, handovers, or Doppler-induced rate changes that are central to real LEO constellations. If the simulations rely on static or averaged links, the reported pipeline gains and the compression ratios that stay under 1% accuracy loss become harder to trust. The paper does not show error bars or detailed baseline descriptions in the abstract, which leaves the percentage improvements difficult to judge.

This work is aimed at researchers focused on distributed inference in satellite or other severely constrained networks. A reader working on edge LLM deployment or space-based AI would find the optimization framing and compression handling useful. It deserves a serious referee because the core reduction to shortest-path search is formal and the claims are concrete enough to test, even if the network assumptions need closer scrutiny.

Referee Report

3 major / 2 minor

Summary. The paper proposes a collaborative LLM inference framework for LEO satellite networks in which the model is partitioned across satellites, pipeline parallelism overlaps sub-model execution with activation transmission, and an adaptive compression scheme controls cumulative quantization error. Delay minimization is formulated as a joint optimization over splitting points and per-stage compression ratios, reduced to a shortest-path problem on a DAG whose edge weights encode inference and communication costs, and solved with a modified A* algorithm. Simulations report up to 42% lower inference delay and 71% lower communication overhead versus benchmarks while keeping accuracy loss below 1%.

Significance. If the reported gains hold under realistic orbital dynamics, the work would provide a practical route to running large models on memory-limited satellites by distributing both computation and communication. The explicit DAG encoding of pipeline overlap and compression trade-offs is a clean technical contribution that makes the joint optimization tractable.

major comments (3)

§5 (Simulation Results) and the LEO topology model description: the reported 42% delay and 71% overhead reductions rest on a communication-delay model whose fidelity to time-varying LEO effects (changing inter-satellite distances, handovers, Doppler-induced rate fluctuations) is not demonstrated. If the model uses static or orbit-averaged link parameters, both the pipeline-overlap gains and the feasibility of the chosen compression ratios become overstated.
Adaptive activation compression section (method description preceding §4): the scheme is asserted to mitigate cumulative errors from multi-stage splitting, yet the paper supplies only end-to-end accuracy figures without error-propagation analysis, ablation on per-stage quantization, or bounds showing that the <1% loss remains stable when pipeline stages experience network-induced retransmissions or variable latency.
§4 (Problem Formulation): the reduction to a static shortest-path instance solved once by modified A* assumes fixed edge weights. Because LEO connectivity is inherently time-varying, a path computed at the start of inference may cease to be optimal or even feasible mid-execution, weakening the central delay-minimization claim.

minor comments (2)

Abstract and §5: both refer to “state-of-the-art benchmarks” without naming the exact baselines or their splitting/compression strategies; this should be stated explicitly when the performance numbers are introduced.
§3 and §4: notation for compression ratios and activation sizes is introduced without a consolidated table; a single reference table would improve readability when the DAG edge-weight definitions are presented.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our work. We address each major comment point by point below, indicating planned revisions where the manuscript requires strengthening.


Referee: §5 (Simulation Results) and the LEO topology model description: the reported 42% delay and 71% overhead reductions rest on a communication-delay model whose fidelity to time-varying LEO effects (changing inter-satellite distances, handovers, Doppler-induced rate fluctuations) is not demonstrated. If the model uses static or orbit-averaged link parameters, both the pipeline-overlap gains and the feasibility of the chosen compression ratios become overstated.

Authors: We acknowledge that the LEO topology model in the manuscript relies on orbit-averaged link parameters derived from standard orbital mechanics to represent average inter-satellite distances and rates. This simplification is common in initial satellite-network studies to focus on the optimization framework. We agree that explicit demonstration of robustness under full time-varying dynamics (handovers, Doppler fluctuations) would strengthen the claims. In the revised manuscript we will expand the topology-model description to detail the averaging procedure, add a sensitivity analysis subsection in §5 that incorporates dynamic link variations, and report additional simulation results under time-varying conditions to quantify any degradation in the reported gains.

Revision: yes

Referee: Adaptive activation compression section (method description preceding §4): the scheme is asserted to mitigate cumulative errors from multi-stage splitting, yet the paper supplies only end-to-end accuracy figures without error-propagation analysis, ablation on per-stage quantization, or bounds showing that the <1% loss remains stable when pipeline stages experience network-induced retransmissions or variable latency.

Authors: The adaptive compression scheme selects per-stage ratios to control cumulative quantization error, but we agree that the current presentation provides only aggregate accuracy results. We will revise the method section to include (i) a brief error-propagation analysis deriving an upper bound on accumulated error under the assumed pipeline, (ii) an ablation study isolating per-stage quantization effects, and (iii) additional experiments that inject simulated retransmissions and latency jitter to verify that accuracy loss remains below 1%. These additions will be placed before §4 and referenced in the simulation results.

Revision: yes

Referee: §4 (Problem Formulation): the reduction to a static shortest-path instance solved once by modified A* assumes fixed edge weights. Because LEO connectivity is inherently time-varying, a path computed at the start of inference may cease to be optimal or even feasible mid-execution, weakening the central delay-minimization claim.

Authors: The current formulation computes the splitting and compression solution once using the network state at inference start. We agree that a purely static solution is insufficient for time-varying LEO topologies. In the revision we will explicitly state in §4 that the modified A* search is re-executed periodically or upon detection of significant link-quality changes (e.g., handovers), and we will quantify the re-optimization overhead relative to inference latency. This adaptive re-optimization strategy preserves the tractability of the DAG formulation while addressing mid-execution topology shifts.

Revision: yes
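
The re-optimization trigger described in the last response could look something like the following Python sketch. The rebuttal does not specify a detection rule, so the relative-change threshold, the dictionary layout keyed by link name, and the function name `needs_replan` are all assumptions for illustration.

```python
def needs_replan(planned_rates, current_rates, rel_threshold=0.2):
    """Return True if any link used by the current plan is gone or has drifted.

    planned_rates: link name -> rate assumed when the split plan was computed.
    current_rates: link name -> most recently measured rate.
    rel_threshold: hypothetical tolerance on relative rate change.
    """
    for link, planned in planned_rates.items():
        current = current_rates.get(link)
        if current is None or current <= 0:
            return True  # link lost, for example after a handover
        if abs(current - planned) / planned > rel_threshold:
            return True  # rate moved enough that edge weights are stale
    return False
```

A scheduler would call this between inference requests and re-run the graph search only when it returns True, which keeps the re-optimization overhead bounded by how often links actually change.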

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper transforms the joint optimization of model splitting points and compression ratios into a standard shortest-path problem on a DAG whose edge weights are defined directly from the per-stage inference and transmission delays; this is solved by a modified A* algorithm. No step reduces a claimed prediction or first-principles result to a fitted parameter, self-defined quantity, or load-bearing self-citation. The adaptive compression scheme is presented as an explicit design choice to bound cumulative quantization error, and the reported simulation gains are empirical outcomes under the stated model rather than tautological outputs of the formulation itself. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about satellite communication delays and network topology plus the effectiveness of the proposed compression in controlling error accumulation; no explicit free parameters or invented entities are detailed in the abstract.

axioms (2)

Domain assumption: Satellite network topologies and inter-satellite communication delays can be accurately modeled for delay minimization.
Invoked when formulating the LLM inference delay minimization problem and transforming it into a graph search.
Domain assumption: Adaptive compression ratios can be chosen to bound cumulative activation errors below a threshold that keeps overall accuracy loss under 1%.
Central to the adaptive activation compression scheme and the accuracy constraint in the optimization.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: formulate the LLM inference delay minimization problem by jointly optimizing model splitting and compression ratios... transformed into a shortest-path search problem over a directed acyclic graph... solved via a modified A Star-based search algorithm

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: adaptive activation compression scheme... Gumbel-mask... quantization... entropy-based coding

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SpaceMoE: Towards Orbital General Intelligence with Distributed Mixture-of-Experts Inference
cs.NI 2026-05 unverdicted novelty 4.0

SpaceMoE is presented as a new paradigm for distributed MoE inference in satellite networks, with satellite-specific constraints reshaping expert placement, selection, and hidden-state routing.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 1 Pith paper

[1]

Holistic network virtualization and pervasive network intelligence for 6G,

X. Shen, J. Gao, W. Wu, M. Li, C. Zhou, and W. Zhuang, “Holistic network virtualization and pervasive network intelligence for 6G,”IEEE Commun. Surveys Tuts., vol. 24, no. 1, pp. 1–30, 2022

work page 2022
[2]

Generative AI agents with large language model for satellite networks via a mixture of experts transmission,

R. Zhang, H. Du, Y . Liu, D. Niyato, J. Kang, Z. Xiong, A. Jamalipour, and D. In Kim, “Generative AI agents with large language model for satellite networks via a mixture of experts transmission,”IEEE J. Sel. Areas Commun., vol. 42, no. 12, pp. 3581–3596, 2024

work page 2024
[3]

LEOEdge: A satellite-ground cooperation platform for the AI inference in large LEO constellation,

S. Yao, Y . Lin, M. Wang, K. Xu, M. Xu, C. Xu, and H. Zhang, “LEOEdge: A satellite-ground cooperation platform for the AI inference in large LEO constellation,”IEEE J. Sel. Areas Commun., vol. 43, no. 1, pp. 36–50, 2025

work page 2025
[4]

Satellite federated edge learning: Architecture design and convergence analysis,

Y . Shi, L. Zeng, J. Zhu, Y . Zhou, C. Jiang, and K. B. Letaief, “Satellite federated edge learning: Architecture design and convergence analysis,” IEEE Trans. Wireless Commun., vol. 23, no. 10, pp. 15 212–15 229, 2024

work page 2024
[5]

Enhancing remote sensing image scene classification with satellite-terrestrial collaboration and attention-aware transmission policy,

A. Lu, Y . Hu, Z. Cao, J. Liu, L. Li, and Z. Li, “Enhancing remote sensing image scene classification with satellite-terrestrial collaboration and attention-aware transmission policy,”IEEE Trans. Mobile Comput., vol. 24, no. 5, pp. 4496–4509, 2025

work page 2025
[6]

AI-native network slicing for 6G networks,

W. Wu, C. Zhou, M. Li, H. Wu, H. Zhou, N. Zhang, X. S. Shen, and W. Zhuang, “AI-native network slicing for 6G networks,”IEEE Wireless Commun., vol. 29, no. 1, pp. 96–103, 2022

work page 2022
[7]

Efficient federated learning for modern NLP,

D. Cai, Y . Wu, S. Wang, F. X. Lin, and M. Xu, “Efficient federated learning for modern NLP,” inProc. ACM Mobicom, Oct. 2023, pp. 1– 16

work page 2023
[8]

DeViT: Decomposing vision transformers for collaborative inference in edge devices,

G. Xu, Z. Hao, Y . Luo, H. Hu, J. An, and S. Mao, “DeViT: Decomposing vision transformers for collaborative inference in edge devices,”IEEE Trans. Mobile Comput., vol. 23, no. 5, pp. 5917–5932, Sep. 2024

work page 2024
[9]

Dual vision transformer,

T. Yao, Y . Li, Y . Pan, Y . Wang, X. Zhang, and T. Mei, “Dual vision transformer,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 9, pp. 10 870–10 882, Apr. 2023

work page 2023
[10]

Minilm: deep self-attention distillation for task-agnostic compression of pre- trained transformers,

W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou, “Minilm: deep self-attention distillation for task-agnostic compression of pre- trained transformers,” inProc. NeurIPS, 2020, pp. 5776–5788

work page 2020
[11]

Movement pruning: Adaptive sparsity by fine-tuning,

V . Sanh, T. Wolf, and A. Rush, “Movement pruning: Adaptive sparsity by fine-tuning,” inProc. NeurIPS, 2020, pp. 20 378–20 389

work page 2020
[12]

Optq: Accurate quantization for generative pre-trained transformers,

E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, “Optq: Accurate quantization for generative pre-trained transformers,” inProc. ICLR, 2023, pp. 1–16

work page 2023
[13]

Split learning over wireless networks: Parallel design and resource management,

W. Wu, M. Li, K. Qu, C. Zhou, X. Shen, W. Zhuang, X. Li, and W. Shi, “Split learning over wireless networks: Parallel design and resource management,”IEEE J. Sel. Areas Commun., vol. 41, no. 4, pp. 1051– 1066, 2023

work page 2023
[14]

Ftpipehd: A fault-tolerant pipeline-parallel distributed training approach for het- erogeneous edge devices,

Y . Chen, Q. Yang, S. He, Z. Shi, J. Chen, and M. Guizani, “Ftpipehd: A fault-tolerant pipeline-parallel distributed training approach for het- erogeneous edge devices,”IEEE Trans. Mobile Comput., vol. 23, no. 4, pp. 3200–3212, 2024

work page 2024
[15]

Efficient model training in edge networks with hierarchical split learning,

S. Zhang, W. Wu, L. Song, and X. Shen, “Efficient model training in edge networks with hierarchical split learning,”IEEE Trans. Mobile Comput., vol. 24, no. 10, pp. 10 214–10 229, 2025

work page 2025
[16]

Edge-assisted multi-layer offloading optimization of leo satellite-terrestrial integrated networks,

X. Cao, B. Yang, Y . Shen, C. Yuen, Y . Zhang, Z. Han, H. V . Poor, and L. Hanzo, “Edge-assisted multi-layer offloading optimization of leo satellite-terrestrial integrated networks,”IEEE J. Sel. Areas Commun., vol. 41, no. 2, pp. 381–398, 2023

work page 2023
[17]

The Sat-1 mission: The first on-board deep neural network demonstrator for satellite earth observation,

G. Giuffrida, L. Fanucci, G. Meoni, M. Bati ˇc, L. Buckley, A. Dunne, C. van Dijk, M. Esposito, J. Hefele, N. Vercruyssen, G. Furano, M. Pastena, and J. Aschbacher, “The Sat-1 mission: The first on-board deep neural network demonstrator for satellite earth observation,”IEEE Trans. Geosci. Remote Sens., vol. 60, no. 1, pp. 1–14, 2022

work page 2022
[18]

Efficient FPGA-accelerated convolutional neural networks for cloud detection on cubesats,

A. Cratere, M. S. Farissi, A. Carbone, M. Asciolla, M. Rizzi, F. Dell’Olio, A. Nascetti, and D. Spiller, “Efficient FPGA-accelerated convolutional neural networks for cloud detection on cubesats,” IEEE J. Miniatur. Air Space Syst., Jan. 15 2025, Early Access, doi:10.1109/TMC.2025.3569407

work page doi:10.1109/tmc.2025.3569407 2025
[19]

Object knowledge distillation for joint detection and tracking in satellite videos,

W. Zhang, W. Deng, Z. Cui, J. Liu, and L. Jiao, “Object knowledge distillation for joint detection and tracking in satellite videos,”IEEE Geosci. Remote Sens. Lett., vol. 62, no. 1, pp. 1–13, 2024

work page 2024
[20]

An efficient privacy-aware split learning framework for satellite communications,

J. Sun, C. Wu, S. Mumtaz, J. Tao, M. Cao, M. Wang, and V . Fras- colla, “An efficient privacy-aware split learning framework for satellite communications,”IEEE J. Sel. Areas Commun., vol. 42, no. 12, pp. 3355–3365, 2024

work page 2024
[21]

AI-assisted network-slicing based next-generation wireless networks,

X. Shen, J. Gao, W. Wu, K. Lyu, M. Li, W. Zhuang, X. Li, and J. Rao, “AI-assisted network-slicing based next-generation wireless networks,” IEEE Open J. Veh. Technol, vol. 1, pp. 45–66, 2020

work page 2020
[22]

Collaborative inference in DNN-based satellite systems with dynamic task streams,

J. Guan, Q. Zhang, I. Murturi, P. K. Donta, S. Dustdar, and S. Wang, “Collaborative inference in DNN-based satellite systems with dynamic task streams,” inProc. IEEE ICC, 2024, pp. 3803–3808

work page 2024
[23]

Towards space intelligence: Adaptive scheduling of satellite-ground collaborative model inference with space edge computing,

Y . Wang, K. Zhao, X. Zhang, and X. Chen, “Towards space intelligence: Adaptive scheduling of satellite-ground collaborative model inference with space edge computing,” inProc. IEEE INFOCOM WKSHPS, 2024, pp. 1–6

work page 2024
[24]

HiTDL: High- throughput deep learning inference at the hybrid mobile edge,

J. Wu, L. Wang, Q. Pei, X. Cui, F. Liu, and T. Yang, “HiTDL: High- throughput deep learning inference at the hybrid mobile edge,”IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 12, pp. 4499–4514, 2022

work page 2022
[25]

Throughput maximization of delay-aware DNN inference in edge computing by exploring DNN model partitioning and inference parallelism,

J. Li, W. Liang, Y . Li, Z. Xu, X. Jia, and S. Guo, “Throughput maximization of delay-aware DNN inference in edge computing by exploring DNN model partitioning and inference parallelism,”IEEE J. Sel. Areas Commun., vol. 22, no. 5, pp. 3017–3030, 2023

work page 2023
[26]

Accelerating end- cloud collaborative inference via near bubble-free pipeline optimization,

L. Gao, J. Liu, H. Xu, S. Xu, Q. Ma, and L. Huang, “Accelerating end- cloud collaborative inference via near bubble-free pipeline optimization,” inProc. IEEE INFOCOM, 2025, pp. 1–10

work page 2025
[27]

Edge-assisted multi-layer offloading optimization of LEO satellite-terrestrial integrated networks,

X. Cao, B. Yang, Y . Shen, C. Yuen, Y . Zhang, Z. Han, H. V . Poor, and L. Hanzo, “Edge-assisted multi-layer offloading optimization of LEO satellite-terrestrial integrated networks,”IEEE J. Sel. Areas Commun., vol. 41, no. 2, pp. 381–398, 2023

work page 2023
[28]

Satellite- terrestrial integrated edge computing networks: Architecture, challenges, and open issues,

R. Xie, Q. Tang, Q. Wang, X. Liu, F. R. Yu, and T. Huang, “Satellite- terrestrial integrated edge computing networks: Architecture, challenges, and open issues,”IEEE Network, vol. 34, no. 3, pp. 224–231, 2020

work page 2020
[29]

Pipedream: Generalized pipeline parallelism for dnn training,

D. Narayanan, A. Harlap, A. Phanishayee, V . Seshadri, N. R. Devanur, G. R. Ganger, P. B. Gibbons, and M. Zaharia, “Pipedream: Generalized pipeline parallelism for dnn training,” inProc. ACM SOSP, 2019, pp. 1–15

work page 2019
[30]

HiveMind: Towards cellular native machine learning model splitting,

S. Wang, X. Zhang, H. Uchiyama, and H. Matsuda, “HiveMind: Towards cellular native machine learning model splitting,”IEEE J. Sel. Areas Commun., vol. 40, no. 2, pp. 626–640, 2021

work page 2021
[31]

Mobillm: Enabling on-device fine-tuning of billion-sized llms via server-assisted side-tuning,

L. Li, X. Yang, W. Wu, H. Wang, T. Ohtsuki, X. Fu, M. Pan, and X. Shen, “Mobillm: Enabling on-device fine-tuning of billion-sized llms via server-assisted side-tuning,”IEEE J. Sel. Topics Signal Process., Nov. 17 2025, Early Access, doi:10.1109/JSTSP.2025.3633550

work page doi:10.1109/jstsp.2025.3633550 2025
[32]

Reducing communication for split learning by randomized top-k sparsification,

F. Zheng, C. Chen, L. Lyu, and B. Yao, “Reducing communication for split learning by randomized top-k sparsification,” inProc. ACM IJCAI, no. 519, 2023, pp. 4665–4673

work page 2023
[33]

Split fine-tuning for large language models in wireless networks,

S. Zhang, G. Cheng, W. Wu, X. Huang, L. Song, and X. Shen, “Split fine-tuning for large language models in wireless networks,” IEEE J. Sel. Topics Signal Process., Jun. 19 2025, Early Access, doi:10.1109/JSTSP.2025.3581484

work page doi:10.1109/jstsp.2025.3581484 2025
[34]

Spacex to launch 1st space-hardened Nvidia AI GPU on upcoming rideshare mission,

“Spacex to launch 1st space-hardened Nvidia AI GPU on upcoming rideshare mission,” https:// www.space.com/ai-nvidia-gpu-spacex-launch-transporter-11, accessed: 2024-08-14. [Online]. Available: https://www.space.com/ ai-nvidia-gpu-spacex-launch-transporter-11

work page 2024
[35]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” inProc. ICLR, 2021, pp. 1–21

work page 2021
[36]

Scaling vision transformers to 22 billion parameters,

M. Dehghani, J. Djolonga, B. Mustafa, P. Padlewski, J. Heek, J. Gilmer, A. P. Steiner, M. Caron, R. Geirhos, I. Alabdulmohsinet al., “Scaling vision transformers to 22 billion parameters,” inProc. PMLR. PMLR, 2023, pp. 7480–7512

work page 2023
[37]

EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification,

P. Helber, B. Bischke, A. Dengel, and D. Borth, “EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification,”IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 7, pp. 2217–2226, 2019

work page 2019
[38]

Remote sensing image scene classification: Benchmark and state of the art,

G. Cheng, J. Han, and X. Lu, “Remote sensing image scene classification: Benchmark and state of the art,” Proc. IEEE, vol. 105, no. 10, pp. 1865–1883, 2017

work page 2017
