SpaceMoE: Towards Orbital General Intelligence with Distributed Mixture-of-Experts Inference

Kaibin Huang; Min Sheng; Qian Chen; Xianhao Chen

arxiv: 2605.16849 · v1 · pith:QLDTSKSH · submitted 2026-05-16 · 💻 cs.NI

Pith reviewed 2026-05-19 19:32 UTC · model grok-4.3

classification 💻 cs.NI

keywords SpaceMoE · mixture-of-experts · satellite networks · distributed inference · large language models · expert placement · on-orbit computation · space AGI

The pith

Mixture-of-experts models can be distributed across satellite networks to run large language models despite strict onboard limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that the sparse activation property of mixture-of-experts architectures can be adapted to satellite constellations so that only a small fraction of model parameters are active on any single satellite. This approach is presented as a way to meet tight constraints on memory, compute, and energy while still supporting the kind of large-scale inference needed for future space-based AGI services. The authors argue that three core problems (expert placement, expert selection, and hidden-state transmission) must be solved differently in orbit because satellites move, their batteries lose capacity over time, and they face thermal limits. If these adaptations work, on-orbit inference becomes feasible without constant offloading to ground stations. The overview also ties the idea to ongoing industrial and standardization efforts in satellite networks.

Core claim

SpaceMoE is a distributed inference paradigm in which the mixture-of-experts structure is mapped onto satellite networks by redesigning expert placement, selection, and hidden-state routing to account for dynamic topology, battery degradation, and thermal constraints, thereby enabling scalable on-orbit LLM inference that would otherwise be blocked by individual satellite resource limits.
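
The hidden-state routing part of this claim can be made concrete with a small sketch. It assumes store-and-forward relaying over one snapshot of the inter-satellite topology, with each link costed as propagation delay plus serialization delay for the hidden-state payload. The snapshot format and the names `hidden_state_bits`, `link_delay_s`, and `route_hidden_state` are hypothetical and not taken from the paper. Because links appear and disappear as satellites move, the search would have to be rerun on every new snapshot.

```python
import heapq
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def hidden_state_bits(num_tokens: int, hidden_dim: int, bytes_per_elem: int = 2) -> int:
    """Size of one hidden-state transfer in bits (tokens x hidden dim x element size)."""
    return num_tokens * hidden_dim * bytes_per_elem * 8


def link_delay_s(distance_m: float, rate_bps: float, payload_bits: int) -> float:
    """Propagation delay plus serialization delay over one inter-satellite link."""
    return distance_m / SPEED_OF_LIGHT_M_S + payload_bits / rate_bps


def route_hidden_state(links, src, dst, payload_bits):
    """Dijkstra's algorithm over one topology snapshot (store-and-forward relaying).

    links maps each satellite to a list of (neighbor, distance_m, rate_bps) tuples
    for the inter-satellite links that are up in this snapshot (hypothetical format).
    Returns (total_delay_s, path), or (math.inf, []) if dst is unreachable.
    """
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return delay, path[::-1]
        if delay > best.get(node, math.inf):
            continue  # stale heap entry; a shorter route to node was already found
        for nbr, dist_m, rate in links.get(node, []):
            cand = delay + link_delay_s(dist_m, rate, payload_bits)
            if cand < best.get(nbr, math.inf):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    return math.inf, []


# Toy snapshot: A-B-D is longer but faster per hop; A-C-D is shorter but slower.
snapshot = {
    "A": [("B", 2_000_000.0, 1e9), ("C", 1_500_000.0, 1e8)],
    "B": [("D", 2_000_000.0, 1e9)],
    "C": [("D", 1_500_000.0, 1e8)],
}
payload = hidden_state_bits(num_tokens=1, hidden_dim=4096)  # 65,536 bits at 16-bit precision
print(route_hidden_state(snapshot, "A", "D", payload))  # about 0.0113 s via ['A', 'C', 'D']
```

With one 16-bit token and a hidden size of 4096 (the hidden size of Mixtral 8x7B, reference [8]), the payload is small enough that propagation delay dominates, so the shorter but slower path wins. Larger batches of tokens would shift the balance toward high-rate links.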

What carries the argument

Mixture-of-experts (MoE) architecture with sparse expert activation, extended to satellite networks through topology-aware expert placement, selection, and hidden-state transmission.
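
Sparse activation comes from the router. The sketch below shows the generic top-k gating used by MoE language models such as Mixtral (8 experts, top-2 routing, reference [8]): the gate weights are a softmax over only the k highest router logits, so the remaining experts are never executed for that token. It is a textbook illustration rather than code from the paper, and `top_k_gating` is a hypothetical name.

```python
import math


def top_k_gating(router_logits, k=2):
    """Return {expert_index: gate_weight} for the k experts with the highest logits.

    The weights are a softmax over the selected logits only, so they sum to 1;
    every other expert gets weight 0 and is skipped for this token.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# One token's router logits over 8 experts, top-2 as in Mixtral.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
print(top_k_gating(logits, k=2))  # experts 1 and 4, weights about 0.62 and 0.38
```

In a distributed setting, the experts that are not selected need not be resident on the satellite doing the routing, which is what lets per-satellite memory stay well below the full model size.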

If this is right

Only a subset of experts needs to be loaded and executed on any one satellite, directly lowering memory and energy demands.
Routing decisions must be recomputed as satellites move relative to one another, turning network topology into a first-class input to the inference scheduler.
Battery and thermal state become additional costs in the expert-selection objective, so the system can avoid overloading satellites that are already stressed.
Hidden-state exchanges between satellites replace the all-to-all communication patterns that terrestrial MoE systems use in both training and serving.
Overall system capacity scales with the number of satellites rather than the capacity of any single one.
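
One way to read the battery and thermal point is as a penalized selection rule. In this hypothetical sketch, each expert's router logit is reduced by penalties for its host satellite's battery wear and thermal load, and experts on satellites below a minimum thermal headroom are excluded outright. The state fields, weights, and threshold are illustrative assumptions, not the objective used in the paper. Topology could enter the same way, for example as a penalty proportional to the hop distance to the host.

```python
from dataclasses import dataclass


@dataclass
class SatelliteState:
    battery_health: float    # hypothetical: 1.0 = new battery, 0.0 = end of life
    thermal_headroom: float  # hypothetical: remaining thermal budget, fraction in [0, 1]


def state_aware_top_k(router_logits, host_of, states, k=2,
                      battery_weight=1.0, thermal_weight=1.0, min_headroom=0.1):
    """Pick up to k experts by router logit minus penalties for the host's state.

    host_of[i] names the satellite storing expert i. Experts whose host has less
    than min_headroom thermal budget left are excluded, so fewer than k experts
    may be returned when too many hosts are near their thermal limit.
    """
    scored = []
    for i, logit in enumerate(router_logits):
        s = states[host_of[i]]
        if s.thermal_headroom < min_headroom:
            continue  # host is close to its thermal limit: do not activate it
        penalty = (battery_weight * (1.0 - s.battery_health)
                   + thermal_weight * (1.0 - s.thermal_headroom))
        scored.append((logit - penalty, i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


logits = [2.0, 1.9, 1.0, 0.2]
host_of = ["sat0", "sat1", "sat2", "sat2"]
states = {
    "sat0": SatelliteState(battery_health=0.9, thermal_headroom=0.05),  # running hot
    "sat1": SatelliteState(battery_health=0.5, thermal_headroom=0.8),   # worn battery
    "sat2": SatelliteState(battery_health=0.95, thermal_headroom=0.9),  # healthy
}
print(state_aware_top_k(logits, host_of, states))  # [1, 2]; logits alone would give [0, 1]
```

The trade-off is explicit here: skipping a hot or worn satellite may route a token to a lower-scoring expert, which is exactly the accuracy cost the review later asks to be measured.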

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same placement and routing logic could be reused for other sparse models beyond language, such as vision or multimodal experts, once they are similarly partitioned across orbits.
If the overhead of dynamic re-routing stays low, the approach might also apply to fleets of high-altitude platforms or other mobile edge nodes with changing connectivity.
Long-term battery degradation tracking introduces a new time-scale into model-serving decisions, suggesting that expert assignment could be planned over weeks rather than single orbits.

Load-bearing premise

Satellite-specific factors such as dynamic topology, battery degradation, and thermal limits can be folded into expert placement, selection, and routing without causing prohibitive overhead or loss of model accuracy.
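
The placement half of this premise can be illustrated with a deliberately simple heuristic. The sketch assumes each satellite has a known free-memory budget and a single per-activation cost that already folds in hop distance and battery wear (both hypothetical inputs), and it places the most frequently activated experts first on the cheapest satellite that still has room. It is a greedy baseline, neither optimal nor the paper's placement method; the premise is really asking whether rerunning something like it as topology and battery state change stays cheap enough.

```python
def greedy_placement(expert_freq, expert_mem_gb, sat_free_gb, sat_cost):
    """Place each expert on one satellite, most frequently activated experts first.

    expert_freq[i]   -- hypothetical activation frequency of expert i
    expert_mem_gb[i] -- memory footprint of expert i, in GB
    sat_free_gb      -- dict: satellite -> free memory, in GB
    sat_cost         -- dict: satellite -> assumed per-activation cost (e.g. hops
                        to the ingress satellite plus a battery-wear term)
    Returns dict: expert index -> satellite. Raises ValueError if an expert fits nowhere.
    """
    free = dict(sat_free_gb)  # copy so the caller's budget dict is not modified
    placement = {}
    for i in sorted(range(len(expert_freq)), key=lambda e: expert_freq[e], reverse=True):
        fits = [s for s, cap in free.items() if cap >= expert_mem_gb[i]]
        if not fits:
            raise ValueError(f"expert {i} does not fit on any satellite")
        best = min(fits, key=lambda s: sat_cost[s])  # cheapest satellite with room
        placement[i] = best
        free[best] -= expert_mem_gb[i]
    return placement


placement = greedy_placement(
    expert_freq=[0.4, 0.1, 0.3, 0.2],
    expert_mem_gb=[1.0, 1.0, 1.0, 1.0],
    sat_free_gb={"near": 2.0, "far": 2.0},
    sat_cost={"near": 1.0, "far": 3.0},
)
print(placement)  # experts 0 and 2 (the hottest) on "near", experts 1 and 3 on "far"
```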

What would settle it

A concrete test would measure whether a prototype that places and routes experts according to live orbit, battery, and temperature data adds more than a small, fixed percentage of end-to-end latency, or loses more than a small, fixed amount of accuracy, relative to a static ground-based MoE baseline.
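
The pass/fail criterion in this test is simple arithmetic. The sketch below computes relative latency overhead in percent and accuracy loss in percentage points against the static baseline. The 5% and 1-point thresholds, like the function name `within_overhead_budget`, are placeholders, since the review does not fix what "small" means.

```python
def within_overhead_budget(base_latency_s, test_latency_s, base_acc, test_acc,
                           max_latency_pct=5.0, max_acc_drop_pts=1.0):
    """Compare a satellite-aware prototype against a static ground-based baseline.

    Accuracies are fractions in [0, 1]. Returns (passes, latency_overhead_pct,
    accuracy_drop_pts). The default thresholds are placeholders, not values from the paper.
    """
    latency_overhead_pct = 100.0 * (test_latency_s - base_latency_s) / base_latency_s
    accuracy_drop_pts = 100.0 * (base_acc - test_acc)
    passes = (latency_overhead_pct <= max_latency_pct
              and accuracy_drop_pts <= max_acc_drop_pts)
    return passes, latency_overhead_pct, accuracy_drop_pts


# Hypothetical measurements: 2.00 s vs 2.08 s per request, 81.2% vs 80.5% accuracy.
print(within_overhead_budget(2.00, 2.08, 0.812, 0.805))  # passes: about 4% and 0.7 points
```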

Figures

Figures reproduced from arXiv: 2605.16849 by Kaibin Huang, Min Sheng, Qian Chen, Xianhao Chen.

**Figure 2.** Evolution of space general AI inference architectures: From traditional space inference to SpaceMoE.

**Figure 4.** Accuracy comparison between similarity-aware expert selection and …

**Figure 5.** Remaining thermal budget versus time under different radiator …

Original abstract

As satellite networks evolve to support increasingly diverse services and artificial general intelligence (AGI), large language models (LLMs) are emerging as a critical foundation for future space systems. However, deploying LLMs on satellites is hindered by stringent constraints on onboard memory, computation, and energy. In this context, the mixture-of-experts (MoE) architecture emerges as a promising solution, leveraging sparse expert activation to enable scalable model inference. By harnessing the architectural advantages of MoE, this article provides a comprehensive overview of SpaceMoE, a new paradigm for distributed MoE inference in satellite networks. We first review recent industrial progress and emerging standardization trends that motivate the evolution toward space AGI systems. Then, we introduce the fundamentals and architectural evolution of SpaceMoE. Subsequently, we discuss three fundamental design problems in SpaceMoE, namely expert placement, expert selection, and hidden-state transmission and routing, highlighting how satellite-specific factors such as dynamic topology, battery degradation, and thermal limits fundamentally reshape their solutions. Finally, we outline promising research directions for realizing scalable, efficient, and sustainable on-orbit MoE inference in future satellite networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SpaceMoE is a conceptual overview that flags important satellite constraints for MoE inference but offers no analysis to show the approach stays practical.

SpaceMoE is basically a position paper that sketches how mixture-of-experts could work in orbit but does not deliver any new measurements or proofs. The title suggests a step toward orbital general intelligence, yet the content stays at the level of identifying challenges rather than solving them.

What stands out is the clear breakdown of three problems that any distributed MoE system in space would face. The authors explain why dynamic satellite links, power limits, and heat management would affect where experts live, which ones activate, and how hidden states move between nodes. They also give a quick tour of current industry efforts in space communications and AI standards. This part is useful because it connects the MoE literature to the specific realities of satellite networks in one place.

The paper is honest about the constraints but does not go further. It lists the factors without showing a way to model them or any estimate of the extra cost they would impose on inference speed or accuracy. That leaves the central promise, that MoE can be adapted efficiently, still unproven. The stress test note is right on this point: without some quantitative check, it is hard to know whether the added state tracking for batteries and thermal limits would eat up the efficiency gains from sparse activation.

Readers working on edge AI or satellite systems might pick this up to get oriented on the open questions. It is not the place to look for ready-to-use techniques or validated claims, and someone looking for a survey of the area would find it light on citations to prior distributed inference work.

I think it is worth a serious review. The ideas are coherent and the motivation is grounded in real trends, so referees could help sharpen what needs to be shown next to make the paradigm practical.

Referee Report

1 major / 2 minor

Summary. The paper introduces SpaceMoE as a new paradigm for distributed Mixture-of-Experts inference in satellite networks to enable AGI in space systems. Motivated by industrial progress and standardization trends, it reviews MoE fundamentals, identifies three core design problems (expert placement, expert selection, and hidden-state transmission/routing), and argues that satellite-specific constraints—dynamic topology, battery degradation, and thermal limits—fundamentally reshape solutions to these problems. It closes by outlining research directions for scalable, efficient on-orbit MoE inference.

Significance. If the satellite constraints can be integrated into MoE mechanisms while preserving sparse activation efficiency, the framing could help guide deployment of large models under extreme resource limits in orbital environments. The overview usefully connects satellite networking trends with MoE architecture, potentially stimulating targeted work on energy-aware and topology-adaptive inference; however, its conceptual scope means significance depends on future validation rather than immediate technical advance.

major comments (1)

[Abstract and design-problems discussion] Abstract and the section introducing the three design problems: the assertion that satellite factors 'fundamentally reshape' expert placement, selection, and routing solutions lacks any quantitative model, simulation result, or analytic bound demonstrating that added state (e.g., per-satellite battery curves or thermal throttling) can be tracked and optimized at the latency and scale required for distributed LLM inference. This assumption is load-bearing for the claim that SpaceMoE constitutes a distinct paradigm rather than a straightforward application of existing MoE techniques.

minor comments (2)

[Overall] The manuscript would benefit from a brief table or diagram contrasting terrestrial MoE routing with the satellite-adapted variants proposed for each of the three problems.
[Motivation and review section] Ensure all references to industrial progress and emerging standardization trends include specific citations with dates and document identifiers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for the thoughtful comments and the recommendation for major revision. The feedback helps us better position the manuscript as an overview of emerging challenges rather than a fully validated technical contribution. We address the major comment in detail below.

Referee: [Abstract and design-problems discussion] Abstract and the section introducing the three design problems: the assertion that satellite factors 'fundamentally reshape' expert placement, selection, and routing solutions lacks any quantitative model, simulation result, or analytic bound demonstrating that added state (e.g., per-satellite battery curves or thermal throttling) can be tracked and optimized at the latency and scale required for distributed LLM inference. This assumption is load-bearing for the claim that SpaceMoE constitutes a distinct paradigm rather than a straightforward application of existing MoE techniques.

Authors: We appreciate the referee's observation that our claims regarding the reshaping of MoE design problems by satellite constraints would benefit from quantitative support. As an overview paper, our goal is to identify and articulate the unique challenges posed by orbital environments to distributed MoE inference, drawing on domain knowledge of satellite systems. The discussion in the manuscript qualitatively explains how factors like dynamic topology affect expert placement (e.g., due to changing inter-satellite links), battery degradation impacts selection policies (prioritizing energy-efficient experts), and thermal limits influence routing decisions for hidden states. However, we acknowledge that demonstrating the feasibility of tracking and optimizing such state at the required scale is indeed an open question and part of the research directions we outline. We do not intend to claim that SpaceMoE is already a distinct implemented paradigm with proven efficiency gains; rather, it is a conceptual framework highlighting the need for satellite-aware adaptations. To strengthen the manuscript, we will revise the abstract and the relevant section to use more cautious language, such as 'are anticipated to reshape' instead of 'fundamentally reshape', and explicitly note the absence of quantitative validation in this work. This revision will better align the claims with the conceptual nature of the paper while preserving the motivation for future work on these topics. revision: partial

Circularity Check

0 steps flagged

No circularity: conceptual overview without derivations or self-referential reductions

The paper is a high-level overview that reviews industrial trends, introduces SpaceMoE fundamentals, and identifies three design problems (expert placement, selection, and routing) while noting satellite factors such as dynamic topology and battery limits. No equations, fitted parameters, predictions, or load-bearing derivations appear in the provided text or abstract. Claims are framed as motivations and open research directions rather than results obtained by construction from prior self-citations or definitions. The manuscript therefore contains no steps that reduce to their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a high-level overview with no explicit mathematical derivations, fitted parameters, or new postulated entities; all content rests on standard assumptions about MoE sparsity and satellite network dynamics drawn from prior literature.

pith-pipeline@v0.9.0 · 5732 in / 1126 out tokens · 34756 ms · 2026-05-19T19:32:34.888387+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 3 internal anchors

[1] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 28–41, Sep. 2019.

[2] B. Matthiesen, N. Razmi, I. Leyva-Mayorga, A. Dekorsy, and P. Popovski, “Federated learning in satellite constellations,” IEEE Netw., vol. 38, no. 2, pp. 232–239, Mar. 2024.

[3] C. Wu, X. Wang, Y. Hu, S. Han, W. Meng, and D. Niyato, “Towards intelligent SAGIN: Leveraging big AI models and SDN for end-to-end automation,” to appear in IEEE Netw., 2025.

[4] G. Qu, Q. Chen, W. Wei, Z. Lin, X. Chen, and K. Huang, “Mobile edge intelligence for large language models: A contemporary survey,” IEEE Commun. Surveys Tuts., vol. 27, no. 6, pp. 3820–3860, Dec. 2025.

[5] D. Zhou, M. Sheng, J. Li, and Z. Han, “Aerospace integrated networks innovation for empowering 6G: A survey and future challenges,” IEEE Commun. Surveys Tuts., vol. 25, no. 2, pp. 975–1019, Second quarter 2023.

[6] Q. Chen, Z. Wang, X. Chen, J. Wen, D. Zhou, S. Ji, M. Sheng, and K. Huang, “Space-ground fluid AI for 6G edge intelligence,” Engineering, vol. 54, no. 11, pp. 14–19, Nov. 2025.

[7] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020.

[8] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. d. l. Casas, E. B. Hanna, F. Bressand et al., “Mixtral of experts,” arXiv preprint arXiv:2401.04088, 2024.

[9] R. Kong, Y. Li, W. Wang, L. Kong, and Y. Liu, “Serving MoE models on resource-constrained edge devices via dynamic expert swapping,” IEEE Trans. Comput., vol. 74, no. 8, pp. 2799–2811, Aug. 2025.

[10] S. Zhang, W. Wu, L. Li, Y. Wang et al., “Communication-efficient collaborative LLM inference over LEO satellite networks,” arXiv preprint arXiv:2604.04654, 2026.

[11] Q. Chen, X. Chen, and K. Huang, “SlimCaching: Edge caching of mixture-of-experts for distributed inference,” to appear in IEEE Trans. Mobile Comput., 2026.

[12] Q. Chen, X. Chen, and K. Huang, “SiftMoE: Similarity-aware energy-efficient expert selection for wireless distributed MoE inference,” arXiv preprint arXiv:2603.23888, 2026.

[13] L. Zeng, J. Zhu, Z. Wang, Y. Shi, and K. B. Letaief, “Unseen cost of space computing: Quantifying LEO battery aging via physics-driven modeling,” arXiv preprint arXiv:2603.04372, 2026.

[14] E. Feilden, A. Oltean, and P. Johnston, “Why we should train AI in space,” Lumen Orbit (now Starcloud), White Paper v1.03, Sep. 2024. [Online]. Available: https://starcloudinc.github.io/wp.pdf

[15] Q. Chen, X. Chen, and K. Huang, “FedMeld: A model-dispersal federated learning framework for space-ground integrated networks,” IEEE Trans. Mobile Comput., vol. 25, no. 6, pp. 8221–8234, Jun. 2026.

Biographies: Qian Chen is a postdoctoral fellow at the Department of Electrical and Computer Engineering, The University of Hong Kong. Her research interests ...
