Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges

Haozhao Wang; Jingling Yuan; Ruixuan Li; Rui Zhang; Senyao Li; Song Guo; Tianwei Zhang; Wenchao Xu; Xian Zhong

arxiv: 2507.16731 · v1 · pith:L2AXAXIYnew · submitted 2025-07-22 · 💻 cs.DC

Senyao Li, Haozhao Wang, Wenchao Xu, Rui Zhang, Song Guo, Jingling Yuan, Xian Zhong, Tianwei Zhang, Ruixuan Li

keywords: collaboration, inference, LLMs, survey, task, cloud, collaborative, edge

As large language models (LLMs) evolve, deploying them solely in the cloud or compressing them for edge devices has become inadequate due to concerns about latency, privacy, cost, and personalization. This survey explores a collaborative paradigm in which cloud-based LLMs and edge-deployed small language models (SLMs) cooperate across both inference and training. We present a unified taxonomy of edge-cloud collaboration strategies. For inference, we categorize approaches into task assignment, task division, and mixture-based collaboration at both task and token granularity, encompassing adaptive scheduling, resource-aware offloading, speculative decoding, and modular routing. For training, we review distributed adaptation techniques, including parameter alignment, pruning, bidirectional distillation, and small-model-guided optimization. We further summarize datasets, benchmarks, and deployment cases, and highlight privacy-preserving methods and vertical applications. This survey provides the first systematic foundation for LLM-SLM collaboration, bridging system and algorithm co-design to enable efficient, scalable, and trustworthy edge-cloud intelligence.
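
To make the task-assignment category concrete, here is a minimal sketch of one common form it can take: the edge SLM answers when it is confident, and the request is escalated to the cloud LLM otherwise. This is an illustration under stated assumptions, not the survey's own formulation. The `Model` interface, the `route` function, the `threshold` parameter, and the toy models are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model interface (an assumption for this sketch):
# takes a prompt and returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Routed:
    answer: str
    source: str  # "edge" or "cloud"


def route(prompt: str, edge_slm: Model, cloud_llm: Model,
          threshold: float = 0.8) -> Routed:
    """Answer on the edge when the SLM is confident; otherwise escalate."""
    answer, confidence = edge_slm(prompt)
    if confidence >= threshold:
        return Routed(answer, "edge")
    # Low confidence: pay the latency and cost of a cloud call.
    cloud_answer, _ = cloud_llm(prompt)
    return Routed(cloud_answer, "cloud")


# Toy stand-ins for real models, for illustration only.
def toy_edge(prompt: str) -> Tuple[str, float]:
    # Pretend that short prompts are easy for the small model.
    return ("edge:" + prompt, 0.9 if len(prompt) < 20 else 0.3)


def toy_cloud(prompt: str) -> Tuple[str, float]:
    return ("cloud:" + prompt, 0.99)


if __name__ == "__main__":
    print(route("hi", toy_edge, toy_cloud).source)
    print(route("a much longer and harder prompt", toy_edge, toy_cloud).source)
```

In this sketch, raising `threshold` sends more traffic to the cloud (higher quality, higher latency and cost), while lowering it keeps more requests on the device. A real system would also have to decide how the confidence score is obtained, which this example leaves open.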

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation
cs.CV 2026-03 unverdicted novelty 7.0

RailVQA-bench supplies 21,168 QA pairs for ATO visual cognition while RailVQA-CoM combines large-model reasoning with small-model efficiency via transparent modules and temporal sampling.

An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG
cs.CR 2026-05 unverdicted novelty 6.0

FedRAG uses a Scrambled Distributed Attention protocol with feature scrambling and token permutation to enable high-throughput, privacy-preserving federated RAG without special hardware or retraining.

PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems
cs.CR 2026-05 unverdicted novelty 6.0

PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.

Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation
cs.DC 2026-04 unverdicted novelty 6.0

AdecPilot decentralizes administration in edge-cloud multi-agent frameworks by using a UI-agnostic cloud designer and a bimodal edge team with a Hierarchical Implicit Termination protocol, yielding 21.7% higher task s...

Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda
cs.DC 2026-04 unverdicted novelty 2.0

This research agenda argues that cloud-native architectures, microservices, autoscaling, and emerging trends like serverless inference and federated learning are required to make large language models efficient and scalable.