arxiv: 2507.16731 · v1 · pith:L2AXAXIYnew · submitted 2025-07-22 · 💻 cs.DC

Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges

keywords collaboration · inference · llms · survey · task · cloud · collaborative · edge

As large language models (LLMs) evolve, deploying them solely in the cloud or compressing them for edge devices has become inadequate due to concerns about latency, privacy, cost, and personalization. This survey explores a collaborative paradigm in which cloud-based LLMs and edge-deployed small language models (SLMs) cooperate across both inference and training. We present a unified taxonomy of edge-cloud collaboration strategies. For inference, we categorize approaches into task assignment, task division, and mixture-based collaboration at both task and token granularity, encompassing adaptive scheduling, resource-aware offloading, speculative decoding, and modular routing. For training, we review distributed adaptation techniques, including parameter alignment, pruning, bidirectional distillation, and small-model-guided optimization. We further summarize datasets, benchmarks, and deployment cases, and highlight privacy-preserving methods and vertical applications. This survey provides the first systematic foundation for LLM-SLM collaboration, bridging system and algorithm co-design to enable efficient, scalable, and trustworthy edge-cloud intelligence.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

    cs.CV 2026-03 unverdicted novelty 7.0

    RailVQA-bench supplies 21,168 QA pairs for ATO visual cognition while RailVQA-CoM combines large-model reasoning with small-model efficiency via transparent modules and temporal sampling.

  2. An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG

    cs.CR 2026-05 unverdicted novelty 6.0

    FedRAG uses a Scrambled Distributed Attention protocol with feature scrambling and token permutation to enable high-throughput, privacy-preserving federated RAG without special hardware or retraining.

  3. PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems

    cs.CR 2026-05 unverdicted novelty 6.0

    PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.

  4. Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

    cs.DC 2026-04 unverdicted novelty 6.0

    AdecPilot decentralizes administration in edge-cloud multi-agent frameworks by using a UI-agnostic cloud designer and a bimodal edge team with a Hierarchical Implicit Termination protocol, yielding 21.7% higher task s...

  5. Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda

    cs.DC 2026-04 unverdicted novelty 2.0

    This research agenda argues that cloud-native architectures, microservices, autoscaling, and emerging trends like serverless inference and federated learning are required to make large language models efficient and scalable.