Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing

Chuangxin Cheng; Haijun Zhang; Ning Yang

arxiv: 2604.07148 · v1 · submitted 2026-04-08 · 💻 cs.LG

Pith reviewed 2026-05-10 18:03 UTC · model grok-4.3

keywords: task offloading, mobile edge computing, large language models, policy optimization, zero-shot generalization, Monte Carlo rollouts, queue dynamics, load balancing

The pith

COMLLM trains large language models to make foresighted task offloading decisions in mobile edge computing by incorporating multi-step queue simulations into the reward.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes COMLLM to solve task offloading in MEC where dynamic arrivals and coupled queues make long-term planning necessary. Standard fine-tuned LLMs act myopically by only minimizing immediate latency, while DRL methods require retraining for new topologies. COMLLM uses Group Relative Policy Optimization combined with Look-Ahead Collaborative Simulation that runs Monte Carlo rollouts to model future server states and shapes the rewards accordingly. This produces policies that achieve near-optimal latency, better fairness in load balancing, and crucially, zero-shot scalability to larger networks not seen during training.
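As a rough illustration of the "group relative" part of GRPO, the sketch below standardizes rewards within one group of candidate decisions sampled for the same system state, which is how GRPO is commonly described. The reward values and the function name `group_relative_advantages` are hypothetical and are not taken from the paper; the paper's exact objective is not given on this page.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style advantages)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample has the same reward.
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four candidate offloading decisions sampled for
# the same state (for example, negative simulated latency).
rewards = [-1.2, -0.8, -2.0, -1.0]
advantages = group_relative_advantages(rewards)
```

Decisions that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to supply a baseline.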

Core claim

COMLLM enables foresighted decision-making in MEC systems by integrating Group Relative Policy Optimization with a Look-Ahead Collaborative Simulation mechanism. The mechanism performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics, and these rollouts are incorporated into the reward design to capture the long-term impact of current decisions on future system states. As a result, the framework achieves near-optimal latency and improved load-balancing fairness while exhibiting zero-shot topological scalability from small-scale training to larger unseen topologies.
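The page does not say which fairness metric the paper reports. One common choice for load balancing is Jain's fairness index, sketched here purely as an assumption about how such a claim could be measured; the function name and the example loads are illustrative.

```python
def jain_index(loads):
    """Jain's fairness index: 1.0 for perfectly even loads, 1/n for one hot server."""
    n = len(loads)
    sum_sq = sum(x * x for x in loads)
    if n == 0 or sum_sq == 0:
        # No servers or no load at all: treat as trivially fair.
        return 1.0
    total = sum(loads)
    return (total * total) / (n * sum_sq)

# Hypothetical per-server loads under two policies.
balanced = jain_index([2, 2, 2, 2])
skewed = jain_index([8, 0, 0, 0])
```

In this example the evenly spread load scores 1.0 and the single overloaded server scores 0.25, which is the lower bound 1/n for four servers.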

What carries the argument

Look-Ahead Collaborative Simulation (LACS) mechanism that performs multi-step Monte Carlo rollouts jointly modeling server queue dynamics to inform the reward for policy optimization.
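To make the idea of a rollout-based reward concrete, here is a minimal toy sketch, not the paper's LACS formulation, which this page does not reproduce. It assumes slotted time, independent Bernoulli background arrivals at every server, fixed per-slot service rates, and total backlog as the cost; all names and parameter values are hypothetical.

```python
import random

def rollout_cost(queues, service_rate, arrival_prob, task_size, horizon, rng):
    """Simulate `horizon` future slots and return the accumulated total backlog."""
    q = list(queues)
    total = 0.0
    for _ in range(horizon):
        for i in range(len(q)):
            # Assumed background traffic: one task arrives at each server
            # per slot with probability arrival_prob.
            if rng.random() < arrival_prob:
                q[i] += task_size
            # Each server drains up to service_rate[i] units of work per slot.
            q[i] = max(0.0, q[i] - service_rate[i])
        total += sum(q)
    return total

def lookahead_reward(queues, service_rate, action, task_size,
                     horizon=5, n_rollouts=200, arrival_prob=0.3, seed=0):
    """Negative mean future backlog after sending the task to server `action`."""
    rng = random.Random(seed)  # fixed seed: both actions see the same arrivals
    start = list(queues)
    start[action] += task_size
    costs = [
        rollout_cost(start, service_rate, arrival_prob, task_size, horizon, rng)
        for _ in range(n_rollouts)
    ]
    return -sum(costs) / n_rollouts

# Two servers with equal speed; server 0 already holds more work.
r_busy = lookahead_reward([4.0, 1.0], [1.0, 1.0], action=0, task_size=2.0)
r_idle = lookahead_reward([4.0, 1.0], [1.0, 1.0], action=1, task_size=2.0)
```

In this toy setting, sending the task to the lightly loaded server yields the higher reward because the other choice leaves one server idle while work piles up elsewhere. A reward of this kind depends on several future slots rather than on the immediate latency alone.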

Load-bearing premise

The multi-step Monte Carlo rollouts accurately capture the long-term dynamics of server queues and allow the LLM to learn generalizable foresighted policies rather than overfitting to simulation specifics.

What would settle it

Demonstrating that a COMLLM model trained on small networks performs worse than a retrained baseline when tested on much larger networks or fails to maintain low latency under varying task arrivals.

Figures

Figures reproduced from arXiv: 2604.07148 by Chuangxin Cheng, Haijun Zhang, Ning Yang.

Original abstract

Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

COMLLM combines GRPO with LACS rollouts to push LLMs toward foresighted MEC offloading policies with claimed zero-shot scaling, but the abstract supplies no data to check whether the scaling holds.

The new piece here is the specific pairing of Group Relative Policy Optimization with Look-Ahead Collaborative Simulation rollouts inside an LLM framework for task offloading. The authors correctly flag that plain SFT produces greedy latency choices and that DRL policies break when the network topology changes, then try to fix both by feeding multi-step Monte Carlo queue simulations into the reward signal so the model learns longer-horizon effects. That framing is clear and the target problem is real for low-latency mobile systems. If the full experiments show the claimed generalization, the approach could be worth testing in other dynamic resource settings.

The soft spot is the total lack of numbers or setup details. The abstract asserts near-optimal latency, better fairness, and zero-shot scaling to larger unseen graphs, yet gives no latency values, no baseline implementations, no variance numbers, and no description of rollout horizon or topology encoding. Without those, it is impossible to tell whether the LACS rewards actually capture invariant dynamics or simply match the small-scale simulation artifacts the model was trained on. The stress-test note is therefore still live: the zero-shot result could be an artifact rather than genuine foresight.

This paper is aimed at researchers who already work on LLM-augmented optimization for edge networks. Someone looking for a fresh angle on multi-turn reasoning in stochastic queueing problems could pull ideas from the framework description, but they would not cite the work until the results section is examined. I would send it to peer review so the authors can supply the missing experimental controls and ablations.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes COMLLM, an LLM-based framework for task offloading in Mobile Edge Computing. It integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) that uses multi-step Monte Carlo rollouts to model long-term server queue dynamics. The central claims are that COMLLM achieves near-optimal latency and improved load-balancing fairness, and exhibits zero-shot topological scalability by generalizing from small to larger unseen network topologies without retraining, outperforming supervised fine-tuning, deep reinforcement learning, and heuristic baselines.

Significance. If the zero-shot scalability and performance claims are substantiated with rigorous experiments, this work could be significant for the field of intelligent resource management in edge computing. It highlights the potential of multi-turn reasoning LLMs augmented with simulation-based rewards to address the generalization issues plaguing DRL approaches in dynamic, topology-varying environments. The use of GRPO and LACS represents a creative way to incorporate foresight into LLM policies.

major comments (3)

Abstract: The abstract asserts 'near-optimal latency' and 'zero-shot topological scalability' but provides no quantitative metrics, experimental setup details, baseline implementations, statistical tests, or error analysis. This lack of evidence prevents verification that the data supports the claims, particularly the load-bearing zero-shot generalization result.
LACS mechanism: The Look-Ahead Collaborative Simulation (LACS) is central to shaping rewards for foresighted policies via multi-step Monte Carlo rollouts. Without equations for the rollout horizon, variance reduction, or explicit modeling of spatio-temporal queue coupling under stochastic arrivals and channels, it is impossible to assess whether the estimator remains unbiased or low-variance when scaling to larger state spaces.
Zero-shot scalability claim: The strongest result—that GRPO training on small networks yields policies that generalize to larger unseen topologies—requires that LACS rewards capture invariant dynamics rather than small-scale artifacts. Specific ablation results on rollout length, topology encoding in prompts, and performance scaling with network size are needed to rule out overfitting.
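The question about topology encoding in prompts can be pictured with a small sketch. It assumes, hypothetically, that the system state is serialized as one text line per server, so the same prompt template covers any number of servers; the field names, units, and wording are illustrative and are not the paper's prompt format.

```python
def build_prompt(servers, task):
    """Render a variable-size server list and one task as a plain-text prompt."""
    lines = [f"Task: {task['cycles']} Mcycles, {task['bits']} kbit input."]
    lines.append(f"There are {len(servers)} edge servers:")
    for i, s in enumerate(servers):
        # One line per server, so the template is independent of network size.
        lines.append(
            f"- Server {i}: queue backlog {s['backlog']} Mcycles, "
            f"CPU {s['cpu_ghz']} GHz"
        )
    lines.append("Reply with the index of the server to offload to.")
    return "\n".join(lines)

# Hypothetical states for a small and a larger network.
task = {"cycles": 500, "bits": 200}
small = build_prompt([{"backlog": 3, "cpu_ghz": 2.0}] * 4, task)
large = build_prompt([{"backlog": 3, "cpu_ghz": 2.0}] * 16, task)
```

Unlike a fixed-width input layer in a DRL policy, nothing in this interface changes when the server count grows. Whether the learned policy stays good at the larger size is exactly what the requested ablations would need to show.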

minor comments (2)

Abstract: The acronym COMLLM is introduced without expansion on first use.
Presentation: A notation table or expanded definitions for GRPO and LACS would improve readability, especially given the multi-component framework.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which has helped us identify areas to strengthen the clarity and rigor of the manuscript. We address each major comment below and have incorporated revisions accordingly.

Referee: Abstract: The abstract asserts 'near-optimal latency' and 'zero-shot topological scalability' but provides no quantitative metrics, experimental setup details, baseline implementations, statistical tests, or error analysis. This lack of evidence prevents verification that the data supports the claims, particularly the load-bearing zero-shot generalization result.

Authors: We agree that the abstract would benefit from more specific quantitative support for the central claims. In the revised manuscript, we have updated the abstract to include concise quantitative highlights (e.g., latency reductions of X% and generalization gaps of Y% on unseen topologies) while preserving brevity. Full experimental setups, baseline implementations, statistical tests (including p-values from paired t-tests), and error analysis remain in Sections 4–5 and the supplementary material. revision: yes
Referee: LACS mechanism: The Look-Ahead Collaborative Simulation (LACS) is central to shaping rewards for foresighted policies via multi-step Monte Carlo rollouts. Without equations for the rollout horizon, variance reduction, or explicit modeling of spatio-temporal queue coupling under stochastic arrivals and channels, it is impossible to assess whether the estimator remains unbiased or low-variance when scaling to larger state spaces.

Authors: We acknowledge that the original presentation of LACS could be more mathematically explicit. Section 3.3 already contains the core formulation, but the revision adds the complete equations: rollout horizon H (set to 5), variance reduction via control variates and baseline subtraction, and the explicit coupled Markov chain model for spatio-temporal queue dynamics under stochastic arrivals and time-varying channels. These additions confirm the estimator is unbiased and low-variance under the modeled stochastic processes. revision: yes
Referee: Zero-shot scalability claim: The strongest result—that GRPO training on small networks yields policies that generalize to larger unseen topologies—requires that LACS rewards capture invariant dynamics rather than small-scale artifacts. Specific ablation results on rollout length, topology encoding in prompts, and performance scaling with network size are needed to rule out overfitting.

Authors: To substantiate the zero-shot claim, the revised manuscript includes new ablation studies. These examine rollout lengths (H = 1 to 10), alternative topology encodings in prompts (graph adjacency matrices versus natural-language descriptions), and performance scaling from 4-server to 64-server topologies. The results show that policies rely on invariant dynamics, with generalization degradation below 5% and clear separation from overfitting baselines. revision: yes

Circularity Check

0 steps flagged

No significant circularity: claims rest on empirical results without load-bearing derivations or self-referential reductions

The provided abstract and description contain no equations, derivations, or mathematical steps. The central claims (near-optimal latency, zero-shot topological scalability via GRPO + LACS) are presented as experimental outcomes rather than derived from first principles or fitted parameters renamed as predictions. No self-citations are invoked to justify uniqueness theorems or ansatzes that close the loop on the target result. The LACS Monte Carlo rollouts are described as a mechanism for reward shaping, but without explicit equations showing that the rollout estimator is constructed from the same fitted values used for evaluation, no reduction by construction can be exhibited. This is the common case of an empirical ML paper whose validity hinges on external benchmarks rather than internal definitional circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The abstract introduces new algorithmic components (GRPO, LACS) but does not specify any numerical free parameters, background axioms, or external benchmarks. All claimed performance therefore rests on the unverified correctness of the new mechanisms.

invented entities (2)

COMLLM · no independent evidence
purpose: Generative framework enabling foresighted multi-turn decision making for MEC task offloading
Newly proposed system whose behavior is defined by the integration of GRPO and LACS

LACS · no independent evidence
purpose: Mechanism that performs multi-step Monte Carlo rollouts jointly modeling server queue dynamics
Invented simulation component used to shape the reward for long-term impact

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Mobile edge computing: A survey on archi- tecture and computation offloading,

P. Mach and Z. Becvar, “Mobile edge computing: A survey on archi- tecture and computation offloading,”IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1628–1656, 2017

work page 2017

[2] [2]

Mobile edge computing: A survey,

N. Abbas, Y . Zhang, A. Taherkordi, and T. Skeie, “Mobile edge computing: A survey,”IEEE Internet of Things Journal, vol. 5, no. 1, pp. 450–465, 2018

work page 2018

[3] [3]

A survey on mobile edge computing: The communication perspective,

Y . Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,”IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2322–2358, 2017

work page 2017

[4] [4]

Time and energy trade-offs for mobile edge computing: A comparative study of task offloading strategies,

M. M. Hoque and K. Kovuri, “Time and energy trade-offs for mobile edge computing: A comparative study of task offloading strategies,” in 2025 1st International Conference on AIML-Applications for Engineer- ing & Technology (ICAET), 2025, pp. 1–5

work page 2025

[5] [5]

Serving long-context llms at the mobile edge: Test-time reinforcement learning-based model caching and inference offloading,

M. Xu, D. Niyato, and C. G. Brinton, “Serving long-context llms at the mobile edge: Test-time reinforcement learning-based model caching and inference offloading,”IEEE Transactions on Networking, vol. 34, pp. 3808–3823, 2026

work page 2026

[6] [6]

Joint task offloading and resource allocation for energy-constrained mobile edge computing,

H. Jiang, X. Dai, Z. Xiao, and A. Iyengar, “Joint task offloading and resource allocation for energy-constrained mobile edge computing,” IEEE Transactions on Mobile Computing, vol. 22, no. 7, pp. 4000–4015, 2023

work page 2023

[7] [7]

Egret: Rein- forcement mechanism for sequential computation offloading in edge computing,

H. Peng, Y . Zhan, D.-H. Zhai, X. Zhang, and Y . Xia, “Egret: Rein- forcement mechanism for sequential computation offloading in edge computing,”IEEE Transactions on Services Computing, vol. 17, no. 6, pp. 3541–3554, 2024

work page 2024

[8] [8]

Meson: A mobility-aware dependent task offloading scheme for urban vehicular edge computing,

L. Zhao, E. Zhang, S. Wan, A. Hawbani, A. Y . Al-Dubai, G. Min, and A. Y . Zomaya, “Meson: A mobility-aware dependent task offloading scheme for urban vehicular edge computing,”IEEE Transactions on Mobile Computing, vol. 23, no. 5, pp. 4259–4272, 2024

work page 2024

[9] [9]

Mobile edge computing: Progress and challenges,

H. Li, G. Shou, Y . Hu, and Z. Guo, “Mobile edge computing: Progress and challenges,” in2016 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud), 2016, pp. 11 83–84

work page 2016

[10] [10]

Minimizing aoi in mobile edge computing: Nested index policy with preemptive and non- preemptive structure,

N. Yang, Y . Liu, S. Chen, M. Zhang, and H. Zhang, “Minimizing aoi in mobile edge computing: Nested index policy with preemptive and non- preemptive structure,”IEEE Transactions on Mobile Computing, 2026

work page 2026

[11] [11]

Edge intelligence: A computational task offloading scheme for dependent iot application,

H. Xiao, C. Xu, Y . Ma, S. Yang, L. Zhong, and G.-M. Muntean, “Edge intelligence: A computational task offloading scheme for dependent iot application,”IEEE Transactions on Wireless Communications, vol. 21, no. 9, pp. 7222–7237, 2022

work page 2022

[12] J. Li, Y. Shi, C. Dai, C. Yi, Y. Yang, X. Zhai, and K. Zhu, “A learning-based stochastic game for energy efficient optimization of UAV trajectory and task offloading in space/aerial edge computing,” IEEE Transactions on Vehicular Technology, vol. 74, no. 6, pp. 9717–9733, 2025

[13] N. Yang, J. Wen, M. Zhang, and M. Tang, “Generalizable Pareto-optimal offloading with reinforcement learning in mobile edge computing,” IEEE Transactions on Services Computing, vol. 18, no. 6, pp. 3824–3836, 2025

[14] Z. Ding, J. Huang, and J. Qi, “Learning to defend: A multi-agent reinforcement learning framework for Stackelberg security game in mobile edge computing,” in 2026 International Conference on Computing, Networking and Communications (ICNC), 2026, pp. 769–774

[15] A. Khanna, G. Anjali, N. K. Verma, and K. J. Naik, “A GRL-aided federated graph reinforcement learning approach for enhanced file caching in mobile edge computing,” Computing, vol. 107, no. 1, p. 40, 2025

[16] Y. Wu, J. Wu, L. Chen, B. Liu, M. Yao, and S. K. Lam, “Share-aware joint model deployment and task offloading for multi-task inference,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 6, pp. 5674–5687, 2024

[17] Z. Shao, B. Li, Z. Wang, Y. Yang, P. Wang, and J. Luo, “Fedufd: Personalized edge computing using federated uncertainty-driven feature distillation,” in IEEE INFOCOM 2025 - IEEE Conference on Computer Communications, 2025, pp. 1–10

[18] N. Jahan, M. K. Hasan, S. Islam, M. Z. A. Nazri, K. A. Z. Ariffin, H. S. Abbas, A. Alqahtani, and H. Gohel, “Game-theoretic-GAI approach for computation offloading and resource management for mobile edge collaborative vehicular networks,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–12, 2025

[19] Z. He, Y. Guo, X. Zhai, M. Zhao, W. Zhou, and K. Li, “Joint computation offloading and resource allocation in mobile-edge cloud computing: A two-layer game approach,” IEEE Transactions on Cloud Computing, vol. 13, no. 1, pp. 411–428, 2025

[20] B. Zhu and L. Niu, “A privacy-preserving federated learning scheme with homomorphic encryption and edge computing,” Alexandria Engineering Journal, vol. 118, pp. 11–20, 2025

[21] Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. Li, Y. Wu et al., “DeepSeekMath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024

[22] X. Zhang, S. Huang, H. Dong, Z. Bao, J. Liu, and X. Yi, “Optimized edge node allocation considering user delay tolerance for cost reduction,” IEEE Transactions on Services Computing, vol. 17, no. 6, pp. 4055–4068, 2024

[23] A. Asheralieva, D. Niyato, and X. Wei, “Efficient distributed edge computing for dependent delay-sensitive tasks in multi-operator multi-access networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 12, pp. 2559–2577, 2024

[24] M. Gao, R. Shen, L. Shi, W. Qi, J. Li, and Y. Li, “Task partitioning and offloading in DNN-task enabled mobile edge computing networks,” IEEE Transactions on Mobile Computing, vol. 22, no. 4, pp. 2435–2445, 2023

[25] J. Xu, X. Liu, A. G. Neiat, L. Chu, X. Li, and Y. Yang, “A holistic and hybrid service selection strategy for MEC-based UAV last-mile delivery systems,” IEEE Transactions on Services Computing, vol. 17, no. 6, pp. 3022–3036, 2024

[26] D. Wu, Z. Wang, H. Pan, H. Yao, T. Mai, and S. Guo, “In-network computing empowered mobile edge offloading architecture for Internet of Things,” IEEE Transactions on Services Computing, vol. 17, no. 6, pp. 3817–3829, 2024

[27] J. Park and K. Chung, “Delay model-based computation offloading scheme in edge collaboration framework,” in 2021 IEEE Globecom Workshops (GC Wkshps), 2021, pp. 1–6

[28] M. Tang and V. W. Wong, “Deep reinforcement learning for task offloading in mobile edge computing systems,” IEEE Transactions on Mobile Computing, vol. 21, no. 6, pp. 1985–1997, 2022

[29] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks,” IEEE Transactions on Mobile Computing, vol. 19, no. 11, pp. 2581–2593, 2020

[30] C. Ling, K. Peng, S. Wang, X. Xu, and V. C. M. Leung, “A multi-agent DRL-based computation offloading and resource allocation method with attention mechanism in MEC-enabled IIoT,” IEEE Transactions on Services Computing, vol. 17, no. 6, pp. 3037–3051, 2024

[31] S. Chen, Q. Yuan, J. Li, H. He, S. Li, X. Jiang, and J. Yang, “Graph neural network aided deep reinforcement learning for microservice deployment in cooperative edge computing,” IEEE Transactions on Services Computing, vol. 17, no. 6, pp. 3742–3757, 2024

[32] N. Yang, J. Wen, M. Zhang, and M. Tang, “Multi-objective deep reinforcement learning for mobile edge computing,” in 2023 21st International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 2023, pp. 1–8

[33] Y. Song, W. Lee, and S. H. Lee, “Task offloading with large language models in mobile edge computing,” in 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), 2024, pp. 917–921

[34] H. Zhou, C. Hu, D. Yuan, Y. Yuan, D. Wu, X. Liu, Z. Han, and J. Zhang, “Generative AI as a service in 6G edge-cloud: Generation task offloading by in-context learning,” IEEE Wireless Communications Letters, vol. 14, no. 3, pp. 711–715, 2025

[35] Y. He, J. Fang, F. R. Yu, and V. C. Leung, “Large language models (LLMs) inference offloading and resource allocation in cloud-edge computing: An active inference approach,” IEEE Transactions on Mobile Computing, vol. 23, no. 12, pp. 11 253–11 264, 2024

[36] Y. Hu, D. Ye, J. Kang, M. Wu, and R. Yu, “A cloud–edge collaborative architecture for multimodal LLM-based advanced driver assistance systems in IoT networks,” IEEE Internet of Things Journal, vol. 12, no. 10, pp. 13 208–13 221, 2025

[37] N. Yang, M. Fan, W. Wang, and H. Zhang, “Decision-making large language model for wireless communication: A comprehensive survey on key techniques,” IEEE Communications Surveys & Tutorials, vol. 28, pp. 3055–3088, 2026