UAV-Assisted Cooperative Edge Inference for Low-Altitude Economy via MoE-based Hierarchical Deep Reinforcement Learning
Pith reviewed 2026-05-20 03:11 UTC · model grok-4.3
The pith
A hierarchical deep reinforcement learning method with mixture-of-experts lets UAVs optimize both their flight paths and edge AI support for ground devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within a constrained POMDP model for UAV-assisted cooperative edge inference, HDRL-MoE decouples slow-varying inference decisions from rapidly changing UAV trajectory control and integrates a mixture-of-experts architecture where a router network orchestrates discrete offloading decisions while expert networks independently optimize feature compression ratios.
What carries the argument
HDRL-MoE, a hierarchical deep reinforcement learning framework that separates timescale-specific optimizations and uses mixture-of-experts with a router for offloading decisions and experts for compression ratios.
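The router-and-experts split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the observation size, the number of offloading options, the one-expert-per-option pairing, the greedy routing, and the sigmoid squashing of the compression ratio are all hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical, chosen only for illustration.
OBS_DIM = 8       # length of the local observation vector
NUM_OPTIONS = 3   # e.g. 0 = infer locally, 1..2 = offload to UAV 1..2
HIDDEN = 16       # hidden width of each small network


def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a two-layer perceptron (untrained)."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]


# Router scores the discrete offloading options.
router = init_mlp(OBS_DIM, HIDDEN, NUM_OPTIONS)
# Assumption: one expert per offloading option, each emitting one scalar.
experts = [init_mlp(OBS_DIM, HIDDEN, 1) for _ in range(NUM_OPTIONS)]


def act(obs):
    """Pick an offloading option, then ask only that expert for a ratio."""
    logits = mlp(router, obs)
    choice = int(np.argmax(logits))            # greedy routing (assumption)
    raw = mlp(experts[choice], obs)[0]
    ratio = 1.0 / (1.0 + np.exp(-raw))         # squash into (0, 1)
    return choice, float(ratio)


choice, ratio = act(np.ones(OBS_DIM))
print(choice, ratio)
```

Only the selected expert is evaluated per decision, which is the property that keeps the per-step cost from growing with the total number of experts in this sketch.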
If this is right
- UAVs adhere more closely to reference trajectories while providing higher-accuracy inference support.
- The MoE design enables efficient scaling as the number of inference tasks or devices increases.
- Joint optimization of trajectories, offloading, and compression outperforms approaches that handle these separately.
- Feature compression can be tuned dynamically per expert without interfering with high-level mission decisions.
Where Pith is reading between the lines
- This separation of decision layers could apply to other scenarios involving mobile agents with both navigation and computation duties.
- Real UAV testbeds would help check if the POMDP captures wireless and mobility uncertainties well enough.
- Extending the router to handle more types of decisions might further improve performance in dynamic environments.
Load-bearing premise
The constrained POMDP and the split between slow inference decisions and fast trajectory control capture the essential real-world trade-offs without major loss of optimality.
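The split between slow and fast decisions can be pictured as a two-timescale control loop. The sketch below is illustrative only: `run_episode`, `slow_period`, and the toy policies are hypothetical names, and the fixed refresh period is an assumption rather than the schedule used in the paper.

```python
def run_episode(num_steps, slow_period, high_policy, low_policy, obs_fn):
    """Run a two-timescale loop and return (step, slow, fast) decisions."""
    log = []
    slow_action = None
    for t in range(num_steps):
        obs = obs_fn(t)
        # Slow layer: refresh the inference decision every `slow_period` steps.
        if t % slow_period == 0:
            slow_action = high_policy(obs)
        # Fast layer: choose a trajectory action at every step,
        # conditioned on the most recent slow decision.
        fast_action = low_policy(obs, slow_action)
        log.append((t, slow_action, fast_action))
    return log


# Toy stand-ins for the learned policies (hypothetical).
def toy_high_policy(obs):
    return obs // 5


def toy_low_policy(obs, slow_action):
    return (obs % 2, slow_action)


log = run_episode(10, 5, toy_high_policy, toy_low_policy, obs_fn=lambda t: t)
for entry in log:
    print(entry)
```

Changing `slow_period` in such a loop is one simple way to probe how sensitive a hierarchical policy is to the degree of timescale separation.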
What would settle it
Deploy the learned policies on physical UAVs in a test environment and compare measured inference accuracy and path deviations to the simulation results and to baseline methods.
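Comparing path deviations between a testbed and simulation needs a concrete metric. The abstract says mission duties are quantified by trajectory deviations from reference paths but does not give the formula here, so the sketch below assumes the mean Euclidean distance between flown and reference waypoints at matched time steps; `mean_path_deviation` and the sample coordinates are hypothetical.

```python
import numpy as np


def mean_path_deviation(flown, reference):
    """Mean Euclidean distance between matched waypoints (assumed metric)."""
    flown = np.asarray(flown, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if flown.shape != reference.shape:
        raise ValueError("flown and reference paths must have the same shape")
    # Per-waypoint distance, then average over the whole path.
    return float(np.linalg.norm(flown - reference, axis=1).mean())


# Placeholder waypoints in metres, not data from the paper.
reference = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
flown = [(0.0, 0.0), (10.0, 3.0), (20.0, 4.0)]

dev = mean_path_deviation(flown, reference)
print(dev)  # (0 + 3 + 4) / 3
```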
Original abstract
The low-altitude economy (LAE) is reshaping the industrial landscape by deploying unmanned aerial vehicles (UAVs) to facilitate a wide range of applications demanding flexible aerial mobility. Integrating edge artificial intelligence (AI) into LAE platforms creates a compelling paradigm where UAVs provide real-time AI-driven analysis while simultaneously executing their primary aerial mission duties. However, realizing this paradigm remains challenging due to the strict mission constraints imposed by these primary duties and the throughput bottlenecks of wireless links. To bridge this gap, we propose a UAV-assisted cooperative edge inference framework where UAVs execute mission-critical LAE duties, quantified by trajectory deviations from reference paths, while concurrently supporting ground devices via intermediate feature offloading. Within this framework, UAV trajectories, inference task offloading decisions, and feature compression ratios are jointly optimized to maximize the system performance. We cast this joint optimization task into a constrained partially observable Markov decision process (POMDP) framework. To efficiently solve it, we propose HDRL-MoE, a novel hierarchical deep reinforcement learning framework that decouples the optimization of slow-varying inference decisions from rapidly changing UAV trajectory control. Furthermore, HDRL-MoE integrates a mixture-of-experts (MoE) architecture, where a router network orchestrates discrete offloading decisions while expert networks independently optimize the feature compression ratios. Extensive simulations show that HDRL-MoE achieves significant inference accuracy gains over baselines and exhibits high scalability and efficiency through its MoE design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a UAV-assisted cooperative edge inference framework for low-altitude economy applications. UAVs perform primary trajectory-constrained missions while supporting ground devices through intermediate feature offloading. The joint optimization of trajectories, offloading decisions, and compression ratios is cast as a constrained POMDP. This is solved via HDRL-MoE, a hierarchical DRL architecture that decouples slow-varying inference decisions from fast UAV trajectory control and employs a mixture-of-experts router for discrete offloading with independent expert networks for compression ratios. Extensive simulations are reported to demonstrate significant inference accuracy gains, scalability, and efficiency relative to baselines.
Significance. If the simulation results hold, the work offers a practical approach to multi-timescale optimization in UAV-edge AI systems under mission and wireless constraints. The hierarchical decoupling and MoE design address action-space dimensionality and differing update rates, which are recurring challenges in wireless control and edge inference. The explicit time-scale separation and reported ablation studies on the MoE component provide a concrete technical contribution that could inform similar hierarchical RL designs in dynamic networked systems.
major comments (1)
- [§3] §3 (POMDP formulation): the claim that hierarchical decoupling of inference decisions from trajectory control preserves near-optimality rests on differing update frequencies; additional analysis (e.g., sub-optimality bounds or sensitivity to frequency mismatch) would strengthen the central feasibility argument for real-world deployment.
minor comments (3)
- [Abstract] Abstract: the statement of 'significant inference accuracy gains' would benefit from one or two concrete percentage improvements or key metric values to give readers an immediate sense of scale.
- [Simulation section] Simulation section: confirm that all reported curves include error bars or standard deviations across random seeds, and that baseline implementations are described with sufficient hyper-parameter detail for reproducibility.
- [Notation] Notation: ensure consistent use of symbols for the router network output and expert selection probabilities across equations and algorithm pseudocode.
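The error-bar request in the minor comments amounts to aggregating each learning curve across random seeds. A minimal sketch follows, assuming results are stored as one row per seed; `summarize_across_seeds` is a hypothetical helper and the numbers are placeholders, not results from the paper.

```python
import numpy as np


def summarize_across_seeds(curves):
    """Return per-point mean and sample standard deviation over seeds.

    `curves` has shape (num_seeds, num_points); at least two seeds are
    needed because the sample standard deviation uses ddof=1.
    """
    curves = np.asarray(curves, dtype=float)
    if curves.ndim != 2 or curves.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two seeds")
    mean = curves.mean(axis=0)
    std = curves.std(axis=0, ddof=1)
    return mean, std


# Placeholder accuracy curves for three seeds (illustrative values only).
curves = [
    [0.60, 0.70, 0.80],
    [0.62, 0.74, 0.78],
    [0.58, 0.72, 0.82],
]

mean, std = summarize_across_seeds(curves)
print(mean)
print(std)
```

Plotting `mean` with a shaded band of `mean ± std` is a common way to present such curves, though the choice between standard deviation and confidence intervals is left to the authors.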
Simulated Author's Rebuttal
We thank the referee for the constructive comment and the recommendation for minor revision. We address the point on the POMDP formulation below by agreeing to incorporate additional empirical analysis.
Point-by-point responses
- Referee: [§3] §3 (POMDP formulation): the claim that hierarchical decoupling of inference decisions from trajectory control preserves near-optimality rests on differing update frequencies; additional analysis (e.g., sub-optimality bounds or sensitivity to frequency mismatch) would strengthen the central feasibility argument for real-world deployment.
  Authors: We appreciate this observation. The hierarchical decoupling in HDRL-MoE is designed around the natural timescale separation between slow-varying inference decisions (offloading and compression ratios) and fast UAV trajectory control. While providing theoretical sub-optimality bounds would require a major theoretical extension outside the paper's simulation-focused scope, we agree that sensitivity analysis to frequency mismatch is valuable and feasible. In the revised manuscript, we will add simulation results (new figure or subsection in the evaluation section) that vary the relative update rates of the high-level and low-level policies and report the resulting inference accuracy and constraint satisfaction. This will empirically demonstrate robustness to different degrees of timescale separation.
  Revision: partial
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper formulates a joint optimization problem as a constrained POMDP and introduces the HDRL-MoE architecture as an independent solution method. Hierarchical decoupling is explicitly motivated by differing update frequencies between inference decisions and UAV trajectories, while the MoE router/expert split is presented as a design to manage action-space dimensionality. Performance claims rest on simulation comparisons against baselines rather than any fitted parameter renamed as prediction or self-citation chain. No load-bearing step reduces by construction to its own inputs; the framework is self-contained with independent grounding in the described architecture and empirical results.