DRL-Driven Edge-Aware Utility Optimization for Multi-Slice 6G Networks

Ahmed M. Abd El-Haleem; Ibrahim I. Ibrahim; Khaled M. Naguib; Mahmoud M. Elmessalawy; Soumaya Cherkaoui

arxiv: 2605.23056 · v1 · pith:HZGKLLCXnew · submitted 2026-05-21 · 💻 cs.NI · cs.AI

Pith reviewed 2026-05-25 04:59 UTC · model grok-4.3

keywords: deep Q-network, edge caching, 6G networks, O-RAN, network slicing, virtual reality, resource allocation, reinforcement learning

The pith

A DQN-based framework optimizes edge caching and resource allocation across 6G O-RAN slices to cut latency for VR services.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes a framework that places Deep Q-Network agents in the O-RAN control plane to optimize edge caching and dynamic resource provisioning for multiple network slices. The system targets the ultra-low latency and high bandwidth needs of virtual reality by supporting eMBB, URLLC, and MBRLLC slices through proactive content distribution and real-time allocation. Simulations show the approach reduces latency and raises throughput relative to conventional methods. A reader would care because it directly tackles the delivery of immersive experiences on next-generation mobile networks.

Core claim

The paper claims that incorporating DRL agents into the network control plane allows proactive and adaptive content distribution as well as real-time computational resource allocation that meets the quality-of-service demands of eMBB, URLLC, and MBRLLC slices essential for VR, with the DQN-based optimization consistently outperforming traditional methods in reducing latency and improving throughput.

What carries the argument

Deep Q-Network agents integrated into the O-RAN control plane to perform proactive content distribution and real-time resource allocation across eMBB, URLLC, and MBRLLC slices.
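As a rough illustration of what a DQN agent computes at each decision step, the sketch below shows epsilon-greedy action selection and the one-step temporal-difference target. It is a minimal sketch and not the authors' implementation: the three-action space (which slice's content to cache next), the Q-values, the reward, and the discount factor are all hypothetical, and the neural network that would produce the Q-values is replaced by hard-coded lists.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """One-step DQN target: r + gamma * max over a' of Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical example: 3 actions = which slice's content to cache next.
rng = random.Random(0)
q_now = [0.2, 0.5, 0.1]   # Q(s, a) from the online network (made-up values)
q_next = [0.3, 0.4, 0.9]  # Q_target(s', a') from the target network (made-up)

action = epsilon_greedy(q_now, epsilon=0.0, rng=rng)  # greedy since epsilon = 0
target = td_target(reward=1.0, next_q_values=q_next, gamma=0.9, done=False)
loss = (target - q_now[action]) ** 2  # squared TD error for the chosen action
```

In a full agent the squared TD error would be averaged over a replay-buffer minibatch and minimized by gradient descent on the online network; none of those details are given in the material above.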

If this is right

The framework supports quality-of-service requirements for eMBB, URLLC, and MBRLLC slices.
Latency decreases compared with traditional resource allocation methods.
Throughput increases for immersive VR traffic.
Support for VR applications becomes more reliable and responsive in 6G settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same DRL placement could be tested on traffic patterns beyond VR, such as real-time gaming or remote surgery streams.
Compatibility checks would need to cover how the agents interact with existing O-RAN interfaces under varying load.
Energy consumption or signaling overhead might become visible only after longer-term runs not shown in the current simulations.

Load-bearing premise

DRL agents can be integrated into the O-RAN control plane to enable proactive content distribution and real-time resource allocation without introducing unacceptable overhead or compatibility problems.

What would settle it

A measurement in a physical O-RAN testbed that records the actual overhead added by the DRL agents and checks whether latency and throughput gains remain when moving from simulation to hardware.

Figures

Figures reproduced from arXiv: 2605.23056 by Ahmed M. Abd El-Haleem, Ibrahim I. Ibrahim, Khaled M. Naguib, Mahmoud M. Elmessalawy, Soumaya Cherkaoui.

**Figure 1.** System Model. Adjacent text from the paper: "… NonRT-RIC and interact with a proprietary xAPP in the nRTRIC for precise control. Edge computing modules co-located with or near each O-DU improve this architecture by providing essential local computation and storage capabilities, critical for latency-sensitive and high-throughput applications. To fully exploit edge computing, intelligent caching mechanisms are employed, whereby popular VR c…"

**Figure 2.** Effect of Cache Gain Weight (κ) on Throughput Distribution. Adjacent text from the paper (Section IV, Results and Discussion): "The proposed DQN-based cache and resource allocation scheme is evaluated in a simulated 6G O-RAN network comprising 7 base stations (BSs). Each BS serves 42 mobile users distributed across a 150-meter coverage area. Users are assigned to one of three network slices: eMBB, URLLC, or MBRLLC, according to their service requi…"
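The evaluation setup quoted with Figure 2 (7 base stations, 42 users each, a 150-meter coverage area, three slices) can be sketched as a user-drop routine. This is only a sketch under stated assumptions: the excerpt does not say how users are placed or how slices are assigned, so the code assumes the 150 meters is a cell radius, places users uniformly over the disk around each base station, and assigns slices uniformly at random. The function name `drop_users` is hypothetical.

```python
import math
import random

NUM_BS = 7           # from the quoted setup
USERS_PER_BS = 42    # from the quoted setup
COVERAGE_M = 150.0   # ASSUMPTION: interpreted as a cell radius in meters
SLICES = ("eMBB", "URLLC", "MBRLLC")


def drop_users(seed=0):
    """Place users around each base station and tag each with a slice."""
    rng = random.Random(seed)
    users = []
    for bs in range(NUM_BS):
        for _ in range(USERS_PER_BS):
            # sqrt() makes the density uniform over the disk area
            # instead of clustering users near the center.
            r = COVERAGE_M * math.sqrt(rng.random())
            theta = 2.0 * math.pi * rng.random()
            users.append({
                "bs": bs,
                "x": r * math.cos(theta),   # offset from the serving BS
                "y": r * math.sin(theta),
                "slice": rng.choice(SLICES),  # ASSUMPTION: uniform assignment
            })
    return users


users = drop_users()
```

A fixed seed keeps the drop reproducible across runs, which matters when several allocation schemes are compared on the same user layout.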

Original abstract

Virtual Reality (VR) services delivered over 6G networks demand ultra-low latency and high bandwidth to ensure seamless user experiences. This paper presents an intelligent resource allocation and edge caching framework for 6G O-RAN networks, leveraging Deep Q-Network (DQN) learning for optimizing edge caching and dynamic resource provisioning across multiple network slices within an O-RAN-compliant architecture. By incorporating DRL agents into the network control plane, the proposed system enables proactive and adaptive content distribution as well as real-time computational resource allocation that meets the quality-of-service demands of eMBB, URLLC, and especially the emerging MBRLLC slices essential for VR. Simulation results demonstrate that the DQN-based framework consistently outperforms traditional methods in reducing latency and improving throughput, leading to more reliable and responsive support for immersive VR applications in 6G environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies DQN to O-RAN multi-slice caching for VR but its latency claims rest on simulations that ignore DRL inference and control-plane costs.

The main takeaway is that this work takes standard DQN and plugs it into an O-RAN setup for edge caching plus dynamic slicing aimed at VR traffic, claiming lower latency and higher throughput than baselines. It frames the agents inside the control plane to handle eMBB, URLLC, and MBRLLC slices together. That framing is the clearest part of the contribution.

The abstract shows the authors thought through how the slices map to VR requirements and how caching can be made proactive. That is useful context even if the method itself is not new. The simulations are said to show consistent gains, which at least gives a concrete target for later comparison.

The central weakness is that nothing in the description accounts for the cost of running the DQN forward pass or the signaling needed to push decisions through the RIC and E2 interface. If those add even modest delay, the reported advantage for URLLC and MBRLLC slices is likely overstated. The abstract also gives no information on simulation parameters, baseline definitions, number of runs, or statistical tests, so it is impossible to judge whether the outperformance is robust.

This paper is aimed at people already working on AI-driven resource management in 6G and O-RAN. A reader looking for a worked example of DQN in that setting might pick up the slice-mapping ideas, but anyone needing reproducible evidence or overhead analysis will find the current version thin. It is worth sending to referees so the authors can supply the missing simulation details and address the control-plane cost question.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a DQN-based framework for edge caching and dynamic resource allocation across eMBB, URLLC, and MBRLLC slices in an O-RAN 6G architecture, with the central claim that embedding DRL agents in the control plane enables proactive content distribution and real-time provisioning that outperforms traditional methods on latency and throughput for VR services.

Significance. If the simulation results survive addition of inference and signaling overheads plus full experimental disclosure, the work would offer a concrete data point on DRL applicability to multi-slice O-RAN; the explicit treatment of the emerging MBRLLC slice is a modest positive. At present the performance claims rest entirely on unreported simulation details and an unmodeled assumption of zero-cost agent decisions.

major comments (2)

[Simulation results section] The setup, baselines, statistical significance tests, error bars, raw-data references, and exclusion rules are not described, rendering the reported latency/throughput gains impossible to evaluate or reproduce.
[Abstract and framework description] The headline claim of real-time resource allocation for URLLC/MBRLLC slices does not model DQN forward-pass latency, E2/RIC signaling overhead, or contention with existing xApp/rApp workloads; any non-negligible per-decision cost would erase the reported advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important issues of reproducibility and modeling assumptions. We address each point below and will incorporate revisions to strengthen the manuscript.

Referee: [Simulation results section] The setup, baselines, statistical significance tests, error bars, raw-data references, and exclusion rules are not described, rendering the reported latency/throughput gains impossible to evaluate or reproduce.

Authors: We agree that the Simulation Results section lacks sufficient detail for reproducibility. In the revised manuscript we will expand this section to fully specify the simulation parameters (network topology, traffic models, slice configurations for eMBB/URLLC/MBRLLC), DQN architecture and training hyperparameters, the complete set of baselines (including static allocation, LRU caching, and non-DRL optimization heuristics), the statistical significance tests applied, error bars on all reported figures, a statement on raw data availability, and any exclusion criteria used in the evaluation. These additions will allow independent verification of the latency and throughput results. revision: yes
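One of the baselines named in this response, LRU caching, is simple enough to sketch. The class below is a minimal illustration of a least-recently-used edge cache with a hit-ratio counter, not the baseline the authors will actually report: the class name `LRUCache`, the capacity, and the request trace are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that tracks its own hit ratio."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.requests = 0

    def request(self, content_id):
        """Serve one request; return True on a cache hit."""
        self.requests += 1
        if content_id in self.items:
            self.items.move_to_end(content_id)  # mark as most recently used
            self.hits += 1
            return True
        self.items[content_id] = True           # admit on miss
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)       # evict least recently used
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


# Hypothetical request trace over three content items.
cache = LRUCache(capacity=2)
trace = ["a", "b", "a", "c", "b", "a"]
results = [cache.request(c) for c in trace]
```

On this trace only the third request hits, because each later request asks for exactly the item that was just evicted. A learned policy would have to beat this kind of reactive behavior on realistic VR request patterns for the comparison to be informative.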
Referee: [Abstract and framework description] The headline claim of real-time resource allocation for URLLC/MBRLLC slices does not model DQN forward-pass latency, E2/RIC signaling overhead, or contention with existing xApp/rApp workloads; any non-negligible per-decision cost would erase the reported advantage.

Authors: The referee correctly notes that the current model treats DQN decisions as zero-cost. This is an explicit modeling choice common in early DRL network studies to isolate the value of learned policies. We will revise the abstract and the framework description, and we will add a dedicated limitations subsection that (i) states the zero-overhead assumption, (ii) discusses the potential impact of forward-pass latency and E2/RIC signaling on URLLC/MBRLLC slices, and (iii) provides a qualitative sensitivity analysis. A full re-simulation that includes measured inference and contention costs is beyond the scope of a major revision but will be flagged as future work. The core claim of improved utility under idealized control-plane execution therefore remains, subject to the clarified assumptions. revision: partial
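The overhead question raised in this exchange reduces to simple arithmetic: the learned policy keeps its advantage only while the added per-decision cost stays below the simulated latency gap. The sketch below works that out for made-up numbers. All latency values are hypothetical and are not results from the paper, and the additive cost model (simulated latency plus inference plus signaling) is an assumption made for illustration.

```python
def effective_latency_ms(simulated_ms, inference_ms, signaling_ms):
    """ASSUMED additive model: control-plane costs add to simulated latency."""
    return simulated_ms + inference_ms + signaling_ms


def break_even_overhead_ms(baseline_ms, dqn_simulated_ms):
    """Largest total overhead at which the DQN scheme merely ties the baseline."""
    return baseline_ms - dqn_simulated_ms


# Hypothetical latencies, NOT taken from the paper.
baseline_ms = 5.0   # conventional scheme
dqn_ms = 3.5        # DQN scheme under the zero-overhead assumption

margin = break_even_overhead_ms(baseline_ms, dqn_ms)

rows = []
for overhead in (0.0, 0.5, 1.0, 1.5, 2.0):
    # Split the overhead evenly between forward pass and E2/RIC signaling.
    total = effective_latency_ms(dqn_ms, overhead / 2, overhead / 2)
    rows.append((overhead, total, total < baseline_ms))

for overhead, total, still_better in rows:
    print(f"overhead={overhead:.1f} ms  total={total:.2f} ms  better={still_better}")
```

With these illustrative inputs the margin is 1.5 ms, so the advantage disappears once inference and signaling together reach that figure. A testbed measurement would replace the made-up inputs with observed values.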

Circularity Check

0 steps flagged

No significant circularity; performance claims rest on simulation outcomes

The paper presents a DRL/DQN framework for multi-slice resource allocation evaluated via simulations. No derivation chain, equations, or fitted parameters are described that reduce to inputs by construction. Claims of outperformance are tied directly to empirical results rather than analytical self-reference, self-citation load-bearing premises, or ansatz smuggling. This matches the default expectation for simulation-driven papers with no mathematical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.0 · 5697 in / 997 out tokens · 26533 ms · 2026-05-25T04:59:27.689204+00:00 · methodology
