arxiv: 2604.02461 · v1 · submitted 2026-04-02 · 💻 cs.NI

RL-Loop: Reinforcement Learning-Driven Real-Time 5G Slice Control for Connected and Autonomous Mobility Services

Pith reviewed 2026-05-13 20:49 UTC · model grok-4.3

classification 💻 cs.NI
keywords 5G network slicing · reinforcement learning · PPO agent · CPU resource allocation · connected mobility · edge computing · QoS control · real-time feedback

The pith

A reinforcement learning controller can cut CPU allocation for 5G network slices by over 55 percent while keeping quality of service for connected vehicles nearly unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

RL-Loop is a closed-loop framework that applies a Proximal Policy Optimization agent to dynamically adjust CPU resources in 5G edge slices serving connected mobility services. The agent observes slice key performance indicators in real time and changes allocations every second on a physical testbed. Experiments show more than 55 percent lower average CPU usage than a fixed reference point, with quality-of-service degradation remaining at comparable levels. The approach illustrates how lightweight reinforcement learning feedback can manage variable traffic loads efficiently for smart mobility and connected vehicle applications.

Core claim

RL-Loop is a closed-loop reinforcement learning framework for real-time CPU resource control in 5G network slicing for connected mobility services. It uses a Proximal Policy Optimization agent that continuously observes slice-level key performance indicators and adjusts edge CPU allocations at one-second granularity on a real testbed. The system achieves over 55 percent reduction in average CPU allocation relative to the reference operating point while reaching a comparable quality-of-service degradation region.

What carries the argument

A Proximal Policy Optimization (PPO) reinforcement learning agent that observes slice-level key performance indicators and outputs real-time adjustments to edge CPU allocations for 5G network slices.
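
As a rough illustration of the observe-decide-apply cycle described above, the following Python sketch wires a control loop around a stand-in policy. It is not the paper's implementation: the single KPI (latency), the toy latency model, the 20 ms target, the 100-millicore step, and the `stub_policy` rule that takes the place of the trained PPO agent are all assumptions made for this example.

```python
from typing import Callable, List


def clamp(value: float, low: float, high: float) -> float:
    """Keep an allocation inside the permitted range."""
    return max(low, min(high, value))


def toy_latency_ms(cpu_millicores: float) -> float:
    """Hypothetical KPI model: latency falls as CPU rises (not from the paper)."""
    return 20000.0 / cpu_millicores


def stub_policy(latency_ms: float, cpu_millicores: float) -> float:
    """Stand-in for the trained PPO agent: a fixed-step rule, assumed for illustration."""
    target_ms = 20.0  # assumed QoS target
    step = 100.0      # assumed adjustment step in millicores
    if latency_ms < target_ms:
        return cpu_millicores - step  # headroom available: release CPU
    return cpu_millicores + step      # at or above target: add CPU


def control_loop(
    policy: Callable[[float, float], float],
    read_kpi: Callable[[float], float],
    cpu_init: float,
    cpu_min: float,
    cpu_max: float,
    steps: int,
) -> List[float]:
    """Run observe -> decide -> apply for a fixed number of control intervals."""
    cpu = cpu_init
    history: List[float] = []
    for _ in range(steps):
        kpi = read_kpi(cpu)                              # observe the slice KPI
        cpu = clamp(policy(kpi, cpu), cpu_min, cpu_max)  # decide, then bound
        history.append(cpu)                              # "apply" (recorded only)
        # A deployed loop would wait for the next one-second tick here.
    return history


history = control_loop(stub_policy, toy_latency_ms, 2000.0, 500.0, 2000.0, 12)
print(history)
```

In this toy setting the allocation steps down from the initial value and then oscillates around the point where the assumed latency target is just met. A learned policy would replace `stub_policy` and consume the full KPI vector rather than a single number.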

If this is right

  • One-second granularity adjustments allow the system to track rapidly changing traffic conditions in connected mobility services.
  • Real-time feedback control through lightweight reinforcement learning supports efficient edge resource use and service differentiation.
  • Comparable quality of service at lower CPU levels indicates potential operational cost reductions for 5G infrastructure supporting autonomous mobility.
  • The framework demonstrates that reinforcement learning can provide responsive, software-defined management for real-time communication in connected vehicles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar reinforcement learning loops could be extended to allocate additional resources such as bandwidth or storage in the same slicing environment.
  • Production deployments would likely need hybrid safeguards combining the learned policy with simple rules to handle rare traffic events.
  • Testing the approach across multiple coordinated slices or edge sites would reveal coordination overhead not examined in the single-slice experiments.
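
The hybrid safeguard mentioned in the second bullet can be sketched as a thin wrapper around whatever allocation the learned policy proposes. This is one possible arrangement, not something the paper describes: the function name `guarded_allocation`, the rate limit, the recovery step, and the allocation bounds are hypothetical values chosen for the example.

```python
def guarded_allocation(
    proposed: float,
    current: float,
    qos_violated: bool,
    cpu_min: float = 500.0,
    cpu_max: float = 2000.0,
    max_step: float = 200.0,
    recovery_step: float = 400.0,
) -> float:
    """Combine a learned proposal with two simple rules (all thresholds assumed)."""
    if qos_violated:
        # Rule 1: on a QoS violation, ignore the policy and add CPU.
        target = current + recovery_step
    else:
        # Rule 2: otherwise follow the policy, but limit the change per interval.
        delta = max(-max_step, min(max_step, proposed - current))
        target = current + delta
    # Final bound on the allocation range.
    return max(cpu_min, min(cpu_max, target))


normal = guarded_allocation(proposed=600.0, current=1200.0, qos_violated=False)
override = guarded_allocation(proposed=600.0, current=1200.0, qos_violated=True)
capped = guarded_allocation(proposed=2500.0, current=1900.0, qos_violated=False)
print(normal, override, capped)
```

The design choice here is that the rules only constrain the policy's output and never replace its training, so the learned behavior is unchanged whenever no rule is triggered.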

Load-bearing premise

The trained PPO policy remains stable and avoids unexpected quality-of-service violations when exposed to real-world variations in mobility traffic patterns.

What would settle it

Deploying RL-Loop on the testbed with mobility traffic traces containing sudden density spikes or patterns absent from training data and verifying whether the reported CPU savings and QoS levels are preserved.
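
One simple way to build traces with sudden spikes is to perturb an existing trace in a reproducible manner. The sketch below is a minimal example under stated assumptions: `inject_spikes`, the spike probability, the multiplication factor, and the base trace are hypothetical and are not taken from the paper's traffic generator.

```python
import random
from typing import List


def inject_spikes(
    trace: List[float], spike_prob: float, spike_factor: float, seed: int
) -> List[float]:
    """Return a copy of `trace` in which some samples are multiplied by `spike_factor`."""
    rng = random.Random(seed)  # seeded so the perturbed trace is reproducible
    return [
        value * spike_factor if rng.random() < spike_prob else value
        for value in trace
    ]


base_trace = [10.0, 12.0, 11.0, 13.0, 12.0, 10.0]  # hypothetical load samples
spiky_trace = inject_spikes(base_trace, spike_prob=0.3, spike_factor=4.0, seed=7)
print(spiky_trace)
```

Replaying both `base_trace` and `spiky_trace` through the same controller and comparing CPU savings and QoS would give a first indication of how the policy reacts to patterns absent from training.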

Figures

Figures reproduced from arXiv: 2604.02461 by Abdallah Shami, Ali Chouman, Hanan Lutfiyya, Lara Tarkh.

Figure 1: RL-Loop closed feedback loop in the 5G testbed.
Figure 2: 5G testbed used for RL-Loop experiments. QoS degradation curves and CPU operating points under similar traffic ranges. These numbers provide a useful reference to understand where our RL policy sits relative to a model-driven controller, while we clearly state that the comparison is not one to one because the environments differ. Methodologically, our use of MicroOpt as an indicative baseline is similar in…
Figure 3: Reward versus CPU allocation for RL-Loop.
Figure 4: Evolution of the RL-Loop reward during the 900 s…
Original abstract

Smart and connected mobility systems rely on 5G edge infrastructure to support real-time communication, control, and service differentiation. Achieving this requires adaptive resource management mechanisms that can react to rapidly changing traffic conditions. In this paper, we propose RL-Loop, a closed-loop reinforcement learning framework for real-time CPU resource control in 5G network slicing environments supporting connected mobility services. RL-Loop employs a Proximal Policy Optimization (PPO) agent that continuously observes slice-level key performance indicators and adjusts edge CPU allocations at one-second granularity on a real testbed. The framework leverages real-time observability and feedback to enable adaptive, software-defined edge intelligence. Experimental results suggest that RL-Loop can reduce average CPU allocation by over 55% relative to the reference operating point while reaching a comparable quality-of-service degradation region. These results indicate that lightweight reinforcement learning-based feedback control can provide efficient and responsive resource management for 5G-enabled smart mobility and connected vehicle services.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes RL-Loop, a closed-loop reinforcement learning framework using a Proximal Policy Optimization (PPO) agent to perform real-time CPU resource allocation for 5G network slices supporting connected and autonomous mobility services. The agent observes slice KPIs and adjusts allocations at 1-second granularity on a real testbed; the central claim is that this yields >55% reduction in average CPU allocation relative to a reference operating point while maintaining a comparable QoS degradation region.

Significance. If the experimental results hold under broader conditions, the work would be significant for practical 5G edge resource management in mobility scenarios, demonstrating that lightweight RL feedback control can achieve substantial efficiency gains on real hardware. The use of a real testbed and fine-grained (1 s) control loop is a concrete strength that moves beyond simulation-only studies.

major comments (3)
  1. [Abstract and §5] Abstract and §5 (Experimental Results): the headline claim of >55% average CPU reduction with comparable QoS degradation is presented without reported baselines, error bars, statistical tests, number of runs, or variance measures, rendering the quantitative result impossible to verify or reproduce from the given information.
  2. [§4] §4 (Traffic Generation and Evaluation Methodology): the traffic scenarios are confined to a narrow set of mobility patterns (specific vehicle counts, speeds, and service mixes) with no reported stress tests, distribution-shift experiments, or out-of-distribution traces; this directly undermines the assumption that the learned PPO policy remains stable and safe under real-world variations.
  3. [§5.2] §5.2 (Policy Evaluation): no ablation on training stability across random seeds, no analysis of edge-case QoS violations, and no comparison against standard non-RL baselines (e.g., static allocation or threshold-based controllers) are provided, leaving the generalization and superiority claims unsupported.
minor comments (2)
  1. [Throughout] Several acronyms (PPO, KPI, QoS) are used before their first explicit definition; a short table of abbreviations would improve readability.
  2. [Figures in §5] Figure captions and axis labels in the experimental plots lack units and confidence intervals, reducing clarity.
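
The statistical reporting requested in major comment 1 could take roughly the following shape. This is a sketch using only the Python standard library; the function name `paired_summary` is hypothetical and the per-run CPU numbers are synthetic placeholders invented for the example, not measurements from the paper.

```python
import math
import statistics
from typing import List, Tuple


def paired_summary(
    reference: List[float], treated: List[float]
) -> Tuple[float, float, float]:
    """Return (percent reduction of the mean, std. dev. of paired diffs, paired t statistic)."""
    if len(reference) != len(treated) or len(reference) < 2:
        raise ValueError("need two equally long samples with at least two runs")
    diffs = [r - t for r, t in zip(reference, treated)]
    mean_diff = statistics.mean(diffs)
    reduction_pct = 100.0 * mean_diff / statistics.mean(reference)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Paired t statistic; undefined when all differences are identical (sd_diff == 0).
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return reduction_pct, sd_diff, t_stat


# Synthetic per-run average CPU allocations (millicores), for illustration only.
reference_runs = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
controller_runs = [600.0, 610.0, 590.0, 615.0, 585.0]
reduction_pct, sd_diff, t_stat = paired_summary(reference_runs, controller_runs)
print(reduction_pct, sd_diff, t_stat)
```

The resulting t statistic would be compared against Student's t distribution with n - 1 degrees of freedom, where n is the number of paired runs, to obtain a p-value.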

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which identifies key areas where additional rigor and analysis will improve the manuscript. We address each major comment below and commit to revisions that enhance verifiability and robustness without altering the core contributions.

Point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Experimental Results): the headline claim of >55% average CPU reduction with comparable QoS degradation is presented without reported baselines, error bars, statistical tests, number of runs, or variance measures, rendering the quantitative result impossible to verify or reproduce from the given information.

    Authors: We agree that the quantitative claims in the abstract and Section 5 require explicit statistical support for reproducibility. In the revised manuscript we will report the number of independent runs (minimum 10), include error bars and variance measures (standard deviation and confidence intervals), describe the exact reference baseline used for the 55% reduction, and add statistical significance tests (e.g., paired t-tests) comparing RL-Loop against the reference operating point. revision: yes

  2. Referee: [§4] §4 (Traffic Generation and Evaluation Methodology): the traffic scenarios are confined to a narrow set of mobility patterns (specific vehicle counts, speeds, and service mixes) with no reported stress tests, distribution-shift experiments, or out-of-distribution traces; this directly undermines the assumption that the learned PPO policy remains stable and safe under real-world variations.

    Authors: The scenarios in Section 4 are derived from real connected-vehicle traces to reflect representative conditions. We will expand the section with additional stress tests at higher vehicle densities and varied speed distributions, plus limited distribution-shift experiments using synthetically perturbed traces. Comprehensive out-of-distribution evaluation will be acknowledged as a limitation and flagged for future work if exhaustive coverage exceeds revision scope. revision: partial

  3. Referee: [§5.2] §5.2 (Policy Evaluation): no ablation on training stability across random seeds, no analysis of edge-case QoS violations, and no comparison against standard non-RL baselines (e.g., static allocation or threshold-based controllers) are provided, leaving the generalization and superiority claims unsupported.

    Authors: We acknowledge these omissions. The revised Section 5.2 will add (i) training stability results across multiple random seeds with convergence metrics, (ii) explicit analysis of edge cases and resulting QoS violations together with policy behavior, and (iii) direct comparisons against non-RL baselines including static allocation and threshold-based controllers to support the claimed advantages of the PPO agent. revision: yes
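
The non-RL baselines named in the third response (static allocation and a threshold-based controller) can be compared on a replayed demand trace with very little code. The sketch below is illustrative only: the 50% and 80% utilization bands, the 200-unit step, the allocation bounds, and the synthetic demand trace are assumptions, and a QoS violation is simplified to "allocation below demand".

```python
from typing import Callable, List, Tuple


def evaluate(
    controller: Callable[[float, float], float],
    demand_trace: List[float],
    cpu_init: float,
) -> Tuple[float, int]:
    """Replay a trace; return (average allocation, intervals with allocation < demand)."""
    alloc = cpu_init
    total, violations = 0.0, 0
    for demand in demand_trace:
        total += alloc
        if alloc < demand:
            violations += 1
        alloc = controller(demand, alloc)  # decision takes effect next interval
    return total / len(demand_trace), violations


def static_controller(demand: float, alloc: float) -> float:
    """Static baseline: never change the allocation."""
    return alloc


def threshold_controller(demand: float, alloc: float) -> float:
    """Threshold baseline with assumed utilization bands and step size."""
    utilization = demand / alloc
    if utilization > 0.8:
        alloc += 200.0
    elif utilization < 0.5:
        alloc -= 200.0
    return max(200.0, min(1000.0, alloc))


trace = [300.0, 400.0, 800.0, 500.0, 300.0, 900.0, 400.0, 300.0]  # synthetic demand
static_result = evaluate(static_controller, trace, 1000.0)
threshold_result = evaluate(threshold_controller, trace, 1000.0)
print(static_result, threshold_result)
```

On this synthetic trace the threshold rule uses less CPU on average than the static allocation but misses one demand spike, which is the kind of trade-off a learned policy would be measured against.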

Circularity Check

0 steps flagged

No circularity; experimental comparison to reference point is independent

Full rationale

The paper introduces RL-Loop as a PPO-based closed-loop controller for 5G CPU slicing and reports direct testbed measurements of CPU allocation and QoS metrics at 1-second granularity. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The >55% CPU reduction claim is a straightforward empirical delta against a fixed reference operating point, with no self-definitional reduction or load-bearing uniqueness theorem. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, or new entities are introduced in the abstract; the work relies on standard PPO assumptions and experimental comparison.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

  1. A. Chouman, D. M. Manias, and A. Shami, “Towards supporting intelligence in 5G/6G core networks: NWDAF implementation and initial analysis,” in 2022 International Wireless Communications and Mobile Computing (IWCMC), 2022, pp. 324–329.

  2. D. M. Manias, A. Chouman, and A. Shami, “An NWDAF approach to 5G core network signaling traffic: Analysis and characterization,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, 2022, pp. 6001–6006.

  3. X. Li, R. Ni, J. Chen, Y. Lyu, Z. Rong, and R. Du, “End-to-end network slicing in radio access network, transport network and core network domains,” IEEE Access, vol. 8, pp. 29525–29537, 2020.

  4. Y. Shi, Y. E. Sagduyu, and T. Erpek, “Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,” in 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 2020, pp. 1–6.

  5. Y. Azimi, S. Yousefi, H. Kalbkhani, and T. Kunz, “Applications of machine learning in resource management for RAN-slicing in 5G and beyond networks: A survey,” IEEE Access, vol. 10, pp. 106581–106612, 2022.

  6. D. Bega, M. Gramaglia, M. Fiore, A. Banchs, and X. Costa-Pérez, “DeepCog: Optimizing resource provisioning in network slicing with AI-based capacity forecasting,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 2, pp. 361–376, 2020.

  7. M. Sulaiman, M. Ahmadi, B. Sun, M. A. Salahuddin, R. Boutaba, and A. Saleh, “MicroOpt: Model-driven slice resource optimization in 5G and beyond networks,” IEEE Transactions on Network and Service Management, vol. 22, no. 5, pp. 4448–4462, 2025.

  8. D. M. Manias, A. Chouman, and A. Shami, “Model drift in dynamic networks,” IEEE Communications Magazine, vol. 61, no. 10, pp. 78–84, 2023.

  9. D. M. Manias, M. Jammal, H. Hawilo, A. Shami, P. Heidari, A. Larabi, and R. Brunner, “Machine learning for performance-aware virtual network function placement,” in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.

  10. C. Xing, Y. L. Lee, and Y. C. Chang, “Joint baseband and radio resource allocation for 5G network slicing in H-CRANs,” in 2022 IEEE International Conference on Communications Workshops (ICC Workshops), 2022, pp. 891–896.

  11. P. Naik, C. Govindarajan, S. Goel, K. Govindarajan, D. Behl, A. Singh, M. Thomas, U. Mangla, and P. Jayachandran, “Closed-loop automation for 5G slice assurance,” in 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), 2022, pp. 424–426.

  12. M. Kulmar, M. Mahtab Alam, and I. Müürsepp, “Management closed control loop automation for improved RAN slice performance by subslicing,” IEEE Access, vol. 12, pp. 103082–103095, 2024.

  13. H. Anıl Akyıldız, Ö. Faruk Gemici, I. Hökelek, and H. Ali Çırpan, “Hierarchical reinforcement learning based resource allocation for RAN slicing,” IEEE Access, vol. 12, pp. 75818–75831, 2024.

  14. N. Saha, M. Zangooei, M. Golkarifard, and R. Boutaba, “Deep reinforcement learning approaches to network slice scaling and placement: A survey,” IEEE Communications Magazine, vol. 61, no. 2, pp. 82–87, 2023.

  15. L. Tarkh, M. Li, I. Batool, M. M. Fouda, M. I. Ibrahem, H. Lutfiyya, and Z. M. Fadlullah, “Maghe: A predictive mobility-aware framework for VNF embedding in B5G networks,” in 2025 IEEE Virtual Conference on Communications (VCC), 2025, pp. 1–6.

  16. M. Alfaqawi, S. Baron, V. Pitard, S. Davai, and N. Banoun, “Performance evaluation of 5G use cases for smart factory,” in 2022 International Conference on Smart Applications, Communications and Networking (SmartNets), 2022, pp. 01–06.

  17. W. Rafique, J. Rani Barai, A. O. Fapojuwo, and D. Krishnamurthy, “A survey on beyond 5G network slicing for smart cities applications,” IEEE Communications Surveys & Tutorials, vol. 27, no. 1, pp. 595–628, 2025.

  18. S. Batewela, M. Liyanage, E. Zeydan, M. Ylianttila, and P. Ranaweera, “Security orchestration in 5G and beyond smart network technologies,” IEEE Open Journal of the Computer Society, vol. 6, pp. 554–573, 2025.

  19. G. Barlacchi, M. De Nadai, R. Larcher, A. Casella, C. Chitic, G. Torrisi, F. Antonelli, A. Vespignani, A. Pentland, and B. Lepri, “A multi-source dataset of urban life in the city of Milan and the Province of Trentino,” Scientific Data, vol. 2, p. 150055, 2015.

  20. Q. Liu, N. Choi, and T. Han, “OnSlicing: online end-to-end network slicing with reinforcement learning,” in Proceedings of the 17th International Conference on Emerging Networking EXperiments and Technologies, ser. CoNEXT ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 141–153. [Online]. Available: https://doi.org/10.1145/3485983.3494850

  21. A. Chouman, D. M. Manias, and A. Shami, “A modular, end-to-end next-generation network testbed: Toward a fully automated network management platform,” IEEE Transactions on Network and Service Management, vol. 21, no. 5, pp. 5445–5463, 2024.

  22. tejulaify, “Epic sax gandalf 10 hours [ORIGINAL],” https://www.youtube.com/watch?v=Gy 1 bk2-7w.