An Intelligent eUPF for Time-Sensitive Path Selection in B5G Edge Networks
Pith reviewed 2026-05-09 18:00 UTC · model grok-4.3
The pith
A DQN agent inside an enhanced User Plane Function selects low-latency paths in B5G networks using passive eBPF measurements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By formulating path selection as a Partially Observable Markov Decision Process and supplying it with delay estimates obtained from eBPF-linked TEID timestamps in GTP-U traffic, the eUPF can run a DQN agent that learns to pick lower-latency routes between edge and cloud in real time. The agent outperforms a random policy on average latency, reward stability, and reliability of low-delay choices, establishing that AI-driven control can operate inside the user plane without active measurement traffic.
What carries the argument
A DQN agent whose state is built from POMDP observations, which are in turn supplied by passive eBPF delay measurement on GTP-U TEID timestamps.
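To make the control loop concrete, here is a minimal sketch of a learning agent choosing between two paths from noisy, discretized delay observations. It is not the paper's implementation: tabular Q-learning stands in for the DQN, and the two-path environment, the delay values in `BASE`, the bucket thresholds, and all hyperparameters are hypothetical choices made for illustration only.

```python
import random

PATHS = ("mec", "cloud")  # hypothetical action labels
# Assumed (normal, congested) mean delays in ms; not taken from the paper.
BASE = {"mec": (5.0, 40.0), "cloud": (20.0, 60.0)}


def bucket(delay_ms):
    """Discretize a delay estimate into a coarse observation bucket."""
    return 0 if delay_ms < 12 else (1 if delay_ms < 30 else 2)


class TwoPathEnv:
    """Toy environment: each path has a hidden normal/congested regime."""

    def __init__(self, rng):
        self.rng = rng
        self.congested = {p: False for p in PATHS}

    def _true_delay(self, path):
        normal, busy = BASE[path]
        mean = busy if self.congested[path] else normal
        return max(0.1, self.rng.gauss(mean, 2.0))

    def observe(self):
        # The agent never sees the regime, only noisy bucketed estimates.
        return tuple(bucket(self._true_delay(p) + self.rng.gauss(0, 2.0))
                     for p in PATHS)

    def step(self, action):
        delay = self._true_delay(PATHS[action])
        for p in PATHS:  # regimes flip independently of the action
            flip = 0.2 if self.congested[p] else 0.05
            if self.rng.random() < flip:
                self.congested[p] = not self.congested[p]
        return delay


def run(env, rng, q, steps, epsilon, alpha=0.1, gamma=0.9, learn=True):
    """Run an epsilon-greedy policy; return the mean delay in ms."""
    total = 0.0
    obs = env.observe()
    for _ in range(steps):
        values = q.setdefault(obs, [0.0, 0.0])
        if rng.random() < epsilon:
            action = rng.randrange(len(PATHS))
        else:
            action = max(range(len(PATHS)), key=values.__getitem__)
        delay = env.step(action)
        next_obs = env.observe()
        if learn:
            # Reward is the negative delay of the chosen path.
            target = -delay + gamma * max(q.setdefault(next_obs, [0.0, 0.0]))
            values[action] += alpha * (target - values[action])
        total += delay
        obs = next_obs
    return total / steps


def compare_policies(seed=0, train_steps=20000, eval_steps=5000):
    """Train the agent, then compare greedy and random mean delay."""
    q = {}
    train_rng = random.Random(seed)
    run(TwoPathEnv(train_rng), train_rng, q, train_steps, epsilon=0.1)
    eval_rng = random.Random(seed + 1)
    agent_mean = run(TwoPathEnv(eval_rng), eval_rng, q, eval_steps,
                     epsilon=0.0, learn=False)
    rand_rng = random.Random(seed + 1)
    random_mean = run(TwoPathEnv(rand_rng), rand_rng, {}, eval_steps,
                      epsilon=1.0, learn=False)
    return agent_mean, random_mean
```

Calling `compare_policies()` returns the mean delay of the learned greedy policy and of a uniformly random policy on the same toy environment. Because the toy rewards are simply negative delays, the comparison mirrors the paper's average-latency metric but says nothing about its actual numbers.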
If this is right
- The DQN policy achieves measurably lower average latency than random path selection.
- Reward values remain more stable across episodes, indicating consistent path quality.
- Low-delay paths are chosen more reliably than under random selection.
- Passive eBPF measurement supplies the necessary state information without adding probe traffic.
- Reinforcement learning can be embedded directly inside B5G core network functions.
Where Pith is reading between the lines
- The same passive timestamp-linking approach could be reused for other metrics such as jitter or packet loss in GTP-U flows.
- The eUPF architecture could be extended to joint path and compute selection across multiple MEC sites.
- Production deployment would reduce monitoring overhead compared with active probing schemes.
- The learned policies might transfer to related 5G control tasks such as dynamic slice assignment.
Load-bearing premise
The POMDP formulation and the eBPF-derived delay estimates are assumed to model real network conditions accurately enough, and at low enough overhead, for the DQN agent to learn effective policies.
What would settle it
A side-by-side run of the DQN agent and random baseline on a live B5G testbed in which the agent shows no reduction in average latency or improvement in reward stability.
Original abstract
In Beyond 5G (B5G) networks, intelligent, flexible traffic management is essential to meet the stringent speed and reliability requirements of new applications. This paper presents an improved User Plane Function (eUPF) design that uses a Deep Q-Network (DQN) agent for real-time path selection between Multi-access Edge Computing (MEC) and cloud endpoints. The path selection problem is formulated as a Partially Observable Markov Decision Process (POMDP). We propose a novel passive delay measurement method that uses eBPF programs to link TEID-based timestamps in GTP-U traffic, allowing for low-cost delay estimation without active testing. Experiments show that the DQN agent substantially outperforms a random baseline, with lower average latency, more stable rewards, and more reliable low-delay path choices. These results demonstrate the effectiveness of AI-driven control in B5G core networks and the promise of reinforcement learning for modern network management.
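The abstract does not spell out how TEID-based timestamps are linked, so the following is only a sketch of one plausible user-space pairing step. It assumes (hypothetically) that eBPF programs at two observation points export events of the form `(point, teid, timestamp_ns)` in time order, and that packets of one TEID are paired first-in, first-out. The event layout, the `"in"`/`"out"` labels, and the function name are illustrative and not taken from the paper.

```python
from collections import defaultdict, deque


def estimate_delays(events):
    """Pair ingress and egress timestamps per TEID (FIFO assumption).

    events: iterable of (point, teid, timestamp_ns) tuples in time order,
            where point is "in" or "out" (hypothetical labels).
    Returns a dict mapping each TEID to a list of delays in milliseconds.
    """
    pending = defaultdict(deque)  # TEID -> unmatched ingress timestamps
    delays = defaultdict(list)    # TEID -> measured delays in ms
    for point, teid, ts_ns in events:
        if point == "in":
            pending[teid].append(ts_ns)
        elif point == "out" and pending[teid]:
            start_ns = pending[teid].popleft()
            delays[teid].append((ts_ns - start_ns) / 1e6)
        # An "out" event with no pending ingress timestamp is ignored.
    return dict(delays)


sample_events = [
    ("in", 0x10, 1_000_000),
    ("in", 0x20, 1_500_000),
    ("out", 0x10, 4_000_000),
    ("out", 0x20, 9_500_000),
    ("out", 0x30, 10_000_000),  # no matching ingress event, so skipped
]
print(estimate_delays(sample_events))
```

FIFO pairing breaks down under loss or reordering, which is one reason the calibration requested in the referee report matters: a mismatched pair yields a plausible-looking but wrong delay.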
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an enhanced User Plane Function (eUPF) for Beyond 5G edge networks that uses a Deep Q-Network (DQN) agent to perform real-time path selection between MEC and cloud endpoints. The problem is cast as a POMDP whose observations come from a novel passive delay estimator that links TEID timestamps in GTP-U traffic via eBPF programs. Experiments are reported to show that the learned policy yields lower average latency, more stable rewards, and more reliable low-delay paths than a random baseline.
Significance. If the empirical gains prove robust, the work would be of moderate significance for B5G core-network research. It illustrates a concrete combination of eBPF-based passive observability with reinforcement learning for latency-sensitive routing, an approach that could reduce the overhead of active probing while enabling adaptive traffic steering. No machine-checked proofs or fully reproducible artifacts are described, but the empirical comparison to a baseline supplies a falsifiable starting point that future studies could extend.
major comments (2)
- [Experimental results and passive delay measurement description] The central claim that the DQN agent substantially outperforms the random baseline rests on the accuracy of the eBPF-derived delay estimates used to construct POMDP observations, yet the manuscript supplies no quantitative calibration (MAE, correlation, or comparison against active probes or hardware timestamps) of these passive measurements. Without such validation, it is impossible to determine whether the reported latency reductions and stable rewards reflect genuine network conditions or artifacts of the estimator.
- [Experiments section] No details are given on the simulation or testbed configuration, number of independent runs, variance across trials, or statistical tests supporting the claims of lower average latency and more reliable path choices. Likewise, the POMDP state representation, reward function, and DQN hyper-parameters are not specified, rendering the performance gains impossible to reproduce or assess for robustness.
minor comments (1)
- [Abstract] The abstract states that the DQN agent provides 'more stable rewards' but does not define the reward function or the metric used to quantify stability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on experimental validation and reproducibility. We address each point below and commit to revisions that strengthen the manuscript without altering its core contributions.
- Referee: The central claim that the DQN agent substantially outperforms the random baseline rests on the accuracy of the eBPF-derived delay estimates used to construct POMDP observations, yet the manuscript supplies no quantitative calibration (MAE, correlation, or comparison against active probes or hardware timestamps) of these passive measurements. Without such validation, it is impossible to determine whether the reported latency reductions and stable rewards reflect genuine network conditions or artifacts of the estimator.
  Authors: We agree that quantitative validation of the passive delay estimator is necessary to substantiate the performance claims. The original manuscript emphasizes the novel integration of eBPF-based passive measurement with DQN-driven path selection rather than standalone estimator benchmarking. In the revised manuscript we will add a new subsection to the Experiments section that reports calibration results, including MAE, Pearson correlation, and direct comparisons against active probing and hardware timestamps on the same testbed traffic.
  revision: yes
- Referee: No details are given on the simulation or testbed configuration, number of independent runs, variance across trials, or statistical tests supporting the claims of lower average latency and more reliable path choices. Likewise, the POMDP state representation, reward function, and DQN hyper-parameters are not specified, rendering the performance gains impossible to reproduce or assess for robustness.
  Authors: We accept that the original submission omitted these implementation details, limiting reproducibility. The revised Experiments section will be expanded to provide: (i) complete simulation and testbed configuration (topology, link capacities, traffic generators, and eBPF deployment), (ii) results from 20 independent runs with reported means and standard deviations, (iii) statistical significance tests (paired t-tests with p-values), and (iv) the full POMDP specification (state features, action space, reward function) together with all DQN hyperparameters (network architecture, learning rate, discount factor, replay buffer size, and training schedule).
  revision: yes
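The statistics named in the rebuttal are standard and can be sketched with the Python standard library. The helpers below compute MAE and Pearson correlation between passive estimates and a reference measurement, and the paired t statistic across runs. All numbers are placeholder values invented for illustration; none come from the paper or its testbed.

```python
import math
from statistics import mean, stdev


def mae(estimates, reference):
    """Mean absolute error between passive estimates and a reference."""
    return mean(abs(e - r) for e, r in zip(estimates, reference))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def paired_t(a, b):
    """Paired t statistic for per-run results a[i] versus b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Placeholder delays in ms (illustrative only).
passive_ms = [5.0, 10.0, 15.0, 20.0]
reference_ms = [6.0, 11.0, 16.0, 21.0]
print("MAE:", mae(passive_ms, reference_ms))
print("Pearson r:", pearson(passive_ms, reference_ms))

# Placeholder per-run mean latencies in ms (illustrative only).
dqn_runs = [10.0, 11.0, 9.0, 10.0]
random_runs = [20.0, 19.0, 21.0, 22.0]
print("paired t:", paired_t(dqn_runs, random_runs))
```

Turning the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_rel` returns both values if SciPy is available. Note that a constant offset between estimator and reference, as in the placeholder data, gives a perfect correlation alongside a nonzero MAE, so the two metrics should be reported together.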
Circularity Check
No circularity; empirical comparison to baseline is independent of definitions
The paper formulates path selection as a POMDP, implements passive eBPF-based delay estimation from TEID timestamps, trains a DQN agent, and reports experimental outperformance versus a random baseline on latency and reward metrics. No load-bearing step reduces by construction to its own inputs: there are no fitted parameters renamed as predictions, no self-definitional equations, and no uniqueness theorems or ansatzes imported via self-citation. The central result is an external empirical comparison whose validity depends on the accuracy of the measurement method and testbed, not on any definitional equivalence within the paper itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- DQN hyperparameters
axioms (1)
- Domain assumption: path selection in B5G edge networks can be usefully modeled as a Partially Observable Markov Decision Process.