Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem

Ali Al Housseini; Cristina Rottondi; Omran Ayoub

arXiv: 2512.05207 · v2 · submitted 2025-12-04 · 💻 cs.NI · cs.LG · cs.MA



Pith reviewed 2026-05-17 00:26 UTC · model grok-4.3

classification 💻 cs.NI · cs.LG · cs.MA

keywords Virtual Network Embedding · VNE with Alternatives · Hierarchical Reinforcement Learning · Dynamic Network Slicing · Resource Allocation · Reinforcement Learning · Network Virtualization · Alternative Topologies


The pith

Hierarchical reinforcement learning improves dynamic virtual network embedding with alternative topologies over standard baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that splitting the embedding decision into two reinforcement learning layers lets a system handle requests that can take several different topologies. A high-level policy picks the best alternative or rejects the request, while a low-level policy maps the chosen topology onto the physical network. This setup matters because real network slicing must cope with requests arriving over time and with flexibility in how each request uses resources. When the two policies work together, experiments on realistic networks report higher acceptance rates, more revenue, and better revenue-to-cost ratios than simpler methods or non-hierarchical learning. The results also include a comparison to an exact optimization solver on small cases to show how much room remains for improvement.

Core claim

The authors present HRL-VNEAP, in which a high-level policy selects the most suitable alternative topology for each arriving virtual network request or rejects it, and a low-level policy then embeds the selected topology onto the substrate network. Under dynamic request arrivals and varying traffic loads on realistic topologies, this hierarchical approach produces higher acceptance ratios, greater total revenue, and improved revenue-over-cost ratios than the tested baseline strategies.

What carries the argument

The hierarchical reinforcement learning structure that separates topology selection at the high level from embedding decisions at the low level.
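To make the division of labour concrete, here is a minimal Python sketch of one decision round. It is not the authors' implementation: the learned policies are replaced by greedy stand-ins (the hypothetical `hl_policy` and `ll_policy`), only CPU demands are modelled, and link mapping is omitted. What it does illustrate is the control flow described above: the high-level step returns an alternative index or a reject action, the low-level step places the virtual nodes of that alternative one at a time, and substrate resources are committed only once every node has been placed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

REJECT = -1  # hypothetical sentinel for the high-level "reject" action


@dataclass
class Alternative:
    cpu: List[float]  # CPU demand of each virtual node (links omitted for brevity)


def hl_policy(substrate: Dict[str, float], alts: List[Alternative]) -> int:
    """Greedy stand-in for the learned high-level policy: choose the alternative
    with the smallest total CPU that fits in the total residual CPU, else reject."""
    total_free = sum(substrate.values())
    feasible = [i for i, a in enumerate(alts) if sum(a.cpu) <= total_free]
    if not feasible:
        return REJECT
    return min(feasible, key=lambda i: sum(alts[i].cpu))


def ll_policy(substrate: Dict[str, float], demand: float, used: set) -> Optional[str]:
    """Greedy stand-in for the learned low-level policy: place one virtual node on
    the unused substrate node with the most residual CPU, if it has enough."""
    candidates = [n for n in substrate if n not in used and substrate[n] >= demand]
    return max(candidates, key=lambda n: substrate[n]) if candidates else None


def embed_request(substrate: Dict[str, float],
                  alts: List[Alternative]) -> Optional[Dict[int, str]]:
    choice = hl_policy(substrate, alts)
    if choice == REJECT:
        return None
    mapping: Dict[int, str] = {}
    for k, demand in enumerate(alts[choice].cpu):  # one low-level step per virtual node k
        node = ll_policy(substrate, demand, set(mapping.values()))
        if node is None:
            return None  # low-level failure: nothing has been committed yet
        mapping[k] = node
    for k, node in mapping.items():  # commit resources only after full success
        substrate[node] -= alts[choice].cpu[k]
    return mapping
```

With a substrate of `{"A": 10.0, "B": 6.0, "C": 4.0}` and two alternatives needing `[5, 5, 5]` and `[8, 4]` CPU, the stand-in high-level policy picks the second (12 units in total) and the low-level loop maps it onto `A` and `B`. A learned policy would replace both greedy rules while keeping the same two-level interface.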

If this is right

The approach raises the fraction of accepted virtual network requests by as much as 20.7 percent compared with the strongest baselines.
Total revenue collected increases by up to 36.2 percent.
The ratio of revenue to cost improves by as much as 22.1 percent.
The remaining gap to optimal mixed-integer linear programming (MILP) solutions on small, tractable instances is quantified and motivates future work on learning- and optimization-based VNEAP solutions.
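This page does not give the paper's exact metric definitions, so the following Python sketch assumes the definitions commonly used in the VNE literature: revenue is the requested CPU plus requested bandwidth of each accepted request, and cost counts bandwidth once per substrate hop it traverses, so revenue-over-cost is at most 1. It also assumes that "up to 20.7%" denotes a relative improvement, which this summary does not state; the `Outcome` record and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outcome:
    accepted: bool
    node_cpu: List[float]  # CPU of each virtual node of the chosen alternative
    link_bw: List[float]   # bandwidth of each virtual link
    path_hops: List[int]   # substrate hops used by each virtual link


def revenue(o: Outcome) -> float:
    return sum(o.node_cpu) + sum(o.link_bw)


def cost(o: Outcome) -> float:
    # Bandwidth is paid on every substrate hop of the mapped path.
    return sum(o.node_cpu) + sum(bw * h for bw, h in zip(o.link_bw, o.path_hops))


def summarize(outcomes: List[Outcome]) -> Dict[str, float]:
    acc = [o for o in outcomes if o.accepted]
    total_rev = sum(revenue(o) for o in acc)
    total_cost = sum(cost(o) for o in acc)
    return {
        "acceptance_ratio": len(acc) / len(outcomes),
        "total_revenue": total_rev,
        "revenue_over_cost": total_rev / total_cost if total_cost else 0.0,
    }


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of a method over a baseline, e.g. 0.207 for +20.7%."""
    return (new - base) / base


def optimality_gap(heuristic: float, optimum: float) -> float:
    """Fraction of the MILP optimum that the heuristic leaves on the table."""
    return (optimum - heuristic) / optimum
```

Under these definitions an accepted request with two 2-unit nodes and one 1-unit link routed over 3 hops earns 5 units of revenue at a cost of 7, so longer substrate paths lower revenue-over-cost even when the request is accepted.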

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of topology choice from placement may reduce the search space enough to make reinforcement learning stable for other resource-allocation tasks that involve equivalent configurations.
If the performance edge holds on larger instances, operators could serve more slices without adding physical capacity.
Combining the learned policies with periodic exact optimization steps could shrink the optimality gap reported in the benchmarks.

Load-bearing premise

The high-level policy can pick suitable topologies and the low-level policy can embed them reliably without training instability or high computation costs when requests arrive over time.

What would settle it

Running the trained policies on a fresh set of larger substrate networks with higher arrival rates and measuring whether acceptance ratio and revenue gains disappear or training diverges.

Figures

Figures reproduced from arXiv: 2512.05207 by Ali Al Housseini, Cristina Rottondi, Omran Ayoub.

**Figure 2.** Schematic representation of the proposed HRL architecture. The HL agent observes the substrate and request features to select the optimal alternative.

**Figure 3.** Schematic overview of the policy design node within the SN. At each step $t$, the agent selects an action $a^{LL}_{t,k}$ corresponding to the substrate node onto which node $k$ of alternative $a^{HL}_t$ will be embedded. Links are allocated progressively via K-Shortest Path (K-SP) [11, 12]. The low-level reward $R^{LL}$ is multi-objective to alleviate the sparse-reward issue, leading to a more efficient explorat…

**Figure 4.** Comparison of methods under arrival rates

**Figure 5.** Comparison of methods under arrival rates
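The Figure 3 caption says that links are allocated progressively via K-SP. A minimal Python sketch of that step follows, under several assumptions not stated on this page: paths are ranked by hop count, substrate links are undirected, and the first of the k shortest paths with enough residual bandwidth on every hop is chosen and reserved. For brevity it enumerates all simple paths by depth-first search instead of using Yen's algorithm, which is only acceptable on toy graphs; all function names are hypothetical.

```python
from typing import Dict, List, Optional


def simple_paths(adj: Dict[str, List[str]], src: str, dst: str) -> List[List[str]]:
    """Enumerate all simple paths by DFS (exponential; toy graphs only)."""
    paths, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


def k_shortest_paths(adj: Dict[str, List[str]], src: str, dst: str, k: int) -> List[List[str]]:
    # Rank by hop count; ties are broken lexicographically for determinism.
    return sorted(simple_paths(adj, src, dst), key=lambda p: (len(p), p))[:k]


def allocate_link(adj: Dict[str, List[str]], residual: Dict[frozenset, float],
                  src: str, dst: str, demand: float, k: int = 3) -> Optional[List[str]]:
    """Map one virtual link onto the first of the k shortest substrate paths whose
    every hop has at least `demand` residual bandwidth; reserve it and return it."""
    for path in k_shortest_paths(adj, src, dst, k):
        hops = list(zip(path, path[1:]))
        if all(residual[frozenset(h)] >= demand for h in hops):
            for h in hops:
                residual[frozenset(h)] -= demand
            return path
    return None  # no candidate path fits: the virtual link cannot be mapped
```

On a four-node ring A-B-C-D where the B-C link has only 1 unit left, a 3-unit request from A to C skips A-B-C and is routed over A-D-C. Because each allocation consumes bandwidth, mapping links progressively (after each node placement) lets later low-level decisions see the updated residual capacities.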

Original abstract

Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%, and revenue-over-cost by up to 22.1%. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.
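"Dynamic arrivals" and "multiple traffic loads" are not parameterized on this page. A common convention in dynamic VNE studies, assumed here rather than taken from the paper, is a Poisson arrival process with rate λ and exponentially distributed request lifetimes, so the offered load is λ times the mean lifetime (in Erlangs). The Python sketch below generates such a request stream; `generate_arrivals` and its parameters are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Arrival:
    t_arrive: float
    t_depart: float


def generate_arrivals(rate: float, mean_lifetime: float, horizon: float,
                      seed: int = 0) -> List[Arrival]:
    """Poisson arrivals (exponential inter-arrival times with mean 1/rate) and
    exponential lifetimes with the given mean, up to time `horizon`."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return out
        life = rng.expovariate(1.0 / mean_lifetime)
        out.append(Arrival(t, t + life))


def offered_load(rate: float, mean_lifetime: float) -> float:
    """Average number of simultaneously active requests, in Erlangs."""
    return rate * mean_lifetime
```

Sweeping `rate` at a fixed mean lifetime is one way to produce the "multiple traffic loads" regime; each departure would release the substrate resources held by that request.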

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces HRL-VNEAP, a hierarchical reinforcement learning approach for dynamic Virtual Network Embedding with Alternative topologies (VNEAP). A high-level policy selects one of several functionally equivalent topologies (or rejects the VNR), while a low-level policy embeds the chosen topology onto the substrate. Experiments on realistic topologies under varying traffic loads report that HRL-VNEAP outperforms baselines, with gains of up to 20.7% in acceptance ratio, 36.2% in total revenue, and 22.1% in revenue-over-cost; a MILP benchmark on small instances is included to quantify the optimality gap.

Significance. If the empirical claims hold under rigorous validation, the work meaningfully extends VNE research by incorporating topology alternatives via hierarchical RL, demonstrating practical gains over naive exploitation and standard methods in dynamic settings. The concrete percentage improvements and MILP comparison provide a useful baseline for future learning- and optimization-based VNEAP solutions, though the significance depends on addressing gaps in experimental detail and stability analysis.

major comments (3)

[Experiments] Experiments section: the reported performance gains (e.g., 20.7% acceptance ratio) are presented without specification of training procedure details, hyperparameter values, number of random seeds, variance across runs, or statistical significance tests, leaving the robustness of superiority under dynamic arrivals only partially supported.
[§3] §3 (Hierarchical RL formulation): the high-level policy receives delayed and sparse feedback only after low-level embedding success or failure, yet no mechanisms (separate critics, termination functions, or shaped rewards) are described to mitigate credit-assignment problems, which risks high-variance policies or collapse under varying loads.
[Experiments] Baseline comparisons: exact implementation details, tuning procedures, and source of the 'strongest tested baselines' are not provided, undermining the ability to verify the claimed outperformance margins.
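The robustness evidence requested in the first major comment can be computed with the Python standard library. The sketch below assumes five seeds (the number the simulated rebuttal mentions) and a paired design in which each seed is run for both methods; the two-sided 5% critical value of Student's t with 4 degrees of freedom (about 2.776) is hard-coded because `statistics` has no t-distribution CDF. The helper names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t, 4 dof (5 seeds)


def mean_std(xs: List[float]) -> Tuple[float, float]:
    return statistics.mean(xs), statistics.stdev(xs)  # sample std (n - 1 denominator)


def ci95_5seeds(xs: List[float]) -> Tuple[float, float]:
    """95% confidence interval of the mean; valid only for exactly 5 runs."""
    assert len(xs) == 5, "critical value above assumes 4 degrees of freedom"
    m, s = mean_std(xs)
    half = T_CRIT_DF4 * s / math.sqrt(len(xs))
    return m - half, m + half


def paired_t(a: List[float], b: List[float]) -> float:
    """Paired t statistic on per-seed differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
```

If `abs(paired_t(method, baseline)) > T_CRIT_DF4`, the per-seed difference is significant at the 5% level; the same interval half-widths are what the error bars in the second minor comment would show.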

minor comments (2)

[§2] Notation for alternative topologies and their resource trade-offs could be clarified with an explicit example in the problem formulation section.
[Experiments] Figure captions for performance plots should include error bars or confidence intervals if multiple runs were performed.
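An explicit example of the kind the first minor comment asks for might look like the following Python sketch. The data structure and numbers are hypothetical and not taken from the paper: one request with two functionally equivalent alternatives, a three-function chain and a variant in which two functions are merged onto one node. The merged variant needs one more unit of CPU but half the bandwidth, which is the resource trade-off that makes the high-level choice non-trivial.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Topology:
    cpu: Dict[str, float]                # virtual node -> CPU demand
    links: Dict[Tuple[str, str], float]  # virtual link -> bandwidth demand

    def totals(self) -> Tuple[float, float]:
        """(total CPU, total bandwidth) requested by this alternative."""
        return sum(self.cpu.values()), sum(self.links.values())


@dataclass
class VNR:
    alternatives: List[Topology]  # functionally equivalent options
    lifetime: float


# Hypothetical request: split chain f1-f2-f3 vs. chain with f1 and f2 merged.
split = Topology(cpu={"f1": 2.0, "f2": 2.0, "f3": 2.0},
                 links={("f1", "f2"): 4.0, ("f2", "f3"): 4.0})
merged = Topology(cpu={"f12": 5.0, "f3": 2.0},
                  links={("f12", "f3"): 4.0})
req = VNR(alternatives=[split, merged], lifetime=10.0)

for i, alt in enumerate(req.alternatives):
    print(i, alt.totals())
```

Here `split.totals()` is `(6.0, 8.0)` and `merged.totals()` is `(7.0, 4.0)`: on a CPU-scarce substrate the split chain is preferable, on a bandwidth-scarce one the merged chain is.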

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our work on HRL-VNEAP. We address each major comment point by point below, providing explanations and indicating revisions where the manuscript will be updated in the next version.


Referee: [Experiments] Experiments section: the reported performance gains (e.g., 20.7% acceptance ratio) are presented without specification of training procedure details, hyperparameter values, number of random seeds, variance across runs, or statistical significance tests, leaving the robustness of superiority under dynamic arrivals only partially supported.

Authors: We agree that the original presentation of results would benefit from greater transparency on experimental robustness. In the revised manuscript, we have added a new subsection in the Experiments section that fully specifies the training procedure (including episode counts, learning rate schedules, and replay buffer sizes), all hyperparameter values for both the high-level and low-level policies, the use of five independent random seeds, mean and standard deviation of performance metrics across runs, and p-values from paired t-tests confirming statistical significance of the reported gains (including the 20.7% acceptance ratio improvement) under each traffic load. revision: yes
Referee: [§3] §3 (Hierarchical RL formulation): the high-level policy receives delayed and sparse feedback only after low-level embedding success or failure, yet no mechanisms (separate critics, termination functions, or shaped rewards) are described to mitigate credit-assignment problems, which risks high-variance policies or collapse under varying loads.

Authors: The concern about credit assignment in the hierarchical setting is valid given the delayed feedback structure. Our formulation already incorporates separate critics for the high-level and low-level policies within an actor-critic framework, along with a shaped reward for the high-level policy that combines a binary success/failure signal with a continuous component based on the revenue-over-cost ratio achieved by the low-level embedding. Termination occurs explicitly upon low-level completion or rejection. We have expanded §3 to describe these design choices explicitly, including the exact reward shaping function and how the hierarchical update mitigates variance, thereby strengthening the methodological justification. revision: yes
Referee: [Experiments] Baseline comparisons: exact implementation details, tuning procedures, and source of the 'strongest tested baselines' are not provided, undermining the ability to verify the claimed outperformance margins.

Authors: We acknowledge that reproducibility requires more explicit baseline documentation. The revised Experiments section now includes a table and accompanying text detailing the exact implementation of each baseline (neural architectures, optimization algorithms, and any custom adaptations for VNEAP), the hyperparameter tuning process (grid search ranges and selected values), and the sources (re-implementations based on the original publications or publicly available code repositories referenced in the paper). These additions enable direct verification of the outperformance margins reported against the strongest baselines. revision: yes
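The simulated response on credit assignment describes a high-level reward that combines a binary success/failure signal with a continuous revenue-over-cost term, without giving the exact function. A minimal Python sketch of one such shaping is shown below; the weight `beta`, the failure penalty, and the zero reward for an explicit rejection are all hypothetical choices, not values confirmed by the paper.

```python
def hl_reward(outcome: str, revenue: float = 0.0, cost: float = 0.0,
              beta: float = 0.5, fail_penalty: float = -1.0) -> float:
    """Hypothetical shaped high-level reward.

    outcome: 'accepted' (low level embedded the chosen alternative),
             'failed'   (low level could not embed it),
             'rejected' (high level chose the reject action).
    """
    if outcome == "accepted":
        if cost <= 0:
            raise ValueError("cost must be positive for an accepted request")
        return 1.0 + beta * (revenue / cost)  # binary success + continuous R/C term
    if outcome == "failed":
        return fail_penalty
    if outcome == "rejected":
        return 0.0
    raise ValueError(f"unknown outcome: {outcome}")
```

Scoring a failed embedding below an explicit rejection is a deliberate choice in this sketch: it discourages the high-level policy from selecting alternatives the low level cannot place, while the revenue-over-cost term separates good acceptances from merely feasible ones.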

Circularity Check

0 steps flagged

Standard hierarchical RL applied to VNEAP; no reductions to self-defined inputs

full rationale

The paper defines HRL-VNEAP with a high-level policy selecting among alternative topologies (or rejecting) and a low-level policy performing the embedding. Reported gains in acceptance ratio, revenue, and revenue-over-cost are obtained from direct experimental comparisons against baselines on realistic substrate topologies under multiple traffic loads. No equations, fitted parameters, or claims in the provided text reduce these outcomes to quantities defined by the authors' own prior results or by construction from the method's inputs. The derivation relies on standard hierarchical RL components applied to the VNEAP formulation and is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger reflects typical background assumptions in RL-based network embedding papers rather than explicit statements from the full text.

axioms (2)

domain assumption Alternative topologies for each VNR are provided as input and are functionally equivalent.
The abstract states that each request can be instantiated using one of several functionally equivalent topologies.
domain assumption The substrate network state is fully observable to the agents at decision time.
Standard assumption for embedding policies in dynamic VNE settings.
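Under the full-observability axiom, the high-level agent can build its input directly from the substrate and request state, as the Figure 2 caption suggests. The exact features are not listed on this page, so the Python sketch below assumes normalized residual CPU per substrate node plus (total CPU, total bandwidth) per alternative, zero-padded to a fixed maximum number of alternatives, together with an action mask in which the last action is "reject". The function name and layout are hypothetical.

```python
from typing import Dict, List, Tuple


def hl_observation(residual_cpu: Dict[str, float], capacity_cpu: Dict[str, float],
                   alt_totals: List[Tuple[float, float]],
                   max_alts: int) -> Tuple[List[float], List[bool]]:
    """Flat observation vector and action mask for the high-level policy."""
    nodes = sorted(capacity_cpu)  # fixed node order so features line up across steps
    sub = [residual_cpu[n] / capacity_cpu[n] for n in nodes]
    req: List[float] = []
    mask: List[bool] = []
    for i in range(max_alts):
        if i < len(alt_totals):
            cpu, bw = alt_totals[i]
            req += [cpu, bw]
            mask.append(True)   # this alternative exists and may be selected
        else:
            req += [0.0, 0.0]   # padding for a request with fewer alternatives
            mask.append(False)
    mask.append(True)           # final action = reject, always allowed
    return sub + req, mask
```

Masking padded slots lets a fixed-size policy network serve requests with different numbers of alternatives without ever selecting one that does not exist.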

pith-pipeline@v0.9.0 · 5552 in / 1222 out tokens · 60476 ms · 2026-05-17T00:26:03.313662+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Passage: “A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network.”

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Passage: “HRL-VNEAP improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%”

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

[1] Z. S. et al., “An overview of network slicing for 5G,” IEEE Wireless Communications, 2019.
[2] F. et al., “Virtual network embedding: A survey,” IEEE Communications Surveys & Tutorials, 2013.
[3] K. et al., “The power of alternatives in network embedding,” in IEEE INFOCOM 2025, 2025.
[4] M. F. Bari, S. R. Chowdhury, R. Ahmed, and R. Boutaba, “On orchestrating virtual network functions,” in 2015 11th International Conference on Network and Service Management (CNSM). IEEE, 2015, pp. 50–56.
[5] Y. M. et al., “Rethinking virtual network embedding: substrate support for path splitting and migration,” ACM SIGCOMM Computer Communication Review, 2008.
[6] W. S. et al., “AI-empowered virtual network embedding: A comprehensive survey,” IEEE Communications Surveys & Tutorials, 2025.
[7] H.-K. L. et al., “Reinforcement learning-based virtual network embedding: A comprehensive survey,” ICT Express, 2023.
[8] C. J. et al., “VNE-HRL: A proactive virtual network embedding algorithm based on hierarchical reinforcement learning,” IEEE Transactions on Network and Service Management, 2021.
[9] W. T. et al., “Joint admission control and resource allocation of virtual network embedding via hierarchical deep reinforcement learning,” IEEE Transactions on Services Computing, 2023.
[10] C. M. et al., “PolyViNE: Policy-based virtual network embedding,” IEEE Transactions on Network and Service Management, 2012.
[11] Y. Z. et al., “Automatic virtual network embedding: A deep reinforcement learning approach with graph convolutional networks,” IEEE Journal on Selected Areas in Communications, 2020.
[12] C. X. et al., “Virtual network embedding through topology-aware node ranking,” ACM SIGCOMM Computer Communication Review, 2011.
[13] S. J. et al., “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[14] R. F. et al., “Fine-tuning CNN image retrieval with no human annotation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[15] S. O. et al., “SNDlib 1.0–Survivable Network Design Library,” in Proceedings of the 3rd International Network Optimization Conference (INOC 2007), Spa, Belgium, 2007.
[16] F. W. et al., “Node essentiality assessment and distributed collaborative virtual network embedding in datacenters,” IEEE Transactions on Parallel and Distributed Systems, 2023.
[17] J. C. et al., VNE-HPSO Virtual Network Embedding Algorithm Based on Hybrid Particle Swarm Optimization. Springer Singapore, 2021.
[18] Z. P. et al., “Reinforcement learning assisted bandwidth aware virtual network resource allocation,” IEEE Transactions on Network and Service Management, 2022.
