
arxiv: 2604.09322 · v1 · submitted 2026-04-10 · 💻 cs.NI

EYWA: Elastic Load-Balancing and High-Availability Wired Virtual Network Architecture

Pith reviewed 2026-05-10 16:13 UTC · model grok-4.3

classification 💻 cs.NI
keywords virtual network · cloud IaaS · load balancing · high availability · multi-tenancy · layer-2 semantics · distributed control · SNAT/DNAT

The pith

EYWA uses agents on each hypervisor to distribute virtual network control and support millions of tenants without central bottlenecks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EYWA, a virtual network architecture for multi-tenant cloud IaaS. Its central argument is that a simple agent running on every hypervisor host lets those agents function together as a distributed controller. The design is claimed to support about 16 million (2^24) tenants through logically isolated virtual LANs, and to deliver per-tenant public IP services without SNAT/DNAT throughput limits or single points of failure. It also gives each tenant one large IP subnet with extended layer-2 connectivity. A sympathetic reader would care because conventional overlay networks hit scalability walls in large data centers, and EYWA claims to remove those walls using only existing hosts.

Core claim

EYWA overcomes scalability limitations by accommodating a large number of tenants (about 2^24) through logically isolated virtual LANs with unique IP ranges, providing per-tenant public network services without throughput bottlenecks or single points of failure in network address translation, and enabling a single large IP subnet per tenant with extended layer-2 semantics. Its only component is an agent running on each hypervisor host; collectively, these agents act as a distributed controller.
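
For scale, the 2^24 figure can be set against the 4,094-ID VLAN limit that the paper cites. The following is a minimal sketch of that arithmetic, assuming a 12-bit 802.1Q VLAN ID with two reserved values and a 24-bit tenant identifier as in VxLAN. The constant and function names are illustrative and do not come from the paper.

```python
# Identifier-space arithmetic; all names here are illustrative assumptions.
VLAN_ID_BITS = 12   # width of the 802.1Q VLAN ID field
TENANT_ID_BITS = 24  # width of a VxLAN-style tenant identifier


def usable_vlan_ids() -> int:
    # Two of the 4,096 possible VLAN ID values (0 and 4095) are reserved.
    return 2 ** VLAN_ID_BITS - 2


def tenant_id_space() -> int:
    # Every 24-bit value is counted, matching the abstract's 2^24 figure.
    return 2 ** TENANT_ID_BITS


print(usable_vlan_ids())                       # 4094
print(tenant_id_space())                       # 16777216
print(tenant_id_space() // usable_vlan_ids())  # 4098
```

The last line shows that the 24-bit space is roughly four thousand times larger than the usable VLAN ID space, which is why the abstract can speak of about 16 million tenants.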

What carries the argument

The per-host agents, one on each hypervisor host, which together form a distributed controller spanning the control and data planes.
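
One way to picture what a single agent decides locally is the mode distinction that the paper's figures and evaluation excerpts mention (Normal Mode and Orphan Mode). The sketch below assumes, as a reading of those excerpts rather than a confirmed definition, that a VM is in Normal Mode when its tenant has a virtual router (VR) on the same host and in Orphan Mode otherwise. `HostState`, `vm_mode`, `pick_gateway`, and the modulo-based choice of a remote VR are all hypothetical placeholders, not the paper's mechanism.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Sequence, Set


class Mode(Enum):
    NORMAL = "normal"  # assumed: the tenant has a VR on this host
    ORPHAN = "orphan"  # assumed: the tenant has no VR on this host


@dataclass
class HostState:
    # Hypothetical per-host view: tenant IDs with a healthy local VR.
    local_vr_tenants: Set[int] = field(default_factory=set)


def vm_mode(host: HostState, tenant_id: int) -> Mode:
    """Classify a tenant's VMs on this host by whether a local VR exists."""
    return Mode.NORMAL if tenant_id in host.local_vr_tenants else Mode.ORPHAN


def pick_gateway(host: HostState, tenant_id: int,
                 remote_vr_hosts: Sequence[str]) -> str:
    """Prefer the local VR; otherwise fall back to some remote VR host."""
    if vm_mode(host, tenant_id) is Mode.NORMAL:
        return "local"
    if not remote_vr_hosts:
        raise LookupError("no VR available for this tenant")
    # Placeholder policy only; the paper's actual selection rule is not shown.
    return remote_vr_hosts[tenant_id % len(remote_vr_hosts)]


host = HostState(local_vr_tenants={7})
print(vm_mode(host, 7).value)                       # normal
print(pick_gateway(host, 8, ["host-b", "host-c"]))  # host-b
```

The point of the sketch is only that each decision uses state the agent already holds about its own host, which is the property the distributed-controller claim depends on.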

If this is right

  • The system supports roughly 16 million tenants with isolated virtual LANs and unique IP ranges.
  • Per-tenant public network access avoids both throughput limits and single points of failure in address translation.
  • Each tenant receives one large IP subnet together with extended layer-2 semantics.
  • Deployment requires only the per-host agent and works on today's hypervisors.
  • High availability and load balancing emerge from the distributed scale-out planes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The design could lower the operational cost of running very large cloud networks by removing any need for a separate centralized controller.
  • Similar per-host distribution might apply to other cloud services that currently rely on dedicated control nodes.
  • Production traces from a real multi-tenant deployment would test whether agent coordination remains stable under live traffic.

Load-bearing premise

Independent agents on separate hosts can coordinate as a single reliable controller without introducing new consistency problems or failure modes.

What would settle it

A measurement showing either a throughput bottleneck in SNAT/DNAT or a coordination failure when running thousands of tenants with public network traffic.
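
A throughput test of this kind could be scored along the following lines. This is a rough sketch under stated assumptions: every host has the same physical link capacity, per-host throughput has already been measured, and a shortfall above a chosen tolerance counts as a bottleneck. The function names, the 10% tolerance, and the sample numbers are hypothetical and are not results from the paper.

```python
from typing import Sequence


def aggregate_shortfall(per_host_gbps: Sequence[float],
                        link_capacity_gbps: float) -> float:
    """Fraction of total physical capacity that was not achieved.

    0.0 means aggregate throughput matched the sum of link capacities.
    """
    if not per_host_gbps or link_capacity_gbps <= 0:
        raise ValueError("need at least one host and a positive capacity")
    ideal = link_capacity_gbps * len(per_host_gbps)
    achieved = sum(per_host_gbps)
    return max(0.0, 1.0 - achieved / ideal)


def shows_bottleneck(per_host_gbps: Sequence[float],
                     link_capacity_gbps: float,
                     tolerance: float = 0.1) -> bool:
    """True if the aggregate falls short of ideal by more than `tolerance`."""
    return aggregate_shortfall(per_host_gbps, link_capacity_gbps) > tolerance


# Hypothetical samples: ten hosts on 10 Gbps links.
print(shows_bottleneck([9.4] * 10, 10.0))  # False
print(shows_bottleneck([2.0] * 10, 10.0))  # True
```

A run in which adding hosts leaves the shortfall near zero would support the scale-out claim, while a shortfall that grows with host count would point to a shared choke point.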

Figures

Figures reproduced from arXiv: 2604.09322 by Jungin Jung, Wookjae Jeong.

Figure 1: Virtual (Overlay) network for multi-tenant cloud
Figure 2: Traffic flows in a shared network service host
Figure 4: SNAT traffic flows with multiple VRRP groups
Figure 5: Traffic flows on MVRRP architecture
    Adjacent paper text: Private Network has traditionally been provided through VLAN (802.1Q) [11], which assigns a virtual LAN per tenant. However, the limited VLAN ID space restricts scalability and cannot support a very large number of tenants. • Virtual Extensible LAN (VxLAN) [5]: As discussed in Section II, VLANs are constrained by the 4,094 ID limit. VxLAN overcomes this limitation by sup…
Figure 6: EYWA environments
    Adjacent paper text: … designs, VM MAC addresses consume limited memory in the physical switch. In contrast, VxLAN encapsulates VM MAC addresses within the host’s address space, avoiding this memory constraint. • Traffic Engineering: VMs that have a local VR benefit from reduced bandwidth consumption and lower latency, since their default gateway does not require traversing a remote VR, as illustrated in …
Figure 7: EYWA Agent and Mode
    Adjacent paper text: IV. EYWA. The previous section showed that the MVRRP design combined with VxLAN significantly mitigated many limitations of conventional virtual networking. However, to achieve truly unlimited scalability, the remaining issues must be addressed. In this section, we propose EYWA, a final architecture that: (1) accommodates a very large number of tenants, (2) provides per-tenant public ne…
Figure 8
    Adjacent paper text: (13) Normal Mode, Outbound ARP Reply (VM → VR): A local VM replies to a remote VR, but this is 13 N/A because the request was already handled by 2 Filtering. (14) Normal Mode, Inbound ARP Reply (VM → VR): A remote orphan VM replies to a local VR (via 1-1 Pass). This is allowed (14 Pass). (15) Orphan Mode, Outbound ARP Reply (VM → VR): A local orphan VM replies to a remote VR (via 1-1 Pass). This is 15 N/A…
Figure 10: Test environment for north-south communication
Figure 11: Total north-south network bandwidth
    Adjacent paper text: We exclude private network issues, such as east–west traffic within a single tenant, from this evaluation. These behaviors are well understood and are not central to our experimental objectives. A. North-South Traffic. In this experiment, we demonstrate that all VMs can fully utilize the available physical bandwidth when communicating with external servers, without encou…
Figure 12: Total north-south network bandwidth increased and decreased by auto-scaled VMs and VRs
Figure 13: Average network bandwidth per VM in inter-tenant commu…
Figure 14: Total network bandwidth of VM pairs in inter-tenant com…
Figure 15: Network bandwidth competition by north-south and east-west…
Original abstract

Infrastructure as a Service (IaaS) in cloud environments provides compute, storage, networking, and other fundamental resources that allow consumers to deploy and run arbitrary software, including operating systems and applications. To support multi-tenant environments, IaaS leverages virtualization, but conventional overlay network architectures have become a direct cause of scalability limitations. In particular, current IaaS virtual networks face challenges in high availability and load balancing. To address these issues, we present EYWA, a virtual network architecture that scales to support very large data centers with high availability, efficient load balancing, and large layer-2 semantics. EYWA overcomes scalability limitations by: (1) accommodating a large number of tenants (about 2^24 = 16,777,216) through logically isolated virtual LANs with unique IP ranges, (2) providing per-tenant public network services without throughput bottlenecks or single points of failure in network address translation (SNAT/DNAT), and (3) enabling a single large IP subnet per tenant with extended layer-2 semantics. EYWA combines existing techniques into a distributed scale-out control and data plane. Its only component is an agent running on each hypervisor host, which collectively act as a distributed controller. As a result, EYWA can be deployed in today's multi-tenant cloud environments. We have implemented a proof-of-concept (PoC) of EYWA and evaluated its effectiveness through measurements and experiments in our lab.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents EYWA, a virtual network architecture for multi-tenant IaaS clouds. It claims to overcome scalability limitations of conventional overlay networks by using a distributed set of agents on hypervisor hosts that collectively function as a scale-out control and data plane. Key features include support for approximately 2^24 tenants via logically isolated virtual LANs, per-tenant public network services without throughput bottlenecks or single points of failure in SNAT/DNAT, and a single large IP subnet per tenant with extended layer-2 semantics. The architecture is implemented as a proof-of-concept and evaluated through lab measurements and experiments.

Significance. If the central claims regarding the distributed controller's ability to provide high availability and load balancing without introducing new bottlenecks or failure modes hold, this work could offer a practical approach to scaling virtual networks in large data centers. The combination of existing techniques into a deployable system with only per-host agents is a notable strength, and the PoC demonstrates feasibility in current cloud environments. However, the absence of detailed quantitative performance data, baselines, and error bars in the evaluation weakens the ability to assess the magnitude of improvements over existing architectures.

major comments (2)
  1. [Abstract] Abstract: The claim that the per-host agents 'collectively act as a distributed controller' is load-bearing for the assertions of no single points of failure and absence of throughput bottlenecks in SNAT/DNAT. However, no details are provided on the coordination protocol, state synchronization for IP mappings and NAT rules, consistency model, or failure detection mechanisms. This omission makes it impossible to evaluate whether the architecture avoids the coordination overhead or new failure modes that could undermine the high-availability and load-balancing claims.
  2. [Evaluation] Evaluation section (referenced via lab measurements and experiments): The manuscript states that effectiveness was evaluated through measurements and experiments but reports no quantitative results, comparison baselines, throughput/latency metrics, scalability limits, or error bars. This is critical because the claims of no bottlenecks and effective load balancing rest on these unshown results.
minor comments (1)
  1. [Abstract] Abstract: The tenant count is stated as 'about 2^24 = 16,777,216'; consider clarifying whether this is exact or approximate and how the limit is enforced in the architecture.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below and will revise the manuscript to incorporate the requested clarifications and data.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the per-host agents 'collectively act as a distributed controller' is load-bearing for the assertions of no single points of failure and absence of throughput bottlenecks in SNAT/DNAT. However, no details are provided on the coordination protocol, state synchronization for IP mappings and NAT rules, consistency model, or failure detection mechanisms. This omission makes it impossible to evaluate whether the architecture avoids the coordination overhead or new failure modes that could undermine the high-availability and load-balancing claims.

    Authors: We agree that the submitted manuscript provides only a high-level description of the per-host agents acting collectively as a distributed controller and does not include the requested specifics on coordination. In the revised version we will add a new subsection detailing the coordination protocol among agents, state synchronization for IP mappings and NAT rules, the consistency model, and failure detection mechanisms. This will enable a clearer assessment of whether the design avoids new bottlenecks or failure modes. revision: yes

  2. Referee: [Evaluation] Evaluation section (referenced via lab measurements and experiments): The manuscript states that effectiveness was evaluated through measurements and experiments but reports no quantitative results, comparison baselines, throughput/latency metrics, scalability limits, or error bars. This is critical because the claims of no bottlenecks and effective load balancing rest on these unshown results.

    Authors: We acknowledge that the current evaluation section describes the lab measurements and experiments only at a high level and omits the quantitative results, baselines, metrics, and error bars. We will expand the evaluation section in the revised manuscript to include detailed quantitative performance data, comparison baselines against conventional overlay networks, throughput and latency metrics, scalability limits, and error bars from the experiments. These additions will provide stronger substantiation for the claims of no bottlenecks and effective load balancing. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture is descriptive with no equations or fitted predictions

Full rationale

The paper describes EYWA as a combination of existing techniques into a distributed per-host agent architecture for virtual networking. No mathematical derivations, equations, parameter fits, or 'predictions' appear in the abstract or described content. Claims about tenant scaling (2^24), per-tenant SNAT/DNAT without SPOFs, and extended L2 semantics are presented as direct consequences of the agent-based design rather than reductions to prior fitted values or self-citations. The PoC implementation and lab evaluation provide external grounding. No self-definitional loops, uniqueness theorems, or ansatzes smuggled via citation are present. The unspecified coordination details noted in the referee report represent a potential correctness gap but do not constitute circularity under the defined patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated. The design implicitly assumes that existing hypervisor virtualization primitives can be coordinated by lightweight agents without new primitives.

pith-pipeline@v0.9.0 · 5565 in / 1082 out tokens · 49000 ms · 2026-05-10T16:13:19.001263+00:00 · methodology


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 1 internal anchor

  1. [1]

    EYWA: Elastic Load-Balancing and High-Availability Wired Virtual Network Architecture

    to overcome the limitations of the conventional three-tier model and to identify optimal network designs. Unfortunately, comparable advances in virtualized networking remain limited. Each existing open-source solution introduces its own network models, which are sometimes similar, and in some cases, multiple models coexist within a single solution. This...

  2. [2]

    In the shared service model, all public traffic is forced through the remote service host (Figure 2), further amplifying bandwidth consumption and latency

    avoids this overhead. In the shared service model, all public traffic is forced through the remote service host (Figure 2), further amplifying bandwidth consumption and latency. Private Network is required because cloud tenants demand that their VMs reside in different layer-2 subnets or layer-3 networks from others, to ensure security and traffic isolatio...

  3. [3]

    Specifically, the agent evaluates the VR’s status by monitoring ARP sessions, processing Gratuitous ARP (GARP) messages, and conducting periodic health checks

    VR Monitoring: EYWA’s agent monitors the vPort to assess the state and bandwidth usage of the local VR, and performs health checks through the vPort connected to the VSi. Specifically, the agent evaluates the VR’s status by monitoring ARP sessions, processing Gratuitous ARP (GARP) messages, and conducting periodic health checks. The VR’s state can also be...

  4. [4]

    ARP Caching: The agent maintains an ARP cache to store IP-to-MAC address mappings and relies on its Proxy ARP function until the cache entries expire. For effective caching, the agent records the addresses of the local VR, local VMs, and remote VMs by monitoring ARP sessions and Gratuitous ARP (GARP) packets through the vPort and VTEP. To ensure consistenc...

  5. [5]

    To prevent conflicts, the agent filters ARP packets passing through the VTEP to ensure that VMs discover only a single gateway

    ARP Filtering & Proxy ARP: As described earlier, multiple VRs per tenant may share the same private IP address. To prevent conflicts, the agent filters ARP packets passing through the VTEP to ensure that VMs discover only a single gateway. This filtering avoids IP address conflicts among VRs and also enables the agent to act as a Proxy ARP, reducing ARP ...

  6. [6]

    In this setting, every VM transmits traffic to external servers at its full physical bandwidth

    Outbound Communications: In the baseline scenario, each hypervisor host contains both a VR and a VM belonging to the same tenant, i.e., all instances operate in Normal Mode. In this setting, every VM transmits traffic to external servers at its full physical bandwidth. As illustrated in Figure 11, the aggregate outbound throughput of all VMs equals the sum...

  7. [7]

    For evaluation purposes, we emulate this behavior by manually launching or terminating VR and VM instances

    Outbound Communications in the Auto-Scaling Scenario of VRs and VMs: We further evaluate the case where the cloud platform supports auto-scaling of VRs and VMs according to predefined policies (e.g., based on network bandwidth utilization). For evaluation purposes, we emulate this behavior by manually launching or terminating VR and VM instances. The testb...

  8. [8]

    Each hypervisor host contains one VR and one VM belonging to the same tenant

    1-to-1 Communications: The test environment is identical to Section V-A1, except that the 10 hypervisor hosts are equally divided between tenant A and tenant B. Each hypervisor host contains one VR and one VM belonging to the same tenant. In this setup, every VM of tenant A transmits traffic at full bandwidth to an idle VM of tenant B. As illustrated in ...

  9. [9]

    Each VM of tenant A transmits traffic at full bandwidth to all VMs of tenant B, and symmetrically, each VM of tenant B sends to all VMs of tenant A

    1-to-N Communications: The test environment is identical to Section V-B1. Each VM of tenant A transmits traffic at full bandwidth to all VMs of tenant B, and symmetrically, each VM of tenant B sends to all VMs of tenant A. As shown in Figure 14, the aggregate outbound throughput of host pairs equals the sum of the physical link capacities of all communic...

  10. [10]

    Each host runs one VR and two VMs: one designated for east–west communication and one for north–south communication

    Outbound and 1-to-N Communications: The 10 hypervisor hosts are equally divided between tenant A and tenant B. Each host runs one VR and two VMs: one designated for east–west communication and one for north–south communication. In this setup, each east–west VM of tenant A transmits traffic at full bandwidth to all east–west VMs of tenant B, while each ...

  11. [11]

    Open-source solutions such as HAProxy [35] and Linux Virtual Server (LVS) [36] are also widely used

    uses DNS-based traffic distribution across EC2 instances. Open-source solutions such as HAProxy [35] and Linux Virtual Server (LVS) [36] are also widely used. Tunneling Protocols: NVGRE [6] uses GRE encapsulation with a 24-bit tenant ID space, supporting up to 16 million networks, similar to VxLAN. STT [7] extends this further by introducing a 64-bit netw...

  12. [12]

    Towards a next generation data center architecture: Scalability and commoditization

    A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Towards a next generation data center architecture: Scalability and commoditization. In PRESTO Workshop at SIGCOMM, 2008

  13. [13]

    VL2: A Scalable and Flexible Data Center Network

    A. Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel and Sudipta Sengupta. VL2: A Scalable and Flexible Data Center Network. In SIGCOMM, 2009

  14. [14]

    C. Kim, M. Caesar, and J. Rexford. Floodless in SEATTLE: a scalable ethernet architecture for large enterprises. In SIGCOMM, 2008

  15. [15]

    Ananta: Cloud Scale Load Balancing

    Parveen Patel, Deepak Bansal, Lihua Yuan, Ashwin Murthy, Albert Greenberg, David A. Maltz, Randy Kern, Hemant Kumar, Marios Zikos, Hongyu Wu, Changhoon Kim, Naveen Karri. Ananta: Cloud Scale Load Balancing. In SIGCOMM, 2013

  16. [16]

    VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks

    M. Mahalingam, D. Dutt, K. Duda, P. Agarwal, L. Kreeger, T. Sridhar, M. Bursell and C. Wright. VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks, IETF Internet Draft

  17. [17]

    NVGRE: Network Virtualization using Generic Routing Encapsulation

    M. Sridharan, K. Duda, I. Ganga, A. Greenberg, G. Lin, M. Pearson, P. Thaler, C. Tumuluri, N. Venkataramiah, Y. Wang. NVGRE: Network Virtualization using Generic Routing Encapsulation, IETF Internet Draft

  18. [18]

    A Stateless Transport Tunneling Protocol for Network Virtualization (STT)

    B. Davie, Ed., J. Gross. A Stateless Transport Tunneling Protocol for Network Virtualization (STT), IETF Internet Draft

  19. [19]

    PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

    Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat. PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. In SIGCOMM, 2009

  20. [20]

    A Scalable, Commodity Data Center Network Architecture

    Mohammad Al-Fares, Alexander Loukissas and Amin Vahdat. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM, 2008

  21. [21]

    OpenFlow: Enabling Innovation in Campus Networks

    N. McKeown, T. Anderson, H. Balakrishnan, G. M. Parulkar, L. L. Peterson, J. Rexford, S. Shenker, and J. S. Turner. OpenFlow: Enabling Innovation in Campus Networks. In SIGCOMM, 2008

  22. [22]

    IEEE 802.1Q VLANs, Media Access Control Bridges and Virtual Bridged Local Area Networks

  23. [23]

    Virtual Router Redundancy Protocol (VRRP) Version 3 for IPv4 and IPv6

    S. Nadas, Ed. Ericsson, Virtual Router Redundancy Protocol (VRRP) Version 3 for IPv4 and IPv6. IETF RFC 5798

  24. [24]

    TRILL: Transparent Interconnection of Lots of Links

    R. Perlman et al. TRILL: Transparent Interconnection of Lots of Links. IETF RFC

  25. [25]

    IEEE 802.1aq Shortest Path Bridging

  26. [26]

    IS-IS Extensions Supporting IEEE 802.1aq Shortest Path Bridging

    D. Allan, N. Bragg, P. Unbehagen. IS-IS Extensions Supporting IEEE 802.1aq Shortest Path Bridging, IETF RFC

  27. [27]

    Openstack, http://www.openstack.org

  28. [28]

    Apache Cloudstack, http://cloudstack.apache.org

  29. [29]

    Eucalyptus, http://www.eucalyptus.com

  30. [30]

    OpenNebula, http://opennebula.org

  31. [31]

    Amazon Web Services, http://aws.amazon.com

  32. [32]

    Microsoft, https://www.microsoft.com

  33. [33]

    Microsoft Azure, http://azure.microsoft.com

  34. [34]

    Vmware, http://www.vmware.com

  35. [35]

    Rackspace Open Cloud, http://www.rackspace.com/cloud

  36. [36]

    Google Compute Engine, http://cloud.google.com/compute

  37. [37]

    IBM Cloud, http://www.ibm.com/cloud-computing

  38. [38]

    Ucloud biz, https://ucloudbiz.olleh.com

  39. [39]

    OpenFlow, http://archive.openflow.org

  40. [40]

    AWS Virtual Private Cloud (VPC), http://aws.amazon.com/vpc

  41. [41]

    AWS Elastic Load Balancing (ELB), http://aws.amazon.com/elasticloadbalancing

  42. [42]

    AWS Route 53, http://aws.amazon.com/route53

  43. [43]

    MidoNet, http://www.midokura.com/midonet

  44. [44]

    Openstack Neutron/Distributed Virtual Router (DVR), https://wiki.openstack.org/wiki/Neutron/DVR

  45. [45]

    NetScaler VPX Virtual Appliance, http://www.citrix.com

  46. [46]

    HAProxy Load Balancer, http://www.haproxy.org

  47. [47]

    Linux Virtual Server, http://www.linuxvirtualserver.org

  48. [48]

    Vyatta Virtual Router, http://www.brocade.com

  49. [49]

    OVS Virtual Switch, http://openvswitch.org

  50. [50]

    Linux Bridge, http://www.linuxfoundation.org

  51. [51]

    OpenStack Neutron/LBaaS, https://wiki.openstack.org/wiki/Neutron/LBaaS

  52. [52]

    EYWA simple PoC, https://goo.gl/A1dMJ0

  53. [53]

    EYWA presentation Prezi, https://goo.gl/wMjCgI