A Scalable 256-Antenna Distributed MIMO Testbed with Real-Time Fully Digital Beamforming

Anders J Johansson; Baktash Behmanesh; Dumitra Iancu; Emil Bergman; Liang Liu; Lina Tinnerberg; Mikael Henriksson; Ove Edfors; Sijia Cheng; Vilgot Snygg

arXiv: 2605.02388 · v2 · pith:NO5MUA3Lnew · submitted 2026-05-04 · 📡 eess.SP · cs.SY · eess.SY

Pith reviewed 2026-05-08 18:54 UTC · model grok-4.3

classification 📡 eess.SP · cs.SY · eess.SY

keywords distributed MIMO · testbed · real-time beamforming · RFSoC · scalable architecture · massive MIMO · uplink transmission · coherent RF chains

The pith

A distributed MIMO testbed scales to 256 antennas using 16 RFSoC boards for real-time fully digital beamforming.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes the construction of a large-scale testbed for studying distributed massive MIMO systems in practice. The LuLIS setup coordinates up to 256 radio chains across 16 boards and performs beamforming in real time. The array grows in steps of 16 antennas, one added board per step, without new hardware designs and, according to the authors, without large added latency or data-transfer overhead. A sympathetic reader would care because such a system makes it possible to measure the actual performance gains of decentralized processing when signals travel from several single-antenna users to a growing base-station array.

Core claim

The LuLIS testbed operates up to 256 coherent RF chains by using 16 AMD Zynq UltraScale RFSoC ZCU216 boards as distributed processing nodes. Real-time MIMO processing is achieved through acceleration and distribution of the required algorithms directly on each board's FPGA fabric. The architecture permits scaling in multiples of 16 antennas by adding further nodes and supports flexible placement of the nodes either fully distributed or co-located. Initial uplink measurements are reported for four single-antenna user equipments transmitting to 64, 128, and 256 base-station antennas.
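The scaling rule in the core claim is plain arithmetic, and a short sketch makes it explicit. The names `CHAINS_PER_NODE`, `MAX_NODES`, and `nodes_needed` are illustrative and do not come from the paper; the only facts used are that 16 boards carry 256 RF chains (16 per board) and that the array grows in multiples of 16.

```python
CHAINS_PER_NODE = 16  # RF chains per RFSoC node (256 chains / 16 boards)
MAX_NODES = 16        # largest configuration described for LuLIS


def nodes_needed(antennas: int) -> int:
    """Return how many RFSoC nodes a given array size requires."""
    if antennas <= 0 or antennas % CHAINS_PER_NODE != 0:
        raise ValueError("array size must be a positive multiple of 16")
    nodes = antennas // CHAINS_PER_NODE
    if nodes > MAX_NODES:
        raise ValueError("beyond the 16-node configuration described here")
    return nodes


# The three array sizes used in the reported uplink measurements.
configs = {n: nodes_needed(n) for n in (64, 128, 256)}
print(configs)  # {64: 4, 128: 8, 256: 16}
```

The `MAX_NODES` guard reflects only what has been reported; whether the architecture extends further is an open question, not something this sketch settles.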

What carries the argument

Sixteen AMD Zynq UltraScale RFSoC ZCU216 boards serving as distributed nodes that run FPGA-accelerated MIMO algorithms to keep all 256 RF chains phase-coherent and real-time.

If this is right

Antenna count can be increased from 64 to 256 in steps of 16 by adding RFSoC boards, without redesigning any hardware.
Real-time fully digital beamforming remains possible for uplink signals from four single-antenna users at every supported array size.
Nodes can be deployed either spread out as a distributed MIMO system or grouped together as a conventional massive MIMO array.
Processing load stays distributed, avoiding centralized data-transfer overhead when the number of antennas grows.
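To make the last point concrete, here is a minimal sketch of one common way to decentralize zero-forcing (ZF) detection. This summary does not say which scheme LuLIS actually implements, so treat everything below as an assumption: each node computes a local Gram matrix and matched-filter output from its own 16 antennas, and a fusion step sums those small statistics and solves a K x K system. All function names are hypothetical, the channel is synthetic, and the code uses only the standard library.

```python
import random


def local_statistics(H_n, y_n):
    """Per-node work: Gram matrix H_n^H H_n (K x K) and matched filter H_n^H y_n (K)."""
    K = len(H_n[0])
    gram = [[sum(row[i].conjugate() * row[j] for row in H_n) for j in range(K)]
            for i in range(K)]
    mf = [sum(row[i].conjugate() * y for row, y in zip(H_n, y_n)) for i in range(K)]
    return gram, mf


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (complex entries)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def distributed_zf(H_nodes, y_nodes):
    """Fusion step: sum the per-node statistics, then solve the small K x K system."""
    K = len(H_nodes[0][0])
    gram = [[0j] * K for _ in range(K)]
    mf = [0j] * K
    for H_n, y_n in zip(H_nodes, y_nodes):
        g, z = local_statistics(H_n, y_n)
        for i in range(K):
            mf[i] += z[i]
            for j in range(K):
                gram[i][j] += g[i][j]
    return solve(gram, mf)


def cgauss():
    """One complex Gaussian sample (synthetic channel coefficient)."""
    return complex(random.gauss(0, 1), random.gauss(0, 1))


random.seed(0)
K, NODES, CHAINS = 4, 16, 16  # 4 users, 16 nodes x 16 chains = 256 antennas
x = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(K)]  # QPSK
H_nodes = [[[cgauss() for _ in range(K)] for _ in range(CHAINS)] for _ in range(NODES)]
# Noiseless received samples per node, so ZF should recover x up to rounding error.
y_nodes = [[sum(h * s for h, s in zip(row, x)) for row in H_n] for H_n in H_nodes]
x_hat = distributed_zf(H_nodes, y_nodes)
print(max(abs(a - b) for a, b in zip(x_hat, x)))
```

In this particular scheme each node forwards K² + K complex values (20 for four users) per subcarrier rather than its raw antenna samples, so the per-node traffic does not depend on the total antenna count. Whether the testbed achieves a comparable property in practice is exactly what the measured overhead figures would need to show.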

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The modular node design could let operators expand existing base stations incrementally rather than replacing entire arrays at once.
If coherence holds over larger physical separations, the same boards could be used to test how distributed apertures affect interference suppression in crowded environments.
The absence of hardware redesign at each scale step suggests the architecture might extend past 256 antennas if additional nodes remain synchronized.

Load-bearing premise

Synchronization and coherence across physically separated RFSoC nodes can be preserved at full scale without creating data-transfer bottlenecks or extra latency.

What would settle it

A direct comparison of measured latency, phase coherence, and throughput when the system runs at 64 antennas versus at 256 antennas under identical user traffic would show whether scaling introduces measurable degradation.
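One ingredient of such a comparison is a single number for phase coherence at each array size. The sketch below computes the RMS deviation of per-node phase offsets from their circular mean. It assumes the offsets have already been measured against a common reference; the function name and the sample values are hypothetical and are not measurements from the paper.

```python
import cmath
import math


def rms_phase_error_deg(phases_rad):
    """RMS deviation of phases from their circular mean, in degrees."""
    # Average unit phasors so that wrap-around at +/- pi is handled correctly.
    mean_vec = sum(cmath.exp(1j * p) for p in phases_rad) / len(phases_rad)
    mean_phase = cmath.phase(mean_vec)
    # Wrap each deviation into (-pi, pi] before squaring.
    devs = [cmath.phase(cmath.exp(1j * (p - mean_phase))) for p in phases_rad]
    return math.degrees(math.sqrt(sum(d * d for d in devs) / len(devs)))


# Synthetic placeholder offsets in radians, for illustration only.
example = [0.1, -0.1]
print(round(rms_phase_error_deg(example), 2))  # 5.73
```

Running the same function on offsets captured at 64 and at 256 antennas, alongside latency and throughput logs taken under identical traffic, would give the side-by-side view described above.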

Figures

Figures reproduced from arXiv: 2605.02388 by Anders J Johansson, Baktash Behmanesh, Dumitra Iancu, Emil Bergman, Liang Liu, Lina Tinnerberg, Mikael Henriksson, Ove Edfors, Sijia Cheng, Vilgot Snygg.

**Figure 1.** Depiction of the distributed, scalable, antenna panel

**Figure 2.** Block diagram of the processing capabilities of the panels showing only the data path.

**Figure 3.** Fabricated PCB of the 16-channel AFE.
Excerpt from the surrounding paper text: … between the TDD switches and antennas to mitigate out-of-band transmission in TX mode and limit the in-band noise in RX mode. The filter has a passband bandwidth of 280 MHz around 3.84 GHz with a steep roll-off and a typical in-band insertion loss of around 2.2 dB. The AFE is powered using an external 12 V power supply, which is regulated down to two separa…

**Figure 4.** The 16-panel testbed deployed in two different scenarios. The users, signaled with the magenta square, are tightly

**Figure 5.** Results from one OFDM symbol. User constellation

**Figure 6.** SIR across subcarriers before ZF.
Excerpt from the surrounding paper text: … front, resulting in higher inter-user interference. This is clearly illustrated in Fig. 5b, where the interference is worse for the user pairs (0,2) and (1,3) than in Fig. 5d, where the per-pair interference is diminished because the beams are not focused from only one direction. This was expected as the array aperture is increased, yielding better spatial resolution
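
For readers unfamiliar with the quantity in Figure 6, one conventional definition of per-user signal-to-interference ratio (SIR) is given below. It assumes that "before ZF" means matched-filter (conjugate) combining, which this page does not confirm; the symbols are introduced here for illustration, with h_k[s] the M x 1 channel vector of user k on subcarrier s and K the number of users.

```latex
\mathrm{SIR}_k[s]
  = \frac{\left| \mathbf{h}_k^{H}[s]\, \mathbf{h}_k[s] \right|^{2}}
         {\sum_{j \neq k} \left| \mathbf{h}_k^{H}[s]\, \mathbf{h}_j[s] \right|^{2}},
  \qquad k = 1, \dots, K
```

Under this definition, a larger or more widely spread aperture tends to make the users' channel vectors closer to orthogonal, which shrinks the cross terms in the denominator and raises the SIR. That is consistent with the trend described in the excerpt above.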

Original abstract

Distributed massive MIMO (D-MIMO) is a promising technology for future generation wireless systems as it takes advantage of both an increased array aperture and a decentralized processing architecture and topology. In order to truly understand the possibilities and limitations of these approaches in real scenarios, practical realization of testbeds is an essential step in the technology advancement. This work presents the Lund University Large Intelligent Surface testbed -- LuLIS, that can operate up to 256 coherent radio frequency (RF) chains using 16 AMD Zynq UltraScale RFSoC ZCU216 evaluation boards acting as distributed processing nodes. Real-time processing is facilitated by acceleration and distribution of MIMO processing algorithms on the FPGA fabric of the boards. The system is easily scalable, as increasing the number of antennas is done in multiples of 16 by adding more RFSoCs, which also implies addition of another processing node. The design allows up-scaling without hardware redesign, introduction of large latencies or data transfer overhead. The testbed is flexible in terms of deployment, with options of fully distributing the nodes (as in D-MIMO) or co-locating them (as in more traditional Massive MIMO). A detailed description of the implementation of the testbed is presented and initial results are shown for an uplink (UL) transmission from four single-antenna user equipments (UEs) to 64, 128 and 256 base-station antennas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper delivers a concrete 256-antenna distributed MIMO testbed built from 16 RFSoC boards with real-time FPGA processing and basic uplink results at scale, but the easy-scaling claims rest mostly on architecture rather than measured overhead numbers.

The main thing to know is that the authors have built and run a 256-antenna D-MIMO testbed using 16 ZCU216 boards as distributed nodes, with MIMO algorithms accelerated on the FPGAs for real-time fully digital beamforming. They also show initial uplink measurements from four single-antenna UEs as the array grows from 64 to 128 to 256 antennas, and the setup can be deployed either spread out or co-located. That hardware realization is the core contribution here.

What the work does well is give a clear implementation blueprint. The description of how processing is distributed across nodes, how the system adds capacity in multiples of 16 without redesign, and the practical flexibility for different deployment modes are useful for anyone trying to replicate or build on large testbeds. The fact that they actually got it running and collected data at the full size counts as real evidence of feasibility.

The softer spot is the limited quantitative backing for the scalability story. The paper states that adding nodes introduces no large latencies or data-transfer overhead while keeping coherence, yet it supplies no measured figures on inter-node timing or phase jitter, actual bandwidth consumed during combining at 256 antennas, or how end-to-end latency changes with array size. The initial results confirm basic operation, but without those numbers the “no overhead” part stays closer to design intention than demonstrated property.

This is the sort of paper for experimental wireless researchers who need hardware references or starting points for their own testbeds. A reader in that group will find the architecture details and scaling behavior worth seeing. It shows straightforward engineering work and honest reporting of what was achieved.

I would send it to peer review. The concrete build and measurements are substantial enough that referees can help strengthen the quantitative side without the paper needing wholesale changes.

Referee Report

1 major / 2 minor

Summary. The manuscript describes the LuLIS testbed, a distributed MIMO system supporting up to 256 coherent RF chains using 16 AMD Zynq UltraScale RFSoC ZCU216 boards as distributed processing nodes. Real-time fully digital beamforming is achieved via FPGA acceleration and distribution of MIMO algorithms. The design is claimed to be easily scalable by adding RFSoC nodes in multiples of 16 without hardware redesign, large latencies, or data-transfer overhead, while supporting both fully distributed (D-MIMO) and co-located deployments. Initial uplink results are presented for transmissions from four single-antenna UEs to 64, 128, and 256 base-station antennas.

Significance. If the scalability and coherence claims hold under distributed operation, the work supplies a practical, extensible hardware platform for experimental validation of distributed massive MIMO concepts, which remains an important gap between theory and deployment. The use of commercial RFSoC boards with FPGA-based real-time processing and the reported initial measurements at multiple array sizes constitute concrete engineering contributions that can enable follow-on studies of synchronization, processing distribution, and performance in real scenarios.

major comments (1)

[Abstract and implementation description] Abstract and testbed scalability description: the central claim that the architecture scales 'without ... data transfer overhead' while preserving real-time performance and coherence when nodes are physically separated is not accompanied by quantitative measurements. No data are supplied on (i) residual timing or phase jitter across the 16 RFSoC boards, (ii) aggregate inter-node bandwidth actually consumed during 256-antenna uplink combining, or (iii) end-to-end latency versus antenna count. These metrics are load-bearing for the 'no overhead' assertion and must be provided to substantiate the scalability claim beyond design intention.

minor comments (2)

[Abstract] The abstract and results section would benefit from explicit cross-references to any tables or figures that report the uplink measurements for the three array sizes, including any error bars or coherence metrics.
[Implementation description] Notation for the number of RF chains versus number of antennas should be clarified if they are not identical in all configurations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript on the LuLIS testbed. We address the major comment below.

Referee: [Abstract and implementation description] Abstract and testbed scalability description: the central claim that the architecture scales 'without ... data transfer overhead' while preserving real-time performance and coherence when nodes are physically separated is not accompanied by quantitative measurements. No data are supplied on (i) residual timing or phase jitter across the 16 RFSoC boards, (ii) aggregate inter-node bandwidth actually consumed during 256-antenna uplink combining, or (iii) end-to-end latency versus antenna count. These metrics are load-bearing for the 'no overhead' assertion and must be provided to substantiate the scalability claim beyond design intention.

Authors: We agree that quantitative measurements are necessary to substantiate the scalability claims beyond the architectural description. The manuscript presents the design rationale for scaling by adding RFSoC nodes without hardware redesign and the initial uplink results at 64/128/256 antennas, but does not report explicit values for residual timing/phase jitter across distributed boards, measured inter-node bandwidth during combining, or latency scaling. In the revised manuscript we will add these metrics from additional characterization experiments on the LuLIS testbed, including jitter statistics, actual data volumes transferred between nodes for 256-antenna uplink processing, and end-to-end latency figures for the three array sizes. This will directly address the load-bearing aspects of the 'no overhead' claim for both co-located and physically separated deployments.

Revision: yes

Circularity Check

0 steps flagged

Hardware description paper contains no derivations, predictions, or self-referential claims.

The manuscript is a direct report of a constructed testbed using 16 RFSoC boards for up to 256 antennas. Scalability is presented as an architectural property of adding boards in multiples of 16, with no equations, fitted parameters, or first-principles derivations. No load-bearing self-citations appear, and no results are claimed as predictions that reduce to inputs by construction. The work is self-contained as an implementation description and initial UL observations, with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a hardware implementation paper. No free parameters, mathematical axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5599 in / 1109 out tokens · 52552 ms · 2026-05-08T18:54:14.642750+00:00 · methodology
