A Scalable 256-Antenna Distributed MIMO Testbed with Real-Time Fully Digital Beamforming
Pith reviewed 2026-05-08 18:54 UTC · model grok-4.3
The pith
A distributed MIMO testbed scales to 256 antennas using 16 RFSoC boards for real-time fully digital beamforming.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The LuLIS testbed operates up to 256 coherent RF chains by using 16 AMD Zynq UltraScale RFSoC ZCU216 boards as distributed processing nodes. Real-time MIMO processing is achieved through acceleration and distribution of the required algorithms directly on each board's FPGA fabric. The architecture permits scaling in multiples of 16 antennas by adding further nodes and supports flexible placement of the nodes either fully distributed or co-located. Initial uplink measurements are reported for four single-antenna user equipments transmitting to 64, 128, and 256 base-station antennas.
What carries the argument
Sixteen AMD Zynq UltraScale RFSoC ZCU216 boards serve as distributed nodes, each running FPGA-accelerated MIMO algorithms for its own 16 RF chains, so that all 256 chains are processed coherently and in real time.
If this is right
- Antenna count can be increased from 64 to 256 in steps of 16 by adding RFSoC boards, one board per 16 antennas, without redesigning any hardware.
- Real-time fully digital beamforming remains possible for uplink signals from four single-antenna users at every supported array size.
- Nodes can be deployed either spread out as a distributed MIMO system or grouped together as a conventional massive MIMO array.
- Processing load stays distributed, avoiding centralized data-transfer overhead when the number of antennas grows.
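The distributed-processing point can be made concrete with a small sketch. The summary does not say which combining algorithm LuLIS runs, so the example below assumes plain matched-filter (maximum-ratio) combining purely for illustration; the function names, the Gaussian channel model, and the seed are hypothetical. It shows that adding up per-node partial results over 16-antenna nodes gives the same output as combining all antennas in one place.

```python
import random

ANTENNAS_PER_NODE = 16  # from the source: scaling happens in multiples of 16
NUM_USERS = 4           # from the source: four single-antenna UEs


def random_complex(rng):
    """Draw one complex Gaussian sample (illustrative model, not measured data)."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


def partial_combine(h_rows, y_rows):
    """Matched filter over a set of antennas: z_k = sum_m conj(h[m][k]) * y[m]."""
    return [
        sum(h_m[k].conjugate() * y_m for h_m, y_m in zip(h_rows, y_rows))
        for k in range(NUM_USERS)
    ]


def distributed_combine(h, y):
    """Split the array into 16-antenna nodes and add up the per-node partial sums."""
    total = [0j] * NUM_USERS
    for start in range(0, len(y), ANTENNAS_PER_NODE):
        stop = start + ANTENNAS_PER_NODE
        partial = partial_combine(h[start:stop], y[start:stop])
        total = [t + p for t, p in zip(total, partial)]
    return total


def simulate(num_nodes, seed=0):
    """Return (distributed, centralized) combiner outputs for one noiseless sample."""
    rng = random.Random(seed)
    num_antennas = num_nodes * ANTENNAS_PER_NODE
    h = [[random_complex(rng) for _ in range(NUM_USERS)] for _ in range(num_antennas)]
    x = [random_complex(rng) for _ in range(NUM_USERS)]
    y = [sum(h_m[k] * x[k] for k in range(NUM_USERS)) for h_m in h]
    return distributed_combine(h, y), partial_combine(h, y)


if __name__ == "__main__":
    for nodes in (4, 8, 16):  # 64, 128 and 256 antennas
        dist, cent = simulate(nodes)
        worst = max(abs(a - b) for a, b in zip(dist, cent))
        print(nodes * ANTENNAS_PER_NODE, "antennas, max difference:", worst)
```

Under this assumption each node forwards four complex values per received sample (one per user) rather than 16 antenna samples. Other combiners, such as zero-forcing, would also need Gram-matrix terms to be exchanged between nodes and are not covered by this sketch.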
Where Pith is reading between the lines
- The modular node design could let operators expand existing base stations incrementally rather than replacing entire arrays at once.
- If coherence holds over larger physical separations, the same boards could be used to test how distributed apertures affect interference suppression in crowded environments.
- The absence of hardware redesign at each scale step suggests the architecture might extend past 256 antennas if additional nodes remain synchronized.
Load-bearing premise
Synchronization and coherence across physically separated RFSoC nodes can be preserved at full scale without creating data-transfer bottlenecks or extra latency.
What would settle it
A direct comparison of measured latency, phase coherence, and throughput when the system runs at 64 antennas versus at 256 antennas under identical user traffic would show whether scaling introduces measurable degradation.
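One way such a comparison could quantify phase coherence is sketched below. This is not the authors' procedure: it assumes that each RF chain yields one complex pilot estimate per snapshot, and the function name and the choice of chain 0 as reference are hypothetical. The metric is the RMS drift, in degrees, of every chain's phase relative to the reference chain, measured against the first snapshot.

```python
import cmath
import math


def rms_phase_drift_deg(snapshots, reference_index=0):
    """RMS drift of per-chain relative phase versus the first snapshot, in degrees.

    snapshots: list of lists; snapshots[t][m] is the complex pilot estimate
    of chain m at time t (hypothetical input format).
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to measure drift")
    base = snapshots[0]
    base_ref = base[reference_index]
    squared = []
    for snap in snapshots[1:]:
        snap_ref = snap[reference_index]
        for b, s in zip(base, snap):
            rel_base = b * base_ref.conjugate()  # phase relative to reference, t = 0
            rel_now = s * snap_ref.conjugate()   # phase relative to reference, now
            drift = cmath.phase(rel_now * rel_base.conjugate())
            squared.append(drift * drift)
    return math.degrees(math.sqrt(sum(squared) / len(squared)))


if __name__ == "__main__":
    first = [1 + 0j, 1j, -1 + 0j]
    # A common rotation of all chains leaves the relative phases unchanged.
    rotated = [v * cmath.exp(0.3j) for v in first]
    print(rms_phase_drift_deg([first, rotated]))
```

Because the phase is taken relative to a reference chain, a rotation common to all chains does not count as drift; only chain-to-chain differences do. Running the same function on data recorded at 64 and at 256 antennas would give two directly comparable numbers.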
Original abstract
Distributed massive MIMO (D-MIMO) is a promising technology for future generation wireless systems as it takes advantage of both an increased array aperture and a decentralized processing architecture and topology. In order to truly understand the possibilities and limitations of these approaches in real scenarios, practical realization of testbeds is an essential step in the technology advancement. This work presents the Lund University Large Intelligent Surface testbed -- LuLIS, that can operate up to 256 coherent radio frequency (RF) chains using 16 AMD Zynq UltraScale RFSoC ZCU216 evaluation boards acting as distributed processing nodes. Real-time processing is facilitated by acceleration and distribution of MIMO processing algorithms on the FPGA fabric of the boards. The system is easily scalable, as increasing the number of antennas is done in multiples of 16 by adding more RFSoCs, which also implies addition of another processing node. The design allows up-scaling without hardware redesign, introduction of large latencies or data transfer overhead. The testbed is flexible in terms of deployment, with options of fully distributing the nodes (as in D-MIMO) or co-locating them (as in more traditional Massive MIMO). A detailed description of the implementation of the testbed is presented and initial results are shown for an uplink (UL) transmission from four single-antenna user equipments (UEs) to 64, 128 and 256 base-station antennas.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the LuLIS testbed, a distributed MIMO system supporting up to 256 coherent RF chains using 16 AMD Zynq UltraScale RFSoC ZCU216 boards as distributed processing nodes. Real-time fully digital beamforming is achieved via FPGA acceleration and distribution of MIMO algorithms. The design is claimed to be easily scalable in multiples of 16 antennas by adding RFSoC nodes, without hardware redesign, large latencies, or data-transfer overhead, while supporting both fully distributed (D-MIMO) and co-located deployments. Initial uplink results are presented for transmissions from four single-antenna UEs to 64, 128, and 256 base-station antennas.
Significance. If the scalability and coherence claims hold under distributed operation, the work supplies a practical, extensible hardware platform for experimental validation of distributed massive MIMO concepts, which remains an important gap between theory and deployment. The use of commercial RFSoC boards with FPGA-based real-time processing and the reported initial measurements at multiple array sizes constitute concrete engineering contributions that can enable follow-on studies of synchronization, processing distribution, and performance in real scenarios.
major comments (1)
- [Abstract and implementation description] Abstract and testbed scalability description: the central claim that the architecture scales 'without ... data transfer overhead' while preserving real-time performance and coherence when nodes are physically separated is not accompanied by quantitative measurements. No data are supplied on (i) residual timing or phase jitter across the 16 RFSoC boards, (ii) aggregate inter-node bandwidth actually consumed during 256-antenna uplink combining, or (iii) end-to-end latency versus antenna count. These metrics are load-bearing for the 'no overhead' assertion and must be provided to substantiate the scalability claim beyond design intention.
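To illustrate the kind of figure item (ii) asks for, the back-of-envelope sketch below compares the output data rate of one node when it forwards raw antenna samples with the rate when it forwards one combined stream per user. The sample rate and bit width in the usage lines are placeholders, not LuLIS specifications, and the function names are hypothetical; only the 16 antennas per node and the four users come from the manuscript.

```python
def stream_rate_bps(num_streams, sample_rate_hz, bits_per_component):
    """Bit rate of complex baseband streams, counting I and Q components."""
    return num_streams * sample_rate_hz * 2 * bits_per_component


def per_node_rates(sample_rate_hz, bits_per_component,
                   antennas_per_node=16, num_users=4):
    """Return (raw, combined) output bit rates for a single node.

    raw:      every antenna stream is forwarded unchanged.
    combined: the node forwards one partial-sum stream per user.
    """
    raw = stream_rate_bps(antennas_per_node, sample_rate_hz, bits_per_component)
    combined = stream_rate_bps(num_users, sample_rate_hz, bits_per_component)
    return raw, combined


if __name__ == "__main__":
    # Placeholder values chosen only to make the arithmetic visible.
    raw, combined = per_node_rates(sample_rate_hz=100_000_000, bits_per_component=16)
    print("raw bit/s:", raw)
    print("combined bit/s:", combined)
    print("reduction factor:", raw / combined)
```

The reduction factor depends only on the ratio of antennas per node to users, so it stays at 4 for any sample rate and bit width. Measured values, as requested by the referee, would still be needed to account for protocol overhead and the actual interconnect topology.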
minor comments (2)
- [Abstract] The abstract and results section would benefit from explicit cross-references to any tables or figures that report the uplink measurements for the three array sizes, including any error bars or coherence metrics.
- [Implementation description] Notation for the number of RF chains versus number of antennas should be clarified if they are not identical in all configurations.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript on the LuLIS testbed. We address the major comment below.
Referee: [Abstract and implementation description] Abstract and testbed scalability description: the central claim that the architecture scales 'without ... data transfer overhead' while preserving real-time performance and coherence when nodes are physically separated is not accompanied by quantitative measurements. No data are supplied on (i) residual timing or phase jitter across the 16 RFSoC boards, (ii) aggregate inter-node bandwidth actually consumed during 256-antenna uplink combining, or (iii) end-to-end latency versus antenna count. These metrics are load-bearing for the 'no overhead' assertion and must be provided to substantiate the scalability claim beyond design intention.
Authors: We agree that quantitative measurements are necessary to substantiate the scalability claims beyond the architectural description. The manuscript presents the design rationale for scaling by adding RFSoC nodes without hardware redesign and the initial uplink results at 64/128/256 antennas, but does not report explicit values for residual timing/phase jitter across distributed boards, measured inter-node bandwidth during combining, or latency scaling. In the revised manuscript we will add these metrics from additional characterization experiments on the LuLIS testbed, including jitter statistics, actual data volumes transferred between nodes for 256-antenna uplink processing, and end-to-end latency figures for the three array sizes. This will directly address the load-bearing aspects of the 'no overhead' claim for both co-located and physically separated deployments.
Revision: yes
Circularity Check
Hardware description paper contains no derivations, predictions, or self-referential claims.
The manuscript is a direct report of a constructed testbed using 16 RFSoC boards for up to 256 antennas. Scalability is presented as an architectural property of adding boards in multiples of 16, with no equations, fitted parameters, or first-principles derivations. No load-bearing self-citations appear, and no results are claimed as predictions that reduce to inputs by construction. The work is self-contained as an implementation description and initial UL observations, with no circular steps.