Universal Approximation with XL MIMO Systems: OTA Classification via Trainable Analog Combining
Pith reviewed 2026-05-22 19:54 UTC · model grok-4.3
The pith
An XL MIMO system with analog combining acts as a universal function approximator for over-the-air classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under rich fading and low noise conditions with a large number of receive antennas, the XL MIMO system cast into the ELM framework exhibits universal approximation properties and supports efficient over-the-air classification, achieving above 90 percent accuracy on multiple datasets with optimization latency of only a few milliseconds.
What carries the argument
The ELM framework applied to XL MIMO, where the channel coefficients serve as random hidden-layer nodes and the receiver analog combiner serves as the trainable output layer.
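As a rough illustration of the conventional ELM recipe referenced here, the sketch below fixes a random hidden layer and solves only the output layer in closed form. It is not the paper's system model: the real-valued Gaussian weights (standing in for channel coefficients), the tanh activation, the toy disc-versus-ring data, and the ridge value are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: an inner disc (label -1) and an outer ring (label +1).
n_per_class = 200
radius = np.concatenate([rng.uniform(0.0, 1.0, n_per_class),
                         rng.uniform(2.0, 3.0, n_per_class)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n_per_class)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
t = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])

# Hidden layer: fixed random weights, never trained. In the paper's analogy
# these would be set by the channel; here they are plain Gaussian draws.
n_hidden = 100
W = rng.standard_normal((n_hidden, 2))
b = rng.standard_normal(n_hidden)
Z = np.tanh(X @ W.T + b)  # hidden-layer outputs, shape (400, 100)

# Output layer: the only trainable part, solved in closed form by
# ridge-regularized least squares (the ridge strength is an assumed value).
ridge = 1e-3
beta = np.linalg.solve(Z.T @ Z + ridge * np.eye(n_hidden), Z.T @ t)

train_accuracy = float(np.mean(np.sign(Z @ beta) == t))
print(f"training accuracy: {train_accuracy:.3f}")
```

Training is a single linear solve over the output weights, which is the property behind the "near-instantaneous training" language; the hidden layer itself is never updated.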
Load-bearing premise
The wireless channel coefficients under rich scattering and low noise with XL receive antennas behave like the fixed random weights of an ELM hidden layer when the analog combiner is made trainable.
What would settle it
A physical experiment in which an XL MIMO array under rich fading and low noise fails to reach above 90 percent classification accuracy or requires training times longer than a few milliseconds would falsify the claimed practical performance.
Original abstract
In this paper, we show that an eXtremely Large (XL) Multiple-Input Multiple-Output (MIMO) wireless system with appropriate analog combining components exhibits the properties of a universal function approximator, similar to a feedforward neural network. By treating the channel coefficients as the random nodes of a hidden layer and the receiver's analog combiner as a trainable output layer, we cast the XL MIMO system to the Extreme Learning Machine (ELM) framework, leading to a novel formulation for Over-The-Air (OTA) edge inference without requiring traditional digital processing nor pre-processing at the transmitter. Through theoretical analysis and numerical evaluation, we showcase that XL-MIMO-ELM enables near-instantaneous training and efficient classification, even in varying fading conditions, suggesting the paradigm shift of beyond massive MIMO systems as OTA artificial neural networks alongside their profound communications role. Compared to conventional ELMs and deep learning approaches, whose training takes seconds to minutes, the proposed framework achieves on par performance (above $90\%$ classification accuracy across multiple data sets) with optimization latency of few milliseconds under the same number of trainable parameters, considering rich fading, low noise channels with XL receive antennas, making it highly attractive for inference tasks with ultra-low-power devices.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that an XL MIMO wireless system with analog combining components can function as a universal function approximator by mapping channel coefficients to random hidden-layer nodes in the ELM framework and treating the analog combiner weights as trainable output-layer parameters. This enables OTA edge inference and classification without digital processing or transmitter pre-processing, achieving >90% accuracy with millisecond-scale training under rich fading and low noise. The reported training latency is lower than that of conventional ELM and deep learning (DL) baselines at comparable accuracy.
Significance. If the ELM mapping and universal approximation property hold, the result would establish XL MIMO systems as hardware realizations of neural networks for inference tasks, enabling ultra-low-latency, low-power OTA computation alongside communications. This could shift paradigms for beyond-massive-MIMO systems in edge AI, with explicit strengths in the claimed near-instantaneous training and reproducible numerical results across datasets.
major comments (2)
- [Abstract / system model] Abstract and system model: The casting of the XL MIMO system (y = Hx + n) to the ELM framework treats channel coefficients directly as random hidden nodes with the analog combiner as output weights. However, the ELM universal approximation theorem (Huang et al.) requires a non-constant, bounded, continuous nonlinear activation function σ(·) applied to hidden-layer outputs. The linear MIMO model supplies randomness via fading but no such σ(·); the construction therefore reduces to linear regression over random features, which cannot guarantee approximation of arbitrary continuous functions on compact sets. This directly undermines the central universal-approximation claim.
- [Theoretical analysis] Theoretical analysis section: The conditions stated for the mapping (rich fading, low noise, XL receive antennas) improve conditioning of the linear map but do not introduce the required nonlinearity. No explicit activation (e.g., envelope detection or rectifier) is described prior to the trainable combiner, so the claimed equivalence to ELM does not follow.
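The referee's point can be checked numerically on a toy case. The sketch below, assuming real-valued Gaussian weights in place of channel coefficients and tanh as a stand-in activation (neither is taken from the paper), fits XOR by least squares over random features with and without the nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic target that no linear map of the inputs can fit.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 0.0])

# Random "hidden layer" (hypothetical stand-in for channel coefficients).
n_hidden = 16
W = rng.standard_normal((n_hidden, 2))
b = rng.standard_normal(n_hidden)


def fit_mse(Z, targets):
    """Least-squares output weights over features Z; returns training MSE."""
    beta, *_ = np.linalg.lstsq(Z, targets, rcond=None)
    return float(np.mean((Z @ beta - targets) ** 2))


# Linear random features: Z = X W^T, so any output is still linear in X.
mse_linear = fit_mse(X @ W.T, t)

# Same random weights followed by a bounded, non-constant activation.
mse_tanh = fit_mse(np.tanh(X @ W.T + b), t)

print(f"linear features MSE: {mse_linear:.4f}")
print(f"tanh features MSE:   {mse_tanh:.2e}")
```

Without an activation, the 16 features span only the two input dimensions, so the best fit leaves a mean squared error of 1/3 no matter how many features are added. With tanh, the four points are interpolated up to numerical precision.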
minor comments (2)
- [Introduction / theoretical analysis] Add explicit citation to the Huang et al. ELM universal approximation theorem and state the precise conditions being invoked.
- [System model] Clarify whether any analog-component nonlinearity is assumed in the combiner; if so, include its functional form and verify it satisfies ELM activation requirements.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on the connection to the ELM framework. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Abstract / system model] Abstract and system model: The casting of the XL MIMO system (y = Hx + n) to the ELM framework treats channel coefficients directly as random hidden nodes with the analog combiner as output weights. However, the ELM universal approximation theorem (Huang et al.) requires a non-constant, bounded, continuous nonlinear activation function σ(·) applied to hidden-layer outputs. The linear MIMO model supplies randomness via fading but no such σ(·); the construction therefore reduces to linear regression over random features, which cannot guarantee approximation of arbitrary continuous functions on compact sets. This directly undermines the central universal-approximation claim.
Authors: We agree with the referee that the standard ELM universal approximation theorem requires a nonlinear activation function σ(·) and that the linear MIMO channel model y = Hx + n does not supply this. Our construction therefore corresponds to random linear feature regression rather than the full ELM setting. We will revise the abstract, introduction, and system model sections to remove references to universal approximation in the ELM sense and instead describe the approach as OTA random feature regression for classification, while retaining the latency and accuracy results.
Revision: yes
-
Referee: [Theoretical analysis] Theoretical analysis section: The conditions stated for the mapping (rich fading, low noise, XL receive antennas) improve conditioning of the linear map but do not introduce the required nonlinearity. No explicit activation (e.g., envelope detection or rectifier) is described prior to the trainable combiner, so the claimed equivalence to ELM does not follow.
Authors: We concur that the stated conditions address matrix conditioning but do not introduce nonlinearity, and no activation function is present in the model. We will revise the theoretical analysis section to explicitly acknowledge this limitation, remove the ELM equivalence claim, and clarify that the framework provides efficient linear random-feature-based inference under the given channel assumptions.
Revision: yes
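The collapse to a linear model that referee and authors agree on can be written out in two lines. This is a sketch under assumed notation: a combiner vector $\mathbf{w}$ applied as $z = \mathbf{w}^{\mathsf{H}}\mathbf{y}$, with $\mathbf{w}$, $z$, and $\mathbf{v}$ introduced here for illustration rather than taken from the paper.

```latex
\begin{align}
z &= \mathbf{w}^{\mathsf{H}} \mathbf{y}
   = \mathbf{w}^{\mathsf{H}} \left( \mathbf{H}\mathbf{x} + \mathbf{n} \right) \\
  &= \mathbf{v}^{\mathsf{H}} \mathbf{x} + \mathbf{w}^{\mathsf{H}} \mathbf{n},
  \qquad \mathbf{v} \triangleq \mathbf{H}^{\mathsf{H}} \mathbf{w}.
\end{align}
```

However many receive antennas contribute to $\mathbf{H}$, the output depends on the input only through the single vector $\mathbf{v}$, whose dimension equals that of $\mathbf{x}$. The decision boundary is therefore a hyperplane in input space, which is why the referee describes the construction as linear regression over random features.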
Circularity Check
No significant circularity; modeling analogy to ELM is independent of target claim
Full rationale
The paper's central step is an explicit modeling choice: channel coefficients are treated as random hidden-layer nodes and the analog combiner as output weights to cast the XL-MIMO system into the existing ELM framework (Huang et al.). This mapping is presented as an analogy under rich fading and low noise, followed by separate theoretical analysis and numerical evaluation of classification performance. No equation reduces a derived quantity to a fitted parameter by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior author work. The derivation therefore remains self-contained against the external ELM benchmark; any question about whether the strictly linear y = Hx + n model satisfies the required non-constant activation function belongs to correctness rather than circularity.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of XL receive antennas
- analog combiner weights
axioms (2)
- domain assumption: channel coefficients provide sufficient randomness to act as fixed random hidden nodes
- standard math: the universal approximation theorem applies to the resulting ELM structure
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
we cast the XL MIMO system to the Extreme Learning Machine (ELM) framework... Rapp activation function... $g_{\mathrm{rapp}}(y_n) \triangleq y_n \left(1 + (y_n / y_{\mathrm{sat}})^{\alpha}\right)^{-1}$
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
UNCLEAR: relation between the paper passage and the cited Recognition theorem.
Proposition 1... Ricean fading... infinitely differentiable nonlinear function... bounded
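The Rapp-type expression quoted in the first passage above is fragmentary, so what follows is only one reading of it, not the paper's definition. Assuming the form y / (1 + (|y| / y_sat)^α), with the absolute value added here so that non-integer α stays real for negative inputs, a short sketch can probe two properties that the referee report asks of an activation: boundedness and non-constancy. The name `rapp_like` and the default parameter values are hypothetical.

```python
import numpy as np


def rapp_like(y, y_sat=1.0, alpha=2.0):
    """Saturating map y / (1 + (|y| / y_sat)**alpha).

    This is an assumed reading of the quoted fragment; the absolute value
    and the default parameters are illustrative choices, not the paper's.
    """
    y = np.asarray(y, dtype=float)
    return y / (1.0 + (np.abs(y) / y_sat) ** alpha)


# Sweep a wide input range and inspect the output envelope.
y = np.linspace(-10.0, 10.0, 2001)
g = rapp_like(y)

print("max |g|:", float(np.max(np.abs(g))))
print("output range:", float(np.ptp(g)))
```

For α ≥ 1 the output magnitude never exceeds y_sat, and for small inputs the map is close to the identity, so under this reading the function is bounded, continuous, and non-constant.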
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Over-The-Air Extreme Learning Machines with XL Reception via Nonlinear Cascaded Metasurfaces
An XL-MIMO system with stacked intelligent metasurfaces realizes an over-the-air extreme learning machine for binary classification, matching digital model performance in the XL regime.
Reference graph
Works this paper leans on
- [1] A. Li, S. Wu, S. Meng, R. Lu, S. Sun, and Q. Zhang, “Toward goal-oriented semantic communications: New metrics, framework, and open challenges,” IEEE Wireless Commun., vol. 31, no. 5, pp. 238–245, 2024.
- [2] E. Calvanese Strinati, G. C. Alexandropoulos, N. Amani, M. Crozzoli, G. Madhusudan, S. Mekki, F. Rivet, V. Sciancalepore, P. Sehier, M. Stark, and H. Wymeersch, “Towards distributed and intelligent integrated sensing and communications for 6G networks,” IEEE Wireless Commun., to appear, 2025.
- [3] P. Di Lorenzo, M. Merluzzi, F. Binucci, C. Battiloro, P. Banelli, E. Calvanese Strinati, and S. Barbarossa, “Goal-oriented communications for the IoT: System design and adaptive resource optimization,” IEEE Internet Things Mag., vol. 6, no. 4, pp. 26–32, 2023.
- [4] D. Gündüz, M. A. Wigger, T.-Y. Tung, P. Zhang, and Y. Xiao, “Joint source–channel coding: Fundamentals and recent progress in practical designs,” Proc. IEEE, pp. 1–32, early access, 2024.
- [5] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
- [6] M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Wireless image retrieval at the edge,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 89–100, 2021.
- [7] M. E. Pandolfo, S. Fiorellino, E. Calvanese Strinati, and P. Di Lorenzo, “Latent space alignment for AI-native MIMO semantic communications,” in Proc. IEEE IJCNN, 2025.
- [8] A. Şahin and R. Yang, “A survey on over-the-air computation,” IEEE Commun. Surv. Tutor., vol. 25, no. 3, pp. 1877–1908, 2023.
- [9] Z. R. Omam, H. Taghvaee, A. Araghi, M. Garcia-Fernandez, G. Alvarez-Narciandi, G. C. Alexandropoulos, O. Yurduseven, and M. Khalily, “Holographic metasurfaces enabling wave computing for 6G: Status overview, challenges, and future research trends,” arXiv preprint arXiv:2501.05173, 2025.
- [10] H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep over-the-air computation,” in Proc. IEEE Int. Conf. Commun., virtual, 2020.
- [11] X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science, vol. 361, no. 6406, pp. 1004–1008, 2018.
- [12] A. Momeni and R. Fleury, “Electromagnetic wave-based extreme deep learning with nonlinear time-Floquet entanglement,” Nature Commun., vol. 13, no. 1, p. 2651, May 2022.
- [13] K. Stylianopoulos, P. Di Lorenzo, and G. C. Alexandropoulos, “Over-the-air edge inference via metasurfaces-integrated artificial neural networks,” arXiv preprint arXiv:2504.00233, 2025.
- [14] G. Huang, J. An, Z. Yang, L. Gan, M. Bennis, and M. Debbah, “Stacked intelligent metasurfaces for task-oriented semantic communications,” arXiv preprint arXiv:2407.15053, 2024.
- [15] M. Hua, C. Bian, H. Wu, and D. Gunduz, “Implementing neural networks over-the-air via reconfigurable intelligent surfaces,” arXiv preprint arXiv:2508.01840, 2025.
- [16] N. Shlezinger, G. C. Alexandropoulos, M. F. Imani, Y. C. Eldar, and D. R. Smith, “Dynamic metasurface antennas for 6G extreme massive MIMO communications,” IEEE Wireless Commun., vol. 28, no. 2, pp. 106–113, 2021.
- [17] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomput., vol. 70, no. 1, pp. 489–501, Dec. 2006.
- [18] Ö. Özdogan, E. Björnson, and E. G. Larsson, “Massive MIMO with spatially correlated Rician fading channels,” IEEE Trans. Commun., vol. 67, no. 5, pp. 3234–3250, 2019.
- [19] P. Gavriilidis, D. Mishra, B. Smida, E. Basar, C. Yuen, and G. C. Alexandropoulos, “Active reconfigurable intelligent surfaces: Circuit modeling and reflection amplification optimization,” IEEE Open J. Commun. Society, early access, 2025.
- [20] C. Rapp, “Effects of HPA-nonlinearity on a 4-DPSK/OFDM-signal for a digital sound broadcasting signal,” in ESA Special Publications Series, B. Kaldeich, Ed., vol. 332, 1991, pp. 179–184.
- [21] G.-B. Huang and H. Babri, “Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions,” IEEE Trans. Neural Netw., vol. 9, 1998.
- [22] D. Serre, Matrices: Theory and Applications. New York: Springer, 2002.
- [23] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, “Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease,” IEEE Trans. Biomed. Eng., vol. 56, no. 4, pp. 1015–1022, 2009.
- [24] W. N. Street, W. H. Wolberg, and O. L. Mangasarian, “Diagnostic Wisconsin breast cancer database,” UC Irvine Machine Learning Repository, 2008, doi: 10.24432/C5DW2B.
- [25] L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, 2012.