
arxiv: 2504.12758 · v3 · submitted 2025-04-17 · 📡 eess.SP · cs.LG

Universal Approximation with XL MIMO Systems: OTA Classification via Trainable Analog Combining

Pith reviewed 2026-05-22 19:54 UTC · model grok-4.3

classification 📡 eess.SP cs.LG
keywords XL MIMO · universal approximation · extreme learning machine · over-the-air inference · analog combining · wireless classification · edge inference

The pith

An XL MIMO system with analog combining acts as a universal function approximator for over-the-air classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that an extremely large MIMO wireless system with suitable analog combining components can function like a feedforward neural network for performing classification directly over the wireless channel. The authors map the channel coefficients to the random nodes of a hidden layer in an extreme learning machine and treat the receiver analog combiner as the trainable output layer. This casting removes the need for digital processing at the receiver or pre-processing at the transmitter. A sympathetic reader would care because the result points to a physical-layer approach that could deliver high-accuracy inference in milliseconds rather than the seconds or minutes required by conventional digital methods, which matters for low-power edge devices operating in rich fading environments.

Core claim

Under rich fading and low noise conditions with a large number of receive antennas, the XL MIMO system cast into the ELM framework exhibits universal approximation properties and supports efficient over-the-air classification, achieving above 90 percent accuracy on multiple datasets with optimization latency of only a few milliseconds.

What carries the argument

The ELM framework applied to XL MIMO, where the channel coefficients serve as random hidden-layer nodes and the receiver analog combiner serves as the trainable output layer.
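A minimal numerical sketch of this mapping may help fix ideas. It is not the authors' implementation: the Rayleigh channel draw, the toy ring dataset, the constant pilot entry appended to each input, the noise level, the real-valued envelope features, and the helper names `soft_threshold`, `make_rings`, and `hidden_layer` are all assumptions made for illustration. The channel plays the role of the fixed random hidden layer, and only the output weights `W` (standing in for the analog combiner) are fitted, using the pseudoinverse solution that is standard in ELMs.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(y, y_sat=1.0, alpha=3.0):
    # Assumed Rapp-like saturating response; parameters are illustrative.
    return y / (1.0 + np.abs(y / y_sat) ** alpha)

def make_rings(n):
    # Hypothetical toy data: inner disc (class 0) vs. outer ring (class 1).
    labels = rng.integers(0, 2, size=n)
    radius = np.where(labels == 0,
                      rng.uniform(0.0, 0.4, n),
                      rng.uniform(0.7, 1.0, n))
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    # The trailing constant acts as a bias/pilot entry (an assumption).
    x = np.stack([radius * np.cos(angle),
                  radius * np.sin(angle),
                  np.ones(n)], axis=1)
    return x, labels

def hidden_layer(x, H, noise_std=1e-3):
    # Received signal y = Hx + n, then envelope and soft thresholding.
    shape = (x.shape[0], H.shape[0])
    noise = noise_std * (rng.standard_normal(shape)
                         + 1j * rng.standard_normal(shape))
    return soft_threshold(np.abs(x @ H.T + noise))

n_tx, n_rx = 3, 256  # n_rx plays the role of the hidden-layer width
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2.0)

x_train, t_train = make_rings(200)
phi_train = hidden_layer(x_train, H)

# Only the output layer is trained: least-squares fit to one-hot targets.
targets = np.eye(2)[t_train]
W, *_ = np.linalg.lstsq(phi_train, targets, rcond=None)

train_acc = np.mean(np.argmax(phi_train @ W, axis=1) == t_train)
print(f"training accuracy: {train_acc:.3f}")
```

Because training is a single linear solve over fixed features, its cost depends only on the number of samples and receive antennas, which is the property behind the short optimization latency described above.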

Load-bearing premise

The wireless channel coefficients under rich scattering and low noise with XL receive antennas behave like the fixed random weights of an ELM hidden layer when the analog combiner is made trainable.

What would settle it

A physical experiment in which an XL MIMO array under rich fading and low noise fails to reach above 90 percent classification accuracy or requires training times longer than a few milliseconds would falsify the claimed practical performance.

Figures

Figures reproduced from arXiv: 2504.12758 by George C. Alexandropoulos, Kyriakos Stylianopoulos.

Figure 1: Soft thresholding response via Rapp’s model [20], which is used as the activation function for our XL-MIMO-ELM implemented directly with RF circuitry.
III. XL MIMO LEARNING MACHINES. We consider a standard formulation of EI, where a lightweight Tx observes correlated data instances through its sensors, and intends to transmit a computable feature of them to the Rx. A data set $\mathcal{D} \triangleq \{(x^{(i)}, t^{(i)})\}_{i=1}^{D}$ of $D$…
Figure 2: Comparative performance of the proposed XL-MIMO-ELM with benchmarks across different datasets for increasing number of Rx antennas $N_r$ (implying the number of units in the hidden layer).
… provides a unique solution at $y_n^{*} = y_{\rm sat}(\alpha - 1)^{-1/\alpha}$, and $g_{\rm rapp}(y_n^{*}) = \frac{y_{\rm sat}}{\alpha}(\alpha - 1)^{1 - \frac{1}{\alpha}}$ is a finite maximum value. Following the same arguments for $y_n \in (-\infty, 0)$ and by noting that $g_{\rm rapp}(0) = 0$, $g_{\rm rapp}(\cdot)$ is bounded e…
Figure 5: Convergence of iterative re-training in time-varying…
Figure 4: Performance of XL-MIMO-ELM over channel diversity…
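The text captured alongside Figure 2 quotes a stationary point and a finite maximum for the Rapp-based activation. One response that reproduces both values is $g(y) = y / (1 + |y / y_{\rm sat}|^{\alpha})$. This functional form is inferred here from those two values and is an assumption, not a formula confirmed by this page; the function name `rapp_like` and the parameter values are likewise illustrative. The sketch below checks the quoted peak location and height numerically.

```python
import numpy as np

def rapp_like(y, y_sat=1.0, alpha=3.0):
    """Assumed soft-thresholding response y / (1 + |y / y_sat|**alpha)."""
    y = np.asarray(y, dtype=float)
    return y / (1.0 + np.abs(y / y_sat) ** alpha)

y_sat, alpha = 1.0, 3.0  # illustrative values, alpha > 1

# Closed-form peak location and height quoted in the Figure 2 text.
y_star = y_sat * (alpha - 1.0) ** (-1.0 / alpha)
g_max = (y_sat / alpha) * (alpha - 1.0) ** (1.0 - 1.0 / alpha)

# Brute-force search for the maximum on a fine grid over y >= 0.
grid = np.linspace(0.0, 10.0, 100001)
vals = rapp_like(grid, y_sat, alpha)

print("closed-form peak:", y_star, g_max)
print("grid-search peak:", grid[np.argmax(vals)], vals.max())
```

Under this assumed form the response is odd, passes through zero at the origin, and is bounded for $\alpha > 1$, which matches the boundedness argument sketched in the captured text.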
Original abstract

In this paper, we show that an eXtremely Large (XL) Multiple-Input Multiple-Output (MIMO) wireless system with appropriate analog combining components exhibits the properties of a universal function approximator, similar to a feedforward neural network. By treating the channel coefficients as the random nodes of a hidden layer and the receiver's analog combiner as a trainable output layer, we cast the XL MIMO system to the Extreme Learning Machine (ELM) framework, leading to a novel formulation for Over-The-Air (OTA) edge inference without requiring traditional digital processing nor pre-processing at the transmitter. Through theoretical analysis and numerical evaluation, we showcase that XL-MIMO-ELM enables near-instantaneous training and efficient classification, even in varying fading conditions, suggesting the paradigm shift of beyond massive MIMO systems as OTA artificial neural networks alongside their profound communications role. Compared to conventional ELMs and deep learning approaches, whose training takes seconds to minutes, the proposed framework achieves on par performance (above $90\%$ classification accuracy across multiple data sets) with optimization latency of few milliseconds under the same number of trainable parameters, considering rich fading, low noise channels with XL receive antennas, making it highly attractive for inference tasks with ultra-low-power devices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that an XL MIMO wireless system with analog combining components can function as a universal function approximator by mapping channel coefficients to random hidden-layer nodes in the ELM framework and treating the analog combiner weights as trainable output-layer parameters. This enables OTA edge inference and classification without digital processing or transmitter pre-processing, achieving above 90% accuracy with millisecond-scale training under rich fading and low noise. The claimed result matches the accuracy of conventional ELM and deep learning approaches while training with far lower latency.

Significance. If the ELM mapping and universal approximation property hold, the result would establish XL MIMO systems as hardware realizations of neural networks for inference tasks, enabling ultra-low-latency, low-power OTA computation alongside communications. This could shift paradigms for beyond-massive-MIMO systems in edge AI, with explicit strengths in the claimed near-instantaneous training and reproducible numerical results across datasets.

major comments (2)
  1. [Abstract / system model] Abstract and system model: The casting of the XL MIMO system (y = Hx + n) to the ELM framework treats channel coefficients directly as random hidden nodes with the analog combiner as output weights. However, the ELM universal approximation theorem (Huang et al.) requires a non-constant, bounded, continuous nonlinear activation function σ(·) applied to hidden-layer outputs. The linear MIMO model supplies randomness via fading but no such σ(·); the construction therefore reduces to linear regression over random features, which cannot guarantee approximation of arbitrary continuous functions on compact sets. This directly undermines the central universal-approximation claim.
  2. [Theoretical analysis] Theoretical analysis section: The conditions stated for the mapping (rich fading, low noise, XL receive antennas) improve conditioning of the linear map but do not introduce the required nonlinearity. No explicit activation (e.g., envelope detection or rectifier) is described prior to the trainable combiner, so the claimed equivalence to ELM does not follow.
minor comments (2)
  1. [Introduction / theoretical analysis] Add explicit citation to the Huang et al. ELM universal approximation theorem and state the precise conditions being invoked.
  2. [System model] Clarify whether any analog-component nonlinearity is assumed in the combiner; if so, include its functional form and verify it satisfies ELM activation requirements.
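The reduction described in the first major comment can be written out in one line. The combiner vector $\mathbf{w} \in \mathbb{C}^{N_r}$, the effective weight vector $\mathbf{a}$, and the output symbol $\hat{t}$ are notation introduced here for illustration, under the assumption that the combiner acts directly on $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ with no activation in between.

```latex
\begin{aligned}
\hat{t}_{\mathrm{lin}}(\mathbf{x})
  &= \mathbf{w}^{\mathsf{H}} (\mathbf{H}\mathbf{x} + \mathbf{n})
   = \mathbf{a}^{\mathsf{H}} \mathbf{x} + \mathbf{w}^{\mathsf{H}} \mathbf{n},
  \qquad \mathbf{a} \triangleq \mathbf{H}^{\mathsf{H}} \mathbf{w}, \\
\hat{t}_{\sigma}(\mathbf{x})
  &= \mathbf{w}^{\mathsf{H}} \, \sigma(\mathbf{H}\mathbf{x} + \mathbf{n}).
\end{aligned}
```

In the first line, every choice of $\mathbf{w}$ yields a function that is linear in $\mathbf{x}$ regardless of $N_r$, so adding antennas cannot add expressive power. In the second line, an element-wise nonlinear $\sigma(\cdot)$ prevents the channel from being absorbed into the weights, which is the structure the ELM theorem addresses. Which of the two cases applies depends on whether an activation sits between the channel and the combiner.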

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the connection to the ELM framework. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract / system model] Abstract and system model: The casting of the XL MIMO system (y = Hx + n) to the ELM framework treats channel coefficients directly as random hidden nodes with the analog combiner as output weights. However, the ELM universal approximation theorem (Huang et al.) requires a non-constant, bounded, continuous nonlinear activation function σ(·) applied to hidden-layer outputs. The linear MIMO model supplies randomness via fading but no such σ(·); the construction therefore reduces to linear regression over random features, which cannot guarantee approximation of arbitrary continuous functions on compact sets. This directly undermines the central universal-approximation claim.

    Authors: We agree with the referee that the standard ELM universal approximation theorem requires a nonlinear activation function σ(·) and that the linear MIMO channel model y = Hx + n does not supply this. Our construction therefore corresponds to random linear feature regression rather than the full ELM setting. We will revise the abstract, introduction, and system model sections to remove references to universal approximation in the ELM sense and instead describe the approach as OTA random feature regression for classification, while retaining the latency and accuracy results. revision: yes

  2. Referee: [Theoretical analysis] Theoretical analysis section: The conditions stated for the mapping (rich fading, low noise, XL receive antennas) improve conditioning of the linear map but do not introduce the required nonlinearity. No explicit activation (e.g., envelope detection or rectifier) is described prior to the trainable combiner, so the claimed equivalence to ELM does not follow.

    Authors: We concur that the stated conditions address matrix conditioning but do not introduce nonlinearity, and no activation function is present in the model. We will revise the theoretical analysis section to explicitly acknowledge this limitation, remove the ELM equivalence claim, and clarify that the framework provides efficient linear random-feature-based inference under the given channel assumptions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; modeling analogy to ELM is independent of target claim

Full rationale

The paper's central step is an explicit modeling choice: channel coefficients are treated as random hidden-layer nodes and the analog combiner as output weights to cast the XL-MIMO system into the existing ELM framework (Huang et al.). This mapping is presented as an analogy under rich fading and low noise, followed by separate theoretical analysis and numerical evaluation of classification performance. No equation reduces a derived quantity to a fitted parameter by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior author work. The derivation therefore remains self-contained against the external ELM benchmark; any question about whether the strictly linear y = Hx + n model satisfies the required non-constant activation function belongs to correctness rather than circularity.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the universal approximation property of ELM being transferable to the wireless channel model, plus assumptions about rich scattering and low noise. No new entities are postulated.

free parameters (2)
  • number of XL receive antennas
    Chosen to ensure sufficient randomness in channel coefficients for the hidden layer role.
  • analog combiner weights
    Trainable parameters optimized for the output layer in the ELM mapping.
axioms (2)
  • domain assumption: channel coefficients provide sufficient randomness to act as fixed random hidden nodes
    Invoked when casting the MIMO system to the ELM framework in the abstract.
  • standard math: universal approximation theorem applies to the resulting ELM structure
    Background result from ELM literature used to claim universal approximation.

pith-pipeline@v0.9.0 · 5754 in / 1363 out tokens · 40893 ms · 2026-05-22T19:54:51.637899+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Over-The-Air Extreme Learning Machines with XL Reception via Nonlinear Cascaded Metasurfaces

    eess.SP 2026-01 unverdicted novelty 7.0

    An XL-MIMO system with stacked intelligent metasurfaces realizes an over-the-air extreme learning machine for binary classification, matching digital model performance in the XL regime.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 1 Pith paper

  1. [1]

    Toward goal-oriented semantic communications: New metrics, framework, and open challenges

    A. Li, S. Wu, S. Meng, R. Lu, S. Sun, and Q. Zhang, “Toward goal-oriented semantic communications: New metrics, framework, and open challenges,” IEEE Wireless Commun., vol. 31, no. 5, pp. 238–245, 2024

  2. [2]

    Towards distributed and intelligent integrated sensing and communications for 6G networks

    E. Calvanese Strinati, G. C. Alexandropoulos, N. Amani, M. Crozzoli, G. Madhusudan, S. Mekki, F. Rivet, V. Sciancalepore, P. Sehier, M. Stark, and H. Wymeersch, “Towards distributed and intelligent integrated sensing and communications for 6G networks,” IEEE Wireless Commun., to appear, 2025

  3. [3]

    Goal-oriented communications for the IoT: System design and adaptive resource optimization

    P. Di Lorenzo, M. Merluzzi, F. Binucci, C. Battiloro, P. Banelli, E. Calvanese Strinati, and S. Barbarossa, “Goal-oriented communications for the IoT: System design and adaptive resource optimization,” IEEE Internet Things Mag., vol. 6, no. 4, pp. 26–32, 2023

  4. [4]

    Joint source–channel coding: Fundamentals and recent progress in practical designs

    D. Gündüz, M. A. Wigger, T.-Y. Tung, P. Zhang, and Y. Xiao, “Joint source–channel coding: Fundamentals and recent progress in practical designs,” Proc. IEEE, pp. 1–32, early access, 2024

  5. [5]

    Deep learning enabled semantic communication systems

    H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021

  6. [6]

    Wireless image retrieval at the edge

    M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Wireless image retrieval at the edge,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 89–100, 2021

  7. [7]

    Latent space alignment for AI-native MIMO semantic communications

    M. E. Pandolfo, S. Fiorellino, E. Calvanese Strinati, and P. Di Lorenzo, “Latent space alignment for AI-native MIMO semantic communications,” in Proc. IEEE IJCNN, 2025

  8. [8]

    A survey on over-the-air computation

    A. Şahin and R. Yang, “A survey on over-the-air computation,” IEEE Commun. Surv. Tutor., vol. 25, no. 3, pp. 1877–1908, 2023

  9. [9]

    Holographic metasurfaces enabling wave computing for 6G: Status overview, challenges, and future research trends

    Z. R. Omam, H. Taghvaee, A. Araghi, M. Garcia-Fernandez, G. Alvarez-Narciandi, G. C. Alexandropoulos, O. Yurduseven, and M. Khalily, “Holographic metasurfaces enabling wave computing for 6G: Status overview, challenges, and future research trends,” arXiv preprint arXiv:2501.05173, 2025

  10. [10]

    Deep over-the-air computation

    H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep over-the-air computation,” in Proc. IEEE Int. Conf. Commun., virtual, 2020

  11. [11]

    All-optical machine learning using diffractive deep neural networks

    X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science, vol. 361, no. 6406, pp. 1004–1008, 2018

  12. [12]

    Electromagnetic wave-based extreme deep learning with nonlinear time-Floquet entanglement

    A. Momeni and R. Fleury, “Electromagnetic wave-based extreme deep learning with nonlinear time-Floquet entanglement,” Nature Commun., vol. 13, no. 1, p. 2651, May 2022

  13. [13]

    Over-the-air edge inference via metasurfaces-integrated artificial neural networks

    K. Stylianopoulos, P. Di Lorenzo, and G. C. Alexandropoulos, “Over-the-air edge inference via metasurfaces-integrated artificial neural networks,” arXiv preprint arXiv:2504.00233, 2025

  14. [14]

    Stacked intelligent metasurfaces for task-oriented semantic communications

    G. Huang, J. An, Z. Yang, L. Gan, M. Bennis, and M. Debbah, “Stacked intelligent metasurfaces for task-oriented semantic communications,” arXiv preprint arXiv:2407.15053, 2024

  15. [15]

    Implementing neural networks over-the-air via reconfigurable intelligent surfaces

    M. Hua, C. Bian, H. Wu, and D. Gunduz, “Implementing neural networks over-the-air via reconfigurable intelligent surfaces,” arXiv preprint arXiv:2508.01840, 2025

  16. [16]

    Dynamic metasurface antennas for 6G extreme massive MIMO communications

    N. Shlezinger, G. C. Alexandropoulos, M. F. Imani, Y. C. Eldar, and D. R. Smith, “Dynamic metasurface antennas for 6G extreme massive MIMO communications,” IEEE Wireless Commun., vol. 28, no. 2, pp. 106–113, 2021

  17. [17]

    Extreme learning machine: Theory and applications

    G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomput., vol. 70, no. 1, pp. 489–501, Dec. 2006

  18. [18]

    Massive MIMO with spatially correlated Rician fading channels

    Ö. Özdogan, E. Björnson, and E. G. Larsson, “Massive MIMO with spatially correlated Rician fading channels,” IEEE Trans. Commun., vol. 67, no. 5, pp. 3234–3250, 2019

  19. [19]

    Active reconfigurable intelligent surfaces: Circuit modeling and reflection amplification optimization

    P. Gavriilidis, D. Mishra, B. Smida, E. Basar, C. Yuen, and G. C. Alexandropoulos, “Active reconfigurable intelligent surfaces: Circuit modeling and reflection amplification optimization,” early access, IEEE Open J. Commun. Society, 2025

  20. [20]

    Effects of HPA-nonlinearity on a 4-DPSK/OFDM-signal for a digital sound broadcasting signal

    C. Rapp, “Effects of HPA-nonlinearity on a 4-DPSK/OFDM-signal for a digital sound broadcasting signal,” in ESA Special Publications Series, B. Kaldeich, Ed., vol. 332, 1991, pp. 179–184

  21. [21]

    Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions

    G.-B. Huang and H. Babri, “Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions,” IEEE Trans. Neural Netw., vol. 9, 1998

  22. [22]

    Matrices: Theory and Applications

    D. Serre, Matrices: Theory and Applications. New York: Springer, 2002

  23. [23]

    Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease

    M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, “Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease,” IEEE Trans. Biomed. Eng., vol. 56, no. 4, pp. 1015–1022, 2009

  24. [24]

    Diagnostic Wisconsin breast cancer database

    W. N. Street, W. H. Wolberg, and O. L. Mangasarian, “Diagnostic Wisconsin breast cancer database,” UC Irvine Machine Learning Repository, 2008, doi: 10.24432/C5DW2B

  25. [25]

    The MNIST database of handwritten digit images for machine learning research

    L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, 2012