arxiv: 2602.04728 · v2 · submitted 2026-02-04 · 📡 eess.SP · cs.IT · cs.LG · math.IT

Scalable Cross-Attention Transformer for Cooperative Multi-AP OFDM Uplink Reception

Pith reviewed 2026-05-16 07:04 UTC · model grok-4.3

classification: 📡 eess.SP · cs.IT · cs.LG · math.IT
keywords: OFDM, cross-attention, Transformer, multi-AP, uplink decoding, cooperative reception, Wi-Fi, channel estimation

The pith

A cross-attention Transformer fuses signals from multiple access points to decode uplink OFDM without explicit channel estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a Transformer architecture for joint decoding of OFDM uplink transmissions received at multiple coordinated access points. Each receiver runs a shared encoder on its time-frequency grid, after which a token-wise cross-attention layer combines the encoded features to produce soft log-likelihood ratios for a standard decoder. Training uses a bit-metric objective so the model learns to weight receivers according to their reliability and operates without separate channel estimation. On realistic Wi-Fi channels the approach surpasses conventional pipelines and other neural receivers while often matching the accuracy of a local perfect-CSI reference and remaining efficient on ordinary hardware.

Core claim

The central claim is that a shared per-receiver encoder followed by token-wise cross-attention can fuse multi-AP observations into decoder-ready soft outputs, removing the need for explicit channel estimates while preserving or exceeding the performance of a perfect-CSI baseline on realistic Wi-Fi channels.

What carries the argument

The token-wise cross-attention module that fuses per-receiver encoded tokens according to learned reliability weights.
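
To make the fusion step concrete, here is a minimal NumPy sketch of token-wise cross-attention across receivers. It is an illustration under stated assumptions, not the authors' implementation: the query construction (the receiver-averaged feature at each token), the single attention head, and the names `tokenwise_cross_attention`, `w_q`, `w_k`, `w_v` are all hypothetical. The point is only that each time-frequency token gets its own softmax over receivers, so a weak AP can be down-weighted on exactly the resource elements where it is unreliable.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def tokenwise_cross_attention(feats, w_q, w_k, w_v):
    """Fuse per-receiver encoded tokens, one attention softmax per token.

    feats: (R, T, d) encoded features for R receivers and T tokens.
    w_q, w_k: (d, d_k) projections; w_v: (d, d_v) projection.
    Returns fused features (T, d_v) and attention weights (T, R).
    """
    # Assumption: the query for each token is the receiver-averaged feature.
    query = feats.mean(axis=0) @ w_q                 # (T, d_k)
    keys = feats @ w_k                               # (R, T, d_k)
    values = feats @ w_v                             # (R, T, d_v)
    # Score every receiver against the query of the same token only.
    scores = np.einsum("tk,rtk->tr", query, keys) / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=1)                # (T, R), rows sum to 1
    fused = np.einsum("tr,rtv->tv", weights, values) # (T, d_v)
    return fused, weights


rng = np.random.default_rng(0)
R, T, d = 3, 8, 4                                    # 3 APs, 8 tokens, width 4
feats = rng.standard_normal((R, T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
fused, weights = tokenwise_cross_attention(feats, w_q, w_k, w_v)
```

In this sketch the cost of the fusion grows linearly in the number of receivers for a fixed grid size, because attention runs over the receiver axis and never across tokens.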

If this is right

  • Coordinated multi-AP reception becomes practical without CSI feedback or estimation overhead.
  • The receiver adapts automatically to degraded links or sparse pilots through attention weighting.
  • Decoding remains computationally light enough for commodity hardware in next-generation Wi-Fi.
  • Joint processing improves performance under strong frequency selectivity compared with independent per-AP decoding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fusion pattern could reduce pilot overhead in dense cell-free or distributed MIMO deployments.
  • End-to-end training on bit metrics suggests the architecture might extend to other coordinated reception tasks that currently rely on separate estimation stages.
  • Hardware validation would clarify whether simulation-trained weights transfer when real impairments such as hardware phase noise appear.

Load-bearing premise

A model trained on simulated Wi-Fi channels with a bit-metric loss will generalize to real deployments without explicit channel estimates or additional fine-tuning.

What would settle it

Error-rate measurements on a physical multi-AP hardware testbed using live over-the-air Wi-Fi channels, compared directly against a perfect-CSI reference receiver.

Figures

Figures reproduced from arXiv: 2602.04728 by Amor Nafkha, Apostolos Kountouris, Grégoire Lefebvre, Haïfa Fares, Xavier Tardy.

Figure 1: Neural coordinated decoding with three APs.
Figure 2: Architecture of the proposed cross-attention Transformer joint decoder.
Figure 3: BER performance vs. Eb/N0 for varying cooperation levels (…

Original abstract

We propose a cross-attention Transformer for joint decoding of uplink OFDM signals received by multiple coordinated access points. A shared per-receiver encoder learns the time-frequency structure of each grid, and a token-wise cross-attention module fuses the receivers to produce soft log-likelihood ratios for a standard channel decoder without explicit channel estimates. Trained with a bit-metric objective, the model adapts its fusion to per-receiver reliability and remains robust under degraded links, strong frequency selectivity, and sparse pilots. Over realistic Wi-Fi channels, it outperforms classical pipelines and strong neural baselines, often matching or surpassing a local perfect-CSI reference while remaining compact and computationally efficient on commodity hardware, making it suitable for next-generation coordinated Wi-Fi receivers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a cross-attention Transformer for joint uplink OFDM decoding across multiple coordinated access points. A shared per-receiver encoder extracts time-frequency features from each AP's received grid, and a token-wise cross-attention module fuses these features to produce soft LLRs for a standard decoder without explicit channel estimation. The model is trained end-to-end on simulated Wi-Fi channels using a bit-metric loss and is claimed to outperform classical pipelines and neural baselines while often matching or exceeding a local perfect-CSI reference, remaining compact and efficient for commodity hardware.

Significance. If the empirical claims hold under broader validation, the work could meaningfully advance coordinated multi-AP reception in Wi-Fi and 6G systems by removing CSI overhead and enabling reliability-aware fusion. The end-to-end bit-metric training that allows the attention mechanism to adapt to per-receiver quality without separate estimation steps is a practical strength, as is the reported compactness. This approach addresses a real deployment pain point in dense networks with frequency-selective channels and sparse pilots.

major comments (2)
  1. [Experimental Evaluation section] All reported results use only simulated Wi-Fi channel realizations; no experiments on measured real-world channels, hardware testbeds, or explicit modeling of impairments such as phase noise and I/Q imbalance are provided. This directly undermines the central claim that the model matches or surpasses the local perfect-CSI reference in realistic deployments without fine-tuning.
  2. [Abstract and §4] The outperformance claims are stated without any quantitative tables, BER/throughput curves, exact baseline configurations, error bars, or ablation results on the cross-attention module. The central empirical assertion therefore cannot be verified from the manuscript as presented.
minor comments (2)
  1. [§3.1] The token embedding procedure for the per-receiver encoder would benefit from an explicit equation or pseudocode block to clarify how the time-frequency grid is tokenized before cross-attention.
  2. [Notation throughout] The distinction between 'local perfect-CSI reference' and the proposed model's output LLRs should be defined once with a consistent symbol to avoid ambiguity in performance comparisons.
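
For readers who want a picture of what the first minor comment is asking for, the following is one plausible tokenization, offered purely as a sketch. It is not taken from the paper, whose actual embedding is not described on this page. The assumptions: one token per resource element, five raw features per token (real and imaginary part of the received sample, a pilot flag, and the real and imaginary part of the known pilot symbol), and a single linear embedding matrix `w_embed`. The function name `tokenize_grid` and all argument names are hypothetical.

```python
import numpy as np


def tokenize_grid(y, pilot_mask, pilot_values, w_embed):
    """Turn one receiver's OFDM grid into a sequence of embedded tokens.

    y:            complex (S, F) received grid, S symbols by F subcarriers.
    pilot_mask:   bool (S, F), True where a pilot was transmitted.
    pilot_values: complex (S, F), known pilot symbols (ignored off-pilot).
    w_embed:      (5, d) linear embedding (hypothetical, learned in practice).
    Returns tokens of shape (S * F, d), ordered symbol-major.
    """
    # Zero out pilot values on data positions so they carry no information.
    p = np.where(pilot_mask, pilot_values, 0)
    raw = np.stack(
        [y.real, y.imag, pilot_mask.astype(float), p.real, p.imag],
        axis=-1,
    )                                         # (S, F, 5)
    flat = raw.reshape(-1, raw.shape[-1])     # (S * F, 5), one row per RE
    return flat @ w_embed                     # (S * F, d)


rng = np.random.default_rng(1)
S, F, d = 4, 6, 8
y = rng.standard_normal((S, F)) + 1j * rng.standard_normal((S, F))
pilot_mask = np.zeros((S, F), dtype=bool)
pilot_mask[0, ::2] = True                     # sparse pilots on the first symbol
pilot_values = np.where(pilot_mask, 1 + 0j, 0)
tokens = tokenize_grid(y, pilot_mask, pilot_values, rng.standard_normal((5, d)))
```

A real design would likely add positional information for the symbol and subcarrier indices; that is omitted here to keep the sketch short.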

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments, which help clarify the scope and presentation of our work. We address each major comment below and describe the revisions we will incorporate.

point-by-point responses
  1. Referee: [Experimental Evaluation section] All reported results use only simulated Wi-Fi channel realizations; no experiments on measured real-world channels, hardware testbeds, or explicit modeling of impairments such as phase noise and I/Q imbalance are provided. This directly undermines the central claim that the model matches or surpasses the local perfect-CSI reference in realistic deployments without fine-tuning.

    Authors: We agree that the evaluation relies exclusively on simulated TGn Wi-Fi channel realizations, which are standard for assessing performance under realistic frequency-selective conditions but do not capture hardware impairments such as phase noise or I/Q imbalance. The central claim is therefore scoped to these simulated realistic channels, where the model often matches or exceeds the local perfect-CSI reference. In the revised manuscript we will add an explicit Limitations subsection that qualifies the claims, discusses the potential impact of unmodeled impairments, and outlines future directions for hardware validation. We cannot add new measured-channel or testbed results in this revision as those experiments have not been performed. revision: partial

  2. Referee: [Abstract and §4] The outperformance claims are stated without any quantitative tables, BER/throughput curves, exact baseline configurations, error bars, or ablation results on the cross-attention module. The central empirical assertion therefore cannot be verified from the manuscript as presented.

    Authors: Section 4 already contains BER curves for the proposed model against classical pipelines and neural baselines, together with the local perfect-CSI reference, and the text specifies the baseline configurations and training details. To improve verifiability we will insert a summary table reporting quantitative gains at target BERs, include error bars on the curves where Monte-Carlo variance is relevant, and add an ablation study isolating the contribution of the token-wise cross-attention module. These additions will be placed in §4 and referenced from the abstract. revision: yes
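
As a rough guide to what error bars on a Monte-Carlo BER curve could look like, the sketch below computes a Wilson score interval for a single operating point. It is a generic statistical recipe, not something the manuscript reports using. It assumes bit errors are independent, which is optimistic for coded blocks over fading channels, so the resulting interval should be read as a lower bound on the true uncertainty. The function name `ber_wilson_interval` is hypothetical.

```python
import math


def ber_wilson_interval(bit_errors, bits_sent, z=1.96):
    """Wilson score interval for a BER estimate (z=1.96 gives about 95%).

    Assumes independent bit errors; correlated errors widen the true interval.
    Returns (lower, upper) bounds on the bit error rate.
    """
    if bits_sent <= 0:
        raise ValueError("bits_sent must be positive")
    p_hat = bit_errors / bits_sent
    denom = 1.0 + z * z / bits_sent
    center = (p_hat + z * z / (2.0 * bits_sent)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / bits_sent + z * z / (4.0 * bits_sent ** 2)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Example: 100 errors observed in one million simulated bits.
lower, upper = ber_wilson_interval(100, 1_000_000)
```

Unlike the plain normal approximation, the Wilson interval stays usable when very few errors are observed, which is the usual situation at the high-Eb/N0 end of a BER curve.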

standing simulated objections not resolved
  • Absence of measured real-world channel data or hardware testbed results, which cannot be supplied without conducting new experiments outside the scope of the current revision.

Circularity Check

0 steps flagged

No circularity: empirical training and evaluation on external benchmarks

full rationale

The paper presents an end-to-end trained cross-attention Transformer whose performance is measured by bit-metric loss and empirical comparisons against classical receivers and neural baselines on simulated Wi-Fi channels. No derivation chain reduces any claimed output (LLRs, outperformance, or perfect-CSI matching) to a fitted parameter or self-citation by construction. All load-bearing steps are data-driven and externally falsifiable; the architecture and objective do not embed the target metrics inside the training loop itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the learned cross-attention fusion generalizing from simulated training data; no explicit free parameters beyond standard neural-network hyperparameters are named, and no new physical entities are postulated.

free parameters (1)
  • neural network hyperparameters (layers, heads, embedding size)
    Standard architecture hyperparameters whose specific values are not reported in the abstract.
axioms (1)
  • domain assumption: Bit-metric training produces LLRs suitable for a standard channel decoder
    Invoked when stating that the model output feeds a conventional decoder.
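
The axiom above is easier to judge with the objective written out. A common form of a bit-metric loss is the binary cross-entropy between transmitted bits and the model's LLRs treated as logits; the sketch below shows that form. Whether the paper uses exactly this expression is an assumption, as is the sign convention LLR = log P(b=1) / P(b=0), which must match whatever the downstream channel decoder expects. The name `bit_metric_loss` is hypothetical.

```python
import numpy as np


def bit_metric_loss(llrs, bits):
    """Binary cross-entropy on LLR logits, reported in bits per coded bit.

    llrs: array of log-likelihood ratios, assumed log P(b=1) / P(b=0).
    bits: array of transmitted bits in {0, 1}, same shape as llrs.
    """
    signs = 2.0 * np.asarray(bits, dtype=float) - 1.0   # map {0, 1} to {-1, +1}
    # log(1 + exp(-s * llr)) computed stably via logaddexp.
    nats = np.logaddexp(0.0, -signs * np.asarray(llrs, dtype=float))
    return float(np.mean(nats) / np.log(2.0))


bits = np.array([0, 1, 0, 1])
uninformative = bit_metric_loss(np.zeros(4), bits)                      # LLR = 0 everywhere
confident = bit_metric_loss(np.array([-20.0, 20.0, -20.0, 20.0]), bits)  # correct signs
```

An all-zero LLR vector costs exactly one bit per coded bit under this loss, and confident correct LLRs drive it toward zero, which is why minimizing it pushes the network toward outputs a soft-input decoder can use directly.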

