
arxiv: 2506.13408 · v2 · submitted 2025-06-16 · eess.SP · cs.LG · cs.NI

HELENA: High-Efficiency Learning-based channel Estimation using dual Neural Attention

Pith reviewed 2026-05-19 09:36 UTC · model grok-4.3

classification eess.SP · cs.LG · cs.NI
keywords channel estimation · deep learning · OFDM · attention mechanisms · 5G NR · neural networks · wireless communications · model efficiency

The pith

HELENA achieves channel estimation accuracy close to CEViT while cutting inference time by 45 percent and using eight times fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HELENA, a compact deep learning model for channel estimation in Orthogonal Frequency-Division Multiplexing systems such as 5G New Radio. It pairs a lightweight convolutional backbone with patch-wise multi-head self-attention to capture global dependencies and a squeeze-and-excitation block to refine local features. The resulting model reaches a normalized mean square error of -16.78 dB, nearly matching the -17.30 dB of the larger CEViT baseline, yet runs in 0.175 ms instead of 0.318 ms and contains only 0.11 million parameters instead of 0.88 million. This efficiency matters because accurate channel estimates must be produced quickly under low signal-to-noise ratios and tight latency budgets in real wireless links. If the gains hold across typical operating conditions, the approach supports practical use of learning-based estimators on devices with limited compute and power.
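The headline ratios can be checked directly from the figures quoted above. The short sketch below only re-derives the percentages from the reported numbers; the dictionary layout and variable names are illustrative choices, not anything taken from the paper.

```python
# Figures quoted in the summary above (HELENA vs. CEViT).
helena = {"nmse_db": -16.78, "latency_ms": 0.175, "params_m": 0.11}
cevit = {"nmse_db": -17.30, "latency_ms": 0.318, "params_m": 0.88}

# Relative inference-time reduction, as a percentage of the CEViT latency.
latency_reduction_pct = (
    100.0 * (cevit["latency_ms"] - helena["latency_ms"]) / cevit["latency_ms"]
)

# Parameter ratio: how many times smaller HELENA is.
param_ratio = cevit["params_m"] / helena["params_m"]

# NMSE gap in dB, and the same gap as a ratio of linear-scale errors.
gap_db = helena["nmse_db"] - cevit["nmse_db"]
linear_error_ratio = 10.0 ** (gap_db / 10.0)

print(f"latency reduction: {latency_reduction_pct:.1f}%")
print(f"parameter ratio:   {param_ratio:.1f}x")
print(f"NMSE gap:          {gap_db:.2f} dB (linear ratio {linear_error_ratio:.3f})")
```

Read this way, the 0.52 dB gap corresponds to a linear-scale error for HELENA that is roughly 13 percent higher than CEViT's, which is the price paid for the latency and size savings.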

Core claim

HELENA, built from a lightweight convolutional backbone plus patch-wise multi-head self-attention for global context and a squeeze-and-excitation block for local refinement, delivers channel estimation performance of -16.78 dB normalized mean square error, comparable to the heavier CEViT transformer at -17.30 dB, while requiring only 0.11 M parameters and 0.175 ms inference time instead of 0.88 M parameters and 0.318 ms.
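To make the "patch-wise" part of the claim concrete, here is a minimal sketch of attention computed over patches rather than over individual grid cells. It is not the authors' implementation: it uses a single head, identity query/key/value projections, and a toy 4x4 grid, all of which are simplifying assumptions, and the helper names are hypothetical. The point it illustrates is that grouping cells into patches shrinks the number of tokens, and attention cost grows quadratically with the token count.

```python
import math

def split_into_patches(grid, patch_h, patch_w):
    """Split a 2-D grid into flattened, non-overlapping patches (tokens).

    Assumes the grid dimensions are divisible by the patch size.
    """
    rows, cols = len(grid), len(grid[0])
    tokens = []
    for r0 in range(0, rows, patch_h):
        for c0 in range(0, cols, patch_w):
            tokens.append([grid[r][c]
                           for r in range(r0, r0 + patch_h)
                           for c in range(c0, c0 + patch_w)])
    return tokens

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

# Toy 4x4 feature map -> four 2x2 patches -> four tokens of length 4.
grid = [[float(r * 4 + c) / 16.0 for c in range(4)] for r in range(4)]
tokens = split_into_patches(grid, 2, 2)
mixed = self_attention(tokens)
print(len(tokens), len(tokens[0]))
```

With 2x2 patches the 16 grid cells become 4 tokens, so the attention matrix has 16 entries instead of 256. A multi-head version would run several such attentions on learned projections of the tokens and concatenate the results.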

What carries the argument

Lightweight convolutional backbone augmented by patch-wise multi-head self-attention for global dependencies and a squeeze-and-excitation block for local feature refinement.
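For readers unfamiliar with the second attention mechanism, the following is a minimal sketch of a squeeze-and-excitation block in the general form of the original SE design (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then channel-wise scaling). The weights, the omission of bias terms, and the function name are assumptions made for illustration; the paper's actual layer sizes are not given in this summary.

```python
import math

def squeeze_excite(feature_maps, w1, w2):
    """Rescale each channel by a gate computed from global channel statistics.

    feature_maps: list of C channels, each a 2-D list (rows x cols).
    w1: hidden x C weight matrix (reduction layer).
    w2: C x hidden weight matrix (expansion layer).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every element of a channel by that channel's gate.
    scaled = [[[g * x for x in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates

# Toy example: 2 channels of size 2x2, one hidden unit (hypothetical weights).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
w1 = [[0.5, -0.5]]
w2 = [[1.0], [-1.0]]
scaled, gates = squeeze_excite(features, w1, w2)
print(gates)
```

The block adds only two small weight matrices, which is why it is a cheap way to reweight channels compared with a second full attention layer.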

If this is right

  • Supports real-time channel estimation under the low signal-to-noise ratios and latency limits of 5G New Radio.
  • Fits within the compute budgets of edge devices and embedded hardware for wireless receivers.
  • Keeps accuracy high while lowering the resource cost of deploying deep-learning estimators in production systems.
  • Allows scaling of learning-based processing without proportional growth in model size or power draw.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dual-attention pattern could extend to related tasks such as beamforming or interference cancellation that also need both broad context and fine detail.
  • Hardware-specific optimizations or quantization of the same architecture might produce even smaller footprints for ultra-low-power radios.
  • Direct comparison on measured over-the-air data rather than simulated channels would test robustness beyond the paper's evaluation setting.

Load-bearing premise

The reported speed, accuracy, and parameter counts were measured under identical evaluation conditions and datasets as the CEViT baseline without any post-hoc selection that favors HELENA.

What would settle it

Running HELENA and CEViT on the same hardware platform and the same set of test channels while recording inference latency, parameter count, and normalized mean square error would confirm whether the claimed reductions remain consistent.
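A matched latency comparison of this kind could be organised along the following lines. This is a sketch under stated assumptions: `model_a` and `model_b` are hypothetical stand-ins for the two estimators, the warm-up and repeat counts are arbitrary, and timing on a GPU would additionally require synchronising the device before reading the clock.

```python
import statistics
import time

def median_latency_ms(estimator, inputs, warmup=10, repeats=50):
    """Median per-call latency of `estimator` over `inputs`, in milliseconds."""
    for x in inputs[:warmup]:  # warm-up calls are not timed
        estimator(x)
    samples = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            estimator(x)
            samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)

# Hypothetical stand-ins; real estimators would be loaded here instead.
def model_a(x):
    return [2.0 * v for v in x]

def model_b(x):
    return [v * v for v in x]

# Both models see exactly the same inputs on the same machine.
test_inputs = [[float(i + j) for j in range(64)] for i in range(20)]
results = {name: median_latency_ms(fn, test_inputs)
           for name, fn in (("model_a", model_a), ("model_b", model_b))}
print(results)
```

The median is used rather than the mean so that occasional scheduler hiccups do not dominate sub-millisecond measurements.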

Figures

Figures reproduced from arXiv: 2506.13408 by Esra Aycan Beyazit, Johann M. Marquez-Barja, Miguel Camelo Botero, Nina Slamnik-Kriještorac.

Figure 1: The proposed HELENA architecture with dual attention: patch-wise MHSA for long-range context and a low-cost SE block for channel-wise recalibration. The input is a sparse LS-based estimate $H_{\mathrm{LR}} \in \mathbb{R}^{N_S \times N_D \times 2}$, where values are present only at pilot positions and set to zero elsewhere, while the output is a full-resolution estimate $\hat{H} \in \mathbb{R}^{N_S \times N_D \times 2}$ covering the entire time-frequency grid. The following subsecti…
Figure 2: NMSE vs. SNR for various CE methods.

Table II: Comparison of NMSE (dB), FLOPS, and inference time using HELENA as baseline. Lower values are better.

Model          Params (×10^6)   FLOPS (×10^9)   NMSE (dB)   NMSE Δ    Inference (ms)   Inference Δ
SRCNN          0.014            0.241           -13.828     17.60%    0.120            -31.43%
ChannelNet     0.184            3.108           -15.507     7.59%     0.293            67.43%
EDSR           0.306            5.245           -15.773     6.03%     0.388            121.71%
AttRNet-Conv   0.075            1.288           -15.993     4.70%     0.29…
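The Figure 1 caption describes the network input as a sparse least-squares (LS) estimate that is non-zero only at pilot positions. A minimal sketch of how such an input could be assembled follows. It uses the standard LS estimate H = Y / X at each pilot; the function name, the dictionary-based pilot layout, and the toy grid size are assumptions, and the caption does not say which axis indexes subcarriers and which indexes OFDM symbols.

```python
def ls_sparse_input(received, pilots, n_rows, n_cols):
    """Build an n_rows x n_cols x 2 grid: LS estimates at pilots, zeros elsewhere.

    received: dict mapping (row, col) -> received complex sample Y.
    pilots:   dict mapping (row, col) -> known complex pilot symbol X.
    """
    grid = [[[0.0, 0.0] for _ in range(n_cols)] for _ in range(n_rows)]
    for (r, c), x in pilots.items():
        h_ls = received[(r, c)] / x            # least-squares estimate H = Y / X
        grid[r][c] = [h_ls.real, h_ls.imag]    # last axis: real and imaginary parts
    return grid

# Toy noiseless example: a flat channel observed at two pilot positions.
true_h = 0.5 + 0.5j
pilots = {(0, 0): 1 + 1j, (2, 1): 1 - 1j}
received = {pos: true_h * x for pos, x in pilots.items()}
h_lr = ls_sparse_input(received, pilots, n_rows=4, n_cols=2)
print(h_lr[0][0], h_lr[1][0])
```

The estimator's job is then to fill in the zeroed cells, which is why the task is often framed as image super-resolution or inpainting on a two-channel image.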
Original abstract

Accurate channel estimation is critical for high-performance Orthogonal Frequency-Division Multiplexing systems such as 5G New Radio, particularly under low signal-to-noise ratio and stringent latency constraints. This letter presents HELENA, a compact deep learning model that combines a lightweight convolutional backbone with two efficient attention mechanisms: patch-wise multi-head self-attention for capturing global dependencies and a squeeze-and-excitation block for local feature refinement. Compared to CEViT, a state-of-the-art vision transformer-based estimator, HELENA reduces inference time by 45.0% (0.175 ms vs. 0.318 ms), achieves comparable accuracy ($-16.78$ dB vs. $-17.30$ dB), and requires $8\times$ fewer parameters (0.11M vs. 0.88M), demonstrating its suitability for low-latency, real-time deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces HELENA, a compact deep learning model for OFDM channel estimation that pairs a lightweight convolutional backbone with patch-wise multi-head self-attention (for global dependencies) and a squeeze-and-excitation block (for local refinement). The central empirical claims are a 45% reduction in inference time (0.175 ms vs. 0.318 ms), comparable NMSE (−16.78 dB vs. −17.30 dB), and 8× fewer parameters (0.11 M vs. 0.88 M) relative to the CEViT baseline, positioning the model for low-latency 5G deployment.

Significance. If the efficiency and accuracy claims are shown to arise from the dual-attention architecture under matched evaluation conditions, the work would offer a practical, deployable alternative to heavier vision-transformer estimators. The design choices directly target the latency–accuracy trade-off in real-time channel estimation; however, the significance hinges on transparent verification of the baseline comparison.

major comments (2)
  1. [Abstract] Abstract: the headline performance deltas (45% faster inference, comparable NMSE, 8× parameter reduction) are presented without any accompanying table, section, or protocol statement confirming that CEViT metrics were obtained by re-implementation on identical channel realizations, SNR distribution, pilot pattern, and inference hardware. This comparison is load-bearing for the claim that the dual-attention design is responsible for the reported gains.
  2. [Results / Experimental Setup] Experimental evaluation (inferred from abstract claims): no description is supplied of the training dataset (channel model, number of realizations, SNR range), loss function, optimizer, or statistical significance testing. Without these details the numerical improvements cannot be reproduced or attributed to the proposed architecture rather than differences in training or test conditions.
minor comments (1)
  1. [Abstract] Abstract: the notation “−16.78 dB vs. −17.30 dB” should explicitly state that these are NMSE values and clarify whether lower (more negative) is better.
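Since the comment above turns on how the dB figures should be read, a short sketch of a common NMSE definition may help: the squared estimation error summed over the grid, divided by the summed squared magnitude of the true channel, expressed in dB. The exact normalisation used in the paper is not shown in this summary, so treat the formula as the usual convention rather than the authors' definition; the function name and toy values are illustrative.

```python
import math

def nmse_db(h_true, h_est):
    """NMSE in dB: 10*log10( sum|h - h_est|^2 / sum|h|^2 ) over all entries."""
    err = sum(abs(a - b) ** 2 for a, b in zip(h_true, h_est))
    ref = sum(abs(a) ** 2 for a in h_true)
    return 10.0 * math.log10(err / ref)

# Toy example: a 10% amplitude error on every entry.
h_true = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
h_est = [h * 1.1 for h in h_true]
print(round(nmse_db(h_true, h_est), 2))
```

A 10 percent amplitude error gives an error-to-signal power ratio of 0.01, that is, about -20 dB. Under this definition more negative values mean a smaller error, so -17.30 dB is better than -16.78 dB.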

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of transparency in our experimental claims and setup. We address each major comment below and will incorporate clarifications and additional details into the revised manuscript to strengthen the presentation of our results.

  1. Referee: [Abstract] Abstract: the headline performance deltas (45% faster inference, comparable NMSE, 8× parameter reduction) are presented without any accompanying table, section, or protocol statement confirming that CEViT metrics were obtained by re-implementation on identical channel realizations, SNR distribution, pilot pattern, and inference hardware. This comparison is load-bearing for the claim that the dual-attention design is responsible for the reported gains.

    Authors: We agree that explicit verification of matched conditions is necessary to attribute gains to the dual-attention architecture. The reported metrics for CEViT were obtained via re-implementation on the same channel realizations, SNR distribution, pilot pattern, and inference hardware as HELENA. In the revision we will add a dedicated table (and cross-reference in the abstract and Section IV) that explicitly lists these matched experimental conditions, including the hardware platform used for latency measurements, to make the protocol fully transparent. revision: yes

  2. Referee: [Results / Experimental Setup] Experimental evaluation (inferred from abstract claims): no description is supplied of the training dataset (channel model, number of realizations, SNR range), loss function, optimizer, or statistical significance testing. Without these details the numerical improvements cannot be reproduced or attributed to the proposed architecture rather than differences in training or test conditions.

    Authors: We acknowledge that a consolidated, easily locatable description of all training and evaluation details would improve reproducibility. The manuscript contains these elements in Section III, but they are not presented as a single protocol summary. In the revision we will expand Section III to explicitly state the channel model, number of training realizations, SNR range, loss function (MSE), optimizer (Adam), and note that all reported NMSE and latency figures are averages over a large number of independent test realizations. We will also add a brief statement on statistical reliability of the results. revision: yes
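One simple form the promised "statement on statistical reliability" could take is a mean with a confidence interval over independent test realizations. The sketch below uses a normal-approximation 95 percent interval; the per-realization values are synthetic and made up purely for illustration, and whether to average in the dB or the linear domain is a choice the authors would need to state.

```python
import math
import random
import statistics

def mean_with_ci(samples, z=1.96):
    """Sample mean and a normal-approximation 95% confidence half-width."""
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width

# Synthetic per-realization NMSE values in dB (not results from the paper).
rng = random.Random(0)
per_realization_db = [rng.gauss(-15.0, 1.0) for _ in range(1000)]
mean, hw = mean_with_ci(per_realization_db)
print(f"NMSE = {mean:.2f} dB +/- {hw:.2f} dB (95% CI)")
```

Reporting the half-width alongside each NMSE figure would let a reader judge whether a gap such as 0.52 dB is larger than the sampling noise of the test set.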

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical benchmarks rather than self-referential derivation


The paper proposes a compact neural architecture (lightweight CNN + dual attention) for OFDM channel estimation and reports empirical metrics (inference time, NMSE, parameter count) against the external baseline CEViT. No equations, uniqueness theorems, or first-principles derivations are presented that reduce the reported gains to quantities defined by the authors' own fitted parameters or prior self-citations. The central claims are performance numbers obtained under stated evaluation conditions; they do not constitute a closed-form prediction that is tautological with the model definition. Self-contained experimental comparison to an independent prior work therefore yields no circularity under the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claims rest on standard supervised learning assumptions and the existence of suitable channel datasets; no new physical entities or ad-hoc constants are introduced.

axioms (1)
  • Domain assumption: standard i.i.d. training and test channel realizations drawn from the same distribution as the evaluation scenarios.
    Implicit in any supervised channel-estimation paper; required for the reported NMSE numbers to generalize.


