arxiv: 1907.11836 · v1 · pith:YACI6J3Vnew · submitted 2019-07-27 · 💻 cs.NI · cs.IT · cs.LG · math.IT

Deep Learning for CSI Feedback Based on Superimposed Coding

Pith reviewed 2026-05-24 15:13 UTC · model grok-4.3

classification 💻 cs.NI · cs.IT · cs.LG · math.IT
keywords CSI feedback · superimposed coding · deep learning · massive MIMO · multi-task neural network · MMSE · FDD

The pith

A multi-task neural network trained at one SNR and power coefficient improves downlink CSI estimation from superimposed signals while maintaining uplink data detection across varying conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper combines deep learning with superimposed coding (SC) to reduce uplink bandwidth usage for CSI feedback in massive MIMO FDD systems. It proposes a multi-task neural network (NN) at the base station that unfolds two MMSE iterations to jointly recover the downlink CSI and the uplink user data sequences (UL-US). Trained subnet-by-subnet at a fixed SNR and power proportional coefficient (PPC), this network is reported to outperform the standalone SC scheme in CSI accuracy, with comparable or better UL-US detection, even when SNR and PPC change.

Core claim

The paper unfolds two iterations of MMSE criterion-based interference reduction into a multi-task neural network and trains it subnet-by-subnet, so that the network recovers both the downlink CSI and the UL-US from the superimposed signal. Trained at a single SNR and PPC, it consistently improves downlink CSI estimation, with similar or better UL-US detection, under varying SNR and PPC compared with the standalone SC-based CSI scheme.

What carries the argument

The multi-task neural network that unfolds two MMSE iterations for interference reduction, allowing joint recovery of CSI and user data.
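
To make the unfolded procedure concrete, here is a minimal NumPy sketch of a classical (non-learned) two-pass interference cancellation of the kind the network is described as unfolding. It is an illustration under stated assumptions, not the paper's equations: the signal model (Hadamard spreading, QPSK data, unit-variance CSI taps, an already-equalized AWGN uplink), the scalar LMMSE weighting, and every function and parameter name are hypothetical choices made for this example.

```python
import numpy as np

def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of two)."""
    h = np.ones((1, 1))
    while h.shape[0] < m:
        h = np.block([[h, h], [h, -h]])
    return h

def lmmse_csi(y_clean, q, rho, interf_var):
    """Despread, then apply scalar LMMSE weighting for unit-variance CSI taps."""
    m, n = q.shape
    a = m * np.sqrt(rho / n)          # gain on each CSI tap after despreading
    z = q.T @ y_clean                 # z = a*h + w, with Var(w) = m * interf_var
    return a * z / (a ** 2 + m * interf_var)

def detect_qpsk(y, q, h_hat, rho):
    """Cancel the re-spread CSI estimate, then take hard QPSK decisions."""
    n = q.shape[1]
    r = y - np.sqrt(rho / n) * (q @ h_hat)
    return (np.sign(r.real) + 1j * np.sign(r.imag)) / np.sqrt(2)

def two_iteration_baseline(n=16, m=512, rho=0.10, snr_db=10.0, seed=0):
    """Assumed model: y = sqrt(rho/n) Q h + sqrt(1-rho) d + noise."""
    rng = np.random.default_rng(seed)
    sigma2 = 10 ** (-snr_db / 10)     # noise power for unit transmit power
    q = hadamard(m)[:, :n]            # n orthogonal spreading codes of length m
    h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    bits = rng.integers(0, 2, size=(m, 2))
    d = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(m)
                                   + 1j * rng.standard_normal(m))
    y = np.sqrt(rho / n) * (q @ h) + np.sqrt(1 - rho) * d + noise

    # Pass 1: estimate CSI while treating the user data as extra noise.
    h1 = lmmse_csi(y, q, rho, (1 - rho) + sigma2)
    d1 = detect_qpsk(y, q, h1, rho)
    # Pass 2: subtract the detected data first (assumed error-free in the
    # LMMSE weight), re-estimate the CSI, then detect the data again.
    h2 = lmmse_csi(y - np.sqrt(1 - rho) * d1, q, rho, sigma2)
    d2 = detect_qpsk(y, q, h2, rho)

    def nmse(est):
        return float(np.sum(np.abs(est - h) ** 2) / np.sum(np.abs(h) ** 2))

    ser = float(np.mean(np.abs(d2 - d) > 1e-9))   # symbol error rate
    return nmse(h1), nmse(h2), ser

nmse1, nmse2, ser = two_iteration_baseline()
print("CSI NMSE after pass 1 and pass 2:", nmse1, nmse2)
print("symbol error rate after pass 2:", ser)
```

In this sketch each pass has one estimation step and one detection step. According to the Figure 2 caption, the paper assigns a trained subnet to each such step (CSI-NET1, DET-NET1, CSI-NET2, DET-NET2) rather than using fixed formulas like the ones above.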

If this is right

  • Reduces the occupation of uplink bandwidth resources for CSI feedback in massive MIMO.
  • Improves estimation accuracy of downlink CSI without sacrificing uplink user data detection.
  • Enables the use of superimposed coding with deep learning to handle varying channel conditions without retraining.
  • Facilitates parameter tuning and faster convergence through subnet-by-subnet training.
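
The subnet-by-subnet idea can be sketched as fitting one stage at a time and freezing it before the next stage is fitted. The toy below is only a sketch under assumptions: it uses real-valued signals, linear stages fitted by least squares as a stand-in for gradient-based NN training, and hypothetical names (`fit_linear`, `make_toy_data`, `train_subnet_by_subnet`); none of these details come from the paper.

```python
import numpy as np

def fit_linear(x, t):
    """Least-squares weights minimising ||x @ w - t||^2 (stand-in for SGD)."""
    w, *_ = np.linalg.lstsq(x, t, rcond=None)
    return w

def make_toy_data(samples=2000, n=4, m=16, rho=0.2, noise_std=0.3, seed=0):
    """Hypothetical real-valued superimposed signal: spread CSI plus BPSK data."""
    rng = np.random.default_rng(seed)
    spread = rng.choice([-1.0, 1.0], size=(n, m))    # assumed spreading codes
    h = rng.standard_normal((samples, n))            # CSI taps to recover
    d = rng.choice([-1.0, 1.0], size=(samples, m))   # user data to detect
    y = np.sqrt(rho / n) * (h @ spread) + np.sqrt(1 - rho) * d
    y = y + noise_std * rng.standard_normal((samples, m))
    return y, h, d

def train_subnet_by_subnet(y, h, d):
    """Fit four stages in order; each stage is frozen before the next is fit."""
    stages = {}
    stages["csi1"] = fit_linear(y, h)                # stage 1: CSI from raw input
    h1 = y @ stages["csi1"]                          # frozen stage-1 output

    x_det1 = np.hstack([y, h1])                      # stage 2 sees y and h1
    stages["det1"] = fit_linear(x_det1, d)
    d1 = np.sign(x_det1 @ stages["det1"])            # hard data decisions

    x_csi2 = np.hstack([y, d1])                      # stage 3 sees y and d1
    stages["csi2"] = fit_linear(x_csi2, h)
    h2 = x_csi2 @ stages["csi2"]

    x_det2 = np.hstack([y, h2])                      # stage 4 sees y and h2
    stages["det2"] = fit_linear(x_det2, d)
    d2 = np.sign(x_det2 @ stages["det2"])
    return stages, (h1, d1, h2, d2)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

y, h, d = make_toy_data()
stages, (h1, d1, h2, d2) = train_subnet_by_subnet(y, h, d)
print("CSI MSE, stage 1 vs stage 3:", mse(h1, h), mse(h2, h))
print("bit error rate, stage 2 vs stage 4:", np.mean(d1 != d), np.mean(d2 != d))
```

Because every stage has its own target and only depends on frozen upstream outputs, each fit is a small, separate problem. That is the property the bullet above credits with easier tuning and faster convergence. On the training data the stage-3 CSI error cannot exceed the stage-1 error here, since stage 3 sees a superset of the stage-1 inputs.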

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could reduce the need for frequent retraining in dynamic wireless environments.
  • Similar unfolding techniques might apply to other interference cancellation problems in communications.
  • The method suggests that learned interference reduction can generalize better than traditional iterative methods across parameter ranges.

Load-bearing premise

That a network unfolding exactly two MMSE iterations trained at one SNR and PPC will generalize to other SNR and PPC values without retraining.

What would settle it

Testing the multi-task NN on a range of SNRs and PPCs different from the training values and checking if CSI estimation error increases or UL-US detection worsens compared to the standalone SC scheme.
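
The check described above could be scripted roughly as follows. This is a sketch only: the NMSE and BER definitions are the standard ones, while the function names, the `ber_slack` tolerance, and the layout of the `results` dictionary are assumptions introduced for illustration. The numbers in the usage lines are placeholders, not values from the paper.

```python
import numpy as np

def nmse_db(h_true, h_est):
    """Mean over samples of ||h - h_est||^2 / ||h||^2, in dB (rows are samples)."""
    err = np.sum(np.abs(h_true - h_est) ** 2, axis=-1)
    ref = np.sum(np.abs(h_true) ** 2, axis=-1)
    return float(10 * np.log10(np.mean(err / ref)))

def bit_error_rate(bits_true, bits_est):
    """Fraction of positions where the detected bits differ from the sent bits."""
    return float(np.mean(np.asarray(bits_true) != np.asarray(bits_est)))

def failing_points(results, ber_slack=0.0):
    """Return the (snr_db, ppc) points where the NN is worse than the baseline.

    results maps (snr_db, ppc) to a dict with keys 'nmse_nn', 'nmse_sc'
    (both in dB) and 'ber_nn', 'ber_sc'. ber_slack is a hypothetical tolerance
    for what counts as 'similar' detection performance.
    """
    bad = []
    for point, r in sorted(results.items()):
        worse_csi = r["nmse_nn"] > r["nmse_sc"]
        worse_det = r["ber_nn"] > r["ber_sc"] + ber_slack
        if worse_csi or worse_det:
            bad.append(point)
    return bad

# Placeholder entries only, to show the expected data layout.
example = {
    (0, 0.10): {"nmse_nn": -5.0, "nmse_sc": -4.0, "ber_nn": 0.10, "ber_sc": 0.10},
    (14, 0.10): {"nmse_nn": -9.0, "nmse_sc": -10.0, "ber_nn": 0.01, "ber_sc": 0.01},
}
print(failing_points(example))
```

An empty list over a grid that excludes the training point would be consistent with the generalization claim; any listed operating point is a concrete counterexample to inspect.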

Figures

Figures reproduced from arXiv: 1907.11836 by Bin Cai, Chaojin Qing, Chuan Huang, Jiafan Wang, Qingyao Yang.

Figure 1. (image: figures/full_fig_p004_1.png)
Figure 2. 1) NETWORK FUNCTION SUMMARY. For ease of description, we denote four subnets as CSI-NET1, DET-NET1, CSI-NET2, and DET-NET2, respectively. The functionality of the network components is summarized as follows:
  • CSI-NETi corresponds to the MMSE estimation of downlink CSI (i.e., (3) in SC-baseline), while i = 1, 2 represents the first and second iteration, respectively.
  • DET-NET1 and DET-NET2 respectively det…
Figure 3. Fig. 3 shows that the NMSE of each model (i.e., N = 16, N = 32, and N = 64) outperforms the SC-baseline, especially at high SNR. Although SNR = 5 dB is adopted in the training phase, the three trained network models work well over the entire SNR span from 0 dB to 14 dB. Thus, it is obvious that the designed and trained subnets (i.e., CSI-NET1 and CSI-NET2) have a good generalization ability for …
Figure 10. Note that, from Fig. 5 to Fig. 10, the NN training (image: figures/full_fig_p008_10.png)
Figure 5. (image: figures/full_fig_p009_5.png)
Figure 6. (image: figures/full_fig_p009_6.png)
Figure 9. (image: figures/full_fig_p010_9.png)

Original abstract

Massive multiple-input multiple-output (MIMO) with frequency division duplex (FDD) mode is a promising approach to increasing system capacity and link robustness for the fifth generation (5G) wireless cellular systems. The premise of these advantages is the accurate downlink channel state information (CSI) fed back from user equipment. However, conventional feedback methods have difficulties in reducing feedback overhead due to significant amount of base station (BS) antennas in massive MIMO systems. Recently, deep learning (DL)-based CSI feedback conquers many difficulties, yet still shows insufficiency to decrease the occupation of uplink bandwidth resources. In this paper, to solve this issue, we combine DL and superimposed coding (SC) for CSI feedback, in which the downlink CSI is spread and then superimposed on uplink user data sequences (UL-US) toward the BS. Then, a multi-task neural network (NN) architecture is proposed at BS to recover the downlink CSI and UL-US by unfolding two iterations of the minimum mean-squared error (MMSE) criterion-based interference reduction. In addition, for a network training, a subnet-by-subnet approach is exploited to facilitate the parameter tuning and expedite the convergence rate. Compared with standalone SC-based CSI scheme, our multi-task NN, trained in a specific signal-to-noise ratio (SNR) and power proportional coefficient (PPC), consistently improves the estimation of downlink CSI with similar or better UL-US detection under SNR and PPC varying.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes combining deep learning with superimposed coding for downlink CSI feedback in massive MIMO FDD systems. Downlink CSI is spread and superimposed onto uplink user data sequences; at the BS a multi-task NN recovers both by unfolding exactly two MMSE iterations, trained subnet-by-subnet at one fixed SNR/PPC pair. The central claim is that this yields consistently better downlink CSI NMSE and comparable or superior UL-US detection when SNR and PPC deviate from the training values.

Significance. If the empirical robustness result holds, the approach would demonstrate a practical route to lowering uplink bandwidth consumption for CSI feedback while preserving data detection performance. The subnet-by-subnet training procedure is a concrete implementation detail that could aid reproducibility. The significance remains provisional because the generalization across operating points rests on an unverified architectural assumption rather than an explicit invariance mechanism.

major comments (2)
  1. [Abstract] Abstract: the claim that the multi-task NN 'consistently improves' downlink CSI estimation under SNR and PPC variation is asserted without any numerical results, error bars, dataset description, or ablation studies; this empirical assertion is load-bearing for the central contribution.
  2. [Abstract] Abstract: the architecture unfolds exactly two MMSE iterations and is trained at one fixed SNR/PPC pair via subnet-by-subnet training, yet no input normalization, PPC/SNR embedding, or regularization is described that would render the learned canceller independent of the operating point; the generalization claim therefore reduces to an unverified robustness assumption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments. We address the two major comments on the abstract point by point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the multi-task NN 'consistently improves' downlink CSI estimation under SNR and PPC variation is asserted without any numerical results, error bars, dataset description, or ablation studies; this empirical assertion is load-bearing for the central contribution.

    Authors: The abstract summarizes the main empirical findings of the work. The supporting evidence, including NMSE curves for downlink CSI under SNR and PPC sweeps (with direct comparisons to standalone SC), simulation parameters, channel dataset generation details, and performance of the multi-task architecture versus single-task baselines, is contained in Sections IV and V together with the associated figures. These results quantify the consistent improvement and include the operating-point variation tests. We are willing to revise the abstract to include a brief parenthetical reference to these sections if the editor deems it necessary for clarity.

    revision: partial

  2. Referee: [Abstract] Abstract: the architecture unfolds exactly two MMSE iterations and is trained at one fixed SNR/PPC pair via subnet-by-subnet training, yet no input normalization, PPC/SNR embedding, or regularization is described that would render the learned canceller independent of the operating point; the generalization claim therefore reduces to an unverified robustness assumption.

    Authors: The architecture deliberately unfolds a fixed number of MMSE iterations and employs subnet-by-subnet training at a single operating point to obtain stable convergence. No explicit SNR/PPC embedding or additional regularization for invariance is introduced. Nevertheless, the manuscript reports extensive cross-validation experiments (detailed in the results section) in which the same trained network is evaluated at SNR and PPC values different from the training point; these experiments show that downlink CSI NMSE remains superior to standalone SC while UL-US detection stays comparable. The generalization claim is therefore grounded in the reported empirical behavior rather than an architectural invariance guarantee. We do not claim theoretical independence from the operating point.

    revision: no

Circularity Check

0 steps flagged

No significant circularity; empirical DL architecture with no self-referential derivation

The paper proposes an empirical multi-task NN obtained by unfolding two MMSE iterations for joint CSI and UL-US recovery under superimposed coding, trained subnet-by-subnet at one SNR/PPC pair. The abstract and description contain no equations, uniqueness theorems, or self-citations that reduce the claimed generalization or NMSE improvement to a fitted input by construction, a renamed known result, or a load-bearing self-citation chain. Performance claims rest on simulation comparisons rather than a closed mathematical derivation that loops back to its own inputs; the architecture is presented as a trainable approximator, not a self-defining identity.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard MMSE interference-reduction criterion and the assumption that a two-iteration unfolding suffices; no new physical constants or ad-hoc entities are introduced.

free parameters (2)
  • power proportional coefficient (PPC)
    Used both for signal superposition and for training the network; its specific value is chosen for training and affects generalization claims.
  • training SNR
    Network is trained at one specific SNR; performance under varying SNR is asserted but the training value itself is a design choice.
axioms (1)
  • domain assumption: the MMSE criterion provides a suitable interference-reduction step that can be unfolded into neural-network layers
    Invoked when the architecture is described as 'unfolding two iterations of the minimum mean-squared error (MMSE) criterion-based interference reduction'.
