Deep Learning for CSI Feedback Based on Superimposed Coding
Pith reviewed 2026-05-24 15:13 UTC · model grok-4.3
The pith
A multi-task neural network trained at one SNR and power coefficient improves downlink CSI estimation from superimposed signals while maintaining uplink data detection across varying conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The network unfolds two iterations of MMSE criterion-based interference reduction into a multi-task neural network architecture and is trained subnet-by-subnet to recover the downlink CSI and the uplink user data sequences (UL-US) from superimposed signals. When trained at one specific SNR and power proportional coefficient (PPC), it consistently improves downlink CSI estimation, with similar or better UL-US detection, under varying SNR and PPC compared with the standalone SC-based CSI scheme.
What carries the argument
The multi-task neural network that unfolds two MMSE iterations for interference reduction, allowing joint recovery of CSI and user data.
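As a point of reference for what is being unfolded, the sketch below simulates superimposed coding and a plain, non-learned receiver that runs two passes of interference cancellation. It is not the paper's network or its exact signal model. It assumes real-valued signals, an ideal AWGN uplink, BPSK user data, Hadamard spreading, and a scalar LMMSE scaling that treats cancellation as perfect after the first pass. All function names and the sizes M = 512, N = 16 are illustrative.

```python
import numpy as np

def hadamard(m):
    """Sylvester-type Hadamard matrix of order m (m must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

def spreading_matrix(M, N):
    """M x N spreading matrix with Q.T @ Q = (M / N) * I (unit power per chip)."""
    return hadamard(M)[:, :N] / np.sqrt(N)

def receiver(y, Q, rho, sigma2, n_iter=2):
    """Alternate CSI estimation and data detection, cancelling each from y."""
    M, N = Q.shape
    d_hat = np.zeros(M)      # no data estimate before the first pass
    interf = 1.0 - rho       # data power seen by the CSI estimator in pass 1
    estimates = []
    for _ in range(n_iter):
        # Remove the current data estimate, then despread: z = sqrt(rho)*h + error.
        z = (N / M) * (Q.T @ (y - np.sqrt(1.0 - rho) * d_hat))
        v = (N / M) * (interf + sigma2)            # assumed error variance
        h_hat = np.sqrt(rho) / (rho + v) * z       # scalar LMMSE estimate of h
        # Remove the CSI contribution and take hard BPSK decisions.
        r = y - np.sqrt(rho) * (Q @ h_hat)
        d_hat = np.where(r >= 0.0, 1.0, -1.0)
        interf = 0.0   # assumption: later passes see fully cancelled data
        estimates.append((h_hat, d_hat))
    return estimates

def simulate(M=512, N=16, rho=0.1, snr_db=10.0, trials=50, seed=0):
    """Average per-pass CSI NMSE and data BER over random trials."""
    rng = np.random.default_rng(seed)
    Q = spreading_matrix(M, N)
    sigma2 = 10.0 ** (-snr_db / 10.0)
    nmse = np.zeros(2)
    ber = np.zeros(2)
    for _ in range(trials):
        h = rng.standard_normal(N)                  # toy downlink CSI
        d = rng.choice([-1.0, 1.0], size=M)         # toy uplink user data
        noise = np.sqrt(sigma2) * rng.standard_normal(M)
        y = np.sqrt(rho) * (Q @ h) + np.sqrt(1.0 - rho) * d + noise
        for k, (h_hat, d_hat) in enumerate(receiver(y, Q, rho, sigma2)):
            nmse[k] += np.sum((h - h_hat) ** 2) / np.sum(h ** 2)
            ber[k] += np.mean(d != d_hat)
    return nmse / trials, ber / trials
```

Calling simulate() returns the per-pass NMSE and BER. Under these assumptions the second pass should give a lower CSI NMSE than the first, because the data interference has been subtracted before despreading. The paper's contribution, as described above, is to replace such fixed passes with trainable subnets.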
If this is right
- Reduces the occupation of uplink bandwidth resources for CSI feedback in massive MIMO.
- Improves estimation accuracy of downlink CSI without sacrificing uplink user data detection.
- Enables the use of superimposed coding with deep learning to handle varying channel conditions without retraining.
- Facilitates parameter tuning and faster convergence through subnet-by-subnet training.
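The subnet-by-subnet idea in the last bullet can be illustrated with a toy two-stage example: the first subnet is trained alone on the CSI target, then frozen, and the second subnet is trained on the interference-cancelled signal that the first one produces. This is a minimal sketch, not the paper's architecture: each "subnet" is a single linear layer trained by plain gradient descent, the data model is a simplified real-valued one, and every name and hyperparameter is hypothetical.

```python
import numpy as np

def train_linear(X, T, steps=300, lr=0.1):
    """Fit T ~ X @ W by gradient descent; returns weights and the loss history."""
    n = X.shape[0]
    W = np.zeros((X.shape[1], T.shape[1]))
    losses = []
    for _ in range(steps):
        err = X @ W - T
        losses.append(float(np.mean(err ** 2)))
        W -= lr * (X.T @ err) / n    # gradient of 0.5 * mean squared row norm
    return W, losses

def subnet_by_subnet(M=64, N=4, rho=0.1, sigma2=0.1, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Toy superimposed signal: spread CSI plus BPSK data plus noise.
    Q = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(N)
    H = rng.standard_normal((n, N))
    D = rng.choice([-1.0, 1.0], size=(n, M))
    noise = np.sqrt(sigma2) * rng.standard_normal((n, M))
    Y = np.sqrt(rho) * H @ Q.T + np.sqrt(1.0 - rho) * D + noise

    # Stage 1: train the CSI subnet on its own loss.
    W1, loss_csi = train_linear(Y, H)

    # Stage 2: W1 is frozen; the data subnet sees the signal with the
    # estimated CSI contribution cancelled.
    H_hat = Y @ W1
    R = Y - np.sqrt(rho) * H_hat @ Q.T
    W2, loss_data = train_linear(R, D)
    return {"W1": W1, "W2": W2, "loss_csi": loss_csi, "loss_data": loss_data}
```

Because each stage has a single loss and a fixed input distribution, its learning rate and step count can be tuned in isolation, which is the practical benefit the bullet refers to.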
Where Pith is reading between the lines
- This approach could reduce the need for frequent retraining in dynamic wireless environments.
- Similar unfolding techniques might apply to other interference cancellation problems in communications.
- The method suggests that learned interference reduction can generalize better than traditional iterative methods across parameter ranges.
Load-bearing premise
That a network unfolding exactly two MMSE iterations trained at one SNR and PPC will generalize to other SNR and PPC values without retraining.
What would settle it
Testing the multi-task NN on a range of SNRs and PPCs different from the training values and checking if CSI estimation error increases or UL-US detection worsens compared to the standalone SC scheme.
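A test of this kind could be organized as in the sketch below, which defines the two metrics and scans a grid of SNR and PPC values for points where the claim fails. The evaluator callables, the grid, and the BER tolerance that stands in for "similar" are all assumptions made for illustration. Each evaluator is expected to return an (NMSE, BER) pair for one operating point.

```python
import itertools
import numpy as np

def nmse(h_true, h_est):
    """Normalized mean-squared error between true and estimated CSI."""
    return float(np.sum(np.abs(h_true - h_est) ** 2) / np.sum(np.abs(h_true) ** 2))

def ber(bits_true, bits_est):
    """Fraction of detected symbols that differ from the transmitted ones."""
    return float(np.mean(bits_true != bits_est))

def failing_points(eval_proposed, eval_baseline, snr_dbs, ppcs, ber_tol=0.0):
    """Grid points where the proposed scheme does not beat the baseline.

    A point fails if the proposed NMSE is not strictly lower, or if the
    proposed BER exceeds the baseline BER by more than ber_tol.
    """
    failures = []
    for snr_db, ppc in itertools.product(snr_dbs, ppcs):
        nmse_p, ber_p = eval_proposed(snr_db, ppc)
        nmse_b, ber_b = eval_baseline(snr_db, ppc)
        if nmse_p >= nmse_b or ber_p > ber_b + ber_tol:
            failures.append((snr_db, ppc))
    return failures
```

An empty list over a grid that excludes the training point would support the generalization claim; any returned point identifies an operating condition where it breaks.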
Original abstract
Massive multiple-input multiple-output (MIMO) with frequency division duplex (FDD) mode is a promising approach to increasing system capacity and link robustness for the fifth generation (5G) wireless cellular systems. The premise of these advantages is the accurate downlink channel state information (CSI) fed back from user equipment. However, conventional feedback methods have difficulties in reducing feedback overhead due to significant amount of base station (BS) antennas in massive MIMO systems. Recently, deep learning (DL)-based CSI feedback conquers many difficulties, yet still shows insufficiency to decrease the occupation of uplink bandwidth resources. In this paper, to solve this issue, we combine DL and superimposed coding (SC) for CSI feedback, in which the downlink CSI is spread and then superimposed on uplink user data sequences (UL-US) toward the BS. Then, a multi-task neural network (NN) architecture is proposed at BS to recover the downlink CSI and UL-US by unfolding two iterations of the minimum mean-squared error (MMSE) criterion-based interference reduction. In addition, for a network training, a subnet-by-subnet approach is exploited to facilitate the parameter tuning and expedite the convergence rate. Compared with standalone SC-based CSI scheme, our multi-task NN, trained in a specific signal-to-noise ratio (SNR) and power proportional coefficient (PPC), consistently improves the estimation of downlink CSI with similar or better UL-US detection under SNR and PPC varying.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes combining deep learning with superimposed coding for downlink CSI feedback in massive MIMO FDD systems. Downlink CSI is spread and superimposed onto uplink user data sequences; at the BS a multi-task NN recovers both by unfolding exactly two MMSE iterations, trained subnet-by-subnet at one fixed SNR/PPC pair. The central claim is that this yields consistently better downlink CSI NMSE and comparable or superior UL-US detection when SNR and PPC deviate from the training values.
Significance. If the empirical robustness result holds, the approach would demonstrate a practical route to lowering uplink bandwidth consumption for CSI feedback while preserving data detection performance. The subnet-by-subnet training procedure is a concrete implementation detail that could aid reproducibility. The significance remains provisional because the generalization across operating points rests on an unverified architectural assumption rather than an explicit invariance mechanism.
major comments (2)
- [Abstract] Abstract: the claim that the multi-task NN 'consistently improves' downlink CSI estimation under SNR and PPC variation is asserted without any numerical results, error bars, dataset description, or ablation studies; this empirical assertion is load-bearing for the central contribution.
- [Abstract] Abstract: the architecture unfolds exactly two MMSE iterations and is trained at one fixed SNR/PPC pair via subnet-by-subnet training, yet no input normalization, PPC/SNR embedding, or regularization is described that would render the learned canceller independent of the operating point; the generalization claim therefore reduces to an unverified robustness assumption.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive comments. We address the two major comments on the abstract point by point below.
- Referee: [Abstract] Abstract: the claim that the multi-task NN 'consistently improves' downlink CSI estimation under SNR and PPC variation is asserted without any numerical results, error bars, dataset description, or ablation studies; this empirical assertion is load-bearing for the central contribution.
  Authors: The abstract summarizes the main empirical findings of the work. The supporting evidence is contained in Sections IV and V together with the associated figures. It includes NMSE curves for downlink CSI under SNR and PPC sweeps (with direct comparisons to standalone SC), simulation parameters, channel dataset generation details, and performance of the multi-task architecture versus single-task baselines. These results quantify the consistent improvement and include the operating-point variation tests. We are willing to revise the abstract to include a brief parenthetical reference to these sections if the editor deems it necessary for clarity.
  Revision: partial
- Referee: [Abstract] Abstract: the architecture unfolds exactly two MMSE iterations and is trained at one fixed SNR/PPC pair via subnet-by-subnet training, yet no input normalization, PPC/SNR embedding, or regularization is described that would render the learned canceller independent of the operating point; the generalization claim therefore reduces to an unverified robustness assumption.
  Authors: The architecture deliberately unfolds a fixed number of MMSE iterations and employs subnet-by-subnet training at a single operating point to obtain stable convergence. No explicit SNR/PPC embedding or additional regularization for invariance is introduced. Nevertheless, the manuscript reports extensive cross-validation experiments (detailed in the results section) in which the same trained network is evaluated at SNR and PPC values different from the training point; these experiments show that downlink CSI NMSE remains superior to standalone SC while UL-US detection stays comparable. The generalization claim is therefore grounded in the reported empirical behavior rather than an architectural invariance guarantee. We do not claim theoretical independence from the operating point.
  Revision: no
Circularity Check
No significant circularity; empirical DL architecture with no self-referential derivation
The paper proposes an empirical multi-task NN obtained by unfolding two MMSE iterations for joint CSI and UL-US recovery under superimposed coding, trained subnet-by-subnet at one SNR/PPC pair. The abstract and description contain no equations, uniqueness theorems, or self-citations that reduce the claimed generalization or NMSE improvement to a fitted input by construction, a renamed known result, or a load-bearing self-citation chain. Performance claims rest on simulation comparisons rather than a closed mathematical derivation that loops back to its own inputs; the architecture is presented as a trainable approximator, not a self-defining identity.
Axiom & Free-Parameter Ledger
free parameters (2)
- power proportional coefficient (PPC)
- training SNR
axioms (1)
- domain assumption: the MMSE criterion provides a suitable interference-reduction step that can be unfolded into neural-network layers
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "a multi-task neural network (NN) architecture is proposed at BS to recover the downlink CSI and UL-US by unfolding two iterations of the minimum mean-squared error (MMSE) criterion-based interference reduction"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "subnet-by-subnet approach is exploited to facilitate the parameter tuning and expedite the convergence rate"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.