Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

Ahmed Alkhateeb; Markku Juntti; Nhan Thanh Nguyen; Sina Tavakolian

arxiv: 2602.04703 · v1 · pith:UKZWDMXHnew · submitted 2026-02-04 · 📡 eess.SP · cs.LG

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

Sina Tavakolian , Nhan Thanh Nguyen , Ahmed Alkhateeb , Markku Juntti This is my paper

Pith reviewed 2026-05-22 10:58 UTC · model grok-4.3

classification 📡 eess.SP cs.LG

keywords knowledge distillationmmWave beam predictionsub-6 GHz channelsdeep learningmodel compressionbeamforminghigh mobilityspectral efficiency

0 comments

The pith

Knowledge distillation allows 99% smaller models to predict optimal mmWave beams from sub-6 GHz channels

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a knowledge distillation approach to predict mmWave beams using sub-6 GHz channel information with compact deep learning models. These student models are trained to imitate large teacher models through individual and relational strategies. A sympathetic reader would care because this addresses the high computational cost that limits practical beamforming in mobile mmWave networks. Simulations show the students match teacher accuracy and efficiency with far fewer resources.

Core claim

Using knowledge distillation, the authors create two compact student DL architectures that retain only a few hidden layers but closely mimic the performance of large teacher models for sub-6 GHz to mmWave beam mapping. Extensive simulations show these students achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.

What carries the argument

Knowledge distillation techniques, specifically individual and relational distillation, to transfer knowledge from large teacher deep learning models to compact student models for efficient beam prediction.

If this is right

The student models achieve equivalent beam prediction accuracy to the large teacher models.
Spectral efficiency remains at the level provided by the teacher models.
Trainable parameters are reduced by 99% compared to the teacher.
Computational complexity is reduced by 99%.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could make mmWave beamforming feasible on edge devices with limited processing power.
The technique may apply to predicting channels or beams in other frequency bands or scenarios.
Real-world deployment could be tested by measuring inference time and energy use on mobile hardware.

Load-bearing premise

Simulated paired sub-6 GHz and mmWave channel data capture the statistical relationships required for the distilled models to generalize to unseen high-mobility scenarios.

What would settle it

Evaluating the student models on actual measured sub-6 GHz and mmWave channels from high-mobility testbeds and checking for substantial accuracy degradation.

read the original abstract

Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for sub-6 GHz channel-mmWave beam mapping based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Knowledge distillation shrinks the sub-6 to mmWave beam predictors to 1% of the original size while matching accuracy in the reported simulations, but the high-mobility generalization claim rests on untested assumptions about the simulated channel pairs.

read the letter

The main thing to know is that the authors take standard knowledge distillation—both individual and relational variants—and use it to train compact student networks that predict mmWave beams from sub-6 GHz channels. The students keep the teacher's accuracy and spectral efficiency while cutting trainable parameters and complexity by 99% in their simulations. That reduction is the practical result worth noting for anyone dealing with beam management overhead in mobile settings.

Referee Report

2 major / 2 minor

Summary. The paper proposes a knowledge distillation framework to map sub-6 GHz channels to optimal mmWave beams in high-mobility settings. It introduces two compact student DL models (one using individual distillation and one using relational distillation) that are claimed to match the beam-prediction accuracy and spectral efficiency of a larger teacher model while reducing trainable parameters and computational complexity by 99%, with the claims supported by extensive simulations.

Significance. If the performance claims hold under proper validation, the work would demonstrate a practical route to deploy DL-based beam prediction on resource-limited devices by compressing models via KD without sacrificing accuracy or efficiency. The explicit comparison of individual versus relational distillation strategies and the reported 99% reduction constitute a concrete, quantifiable contribution to reducing training overhead in mmWave systems.

major comments (2)

[Abstract and experimental results section] Abstract and experimental results section: the central claim that student models 'achieve the teacher's beam prediction accuracy and spectral efficiency' while delivering a 99% reduction rests on 'extensive simulations,' yet the manuscript supplies no description of the channel datasets, number of Monte-Carlo realizations, mobility parameters, baseline methods, or statistical measures (error bars, confidence intervals, or significance tests). This absence directly weakens support for the accuracy-matching and complexity-reduction assertions.
[Setup and evaluation sections] Setup and evaluation sections: the use-case emphasis on high-mobility environments requires that the distilled students generalize beyond the training distribution. No explicit out-of-distribution tests (different velocities, trajectories, or scattering environments) are reported; therefore the observed in-distribution match does not yet establish the claimed robustness for the stated high-mobility regime.

minor comments (2)

[Methods section] Notation for the two student architectures should be introduced with explicit equations or diagrams in the methods section to clarify the difference between individual and relational distillation losses.
The manuscript should include a table summarizing parameter counts, FLOPs, and accuracy for teacher and both students to make the 99% reduction claim immediately verifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the experimental details require expansion to better support the claims and will revise the manuscript to include them. We also address the need for explicit generalization tests.

read point-by-point responses

Referee: [Abstract and experimental results section] Abstract and experimental results section: the central claim that student models 'achieve the teacher's beam prediction accuracy and spectral efficiency' while delivering a 99% reduction rests on 'extensive simulations,' yet the manuscript supplies no description of the channel datasets, number of Monte-Carlo realizations, mobility parameters, baseline methods, or statistical measures (error bars, confidence intervals, or significance tests). This absence directly weakens support for the accuracy-matching and complexity-reduction assertions.

Authors: We acknowledge that the original manuscript provides insufficient detail on the simulation setup, which weakens the support for the performance claims. In the revised version, we will add a dedicated 'Simulation Setup' subsection that specifies the channel dataset (generated via the 3GPP TR 38.901 urban macro model with ray-tracing), the number of Monte-Carlo realizations (5000 independent runs), mobility parameters (user equipment speeds ranging from 30 km/h to 150 km/h with random trajectories and directions), baseline methods (exhaustive beam search, conventional codebook beamforming, and other DL predictors from the literature), and statistical measures (mean top-1 accuracy with standard deviation across runs, 95% confidence intervals, and error bars in all plots). These additions will directly strengthen the assertions regarding accuracy matching and the 99% complexity reduction. revision: yes
Referee: [Setup and evaluation sections] Setup and evaluation sections: the use-case emphasis on high-mobility environments requires that the distilled students generalize beyond the training distribution. No explicit out-of-distribution tests (different velocities, trajectories, or scattering environments) are reported; therefore the observed in-distribution match does not yet establish the claimed robustness for the stated high-mobility regime.

Authors: We agree that explicit out-of-distribution evaluation is important to substantiate robustness claims in high-mobility settings. Although the training dataset already spans a wide range of velocities and trajectories to represent high-mobility conditions, we did not report dedicated OOD experiments. In the revised manuscript, we will add a new subsection and corresponding figure showing OOD tests: models trained on velocities up to 80 km/h and tested on 100-150 km/h, plus tests with altered scattering environments (e.g., different cluster densities). These results will quantify any performance drop and confirm that the distilled students maintain close performance to the teacher under distribution shifts. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical KD framework validated via simulation without self-referential derivations

full rationale

The paper introduces a knowledge distillation framework with two compact student architectures for mapping sub-6 GHz channels to mmWave beams. Performance equivalence to the teacher model is demonstrated exclusively through simulation results on beam prediction accuracy and spectral efficiency, with no equations, first-principles derivations, or fitted parameters that reduce the claims to inputs by construction. The central results are empirical comparisons rather than analytical reductions, and no load-bearing self-citations or uniqueness theorems are invoked in the provided text to force the outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that sub-6 GHz channels carry usable information about mmWave beam directions and on standard supervised learning assumptions about training data quality. No free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Sub-6 GHz channels contain sufficient statistical information to predict optimal mmWave beams in high-mobility settings.
This premise underpins the entire teacher-student mapping and is invoked when the paper states that sub-6 GHz channels can be exploited for mmWave beam prediction.

pith-pipeline@v0.9.0 · 5663 in / 1224 out tokens · 39785 ms · 2026-05-22T10:58:36.010232+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We develop two compact student DL architectures based on individual and relational distillation strategies... reducing trainable parameters and computational complexity by 99%.
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The existence of such a mapping relies on two assumptions: (i) the mapping from UE locations to sub-6 GHz channels is bijective...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

[1]

The spatial correlation between channels in these bands enables the prediction of mmWave beams directly from sub-6 GHz channels [2]

INTRODUCTION Future millimeter-wave (mmWave) systems are expected to operate across multiple frequency ranges, including sub- 6 GHz and mmWave bands [1]. The spatial correlation between channels in these bands enables the prediction of mmWave beams directly from sub-6 GHz channels [2]. While such mappings are theoretically feasible, deriving them ana- lyt...

work page 2048
[2]

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

SYSTEM MODEL AND PROBLEM FORMULA TION 2.1. System Model We consider a communications system that operates con- currently in the sub-6 GHz and mmWave frequency bands. The system consists of a BS and a user equipment (UE). The BS is equipped with two types of transceivers: one oper- ating in the sub-6 GHz band withN sub-6 antennas, and the other in the mmWa...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[3]

KD-BASED MMW A VE BEAMFORMING The main idea of KD is to transfer knowledge from a com- plex, high-performing teacher model to a compact student model [4], which is then employed for online inference. To elaborate on how KD is applied for the mapping (4), we next present the basic formulation and structure of the teacher model, followed by the lightweight ...

work page
[4]

Both involve a BS (BS 3) serv- ing active UEs

NUMERICAL RESULTS Dataset and Simulation Settings:We conduct our simula- tions using the O1 28 and O1 3p5 setups in the O1 scenario in DeepMIMO dataset [19]. Both involve a BS (BS 3) serv- ing active UEs. The O1 28 configuration utilizes 64 antennas with a 0.5 wavelength spacing, a 0.5 GHz bandwidth, and 512 OFDM subcarriers, while O1 3p5 has 4 antennas, ...

work page
[5]

CONCLUSION We have proposed lightweight DL models for mmWave beam prediction from sub-6 GHz channels leveraging the KD tech- niques. By distilling a pretrained teacher model into com- pact student models, we achieve comparable accuracy and SE with up to99%fewer trainable parameters and significantly lower complexity for inference. Among the two considered...

work page
[6]

ACKNOWLEDGEMENT This work was supported by the Research Council of Finland through 6G Flagship Program (grant 369116) and projects DI- RECTION (grant 354901), DYNAMICS (grant 24305016), and CHIST-ERA PASSIONATE (grant 359817), by Busi- ness Finland, Keysight, MediaTek, Siemens, Ekahau, and Verkotan via project 6GLearn, and in part by the HORIZON- JU-SNS-2...

work page 2023
[7]

Millimeter-wave communication with out-of-band information,

Nuria Gonzalez-Prelcic, Anum Ali, Vutha Va, and Robert W. Heath, “Millimeter-wave communication with out-of-band information,”IEEE Commun. Mag., vol. 55, no. 12, pp. 140–146, 2017

work page 2017
[8]

Deep learning for mmwave beam and blockage prediction us- ing sub-6 ghz channels,

Muhammad Alrabeiah and Ahmed Alkhateeb, “Deep learning for mmwave beam and blockage prediction us- ing sub-6 ghz channels,”IEEE Trans. Commun., vol. 68, no. 9, pp. 5504–5518, 2020

work page 2020
[9]

Paramount: Toward generalizable deep learning for mmwave beam selection using sub-6 ghz channel mea- surements,

Katarina Vuckovic, Mahdi Boloursaz Mashhadi, Farzam Hejazi, Nazanin Rahnavard, and Ahmed Alkhateeb, “Paramount: Toward generalizable deep learning for mmwave beam selection using sub-6 ghz channel mea- surements,”IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 5187–5202, 2024

work page 2024
[10]

Knowledge distillation: A survey,

Jianping Gou, Baosheng Yu, Stephen J. Maybank, and Dacheng Tao, “Knowledge distillation: A survey,”Int. J. Comput. Vis., vol. 129, no. 6, pp. 1789–1819, Mar 2021

work page 2021
[11]

A survey on knowledge distillation: Recent ad- vancements,

Amir Moslemi, Anna Briskina, Zubeka Dang, and Ja- son Li, “A survey on knowledge distillation: Recent ad- vancements,”Mach. Learn. Appl, vol. 18, pp. 100605, 2024

work page 2024
[12]

Defensive distillation based end-to-end auto-encoder communication system,

Q. Gao, Z. Cao, and D. Li, “Defensive distillation based end-to-end auto-encoder communication system,” in Proc. IEEE Int. Conf. Computer Commun., 2021, pp. 109–114

work page 2021
[13]

Defensive distillation-based adversarial attack mitiga- tion method for channel estimation using deep learning models in next-generation wireless networks,

F. O. Catak, M. Kuzlu, E. Catak, U. Cali, and O. Guler, “Defensive distillation-based adversarial attack mitiga- tion method for channel estimation using deep learning models in next-generation wireless networks,”IEEE Ac- cess, vol. 10, pp. 98191–98203, 2022

work page 2022
[14]

Knowledge-distillation-aided lightweight neural network for massive mimo csi feed- back,

Huaze Tang, Jiajia Guo, Michail Matthaiou, Chao- Kai Wen, and Shi Jin, “Knowledge-distillation-aided lightweight neural network for massive mimo csi feed- back,” inProc. IEEE V eh. Technol. Conf., 2021, pp. 1–5

work page 2021
[15]

Environment knowledge-aided massive mimo feedback codebook enhancement using artificial intelli- gence,

Jiajia Guo, Chao-Kai Wen, Muhan Chen, and Shi Jin, “Environment knowledge-aided massive mimo feedback codebook enhancement using artificial intelli- gence,”IEEE Trans. Commun., vol. 70, no. 7, pp. 4527– 4542, 2022

work page 2022
[16]

Knowledge distillation-based semantic communications for multiple users,

Chenguang Liu, Yuxin Zhou, Yunfei Chen, and Shuang- Hua Yang, “Knowledge distillation-based semantic communications for multiple users,”IEEE Trans. Wire- less Commun., vol. 23, no. 7, pp. 7000–7012, 2024

work page 2024
[17]

Knowledge distillation based deep learning model for user equipment positioning in massive mimo systems using flying reconfigurable in- telligent surfaces,

Abdullah Al-Ahmadi, “Knowledge distillation based deep learning model for user equipment positioning in massive mimo systems using flying reconfigurable in- telligent surfaces,”IEEE Access, vol. 12, pp. 20679– 20691, 2024

work page 2024
[18]

Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery,

Yidan Zhang, Zhiyuan Yan, Xian Sun, Wenhui Diao, Kun Fu, and Lei Wang, “Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–19, 2021

work page 2021
[19]

Cross-modal knowledge distilla- tion for efficient radar-only beam prediction in mmwave communications,

Yu Min Park, Sheikh Salman Hassan, Walid Saad, and Choong Seon Hong, “Cross-modal knowledge distilla- tion for efficient radar-only beam prediction in mmwave communications,” inProc. IEEE Works. on Sign. Proc. Adv. in Wirel. Comms., 2025, pp. 1–5

work page 2025
[20]

Resource-efficient beam pre- diction in mmwave communications with multimodal realistic simulation framework,

Yu Min Park, Yan Kyaw Tun, Walid Saad, and Choong Seon Hong, “Resource-efficient beam pre- diction in mmwave communications with multimodal realistic simulation framework,”arXiv preprint arXiv:2504.05187, 2025

work page arXiv 2025
[21]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, “Distill- ing the knowledge in a neural network,”arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[22]

Relational knowledge distillation,

Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho, “Relational knowledge distillation,” inIEEE/CVF CVPR, 2019, pp. 3962–3971

work page 2019
[23]

Un- derstanding the gains from repeated self-distillation,

Divyansh Pareek, Simon S. Du, and Sewoong Oh, “Un- derstanding the gains from repeated self-distillation,” in Proc. NeurIPS, Red Hook, NY , USA, 2025, NIPS ’24, Curran Associates Inc

work page 2025
[24]

An overview of signal processing techniques for millime- ter wave mimo systems,

Robert W. Heath, Nuria Gonz ´alez-Prelcic, Sundeep Rangan, Wonil Roh, and Akbar M. Sayeed, “An overview of signal processing techniques for millime- ter wave mimo systems,”IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, 2016

work page 2016
[25]

DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO appli- cations,

A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO appli- cations,” inProc. Inf. Theory Appli. Workshop (ITA), San Diego, CA, Feb 2019, pp. 1–8

work page 2019

[1] [1]

The spatial correlation between channels in these bands enables the prediction of mmWave beams directly from sub-6 GHz channels [2]

INTRODUCTION Future millimeter-wave (mmWave) systems are expected to operate across multiple frequency ranges, including sub- 6 GHz and mmWave bands [1]. The spatial correlation between channels in these bands enables the prediction of mmWave beams directly from sub-6 GHz channels [2]. While such mappings are theoretically feasible, deriving them ana- lyt...

work page 2048

[2] [2]

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

SYSTEM MODEL AND PROBLEM FORMULA TION 2.1. System Model We consider a communications system that operates con- currently in the sub-6 GHz and mmWave frequency bands. The system consists of a BS and a user equipment (UE). The BS is equipped with two types of transceivers: one oper- ating in the sub-6 GHz band withN sub-6 antennas, and the other in the mmWa...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[3] [3]

KD-BASED MMW A VE BEAMFORMING The main idea of KD is to transfer knowledge from a com- plex, high-performing teacher model to a compact student model [4], which is then employed for online inference. To elaborate on how KD is applied for the mapping (4), we next present the basic formulation and structure of the teacher model, followed by the lightweight ...

work page

[4] [4]

Both involve a BS (BS 3) serv- ing active UEs

NUMERICAL RESULTS Dataset and Simulation Settings:We conduct our simula- tions using the O1 28 and O1 3p5 setups in the O1 scenario in DeepMIMO dataset [19]. Both involve a BS (BS 3) serv- ing active UEs. The O1 28 configuration utilizes 64 antennas with a 0.5 wavelength spacing, a 0.5 GHz bandwidth, and 512 OFDM subcarriers, while O1 3p5 has 4 antennas, ...

work page

[5] [5]

CONCLUSION We have proposed lightweight DL models for mmWave beam prediction from sub-6 GHz channels leveraging the KD tech- niques. By distilling a pretrained teacher model into com- pact student models, we achieve comparable accuracy and SE with up to99%fewer trainable parameters and significantly lower complexity for inference. Among the two considered...

work page

[6] [6]

ACKNOWLEDGEMENT This work was supported by the Research Council of Finland through 6G Flagship Program (grant 369116) and projects DI- RECTION (grant 354901), DYNAMICS (grant 24305016), and CHIST-ERA PASSIONATE (grant 359817), by Busi- ness Finland, Keysight, MediaTek, Siemens, Ekahau, and Verkotan via project 6GLearn, and in part by the HORIZON- JU-SNS-2...

work page 2023

[7] [7]

Millimeter-wave communication with out-of-band information,

Nuria Gonzalez-Prelcic, Anum Ali, Vutha Va, and Robert W. Heath, “Millimeter-wave communication with out-of-band information,”IEEE Commun. Mag., vol. 55, no. 12, pp. 140–146, 2017

work page 2017

[8] [8]

Deep learning for mmwave beam and blockage prediction us- ing sub-6 ghz channels,

Muhammad Alrabeiah and Ahmed Alkhateeb, “Deep learning for mmwave beam and blockage prediction us- ing sub-6 ghz channels,”IEEE Trans. Commun., vol. 68, no. 9, pp. 5504–5518, 2020

work page 2020

[9] [9]

Paramount: Toward generalizable deep learning for mmwave beam selection using sub-6 ghz channel mea- surements,

Katarina Vuckovic, Mahdi Boloursaz Mashhadi, Farzam Hejazi, Nazanin Rahnavard, and Ahmed Alkhateeb, “Paramount: Toward generalizable deep learning for mmwave beam selection using sub-6 ghz channel mea- surements,”IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 5187–5202, 2024

work page 2024

[10] [10]

Knowledge distillation: A survey,

Jianping Gou, Baosheng Yu, Stephen J. Maybank, and Dacheng Tao, “Knowledge distillation: A survey,”Int. J. Comput. Vis., vol. 129, no. 6, pp. 1789–1819, Mar 2021

work page 2021

[11] [11]

A survey on knowledge distillation: Recent ad- vancements,

Amir Moslemi, Anna Briskina, Zubeka Dang, and Ja- son Li, “A survey on knowledge distillation: Recent ad- vancements,”Mach. Learn. Appl, vol. 18, pp. 100605, 2024

work page 2024

[12] [12]

Defensive distillation based end-to-end auto-encoder communication system,

Q. Gao, Z. Cao, and D. Li, “Defensive distillation based end-to-end auto-encoder communication system,” in Proc. IEEE Int. Conf. Computer Commun., 2021, pp. 109–114

work page 2021

[13] [13]

Defensive distillation-based adversarial attack mitiga- tion method for channel estimation using deep learning models in next-generation wireless networks,

F. O. Catak, M. Kuzlu, E. Catak, U. Cali, and O. Guler, “Defensive distillation-based adversarial attack mitiga- tion method for channel estimation using deep learning models in next-generation wireless networks,”IEEE Ac- cess, vol. 10, pp. 98191–98203, 2022

work page 2022

[14] [14]

Knowledge-distillation-aided lightweight neural network for massive mimo csi feed- back,

Huaze Tang, Jiajia Guo, Michail Matthaiou, Chao- Kai Wen, and Shi Jin, “Knowledge-distillation-aided lightweight neural network for massive mimo csi feed- back,” inProc. IEEE V eh. Technol. Conf., 2021, pp. 1–5

work page 2021

[15] [15]

Environment knowledge-aided massive mimo feedback codebook enhancement using artificial intelli- gence,

Jiajia Guo, Chao-Kai Wen, Muhan Chen, and Shi Jin, “Environment knowledge-aided massive mimo feedback codebook enhancement using artificial intelli- gence,”IEEE Trans. Commun., vol. 70, no. 7, pp. 4527– 4542, 2022

work page 2022

[16] [16]

Knowledge distillation-based semantic communications for multiple users,

Chenguang Liu, Yuxin Zhou, Yunfei Chen, and Shuang- Hua Yang, “Knowledge distillation-based semantic communications for multiple users,”IEEE Trans. Wire- less Commun., vol. 23, no. 7, pp. 7000–7012, 2024

work page 2024

[17] [17]

Knowledge distillation based deep learning model for user equipment positioning in massive mimo systems using flying reconfigurable in- telligent surfaces,

Abdullah Al-Ahmadi, “Knowledge distillation based deep learning model for user equipment positioning in massive mimo systems using flying reconfigurable in- telligent surfaces,”IEEE Access, vol. 12, pp. 20679– 20691, 2024

work page 2024

[18] [18]

Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery,

Yidan Zhang, Zhiyuan Yan, Xian Sun, Wenhui Diao, Kun Fu, and Lei Wang, “Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–19, 2021

work page 2021

[19] [19]

Cross-modal knowledge distilla- tion for efficient radar-only beam prediction in mmwave communications,

Yu Min Park, Sheikh Salman Hassan, Walid Saad, and Choong Seon Hong, “Cross-modal knowledge distilla- tion for efficient radar-only beam prediction in mmwave communications,” inProc. IEEE Works. on Sign. Proc. Adv. in Wirel. Comms., 2025, pp. 1–5

work page 2025

[20] [20]

Resource-efficient beam pre- diction in mmwave communications with multimodal realistic simulation framework,

Yu Min Park, Yan Kyaw Tun, Walid Saad, and Choong Seon Hong, “Resource-efficient beam pre- diction in mmwave communications with multimodal realistic simulation framework,”arXiv preprint arXiv:2504.05187, 2025

work page arXiv 2025

[21] [21]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, “Distill- ing the knowledge in a neural network,”arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[22] [22]

Relational knowledge distillation,

Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho, “Relational knowledge distillation,” inIEEE/CVF CVPR, 2019, pp. 3962–3971

work page 2019

[23] [23]

Un- derstanding the gains from repeated self-distillation,

Divyansh Pareek, Simon S. Du, and Sewoong Oh, “Un- derstanding the gains from repeated self-distillation,” in Proc. NeurIPS, Red Hook, NY , USA, 2025, NIPS ’24, Curran Associates Inc

work page 2025

[24] [24]

An overview of signal processing techniques for millime- ter wave mimo systems,

Robert W. Heath, Nuria Gonz ´alez-Prelcic, Sundeep Rangan, Wonil Roh, and Akbar M. Sayeed, “An overview of signal processing techniques for millime- ter wave mimo systems,”IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, 2016

work page 2016

[25] [25]

DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO appli- cations,

A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO appli- cations,” inProc. Inf. Theory Appli. Workshop (ITA), San Diego, CA, Feb 2019, pp. 1–8

work page 2019