arxiv: 2605.04514 · v1 · submitted 2026-05-06 · 📡 eess.SP

Deep Learning-Based Computer Vision for Beam Selection and Proactive Blockage Prediction

Sachira Karunasena, Erfan Khordad, Tom Drummond, Rajitha Senanayake

Pith reviewed 2026-05-08 16:28 UTC · model grok-4.3

classification 📡 eess.SP

keywords: millimeter-wave · beam selection · blockage prediction · computer vision · deep learning · proactive prediction · object tracking · mmWave

The pith

RGB imagery fused with power profiles enables 98.96% top-5 beam prediction accuracy and over 98% blockage forecasting accuracy in mmWave systems

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper is trying to establish that deep learning models can use RGB camera images together with received power measurements to accurately select beams and predict upcoming blockages in millimeter-wave communications. A sympathetic reader would care because these links offer high data rates but are easily disrupted by misalignment or obstacles, and reliable prediction could keep connections active longer. The work shows strong results on new test data, including cases where both the signal source and blockers move independently at varying speeds.

Core claim

We address propagation loss through a novel vision-aided beam selection framework that integrates RGB imagery with received power profiles for efficient transmitter identification and beam prediction. This framework achieves 98.96% top-5 beam prediction accuracy, surpassing current state-of-the-art methods by at least 6% across all metrics. We address penetration loss through a proactive blockage prediction framework using a modified object tracker with weighted centroid-based depth estimation. This represents the first analysis of simultaneous non-uniform mobility of both transmitters and obstacles. Evaluated on completely unseen data, this framework achieves over 98% accuracy in predicting blockages up to three frames ahead, establishing strong performance benchmarks.
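
The top-5 figure quoted above is a top-N accuracy: a sample counts as correct when the ground-truth beam is among the N highest-scoring candidates. A minimal sketch of that metric follows, assuming the model emits one score per codebook beam; the function name `top_k_accuracy` and the toy arrays are illustrative and are not taken from the paper.

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, best_beam: np.ndarray, k: int) -> float:
    """Fraction of samples whose ground-truth beam is among the k highest scores.

    scores:    (num_samples, num_beams) model outputs (assumed layout).
    best_beam: (num_samples,) ground-truth beam indices.
    """
    # argpartition moves the k largest scores of each row into the last k columns.
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == best_beam[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 samples, 4 candidate beams (values are illustrative only).
scores = np.array([
    [0.05, 0.60, 0.25, 0.10],
    [0.50, 0.05, 0.30, 0.15],
    [0.25, 0.15, 0.10, 0.50],
])
best_beam = np.array([1, 2, 1])

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(scores, best_beam, k):.3f}")
```

Reporting several values of N side by side, as the paper's "all metrics" comparison implies, shows how much beam-sweeping overhead remains after the prediction step.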

What carries the argument

Vision-aided beam selection framework that integrates RGB imagery with received power profiles for transmitter identification and beam prediction, paired with a modified object tracker using weighted centroid-based depth estimation for proactive blockage forecasting
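
The weighting rule behind the centroid-based depth estimate is not stated on this page, so the following is only a sketch under a loud assumption: depth pixels inside a tracked bounding box are averaged with Gaussian weights that peak at the box centroid, so background pixels near the box edge contribute less. The function name, the `sigma_frac` parameter, and the box convention are hypothetical.

```python
import numpy as np

def weighted_centroid_depth(depth_map: np.ndarray, box, sigma_frac: float = 0.25) -> float:
    """Gaussian-weighted mean depth inside a box (x1, y1, x2, y2), upper bounds exclusive.

    ASSUMPTION: Gaussian weights centred on the box centroid; the paper's
    actual weighting rule may differ.
    """
    x1, y1, x2, y2 = box
    patch = depth_map[y1:y2, x1:x2]
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0          # centroid in patch coordinates
    ys, xs = np.mgrid[0:h, 0:w]
    sy, sx = max(sigma_frac * h, 1e-6), max(sigma_frac * w, 1e-6)
    weights = np.exp(-0.5 * (((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2))
    return float((weights * patch).sum() / weights.sum())

# Toy depth map: a near object (depth 1) in the middle of a far background (depth 5).
depth = np.full((9, 9), 5.0)
depth[3:6, 3:6] = 1.0
estimate = weighted_centroid_depth(depth, (0, 0, 9, 9))
print(estimate, depth.mean())
```

In this toy case the weighted estimate sits closer to the object's depth than the plain mean does, which is the behaviour a centroid weighting is presumably meant to provide when the box includes background.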

Load-bearing premise

The premise is that RGB imagery and received power profiles will be available in real time at both transmitter and receiver, and that models trained on the authors' datasets will generalize to arbitrary real-world mobility patterns and lighting conditions.

What would settle it

Testing the trained models on new outdoor data with sudden lighting changes and unpredictable simultaneous movements of transmitters and obstacles, then checking whether top-5 beam accuracy drops below 90% or blockage prediction accuracy falls below 90% for three-frame horizons.
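
The check described above can be sketched as a small script, assuming blockage predictions and labels are stored as binary arrays with one column per look-ahead frame. The function names and the toy data are hypothetical; only the 90% threshold comes from the text.

```python
import numpy as np

def per_horizon_accuracy(pred, truth) -> np.ndarray:
    """Accuracy per look-ahead step; inputs have shape (num_samples, num_horizons)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return (pred == truth).mean(axis=0)

def claim_survives(top5_acc: float, horizon_acc, threshold: float = 0.90) -> bool:
    """True if top-5 accuracy and every horizon's accuracy stay at or above the threshold."""
    return bool(top5_acc >= threshold and np.all(np.asarray(horizon_acc) >= threshold))

# Toy labels: 1 = blocked, 0 = clear, for frames t+1, t+2, t+3 (illustrative values only).
truth = np.array([[0, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1]])
pred = np.array([[0, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])

horizon_acc = per_horizon_accuracy(pred, truth)
print(horizon_acc, claim_survives(0.95, horizon_acc))
```

Evaluating each horizon separately matters because accuracy typically degrades with look-ahead distance, so an average over horizons could hide a failure at the third frame.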

Figures

Figures reproduced from arXiv: 2605.04514 by Erfan Khordad, Rajitha Senanayake, Sachira Karunasena, Tom Drummond.

**Figure 1.** $[\theta^{(q)}_{\mathrm{start}}, \theta^{(q)}_{\mathrm{end}}]$ of 3 selected beams and the beam steering range of the BS, $[\Phi_{\mathrm{start}}, \Phi_{\mathrm{end}}]$, from the DeepSense 6G [32] codebook. These values are used to define the corresponding x-region boundaries as defined in (5). Our propagation loss mitigation framework surpasses current state-of-the-art vision-aided beam selection methods [11], [12] by at least 6% across all evaluation metrics. Our penetration…

**Figure 2.** 1) Transmitter Identification (Section III-A): Precisely locating the TX amid multiple interfering objects within the surrounding environment. 2) Transmitter Tracking (Section III-B): Continuously monitoring the identified TX across consecutive frames while it remains within the beam steering range of the BS. 3) Beam Prediction (Section III-C): Estimating the top-N candidate beams for serving the monitored…

**Figure 2.** Proposed end-to-end Vision-aided Beamforming approach (Red box). The first…

**Figure 3.** Structuring the input for the TX Identification methods. This…

**Figure 4.** Beams given by the codebook of DeepSense6G dataset [32]

**Figure 5.** The process of mapping beams given by the codebook of…

**Figure 6.** Beams given by the codebook of DeepSense6G dataset [32]

**Figure 7.** Proposed dual-branch neural network architecture for top-…

**Figure 8.** Comparison of Top-N Beam Prediction metrics with current…

**Figure 9.** Proposed end-to-end vision-aided blockage prediction framework (red box). The first…

**Figure 10.** Ground truth normalized confusion matrices for scenario…

**Figure 11.** Ground truth normalized confusion matrices for scenario…

**Figure 12.** Ground truth normalized confusion matrices for scenario…

Original abstract

Millimeter-wave communication faces two critical challenges: propagation losses requiring costly narrow-beam alignment, and penetration losses causing link failures from blocked line-of-sight paths. We address propagation loss through a novel vision-aided beam selection framework that integrates RGB imagery with received power profiles for efficient transmitter identification and beam prediction. This framework achieves 98.96% top-5 beam prediction accuracy, surpassing current state-of-the-art methods by at least 6% across all metrics. We address penetration loss through a proactive blockage prediction framework using a modified object tracker with weighted centroid-based depth estimation. This represents the first analysis of simultaneous non-uniform mobility of both transmitters and obstacles. Evaluated on completely unseen data, this framework achieves over 98% accuracy in predicting blockages up to three frames ahead, establishing strong performance benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper backs up its vision-aided mmWave beam selection and blockage prediction claims with proper dataset splits and baseline comparisons, delivering usable accuracy numbers on the combined mobility case.

The main thing to know is that the authors combine RGB images with received power profiles to select beams and use a modified tracker for early blockage warnings, even when the transmitter and obstacle both move irregularly. They report 98.96% top-5 beam accuracy (at least 6% above prior work) and over 98% blockage prediction accuracy up to three frames ahead on unseen data. The full manuscript supports these figures with explicit splits, architecture details, and re-run baselines on the same data, so the performance edge looks real rather than an artifact of uneven evaluation.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes two deep learning-based computer vision frameworks for millimeter-wave communications: a vision-aided beam selection method that fuses RGB imagery with received power profiles to achieve 98.96% top-5 beam prediction accuracy (outperforming prior SOTA by at least 6% across metrics), and a proactive blockage prediction system based on a modified object tracker with weighted centroid depth estimation that reaches over 98% accuracy on unseen data for up to three frames ahead, including the first reported analysis of simultaneous non-uniform mobility between transmitter and obstacles.

Significance. If the empirical results hold under the reported evaluation protocol, the work offers a practical contribution to mmWave system design by demonstrating how standard camera feeds can reduce beam alignment overhead and preempt link outages. Explicit dataset splits, re-implemented baselines on identical data, and architecture details strengthen the reproducibility of the performance claims. The focus on real-time mobility scenarios addresses a relevant deployment gap in 5G/6G networks.

minor comments (3)

Abstract: the statement that the beam selection framework 'surpasses current state-of-the-art methods by at least 6% across all metrics' would be clearer if the specific metrics (e.g., top-1, top-3) and the exact SOTA references being compared were named inline rather than left to the reader to locate in the results section.
§4 (blockage prediction): the description of the 'weighted centroid-based depth estimation' would benefit from an explicit equation or pseudocode step showing how the weights are computed from the tracker output, as the current prose leaves the weighting rule ambiguous for replication.
The manuscript would be strengthened by adding a short paragraph on inference latency and memory footprint of the two models on embedded hardware, given the real-time requirements implied by the proactive prediction task.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript, the recognition of its contributions to vision-aided beam selection and proactive blockage prediction, and the recommendation for minor revision. We appreciate the emphasis on reproducibility and the relevance to 5G/6G deployment scenarios.

Circularity Check

0 steps flagged

No significant circularity identified

The paper presents purely empirical results from supervised deep learning models trained on RGB imagery combined with received power profiles for beam selection and a modified object tracker for proactive blockage prediction. All headline metrics (98.96% top-5 beam accuracy and >98% blockage prediction on unseen data) are obtained via explicit dataset splits and held-out evaluation, with no equations, derivations, fitted parameters renamed as predictions, or self-citation chains that reduce any claim to its own inputs by construction. The work is self-contained against external benchmarks through re-implemented baselines and standard ML evaluation protocols.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claims rest on the generalization performance of trained deep neural networks whose internal weights are fitted to proprietary or unreleased training data; no additional mathematical axioms or invented physical entities are introduced.

free parameters (1)

neural network weights and biases
Deep learning models contain millions of parameters whose values are determined by training on the authors' image and power-profile datasets.

pith-pipeline@v0.9.0 · 5442 in / 1269 out tokens · 37280 ms · 2026-05-08T16:28:27.089846+00:00 · methodology

Reference graph

Works this paper leans on

36 extracted references · 2 canonical work pages

[1]

Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond,

T. S. Rappaport, Y. Xing, O. Kanhere, S. Ju, A. Madanayake, S. Mandal, A. Alkhateeb, and G. C. Trichopoulos, “Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond,” IEEE Access, vol. 7, pp. 78 729–78 757, 2019

2019
[3]

Modeling and analyzing millimeter wave cellular systems,

J. G. Andrews, T. Bai, M. N. Kulkarni, A. Alkhateeb, A. K. Gupta, and R. W. Heath, “Modeling and analyzing millimeter wave cellular systems,” IEEE Transactions on Communications, vol. 65, no. 1, pp. 403–430, 2016

2016
[4]

Machine learning for millimeter wave and terahertz beam management: A survey and open challenges,

M. Q. Khan, A. Gaber, P. Schulz, and G. Fettweis, “Machine learning for millimeter wave and terahertz beam management: A survey and open challenges,” IEEE Access, vol. 11, pp. 11 880–11 902, 2023

2023
[5]

Machine learning for reliable mmWave systems: Blockage prediction and proactive handoff,

A. Alkhateeb, I. Beltagy, and S. Alex, “Machine learning for reliable mmWave systems: Blockage prediction and proactive handoff,” in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2018, pp. 1055–1059

2018
[6]

Beam management in millimeter-wave communications for 5G and beyond,

Y.-N. R. Li, B. Gao, X. Zhang, and K. Huang, “Beam management in millimeter-wave communications for 5G and beyond,” IEEE Access, vol. 8, pp. 13 282–13 293, 2020

2020
[7]

Hierarchical codebook design for beamforming training in millimeter-wave communication,

Z. Xiao, T. He, P. Xia, and X.-G. Xia, “Hierarchical codebook design for beamforming training in millimeter-wave communication,” IEEE Transactions on Wireless Communications, vol. 15, no. 5, pp. 3380–3392, 2016

2016
[8]

Millimeter wave beamforming for wireless backhaul and access in small cell networks,

S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Transactions on Communications, vol. 61, no. 10, pp. 4391–4403, 2013

2013
[9]

Wideband millimeter-wave beam training with true-time-delay array architecture,

H. Yan, V. Boljanovic, and D. Cabric, “Wideband millimeter-wave beam training with true-time-delay array architecture,” in 2019 53rd Asilomar Conference on Signals, Systems, and Computers. IEEE, 2019, pp. 1447–1452

2019
[10]

Terahertz communications: An array-of-subarrays solution,

C. Lin and G. Y. L. Li, “Terahertz communications: An array-of-subarrays solution,” IEEE Communications Magazine, vol. 54, no. 12, pp. 124–131, 2016

2016
[11]

An efficient nocturnal scenarios beamforming based on multi-modal enhanced by object detection,

J. Nie, Y. Cui, T. Yu, J. Mu, W. Yuan, and X. Jing, “An efficient nocturnal scenarios beamforming based on multi-modal enhanced by object detection,” in 2023 IEEE Globecom Workshops (GC Wkshps). IEEE, 2023, pp. 515–520

2023
[12]

Environment semantic aided communication: A real world demonstration for beam prediction,

S. Imran, G. Charan, and A. Alkhateeb, “Environment semantic aided communication: A real world demonstration for beam prediction,” in 2023 IEEE International Conference on Communications Workshops (ICC Workshops). IEEE, 2023, pp. 48–53

2023
[13]

Computer vision-aided beamforming for 6G wireless communications: Dataset and training perspective,

S. Kim, Y. Ahn, and B. Shim, “Computer vision-aided beamforming for 6G wireless communications: Dataset and training perspective,” in ICC 2024-IEEE International Conference on Communications. IEEE, 2024, pp. 672–677

2024
[14]

LiDAR data for deep learning-based mmWave beam-selection,

A. Klautau, N. González-Prelcic, and R. W. Heath, “LiDAR data for deep learning-based mmWave beam-selection,” IEEE Wireless Communications Letters, vol. 8, no. 3, pp. 909–912, 2019

2019
[15]

Position-aided beam prediction in the real world: How useful GPS locations actually are?

J. Morais, A. Behboodi, H. Pezeshki, and A. Alkhateeb, “Position-aided beam prediction in the real world: How useful GPS locations actually are?” in ICC 2023-IEEE International Conference on Communications. IEEE, 2023, pp. 1824–1829

2023
[16]

Deep learning for fast and reliable initial access in AI-driven 6G mmWave networks,

T. S. Cousik, V. K. Shah, T. Erpek, Y. E. Sagduyu, and J. H. Reed, “Deep learning for fast and reliable initial access in AI-driven 6G mmWave networks,” IEEE Transactions on Network Science and Engineering, vol. 11, no. 6, pp. 5668–5680, 2022

2022
[17]

Deep learning assisted calibrated beam training for millimeter-wave communication systems,

K. Ma, D. He, H. Sun, Z. Wang, and S. Chen, “Deep learning assisted calibrated beam training for millimeter-wave communication systems,” IEEE Transactions on Communications, vol. 69, no. 10, pp. 6706–6721, 2021

2021
[18]

Deep learning for beam training in millimeter wave massive MIMO systems,

C. Qi, Y. Wang, and G. Y. Li, “Deep learning for beam training in millimeter wave massive MIMO systems,” IEEE Transactions on Wireless Communications, 2020

2020
[19]

Deep learning for mmWave beam and blockage prediction using sub-6 GHz channels,

M. Alrabeiah and A. Alkhateeb, “Deep learning for mmWave beam and blockage prediction using sub-6 GHz channels,” IEEE Transactions on Communications, vol. 68, no. 9, pp. 5504–5518, 2020

2020
[20]

Integrated millimeter wave and sub-6 GHz wireless networks: A roadmap for joint mobile broadband and ultra-reliable low-latency communications,

O. Semiari, W. Saad, M. Bennis, and M. Debbah, “Integrated millimeter wave and sub-6 GHz wireless networks: A roadmap for joint mobile broadband and ultra-reliable low-latency communications,” IEEE Wireless Communications, vol. 26, no. 2, pp. 109–115, 2019

2019
[21]

Improved handover through dual connectivity in 5G mmWave mobile networks,

M. Polese, M. Giordani, M. Mezzavilla, S. Rangan, and M. Zorzi, “Improved handover through dual connectivity in 5G mmWave mobile networks,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 9, pp. 2069–2084, 2017

2017
[22]

Dynamic multi-connectivity performance in ultra-dense urban mmWave deployments,

V. Petrov, D. Solomitckii, A. Samuylov, M. A. Lema, M. Gapeyenko, D. Moltchanov, S. Andreev, V. Naumov, K. Samouylov, M. Dohler et al., “Dynamic multi-connectivity performance in ultra-dense urban mmWave deployments,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 9, pp. 2038–2055, 2017

2017
[23]

Early warning of mmWave signal blockage and AoA transition using sub-6 GHz observations,

Z. Ali, A. Duel-Hallen, and H. Hallen, “Early warning of mmWave signal blockage and AoA transition using sub-6 GHz observations,” IEEE Communications Letters, vol. 24, no. 1, pp. 207–211, 2019

2019
[24]

Deep learning for moving blockage prediction using real mmWave measurements,

S. Wu, M. Alrabeiah, A. Hredzak, C. Chakrabarti, and A. Alkhateeb, “Deep learning for moving blockage prediction using real mmWave measurements,” in ICC 2022 - IEEE International Conference on Communications, 2022, pp. 3753–3758

2022
[25]

Radar aided proactive blockage prediction in real-world millimeter wave systems,

U. Demirhan and A. Alkhateeb, “Radar aided proactive blockage prediction in real-world millimeter wave systems,” in ICC 2022-IEEE International Conference on Communications. IEEE, 2022, pp. 4547–4552

2022
[26]

LiDAR-aided mobile blockage prediction in real-world millimeter wave systems,

S. Wu, C. Chakrabarti, and A. Alkhateeb, “LiDAR-aided mobile blockage prediction in real-world millimeter wave systems,” in 2022 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2022, pp. 2631–2636

2022
[27]

Computer vision aided blockage prediction in real-world millimeter wave deployments,

G. Charan and A. Alkhateeb, “Computer vision aided blockage prediction in real-world millimeter wave deployments,” in 2022 IEEE Globecom Workshops (GC Wkshps). IEEE, 2022, pp. 1711–1716

2022
[28]

Generative AI-enabled blockage prediction for robust dual-band mmWave communication,

M. Ghassemi, H. Zhang, A. Afana, A. Bin Sediq, and M. Erol-Kantarci, “Generative AI-enabled blockage prediction for robust dual-band mmWave communication,” in ICC 2025 - IEEE International Conference on Communications, 2025, pp. 476–481

2025
[29]

Generative AI-enabled blockage prediction for robust dual-band mmWave communication,

M. Ghassemi, H. Zhang, A. Afana, A. B. Sediq, and M. Erol-Kantarci, “Generative AI-enabled blockage prediction for robust dual-band mmWave communication,” arXiv preprint arXiv:2501.11763, 2025

work page arXiv 2025
[30]

Millimeter wave base stations with cameras: Vision-aided beam and blockage prediction,

M. Alrabeiah, A. Hredzak, and A. Alkhateeb, “Millimeter wave base stations with cameras: Vision-aided beam and blockage prediction,” in 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring). IEEE, 2020, pp. 1–5

2020
[31]

User identification: A key enabler for multi-user vision-aided communications,

G. Charan and A. Alkhateeb, “User identification: A key enabler for multi-user vision-aided communications,” IEEE Open Journal of the Communications Society, 2023

2023
[32]

DeepSense 6G: A large-scale real-world multi-modal sensing and communication dataset,

A. Alkhateeb, G. Charan, T. Osman, A. Hredzak, J. Morais, U. Demirhan, and N. Srinivas, “DeepSense 6G: A large-scale real-world multi-modal sensing and communication dataset,” IEEE Communications Magazine, vol. 61, no. 9, pp. 122–128, 2023

2023
[33]

Deep learning based computer-vision for enhanced beamforming,

S. Karunasena, E. Khordad, T. Drummond, and R. Senanayake, “Deep learning based computer-vision for enhanced beamforming,” in 2025 IEEE International Conference on Communications Workshops (ICC Workshops), 2025, pp. 1646–1651

2025
[34]

Microsoft COCO: Common objects in context,

T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755

2014
[35]

YOLOv10: Real-time end-to-end object detection,

L. L. Ao Wang, Hui Chen, “YOLOv10: Real-time end-to-end object detection,” arXiv preprint arXiv:2405.14458, 2024

work page arXiv 2024
[36]

Deep OC-Sort: Multi-pedestrian tracking by adaptive re-identification,

G. Maggiolino, A. Ahmad, J. Cao, and K. Kitani, “Deep OC-Sort: Multi-pedestrian tracking by adaptive re-identification,” in 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 3025–3029

2023
[37]

Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer,

R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, “Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, 2022

2022