A Camera-Cooperative ISAC Framework for Multimodal Non-Cooperative UAVs Sensing

Kun Yang; Luping Xiang; Wenfeng Wu

arxiv: 2605.22090 · v1 · pith:UN6TJOL5new · submitted 2026-05-21 · 💻 cs.AI


Pith reviewed 2026-05-22 06:06 UTC · model grok-4.3

classification 💻 cs.AI

keywords ISAC · UAV sensing · multimodal fusion · camera cooperation · beam steering · data alignment · non-cooperative targets · state estimation


The pith

A camera-cooperative ISAC framework reduces beam steering overhead by an average of 71 percent for non-cooperative UAV sensing while preserving angular accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that pairs camera-based visual monitoring with integrated sensing and communication to handle non-cooperative UAV detection more efficiently than single-modal ISAC systems alone. Cameras provide coarse airspace coverage while ISAC supplies fine-grained measurements, creating a loop that lowers the resource demands of constant beam adjustments. Alignment of visual and echo features occurs through a cross-attention model, after which a fusion module combines historical and current data to produce state estimates. If the approach works as described, ISAC systems can shift freed resources toward communication tasks without losing tracking reliability. A reader would care because resource competition between sensing and communication remains a core barrier in practical 6G-style deployments.

Core claim

The authors present a Camera-Cooperative ISAC (CC-ISAC) framework that uses cameras for coarse-grained airspace monitoring and ISAC for fine-grained high-precision sensing of non-cooperative UAVs. Within the framework, the Vision-to-Echo Data Alignment (V2EDA) model aligns visual and echo-domain features via cross-attention, and the Multimodal Fusion-Based Estimation (MMFE) model integrates historical multimodal data with current observations for state estimation. Tests on the DeepSense 6G dataset report an average 71 percent reduction in beam steering overhead and 1.69 to 11.15 percent reduction in tracking overhead while maintaining high angular estimation accuracy.

What carries the argument

The Vision-to-Echo Data Alignment (V2EDA) model, which aligns visual and echo-domain features through cross-attention mechanisms to support subsequent multimodal state estimation.
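
To make the alignment mechanism concrete, here is a minimal sketch of scaled dot-product cross-attention in plain Python, with visual features acting as queries and echo-domain features acting as keys and values. It is an illustration under stated assumptions, not the authors' V2EDA implementation: the learned projection matrices are omitted, the assignment of modalities to query and key roles is a guess, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: feature vectors from one modality (assumed here: vision)
    keys, values: feature vectors from the other modality (assumed: echo)
    Returns (outputs, weights), where weights[i][j] is how strongly
    query i attends to key j.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        attn = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(attn, values)) for d in range(out_dim)]
        )
        weights.append(attn)
    return outputs, weights


# Hypothetical toy features, chosen only to make the behaviour visible.
visual = [[1.0, 0.0], [0.0, 1.0]]
echo = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
aligned, attn = cross_attention(visual, echo, echo)
```

In this toy run the first visual token places most of its weight on the first echo token. Inspecting such weights is one simple way to look for the misalignment that the load-bearing premise below worries about.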

If this is right

ISAC systems can allocate a larger share of resources to communication tasks instead of beam steering.
Reliable surveillance of non-cooperative UAVs becomes feasible with lower overall system overhead.
Resource contention between sensing and communication is reduced, supporting additional communication services.
High angular accuracy is retained even as overhead metrics improve.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same camera-ISAC pairing could be tested on other fast-moving non-cooperative objects such as birds or ground vehicles.
Integration with additional sensor types might further lower tracking overhead if the alignment model generalizes.
Deployment in dense urban 6G scenarios would likely require testing the framework's robustness to varying lighting and weather conditions.

Load-bearing premise

The cross-attention alignment between visual and echo features succeeds without introducing misalignment errors that would degrade the downstream state estimation accuracy.

What would settle it

A side-by-side comparison on the same dataset showing that angular estimation error rises sharply or overhead reductions disappear when the cross-attention alignment step is removed or replaced with independent processing of each modality.

Figures

Figures reproduced from arXiv: 2605.22090 by Kun Yang, Luping Xiang, Wenfeng Wu.

**Figure 1.** System model.

**Figure 2.** CC-ISAC framework. Offloading the initial detection task to the vision module (Task A) allows the BS to reduce its sensing resource consumption in Tasks B and C, thereby preserving valuable spatio-temporal resources for communication and other concurrent services. This cooperative design effectively mitigates sensing overhead while improving overall network efficiency.

**Figure 3.** Diffusive beam scanning strategy: a hierarchical scheme in which the candidate sets I(1) → I(2) → I(3) → I(4) are scanned sequentially.

**Figure 4.** Angular variation with different ∆t values for various v⊥. Because the directional echo-sensing beam is activated only after the visual perception process, with a latency ∆t induced by real-time monitoring, feedback transmission, and cross-modal alignment, it is necessary to analyze whether the UAV's angular displacement during this latency could lead to beam misalignment.
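
The latency question raised in the Figure 4 caption can be illustrated with a small calculation. The sketch below assumes a simple geometry that the caption does not spell out: the UAV starts at range `range_m` on the beam axis and moves perpendicular to the line of sight at `v_perp_mps` for `latency_s` seconds, so the angle it sweeps is arctan(v⊥ · ∆t / d). The function names, the half-beamwidth criterion, and the example numbers are all hypothetical and are not taken from the paper.

```python
import math


def angular_displacement_deg(v_perp_mps, latency_s, range_m):
    """Angle (degrees) swept, as seen from the base station, while the UAV
    travels v_perp * latency perpendicular to the initial line of sight."""
    return math.degrees(math.atan2(v_perp_mps * latency_s, range_m))


def stays_within_beam(v_perp_mps, latency_s, range_m, beamwidth_deg):
    """True if the displacement is at most half the beamwidth
    (assumption: the UAV starts at the beam centre)."""
    swept = angular_displacement_deg(v_perp_mps, latency_s, range_m)
    return swept <= beamwidth_deg / 2.0


# Hypothetical example: 20 m/s cross-range speed, 100 m range, 5-degree beam.
short_latency = angular_displacement_deg(20.0, 0.1, 100.0)  # about 1.15 degrees
long_latency = angular_displacement_deg(20.0, 1.0, 100.0)   # about 11.3 degrees
ok_short = stays_within_beam(20.0, 0.1, 100.0, 5.0)
ok_long = stays_within_beam(20.0, 1.0, 100.0, 5.0)
```

Under these assumed numbers a 0.1 s latency keeps the target inside the beam while a 1 s latency does not, which is the kind of trade-off the figure examines across ∆t and v⊥.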

**Figure 5.** Architecture of the proposed V2EDA model.

**Figure 6.** Architecture of the proposed MMFE model.

**Figure 7.** Angle error in V2EDA. Variants: (b) V2EDA w/o ic removes the cropped patch ic and retains only the detector's bounding-box features, which limits depth cues; (c) V2EDA w/o Fusion replaces the Feature Fusion (CA) with a common concatenation fusion strategy.

**Figure 8.** MMFE angle-error evaluation: the left panel shows the boxplot, and the right panel displays the CPF plot.

**Figure 9.** Fault tolerance capability of MMFE: (a) Vision-Only; (b1)-(b3) Echo-Only at SNR=0,-1,-2; (c1)-(c3) MMFE at SNR=0,-…

**Figure 10.** Performance of vision-assisted beam selection.

**Figure 11.** TComm. resource recovery in half-frame: (a) Beam Steering; (b)-(f) Beam Tracking. As shown in Table V, the scanning overhead increases as the beamwidth narrows.
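
As a rough illustration of why a vision-seeded search can cut scanning overhead, the sketch below builds candidate sets that spread outward from a beam index suggested by the camera and counts how many beams are probed before the target beam is found. This is one plausible reading of "diffusive" scanning, not the paper's construction of I(1) to I(4). The baseline here is a plain exhaustive sweep swapped in for simplicity, not the conventional hierarchical scan the paper compares against. The codebook size, beam indices, and function names are hypothetical.

```python
def diffusive_candidate_sets(center, num_beams, num_stages=4):
    """Stage k (0-based) holds the beams exactly k indices away from the
    vision-suggested beam, clipped to the valid codebook range."""
    sets = []
    for radius in range(num_stages):
        if radius == 0:
            stage = [center]
        else:
            stage = [
                b for b in (center - radius, center + radius)
                if 0 <= b < num_beams
            ]
        sets.append(stage)
    return sets


def probes_until_detected(scan_order, true_beam):
    """Number of beams probed until true_beam is hit, or None if never hit."""
    for count, beam in enumerate(scan_order, start=1):
        if beam == true_beam:
            return count
    return None


# Hypothetical scenario: 64-beam codebook, camera is off by one beam.
num_beams = 64
vision_guess, true_beam = 30, 31
sets = diffusive_candidate_sets(vision_guess, num_beams)
diffusive_order = [beam for stage in sets for beam in stage]
diffusive_probes = probes_until_detected(diffusive_order, true_beam)
exhaustive_probes = probes_until_detected(range(num_beams), true_beam)
```

With these made-up numbers the seeded search finds the beam in 3 probes against 32 for the sweep. The gap depends entirely on how good the visual guess is, which is why the accuracy of the vision-to-echo alignment matters for the reported savings.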

Original abstract

The detection of non-cooperative unmanned aerial vehicles (UAVs) presents significant challenges for Integrated Sensing and Communication (ISAC) systems due to the inherent limitations of single-modal perception and the competition for shared communication and sensing resources. To address these challenges, this paper proposes a novel Camera-Cooperative ISAC (CC-ISAC) framework that employs multimodal sensing to enable efficient UAV beam steering and tracking. The proposed framework employs cameras for coarse-grained airspace monitoring and utilizes ISAC for fine-grained, high-precision sensing, forming a complementary perception loop that enhances both sensing accuracy and resource efficiency. Within this framework, two key modules are developed: (1) a Vision-to-Echo Data Alignment (V2EDA) model that aligns visual and echo-domain features through cross-attention mechanisms, and (2) a Multimodal Fusion-Based Estimation (MMFE) model that integrates historical multimodal data with current observations for robust state estimation. Extensive evaluations conducted on the DeepSense 6G dataset demonstrate that the proposed framework achieves an average reduction of 71% in beam steering overhead and 1.69-11.15% in tracking overhead while maintaining high angular estimation accuracy. The CC-ISAC framework effectively mitigates resource contention between sensing and communication, enabling reliable UAV surveillance while freeing substantial system resources for additional communication tasks, thereby representing a practical advancement in ISAC system design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The CC-ISAC setup delivers concrete overhead cuts on DeepSense 6G data but the V2EDA alignment step still needs tighter error checks to back the claims.


The main takeaway is a camera-assisted ISAC system for non-cooperative UAV detection that pairs coarse visual monitoring with fine echo sensing. The authors report 71% lower beam steering overhead and 1.69-11.15% lower tracking overhead while holding angular accuracy, all tested on the DeepSense 6G dataset. They package the idea as the CC-ISAC framework with two modules: V2EDA for cross-attention alignment of vision and echo features, and MMFE for fusing multimodal history into state estimates. This addresses the resource split between sensing and communication in a direct way.

The work does a reasonable job of describing a complementary sensing loop and showing numerical gains from the multimodal approach on a public dataset. Those overhead numbers could matter for 6G systems that need to free up spectrum for actual data traffic. The framing is clear enough that a reader can see how the coarse-to-fine handoff is supposed to work.

The soft spot sits in the alignment step. The abstract and stress-test note give no direct metrics on V2EDA performance, such as alignment error, correlation scores, or an ablation that isolates the cross-attention. If the mapping between camera pixels and echo returns drifts under motion or multipath, the reported savings would shrink. The full paper should supply those checks plus clearer baselines and variance numbers to make the results robust.

This paper is aimed at people working on ISAC for UAV surveillance or 6G resource management. A reader already familiar with multimodal sensing would find the specific overhead claims and module names useful as a practical example. It deserves a serious referee because the dataset evaluation and system-level numbers give something concrete to examine, even if the validation details need strengthening.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Camera-Cooperative ISAC (CC-ISAC) framework for multimodal sensing of non-cooperative UAVs. Cameras provide coarse airspace monitoring while ISAC supplies fine-grained sensing in a complementary loop. Key components are the Vision-to-Echo Data Alignment (V2EDA) model, which uses cross-attention to align visual and echo-domain features, and the Multimodal Fusion-Based Estimation (MMFE) model, which fuses historical and current multimodal observations for state estimation. Experiments on the DeepSense 6G dataset report an average 71% reduction in beam steering overhead and 1.69-11.15% reduction in tracking overhead while preserving high angular estimation accuracy.

Significance. If the alignment and fusion steps prove robust, the framework offers a concrete route to easing resource contention between sensing and communication in ISAC systems for UAV surveillance. The reported overhead savings, if reproducible, would free substantial bandwidth for additional communication tasks and represent a practical step toward efficient multimodal ISAC deployments.

major comments (2)

[V2EDA model] V2EDA model description: no quantitative alignment metrics (e.g., mean pixel-to-echo registration error, feature correlation coefficient, or alignment loss value) are supplied. Because the 71% beam-steering reduction rests on the assumption that cross-attention produces sufficiently accurate visual-echo correspondence for the subsequent MMFE estimator, the absence of these diagnostics leaves the central performance claim unsupported.
[Experimental results] Experimental results section: the headline overhead reductions are stated without reference to concrete baselines, statistical significance tests, error bars, dataset split details, or ablation runs that disable the cross-attention module. Without these controls it is impossible to determine whether the reported gains are attributable to the proposed CC-ISAC loop or to dataset-specific artifacts.
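
The error bars requested in the second major comment could be produced in several ways. One common option is a percentile bootstrap over per-run results, sketched below with the Python standard library. The per-run reductions are invented placeholders, not values from the paper, and the function name and defaults are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(samples, num_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n))
        for _ in range(num_resamples)
    )
    lower_idx = int((1.0 - confidence) / 2.0 * num_resamples)
    upper_idx = min(
        num_resamples - 1, int((1.0 + confidence) / 2.0 * num_resamples)
    )
    return means[lower_idx], means[upper_idx]


# Hypothetical per-run overhead reductions in percent (NOT from the paper).
reductions = [68.0, 74.5, 70.2, 72.9, 69.4, 71.8, 73.1, 67.6]
mean_reduction = statistics.fmean(reductions)
ci_low, ci_high = bootstrap_mean_ci(reductions)
print(f"mean = {mean_reduction:.2f}%, 95% CI = [{ci_low:.2f}%, {ci_high:.2f}%]")
```

Reporting the interval alongside the mean would let a reader judge whether a headline figure such as the 71% average is stable across runs or driven by a few favourable ones.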

minor comments (2)

[Abstract] The abstract would be clearer if it named the specific baseline methods against which the 71% and 1.69-11.15% figures are measured.
[Notation] Notation for beam-steering and tracking overhead should be defined explicitly (e.g., as a percentage of total slots or as absolute time) the first time it appears in the main text.
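
To show why the definition matters, the sketch below treats overhead as the fraction of slots spent on sensing and computes a relative reduction from it. This is only one of the two readings the referee mentions, and the paper may define overhead differently. The slot counts are hypothetical and were picked so that the relative reduction comes out at 71%.

```python
def overhead_fraction(sensing_slots, total_slots):
    """Share of all slots consumed by sensing (beam steering or tracking)."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return sensing_slots / total_slots


def relative_reduction_pct(baseline, proposed):
    """Percentage by which `proposed` overhead is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline overhead must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical frame of 1000 slots: 100 sensing slots before, 29 after.
baseline = overhead_fraction(100, 1000)   # 0.100
proposed = overhead_fraction(29, 1000)    # 0.029
relative = relative_reduction_pct(baseline, proposed)
freed_share = baseline - proposed         # share of the frame handed back
```

Here a 71% relative reduction frees only 7.1% of the frame, because sensing took 10% of the slots to begin with. A reader cannot tell how much capacity is actually returned to communication without knowing which quantity the headline numbers refer to.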

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our CC-ISAC framework manuscript. We address each major comment point by point below, indicating planned revisions to improve clarity and support for our claims.


Referee: [V2EDA model] V2EDA model description: no quantitative alignment metrics (e.g., mean pixel-to-echo registration error, feature correlation coefficient, or alignment loss value) are supplied. Because the 71% beam-steering reduction rests on the assumption that cross-attention produces sufficiently accurate visual-echo correspondence for the subsequent MMFE estimator, the absence of these diagnostics leaves the central performance claim unsupported.

Authors: We acknowledge that the current manuscript does not report explicit quantitative alignment metrics for the V2EDA cross-attention module. The 71% beam-steering reduction is shown via end-to-end system-level results on DeepSense 6G. To directly address this concern and better substantiate the visual-echo correspondence, we will add quantitative diagnostics such as feature correlation coefficients and alignment loss values to the V2EDA description and experimental analysis in the revised manuscript. revision: yes
Referee: [Experimental results] Experimental results section: the headline overhead reductions are stated without reference to concrete baselines, statistical significance tests, error bars, dataset split details, or ablation runs that disable the cross-attention module. Without these controls it is impossible to determine whether the reported gains are attributable to the proposed CC-ISAC loop or to dataset-specific artifacts.

Authors: We agree that additional experimental controls would strengthen the results section. The reported overhead reductions are currently presented as overall framework gains. In revision we will expand this section to specify concrete baselines (e.g., single-modal ISAC and non-cooperative tracking methods), include statistical significance measures and error bars, detail the DeepSense 6G train/test splits, and add ablation experiments that disable the cross-attention component of V2EDA to isolate its contribution. revision: yes
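
One of the diagnostics named in the first response, a feature correlation coefficient, can be computed along the following lines. The sketch uses a plain Pearson correlation between paired scalar features from the two modalities. Which features the authors would actually correlate is not stated, so the paired values and the function name are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Hypothetical paired features per frame (NOT data from the paper).
visual_feature = [0.10, 0.40, 0.35, 0.80]
echo_feature = [0.20, 0.50, 0.30, 0.90]
r = pearson(visual_feature, echo_feature)
```

A value near 1 on held-out frames would support the claim that the aligned features track each other. A value that falls under motion or multipath would point to the drift the desk editor's note warns about.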

Circularity Check

0 steps flagged

No circularity: performance claims rest on external dataset evaluation


The CC-ISAC framework is defined through two modules (V2EDA cross-attention alignment and MMFE multimodal fusion) whose outputs are measured via empirical evaluation on the independent DeepSense 6G dataset. No equations, derivations, or self-referential definitions appear in the provided text that would reduce the reported 71% beam-steering or tracking-overhead reductions to fitted parameters or internal construction. The central claims are therefore falsifiable against external benchmarks and do not collapse by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described. Full paper may introduce modeling choices for cross-attention or state estimation that function as implicit assumptions.

pith-pipeline@v0.9.0 · 5776 in / 1217 out tokens · 113464 ms · 2026-05-22T06:06:03.205233+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

"Vision-to-Echo Data Alignment (V2EDA) model that aligns visual and echo-domain features through cross-attention mechanisms, and (2) a Multimodal Fusion-Based Estimation (MMFE) model"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

"hierarchical diffusive beam scanning strategy... average reduction of 71% in beam steering overhead"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] C. Huang, S. Fang, H. Wu, Y. Wang, and Y. Yang, “Low-altitude intelligent transportation: System architecture, infrastructure, and key technologies,” Journal of Industrial Information Integration, vol. 42, p. 100694, 2024.

[2] S. Javaid, N. Saeed, Z. Qadir, H. Fahim, B. He, H. Song, and M. Bilal, “Communication and control in collaborative UAVs: Recent advances and future trends,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 6, pp. 5719–5739, 2023.

[3] J. Tang, Y. Yu, C. Pan, H. Ren, D. Wang, J. Wang, and X. You, “Cooperative ISAC-empowered low-altitude economy,” IEEE Transactions on Wireless Communications, vol. 24, no. 5, pp. 3837–3853, 2025.

[4] C. Zhao, Y. Feng, H. Luo, F. Gao, F. Liu, and S. Jin, “Networked ISAC-based UAV tracking and handover toward low-altitude economy,” IEEE Transactions on Wireless Communications, vol. 24, no. 9, pp. 7670–7685, 2025.

[5] F. Liu, Y. Cui, C. Masouros, J. Xu, T. X. Han, Y. C. Eldar, and S. Buzzi, “Integrated sensing and communications: Toward dual-functional wireless networks for 6G and beyond,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 6, pp. 1728–1767, 2022.

[6] M. A. Khan, H. Menouar, A. Eldeeb, A. Abu-Dayya, and F. D. Salim, “On the detection of unauthorized drones—techniques and future perspectives: A review,” IEEE Sensors Journal, vol. 22, no. 12, pp. 11 439–11 455, 2022.

[7] Y. Song, Y. Zeng, Y. Yang, Z. Ren, G. Cheng, X. Xu, J. Xu, S. Jin, and R. Zhang, “An overview of cellular ISAC for low-altitude UAV: New opportunities and challenges,” IEEE Communications Magazine, 2025.

[8] X. Cheng, H. Zhang, J. Zhang, S. Gao, S. Li, Z. Huang, L. Bai, Z. Yang, X. Zheng, and L. Yang, “Intelligent multi-modal sensing-communication integration: Synesthesia of machines,” IEEE Communications Surveys & Tutorials, vol. 26, no. 1, pp. 258–301, 2023.

[9] C. Cai, R. Zheng, and J. Luo, “Ubiquitous acoustic sensing on commodity IoT devices: A survey,” IEEE Communications Surveys & Tutorials, vol. 24, no. 1, pp. 432–454, 2022.

[10] Y. Peng, L. Xiang, K. Yang, F. Jiang, K. Wang, and C. Masouros, “Integrated multimodal sensing and communication: Challenges, technologies, and architectures,” arXiv preprint arXiv:2506.22507, 2025.

[11] C. Wang, W. He, Y. Nie, J. Guo, C. Liu, Y. Wang, and K. Han, “Gold-YOLO: Efficient object detector via gather-and-distribute mechanism,” Advances in Neural Information Processing Systems, vol. 36, pp. 51 094–51 112, 2023.

[12] M. Kang, C.-M. Ting, F. F. Ting, and R. C.-W. Phan, “ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation,” Image and Vision Computing, vol. 147, p. 105057, 2024.

[13] Q. Wang, H. Xu, S. Lin, J. Zhang, W. Zhang, S. Xiang, and M. Gao, “A low-slow-small UAV detection method based on fusion of range–Doppler map and satellite map,” IEEE Transactions on Aerospace and Electronic Systems, vol. 60, no. 4, pp. 4767–4783, 2024.

[14] J. Liu, L. Plotegher, E. Roura, C. de Souza Junior, and S. He, “Real-time detection for small UAVs: Combining YOLO and multi-frame motion analysis,” IEEE Transactions on Aerospace and Electronic Systems, 2025.

[15] H. Cai, Y. Xie, J. Xu, and Z. Xiong, “A lightweight and accurate UAV detection method based on YOLOv4,” Sensors, vol. 22, no. 18, p. 6874, 2022.

[16] H. Guo, Y. Zheng, Y. Zhang, Z. Gao, and S. Zhao, “Global-local MAV detection under challenging conditions based on appearance and motion,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 9, pp. 12 005–12 017, 2024.

[17] K. Z. Rajab, B. Wu, P. Alizadeh, and A. Alomainy, “Multi-target tracking and activity classification with millimeter-wave radar,” Applied Physics Letters, vol. 119, no. 3, 2021.

[18] F. I. Urzaiz, J. Gismero-Menoyo, A. Asensio-Lopez, and A. D. de Quevedo, “Digital beamforming on receive array calibration: Application to a persistent X-band surface surveillance radar,” IEEE Sensors Journal, vol. 21, no. 5, pp. 6752–6760, 2020.

[19] H. Haifawi, F. Fioranelli, A. Yarovoy, and R. van der Meer, “Drone detection & classification with surveillance ‘radar on-the-move’ and YOLO,” in 2023 IEEE Radar Conference (RadarConf23). IEEE, 2023, pp. 1–6.

[20] M. Giordani, M. Mezzavilla, and M. Zorzi, “Initial access in 5G mmWave cellular networks,” IEEE Communications Magazine, vol. 54, no. 11, pp. 40–47, 2016.

[21] Z. Xiao, T. He, P. Xia, and X.-G. Xia, “Hierarchical codebook design for beamforming training in millimeter-wave communication,” IEEE Transactions on Wireless Communications, vol. 15, no. 5, pp. 3380–3392, 2016.

[22] S. Samaras, E. Diamantidou, D. Ataloglou, N. Sakellariou, A. Vafeiadis, V. Magoulianitis, A. Lalas, A. Dimou, D. Zarpalas, K. Votis et al., “Deep learning on multi sensor data for counter UAV applications—a systematic review,” Sensors, vol. 19, no. 22, p. 4837, 2019.

[23] F. Svanström, C. Englund, and F. Alonso-Fernandez, “Real-time drone detection and tracking with visible, thermal and acoustic sensors,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 7265–7272.

[24] L. Zheng, S. Li, B. Tan, L. Yang, S. Chen, L. Huang, J. Bai, X. Zhu, and Z. Ma, “RCFusion: Fusing 4-D radar and camera with bird’s-eye view features for 3-D object detection,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–14, 2023.

[25] S. Yao, R. Guan, X. Huang, Z. Li, X. Sha, Y. Yue, E. G. Lim, H. Seo, K. L. Man, X. Zhu et al., “Radar-camera fusion for object detection and semantic segmentation in autonomous driving: A comprehensive review,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 2094–2128, 2023.

[26] Y. Peng, L. Xiang, K. Yang, F. Jiang, K. Wang, and D. O. Wu, “SIMAC: A semantic-driven integrated multimodal sensing and communication framework,” IEEE Journal on Selected Areas in Communications, pp. 1–1, 2025.

[27] Y. Peng, L. Xiang, B. Zhang, and K. Yang, “Large language model-driven distributed integrated multimodal sensing and semantic communications,” arXiv preprint arXiv:2505.18194, 2025.

[28] R. Yadav, A. Vierling, and K. Berns, “Radar + RGB fusion for robust object detection in autonomous vehicle,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 1986–1990.

[29] Y. Kim, J. Shin, S. Kim, I.-J. Lee, J. W. Choi, and D. Kum, “CRN: Camera radar net for accurate, robust, efficient 3D perception,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17 615–17 626.

[30] T. Jiang, L. Zhuang, Q. An, J. Wang, K. Xiao, and A. Wang, “T-RODNet: Transformer for vehicular millimeter-wave radar object detection,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2022.

[31] W. Xu, F. Gao, X. Tao, J. Zhang, and A. Alkhateeb, “Computer vision aided mmWave beam alignment in V2X communications,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2699–2714, 2022.

[32] S. Imran, G. Charan, and A. Alkhateeb, “Environment semantic communication: Enabling distributed sensing aided networks,” IEEE Open Journal of the Communications Society, 2024.

[33] T. Osman, G. Charan, and A. Alkhateeb, “Vehicle cameras guide mmWave beams: Approach and real-world V2V demonstration,” in 2023 57th Asilomar Conference on Signals, Systems, and Computers, 2023, pp. 225–232.

[34] I. Ahmad, A. R. Khan, R. N. B. Rais, A. Zoha, M. A. Imran, and S. Hussain, “Vision-assisted beam prediction for real world 6G drone communication,” in 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2023, pp. 1–7.

[35] J. Park, J.-H. Ahn, J. Seo, and J. Kang, “Occlusion-aware vision-aided beam tracking for multi-user V2I mmWave networks,” in ICC 2025 - IEEE International Conference on Communications, 2025, pp. 2210–2216.

[36] A. Alkhateeb, G. Charan, T. Osman, A. Hredzak, J. Morais, U. Demirhan, and N. Srinivas, “DeepSense 6G: A large-scale real-world multi-modal sensing and communication dataset,” IEEE Communications Magazine, 2023.

[37] W. Zhong, Y. Gu, Q. Zhu, L. Wang, X. Chen, and K. Mao, “A novel 3D beam training strategy for mmWave UAV communications,” in 2020 14th European Conference on Antennas and Propagation (EuCAP). IEEE, 2020, pp. 1–5.

[38] M. Braun, C. Sturm, and F. K. Jondral, “On the single-target accuracy of OFDM radar algorithms,” in 2011 IEEE 22nd International Symposium on Personal, Indoor and Mobile Radio Communications. IEEE, 2011, pp. 794–798.

[39] Q. Zhang, H. Sun, X. Gao, X. Wang, and Z. Feng, “Time-division ISAC enabled connected automated vehicles cooperation algorithm design and performance evaluation,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 7, pp. 2206–2218, 2022.

[40] K. Shi, S. He, Z. Shi, A. Chen, Z. Xiong, J. Chen, and J. Luo, “Radar and camera fusion for object detection and tracking: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 28, pp. 3478–3520, 2026.

[41] X. Ma, C. Qin, H. You, H. Ran, and Y. Fu, “Rethinking network design and local geometry in point cloud: A simple residual MLP framework,” arXiv preprint arXiv:2202.07123, 2022.

[42] Y. Kim, J. W. Choi, and D. Kum, “GRIF Net: Gated region of interest fusion network for robust 3D object detection from radar point cloud and monocular image,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 10 857–10 864.

[43] R. A. Singer, “Estimating optimal tracking filter performance for manned maneuvering targets,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES-6, no. 4, pp. 473–483, 1970.

[44] S. Shaham, M. Kokshoorn, M. Ding, Z. Lin, and M. Shirvanimoghaddam, “Extended Kalman filter beam tracking for millimeter wave vehicular communications,” in 2020 IEEE International Conference on Communications Workshops (ICC Workshops), 2020, pp. 1–6.

[45] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.