Deep Reinforcement Learning for Cognitive Time-Division Joint SAR and Secure Communications
Pith reviewed 2026-05-10 16:37 UTC · model grok-4.3
The pith
Deep reinforcement learning optimizes time and power allocation in a time-division joint SAR and secure communication system to maximize the worst-case secrecy rate.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that casting the time and power allocation of a cognitive time-division joint SAR and communication system as a Markov decision process, and solving it with deep reinforcement learning, yields higher worst-case secrecy rates than fixed or random time-allocation schemes while meeting both SAR imaging and communication constraints. In this system, an aerial base station estimates the eavesdropper's position and velocity via along-track interferometry.
What carries the argument
The Markov decision process whose state captures the eavesdropper trajectory estimated by cognitive SAR along-track interferometry and whose actions are the time and power allocations between SAR and secure communication phases.
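As a rough illustration of the state and action described above, the following Python sketch defines hypothetical containers. The class and field names (`State`, `Action`, `eve_position_m`, `sar_time_fraction`, `an_power_fraction`), the normalization of actions to [0, 1], and the constant-velocity extrapolation are all assumptions made for illustration; this summary does not give the paper's actual state and action definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical MDP state: ATI-derived eavesdropper estimate."""
    eve_position_m: Tuple[float, float]    # estimated ground position (x, y)
    eve_velocity_mps: Tuple[float, float]  # estimated ground velocity (vx, vy)


@dataclass(frozen=True)
class Action:
    """Hypothetical MDP action: time and power split for one frame."""
    sar_time_fraction: float  # share of the frame spent on SAR sensing
    an_power_fraction: float  # share of transmit power spent on artificial noise

    def __post_init__(self) -> None:
        # Both entries are fractions, so reject anything outside [0, 1].
        for name in ("sar_time_fraction", "an_power_fraction"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    @property
    def comm_time_fraction(self) -> float:
        # Time division: whatever is not used for sensing is left for communication.
        return 1.0 - self.sar_time_fraction


def predict_position(state: State, dt_s: float) -> Tuple[float, float]:
    """Constant-velocity extrapolation of the eavesdropper position (assumed motion model)."""
    (x, y), (vx, vy) = state.eve_position_m, state.eve_velocity_mps
    return (x + vx * dt_s, y + vy * dt_s)
```

Keeping the action as two bounded fractions makes the coupling visible: every unit of frame time given to sensing is removed from communication, which is the trade-off the learned policy has to balance.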
If this is right
- The learned policy achieves higher worst-case secrecy rates than both learning and non-learning baselines that use equal-aperture or random time allocation.
- The same policy generalizes to previously unseen eavesdropper motion patterns without retraining.
- Joint optimization of time and power satisfies the SAR imaging quality and communication rate constraints simultaneously.
- Adaptive beamforming and artificial-noise jamming, driven by the SAR-derived estimates, improve secrecy against a ground-moving eavesdropper.
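To make the objective behind these points concrete, here is a minimal Python sketch of a worst-case secrecy rate under time division. It uses the standard secrecy-rate form (user rate minus eavesdropper rate, clipped at zero) and takes the worst case as the largest eavesdropper SINR in a finite candidate set. The function name, the finite candidate set, and the linear scaling by the communication time share are assumptions of this sketch, not the paper's exact formulation.

```python
import math
from typing import Iterable


def worst_case_secrecy_rate(
    ue_sinr: float,
    eve_sinr_candidates: Iterable[float],
    comm_time_fraction: float,
) -> float:
    """Worst-case secrecy rate in bit/s/Hz, scaled by the communication time share.

    ue_sinr and eve_sinr_candidates are linear (not dB) SINR values. The candidate
    list stands in for the eavesdropper uncertainty set (an assumption of this sketch).
    """
    candidates = list(eve_sinr_candidates)
    if not candidates:
        raise ValueError("need at least one eavesdropper SINR candidate")
    ue_rate = math.log2(1.0 + ue_sinr)
    # Worst case: the eavesdropper attains the best SINR in the uncertainty set.
    eve_rate = math.log2(1.0 + max(candidates))
    # The secrecy rate is clipped at zero and only accrues while communicating.
    return comm_time_fraction * max(0.0, ue_rate - eve_rate)
```

For example, with `ue_sinr = 15`, candidates `[1, 3]`, and half of the frame used for communication, the function returns 0.5 × (4 − 2) = 1.0 bit/s/Hz. Artificial noise and beamforming enter such a model by lowering the candidate eavesdropper SINRs, and better position estimates enter by shrinking the candidate set.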
Where Pith is reading between the lines
- The framework could scale to multiple simultaneous users or eavesdroppers by expanding the MDP state and action spaces, provided training remains tractable.
- Because the method relies on SAR-derived motion estimates, combining it with complementary sensors such as optical or inertial data could reduce sensitivity to SAR-specific error sources.
- If the DRL agent is trained only in simulation, real-world transfer would require calibration of the channel and motion models to close the domain gap.
Load-bearing premise
Cognitive SAR along-track interferometry produces position and velocity estimates of the eavesdropper that are accurate enough for adaptive beamforming and artificial-noise jamming to deliver the claimed secrecy gains.
What would settle it
A test in which realistic SAR estimation errors cause the learned policy to produce lower secrecy rates than the equal-aperture baseline, or in which the policy fails to generalize to new eavesdropper trajectories, would falsify the performance claims.
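One simple way to set up such a test is to perturb the eavesdropper estimates with additive zero-mean Gaussian noise of varying standard deviation, which is the error model the simulated rebuttal proposes. The Python sketch below shows only the perturbation step and a sanity check on the injected error. The function names and the independent per-coordinate noise are assumptions; realistic ATI errors such as clutter or velocity ambiguity are not Gaussian and would need their own models.

```python
import math
import random
from typing import Tuple

Vec2 = Tuple[float, float]


def perturb_estimate(
    position_m: Vec2,
    velocity_mps: Vec2,
    pos_std_m: float,
    vel_std_mps: float,
    rng: random.Random,
) -> Tuple[Vec2, Vec2]:
    """Add independent zero-mean Gaussian noise to each coordinate (assumed error model)."""
    noisy_pos = (
        position_m[0] + rng.gauss(0.0, pos_std_m),
        position_m[1] + rng.gauss(0.0, pos_std_m),
    )
    noisy_vel = (
        velocity_mps[0] + rng.gauss(0.0, vel_std_mps),
        velocity_mps[1] + rng.gauss(0.0, vel_std_mps),
    )
    return noisy_pos, noisy_vel


def rms_position_error(true_pos_m: Vec2, pos_std_m: float, trials: int, seed: int = 0) -> float:
    """Empirical RMS position error of the perturbed estimate over several trials."""
    rng = random.Random(seed)  # seeded so that a sweep is repeatable
    squared_error_sum = 0.0
    for _trial in range(trials):
        noisy_pos, _noisy_vel = perturb_estimate(true_pos_m, (0.0, 0.0), pos_std_m, 0.0, rng)
        squared_error_sum += sum((n - t) ** 2 for n, t in zip(noisy_pos, true_pos_m))
    return math.sqrt(squared_error_sum / trials)
```

In a full sensitivity study, the noisy estimate would replace the true one in the agent's state, and the resulting worst-case secrecy rate would be compared against the equal-aperture baseline at each noise level.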
Original abstract
Synthetic aperture radar (SAR) imaging can be exploited to enhance wireless communication performance through high-precision environmental awareness. However, integrating sensing and communication functionalities in such wideband systems remains challenging, motivating the development of a joint SAR and communication (JSARC) framework. We propose a dynamic time-division JSARC (TD-JSARC) framework for secure aerial communications that is relevant for critical scenarios, such as surveillance or post-disaster communication, where conventional localization of mobile adversaries often fails. In particular, we consider a secure downlink communication scenario where an aerial base station (ABS) serves a ground user (UE) in the presence of a ground-moving eavesdropper. To detect and track the eavesdropper, the ABS uses cognitive SAR along-track interferometry (ATI) to estimate its position and velocity. Based on these estimates, the ABS applies adaptive beamforming and artificial-noise jamming to enhance secrecy. To this end, we jointly optimize the time and power allocation to maximize the worst-case secrecy rate, while satisfying both SAR and communication constraints. Using the estimated eavesdropper trajectory, we formulate the problem as a Markov decision process (MDP) and solve it via deep reinforcement learning (DRL). Simulation results show that the proposed learning-based approach outperforms both learning and non-learning baseline schemes employing equal-aperture and random time allocation. The proposed method also generalizes well to previously unseen eavesdropper motion patterns.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a time-division joint SAR and secure communications (TD-JSARC) framework for an aerial base station serving a ground user in the presence of a ground-moving eavesdropper. Cognitive SAR along-track interferometry (ATI) is used to estimate the eavesdropper's position and velocity, which inform adaptive beamforming and artificial-noise jamming. The joint time and power allocation problem is formulated to maximize the worst-case secrecy rate subject to SAR imaging and communication constraints, cast as a Markov decision process (MDP), and solved via deep reinforcement learning (DRL). Simulation results are presented claiming that the DRL approach outperforms both learning and non-learning baselines using equal-aperture and random time allocation, and that the learned policy generalizes to previously unseen eavesdropper motion patterns.
Significance. If the results hold under realistic sensing conditions, the work advances integrated sensing and communications (ISAC) for physical-layer security in dynamic aerial scenarios by coupling SAR-based adversary tracking with DRL-driven resource allocation. Credit is given for the reported generalization to unseen motion patterns, which provides evidence that the policy captures transferable structure rather than memorizing specific trajectories. The simulation-based outperformance over multiple baselines is a concrete strength, though its weight depends on validation of the underlying ATI accuracy assumption.
Major comments (2)
- [MDP formulation and simulation setup] The MDP formulation (described in the system model and problem formulation sections) places ATI-derived eavesdropper position and velocity estimates directly in the state, where they drive the adaptive beamforming and AN jamming used to maximize the worst-case secrecy rate. No error model is introduced for ATI inaccuracies (e.g., clutter, phase noise, or along-track velocity ambiguity), and no sensitivity analysis quantifies how secrecy rate and policy performance degrade under realistic estimation errors. This assumption is load-bearing for both the outperformance and generalization claims: if the sensing inputs are optimistic, the reported gains may be artifacts of the simulation channel model rather than robust outcomes of the joint design.
- [Simulation results] The simulation results (abstract and results section) report outperformance and generalization but omit key details required for assessment: the specific DRL architecture (e.g., actor-critic network type, layer sizes), full constraint formulations in the MDP reward and state transitions, hyperparameter values beyond learning rate and discount factor, and statistical validation (e.g., number of independent runs, confidence intervals, or variance across random seeds). Without these, it is not possible to determine whether the gains are reproducible or sensitive to simulation artifacts.
Minor comments (2)
- [System model] Notation for time-division parameters and secrecy rate expressions could be introduced more explicitly with a dedicated table of symbols to improve readability.
- [Results] Figure captions in the results section should specify the exact simulation parameters (e.g., SNR ranges, eavesdropper velocity distributions) used for each curve to allow direct comparison with the baselines.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment point by point below, providing clarifications on the current manuscript and committing to specific revisions that strengthen the work without misrepresenting our contributions.
- Referee: [MDP formulation and simulation setup] The MDP formulation incorporates ATI-derived eavesdropper position and velocity estimates directly into the state without an error model for ATI inaccuracies (e.g., clutter, phase noise, or along-track velocity ambiguity), and no sensitivity analysis quantifies how secrecy rate and policy performance degrade under realistic estimation errors. This assumption is load-bearing for both the outperformance and generalization claims.
  Authors: We agree that robustness to ATI estimation errors is important for the claims. The current manuscript assumes ideal estimates to isolate the performance gains from the joint time/power optimization and DRL policy under perfect sensing, which is a standard initial approach in ISAC studies. However, we will revise the manuscript to include a dedicated sensitivity analysis subsection. This will model realistic ATI errors (e.g., additive Gaussian noise on position/velocity estimates with varying variances) and quantify degradation in worst-case secrecy rate, along with how the learned policy performs under noisy states. We will also add discussion on potential robustness enhancements, such as training the DRL agent with noisy observations.
  Revision: yes
- Referee: [Simulation results] The simulation results omit key details required for assessment: the specific DRL architecture (e.g., actor-critic network type, layer sizes), full constraint formulations in the MDP reward and state transitions, hyperparameter values beyond learning rate and discount factor, and statistical validation (e.g., number of independent runs, confidence intervals, or variance across random seeds).
  Authors: We thank the referee for highlighting these reproducibility issues. The original submission emphasized high-level performance comparisons, but we will expand the simulation setup and results sections in the revised version. We will specify the exact DRL architecture (actor-critic networks with layer sizes and activations), provide the complete MDP reward function and all state transition details including constraints, list all hyperparameters (including batch size, exploration parameters, etc.), and report statistical validation with averages over 10 independent runs, including standard deviations and confidence intervals. These additions will enable full assessment and reproduction.
  Revision: yes
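The statistical validation promised in the second response (averages over 10 independent runs with standard deviations and confidence intervals) could be computed along the following lines. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the choice of a two-sided 95% Student-t interval is an assumption, since the rebuttal does not state the confidence level or the interval type.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% Student-t critical value for 9 degrees of freedom,
# i.e. valid only for exactly 10 independent runs.
T_CRIT_95_DOF9 = 2.262


def summarize_runs(
    values: Sequence[float],
    t_crit: float = T_CRIT_95_DOF9,
) -> Tuple[float, float, Tuple[float, float]]:
    """Return (mean, sample standard deviation, confidence interval) for per-run metrics.

    The caller must pass a t_crit that matches len(values) - 1 degrees of freedom;
    the default only fits 10 runs.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    half_width = t_crit * std / math.sqrt(n)
    return mean, std, (mean - half_width, mean + half_width)
```

Each entry of `values` would be one run's final metric, for instance the average worst-case secrecy rate obtained with a distinct random seed. Reporting the interval alongside the mean lets a reader judge whether the gap to a baseline exceeds seed-to-seed variation.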
Circularity Check
No circularity in derivation chain
The paper derives the MDP directly from the physical JSARC system model (ATI-based eavesdropper estimation, adaptive beamforming, AN jamming, SAR/comms constraints) and applies standard DRL to solve the time/power allocation for worst-case secrecy rate. Simulation-based comparisons to equal-aperture and random baselines, plus generalization tests on unseen motion patterns, are external evaluations against the same model; they do not reduce by construction to a fitted quantity or self-defined input. No self-definitional equations, no fitted parameters renamed as predictions, and no load-bearing self-citations appear in the abstract or described chain. The approach is a conventional model-based RL formulation that remains self-contained against its simulation benchmarks.
Axiom & Free-Parameter Ledger
Free parameters (1)
- DRL hyperparameters, including learning rate and discount factor
Axioms (1)
- Domain assumption: The joint SAR sensing and secure communication dynamics can be accurately modeled as a Markov decision process with states based on eavesdropper estimates.