Robust SAC-Enabled UAV-RIS Assisted Secure MISO Systems With Untrusted EH Receivers
Pith reviewed 2026-05-21 12:47 UTC · model grok-4.3
The pith
Soft actor-critic optimization maximizes worst-case secrecy energy efficiency in UAV-RIS secure MISO systems with imperfect CSI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a soft actor-critic (SAC) reinforcement learning method for the highly non-convex worst-case secrecy energy efficiency (WCSEE) maximization problem in a secure UAV-RIS assisted multiuser MISO system with untrusted energy harvesting receivers. The UAV location, power allocation, and discrete RIS phase shifts are jointly designed under imperfect CSI. Simulations show the method consistently outperforming BCD-SCA and DRL alternatives while remaining stable across system configurations.
What carries the argument
A tailored soft actor-critic (SAC) framework that learns a stochastic policy for joint continuous and discrete action selection to maximize the objective while handling CSI uncertainty.
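The joint continuous and discrete action selection can be illustrated with a minimal sketch. It assumes the policy outputs a tanh-squashed vector in [-1, 1] that is then decoded into a UAV position, per-user powers, and quantized RIS phases. The function name `decode_action`, the action layout, the rectangular flight region, and the sum-power budget are all hypothetical choices made for illustration; the paper's actual action design is not reproduced here.

```python
import numpy as np

def decode_action(a, n_users, n_ris, bits, p_max, xy_bounds):
    """Map a squashed action in [-1, 1] to physical variables (illustrative only)."""
    a = np.clip(np.asarray(a, dtype=float), -1.0, 1.0)
    assert a.size == 2 + n_users + n_ris  # assumed layout: [x, y | powers | phases]
    unit = (a + 1.0) / 2.0  # rescale every entry to [0, 1]

    # UAV horizontal position inside an assumed rectangular region.
    lo = np.asarray(xy_bounds[0], dtype=float)
    hi = np.asarray(xy_bounds[1], dtype=float)
    uav_xy = lo + unit[:2] * (hi - lo)

    # Per-user powers, scaled down only if they exceed the assumed sum budget.
    power = unit[2:2 + n_users] * p_max
    total = power.sum()
    if total > p_max:
        power *= p_max / total

    # Discrete RIS phases: bin each entry into one of 2**bits uniform levels.
    levels = 2 ** bits
    idx = np.minimum((unit[2 + n_users:] * levels).astype(int), levels - 1)
    phases = 2.0 * np.pi * idx / levels
    return uav_xy, power, phases
```

Keeping the policy output continuous and quantizing afterwards is one common way to handle discrete phase shifts with SAC. Whether the paper does this, or uses a different discrete-action treatment, is not stated in the material above.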
If this is right
- The SAC policy enables faster decision making for UAV-RIS configuration compared to iterative optimization methods.
- Higher secrecy energy efficiency can be achieved in practical scenarios with CSI errors.
- The system maintains performance stability when varying the number of users or RIS elements.
- Robustness allows deployment in environments with uncertain channel conditions.
Where Pith is reading between the lines
- This RL-based design could be extended to scenarios with mobile UAVs rather than fixed hovering positions.
- The design could potentially be integrated with other emerging technologies, such as terahertz communications, for enhanced security.
- Future work might explore transfer learning to adapt the policy to new system configurations quickly.
Load-bearing premise
The argument assumes that the simulation scenarios and training procedures used for the SAC agent are representative of practical performance, and that the learned policy generalizes beyond the specific training distributions without overfitting to the chosen channel models or uncertainty bounds.
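One way to probe this premise in simulation is to evaluate a trained policy on channels drawn from a deliberately shifted distribution. The sketch below generates Rician test channels over a grid of K-factors and CSI error radii. The function names, the zero-phase line-of-sight component, and the norm-bounded error model are assumptions for illustration, not the paper's channel model.

```python
import numpy as np

def rician_channel(n, k_factor, rng):
    """n-entry Rician vector with unit average power per entry (LoS phase set to 0)."""
    los = np.ones(n, dtype=complex)
    nlos = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    return np.sqrt(k_factor / (k_factor + 1.0)) * los + np.sqrt(1.0 / (k_factor + 1.0)) * nlos

def bounded_error(n, eps, rng):
    """Random complex error vector whose Euclidean norm is at most eps."""
    e = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    e /= np.linalg.norm(e)
    return e * eps * rng.uniform()

def shifted_test_set(n, k_factors, eps_values, trials, seed=0):
    """Build (estimated, true) channel pairs for every (K-factor, error radius) cell."""
    rng = np.random.default_rng(seed)
    test_set = {}
    for k in k_factors:
        for eps in eps_values:
            pairs = []
            for _ in range(trials):
                h_true = rician_channel(n, k, rng)
                h_est = h_true - bounded_error(n, eps, rng)
                pairs.append((h_est, h_true))
            test_set[(k, eps)] = pairs
    return test_set
```

Choosing grid values outside the range used in training, then feeding `h_est` to the policy and scoring it against `h_true`, would show how quickly performance degrades once the training assumptions no longer hold.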
What would settle it
Deployment of the trained SAC policy in a physical testbed with real UAV and RIS hardware under actual wireless channel conditions with varying CSI uncertainty; superior performance over BCD-SCA in terms of measured secrecy energy efficiency would support the claim, while inferior performance would falsify it.
Original abstract
Secure downlink transmission in UAV-assisted reconfigurable intelligent surface (RIS)-enabled multiuser MISO systems is challenging due to imperfect channel state information (CSI), untrusted energy-harvesting receivers (UEHRs), and the strong coupling among UAV deployment, transmit power control, and RIS configuration. In this paper, we study a secure UAV-assisted RIS-enabled multiuser MISO system with UEHRs, where a hovering UAV-mounted RIS is jointly optimized in terms of its location, transmit power allocation, and discrete RIS phase shifts. The objective is to maximize the worst-case secrecy energy efficiency (WCSEE) under imperfect CSI and practical discrete phase-shift constraints. The resulting problem is highly nonconvex due to the fractional objective, coupled design variables, discrete phase shifts, and CSI uncertainty. To address these challenges, we propose two complementary approaches. First, a block coordinate descent (BCD) framework combined with successive convex approximation (SCA) is developed to solve a secrecy energy efficiency (SEE) formulation, serving as a structured model-based benchmark. Second, for the more general WCSEE problem, we propose a tailored soft actor-critic (SAC) framework that captures the coupling among variables and avoids repeated iterative optimization. Simulation results show that the proposed SAC method consistently outperforms conventional optimization and deep reinforcement learning (DRL)-based benchmarks, including deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), while maintaining robustness to CSI uncertainty and stable performance across system configurations.
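For orientation, a worst-case secrecy energy efficiency problem of the kind described in the abstract is often written in the generic max-min fractional form below. This is a sketch under stated assumptions, not the paper's formulation: the symbols $\mathbf{q}$ (UAV location), $\mathbf{p}$ (power allocation), $\boldsymbol{\theta}$ (RIS phase shifts), $\Delta$ (CSI error), $\mathcal{E}$ (uncertainty set), $L$ (number of phase levels), and $P_{\max}$ (power budget) are introduced here for illustration, and the sum-power constraint is assumed.

```latex
\max_{\mathbf{q},\,\mathbf{p},\,\boldsymbol{\theta}} \;
\min_{\Delta \in \mathcal{E}} \;
\frac{R_{\mathrm{sec}}(\mathbf{q},\mathbf{p},\boldsymbol{\theta};\Delta)}
     {P_{\mathrm{tot}}(\mathbf{p})}
\quad \text{s.t.} \quad
\theta_n \in \Bigl\{0, \tfrac{2\pi}{L}, \ldots, \tfrac{2\pi(L-1)}{L}\Bigr\} \;\; \forall n,
\qquad
\sum_{k} p_k \le P_{\max}.
```

The four sources of non-convexity named in the abstract map onto this form directly: the ratio (fractional objective), the joint dependence of $R_{\mathrm{sec}}$ on all three variable blocks (coupling), the finite phase set (discrete phase shifts), and the inner minimization over $\Delta$ (CSI uncertainty).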
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies secure downlink transmission in a UAV-mounted RIS assisted multiuser MISO system with untrusted energy-harvesting receivers under imperfect CSI. It formulates a worst-case secrecy energy efficiency (WCSEE) maximization problem jointly optimizing UAV location, power allocation, and discrete RIS phase shifts. Two methods are proposed: a BCD-SCA benchmark for a related SEE problem and a tailored soft actor-critic (SAC) reinforcement learning framework for the full WCSEE problem. Simulations claim that SAC consistently outperforms DDPG, TD3, and conventional optimization baselines while remaining robust to CSI uncertainty.
Significance. If the performance claims hold under proper statistical validation, the work offers a scalable RL alternative to iterative convex optimization for non-convex, mixed discrete-continuous problems with uncertainty in UAV-RIS secure communications, which is relevant for practical 6G physical-layer security designs.
major comments (3)
- [§V] (Simulation Results): The reported outperformance of SAC over BCD-SCA, DDPG, and TD3 is presented without standard deviations, confidence intervals, or results aggregated over multiple independent training seeds. This omission prevents assessment of whether the gains are statistically reliable or sensitive to random initialization.
- [§V] (Simulation Results): All evaluated channel realizations and uncertainty bounds match the training distribution (Rician fading with fixed error variance and discrete phases from the same set). No distribution-shift experiments or out-of-sample CSI tests are included, weakening the robustness-to-CSI-uncertainty claim.
- [§IV] (SAC Formulation): The state-action-reward design for the WCSEE objective is described at a high level; it is unclear how the worst-case secrecy rate under bounded CSI errors is encoded in the reward without introducing excessive conservatism or requiring additional inner optimization loops.
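The statistical reporting requested in the first major comment can be sketched as follows: aggregate one final score per independent training seed into a mean, a sample standard deviation, and a 95% confidence interval based on the Student t distribution. The function name `summarize` and the small lookup table of critical values are choices made for this sketch; any per-seed numbers passed in are the user's own.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_975 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def summarize(per_seed_scores):
    """Return (mean, sample std, (ci_low, ci_high)) across independent seeds."""
    n = len(per_seed_scores)
    mean = statistics.fmean(per_seed_scores)
    std = statistics.stdev(per_seed_scores)  # sample std, divides by n - 1
    half_width = T_975[n - 1] * std / math.sqrt(n)
    return mean, std, (mean - half_width, mean + half_width)
```

With few seeds the t-based interval is noticeably wider than a normal-approximation interval, which is the relevant regime for RL comparisons where each seed is an expensive training run.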
minor comments (2)
- [§II] Notation for the uncertainty set and the discrete phase-shift constraint should be introduced earlier and used consistently in the problem formulation.
- [§V] Figure captions for the convergence and performance plots would benefit from explicit mention of the number of Monte-Carlo trials and the exact parameter settings used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate to strengthen the presentation and claims.
- Referee: [§V] (Simulation Results): The reported outperformance of SAC over BCD-SCA, DDPG, and TD3 is presented without standard deviations, confidence intervals, or results aggregated over multiple independent training seeds. This omission prevents assessment of whether the gains are statistically reliable or sensitive to random initialization.
  Authors: We agree that the current results would benefit from explicit statistical validation. In the revised manuscript, we will report performance metrics averaged over at least 10 independent training seeds with different random initializations, including standard deviations and 95% confidence intervals in the figures and tables of Section V. This will allow readers to evaluate the reliability of the observed gains.
  revision: yes
- Referee: [§V] (Simulation Results): All evaluated channel realizations and uncertainty bounds match the training distribution (Rician fading with fixed error variance and discrete phases from the same set). No distribution-shift experiments or out-of-sample CSI tests are included, weakening the robustness-to-CSI-uncertainty claim.
  Authors: We acknowledge that the primary simulation setup uses channel realizations drawn from the same distribution employed during training. To better substantiate the robustness claim, the revised version will incorporate additional experiments with distribution shifts, such as varying Rician K-factors, increased CSI error bounds beyond the training range, and altered phase-shift quantization levels. These will be presented in an extended subsection of Section V.
  revision: yes
- Referee: [§IV] (SAC Formulation): The state-action-reward design for the WCSEE objective is described at a high level; it is unclear how the worst-case secrecy rate under bounded CSI errors is encoded in the reward without introducing excessive conservatism or requiring additional inner optimization loops.
  Authors: We appreciate this request for clarification. In the revised manuscript, we will expand Section IV with a detailed description of the reward function. The worst-case secrecy rate is incorporated by evaluating a conservative lower bound on the secrecy rate using the bounded CSI error model (worst-case legitimate channel gain and best-case eavesdropper gains within the uncertainty set) directly in the reward computation. This is achieved via closed-form approximations derived from the uncertainty bounds, avoiding nested optimization loops while controlling conservatism through a tunable robustness parameter. The explicit reward expression and design rationale will be added.
  revision: yes
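The reward construction described in the third response can be illustrated with a closed-form bound. For an estimated channel with a norm-bounded error of radius eps, the Cauchy-Schwarz inequality gives that the true effective amplitude lies within eps times the beamformer norm of the estimated one, so a pessimistic legitimate gain and an optimistic eavesdropper gain need no inner optimization. The sketch below assumes a single data stream with no multiuser interference, a simple transmit-plus-circuit power model, and a scalar `rho` that scales the error radius. None of these details, nor the function name, are taken from the paper.

```python
import numpy as np

def worst_case_see(h_hat, g_hats, w, eps_h, eps_g, noise, p_circuit, rho=1.0):
    """Conservative secrecy-energy-efficiency reward (illustrative, single stream)."""
    w_norm = np.linalg.norm(w)
    # Cauchy-Schwarz: | |(h + d)^H w| - |h^H w| | <= ||d|| * ||w||.
    legit_amp = max(abs(np.vdot(h_hat, w)) - rho * eps_h * w_norm, 0.0)
    # Strongest eavesdropper under its most favourable error realization.
    eve_amp = max(abs(np.vdot(g, w)) + rho * eps_g * w_norm for g in g_hats)
    rate_legit = np.log2(1.0 + legit_amp ** 2 / noise)
    rate_eve = np.log2(1.0 + eve_amp ** 2 / noise)
    secrecy = max(rate_legit - rate_eve, 0.0)
    # Assumed power model: radiated power plus a fixed circuit term.
    return secrecy / (w_norm ** 2 + p_circuit)
```

Setting `rho=0` recovers the nominal (non-robust) reward, and values between 0 and 1 trade protection against conservatism. This is one plausible reading of the "tunable robustness parameter" the authors mention.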
Circularity Check
No circularity: claims rest on simulation comparisons without reduction to fitted inputs or self-citations
full rationale
The paper presents a BCD+SCA benchmark and a SAC RL framework for WCSEE maximization under CSI uncertainty and discrete phases. The central results are empirical outperformance in simulations against DDPG, TD3, and conventional methods. No equations or sections reduce a claimed prediction to a fitted parameter by construction, nor do they rely on load-bearing self-citations or imported uniqueness theorems. The derivation chain for the optimization problem and the RL policy is self-contained against external benchmarks, with performance evaluated on standard Rician fading and uncertainty models.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We propose a soft actor-critic (SAC)-based deep reinforcement learning framework that learns WCSEE-maximizing policies through interaction with the wireless environment."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.