UAV Access Point Placement for Connectivity to a User with Unknown Location Using Deep RL
Pith reviewed 2026-05-25 00:38 UTC · model grok-4.3
The pith
Deep reinforcement learning places a UAV to reach target SINR for a ground user whose location is unknown, using only measurements and a 3D map.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A deep RL agent trained to select discrete UAV movements converges to placements that satisfy a prescribed SINR threshold for a user at an unknown location, using instantaneous SINR observations together with a static 3D topology map. Under a cap on the number of placement steps, the agent reaches the target in 90 percent of ray-tracing trials, and the authors state that the method is designed to operate in any urban environment.
What carries the argument
Deep RL agent that maps sequences of SINR values and 3D map features to UAV displacement actions in order to maximize link quality.
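The measure-move-check loop described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the grid action set, the log-distance `toy_sinr_db` function, and `placeholder_policy` (a hand-written heuristic standing in for the trained network) are all hypothetical, and the 3D map input is omitted entirely.

```python
import math
from typing import Callable, List, Tuple

Position = Tuple[int, int]

# Hypothetical discrete action set: unit moves on a horizontal grid.
ACTIONS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}
ORDER = list(ACTIONS)  # fixed rotation order used by the placeholder policy


def toy_sinr_db(uav: Position, user: Position) -> float:
    """Stand-in measurement: SINR decays with log-distance, no blockage."""
    distance = math.hypot(uav[0] - user[0], uav[1] - user[1])
    return 30.0 - 20.0 * math.log10(1.0 + distance)


def placeholder_policy(sinr_history: List[float], action_history: List[str]) -> str:
    """NOT the learned policy: keep heading while SINR improves, else turn."""
    if not action_history:
        return ORDER[0]
    last = action_history[-1]
    if sinr_history[-1] >= sinr_history[-2]:
        return last
    return ORDER[(ORDER.index(last) + 1) % len(ORDER)]


def run_episode(
    policy: Callable[[List[float], List[str]], str],
    measure: Callable[[Position], float],
    start: Position,
    target_db: float,
    max_steps: int,
) -> Tuple[bool, int, Position]:
    """Move until the target SINR is met or the step budget is exhausted."""
    position = start
    sinr_history = [measure(position)]
    action_history: List[str] = []
    while sinr_history[-1] < target_db and len(action_history) < max_steps:
        action = policy(sinr_history, action_history)
        dx, dy = ACTIONS[action]
        position = (position[0] + dx, position[1] + dy)
        action_history.append(action)
        sinr_history.append(measure(position))
    return sinr_history[-1] >= target_db, len(action_history), position


# The user position is hidden from the policy; only SINR readings are exposed.
hidden_user = (3, 2)
result = run_episode(
    placeholder_policy, lambda p: toy_sinr_db(p, hidden_user), (0, 0), 23.0, 10
)
```

The policy only ever sees the SINR and action histories, which mirrors the claim that no user location is needed. In the paper's setting the heuristic would be replaced by the trained network and the toy measurement by ray-traced or real SINR values.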
If this is right
- UAV access points can be commissioned without first locating or tracking ground users.
- The same trained policy applies to new city maps without additional training.
- Blockage and scattering are handled implicitly through the 3D map input rather than explicit channel equations.
- Convergence occurs reliably when the number of allowed moves is bounded.
Where Pith is reading between the lines
- The approach could be combined with energy or flight-time limits by adding those quantities to the reward signal.
- Extending the state to include multiple simultaneous users would allow one UAV to serve small clusters without changing the core learning loop.
- If the 3D map is updated on the fly from onboard sensors, the method could adapt to construction or seasonal foliage changes.
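The first extension above, folding energy into the reward, could look roughly like the following. The paper's actual reward function is not given in this summary, so every term here is an assumption: the SINR-gain term, the `energy_weight` penalty, and the `success_bonus` are hypothetical names and values chosen only to show the shape of the idea.

```python
def shaped_reward(
    sinr_db: float,
    prev_sinr_db: float,
    target_db: float,
    move_energy_j: float,
    energy_weight: float = 0.01,  # hypothetical trade-off coefficient
    success_bonus: float = 10.0,  # hypothetical terminal bonus
) -> float:
    """Reward SINR improvement, penalise energy spent on the move."""
    reward = (sinr_db - prev_sinr_db) - energy_weight * move_energy_j
    if sinr_db >= target_db:
        reward += success_bonus  # target SINR reached on this step
    return reward
```

A larger `energy_weight` would push the agent toward shorter paths at the cost of a lower success rate under a fixed step budget; the right balance would have to be found empirically.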
Load-bearing premise
Ray-tracing simulations reproduce the radio behavior of real urban environments closely enough that policies learned inside them transfer without further real-world measurements.
What would settle it
Deploy the learned policy on a physical UAV in a real city block and record whether it reaches the target SINR for a hidden user inside the allowed number of moves.
Original abstract
In recent years, unmanned aerial vehicles (UAVs) have been considered for telecommunications purposes as relays, caches, or IoT data collectors. In addition to being easy to deploy, their maneuverability allows them to adjust their location to optimize the capacity of the link to the user equipment on the ground or of the link to the basestation. The majority of the previous work that analyzes the optimal placement of such a UAV makes at least one of two assumptions: the channel can be predicted using a simple model or the locations of the users on the ground are known. In this paper, we use deep reinforcement learning (deep RL) to optimally place a UAV serving a ground user in an urban environment, without the previous knowledge of the channel or user location. Our algorithm relies on signal-to-interference-plus-noise ratio (SINR) measurements and a 3D map of the topology to account for blockage and scatterers. Furthermore, it is designed to operate in any urban environment. Results in conditions simulated by a ray tracing software show that with the constraint on the maximum number of iterations our algorithm has a 90% success rate in converging to a target SINR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a deep reinforcement learning algorithm to position a UAV serving a ground user of unknown location in an urban environment. The approach uses only SINR feedback and a 3D topology map to handle blockage and scattering without assuming a known channel model or user position. Ray-tracing simulations are reported to yield a 90% success rate in reaching a target SINR within a fixed iteration budget.
Significance. If the empirical result holds under the stated conditions, the work demonstrates that model-free deep RL can achieve reliable UAV placement in complex propagation environments where analytic channel models or localization are unavailable. This addresses a practical gap in prior UAV placement literature and supports deployment in arbitrary urban settings given only a map and SINR observations.
major comments (3)
- [Abstract and §4] Abstract and §4 (Simulation Results): the central claim of a 90% success rate is presented without stating the number of Monte Carlo trials, the distribution of user locations, the precise target SINR value, or the iteration budget, preventing assessment of statistical reliability and reproducibility of the reported performance.
- [§3 and §4] §3 (RL Formulation) and §4: the state representation, neural-network architecture, reward function, and training hyperparameters are described at a high level only; without these details the simulation result cannot be independently verified or extended.
- [§4 and §5] §4 and §5 (Discussion): no sensitivity study is provided on map registration error, building-material uncertainty, or dynamic scatterers, which directly affects the load-bearing assumption that the learned policy generalizes from the ray-tracing engine to the claimed “any urban environment.”
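The first major comment turns on how much a bare "90%" says without a trial count. A short sketch of a Wilson score interval makes the dependence concrete. The trial counts used below are purely hypothetical; the paper's actual number of Monte Carlo trials is not reported here.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
        / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical trial counts, all with a 90% observed success rate.
for n in (10, 100, 1000):
    low, high = wilson_interval(round(0.9 * n), n)
    print(f"{n:5d} trials: [{low:.3f}, {high:.3f}]")
```

The interval narrows as the trial count grows, which is why the same headline figure can be weak or strong evidence depending on a number the manuscript does not state.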
minor comments (2)
- [Notation] Notation for the SINR threshold and iteration limit should be introduced once and used consistently across text, equations, and figures.
- [§4] Figure captions in §4 would benefit from explicit mention of the ray-tracing parameters (frequency, materials, diffraction model) used in each scenario.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity, reproducibility, and discussion of limitations.
- Referee: [Abstract and §4] Abstract and §4 (Simulation Results): the central claim of a 90% success rate is presented without stating the number of Monte Carlo trials, the distribution of user locations, the precise target SINR value, or the iteration budget, preventing assessment of statistical reliability and reproducibility of the reported performance.
  Authors: We agree that these parameters are essential for assessing statistical reliability. In the revised version, we will explicitly state the number of Monte Carlo trials performed, the distribution used for sampling user locations, the exact target SINR threshold, and the maximum iteration budget. These details were used in generating the reported 90% success rate but were omitted in the original submission for conciseness.
  Revision: yes
- Referee: [§3 and §4] §3 (RL Formulation) and §4: the state representation, neural-network architecture, reward function, and training hyperparameters are described at a high level only; without these details the simulation result cannot be independently verified or extended.
  Authors: We acknowledge that the current descriptions are insufficient for independent verification. The revised manuscript will expand §3 to provide the precise state vector definition, the full neural network architecture (including layer sizes and activation functions), the mathematical form of the reward function, and all training hyperparameters such as learning rate, batch size, and episode length.
  Revision: yes
- Referee: [§4 and §5] §4 and §5 (Discussion): no sensitivity study is provided on map registration error, building-material uncertainty, or dynamic scatterers, which directly affects the load-bearing assumption that the learned policy generalizes from the ray-tracing engine to the claimed “any urban environment.”
  Authors: The referee correctly identifies a gap in the robustness analysis. While a full sensitivity study was not performed due to computational constraints, we will revise §5 to include an explicit discussion of these limitations, clarifying that the policy was trained and evaluated under the specific ray-tracing assumptions and may not directly generalize without retraining when map or material errors are present. We will also add a note on potential extensions for dynamic environments.
  Revision: partial
Circularity Check
No circularity; results are independent simulation outcomes.
The paper describes a deep RL policy for UAV placement that receives SINR feedback and a 3D map, then reports an empirical 90% success rate under a fixed iteration budget inside a ray-tracing engine. No equations, fitted parameters, or self-citations are presented that reduce the reported success rate to a definitional identity or to the training data by construction. The central claim is an observed performance statistic from separate simulation trials, not a renaming, ansatz, or self-referential prediction. This is the normal case of an empirical RL paper whose validity rests on external simulation fidelity rather than internal definitional closure.