UAV Access Point Placement for Connectivity to a User with Unknown Location Using Deep RL

Danijela Cabric; Enes Krijestorac; Samer Hanna

arXiv: 1907.03912 · v2 · pith:Z42G4HT7new · submitted 2019-07-09 · 📡 eess.SP · cs.SY · eess.SY

Pith reviewed 2026-05-25 00:38 UTC · model grok-4.3

classification 📡 eess.SP · cs.SY · eess.SY

keywords UAV placement · deep reinforcement learning · SINR · unknown user location · urban environment · ray tracing · access point positioning

The pith

Deep reinforcement learning places a UAV to reach target SINR for a ground user whose location is unknown, using only measurements and a 3D map.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a deep RL controller can move a UAV access point in an urban setting to serve a ground user even when the user's position and the radio channel are both unknown ahead of time. The controller receives SINR measurements at each step and uses a 3D city map to account for buildings that block or scatter the signal. A reader would care because this removes the usual requirements for user tracking hardware or detailed channel models, letting a UAV be deployed on demand. Ray-tracing simulations indicate the method reaches the target SINR in 90 percent of trials when limited to a maximum number of moves. The algorithm is designed to operate in any urban environment, although the reported results come from ray-tracing simulation only.

Core claim

A deep RL agent, trained to select discrete UAV movements, converges to placements that satisfy a prescribed SINR threshold for an unknown user location by using instantaneous SINR observations together with a static 3D topology map. Under a cap on the number of placement steps, the agent succeeds in 90 percent of ray-tracing trials, and the algorithm is designed to operate in any urban environment.

What carries the argument

Deep RL agent that maps sequences of SINR values and 3D map features to UAV displacement actions in order to maximize link quality.
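The control loop this describes can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the grid world, the distance-based `toy_sinr_db` stand-in for ray tracing, the hill-climbing policy used in place of a trained deep RL agent, and every name and number are assumptions made for the example. The 3D map input is left out to keep the sketch short.

```python
import math

# Hypothetical discrete action set: one grid cell per move.
ACTIONS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def toy_sinr_db(uav, user):
    """Stand-in SINR in dB that decays with distance (not a real channel model)."""
    dist = math.hypot(uav[0] - user[0], uav[1] - user[1])
    return 30.0 - 20.0 * math.log10(1.0 + dist)

def make_hill_climb_policy():
    """Stand-in for the learned policy: keep direction while SINR improves."""
    order = list(ACTIONS)
    state = {"idx": 0}

    def policy(uav, history):
        # The policy sees only its own position and past SINR values,
        # never the user's location.
        if len(history) >= 2 and history[-1] <= history[-2]:
            state["idx"] = (state["idx"] + 1) % len(order)
        return order[state["idx"]]

    return policy

def run_episode(policy, start, user, target_db, max_steps):
    """Move until the SINR target is met or the step budget runs out.

    Returns (success, moves_made, final_position).
    """
    uav = start
    sinr = toy_sinr_db(uav, user)
    history = [sinr]
    for step in range(max_steps):
        if sinr >= target_db:
            return True, step, uav
        dx, dy = ACTIONS[policy(uav, history)]
        uav = (uav[0] + dx, uav[1] + dy)
        sinr = toy_sinr_db(uav, user)
        history.append(sinr)
    return sinr >= target_db, max_steps, uav

if __name__ == "__main__":
    print(run_episode(make_hill_climb_policy(), (0, 0), (3, 4), 20.0, 20))
```

The step budget is what turns "eventually finds the user" into a measurable success rate: the same episode that succeeds with 20 allowed moves fails with 3.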

If this is right

UAV access points can be commissioned without first locating or tracking ground users.
The same trained policy applies to new city maps without additional training.
Blockage and scattering are handled implicitly through the 3D map input rather than explicit channel equations.
Convergence occurs reliably when the number of allowed moves is bounded.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be combined with energy or flight-time limits by adding those quantities to the reward signal.
Extending the state to include multiple simultaneous users would allow one UAV to serve small clusters without changing the core learning loop.
If the 3D map is updated on the fly from onboard sensors, the method could adapt to construction or seasonal foliage changes.
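The first of these extensions, folding an energy term into the reward, could look roughly like the sketch below. The source does not state the paper's reward function, so the form used here (SINR improvement, minus a weighted energy cost, plus a bonus on reaching the target) and all parameter names and default values are hypothetical.

```python
def shaped_reward(sinr_db, prev_sinr_db, target_db, move_energy_j,
                  energy_weight=0.01, success_bonus=10.0):
    """Hypothetical per-step reward with an energy penalty.

    sinr_db / prev_sinr_db: SINR after and before the move, in dB.
    move_energy_j: energy spent on the move, in joules (assumed known).
    """
    reward = (sinr_db - prev_sinr_db) - energy_weight * move_energy_j
    if sinr_db >= target_db:
        reward += success_bonus  # one-off bonus for meeting the SINR target
    return reward

if __name__ == "__main__":
    print(shaped_reward(18.0, 16.0, 20.0, 100.0))  # move that improves SINR but misses target
    print(shaped_reward(21.0, 19.0, 20.0, 100.0))  # move that reaches the target
```

The weight on the energy term sets the trade-off: a larger `energy_weight` would push a learned policy toward shorter trajectories at the cost of accepting lower SINR gains per move.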

Load-bearing premise

Ray-tracing simulations reproduce the radio behavior of real urban environments closely enough that policies learned inside them transfer without further real-world measurements.

What would settle it

Deploy the learned policy on a physical UAV in a real city block and record whether it reaches the target SINR for a hidden user inside the allowed number of moves.

Figures

Figures reproduced from arXiv: 1907.03912 by Danijela Cabric, Enes Krijestorac, Samer Hanna.

**Figure 2.** At the input, there are two 2D arrays, corresponding …

**Figure 4.** The testing urban environment. Approximate size: 400x500 m.

**Figure 5.** The training results for the proposed model and blind model that does …

**Figure 6.** Successful trajectories of the UAV moving according to our algorithm.

**Figure 7.** The CDF of the number of steps until convergence to the sufficient …

Original abstract

In recent years, unmanned aerial vehicles (UAVs) have been considered for telecommunications purposes as relays, caches, or IoT data collectors. In addition to being easy to deploy, their maneuverability allows them to adjust their location to optimize the capacity of the link to the user equipment on the ground or of the link to the basestation. The majority of the previous work that analyzes the optimal placement of such a UAV makes at least one of two assumptions: the channel can be predicted using a simple model or the locations of the users on the ground are known. In this paper, we use deep reinforcement learning (deep RL) to optimally place a UAV serving a ground user in an urban environment, without the previous knowledge of the channel or user location. Our algorithm relies on signal-to-interference-plus-noise ratio (SINR) measurements and a 3D map of the topology to account for blockage and scatterers. Furthermore, it is designed to operate in any urban environment. Results in conditions simulated by a ray tracing software show that with the constraint on the maximum number of iterations our algorithm has a 90% success rate in converging to a target SINR.
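For reference, the quantity the abstract is built around can be written out. The SINR definition below is the standard one; the success condition is a reading of the abstract's "converging to a target SINR" under "the constraint on the maximum number of iterations". The symbols $P_{\mathrm{rx}}$, $I$, $N$, $\gamma$, and $T_{\max}$ are notation assumed here, not taken from the paper.

```latex
% Standard SINR definition (linear scale and dB)
\mathrm{SINR} = \frac{P_{\mathrm{rx}}}{I + N},
\qquad
\mathrm{SINR}_{\mathrm{dB}} = 10 \log_{10} \mathrm{SINR}

% A trial counts as a success if the target \gamma is reached
% within the iteration budget T_{\max}
\exists\, t \le T_{\max} : \ \mathrm{SINR}_{\mathrm{dB}}(t) \ge \gamma
```

Here $P_{\mathrm{rx}}$ is the received signal power, $I$ the interference power, and $N$ the noise power, all in linear units.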

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Deep RL places a UAV using only SINR feedback and a 3D map, hitting 90% success in ray-tracing sims but without any real-channel checks.

The paper's main contribution is a deep RL policy that moves a UAV to serve a ground user whose location is unknown, using SINR measurements and a 3D city map to handle blockages. It drops the usual assumptions of known positions or simple path-loss formulas and reports that the policy reaches the target SINR in 90% of trials within a fixed iteration budget inside a ray-tracing simulator. The setup is designed to work across different urban layouts without extra calibration. That is the concrete advance: a method that operates with minimal prior information.

The simulation results give a clear, quantified demonstration that the agent can learn effective placements from the map and feedback alone. The approach is straightforward to implement in the described environment and addresses a practical gap in UAV deployment work.

The central limitation is that every number comes from the same ray-tracing engine. No measured channel data or hardware test is shown, so the 90% figure could shift with map registration errors, dynamic scatterers, or different building materials. The abstract leaves the exact network architecture and training hyperparameters implicit, though the full text presumably supplies them. Minor questions also remain about how the state is encoded from the map and how the reward is shaped, but those are typical for this style of paper.

This work is aimed at researchers applying RL to wireless network problems rather than theorists. Readers who need a working example of SINR-driven UAV control in urban settings will find usable details. It is worth sending for peer review; the simulation evidence is reproducible enough to support referee discussion on generalization and implementation choices.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a deep reinforcement learning algorithm to position a UAV serving a ground user of unknown location in an urban environment. The approach uses only SINR feedback and a 3D topology map to handle blockage and scattering without assuming a known channel model or user position. Ray-tracing simulations are reported to yield a 90% success rate in reaching a target SINR within a fixed iteration budget.

Significance. If the empirical result holds under the stated conditions, the work demonstrates that model-free deep RL can achieve reliable UAV placement in complex propagation environments where analytic channel models or localization are unavailable. This addresses a practical gap in prior UAV placement literature and supports deployment in arbitrary urban settings given only a map and SINR observations.

major comments (3)

[Abstract and §4 (Simulation Results)] The central claim of a 90% success rate is presented without stating the number of Monte Carlo trials, the distribution of user locations, the precise target SINR value, or the iteration budget, preventing assessment of statistical reliability and reproducibility of the reported performance.
[§3 (RL Formulation) and §4] The state representation, neural-network architecture, reward function, and training hyperparameters are described at a high level only; without these details the simulation result cannot be independently verified or extended.
[§4 and §5 (Discussion)] No sensitivity study is provided on map registration error, building-material uncertainty, or dynamic scatterers, which directly affects the load-bearing assumption that the learned policy generalizes from the ray-tracing engine to the claimed “any urban environment.”
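The first comment's point about trial counts can be made concrete with a confidence interval on the success rate. The sketch below uses the Wilson score interval for a binomial proportion; the trial counts (100 and 1000) are hypothetical, chosen only to show how the uncertainty around a 90% figure depends on a number the abstract does not report.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return center - half, center + half

if __name__ == "__main__":
    # Hypothetical trial counts; the same 90% point estimate in both cases.
    for n in (100, 1000):
        low, high = wilson_interval(int(0.9 * n), n)
        print(f"n={n}: [{low:.3f}, {high:.3f}]")
```

With the smaller hypothetical count the interval is roughly three times wider, which is why the same headline percentage can carry quite different evidential weight.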

minor comments (2)

[Notation] The SINR threshold and iteration limit should each be given a symbol once and used consistently across text, equations, and figures.
[§4] Figure captions would benefit from explicit mention of the ray-tracing parameters (frequency, materials, diffraction model) used in each scenario.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity, reproducibility, and discussion of limitations.

Referee: [Abstract and §4 (Simulation Results)] The central claim of a 90% success rate is presented without stating the number of Monte Carlo trials, the distribution of user locations, the precise target SINR value, or the iteration budget, preventing assessment of statistical reliability and reproducibility of the reported performance.

Authors: We agree that these parameters are essential for assessing statistical reliability. In the revised version, we will explicitly state the number of Monte Carlo trials performed, the distribution used for sampling user locations, the exact target SINR threshold, and the maximum iteration budget. These details were used in generating the reported 90% success rate but were omitted in the original submission for conciseness.

Revision: yes

Referee: [§3 (RL Formulation) and §4] The state representation, neural-network architecture, reward function, and training hyperparameters are described at a high level only; without these details the simulation result cannot be independently verified or extended.

Authors: We acknowledge that the current descriptions are insufficient for independent verification. The revised manuscript will expand §3 to provide the precise state vector definition, the full neural network architecture (including layer sizes and activation functions), the mathematical form of the reward function, and all training hyperparameters such as learning rate, batch size, and episode length.

Revision: yes

Referee: [§4 and §5 (Discussion)] No sensitivity study is provided on map registration error, building-material uncertainty, or dynamic scatterers, which directly affects the load-bearing assumption that the learned policy generalizes from the ray-tracing engine to the claimed “any urban environment.”

Authors: The referee correctly identifies a gap in the robustness analysis. While a full sensitivity study was not performed due to computational constraints, we will revise §5 to include an explicit discussion of these limitations, clarifying that the policy was trained and evaluated under the specific ray-tracing assumptions and may not directly generalize without retraining when map or material errors are present. We will also add a note on potential extensions for dynamic environments.

Revision: partial

Circularity Check

0 steps flagged

No circularity; results are independent simulation outcomes.

The paper describes a deep RL policy for UAV placement that receives SINR feedback and a 3D map, then reports an empirical 90% success rate under a fixed iteration budget inside a ray-tracing engine. No equations, fitted parameters, or self-citations are presented that reduce the reported success rate to a definitional identity or to the training data by construction. The central claim is an observed performance statistic from separate simulation trials, not a renaming, ansatz, or self-referential prediction. This is the normal case of an empirical RL paper whose validity rests on external simulation fidelity rather than internal definitional closure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no specific free parameters, axioms, or invented entities identifiable. The deep RL likely involves standard hyperparameters such as learning rate but these are not detailed.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 4 internal anchors

[1] Y. Li and L. Cai, “UAV-assisted dynamic coverage in a heterogeneous cellular system,” IEEE Network, vol. 31, no. 4, pp. 56–61, 2017.

[2] Q. Wu, J. Xu, and R. Zhang, “UAV-enabled aerial base station (BS) III/III: Capacity characterization of UAV-enabled two-user broadcast channel,” arXiv preprint arXiv:1801.00443, 2018.

[3] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Efficient deployment of multiple unmanned aerial vehicles for optimal wireless coverage,” IEEE Communications Letters, vol. 20, no. 8, pp. 1647–1650, 2016.

[4] H. Wang, G. Ren, J. Chen, G. Ding, and Y. Yang, “Unmanned aerial vehicle-aided communications: Joint transmit power and trajectory optimization,” IEEE Wireless Communications Letters, vol. 7, no. 4, pp. 522–525, 2018.

[5] Y. Jin, Y. D. Zhang, and B. K. Chalise, “Joint optimization of relay position and power allocation in cooperative broadcast wireless networks,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2493–2496, Mar. 2012.

[6] Y. Zeng, R. Zhang, and T. J. Lim, “Throughput Maximization for UAV-Enabled Mobile Relaying Systems,” IEEE Transactions on Communications, vol. 64, pp. 4983–4996, Dec. 2016.

[7] A. Muralidharan and Y. Mostofi, “Path planning for a connectivity seeking robot,” in 2017 IEEE Globecom Workshops (GC Wkshps), pp. 1–6, IEEE, 2017.

[8] P. Ladosz, H. Oh, and W.-H. Chen, “Optimal positioning of communication relay unmanned aerial vehicles in urban environments,” in 2016 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 1140–1147, IEEE, 2016.

[9] J. Chen and D. Gesbert, “Optimal positioning of flying relays for wireless networks: A LOS map approach,” in 2017 IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2017.

[10] O. Esrafilian, R. Gangula, and D. Gesbert, “Learning to communicate in UAV-aided wireless networks: Map-based approaches,” IEEE Internet of Things Journal, 2018.

[11] O. Esrafilian, R. Gangula, and D. Gesbert, “UAV-relay placement with unknown user locations and channel parameters,” in 2018 52nd Asilomar Conference on Signals, Systems, and Computers, pp. 1075–1079, IEEE, 2018.

[12] X. Liu, Y. Liu, and Y. Chen, “Deployment and movement for multiple aerial base stations by reinforcement learning,” in 2018 IEEE Globecom Workshops (GC Wkshps), pp. 1–6, IEEE, 2018.

[13] M. M. U. Chowdhury, F. Erden, and I. Guvenc, “RSS-based Q-learning for indoor UAV navigation,” arXiv preprint arXiv:1905.13406, 2019.

[14] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[15] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016.

[16] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581, 2015.

[17] “Wireless EM propagation software - Wireless InSite.” URL: https://www.remcom.com/wireless-insite-em-propagation-software
