arxiv: 2605.05240 · v1 · submitted 2026-05-03 · 📡 eess.SP · cs.AI

Recognition: unknown

PPO-Based Dynamic Positioning of HAPS-BS in Wind-Disturbed Stratospheric Maritime Networks

Azim Akhtarshenas, German Svistunov, Matteo Bernabè, Kuangyu Zheng, David López-Pérez


Pith reviewed 2026-05-09 16:16 UTC · model grok-4.3

classification 📡 eess.SP cs.AI

keywords HAPS · dynamic positioning · PPO · reinforcement learning · maritime networks · wind disturbances · wireless coverage · stratospheric platforms


The pith

A PPO reinforcement learning agent coordinates multiple high-altitude platforms to hold position against wind and serve moving ships.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep reinforcement learning framework in which a centralized PPO agent on one coordinator HAPS adjusts the positions of several serving HAPS. The agent uses radio measurements and network feedback to counteract stratospheric wind drift and ship mobility. If the learned policies work, they deliver stable wide-area wireless coverage and higher throughput over open ocean where no terrestrial base stations exist. Simulations are used to show that positioning deviations shrink and connectivity remains reliable under realistic wind and mobility conditions.

Core claim

The central claim is that a Proximal Policy Optimization algorithm inside a centralized DRL controller, trained on radio and network feedback, produces positioning actions that reduce wind-induced deviations of HAPS base stations and thereby maintain reliable coverage and throughput for mobile maritime users.

What carries the argument

A centralized PPO-based deep reinforcement learning agent that maps radio measurements and network state into coordinated positioning commands for multiple wind-disturbed HAPS.
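The review does not reproduce the update rule, so for orientation: PPO [25] maximizes a clipped surrogate objective in which the probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) is clipped to [1 - eps, 1 + eps]. Below is a minimal pure-Python sketch of that objective for a batch of positioning actions. The function name is hypothetical, and `clip_eps=0.2` is the value commonly used in [25], not a hyperparameter reported for this paper.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Batch mean of the PPO clipped surrogate objective (to be maximized).

    Hypothetical helper. new_logp / old_logp are log-probabilities of the
    actions actually taken (e.g. HAPS displacement commands) under the
    current policy and under the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t = pi_new / pi_old
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: taking the smaller term removes any incentive to
        # push the ratio outside [1 - eps, 1 + eps] when doing so would help.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

For example, a ratio of 2 with advantage +1 contributes only 1.2, while the same ratio with advantage -1 contributes the full -2. In a complete trainer this term is combined with a value-function loss (and usually an entropy bonus), and the advantages come from an estimator such as GAE; those parts are omitted here.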

If this is right

Wind-induced positioning errors decrease enough to preserve continuous coverage for moving ships.
System throughput stays higher than with static or non-learning positioning under the same disturbances.
A single coordinator HAPS can manage multiple serving platforms without requiring direct ground control.
Coverage extends reliably into maritime regions that lack terrestrial infrastructure.
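Coverage and throughput claims of this kind ultimately rest on a per-ship SINR computed from the power received from each HAPS. A minimal sketch, assuming all serving HAPS share one band (so every non-serving HAPS interferes) and using the Shannon bound log2(1 + SINR) as the spectral-efficiency proxy; the function names are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def sinr_db(rx_power_w: Sequence[float], serving: int, noise_w: float) -> float:
    """SINR in dB for one ship.

    rx_power_w[i] is the power received from HAPS i; every HAPS other than
    the serving one is treated as co-channel interference (assumption).
    """
    signal = rx_power_w[serving]
    interference = sum(p for i, p in enumerate(rx_power_w) if i != serving)
    return 10.0 * math.log10(signal / (interference + noise_w))


def spectral_efficiency(sinr_db_value: float) -> float:
    """Shannon-bound spectral efficiency in bit/s/Hz: log2(1 + SINR)."""
    return math.log2(1.0 + 10.0 ** (sinr_db_value / 10.0))
```

When wind pushes a serving HAPS away from its ships, its term in the numerator falls while neighbouring HAPS may contribute more interference, which is why positioning errors show up directly in coverage and throughput.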

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same centralized learning structure could be tested with real-time wind forecasts fed as additional observations.
If the agent runs on actual HAPS hardware, energy and compute limits would become the next practical constraint to measure.
The framework might apply to other high-altitude vehicles such as solar-powered aircraft once their dynamics replace the current wind model.
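To make the first extension concrete, a wind forecast could simply be appended to the agent's observation vector. The sketch below is hypothetical throughout: the function name, the layout (per-ship SINR, ship and HAPS coordinates, then forecast wind vectors), and the normalization constants are assumptions for illustration, not the paper's state design.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def build_observation(sinr_db: Sequence[float],
                      ship_xy_km: Sequence[Vec2],
                      haps_xy_km: Sequence[Vec2],
                      wind_forecast_ms: Sequence[Vec2],
                      area_km: float = 100.0,
                      max_wind_ms: float = 40.0) -> List[float]:
    """Flatten radio, position, and forecast data into one policy input.

    All scale factors are assumed; they only keep features near [-1, 1].
    """
    obs = [s / 30.0 for s in sinr_db]  # assumed SINR scale of 30 dB
    for x, y in list(ship_xy_km) + list(haps_xy_km):
        obs += [x / area_km, y / area_km]
    for u, v in wind_forecast_ms:  # one (east, north) vector per forecast step
        obs += [u / max_wind_ms, v / max_wind_ms]
    return obs
```

A fixed-length forecast horizon keeps the observation size constant, which a standard PPO policy network requires.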

Load-bearing premise

The models used for stratospheric winds, ship motion, and radio propagation match real-world behavior closely enough that policies trained in simulation will transfer to hardware, and the agent can run in real time on actual HAPS platforms.

What would settle it

Deploy a real HAPS system in stratospheric winds, log the actual positioning errors and user throughput under the PPO policy, and compare those measured values directly against the simulation results reported in the paper.

Figures

Figures reproduced from arXiv: 2605.05240 by Azim Akhtarshenas, David López-Pérez, German Svistunov, Kuangyu Zheng, Matteo Bernabè.

**Figure 1.** HAPS-assisted maritime communication. $\tau_{u,d}$ is the shadowing gain and $g_{u,d}$ is the reflector antenna gain, computed from the Third Generation Partnership Project (3GPP) statistical channel model defined in [20]. Specifically, the path loss gain $p_{u,d}$ is modeled following ITU-R recommendations [21] as follows,

$$p_{u,d} = \frac{1}{p^{\mathrm{fspl}}_{u,d}\, p^{\mathrm{cl}}_{u,d}\, p^{\mathrm{ga}}_{u,d}\, p^{\mathrm{ra}}_{u,d}\, p^{\mathrm{ca}}_{u,d}\, p^{\mathrm{sa}}_{u,d}}, \qquad (2)$$

where $p^{\mathrm{fspl}}_{u,d}$ is the free space …

**Figure 2.** Performance of the proposed HAPS-BS network under stratospheric wind disturbances at an altitude of …
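Reading Equation (2) in the Figure 1 caption as the reciprocal of a product of loss factors, it can be evaluated by summing the losses in dB and converting once. A minimal sketch, assuming the superscripts denote free-space, clutter, gaseous, rain, cloud, and scintillation losses (the caption is truncated, so this mapping is inferred from ITU-R P.618 [21] terminology). The free-space term uses the 3GPP TR 38.811 [20] form FSPL = 32.45 + 20 log10(f_c [GHz]) + 20 log10(d [m]); the other losses are taken as dB inputs rather than modeled, and the function names are hypothetical.

```python
import math


def fspl_db(distance_m: float, fc_ghz: float) -> float:
    """Free-space path loss in dB (TR 38.811 form: fc in GHz, d in metres)."""
    return 32.45 + 20.0 * math.log10(fc_ghz) + 20.0 * math.log10(distance_m)


def path_loss_gain(distance_m: float, fc_ghz: float, clutter_db: float = 0.0,
                   gas_db: float = 0.0, rain_db: float = 0.0,
                   cloud_db: float = 0.0, scint_db: float = 0.0) -> float:
    """Linear gain p_{u,d} = 1 / (product of linear loss factors).

    A product of linear losses equals a sum of dB losses, so the
    denominator of Eq. (2) is accumulated in dB and converted once.
    """
    total_db = (fspl_db(distance_m, fc_ghz) + clutter_db + gas_db
                + rain_db + cloud_db + scint_db)
    return 10.0 ** (-total_db / 10.0)
```

For instance, `path_loss_gain(20_000.0, 2.0, rain_db=1.0)` would give the gain over an illustrative 20 km slant range at 2 GHz with 1 dB of rain fade; these numbers are examples, not the paper's setup.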

Original abstract

High-Altitude Platform Stations (HAPS) offer a promising solution for wide-area wireless coverage in maritime regions lacking terrestrial infrastructure. However, maintaining reliable performance is challenging due to dynamic ship mobility and atmospheric disturbances, particularly stratospheric wind effects on HAPS positioning. This paper proposes a deep reinforcement learning (DRL)-based framework for dynamic positioning of wind-disturbed HAPS-mounted base stations in maritime networks. A centralized DRL agent deployed on a coordinator HAPS controls multiple serving HAPS using radio measurements and network feedback, capturing realistic channel conditions and user mobility. A Proximal Policy Optimization (PPO) algorithm is employed to learn robust positioning policies that enhance coverage stability and system throughput under wind disturbances. Simulation results show that the proposed approach effectively mitigates wind-induced positioning deviations while ensuring reliable wide-area connectivity for maritime users.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a centralized Proximal Policy Optimization (PPO) deep reinforcement learning framework in which a coordinator HAPS controls the positioning of multiple serving HAPS base stations in a wind-disturbed stratospheric maritime network. The agent uses radio measurements and network feedback to learn policies that reduce wind-induced positioning errors and maintain wide-area coverage and throughput for mobile maritime users. Simulation results are cited to support the claim that the approach mitigates deviations while ensuring reliable connectivity.

Significance. If the simulation models prove faithful to real stratospheric wind statistics, ship tracks, and maritime channels, the work would offer a concrete DRL-based control method for an emerging non-terrestrial network scenario of practical interest. The centralized PPO formulation that incorporates realistic channel and mobility feedback is a relevant technical contribution to adaptive HAPS deployment.

major comments (3)

[Abstract and § Simulation Results] Abstract and results section: the headline claim that the PPO policy 'effectively mitigates wind-induced positioning deviations' is unsupported by any reported quantitative metrics (e.g., RMS positioning error, throughput gain in bps/Hz, coverage probability), baseline comparisons (fixed HAPS, other RL algorithms, or model-predictive control), or error bars. Without these numbers the magnitude and statistical significance of the improvement cannot be assessed.
[§ System Model and § Simulation Setup] Simulation environment description: the wind disturbance process (turbulence spectrum, correlation time, altitude dependence), ship mobility model, and maritime radio propagation (path loss, fading, interference) are defined internally by the authors with no comparison to external data such as radiosonde/lidar wind campaigns, AIS ship tracks, or empirical maritime channel measurements. Because the reported gains are generated inside this synthetic environment, the central robustness claim is at risk of being an artifact of the chosen parameters rather than a general result.
[§ Proposed DRL Framework] PPO formulation: the reward function, state representation (radio measurements and network feedback), action space (HAPS positioning commands), and PPO-specific hyperparameters (learning rate, clip epsilon, etc.) are not specified. These choices are load-bearing for the learned policy and must be documented to allow reproduction or sensitivity analysis.
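To illustrate what the referee means by the action space being load-bearing, here is one possible continuous parameterization. It is a hypothetical point-mass model: the policy outputs a 2-D command in [-1, 1]^2 that is scaled to an airspeed limit, and ground velocity is commanded air velocity plus wind. Real HAPS dynamics (heading, power budget, altitude) are ignored, and the names are assumptions.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def step_haps(pos_km: Vec2, action: Vec2, wind_ms: Vec2,
              dt_s: float, max_airspeed_ms: float) -> Vec2:
    """Advance one HAPS by one control interval (hypothetical kinematics).

    action is the raw policy output; it is clipped to [-1, 1] per axis and
    then to the unit disc, so the commanded airspeed never exceeds the limit.
    """
    ax = max(-1.0, min(1.0, action[0]))
    ay = max(-1.0, min(1.0, action[1]))
    norm = math.hypot(ax, ay)
    if norm > 1.0:
        ax, ay = ax / norm, ay / norm
    # Ground velocity = commanded air velocity + wind (point-mass assumption).
    vx = ax * max_airspeed_ms + wind_ms[0]
    vy = ay * max_airspeed_ms + wind_ms[1]
    return (pos_km[0] + vx * dt_s / 1000.0, pos_km[1] + vy * dt_s / 1000.0)
```

Under this model, holding station only works while the wind speed stays below the airspeed limit; beyond that the policy can only slow the drift, which is one reason the action bounds should be reported alongside the results.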

minor comments (2)

[§ System Model] Notation for HAPS altitude, wind velocity vectors, and channel gains should be defined consistently in a single table or early section to improve readability.
[Abstract] The abstract would benefit from one or two concrete performance numbers (even if preliminary) to give readers an immediate sense of the scale of improvement.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.


Referee: [Abstract and § Simulation Results] Abstract and results section: the headline claim that the PPO policy 'effectively mitigates wind-induced positioning deviations' is unsupported by any reported quantitative metrics (e.g., RMS positioning error, throughput gain in bps/Hz, coverage probability), baseline comparisons (fixed HAPS, other RL algorithms, or model-predictive control), or error bars. Without these numbers the magnitude and statistical significance of the improvement cannot be assessed.

Authors: We agree that the abstract and results section would benefit from more explicit quantitative support and baseline comparisons to allow readers to assess the magnitude of improvements. While the simulation results section presents figures illustrating positioning stability and throughput under wind disturbances, we will revise both the abstract and the results section to report specific metrics (e.g., RMS positioning error reductions, throughput in bps/Hz, coverage probability) with error bars from multiple runs, and add direct comparisons against fixed HAPS positioning and alternative algorithms such as DDPG. These changes will be incorporated in the revised manuscript. revision: yes
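The metrics promised in this response are straightforward to compute from simulation logs. A minimal sketch with hypothetical helper names: RMS positioning error per episode, coverage probability against an SINR threshold, and a mean with a 95% half-width across independently seeded training runs.

```python
import math
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rms_error_m(target_xy: Sequence[Point], actual_xy: Sequence[Point]) -> float:
    """RMS horizontal distance (m) between commanded and actual HAPS positions."""
    sq = [(ax - tx) ** 2 + (ay - ty) ** 2
          for (tx, ty), (ax, ay) in zip(target_xy, actual_xy)]
    return math.sqrt(sum(sq) / len(sq))


def coverage_probability(sinr_db_samples: Sequence[float], threshold_db: float) -> float:
    """Fraction of (ship, time) samples with SINR at or above the threshold."""
    return sum(s >= threshold_db for s in sinr_db_samples) / len(sinr_db_samples)


def mean_and_ci95(per_seed: Sequence[float]) -> Tuple[float, float]:
    """Mean across seeds and a normal-approximation 95% half-width.

    With only a handful of seeds, a Student-t quantile instead of 1.96 is safer.
    """
    m = statistics.mean(per_seed)
    half = 1.96 * statistics.stdev(per_seed) / math.sqrt(len(per_seed))
    return m, half
```

Each metric would be computed once per seed for the PPO policy and for each baseline (fixed HAPS, DDPG), then summarized with `mean_and_ci95`, giving the error bars the referee asks for.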
Referee: [§ System Model and § Simulation Setup] Simulation environment description: the wind disturbance process (turbulence spectrum, correlation time, altitude dependence), ship mobility model, and maritime radio propagation (path loss, fading, interference) are defined internally by the authors with no comparison to external data such as radiosonde/lidar wind campaigns, AIS ship tracks, or empirical maritime channel measurements. Because the reported gains are generated inside this synthetic environment, the central robustness claim is at risk of being an artifact of the chosen parameters rather than a general result.

Authors: The referee correctly notes that our models are synthetic. The wind, mobility, and channel parameters are drawn from established theoretical models and literature references (e.g., von Kármán turbulence spectra for stratospheric winds and 3GPP NTN propagation models). We will expand the simulation setup section to explicitly cite these sources and add a limitations paragraph discussing parameter sensitivity. However, we cannot add direct empirical comparisons to new external datasets such as radiosonde or AIS data, as this would require additional data collection outside the scope of the current work. revision: partial
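For reference, the von Kármán longitudinal turbulence spectrum named in this response can be written, in the spatial-frequency form used in MIL-F-8785C gust models, as Phi_u(Omega) = sigma^2 (2L/pi) / (1 + (1.339 L Omega)^2)^(5/6). The gust intensity sigma and length scale L are exactly the parameters whose provenance the referee asks about; the review does not give the paper's values, so any numbers used with this sketch are illustrative.

```python
import math


def von_karman_longitudinal_psd(omega_rad_per_m: float, sigma_ms: float,
                                length_scale_m: float) -> float:
    """One-sided von Karman longitudinal PSD Phi_u(Omega) in (m/s)^2 per (rad/m).

    Omega is spatial frequency; the 1.339 constant makes the integral of
    Phi_u over Omega in [0, inf) equal sigma^2.
    """
    return (sigma_ms ** 2 * (2.0 * length_scale_m / math.pi)
            / (1.0 + (1.339 * length_scale_m * omega_rad_per_m) ** 2) ** (5.0 / 6.0))
```

At high frequency the spectrum rolls off as Omega^(-5/3), the Kolmogorov inertial-range slope; the sensitivity analysis promised in the response could sweep sigma and L in this expression.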
Referee: [§ Proposed DRL Framework] PPO formulation: the reward function, state representation (radio measurements and network feedback), action space (HAPS positioning commands), and PPO-specific hyperparameters (learning rate, clip epsilon, etc.) are not specified. These choices are load-bearing for the learned policy and must be documented to allow reproduction or sensitivity analysis.

Authors: We apologize for the lack of explicit detail in the main text. The overall PPO framework is described, but we will revise § Proposed DRL Framework to fully specify the reward function (a weighted combination of coverage, throughput, and positioning stability terms), the state representation (SINR measurements, user locations, and wind estimates), the continuous action space for HAPS position adjustments, and all PPO hyperparameters (learning rate, clip epsilon, batch size, etc.). Any of these details currently held in supplementary material will be moved into the main body to ensure reproducibility. revision: yes
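The reward described here, a weighted combination of coverage, throughput, and positioning-stability terms, could take the shape sketched below. Everything in it is an assumption for illustration: the dataclass, the weight values, the 0 dB coverage threshold, and the use of mean drift in km as the stability penalty are not the authors' design.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardWeights:
    """Hypothetical weights; the paper's values are not given in this review."""
    coverage: float = 1.0
    throughput: float = 0.1
    stability: float = 0.05


def reward(sinr_db: Sequence[float], se_bps_hz: Sequence[float],
           drift_m: Sequence[float], w: RewardWeights,
           sinr_threshold_db: float = 0.0) -> float:
    """Weighted coverage + throughput - drift penalty for one time step.

    sinr_db and se_bps_hz are per ship; drift_m is each HAPS's distance
    from its reference position.
    """
    coverage = sum(s >= sinr_threshold_db for s in sinr_db) / len(sinr_db)
    throughput = sum(se_bps_hz) / len(se_bps_hz)           # mean bit/s/Hz
    drift_km = sum(d / 1000.0 for d in drift_m) / len(drift_m)
    return w.coverage * coverage + w.throughput * throughput - w.stability * drift_km
```

Because the three terms have different units, the weights effectively set exchange rates (e.g. how many kilometres of drift one extra bit/s/Hz is worth), which is why the referee's request for a sensitivity analysis applies to them as much as to the PPO hyperparameters.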

standing simulated objections not resolved

Direct empirical validation or comparisons against specific external real-world datasets (radiosonde/lidar wind campaigns, AIS ship tracks, or empirical maritime channel measurements), as the study is based entirely on synthetic models and no such primary data collection was performed.

Circularity Check

0 steps flagged

No significant circularity; simulation evaluation is self-contained

full rationale

The paper proposes a PPO-based DRL controller for HAPS positioning and reports simulation outcomes under author-defined wind, mobility, and channel models. No derivation chain reduces a claimed result to its inputs by construction: there are no self-definitional equations, no fitted parameters renamed as independent predictions, and no load-bearing self-citations that import uniqueness or ansatzes. The simulation serves as a standard empirical check of the algorithm on synthetic scenarios rather than a tautological restatement of the modeling assumptions. External validation against real measurements is a separate correctness concern, not a circularity issue.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach depends on standard DRL training assumptions plus domain-specific models for wind and channels that are not independently validated outside the simulation.

free parameters (2)

PPO-specific hyperparameters (learning rate, clip epsilon, etc.)
Tuned values required to train the policy in the simulated environment.
Wind disturbance and channel model parameters
Parameters defining wind speed distributions and radio propagation that shape the observed performance.
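How strongly such parameters shape the environment is visible even in a very simple wind model. The sketch below uses a first-order Gauss-Markov (discretized Ornstein-Uhlenbeck) wind-speed process as a stand-in, not the paper's model: the mean speed, gust standard deviation sigma, and correlation time tau are all free parameters, and the lag-one autocorrelation equals exp(-dt/tau) by construction. The function name and values are hypothetical.

```python
import math
import random
from typing import List


def gauss_markov_wind(n_steps: int, dt_s: float, mean_ms: float,
                      sigma_ms: float, tau_s: float, seed: int = 0) -> List[float]:
    """Sample a 1-D wind-speed series (m/s) from a first-order Gauss-Markov process.

    Stand-in model: rho = exp(-dt/tau) fixes the correlation time tau, and
    the innovation scale sigma * sqrt(1 - rho^2) keeps the stationary
    standard deviation equal to sigma.
    """
    rng = random.Random(seed)
    rho = math.exp(-dt_s / tau_s)
    innovation_std = sigma_ms * math.sqrt(1.0 - rho ** 2)
    w = mean_ms  # start at the mean wind speed
    series = []
    for _ in range(n_steps):
        w = mean_ms + rho * (w - mean_ms) + innovation_std * rng.gauss(0.0, 1.0)
        series.append(w)
    return series
```

A longer tau produces slow, persistent drift that a policy can learn to anticipate, while a short tau produces gusts it can only react to; results trained under one setting need not carry over to the other.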

axioms (1)

domain assumption: Stratospheric wind effects and ship mobility can be faithfully represented by the chosen simulation models.
Invoked when claiming that simulation results translate to real-world mitigation of positioning deviations.

pith-pipeline@v0.9.0 · 5460 in / 1464 out tokens · 61667 ms · 2026-05-09T16:16:54.595894+00:00 · methodology


Reference graph

Works this paper leans on

25 extracted references · 2 canonical work pages · 1 internal anchor

[1] F. S. Alqurashi, A. Trichili, N. Saeed, B. S. Ooi, and M.-S. Alouini, “Maritime communications: A survey on enabling technologies, opportunities, and challenges,” IEEE Internet of Things Journal, vol. 10, no. 4, pp. 3525–3547, 2022.
[2] Z. Shang, X. Zhang, and X. Li, “Maritime communication networks: A survey on architecture, key technologies, and challenges,” Computer Communications, vol. 241, p. 108255, 2025.
[3] N. Niknami, A. Srinivasan, K. St. Germain, and J. Wu, “Maritime communications — current state and the future potential with SDN and SDR,” Network, vol. 3, no. 4, pp. 563–584, 2023.
[4] T. Wei, W. Feng, Y. Chen, C.-X. Wang, N. Ge, and J. Lu, “Hybrid satellite-terrestrial communication networks for the maritime internet of things: Key technologies, opportunities, and challenges,” IEEE Internet of Things Journal, vol. 8, no. 11, pp. 8910–8934, 2021.
[5] G. Svistunov, A. Akhtarshenas, D. López-Pérez, M. Giordani, G. Geraci, and H. Yanikomeroglu, “Bridging earth and space: A survey on HAPS for non-terrestrial networks,” arXiv preprint arXiv:2510.19731, 2025.
[6] J. Duan, T. Zhao, and B. Lin, “Optimal topology design of high altitude platform based maritime broadband communication networks,” in International Conference on Combinatorial Optimization and Applications. Springer, 2017, pp. 462–470.
[7] H. Cao, T. Yang, Z. Yin, X. Sun, and D. Li, “Topological optimization algorithm for HAP assisted multi-unmanned ships communication,” in 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall). IEEE, 2020, pp. 1–5.
[8] H. Lin, M. A. Kishk, and M.-S. Alouini, “HAPS-enabled downlink coverage enhancement in islands and maritime areas,” IEEE Transactions on Wireless Communications, 2026.
[9] M. S. Alam, G. K. Kurt, H. Yanikomeroglu, P. Zhu, and N. D. Ðào, “High altitude platform station based super macro base station constellations,” IEEE Communications Magazine, vol. 59, no. 1, pp. 103–109, 2021.
[10] B. Cui, S. Cai, L. Wang, Z. Zhang, and F. Wang, “Reinforcement learning-based cloud-aware HAPS trajectory optimization in soft-switching hybrid FSO/RF cooperative transmission system,” Sensors, vol. 26, no. 3, p. 948, 2026.
[11] A. Delgado, D. Domínguez, J. Gonzalo, and A. Escapa, “Station-keeping HAPS mission through optimal sprint and drift trajectories,” Aerospace Science and Technology, vol. 152, p. 109365, 2024.
[12] L. S. Friedrich, A. J. McDonald, G. E. Bodeker, K. E. Cooper, J. Lewis, and A. J. Paterson, “A comparison of Loon balloon observations and stratospheric reanalysis products,” Atmospheric Chemistry and Physics, vol. 17, no. 2, pp. 855–866, 2017.
[13] H. Du, M. Lv, J. Li, W. Zhu, L. Zhang, and Y. Wu, “Station-keeping performance analysis for high altitude balloon with altitude control system,” Aerospace Science and Technology, vol. 92, pp. 644–652, 2019.
[14] T. Hirai, T. Iizuka, N. Fukui, N. Endo, R. Yamamoto, Y. Umemiya, H. Matsubara, and N. Wakamiya, “Maritime coverage analysis in altitude-controlled balloons with wind-dependent trajectories,” Authorea Preprints, 2025.
[15] S. Saafi, O. Vikhrova, G. Fodor, J. Hosek, and S. Andreev, “AI-aided integrated terrestrial and non-terrestrial 6G solutions for sustainable maritime networking,” IEEE Network, vol. 36, no. 3, pp. 183–190, 2022.
[16] Y. Xu and Q. Yu, “Deep reinforcement learning based computation offloading and resource allocation strategy for maritime internet of things,” Computer Networks, vol. 264, p. 111221, 2025.
[17] M. R. Ibáñez, A. Akhtarshenas, D. López-Pérez, and G. Geraci, “Optimizing UAV aerial base station flights using DRL-based proximal policy optimization,” in 2025 IEEE ICC Workshops. IEEE, 2025, pp. 1293–1298.
[18] L. Coy, M. R. Schoeberl, S. Pawson, S. Candido, and R. W. Carver, “Global assimilation of X Project Loon stratospheric balloon observations,” in AGU Fall Meeting, no. GSFC-E-DAA-TN50500, 2017.
[19] Z. Xu, Y. Liu, H. Du, and M. Lv, “Station-keeping for high-altitude balloon with reinforcement learning,” Advances in Space Research, vol. 70, no. 3, pp. 733–751, 2022.
[20] Technical Specification Group Radio Access Network; Study on New Radio (NR) to support non-terrestrial networks, 3GPP TR 38.811, Sep. 2020, v15.4.
[21] ITU-R, “Propagation data and prediction methods required for the design of earth-space telecommunication systems,” International Telecommunication Union, Recommendation P.618-14, December 2021. [Online]. Available: https://www.itu.int
[22] Study on channel model for frequencies from 0.5 to 100 GHz, 3GPP TR 38.901, Mar. 2024, v18.0.
[23] B. Yang, Y. Dang, T. Taleb, S. Shen, and X. Jiang, “Sum rate and max-min rate for cellular-enabled UAV swarm networks,” IEEE Transactions on Vehicular Technology, vol. 72, no. 1, pp. 1073–1083, 2023.
[24] R. S. Sutton, “Reinforcement learning: an introduction,” A Bradford Book, 2018.
[25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.