PPO-Based Dynamic Positioning of HAPS-BS in Wind-Disturbed Stratospheric Maritime Networks
Pith reviewed 2026-05-09 16:16 UTC · model grok-4.3
The pith
A PPO reinforcement learning agent coordinates multiple high-altitude platforms to hold position against wind and serve moving ships.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Proximal Policy Optimization algorithm inside a centralized DRL controller, trained on radio and network feedback, produces positioning actions that reduce wind-induced deviations of HAPS base stations and thereby maintain reliable coverage and throughput for mobile maritime users.
What carries the argument
A centralized PPO-based deep reinforcement learning agent that maps radio measurements and network state into coordinated positioning commands for multiple wind-disturbed HAPS.
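As a rough sketch of what that mapping could look like — a joint observation over all serving HAPS fed through a single policy to produce coordinated position corrections. The dimensions, per-HAPS features, and the linear stand-in for the trained PPO network are all hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

N_HAPS = 3   # serving platforms under one coordinator (hypothetical count)
OBS_DIM = 6  # per-HAPS features, e.g. position error (x, y), mean user SINR, load
ACT_DIM = 2  # horizontal position correction (dx, dy) per HAPS

# Stand-in for a trained PPO policy: one linear layer mapping the stacked
# observation of all HAPS to a joint, bounded action vector.
W = rng.normal(scale=0.1, size=(N_HAPS * ACT_DIM, N_HAPS * OBS_DIM))

def coordinator_policy(obs: np.ndarray) -> np.ndarray:
    """Map the joint observation to bounded position corrections per HAPS."""
    assert obs.shape == (N_HAPS, OBS_DIM)
    raw = W @ obs.ravel()
    return np.tanh(raw).reshape(N_HAPS, ACT_DIM)  # actions squashed into (-1, 1)

obs = rng.normal(size=(N_HAPS, OBS_DIM))
actions = coordinator_policy(obs)
print(actions.shape)  # (3, 2): one (dx, dy) command per serving HAPS
```

The key structural point is that the observation is joint across platforms, so a single centralized policy can learn coordinated rather than independent corrections.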
If this is right
- Wind-induced positioning errors decrease enough to preserve continuous coverage for moving ships.
- System throughput stays higher than with static or non-learning positioning under the same disturbances.
- A single coordinator HAPS can manage multiple serving platforms without requiring direct ground control.
- Coverage extends reliably into maritime regions that lack terrestrial infrastructure.
Where Pith is reading between the lines
- The same centralized learning structure could be tested with real-time wind forecasts fed as additional observations.
- If the agent runs on actual HAPS hardware, energy and compute limits would become the next practical constraint to measure.
- The framework might apply to other high-altitude vehicles such as solar-powered aircraft once their dynamics replace the current wind model.
Load-bearing premise
The models used for stratospheric winds, ship motion, and radio propagation match real-world behavior closely enough that policies trained in simulation will transfer to hardware, and the agent can run in real time on actual HAPS platforms.
What would settle it
Deploy a real HAPS system in stratospheric winds, log the actual positioning errors and user throughput under the PPO policy, and compare those measured values directly against the simulation results reported in the paper.
Original abstract
High-Altitude Platform Stations (HAPS) offer a promising solution for wide-area wireless coverage in maritime regions lacking terrestrial infrastructure. However, maintaining reliable performance is challenging due to dynamic ship mobility and atmospheric disturbances, particularly stratospheric wind effects on HAPS positioning. This paper proposes a deep reinforcement learning (DRL)-based framework for dynamic positioning of wind-disturbed HAPS-mounted base stations in maritime networks. A centralized DRL agent deployed on a coordinator HAPS controls multiple serving HAPS using radio measurements and network feedback, capturing realistic channel conditions and user mobility. A Proximal Policy Optimization (PPO) algorithm is employed to learn robust positioning policies that enhance coverage stability and system throughput under wind disturbances. Simulation results show that the proposed approach effectively mitigates wind-induced positioning deviations while ensuring reliable wide-area connectivity for maritime users.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a centralized Proximal Policy Optimization (PPO) deep reinforcement learning framework in which a coordinator HAPS controls the positioning of multiple serving HAPS base stations in a wind-disturbed stratospheric maritime network. The agent uses radio measurements and network feedback to learn policies that reduce wind-induced positioning errors and maintain wide-area coverage and throughput for mobile maritime users. Simulation results are cited to support the claim that the approach mitigates deviations while ensuring reliable connectivity.
Significance. If the simulation models prove faithful to real stratospheric wind statistics, ship tracks, and maritime channels, the work would offer a concrete DRL-based control method for an emerging non-terrestrial network scenario of practical interest. The centralized PPO formulation that incorporates realistic channel and mobility feedback is a relevant technical contribution to adaptive HAPS deployment.
major comments (3)
- [Abstract and § Simulation Results] Abstract and results section: the headline claim that the PPO policy 'effectively mitigates wind-induced positioning deviations' is unsupported by any reported quantitative metrics (e.g., RMS positioning error, throughput gain in bps/Hz, coverage probability), baseline comparisons (fixed HAPS, other RL algorithms, or model-predictive control), or error bars. Without these numbers the magnitude and statistical significance of the improvement cannot be assessed.
- [§ System Model and § Simulation Setup] Simulation environment description: the wind disturbance process (turbulence spectrum, correlation time, altitude dependence), ship mobility model, and maritime radio propagation (path loss, fading, interference) are defined internally by the authors with no comparison to external data such as radiosonde/lidar wind campaigns, AIS ship tracks, or empirical maritime channel measurements. Because the reported gains are generated inside this synthetic environment, the central robustness claim is at risk of being an artifact of the chosen parameters rather than a general result.
- [§ Proposed DRL Framework] PPO formulation: the reward function, state representation (radio measurements and network feedback), action space (HAPS positioning commands), and PPO-specific hyperparameters (learning rate, clip epsilon, etc.) are not specified. These choices are load-bearing for the learned policy and must be documented to allow reproduction or sensitivity analysis.
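The quantitative metrics the first comment asks for are straightforward to compute from logged traces. A minimal sketch, using synthetic stand-in data in place of the paper's actual simulation logs (the error distribution and service threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-step positioning errors (km) for 10 independent runs,
# standing in for logged simulation traces (hypothetical data).
runs = [np.abs(rng.normal(0.3, 0.1, size=500)) for _ in range(10)]

def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))

rms_per_run = np.array([rms(r) for r in runs])
mean_rms = rms_per_run.mean()
# 95% confidence half-width via the large-sample normal approximation
ci = 1.96 * rms_per_run.std(ddof=1) / np.sqrt(len(runs))

# Coverage probability: fraction of steps with error below a service threshold
THRESH_KM = 0.5  # hypothetical threshold
coverage = np.array([(r < THRESH_KM).mean() for r in runs])

print(f"RMS error: {mean_rms:.3f} +/- {ci:.3f} km")
print(f"Coverage probability: {coverage.mean():.3f}")
```

Reporting per-run statistics with confidence intervals, rather than a single trace, is what would let readers judge the significance of the claimed mitigation.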
minor comments (2)
- [§ System Model] Notation for HAPS altitude, wind velocity vectors, and channel gains should be defined consistently in a single table or early section to improve readability.
- [Abstract] The abstract would benefit from one or two concrete performance numbers (even if preliminary) to give readers an immediate sense of the scale of improvement.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract and § Simulation Results] Abstract and results section: the headline claim that the PPO policy 'effectively mitigates wind-induced positioning deviations' is unsupported by any reported quantitative metrics (e.g., RMS positioning error, throughput gain in bps/Hz, coverage probability), baseline comparisons (fixed HAPS, other RL algorithms, or model-predictive control), or error bars. Without these numbers the magnitude and statistical significance of the improvement cannot be assessed.
Authors: We agree that the abstract and results section would benefit from more explicit quantitative support and baseline comparisons to allow readers to assess the magnitude of improvements. While the simulation results section presents figures illustrating positioning stability and throughput under wind disturbances, we will revise both the abstract and the results section to report specific metrics (e.g., RMS positioning error reductions, throughput in bps/Hz, coverage probability) with error bars from multiple runs, and add direct comparisons against fixed HAPS positioning and alternative algorithms such as DDPG. These changes will be incorporated in the revised manuscript. revision: yes
-
Referee: [§ System Model and § Simulation Setup] Simulation environment description: the wind disturbance process (turbulence spectrum, correlation time, altitude dependence), ship mobility model, and maritime radio propagation (path loss, fading, interference) are defined internally by the authors with no comparison to external data such as radiosonde/lidar wind campaigns, AIS ship tracks, or empirical maritime channel measurements. Because the reported gains are generated inside this synthetic environment, the central robustness claim is at risk of being an artifact of the chosen parameters rather than a general result.
Authors: The referee correctly notes that our models are synthetic. The wind, mobility, and channel parameters are drawn from established theoretical models and literature references (e.g., von Kármán turbulence spectra for stratospheric winds and 3GPP NTN propagation models). We will expand the simulation setup section to explicitly cite these sources and add a limitations paragraph discussing parameter sensitivity. However, we cannot add direct empirical comparisons to new external datasets such as radiosonde or AIS data, as this would require additional data collection outside the scope of the current work. revision: partial
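The correlation-time aspect of the wind disturbance can be illustrated with a first-order Gauss-Markov (Ornstein-Uhlenbeck) process — a much simpler stand-in for the von Kármán spectral model the rebuttal cites, shown here only to make the "correlation time" parameter concrete. All parameter values are hypothetical:

```python
import numpy as np

def ou_wind(n_steps, dt=1.0, tau=60.0, sigma=5.0, seed=0):
    """First-order Gauss-Markov wind-speed perturbation (m/s).

    tau: correlation time (s); sigma: stationary standard deviation.
    A simplified stand-in for spectral turbulence models such as von Karman.
    """
    rng = np.random.default_rng(seed)
    alpha = np.exp(-dt / tau)                 # one-step autocorrelation
    noise_std = sigma * np.sqrt(1.0 - alpha**2)
    w = np.empty(n_steps)
    w[0] = rng.normal(0.0, sigma)
    for k in range(1, n_steps):
        w[k] = alpha * w[k - 1] + rng.normal(0.0, noise_std)
    return w

wind = ou_wind(10_000)
print(round(float(wind.std()), 1))  # roughly sigma for traces much longer than tau
```

A sensitivity analysis would sweep `tau` and `sigma` over ranges bracketing the cited stratospheric measurements and report how the learned policy's positioning error responds.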
-
Referee: [§ Proposed DRL Framework] PPO formulation: the reward function, state representation (radio measurements and network feedback), action space (HAPS positioning commands), and PPO-specific hyperparameters (learning rate, clip epsilon, etc.) are not specified. These choices are load-bearing for the learned policy and must be documented to allow reproduction or sensitivity analysis.
Authors: We apologize for the lack of explicit detail in the main text. The overall PPO framework is described, but we will revise § Proposed DRL Framework to fully specify the reward function (a weighted combination of coverage, throughput, and positioning stability terms), the state representation (SINR measurements, user locations, and wind estimates), the continuous action space for HAPS position adjustments, and all PPO hyperparameters (learning rate, clip epsilon, batch size, etc.). This will be moved from any supplementary material into the main body to ensure reproducibility. revision: yes
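A weighted reward of the kind the response describes could be sketched as follows. The weights, throughput normalization constant, and exact term definitions are illustrative assumptions, since the paper's actual coefficients are not reported:

```python
# Hypothetical weights; the paper's actual coefficients are not reported.
W_COV, W_THR, W_POS = 1.0, 0.5, 0.3

def reward(coverage_frac, throughput_bps, pos_error_km, thr_ref=1e8):
    """Weighted reward: coverage fraction in [0, 1], throughput normalized
    by a reference rate, minus a positioning-error penalty."""
    return (W_COV * coverage_frac
            + W_THR * min(throughput_bps / thr_ref, 1.0)
            - W_POS * pos_error_km)

r = reward(coverage_frac=0.9, throughput_bps=6e7, pos_error_km=0.4)
print(round(r, 3))  # 0.9 + 0.5*0.6 - 0.3*0.4 = 1.08
```

Documenting the weights matters because they set the trade-off the policy learns: a larger positioning penalty yields tighter station-keeping at the cost of throughput-driven repositioning.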
not addressed (1)
- Direct empirical validation or comparisons against specific external real-world datasets (radiosonde/lidar wind campaigns, AIS ship tracks, or empirical maritime channel measurements): the study is based entirely on synthetic models, and no such primary data collection was performed.
Circularity Check
No significant circularity; simulation evaluation is self-contained
full rationale
The paper proposes a PPO-based DRL controller for HAPS positioning and reports simulation outcomes under author-defined wind, mobility, and channel models. No derivation chain reduces a claimed result to its inputs by construction: there are no self-definitional equations, no fitted parameters renamed as independent predictions, and no load-bearing self-citations that import uniqueness or ansatzes. The simulation serves as a standard empirical check of the algorithm on synthetic scenarios rather than a tautological restatement of the modeling assumptions. External validation against real measurements is a separate correctness concern, not a circularity issue.
Axiom & Free-Parameter Ledger
free parameters (2)
- PPO-specific hyperparameters (learning rate, clip epsilon, etc.)
- Wind disturbance and channel model parameters
axioms (1)
- domain assumption Stratospheric wind effects and ship mobility can be faithfully represented by the chosen simulation models.
Reference graph
Works this paper leans on
[1] F. S. Alqurashi, A. Trichili, N. Saeed, B. S. Ooi, and M.-S. Alouini, "Maritime communications: A survey on enabling technologies, opportunities, and challenges," IEEE Internet of Things Journal, vol. 10, no. 4, pp. 3525–3547, 2022.
[2] Z. Shang, X. Zhang, and X. Li, "Maritime communication networks: A survey on architecture, key technologies, and challenges," Computer Communications, vol. 241, p. 108255, 2025.
[3] N. Niknami, A. Srinivasan, K. St. Germain, and J. Wu, "Maritime communications — current state and the future potential with SDN and SDR," Network, vol. 3, no. 4, pp. 563–584, 2023.
[4] T. Wei, W. Feng, Y. Chen, C.-X. Wang, N. Ge, and J. Lu, "Hybrid satellite-terrestrial communication networks for the maritime internet of things: Key technologies, opportunities, and challenges," IEEE Internet of Things Journal, vol. 8, no. 11, pp. 8910–8934, 2021.
[5] G. Svistunov, A. Akhtarshenas, D. López-Pérez, M. Giordani, G. Geraci, and H. Yanikomeroglu, "Bridging earth and space: A survey on HAPS for non-terrestrial networks," arXiv preprint arXiv:2510.19731, 2025.
[6] J. Duan, T. Zhao, and B. Lin, "Optimal topology design of high altitude platform based maritime broadband communication networks," in International Conference on Combinatorial Optimization and Applications. Springer, 2017, pp. 462–470.
[7] H. Cao, T. Yang, Z. Yin, X. Sun, and D. Li, "Topological optimization algorithm for HAP assisted multi-unmanned ships communication," in 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall). IEEE, 2020, pp. 1–5.
[8] H. Lin, M. A. Kishk, and M.-S. Alouini, "HAPS-enabled downlink coverage enhancement in islands and maritime areas," IEEE Transactions on Wireless Communications, 2026.
[9] M. S. Alam, G. K. Kurt, H. Yanikomeroglu, P. Zhu, and N. D. Ðào, "High altitude platform station based super macro base station constellations," IEEE Communications Magazine, vol. 59, no. 1, pp. 103–109, 2021.
[10] B. Cui, S. Cai, L. Wang, Z. Zhang, and F. Wang, "Reinforcement learning-based cloud-aware HAPS trajectory optimization in soft-switching hybrid FSO/RF cooperative transmission system," Sensors, vol. 26, no. 3, p. 948, 2026.
[11] A. Delgado, D. Domínguez, J. Gonzalo, and A. Escapa, "Station-keeping HAPS mission through optimal sprint and drift trajectories," Aerospace Science and Technology, vol. 152, p. 109365, 2024.
[12] L. S. Friedrich, A. J. McDonald, G. E. Bodeker, K. E. Cooper, J. Lewis, and A. J. Paterson, "A comparison of Loon balloon observations and stratospheric reanalysis products," Atmospheric Chemistry and Physics, vol. 17, no. 2, pp. 855–866, 2017.
[13] H. Du, M. Lv, J. Li, W. Zhu, L. Zhang, and Y. Wu, "Station-keeping performance analysis for high altitude balloon with altitude control system," Aerospace Science and Technology, vol. 92, pp. 644–652, 2019.
[14] T. Hirai, T. Iizuka, N. Fukui, N. Endo, R. Yamamoto, Y. Umemiya, H. Matsubara, and N. Wakamiya, "Maritime coverage analysis in altitude-controlled balloons with wind-dependent trajectories," Authorea Preprints, 2025.
[15] S. Saafi, O. Vikhrova, G. Fodor, J. Hosek, and S. Andreev, "AI-aided integrated terrestrial and non-terrestrial 6G solutions for sustainable maritime networking," IEEE Network, vol. 36, no. 3, pp. 183–190, 2022.
[16] Y. Xu and Q. Yu, "Deep reinforcement learning based computation offloading and resource allocation strategy for maritime internet of things," Computer Networks, vol. 264, p. 111221, 2025.
[17] M. R. Ibáñez, A. Akhtarshenas, D. López-Pérez, and G. Geraci, "Optimizing UAV aerial base station flights using DRL-based proximal policy optimization," in 2025 IEEE ICC Workshops. IEEE, 2025, pp. 1293–1298.
[18] L. Coy, M. R. Schoeberl, S. Pawson, S. Candido, and R. W. Carver, "Global assimilation of X Project Loon stratospheric balloon observations," in AGU Fall Meeting, no. GSFC-E-DAA-TN50500, 2017.
[19] Z. Xu, Y. Liu, H. Du, and M. Lv, "Station-keeping for high-altitude balloon with reinforcement learning," Advances in Space Research, vol. 70, no. 3, pp. 733–751, 2022.
[20] 3GPP, "Technical Specification Group Radio Access Network; Study on New Radio (NR) to support non-terrestrial networks," TR 38.811, Sep. 2020, v.15.4.
[21] ITU-R, "Propagation data and prediction methods required for the design of earth-space telecommunication systems," International Telecommunication Union, Recommendation P.618-14, December 2021. [Online]. Available: https://www.itu.int
[22] 3GPP, "Study on channel model for frequencies from 0.5 to 100 GHz," TR 38.901, Mar. 2024, v.18.0.
[23] B. Yang, Y. Dang, T. Taleb, S. Shen, and X. Jiang, "Sum rate and max-min rate for cellular-enabled UAV swarm networks," IEEE Transactions on Vehicular Technology, vol. 72, no. 1, pp. 1073–1083, 2023.
[24] R. S. Sutton, "Reinforcement learning: An introduction," A Bradford Book, 2018.
[25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.