Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

Harry Holt; Marco Sagliano; Roberto Armellin; Yashdeep Chaudhary

arXiv: 2606.13605 · v1 · pith:J2PA43W6new · submitted 2026-06-11 · 🧮 math.OC · cs.LG · cs.SY · eess.SY

Pith reviewed 2026-06-27 05:40 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG · cs.SY · eess.SY

keywords robust trajectory optimization · chance-constrained reinforcement learning · distribution-agnostic · probabilistic feasibility · affine correction law · spacecraft guidance · Earth-Mars transfer · rocket landing

The pith

Chance-constrained reinforcement learning robustifies nominal spacecraft trajectories using only samples of uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that first computes a deterministic nominal trajectory offline and then applies reinforcement learning to learn a correction policy that keeps the actual path feasible with high probability under sampled uncertainties. The correction takes the form of an affine closed-loop law, and chance constraints are handled by checking upper-tail quantiles from many simulated rollouts rather than by assuming any specific probability distribution. The method is shown on an Earth-Mars transfer and on a rocket landing problem, remaining competitive with existing robust optimizers on fuel cost while preserving feasibility even when the uncertainty changes after training. A sympathetic reader would care because most real spacecraft uncertainties are known only through samples and are difficult to capture with standard parametric distributions.

Core claim

By combining an offline nominal trajectory with a learned affine closed-loop correction law that includes feedforward adjustments and time-varying feedback gains, and by enforcing probabilistic feasibility through empirical upper-tail quantiles from rollouts together with covariance penalties on terminal dispersion, the framework achieves distribution-agnostic robust trajectory optimization that stays competitive in upper-tail fuel cost and transfers without redesign to materially different problems such as multi-impulse transfers and continuous-thrust landings.
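
The core claim mentions covariance penalties on terminal dispersion without giving their form. The following is a minimal sketch of one plausible reading, not the paper's definition: a hinge penalty on per-axis terminal-error variances estimated from rollouts. The names `sample_covariance`, `covariance_penalty`, `var_max`, and `weight` are hypothetical.

```python
from typing import List, Sequence


def sample_covariance(errors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Unbiased sample covariance of terminal-error vectors (one row per rollout)."""
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two rollouts")
    dim = len(errors[0])
    mean = [sum(row[i] for row in errors) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for row in errors:
        d = [row[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def covariance_penalty(
    errors: Sequence[Sequence[float]],
    var_max: Sequence[float],
    weight: float = 1.0,
) -> float:
    """ASSUMED form: penalize only the variance in excess of each per-axis target."""
    cov = sample_covariance(errors)
    excess = [max(0.0, cov[i][i] - var_max[i]) for i in range(len(var_max))]
    return weight * sum(excess)


# Hypothetical terminal errors from four rollouts in two axes.
terminal_errors = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
penalty_tight = covariance_penalty(terminal_errors, var_max=[1.0, 1.0])
penalty_loose = covariance_penalty(terminal_errors, var_max=[2.0, 1.0])
```

A penalty of this kind is zero whenever the estimated dispersion already meets its targets, so it shapes the reward only when the ensemble spreads too far. The paper may instead constrain the full covariance matrix or its eigenvalues.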

What carries the argument

The structured affine closed-loop correction law (feedforward adjustment plus time-varying feedback gains) that is learned by reinforcement learning and whose probabilistic feasibility is enforced via rollout-based upper-tail quantiles.
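
To make the correction structure concrete, here is a minimal sketch of an affine closed-loop law at a single time node, assuming the common form u = u_nom + du_ff + K (x - x_nom). The exact parameterization, the sign convention, and the names `affine_correction`, `du_ff`, and `gain` are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def affine_correction(
    u_nom: Sequence[float],
    du_ff: Sequence[float],
    gain: Sequence[Sequence[float]],
    x: Sequence[float],
    x_nom: Sequence[float],
) -> List[float]:
    """Return u = u_nom + du_ff + K (x - x_nom) for one time node.

    u_nom : nominal control from the offline deterministic solution
    du_ff : learned feedforward adjustment (same size as u_nom)
    gain  : learned feedback gain K for this node (rows = controls, cols = states)
    """
    dx = [xi - xni for xi, xni in zip(x, x_nom)]
    feedback = [sum(k_ij * dx_j for k_ij, dx_j in zip(row, dx)) for row in gain]
    return [un + ff + fb for un, ff, fb in zip(u_nom, du_ff, feedback)]


# Hypothetical numbers for a node with two states and one control.
u_on_nominal = affine_correction(
    u_nom=[1.0], du_ff=[0.1], gain=[[-0.5, -0.2]], x=[1.0, 0.0], x_nom=[1.0, 0.0]
)
u_off_nominal = affine_correction(
    u_nom=[1.0], du_ff=[0.1], gain=[[-0.5, -0.2]], x=[1.2, 0.0], x_nom=[1.0, 0.0]
)
```

On the nominal path the feedback term vanishes and only the feedforward adjustment acts. Off the nominal path the time-varying gain adds a correction proportional to the state deviation. In the framework described here, reinforcement learning would supply `du_ff` and `gain` at every node.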

If this is right

Upper-tail fuel cost remains competitive with a recent robust optimization reference under Gaussian uncertainty.
Probabilistic feasibility is preserved when the same policy is tested under bounded uniform uncertainty.
The policy also maintains feasibility under process disturbances that were never seen during training.
The identical robustification structure applies to a short-horizon continuous-thrust landing problem with drag, mass depletion, and glide-slope constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The sample-only requirement would let mission planners substitute real flight telemetry directly for synthetic noise models.
The same scaffold could be tested on non-spacecraft problems such as autonomous vehicle path planning where uncertainty distributions are likewise unknown.
If the affine correction form proves insufficient for strongly nonlinear uncertainty propagation, the framework would need a richer policy class while keeping the quantile-based chance-constraint mechanism.

Load-bearing premise

Uncertainties can be represented solely through samples of initial conditions and process noise that the method is allowed to draw.

What would settle it

A set of Monte Carlo rollouts under a new uncertainty distribution, not used in training, in which the empirical fraction of trajectories violating the chance constraints exceeds the allowed probability level.
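
The test described above can be sketched as a Monte Carlo violation count. The example uses a toy scalar margin in place of a real closed-loop rollout. The function `toy_margin`, the two uniform samplers, and the threshold `epsilon = 0.05` are hypothetical stand-ins, not values from the paper.

```python
import random
from typing import Callable


def violation_fraction(
    sample_uncertainty: Callable[[random.Random], float],
    rollout_margin: Callable[[float], float],
    n_rollouts: int,
    seed: int = 0,
) -> float:
    """Fraction of rollouts whose constraint margin is positive (a violation)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_rollouts):
        w = sample_uncertainty(rng)
        if rollout_margin(w) > 0.0:
            violations += 1
    return violations / n_rollouts


def toy_margin(w: float) -> float:
    """Hypothetical stand-in for a closed-loop rollout: violated when |w| > 2."""
    return abs(w) - 2.0


def narrow_uniform(rng: random.Random) -> float:
    return rng.uniform(-1.5, 1.5)


def wide_uniform(rng: random.Random) -> float:
    return rng.uniform(-3.0, 3.0)


epsilon = 0.05  # allowed violation probability (assumed value)
frac_narrow = violation_fraction(narrow_uniform, toy_margin, n_rollouts=5000)
frac_wide = violation_fraction(wide_uniform, toy_margin, n_rollouts=5000)
claim_refuted_narrow = frac_narrow > epsilon
claim_refuted_wide = frac_wide > epsilon
```

In this toy setting the narrow distribution never violates the constraint, while the wide one violates it in roughly a third of rollouts and would count as a refutation. With a finite number of rollouts the empirical fraction carries sampling error, so a fraction only slightly above the allowed level is weaker evidence than a large exceedance.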

Figures

Figures reproduced from arXiv: 2606.13605 by Harry Holt, Marco Sagliano, Roberto Armellin, Yashdeep Chaudhary.

**Figure 2.** Control-effort statistics under Gaussian initial uncertainty. (a) Per-node impulse …

**Figure 3.** Evolution of ensemble dispersion along the trajectory for the Gaussian case. (a) …

**Figure 4.** Terminal-state error distributions under Gaussian initial uncertainty. (a) Histogram …

**Figure 5.** Closed-loop trajectory solution for the Earth–Mars transfer under uniform initial …

**Figure 6.** Control-effort statistics under uniform initial uncertainty. (a) Per-node impulse …

**Figure 7.** Evolution of ensemble dispersion along the trajectory for the uniform case. (a) …

**Figure 8.** Terminal-state error distributions under uniform initial uncertainty. (a) Histogram …

**Figure 9.** Robustness to two-dimensional variation in unmodeled process noise under Gaussian …

**Figure 10.** Trade-off between terminal dispersion $\hat{q}_{0.95}(e_r)$ and 95th-percentile total cost $\Delta v_{\mathrm{tot},0.95}$ over the diagonal sweep $\eta_r = \eta_v = \eta$. Marker style indicates feasibility and color encodes the process-noise scaling level. The process-noise study shows that the learned closed-loop structure provides meaningful robustness to moderate unmodeled disturbances without retraining, but with a finite robustness radius det…

**Figure 11.** Closed-loop descent trajectory for the rocket landing case under Gaussian initial …

**Figure 12.** Applied control and equivalent-$\Delta v$ statistics for the rocket landing case under Gaussian initial uncertainty with nominal process noise enabled. Panels show: (a) node-wise distribution of applied acceleration magnitude $\lVert U \rVert$ across the evaluation ensemble and (b) histogram of total $\Delta v_{\mathrm{eq}}$ over the MC rollout. The terminal targeting statistics are reported in …

**Figure 13.** Terminal targeting error distributions for the rocket landing case under Gaussian …

**Figure 14.** Closed-loop descent trajectory for the rocket landing case under bounded uniform …

**Figure 15.** Applied control and equivalent-$\Delta v$ statistics for the rocket landing case under bounded uniform initial uncertainty with nominal process noise enabled. Panels show: (a) node-wise distribution of applied acceleration magnitude $\lVert U \rVert$ across the evaluation ensemble and (b) histogram of total $\Delta v_{\mathrm{eq}}$ over the MC rollout. … different touchdown-ellipse geometry in …

**Figure 16.** Terminal targeting error distributions for the rocket landing case under bounded …

Original abstract

This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.
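
The abstract's rollout-based upper-tail quantile check can be illustrated as follows. This is a sketch under stated assumptions: the quantile is taken as a plain order statistic, and a constraint margin is defined so that non-positive values mean the constraint is satisfied. The paper may use interpolation, a confidence correction, or a different margin convention, and the names `empirical_upper_quantile` and `chance_constraint_holds` are hypothetical.

```python
import math
from typing import Sequence


def empirical_upper_quantile(values: Sequence[float], level: float) -> float:
    """Smallest sample v such that at least a fraction `level` of samples is <= v."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(level * len(ordered))  # 1-based order statistic
    return ordered[rank - 1]


def chance_constraint_holds(margins: Sequence[float], level: float) -> bool:
    """True when the empirical `level`-quantile of the margins is non-positive."""
    return empirical_upper_quantile(margins, level) <= 0.0


# Hypothetical constraint margins from eight rollouts (negative = satisfied).
margins = [-0.9, -0.7, -0.6, -0.4, -0.3, -0.2, -0.1, 0.5]
holds_at_75 = chance_constraint_holds(margins, 0.75)
holds_at_100 = chance_constraint_holds(margins, 1.0)
```

Because the check uses only sorted rollout outcomes, it never needs a density or a parametric family for the uncertainty, which is the sense in which the approach is distribution-agnostic. The one violating rollout in the example breaks the constraint at level 1.0 but not at level 0.75.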

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows how to use RL for affine corrections on nominal trajectories to achieve robust chance-constrained optimization, with good portability between an Earth-Mars transfer and a landing problem.

The main takeaway is that this paper gives a method to robustify a precomputed nominal trajectory using reinforcement learning to learn a structured affine correction law, and demonstrates that the same approach works on two quite different trajectory problems without needing to redesign the core parts.

What the paper does well is apply this to a three-dimensional multi-impulse Earth-Mars transfer, benchmarking against a recent robust optimization reference under Gaussian uncertainty, then testing under uniform uncertainty and unseen process disturbances. It also applies the framework to a stochastic atmospheric pinpoint rocket landing with continuous thrust, drag, mass depletion, and glide-slope constraints. The results indicate competitive upper-tail fuel costs while maintaining probabilistic feasibility through the quantile and penalty mechanisms.

The soft spots are fairly minor. The distribution-agnostic aspect boils down to requiring that uncertainties can be sampled, which is the standard setup for Monte Carlo chance-constrained methods rather than a major advance. The abstract does not include specific quantitative metrics, so judging the exact level of competitiveness requires the full results section. The structured affine form is presented as key to portability, but the paper would benefit from more discussion on why that structure was chosen over alternatives.

This is aimed at the aerospace trajectory optimization community, particularly those dealing with uncertainty in mission design. A reader working on similar robust control problems would get practical value from the case studies and the cross-problem transfer demonstration.

It deserves peer review. The two materially different case studies provide a reasonable test of the portability claim, and the method is grounded enough in sampling-based techniques to warrant referee attention.

Referee Report

2 major / 2 minor

Summary. The paper presents a distribution-agnostic robust trajectory optimization framework that first computes a deterministic nominal trajectory offline and then employs reinforcement learning to learn a structured affine closed-loop correction law (feedforward adjustment plus time-varying feedback gains) for robustness against sampled uncertainties in initial conditions and process noise. Probabilistic feasibility is enforced via rollout-based upper-tail quantiles and covariance-feasibility penalties. The method is evaluated on a 3D multi-impulse Earth-Mars transfer (benchmarked under Gaussian uncertainty and tested under uniform and unseen disturbances) and a stochastic atmospheric pinpoint rocket landing problem, with the central claim being competitive upper-tail fuel cost while preserving probabilistic feasibility and portability of the robustification scaffold across heterogeneous problems without core redesign.

Significance. If the empirical results hold with adequate quantitative validation, the work provides a portable sample-based robustification approach for chance-constrained trajectory optimization that requires only the ability to sample uncertainties, which could facilitate application to varied spacecraft planning tasks in aerospace control without strong distributional assumptions or problem-specific redesign.

major comments (2)

[Abstract and § on numerical results] Abstract and results sections: The central claim of remaining 'competitive in upper-tail fuel cost while preserving probabilistic feasibility' is stated without any numerical metrics, training hyperparameters, rollout counts, or explicit feasibility rates in the abstract; the full manuscript must supply these quantitative comparisons (e.g., fuel cost values and violation probabilities versus the reference method) to substantiate the performance assertions on the Earth-Mars and landing cases.
[Method section describing the correction law] Method description: The structured affine closed-loop correction law is presented as the key robustification mechanism, yet the manuscript provides no derivation or justification for why this particular affine form (rather than a more general policy) suffices to maintain the claimed portability and chance-constraint satisfaction across both impulsive and continuous-thrust problems.

minor comments (2)

[Method] Notation for the covariance-feasibility penalties and quantile thresholds should be defined consistently with explicit symbols and units.
[Numerical experiments] The description of the two case studies would benefit from a brief table summarizing problem dimensions, constraint types, and uncertainty sources for quick comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. Below we respond point by point to the major comments and indicate the changes planned for the revised manuscript.

Referee: [Abstract and § on numerical results] Abstract and results sections: The central claim of remaining 'competitive in upper-tail fuel cost while preserving probabilistic feasibility' is stated without any numerical metrics, training hyperparameters, rollout counts, or explicit feasibility rates in the abstract; the full manuscript must supply these quantitative comparisons (e.g., fuel cost values and violation probabilities versus the reference method) to substantiate the performance assertions on the Earth-Mars and landing cases.

Authors: We agree that the abstract and results sections would be strengthened by the inclusion of explicit numerical metrics. In the revised manuscript we will add quantitative comparisons, including upper-tail fuel costs, feasibility violation probabilities, training hyperparameters, and rollout counts, with direct numerical values versus the reference method for both the Earth-Mars and landing cases.

revision: yes

Referee: [Method section describing the correction law] Method description: The structured affine closed-loop correction law is presented as the key robustification mechanism, yet the manuscript provides no derivation or justification for why this particular affine form (rather than a more general policy) suffices to maintain the claimed portability and chance-constraint satisfaction across both impulsive and continuous-thrust problems.

Authors: The affine structure (feedforward adjustment plus time-varying feedback gains) is selected to enable efficient learning via reinforcement learning while ensuring the same robustification scaffold remains portable across heterogeneous problems without core redesign. We acknowledge that the current manuscript does not provide an explicit derivation of why this form suffices for chance-constraint satisfaction. In the revised version we will add a concise justification in the method section explaining how the affine corrections, combined with rollout-based quantiles and covariance penalties, empirically enforce probabilistic feasibility across the tested impulsive and continuous-thrust settings.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central claim rests on an empirical sample-based procedure: a nominal trajectory is computed offline, then RL learns an affine correction law whose probabilistic feasibility is enforced directly via rollout quantiles and covariance penalties. This structure is portable across two distinct problems solely because uncertainty is assumed sampleable—the standard precondition for any Monte-Carlo chance-constraint method. No derivation step reduces by construction to a fitted parameter renamed as prediction, no self-citation supplies a uniqueness theorem that forbids alternatives, and no ansatz is smuggled in via prior work. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Framework rests on sampleability of uncertainty and introduces an affine correction structure whose effectiveness is asserted via case studies.

axioms (1)

domain assumption: Uncertainty can be sampled from an unknown distribution
Explicitly stated as the sole requirement for uncertainty representation.

invented entities (1)

Structured affine closed-loop correction law (no independent evidence)
purpose: Robustify nominal trajectory via feedforward adjustment and time-varying feedback gains
Introduced as the RL policy structure for closed-loop correction.

[1] [1]

R. H. Battin, An Introduction to the Mathematics and Methods of Astrodynamics, AIAA Education Series, American Institute of Aeronautics and Astronautics (AIAA), 1999

1999

[2] [2]

B. A. Conway, Spacecraft Trajectory Optimization, Cambridge Aerospace Series, Cambridge University Press, 2010

2010

[3] [3]

J. T. Betts, Survey of numerical methods for trajectory optimization, Journal of Guidance, Control, and Dynamics 21 (2) (1998) 193–207.doi:10.2514/2.4231

work page doi:10.2514/2.4231 1998

[4] [4]

D. P. Bertsekas, S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, no. 5 in Optimization and Neural Computation Series, Athena Scientific, Belmont, Mass, 1996

1996

[5] [5]

Charnes, W

A. Charnes, W. W. Cooper, Chance-constrained programming, Management Science 6 (1) (1959) 73–79. doi:10.1287/mnsc.6.1.73

work page doi:10.1287/mnsc.6.1.73 1959

[6] [6]

Prékopa, Stochastic Programming, no

A. Prékopa, Stochastic Programming, no. v.324 in Mathematics and Its Applications, Springer, Dordrecht, Netherlands, 1995

1995

[7] [7]

Y. Chen, T. T. Georgiou, M. Pavon, Optimal steering of a linear stochastic system to a final probability distribution, part II, IEEE Trans. Automat. Contr. 61 (5) (2016) 1170–1180. 36

2016

[8] [8]

Ridderhof, J

J. Ridderhof, J. Pilipovsky, P. Tsiotras, Chance-constrained covariance control for low-thrust minimum- fuel trajectory optimization, in: AIAA/AAS Astrodynamics Specialists Conference, South Lake Tahoe, California, 2020, paper no. AAS 20-618

2020

[9] [9]

Ridderhof, P

J. Ridderhof, P. Tsiotras, Minimum-fuel powered descent in the presence of random disturbances, in: AIAA Scitech 2019 Forum, American Institute of Aeronautics and Astronautics, 2019. doi: 10.2514/6.2019-0646

work page doi:10.2514/6.2019-0646 2019

[10] [10]

Ridderhof, P

J. Ridderhof, P. Tsiotras, Uncertainty quantification and control during mars powered descent and landing using covariance steering, in: 2018 AIAA Guidance, Navigation, and Control Conference, Kissimmee, Florida, 2018.doi:10.2514/6.2018-0611

work page doi:10.2514/6.2018-0611 2018

[11] [11]

Benedikter, A

B. Benedikter, A. Zavoli, Z. Wang, S. Pizzurro, E. Cavallini, Convex approach to covariance control with application to stochastic low-thrust trajectory optimization, Journal of Guidance, Control, and Dynamics 45 (11) (2022) 2061–2075.doi:10.2514/1.g006806

work page doi:10.2514/1.g006806 2022

[12] [12]

Benedikter, A

B. Benedikter, A. Zavoli, Z. Wang, S. Pizzurro, E. Cavallini, Convex approach to stochastic control for autonomous rocket pinpoint landing, in: AAS/AIAA Astrodynamics Specialist Conference, Charlotte, North Carolina, 2022, paper no. AAS 22-717

2022

[13] [13]

Marmo, A

N. Marmo, A. Zavoli, N. Ozaki, Chance-constraint robust trajectory optimization with a hybrid multiple-shooting approach, Journal of Guidance, Control, and Dynamics 48 (11) (2025) 2495–2511. doi:10.2514/1.g008275

work page doi:10.2514/1.g008275 2025

[14] [14]

Zhang, D

P. Zhang, D. Wu, S. Gong, Trajectory optimization for aerodynamically controlled missiles by chance- constrained sequential convex programming, Aerospace Science and Technology 153 (2024) 109464. doi:10.1016/j.ast.2024.109464

work page doi:10.1016/j.ast.2024.109464 2024

[15] [16]

R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press Cambridge, 2018

2018

[16] [17]

Zavoli, L

A. Zavoli, L. Federici, Reinforcement learning for robust trajectory design of interplanetary missions, Journal of Guidance, Control, and Dynamics 44 (8) (2021) 1440–1453.doi:10.2514/1.g005794

work page doi:10.2514/1.g005794 2021

[18] L. Federici, A. Scorsoglio, A. Zavoli, R. Furfaro, Autonomous guidance for cislunar orbit transfers via reinforcement learning, in: AAS/AIAA Astrodynamics Specialist Conference, Big Sky, MT, 2021, paper no. AAS 21-610

[19] B. Gaudet, R. Linares, R. Furfaro, Deep reinforcement learning for six degree-of-freedom planetary landing, Advances in Space Research 65 (7) (2020) 1723–1741. doi:10.1016/j.asr.2019.12.030

[20] R. Furfaro, A. Scorsoglio, R. Linares, M. Massari, Adaptive generalized ZEM-ZEV feedback guidance for planetary landing via a deep reinforcement learning approach (Mar. 2020). arXiv:2003.02182

[21] L. Federici, B. Benedikter, R. Furfaro, Reinforcement-learning-enhanced model predictive control with application to autonomous planetary landing, Journal of Guidance, Control, and Dynamics 49 (3) (2026) 788–805. doi:10.2514/1.g009534

[22] B. Gaudet, R. Furfaro, R. Linares, Reinforcement learning for angle-only intercept guidance of maneuvering targets, Aerospace Science and Technology 99 (2020) 105746. doi:10.1016/j.ast.2020.105746

[23] K. Hovell, S. Ulrich, Deep reinforcement learning for spacecraft proximity operations guidance, Journal of Spacecraft and Rockets 58 (2) (2021) 254–264. doi:10.2514/1.a34838

[24] L. Federici, B. Benedikter, A. Zavoli, Deep learning techniques for autonomous spacecraft guidance during proximity operations, Journal of Spacecraft and Rockets 58 (6) (2021) 1774–1785. doi:10.2514/1.a35076

[25] H. Yuan, D. Li, Deep reinforcement learning for rendezvous guidance with enhanced angles-only observability, Aerospace Science and Technology 129 (2022) 107812. doi:10.1016/j.ast.2022.107812

[26] C. Mu, S. Liu, M. Lu, Z. Liu, L. Cui, K. Wang, Autonomous spacecraft collision avoidance with a variable number of space debris based on safe reinforcement learning, Aerospace Science and Technology 149 (2024) 109131. doi:10.1016/j.ast.2024.109131

[27] H. Holt, R. Armellin, Reinforcement learning enhanced LQR and control Lyapunov functions for spacecraft proximity operations, IEEE Transactions on Robotics 41 (2025) 5117–5129. doi:10.1109/tro.2025.3600160

[28] L. Capra, A. Brandonisio, M. R. Lavagna, Reinforced model predictive guidance and control for spacecraft proximity operations, Aerospace 12 (9) (2025) 837. doi:10.3390/aerospace12090837

[29] J. T. A. Vedant, M. West, A. Ghosh, Reinforcement learning for spacecraft attitude control, in: 70th International Astronautical Congress, Washington D.C., United States, 2019

[30] M. Zheng, Y. Wu, C. Li, Reinforcement learning strategy for spacecraft attitude hyperagile tracking control with uncertainties, Aerospace Science and Technology 119 (2021) 107126. doi:10.1016/j.ast.2021.107126

[31] R.-Z. Chen, Y.-X. Li, C. K. Ahn, Reinforcement-learning-based fixed-time attitude consensus control for multiple spacecraft systems with model uncertainties, Aerospace Science and Technology 132 (2023) 108060. doi:10.1016/j.ast.2022.108060

[32] Y. Chaudhary, H. Holt, L. Anoè, R. Armellin, C. Bombardelli, Low-Thrust Cis-Lunar Transfers exploiting Ballistic Capture Trajectories, in: AIAA SCITECH 2024 Forum, American Institute of Aeronautics and Astronautics (AIAA), 2024. doi:10.2514/6.2024-0837

[33] X. Liu, Fuel-optimal rocket landing with aerodynamic controls, Journal of Guidance, Control, and Dynamics 42 (1) (2019) 65–77. doi:10.2514/1.g003537

[34] L. Engstrom, A. Ilyas, S. Santurkar, D. Tsipras, F. Janoos, L. Rudolph, A. Madry, Implementation matters in deep policy gradients: A case study on PPO and TRPO (2020). arXiv:2005.12729

[35] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-Baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (268) (2021) 1–8, available at http://jmlr.org/papers/v22/20-1364.html

[36] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library (Dec. 2019). arXiv:1912.01703

[37] M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. De Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, et al., Gymnasium: A standard interface for reinforcement learning environments (2024). arXiv:2407.17032

[38] E. M. Standish, JPL planetary and lunar ephemerides, DE405/LE405, Interoffice Memorandum JPL IOM 312.F-98-048, Jet Propulsion Laboratory (Aug. 1998). ftp://ssd.jpl.nasa.gov/pub/eph/planets/ioms/de405.iom.pdf

[39] S. Diamond, S. Boyd, CVXPY: A Python-embedded modeling language for convex optimization, Journal of Machine Learning Research 17 (83) (2016) 1–5

[40] A. Agrawal, R. Verschueren, S. Diamond, S. Boyd, A rewriting system for convex optimization problems, Journal of Control and Decision 5 (1) (2018) 42–60

[41] MOSEK ApS, The MOSEK Python Optimizer API manual, Version 11.1 (2025). https://docs.mosek.com/latest/pythonapi/index.html