Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

Alejandro Posadas-Nava; Minduli Wijayatunga; Richard Linares

arxiv: 2606.17414 · v1 · pith:E2XZ5WBGnew · submitted 2026-06-16 · 💻 cs.LG · math.DS


Pith reviewed 2026-06-27 01:56 UTC · model grok-4.3

classification 💻 cs.LG math.DS

keywords: meta-reinforcement learning · control barrier functions · spacecraft proximity operations · state space models · adversarial scenarios · safety-critical control · proximal policy optimization · Mamba


The pith

Mamba with PPO outperforms LSTM and GRU in meta-RL for learning safety functions in adversarial spacecraft rendezvous.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends prior meta-RL work on tuning class-K functions for input-constrained control barrier functions by comparing three recurrent architectures and two training algorithms on spacecraft proximity operations. It evaluates performance in cooperative scenarios and in uncooperative ones where the target spacecraft deliberately reduces the chaser's safety margin. Results show that selective state space models paired with proximal policy optimization deliver higher task completion rates, fewer safety violations, and lower fuel use than the alternatives across all tested conditions. A reader would care because the finding identifies a practical memory-efficient setup for adaptive safety-critical controllers that must operate under thrust limits and potential opposition.

Core claim

The paper's central claim is that selective state space models such as Mamba, trained with proximal policy optimization (PPO), are the best of the tested configurations for meta-reinforcement learning of the class-K functions that define the input-constrained control barrier function (ICCBF) recursion. Relative to long short-term memory (LSTM) and gated recurrent unit (GRU) networks trained with either PPO or soft actor-critic (SAC), this pairing achieves superior task completion, safety maintenance, and fuel savings in both cooperative and adversarial spacecraft proximity operation scenarios.

What carries the argument

Meta-RL training of recurrent networks to parameterize class-K functions inside the ICCBF forward-invariance recursion, evaluated by task success, safety constraint satisfaction, and propellant consumption under adversarial target motion.
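
For readers who want the recursion written out, the following is a sketch of the ICCBF construction as it is usually stated in the literature (following Agrawal and Panagou), not a formula quoted from this paper. It assumes control-affine dynamics $\dot{x} = f(x) + g(x)u$ with inputs restricted to a set $\mathcal{U}$ and a safe set $\{x : h(x) \ge 0\}$; the symbols $b_i$, $\alpha_i$, $N$, and $\mathcal{C}^*$ are notation chosen here for illustration. The class-$\mathcal{K}$ functions $\alpha_i$ are the quantities the meta-RL policy is described as tuning.

```latex
\begin{align}
b_0(x) &= h(x), \\
b_{i+1}(x) &= \inf_{u \in \mathcal{U}} \left[ L_f b_i(x) + L_g b_i(x)\,u + \alpha_i\big(b_i(x)\big) \right], \quad i = 0, \dots, N-1, \\
\mathcal{C}^* &= \bigcap_{i=0}^{N} \{\, x : b_i(x) \ge 0 \,\}, \\
\sup_{u \in \mathcal{U}} \left[ L_f b_N(x) + L_g b_N(x)\,u + \alpha_N\big(b_N(x)\big) \right] &\ge 0 \quad \text{for all } x \in \mathcal{C}^* .
\end{align}
```

When the last condition holds, inputs in $\mathcal{U}$ that satisfy it keep $\mathcal{C}^*$ forward invariant, which is the forward-invariant safe set the abstract refers to. Each $\alpha_i$ shapes how quickly the state may approach the boundary of the corresponding set, so the choice of these functions trades conservatism against performance.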

If this is right

Controllers for rendezvous can maintain safety margins against uncooperative targets while using less fuel than current recurrent baselines.
The same meta-RL pipeline can be applied to other nonlinear systems that require input-constrained safety filters.
State-space-model-based policies reduce memory footprint during online adaptation compared with LSTM or GRU equivalents.
Safety-critical meta-RL becomes viable for missions where the target may actively degrade the chaser's feasible set.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Onboard spacecraft computers with limited RAM could run these policies in real time where LSTM versions would exceed memory limits.
The performance edge may allow shorter meta-training episodes, lowering the computational cost of adapting to new orbital regimes.
Similar architecture comparisons could be run for other safety-filtered control problems such as autonomous underwater vehicles or aerial collision avoidance.

Load-bearing premise

The simulation environments and selected adversarial behaviors are representative enough that performance gaps seen in training will appear under real spacecraft dynamics and disturbances.

What would settle it

A hardware-in-the-loop experiment or on-orbit test in which the chaser encounters a target with unmodeled dynamics or different adversarial tactics would show whether the reported gains in completion, safety, and fuel use persist.

Original abstract

Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) provide a control method for nonlinear systems with actuation constraints that constructs a forward-invariant safe set. Previous work has shown that learning the class-$\mathcal{K}$ functions defining the ICCBF recursion via meta-reinforcement learning (meta-RL) yields a robust, non-greedy approach to safety-critical control in RPO. This paper extends that framework by investigating the performance of three recurrent network architectures (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Selective State Space Model (Mamba)) and two training algorithms (Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC)) to identify the best setup for tuning ICCBF class-$\mathcal{K}$ functions via meta-RL. In addition to cooperative test cases, performance is evaluated under adversarial behavior, in which the target spacecraft acts to worsen the safety of the chaser spacecraft. Results indicate that state space models such as Mamba, when used with PPO, achieve superior task completion, safety, and fuel savings compared to the other architectures across all cooperative and uncooperative scenarios tested.
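
To make the role of a class-K function concrete, here is a minimal Python sketch of a plain control barrier function safety filter. It is a toy, not the paper's ICCBF controller or its spacecraft dynamics. The assumptions are all illustrative: a one-dimensional single integrator `x_dot = u`, a safety margin `h(x) = x - x_min`, a linear class-K function `alpha(h) = k * h`, and hypothetical values for the nominal approach speed, actuator limit, and time step. The gain `k` stands in for the quantity a learned policy would adjust.

```python
def cbf_filter(u_nom, h, k, u_max):
    """Closed-form 1-D CBF safety filter for x_dot = u with h(x) = x - x_min."""
    # CBF condition: h_dot = u >= -alpha(h), with linear class-K alpha(h) = k * h.
    u_safe = max(u_nom, -k * h)
    # Respect the (hypothetical) actuator limit |u| <= u_max.
    return max(-u_max, min(u_max, u_safe))


def simulate(k, x0=10.0, x_min=0.0, u_nom=-1.0, u_max=1.0, dt=0.1, steps=200):
    """Forward-Euler rollout; returns the safety margin h at every step."""
    x = x0
    margins = []
    for _ in range(steps):
        h = x - x_min
        margins.append(h)
        x += dt * cbf_filter(u_nom, h, k, u_max)
    margins.append(x - x_min)
    return margins


if __name__ == "__main__":
    cautious = simulate(k=0.1)    # small gain: brakes early, stays far away
    aggressive = simulate(k=1.0)  # large gain: approaches fast, brakes late
    print("final margin, k=0.1:", cautious[-1])
    print("final margin, k=1.0:", aggressive[-1])
    print("margin never negative:", min(cautious) >= 0 and min(aggressive) >= 0)
```

Both gains keep the margin non-negative in this toy, but the smaller gain slows the approach much earlier. That trade-off between caution and progress is the kind of behavior the paper's meta-RL policy is described as learning to balance, there under thrust limits and against an adversarial target.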

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Mamba plus PPO beats the other recurrent options in this meta-RL tuning of ICCBFs for spacecraft RPO, including the adversarial cases, and the experiments line up with the claim.


The main takeaway is that Mamba with PPO gives the best results on task completion, safety margins, and fuel use when meta-RL is used to learn the class-K functions for input-constrained control barrier functions. The paper tests this in both cooperative and adversarial rendezvous scenarios where the target spacecraft actively tries to reduce safety.

What is new is the direct comparison of LSTM, GRU, and Mamba paired with PPO versus SAC, plus the addition of those adversarial test cases on top of the earlier meta-RL + ICCBF framework. The authors run the same training and evaluation protocol across the six combinations and report consistent rankings that favor the state-space model with the on-policy algorithm.

The experimental design holds together. The metrics back the superiority claim without obvious circularity or fitting inside the same loop. The control formulation stays standard and the training loop does not appear to violate its own assumptions.

A minor soft spot is the usual one for simulation studies: the adversarial behaviors and dynamics are chosen by the authors, so transfer to real hardware with unmodeled effects is not demonstrated. That does not undermine the internal comparison, but it limits how far the ranking can be taken without further validation.

This paper is for people working on safe RL for aerospace or other safety-critical control problems. It is a solid empirical benchmark rather than a new theoretical method. The work shows clear thinking and honest engagement with the prior literature on meta-RL for ICCBFs.

I would send it to peer review. The experiments are reproducible enough on their own terms to merit referee time, even if revisions will be needed on the transfer question and hyperparameter reporting.

Referee Report

0 major / 3 minor

Summary. The manuscript extends prior meta-RL work on learning class-K functions for Input-Constrained Control Barrier Functions (ICCBFs) in spacecraft rendezvous and proximity operations. It empirically compares three recurrent architectures (LSTM, GRU, Mamba) paired with PPO and SAC across cooperative and adversarial target behaviors, reporting that Mamba+PPO yields the highest task completion rates, safety margins, and fuel efficiency.

Significance. If the reported rankings hold under the stated experimental conditions, the results supply actionable guidance on architecture selection for meta-RL safety filters in aerospace control. The explicit inclusion of adversarial scenarios and the focus on fuel-constrained, actuation-limited dynamics add practical value beyond standard RL benchmarks.

minor comments (3)

[Abstract] The superiority claim is stated without any numerical values, trial counts, or statistical tests; adding one or two key metrics (e.g., success rate or fuel delta) would improve immediate readability.
[Section 4, Experimental Setup] The hyperparameter tables list network sizes and learning rates but omit the exact meta-RL horizon length and the number of independent seeds used for each architecture-algorithm pair; these details are needed for reproducibility.
[Figures 5-7] The performance plots lack error bars or shaded regions indicating variability across trials; adding them would strengthen the visual comparison of Mamba+PPO against the baselines.
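
As one way to act on the comments about statistical reporting and error bars, the sketch below computes a Wilson score interval for a task success rate and a mean with sample standard deviation across seeds. It uses only the Python standard library. The function names and all numbers in the usage section are hypothetical placeholders, not values from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials ** 2)
    ) / denom
    return center - half, center + half


def mean_and_std(values):
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical example: 50 successful rendezvous out of 100 evaluation runs.
    low, high = wilson_interval(50, 100)
    print(f"success rate 0.50, 95% interval [{low:.3f}, {high:.3f}]")

    # Hypothetical fuel use for three seeds of one configuration.
    mean, std = mean_and_std([1.0, 2.0, 3.0])
    print(f"fuel use: {mean:.2f} +/- {std:.2f}")
```

Reporting an interval per architecture-algorithm pair, together with the number of seeds behind it, would let a reader judge whether the gap between Mamba+PPO and the baselines exceeds run-to-run variability.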

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on meta-RL for ICCBF tuning in spacecraft RPO, including the recognition of its practical value in adversarial settings. We are pleased with the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity; empirical benchmark with independent results


The manuscript is an empirical study comparing recurrent architectures (LSTM, GRU, Mamba) paired with PPO/SAC for meta-RL tuning of ICCBF class-K functions in cooperative and adversarial RPO scenarios. No derivation chain, uniqueness theorem, or fitted-parameter prediction is present that reduces reported performance metrics to quantities defined inside the same loop. The reference to prior work on the meta-RL + ICCBF framework is background context only and does not bear the load of the new architecture ranking. All claims rest on simulation metrics that are externally falsifiable and not forced by construction from the inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The paper is an empirical ML application study. No new physical axioms or invented entities are introduced. Free parameters consist of standard RL hyperparameters and network sizes whose values are not reported in the abstract.

free parameters (1)

meta-RL training hyperparameters and network sizes
Typical for any RL study; values not stated in abstract and would be fitted or chosen to produce the reported ranking.


