SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning

Romit Maulik; Xuyang Li

arxiv: 2502.15512 · v5 · submitted 2025-02-21 · 💻 cs.LG

SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning

Xuyang Li , Romit Maulik This is my paper

Pith reviewed 2026-05-23 02:37 UTC · model grok-4.3

classification 💻 cs.LG

keywords reinforcement learningstability analysislatent space modelingaction dynamicsinterpretabilityencoder decoderlinear dynamical systems

0 comments

The pith

A latent space model of actions lets RL agents forecast local instability before execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes modeling the actions of reinforcement learning agents as evolving variables inside a latent space. It uses a pre-trained encoder-decoder together with a state-dependent linear system to approximate how those actions change over time. Local stability analysis on this model then forecasts whether action norms will grow immediately after selection. A reader would care because real-world control tasks need ways to check for unsafe behavior ahead of time rather than only after failure. The method works by attaching to already trained agents without changing their performance on standard benchmarks.

Core claim

SALSA-RL models control actions as dynamic time-dependent variables in a latent space by means of a pre-trained encoder-decoder and a state-dependent linear system. This construction permits local stability analysis that predicts instantaneous growth in action-norms before the actions are executed. The framework can be deployed non-invasively to assess the local stability of actions generated by pretrained RL agents while leaving their performance unchanged across diverse benchmark environments.

What carries the argument

A pre-trained encoder-decoder combined with a state-dependent linear system that models the evolution of actions in latent space.

If this is right

Instantaneous growth in action-norms can be predicted before execution.
The analysis applies to pretrained agents in a non-invasive way.
Performance on benchmark environments stays the same.
It advances interpretability for the design and analysis of RL systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Stability predictions might be used to modify or filter actions in real time for safer control.
Similar latent linear models could apply to other sequential decision problems beyond RL.
One could test whether the linear approximation holds over longer time horizons than the local analysis assumes.

Load-bearing premise

A pre-trained encoder-decoder and state-dependent linear system can capture the time evolution of actions well enough that local stability predictions remain meaningful.

What would settle it

Running the stability predictions on a trained agent and then observing that the predicted growth rates do not correspond to the actual changes in action norms during environment interactions.

Figures

Figures reproduced from arXiv: 2502.15512 by Romit Maulik, Xuyang Li.

**Figure 1.** Figure 1: Overview of the SALSA-RL framework. Our proposed augmentation to pre-trained RL algorithms relies on a latent action representation governed by a time-varying linear dynamical system through a state-conditioned matrix At. This enables local stability analyses in the action-state phase for reliable and interpretable RL deployments. The framework integrates seamlessly with existing RL algorithms, maintaining… view at source ↗

**Figure 2.** Figure 2: Local stability analysis of Pendulum control. The white line marks where the spectral radius equals 1, and the [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Local stability analysis of CartPole control. Each row shows a representative state. Due to the nature of [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Local stability analysis of LunarLander hovering control. The white line indicates where spectral radius [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Transient Growth and Floquet analysis of Pendulum control. The pendulum achieves stabilization in case [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: A custom control scenario where the system remains action-free during the first 30 and last 50 timesteps. [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

**Figure 7.** Figure 7: Local stability analysis of the Bipedal Walker control for two cases (left two columns and right two columns). [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Action space of the proposed SALSA-RL (left) compared to the SAC baseline (right). The sharp transitions [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

**Figure 9.** Figure 9: Local stability contour plots for different hidden dimensions: (top-left) [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: Stability analysis for Humanoid with hd = 32. Top row: evolution of actions, states, spectral radius, and imaginary parts of eigenvalues. Bottom row: extended analysis via stability contour visualizations. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

read the original abstract

Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems, especially those requiring precise and reliable performance, often demand interpretability in the sense of a-priori assessments of agent behavior to identify safe or failure-prone interactions with environments. To address this limitation, this work proposes SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space. By employing a pre-trained encoder-decoder and a state-dependent linear system, this approach enables interpretability through local stability analysis, where instantaneous growth in action-norms can be predicted before their execution. It is demonstrated that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments. By enabling a more interpretable analysis of action generation, SALSA-RL provides a powerful tool for advancing the design, analysis, and theoretical understanding of RL systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes SALSA-RL, a framework that models RL control actions as time-dependent variables in a latent space via a pre-trained encoder-decoder and state-dependent linear system. This enables local stability analysis to predict instantaneous growth rates of action norms before execution, providing interpretability for pretrained agents in a non-invasive way across benchmark environments without performance degradation.

Significance. If the modeling assumptions hold and the linearization faithfully captures relevant action dynamics, the method could supply a useful tool for a-priori safety assessment and interpretability in continuous-control RL. However, the manuscript supplies no empirical support for these assumptions, so the significance remains potential rather than demonstrated.

major comments (1)

[Abstract] Abstract: the central claim that local stability predictions are meaningful requires that the pre-trained encoder-decoder and state-dependent linear system faithfully reproduce action time-evolution (small linearization error over relevant timescales and preservation of norm-growth dynamics in latent space). No reconstruction error, multi-step prediction error, or correlation between predicted and observed action-norm trajectories is reported, leaving the modeling assumption untested.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need to validate the core modeling assumptions. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that local stability predictions are meaningful requires that the pre-trained encoder-decoder and state-dependent linear system faithfully reproduce action time-evolution (small linearization error over relevant timescales and preservation of norm-growth dynamics in latent space). No reconstruction error, multi-step prediction error, or correlation between predicted and observed action-norm trajectories is reported, leaving the modeling assumption untested.

Authors: We agree that the abstract does not report explicit reconstruction error, multi-step prediction error, or direct correlation metrics between predicted and observed action-norm trajectories. The current manuscript instead demonstrates non-invasive deployment on pretrained agents with no performance degradation across benchmarks, which provides indirect evidence that the latent dynamics are sufficiently faithful for the intended stability analysis. To directly address the concern, we will add a dedicated subsection (and associated figures) in the Experiments section reporting reconstruction MSE, multi-step rollout error over relevant timescales, and Pearson correlation between predicted and empirical action-norm growth rates. This will make the validation of the linearization assumptions explicit. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper constructs a modeling pipeline (pre-trained encoder-decoder plus state-dependent linear system) explicitly to enable local stability analysis of action dynamics in latent space. The claimed predictions of instantaneous action-norm growth are outputs of this fitted model rather than independent results that reduce to the inputs by definition. No self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The central claim concerns the interpretability utility of applying stability analysis to the constructed model, which remains self-contained and externally falsifiable via reconstruction or multi-step prediction errors.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of concrete free parameters, axioms, or invented entities. The framework implicitly relies on the existence of a useful latent representation and a linear approximation whose validity is not justified in the provided text.

pith-pipeline@v0.9.0 · 5714 in / 1041 out tokens · 24206 ms · 2026-05-23T02:37:51.080888+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 6 internal anchors

[1]

A physics-guided reinforcement learning framework for an autonomous manufacturing system with expensive data

Md Ferdous Alam, Max Shtein, Kira Barton, and David J Hoelzle. A physics-guided reinforcement learning framework for an autonomous manufacturing system with expensive data. In 2021 American Control Conference (ACC), pp. 484–490. IEEE, 2021

work page 2021
[2]

Advanced PID control

Karl Johan ˚Astr¨om and Tore H¨agglund. Advanced PID control. ISA-The Instrumentation, Systems and Automation Society, 2006

work page 2006
[3]

A survey on physics informed reinforcement learning: Review and open problems

Chayan Banerjee, Kien Nguyen, Clinton Fookes, and Maziar Raissi. A survey on physics informed reinforcement learning: Review and open problems. arXiv preprint arXiv:2309.01909, 2023

work page arXiv 2023
[4]

Prototype-based models in machine learning

Michael Biehl, Barbara Hammer, and Thomas Villmann. Prototype-based models in machine learning. Wiley Interdisciplinary Reviews: Cognitive Science, 7(2):92–111, 2016

work page 2016
[5]

OpenAI Gym

G Brockman. Openai gym. arXiv preprint arXiv:1606.01540, 2016. 12 arXiv SALSA-RL A PREPRINT

work page internal anchor Pith review Pith/arXiv arXiv 2016
[6]

Safe learning in robotics: From learning-based control to safe reinforcement learning

Lukas Brunke, Melissa Greeff, Adam W Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5(1):411–444, 2022

work page 2022
[7]

Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning

Jianyu Chen, Shengbo Eben Li, and Masayoshi Tomizuka. Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 23(6):5068–5078, 2021

work page 2021
[8]

Physics-guided reinforcement learning for 3d molecular structures

Youngwoo Cho, Sookyung Kim, Peggy Pk Li, Mike P Surh, T Yong-Jin Han, and Jaegul Choo. Physics-guided reinforcement learning for 3d molecular structures. In Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2019

work page 2019
[9]

Interpretable and explainable logical policies via neurally guided symbolic abstraction

Quentin Delfosse, Hikaru Shindo, Devendra Dhami, and Kristian Kersting. Interpretable and explainable logical policies via neurally guided symbolic abstraction. Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[10]

Benchmarking deep reinforcement learning for continuous control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International conference on machine learning, pp. 1329–1338. PMLR, 2016

work page 2016
[11]

Stochastic Neural Networks for Hierarchical Reinforcement Learning

Carlos Florensa, Yan Duan, and Pieter Abbeel. Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[12]

Addressing function approximation error in actor-critic methods

Scott Fujimoto, Herke Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International conference on machine learning, pp. 1587–1596. PMLR, 2018

work page 2018
[13]

A review on deep reinforcement learning for fluid mechanics

Paul Garnier, Jonathan Viquerat, Jean Rabault, Aur´elien Larcher, Alexander Kuhnle, and Elie Hachem. A review on deep reinforcement learning for fluid mechanics. Computers & Fluids, 225:104973, 2021

work page 2021
[14]

A survey on interpretable reinforcement learning

Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, and Wulong Liu. A survey on interpretable reinforcement learning. Machine Learning, pp. 1–44, 2024

work page 2024
[15]

A review of safe reinforcement learning: Methods, theories and applications

Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, and Alois Knoll. A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024
[16]

Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pp. 1861–1870. PMLR, 2018

work page 2018
[17]

Actor-critic reinforcement learning for control with stability guarantee

Minghao Han, Lixian Zhang, Jun Wang, and Wei Pan. Actor-critic reinforcement learning for control with stability guarantee. IEEE Robotics and Automation Letters, 5(4):6217–6224, 2020

work page 2020
[18]

Interpretable policies for reinforcement learning by genetic programming

Daniel Hein, Steffen Udluft, and Thomas A Runkler. Interpretable policies for reinforcement learning by genetic programming. Engineering Applications of Artificial Intelligence, 76:158–169, 2018

work page 2018
[19]

Port-metriplectic neural networks: thermodynamics-informed machine learning of complex physical systems

Quercus Hern´andez, Alberto Bad´ıas, Francisco Chinesta, and El´ıas Cueto. Port-metriplectic neural networks: thermodynamics-informed machine learning of complex physical systems. Computational Mechanics, 72(3): 553–561, 2023

work page 2023
[20]

Explainability in deep reinforcement learning

Alexandre Heuillet, Fabien Couthouis, and Natalia D´ıaz-Rodr´ıguez. Explainability in deep reinforcement learning. Knowledge-Based Systems, 214:106685, 2021

work page 2021
[21]

Neural logic reinforcement learning

Zhengyao Jiang and Shan Luo. Neural logic reinforcement learning. In International conference on machine learning, pp. 3110–3119. PMLR, 2019

work page 2019
[22]

Stability-certified reinforcement learning: A control-theoretic perspective

Ming Jin and Javad Lavaei. Stability-certified reinforcement learning: A control-theoretic perspective. IEEE Access, 8:229086–229100, 2020

work page 2020
[23]

Creativity of ai: Automatic symbolic option discovery for facilitating deep reinforcement learning

Mu Jin, Zhihao Ma, Kebing Jin, Hankz Hankui Zhuo, Chen Chen, and Chao Yu. Creativity of ai: Automatic symbolic option discovery for facilitating deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 7042–7050, 2022

work page 2022
[24]

The joint spectral radius: theory and applications, volume 385

Rapha¨el Jungers. The joint spectral radius: theory and applications, volume 385. Springer Science & Business Media, 2009

work page 2009
[25]

Increasing the safety of adaptive cruise control using physics-guided reinforcement learning

Sorin Liviu Jurj, Dominik Grundt, Tino Werner, Philipp Borchers, Karina Rothemann, and Eike M ¨ohlmann. Increasing the safety of adaptive cruise control using physics-guided reinforcement learning. Energies, 14(22): 7572, 2021

work page 2021
[26]

Scalable deep reinforcement learning for vision-based robotic manipulation

Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on robot learning, pp. 651–673. PMLR, 2018. 13 arXiv SALSA-RL A PREPRINT

work page 2018
[27]

Towards interpretable deep reinforcement learning with human- friendly prototypes

Eoin M Kenny, Mycal Tucker, and Julie Shah. Towards interpretable deep reinforcement learning with human- friendly prototypes. In The Eleventh International Conference on Learning Representations, 2023

work page 2023
[28]

A framework for output-feedback symbolic control

Mahmoud Khaled, Kuize Zhang, and Majid Zamani. A framework for output-feedback symbolic control. IEEE Transactions on Automatic Control, 68(9):5600–5607, 2022

work page 2022
[29]

Auto-Encoding Variational Bayes

Diederik P Kingma. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[30]

Floquet theory: a useful tool for understanding nonequilibrium dynamics

Christopher A Klausmeier. Floquet theory: a useful tool for understanding nonequilibrium dynamics. Theoretical Ecology, 1:153–161, 2008

work page 2008
[31]

Discovering symbolic policies with deep reinforcement learning

Mikel Landajuela, Brenden K Petersen, Sookyung Kim, Claudio P Santiago, Ruben Glatt, Nathan Mundhenk, Jacob F Pettit, and Daniel Faissol. Discovering symbolic policies with deep reinforcement learning. InInternational Conference on Machine Learning, pp. 5979–5989. PMLR, 2021

work page 2021
[32]

Continuous control with deep reinforcement learning

TP Lillicrap. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[33]

Physics-informed dyna-style model-based deep reinforcement learning for dynamic control

Xin-Yang Liu and Jian-Xun Wang. Physics-informed dyna-style model-based deep reinforcement learning for dynamic control. Proceedings of the Royal Society A, 477(2255):20210618, 2021

work page 2021
[34]

Sdrl: interpretable and data-efficient deep reinforce- ment learning leveraging symbolic planning

Daoming Lyu, Fangkai Yang, Bo Liu, and Steven Gustafson. Sdrl: interpretable and data-efficient deep reinforce- ment learning leveraging symbolic planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 2970–2977, 2019

work page 2019
[35]

Task-oriented koopman-based control with contrastive encoder

Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, and Mo Chen. Task-oriented koopman-based control with contrastive encoder. In Conference on Robot Learning, pp. 93–105. PMLR, 2023

work page 2023
[36]

Isaac gym: High performance gpu-based physics simulation for robot learning, 2021

Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac gym: High performance gpu-based physics simulation for robot learning, 2021

work page 2021
[37]

Towards interpretable reinforcement learning using attention augmented agents

Alexander Mott, Daniel Zoran, Mike Chrzanowski, Daan Wierstra, and Danilo Jimenez Rezende. Towards interpretable reinforcement learning using attention augmented agents. Advances in neural information processing systems, 32, 2019

work page 2019
[38]

Data-efficient hierarchical reinforcement learning

Ofir Nachum, Shixiang Shane Gu, Honglak Lee, and Sergey Levine. Data-efficient hierarchical reinforcement learning. Advances in neural information processing systems, 31, 2018

work page 2018
[39]

Adaptive optics control using model-based reinforcement learning

Jalo Nousiainen, Chang Rajani, Markus Kasper, and Tapio Helin. Adaptive optics control using model-based reinforcement learning. Optics Express, 29(10):15327–15344, 2021

work page 2021
[40]

Hierarchical reinforcement learning: A comprehensive survey

Shubham Pateria, Budhitama Subagdja, Ah-hwee Tan, and Chai Quek. Hierarchical reinforcement learning: A comprehensive survey. ACM Computing Surveys (CSUR), 54(5):1–35, 2021

work page 2021
[41]

Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control

Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse R ´eglade, and Nicolas Cerardi. Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. Journal of fluid mechanics, 865:281–302, 2019

work page 2019
[42]

Rl baselines3 zoo

Antonin Raffin. Rl baselines3 zoo. https://github.com/DLR-RM/rl-baselines3-zoo , 2020

work page 2020
[43]

Symbolic optimal control

Gunther Reissig and Matthias Rungger. Symbolic optimal control. IEEE Transactions on Automatic Control, 64 (6):2224–2239, 2018

work page 2018
[44]

Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, and Steven L Brunton. Koopman-assisted reinforcement learning. arXiv preprint arXiv:2403.02290, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[46]

Multivariable feedback control: analysis and design

Sigurd Skogestad and Ian Postlethwaite. Multivariable feedback control: analysis and design. john Wiley & sons, 2005

work page 2005
[47]

Reinforcement learning: An introduction

Richard S Sutton. Reinforcement learning: An introduction. A Bradford Book, 2018

work page 2018
[48]

Adaptive control schemes for improving the control system dynamics: a review

Pankaj Swarnkar, Shailendra Kumar Jain, and Rajesh Kumar Nema. Adaptive control schemes for improving the control system dynamics: a review. IETE Technical Review, 31(1):17–33, 2014

work page 2014
[49]

Programmati- cally interpretable reinforcement learning

Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. Programmati- cally interpretable reinforcement learning. In International Conference on Machine Learning, pp. 5045–5054. PMLR, 2018

work page 2018
[50]

Toward physics-guided safe deep reinforcement learning for green data center cooling control

Ruihang Wang, Xinyi Zhang, Xin Zhou, Yonggang Wen, and Rui Tan. Toward physics-guided safe deep reinforcement learning for green data center cooling control. In 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), pp. 159–169. IEEE, 2022. 14 arXiv SALSA-RL A PREPRINT

work page 2022
[51]

Embed to control: A locally linear latent dynamics model for control from raw images

Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. Advances in neural information processing systems, 28, 2015

work page 2015
[52]

Model-based deep reinforcement learning for accelerated learning from flow simulations

Andre Weiner and Janis Geise. Model-based deep reinforcement learning for accelerated learning from flow simulations. Meccanica, pp. 1–18, 2024

work page 2024
[53]

Koopman q-learning: Offline reinforcement learning via symmetries of dynamics

Matthias Weissenbacher, Samarth Sinha, Animesh Garg, and Kawahara Yoshinobu. Koopman q-learning: Offline reinforcement learning via symmetries of dynamics. In International conference on machine learning , pp. 23645–23667. PMLR, 2022

work page 2022
[54]

Explainable ai and reinforcement learning—a systematic review of current approaches and trends

Lindsay Wells and Tomasz Bednarz. Explainable ai and reinforcement learning—a systematic review of current approaches and trends. Frontiers in artificial intelligence, 4:550030, 2021

work page 2021
[55]

Interpretable deep reinforcement learning for optimizing heterogeneous energy storage systems

Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, and Feng Qian. Interpretable deep reinforcement learning for optimizing heterogeneous energy storage systems. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023

work page 2023
[56]

Reinforcement learning with prototypical representations

Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Reinforcement learning with prototypical representations. In International Conference on Machine Learning, pp. 11920–11931. PMLR, 2021

work page 2021
[57]

Learning deep neural network representations for koopman operators of nonlinear dynamical systems

Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), pp. 4832–4839. IEEE, 2019

work page 2019
[58]

Physics-guided deep reinforcement learning for flow field denoising

Mustafa Z Yousif, Meng Zhang, Yifan Yang, Haifeng Zhou, Linqi Yu, and HeeChang Lim. Physics-guided deep reinforcement learning for flow field denoising. arXiv preprint arXiv:2302.09559, 2023

work page arXiv 2023
[59]

Reinforcement learning with knowledge representation and reasoning: A brief survey

Chao Yu, Xuejing Zheng, Hankz Hankui Zhuo, Hai Wan, and Weilin Luo. Reinforcement learning with knowledge representation and reasoning: A brief survey. arXiv preprint arXiv:2304.12090, 2023

work page arXiv 2023
[60]

Concept learning for interpretable multi-agent reinforcement learning

Renos Zabounidis, Joseph Campbell, Simon Stepputtis, Dana Hughes, and Katia P Sycara. Concept learning for interpretable multi-agent reinforcement learning. In Conference on Robot Learning, pp. 1828–1837. PMLR, 2023

work page 2023
[61]

Stable and safe reinforcement learning via a barrier-lyapunov actor-critic approach

Liqun Zhao, Konstantinos Gatsis, and Antonis Papachristodoulou. Stable and safe reinforcement learning via a barrier-lyapunov actor-critic approach. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 1320–1325. IEEE, 2023

work page 2023
[62]

Sindy-rl: Interpretable and efficient model-based reinforcement learning

Nicholas Zolman, Urban Fasel, J Nathan Kutz, and Steven L Brunton. Sindy-rl: Interpretable and efficient model-based reinforcement learning. arXiv preprint arXiv:2403.09110, 2024. 15 arXiv SALSA-RL A PREPRINT A Derivation of the Policy Gradient for SALSA-RL A.1 Policy Formulation A latent dynamic control policy is defined based on Eqs. (3), (4), (5), and ...

work page arXiv 2024
[63]

Region of Instability (T: 0–30): Initial local instability before any actions are applied

work page
[64]

Region of Recovery (T: 30–150): Control actions are applied to recover the system from local instability to stability

work page
[65]

Region of Control Failure (T: 150–200): The control is removed, and the system transitions back from local stability to instability. From the spectral radius and eigenvalue plot (first column, third row), regions of local instability (ρ1 > 1) are evident in time Regions 1 and 3, as well as at the initial stage of Region 2. In Region 2, the policy successf...

work page

[1] [1]

A physics-guided reinforcement learning framework for an autonomous manufacturing system with expensive data

Md Ferdous Alam, Max Shtein, Kira Barton, and David J Hoelzle. A physics-guided reinforcement learning framework for an autonomous manufacturing system with expensive data. In 2021 American Control Conference (ACC), pp. 484–490. IEEE, 2021

work page 2021

[2] [2]

Advanced PID control

Karl Johan ˚Astr¨om and Tore H¨agglund. Advanced PID control. ISA-The Instrumentation, Systems and Automation Society, 2006

work page 2006

[3] [3]

A survey on physics informed reinforcement learning: Review and open problems

Chayan Banerjee, Kien Nguyen, Clinton Fookes, and Maziar Raissi. A survey on physics informed reinforcement learning: Review and open problems. arXiv preprint arXiv:2309.01909, 2023

work page arXiv 2023

[4] [4]

Prototype-based models in machine learning

Michael Biehl, Barbara Hammer, and Thomas Villmann. Prototype-based models in machine learning. Wiley Interdisciplinary Reviews: Cognitive Science, 7(2):92–111, 2016

work page 2016

[5] [5]

OpenAI Gym

G Brockman. Openai gym. arXiv preprint arXiv:1606.01540, 2016. 12 arXiv SALSA-RL A PREPRINT

work page internal anchor Pith review Pith/arXiv arXiv 2016

[6] [6]

Safe learning in robotics: From learning-based control to safe reinforcement learning

Lukas Brunke, Melissa Greeff, Adam W Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5(1):411–444, 2022

work page 2022

[7] [7]

Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning

Jianyu Chen, Shengbo Eben Li, and Masayoshi Tomizuka. Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 23(6):5068–5078, 2021

work page 2021

[8] [8]

Physics-guided reinforcement learning for 3d molecular structures

Youngwoo Cho, Sookyung Kim, Peggy Pk Li, Mike P Surh, T Yong-Jin Han, and Jaegul Choo. Physics-guided reinforcement learning for 3d molecular structures. In Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2019

work page 2019

[9] [9]

Interpretable and explainable logical policies via neurally guided symbolic abstraction

Quentin Delfosse, Hikaru Shindo, Devendra Dhami, and Kristian Kersting. Interpretable and explainable logical policies via neurally guided symbolic abstraction. Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[10] [10]

Benchmarking deep reinforcement learning for continuous control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International conference on machine learning, pp. 1329–1338. PMLR, 2016

work page 2016

[11] [11]

Stochastic Neural Networks for Hierarchical Reinforcement Learning

Carlos Florensa, Yan Duan, and Pieter Abbeel. Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[12] [12]

Addressing function approximation error in actor-critic methods

Scott Fujimoto, Herke Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International conference on machine learning, pp. 1587–1596. PMLR, 2018

work page 2018

[13] [13]

A review on deep reinforcement learning for fluid mechanics

Paul Garnier, Jonathan Viquerat, Jean Rabault, Aur´elien Larcher, Alexander Kuhnle, and Elie Hachem. A review on deep reinforcement learning for fluid mechanics. Computers & Fluids, 225:104973, 2021

work page 2021

[14] [14]

A survey on interpretable reinforcement learning

Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, and Wulong Liu. A survey on interpretable reinforcement learning. Machine Learning, pp. 1–44, 2024

work page 2024

[15] [15]

A review of safe reinforcement learning: Methods, theories and applications

Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, and Alois Knoll. A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024

[16] [16]

Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pp. 1861–1870. PMLR, 2018

work page 2018

[17] [17]

Actor-critic reinforcement learning for control with stability guarantee

Minghao Han, Lixian Zhang, Jun Wang, and Wei Pan. Actor-critic reinforcement learning for control with stability guarantee. IEEE Robotics and Automation Letters, 5(4):6217–6224, 2020

work page 2020

[18] [18]

Interpretable policies for reinforcement learning by genetic programming

Daniel Hein, Steffen Udluft, and Thomas A Runkler. Interpretable policies for reinforcement learning by genetic programming. Engineering Applications of Artificial Intelligence, 76:158–169, 2018

work page 2018

[19] [19]

Port-metriplectic neural networks: thermodynamics-informed machine learning of complex physical systems

Quercus Hern´andez, Alberto Bad´ıas, Francisco Chinesta, and El´ıas Cueto. Port-metriplectic neural networks: thermodynamics-informed machine learning of complex physical systems. Computational Mechanics, 72(3): 553–561, 2023

work page 2023

[20] [20]

Explainability in deep reinforcement learning

Alexandre Heuillet, Fabien Couthouis, and Natalia D´ıaz-Rodr´ıguez. Explainability in deep reinforcement learning. Knowledge-Based Systems, 214:106685, 2021

work page 2021

[21] [21]

Neural logic reinforcement learning

Zhengyao Jiang and Shan Luo. Neural logic reinforcement learning. In International conference on machine learning, pp. 3110–3119. PMLR, 2019

work page 2019

[22] [22]

Stability-certified reinforcement learning: A control-theoretic perspective

Ming Jin and Javad Lavaei. Stability-certified reinforcement learning: A control-theoretic perspective. IEEE Access, 8:229086–229100, 2020

work page 2020

[23] [23]

Creativity of ai: Automatic symbolic option discovery for facilitating deep reinforcement learning

Mu Jin, Zhihao Ma, Kebing Jin, Hankz Hankui Zhuo, Chen Chen, and Chao Yu. Creativity of ai: Automatic symbolic option discovery for facilitating deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 7042–7050, 2022

work page 2022

[24] [24]

The joint spectral radius: theory and applications, volume 385

Rapha¨el Jungers. The joint spectral radius: theory and applications, volume 385. Springer Science & Business Media, 2009

work page 2009

[25] [25]

Increasing the safety of adaptive cruise control using physics-guided reinforcement learning

Sorin Liviu Jurj, Dominik Grundt, Tino Werner, Philipp Borchers, Karina Rothemann, and Eike M ¨ohlmann. Increasing the safety of adaptive cruise control using physics-guided reinforcement learning. Energies, 14(22): 7572, 2021

work page 2021

[26] [26]

Scalable deep reinforcement learning for vision-based robotic manipulation

Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on robot learning, pp. 651–673. PMLR, 2018. 13 arXiv SALSA-RL A PREPRINT

work page 2018

[27] [27]

Towards interpretable deep reinforcement learning with human- friendly prototypes

Eoin M Kenny, Mycal Tucker, and Julie Shah. Towards interpretable deep reinforcement learning with human- friendly prototypes. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[28] [28]

A framework for output-feedback symbolic control

Mahmoud Khaled, Kuize Zhang, and Majid Zamani. A framework for output-feedback symbolic control. IEEE Transactions on Automatic Control, 68(9):5600–5607, 2022

work page 2022

[29] [29]

Auto-Encoding Variational Bayes

Diederik P Kingma. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[30] [30]

Floquet theory: a useful tool for understanding nonequilibrium dynamics

Christopher A Klausmeier. Floquet theory: a useful tool for understanding nonequilibrium dynamics. Theoretical Ecology, 1:153–161, 2008

work page 2008

[31] [31]

Discovering symbolic policies with deep reinforcement learning

Mikel Landajuela, Brenden K Petersen, Sookyung Kim, Claudio P Santiago, Ruben Glatt, Nathan Mundhenk, Jacob F Pettit, and Daniel Faissol. Discovering symbolic policies with deep reinforcement learning. InInternational Conference on Machine Learning, pp. 5979–5989. PMLR, 2021

work page 2021

[32] [32]

Continuous control with deep reinforcement learning

TP Lillicrap. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[33] [33]

Physics-informed dyna-style model-based deep reinforcement learning for dynamic control

Xin-Yang Liu and Jian-Xun Wang. Physics-informed dyna-style model-based deep reinforcement learning for dynamic control. Proceedings of the Royal Society A, 477(2255):20210618, 2021

work page 2021

[34] [34]

Sdrl: interpretable and data-efficient deep reinforce- ment learning leveraging symbolic planning

Daoming Lyu, Fangkai Yang, Bo Liu, and Steven Gustafson. Sdrl: interpretable and data-efficient deep reinforce- ment learning leveraging symbolic planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 2970–2977, 2019

work page 2019

[35] [35]

Task-oriented koopman-based control with contrastive encoder

Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, and Mo Chen. Task-oriented koopman-based control with contrastive encoder. In Conference on Robot Learning, pp. 93–105. PMLR, 2023

work page 2023

[36] [36]

Isaac gym: High performance gpu-based physics simulation for robot learning, 2021

Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac gym: High performance gpu-based physics simulation for robot learning, 2021

work page 2021

[37] [37]

Towards interpretable reinforcement learning using attention augmented agents

Alexander Mott, Daniel Zoran, Mike Chrzanowski, Daan Wierstra, and Danilo Jimenez Rezende. Towards interpretable reinforcement learning using attention augmented agents. Advances in neural information processing systems, 32, 2019

work page 2019

[38] [38]

Data-efficient hierarchical reinforcement learning

Ofir Nachum, Shixiang Shane Gu, Honglak Lee, and Sergey Levine. Data-efficient hierarchical reinforcement learning. Advances in neural information processing systems, 31, 2018

work page 2018

[39] [39]

Adaptive optics control using model-based reinforcement learning

Jalo Nousiainen, Chang Rajani, Markus Kasper, and Tapio Helin. Adaptive optics control using model-based reinforcement learning. Optics Express, 29(10):15327–15344, 2021

work page 2021

[40] [40]

Hierarchical reinforcement learning: A comprehensive survey

Shubham Pateria, Budhitama Subagdja, Ah-hwee Tan, and Chai Quek. Hierarchical reinforcement learning: A comprehensive survey. ACM Computing Surveys (CSUR), 54(5):1–35, 2021

work page 2021

[41] [41]

Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control

Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse R ´eglade, and Nicolas Cerardi. Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. Journal of fluid mechanics, 865:281–302, 2019

work page 2019

[42] [42]

Rl baselines3 zoo

Antonin Raffin. Rl baselines3 zoo. https://github.com/DLR-RM/rl-baselines3-zoo , 2020

work page 2020

[43] [43]

Symbolic optimal control

Gunther Reissig and Matthias Rungger. Symbolic optimal control. IEEE Transactions on Automatic Control, 64 (6):2224–2239, 2018

work page 2018

[44] [44]

Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, and Steven L Brunton. Koopman-assisted reinforcement learning. arXiv preprint arXiv:2403.02290, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[45] [45]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[46] [46]

Multivariable feedback control: analysis and design

Sigurd Skogestad and Ian Postlethwaite. Multivariable feedback control: analysis and design. john Wiley & sons, 2005

work page 2005

[47] [47]

Reinforcement learning: An introduction

Richard S Sutton. Reinforcement learning: An introduction. A Bradford Book, 2018

work page 2018

[48] [48]

Adaptive control schemes for improving the control system dynamics: a review

Pankaj Swarnkar, Shailendra Kumar Jain, and Rajesh Kumar Nema. Adaptive control schemes for improving the control system dynamics: a review. IETE Technical Review, 31(1):17–33, 2014

work page 2014

[49] [49]

Programmati- cally interpretable reinforcement learning

Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. Programmati- cally interpretable reinforcement learning. In International Conference on Machine Learning, pp. 5045–5054. PMLR, 2018

work page 2018

[50] [50]

Toward physics-guided safe deep reinforcement learning for green data center cooling control

Ruihang Wang, Xinyi Zhang, Xin Zhou, Yonggang Wen, and Rui Tan. Toward physics-guided safe deep reinforcement learning for green data center cooling control. In 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), pp. 159–169. IEEE, 2022. 14 arXiv SALSA-RL A PREPRINT

work page 2022

[51] [51]

Embed to control: A locally linear latent dynamics model for control from raw images

Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. Advances in neural information processing systems, 28, 2015

work page 2015

[52] [52]

Model-based deep reinforcement learning for accelerated learning from flow simulations

Andre Weiner and Janis Geise. Model-based deep reinforcement learning for accelerated learning from flow simulations. Meccanica, pp. 1–18, 2024

work page 2024

[53] [53]

Koopman q-learning: Offline reinforcement learning via symmetries of dynamics

Matthias Weissenbacher, Samarth Sinha, Animesh Garg, and Kawahara Yoshinobu. Koopman q-learning: Offline reinforcement learning via symmetries of dynamics. In International conference on machine learning , pp. 23645–23667. PMLR, 2022

work page 2022

[54] [54]

Explainable ai and reinforcement learning—a systematic review of current approaches and trends

Lindsay Wells and Tomasz Bednarz. Explainable ai and reinforcement learning—a systematic review of current approaches and trends. Frontiers in artificial intelligence, 4:550030, 2021

work page 2021

[55] [55]

Interpretable deep reinforcement learning for optimizing heterogeneous energy storage systems

Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, and Feng Qian. Interpretable deep reinforcement learning for optimizing heterogeneous energy storage systems. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023

work page 2023

[56] [56]

Reinforcement learning with prototypical representations

Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Reinforcement learning with prototypical representations. In International Conference on Machine Learning, pp. 11920–11931. PMLR, 2021

work page 2021

[57] [57]

Learning deep neural network representations for koopman operators of nonlinear dynamical systems

Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), pp. 4832–4839. IEEE, 2019

work page 2019

[58] [58]

Physics-guided deep reinforcement learning for flow field denoising

Mustafa Z Yousif, Meng Zhang, Yifan Yang, Haifeng Zhou, Linqi Yu, and HeeChang Lim. Physics-guided deep reinforcement learning for flow field denoising. arXiv preprint arXiv:2302.09559, 2023

work page arXiv 2023

[59] [59]

Reinforcement learning with knowledge representation and reasoning: A brief survey

Chao Yu, Xuejing Zheng, Hankz Hankui Zhuo, Hai Wan, and Weilin Luo. Reinforcement learning with knowledge representation and reasoning: A brief survey. arXiv preprint arXiv:2304.12090, 2023

work page arXiv 2023

[60] [60]

Concept learning for interpretable multi-agent reinforcement learning

Renos Zabounidis, Joseph Campbell, Simon Stepputtis, Dana Hughes, and Katia P Sycara. Concept learning for interpretable multi-agent reinforcement learning. In Conference on Robot Learning, pp. 1828–1837. PMLR, 2023

work page 2023

[61] [61]

Stable and safe reinforcement learning via a barrier-lyapunov actor-critic approach

Liqun Zhao, Konstantinos Gatsis, and Antonis Papachristodoulou. Stable and safe reinforcement learning via a barrier-lyapunov actor-critic approach. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 1320–1325. IEEE, 2023

work page 2023

[62] [62]

Sindy-rl: Interpretable and efficient model-based reinforcement learning

Nicholas Zolman, Urban Fasel, J Nathan Kutz, and Steven L Brunton. Sindy-rl: Interpretable and efficient model-based reinforcement learning. arXiv preprint arXiv:2403.09110, 2024. 15 arXiv SALSA-RL A PREPRINT A Derivation of the Policy Gradient for SALSA-RL A.1 Policy Formulation A latent dynamic control policy is defined based on Eqs. (3), (4), (5), and ...

work page arXiv 2024

[63] [63]

Region of Instability (T: 0–30): Initial local instability before any actions are applied

work page

[64] [64]

Region of Recovery (T: 30–150): Control actions are applied to recover the system from local instability to stability

work page

[65] [65]

Region of Control Failure (T: 150–200): The control is removed, and the system transitions back from local stability to instability. From the spectral radius and eigenvalue plot (first column, third row), regions of local instability (ρ1 > 1) are evident in time Regions 1 and 3, as well as at the initial stage of Region 2. In Region 2, the policy successf...

work page