Susceptibilities applied to regret in deep RL agents reveal stagewise internal development in parameter space of a gridworld model that policy inspection alone cannot detect, validated via activation steering.
Proceedings of the AAAI Conference on Artificial Intelligence
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
New discrete-time approximations to SG(L)D enable accurate non-asymptotic predictions of covariance and integrated autocorrelation time for practical tuning in large-batch or misspecified regimes.
citing papers explorer
- Interpreting Reinforcement Learning Agents with Susceptibilities
  Susceptibilities applied to regret in deep RL agents reveal stagewise internal development in parameter space of a gridworld model that policy inspection alone cannot detect, validated via activation steering.
- Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo
  New discrete-time approximations to SG(L)D enable accurate non-asymptotic predictions of covariance and integrated autocorrelation time for practical tuning in large-batch or misspecified regimes.