Journal: Neural Computation
2 Pith papers cite this work. Polarity classification is still indexing.
Citation summary:
- Years: 2026 (2 papers)
- Verdicts: unverdicted (2)
- Roles: background (1)
- Polarities: background (1)
Citing papers:
- Interpreting Reinforcement Learning Agents with Susceptibilities
  Applying susceptibilities to the regret of deep RL agents reveals stagewise internal development in the parameter space of a gridworld model. Policy inspection alone cannot detect this development, and the findings are validated via activation steering.
- It's all in your head -- fine-tuning arguments do not require aleatoric uncertainty
  Bayesian statistics supplies an automatic Occam's razor that penalizes unnatural models needing precise fine-tuning to agree with data, justifying naturalness arguments without aleatoric uncertainty.