Introduces the Agent State-Markov Policy Gradient (ASMPG) algorithm and a policy gradient theorem for non-Markovian decision processes by jointly optimizing agent state dynamics and control policy.
International Conference on Machine Learning
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Vanishing L2 regularization yields provable convergence for softmax MAB policies and improves empirical performance.
citing papers explorer
- Policy Gradient Methods for Non-Markovian Reinforcement Learning
  Introduces the Agent State-Markov Policy Gradient (ASMPG) algorithm and a policy gradient theorem for non-Markovian decision processes by jointly optimizing agent state dynamics and control policy.
- Vanishing L2 regularization for the softmax Multi Armed Bandit
  Vanishing L2 regularization yields provable convergence for softmax MAB policies and improves empirical performance.