Accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning
7 Pith papers cite this work.
citing papers explorer
- Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems
  A two-stage actor-critic RL algorithm learns deterministic equilibrium policies for general time-inconsistent control problems by combining DPG on an auxiliary time-consistent problem with fixed-point iteration on auxiliary functions.
- An Actor-Critic Framework for Continuous-Time Jump-Diffusion Controls with Normalizing Flows
  An actor-critic framework built on a time-inhomogeneous little q-function and conditional normalizing flows serves as a mesh-free solver for entropy-regularized jump-diffusion control problems and stochastic games.
- Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation
  Develops SMP for non-convex mean-field control with joint law dependence and Poisson common noise via relaxed controls and extension transformation, then derives equivalent HJB on measure space.
- Mean-Field PhiBE: Continuous-Time Mean-Field Reinforcement Learning from Discrete-Time Data
  Introduces MF-PhiBE to perform continuous-time mean-field RL from discrete data, with O(Δt) consistency and O((Δt)^2) accuracy in the LQ case.
- PhiBE-Q-Learning: Bridging Off-Policy Reinforcement Learning and Continuous-Time Control
  Introduces a new Q-function definition for continuous-time RL and convergent off-policy algorithms under linear function approximation in model-based and model-free settings.
- Discretization error from regularized Reinforcement Learning to continuous-time stochastic control
  Derives quantitative convergence rates for the gap between optimal policies from regularized discrete-time Bellman equations and true optimal controls in underlying continuous-time stochastic problems.