Accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning

Yanwei Jia, Du Ouyang, Yufei Zhang · 2025 · arXiv 2503.09981

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

q-fin.CP · 2026-06-10 · unverdicted · novelty 7.0

A two-stage actor-critic RL algorithm learns deterministic equilibrium policies for general time-inconsistent control problems by combining DPG on an auxiliary time-consistent problem with fixed-point iteration on auxiliary functions.

Policy Gradient for Continuous-Time Mean-Field Control

math.OC · 2026-05-20 · conditional · novelty 7.0

Derives an explicit Gâteaux policy-gradient formula for entropy-regularized continuous-time mean-field control using the value function and cylindrical representations, then builds a model-based actor-critic scheme with PDE well-posedness analysis.

An Actor-Critic Framework for Continuous-Time Jump-Diffusion Controls with Normalizing Flows

math.OC · 2026-04-07 · unverdicted · novelty 7.0

An actor-critic framework built on a time-inhomogeneous little q-function and conditional normalizing flows serves as a mesh-free solver for entropy-regularized jump-diffusion control problems and stochastic games.

Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation

math.OC · 2024-07-07 · unverdicted · novelty 7.0

Develops a stochastic maximum principle (SMP) for non-convex mean-field control with joint-law dependence and Poisson common noise, using relaxed controls and an extension transformation, then derives an equivalent HJB equation on the measure space.

Mean-Field PhiBE: Continuous-Time Mean-Field Reinforcement Learning from Discrete-Time Data

math.OC · 2026-06-25 · unverdicted · novelty 6.0

Introduces MF-PhiBE to perform continuous-time mean-field RL from discrete data, with O(Δt) consistency and O((Δt)^2) accuracy in the LQ case.

PhiBE-Q-Learning: Bridging Off-Policy Reinforcement Learning and Continuous-Time Control

math.OC · 2026-06-20 · unverdicted · novelty 6.0

Introduces a new Q-function definition for continuous-time RL and convergent off-policy algorithms under linear function approximation in model-based and model-free settings.

Discretization error from regularized Reinforcement Learning to continuous-time stochastic control

math.OC · 2026-04-23 · unverdicted · novelty 5.0

Derives quantitative convergence rates for the gap between optimal policies from regularized discrete-time Bellman equations and true optimal controls in underlying continuous-time stochastic problems.

citing papers explorer

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

q-fin.CP · 2026-06-10 · unverdicted · none · ref 16

An Actor-Critic Framework for Continuous-Time Jump-Diffusion Controls with Normalizing Flows

math.OC · 2026-04-07 · unverdicted · none · ref 22

Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation

math.OC · 2024-07-07 · unverdicted · none · ref 31

Mean-Field PhiBE: Continuous-Time Mean-Field Reinforcement Learning from Discrete-Time Data

math.OC · 2026-06-25 · unverdicted · none · ref 23

PhiBE-Q-Learning: Bridging Off-Policy Reinforcement Learning and Continuous-Time Control

math.OC · 2026-06-20 · unverdicted · none · ref 44

Discretization error from regularized Reinforcement Learning to continuous-time stochastic control

math.OC · 2026-04-23 · unverdicted · none · ref 19
