A two-stage actor-critic reinforcement learning (RL) algorithm learns deterministic equilibrium policies for general time-inconsistent control problems. It combines the deterministic policy gradient (DPG) on an auxiliary time-consistent problem with fixed-point iteration on auxiliary functions.
One Pith paper cites this work.

Citing papers by field: q-fin.CP (1)
Citing papers by year: 2026 (1)
Citing papers by verdict: UNVERDICTED (1)

Representative citing papers:
Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems