FP-IRL recovers MDP reward, transition, and policy from trajectories alone by using variational system identification on a Fokker-Planck potential that corresponds to reward maximization.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2023 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
FP-IRL: Fokker--Planck Inverse Reinforcement Learning -- A Physics-Constrained Approach to Markov Decision Processes