Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.
0 50000 100000 150000 200000 250000 Iterations 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0Mean Number of Vases Broken Without h-Potential With h-Potential (b) Number of vases broken
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2019 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Learning the Arrow of Time
Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.