DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
Science , volume=
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (1)
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
Recurrent RL policies can have their hidden states aligned with PMP co-states through a derived loss, yielding robust performance on partially observable control tasks.
citing papers explorer
- Distributional Off-Policy Evaluation with Deep Quantile Process Regression
  DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
- Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning
  Recurrent RL policies can have their hidden states aligned with PMP co-states through a derived loss, yielding robust performance on partially observable control tasks.