DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
arXiv preprint arXiv:2107.04907
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Distributional Off-Policy Evaluation with Deep Quantile Process Regression
  DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
- Neural Generative Distributional Regression
  A neural estimator for the generative map g in Y = g(X, U) is obtained by minimizing empirical energy distance between observed and generated distributions, attaining adaptive nonparametric rates.