A distributional framework for optimizing Lipschitz risk functionals in offline contextual bandits yields data-dependent suboptimality bounds of Õ(1/√n) that match risk-neutral rates and are minimax optimal.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
A weighted K-means plus decision-tree pipeline learns multi-action policies from observational data and is applied to HCV treatment choices for HIV co-infected patients, finding a high-clearance subgroup and potential cost savings of CAN$3.6-4.9 million.
citing papers explorer
- Pessimistic Risk-Aware Policy Learning in Contextual Bandits
  A distributional framework for optimizing Lipschitz risk functionals in offline contextual bandits yields data-dependent suboptimality bounds of Õ(1/√n) that match risk-neutral rates and are minimax optimal.
- Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients
  A weighted K-means plus decision-tree pipeline learns multi-action policies from observational data and is applied to HCV treatment choices for HIV co-infected patients, finding a high-clearance subgroup and potential cost savings of CAN$3.6-4.9 million.