Generalised Entropy MDPs and Minimax Regret
classification
cs.LG
stat.ML
keywords
bandit, bayesian, beliefs, consider, discover, discuss, entropy, extend
abstract
Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.