Latent order bandits require only a known partial order on actions within each latent state rather than full reward distributions, enabling UCB and posterior-sampling algorithms with regret bounds that match or exceed standard latent bandits when reward scales vary.
In Figure 10, we show the instantaneous regret on the MovieLens data set for different settings of m
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Latent Order Bandits