In Figure 10, we show the instantaneous regret on the MovieLens data set for different settings of m.

Figure 9: Cumulative regret for the well-specified Setting A of the MovieLens environment with varying m. [Panel (c): m = 80; y-axis: average cumulative regret, Reg(T); methods compared: UCB, TS, mUCB, mTS, lobUCB, lobTS.]
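A minimal sketch of how regret curves like those in Figures 9 and 10 are commonly computed. It assumes the standard definition: the instantaneous regret at round t is the gap between the best expected reward and the expected reward of the chosen action, and Reg(T) is the running sum of those gaps, averaged over independent runs. The function names and the `(mean_rewards, actions)` input format are hypothetical. The paper's own evaluation code may differ, for example in how the latent state determines the mean rewards in each run.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def regret_curves(mean_rewards: Sequence[float],
                  actions: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Instantaneous and cumulative regret for a single run.

    mean_rewards: true expected reward of each action in this run
                  (hypothetical input; e.g. the means under the run's latent state).
    actions:      index of the action chosen at each round t = 1..T.
    """
    best = max(mean_rewards)
    # Gap between the optimal expected reward and the chosen action's expected reward.
    instantaneous = [best - mean_rewards[a] for a in actions]
    # Reg(t) = sum of instantaneous regret up to round t.
    cumulative = list(accumulate(instantaneous))
    return instantaneous, cumulative


def average_cumulative_regret(
        runs: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> List[float]:
    """Average Reg(t) curve over independent runs sharing the same horizon T."""
    curves = [regret_curves(means, acts)[1] for means, acts in runs]
    horizon = len(curves[0])
    return [sum(curve[t] for curve in curves) / len(curves) for t in range(horizon)]
```

For example, with mean rewards `[0.25, 0.5, 1.0]` and actions `[0, 1, 2, 2]`, the cumulative regret is `[0.75, 1.25, 1.25, 1.25]`. The curve flattens once the best action is chosen, which is the behaviour a sublinear-regret algorithm shows over a long horizon.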

1 Pith paper cites this work.


representative citing papers

Latent Order Bandits

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · cites this work as ref. 13

Latent order bandits require only a known partial order on actions within each latent state rather than full reward distributions, enabling UCB and posterior-sampling algorithms with regret bounds that match or exceed standard latent bandits when reward scales vary.
