Review history

arxiv: 2605.05102 · 2 revisions

Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

  1. 2026-06-30 UNVERDICTED LOW v0.9.1-grok novelty 8.0
    31348 ms 5757 in 1153 out 2026-06-30T23:56:21.237039+00:00
  2. 2026-05-08 UNVERDICTED LOW v0.9.0 novelty 7.0
    57747 ms 5526 in 1268 out 2026-05-08T17:59:44.111780+00:00