pith. machine review for the scientific record. sign in

arxiv: 1703.07710 · v3 · submitted 2017-03-22 · 💻 cs.LG · cs.AI· stat.ML

Recognition: unknown

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Authors on Pith no claims yet
classification 💻 cs.LG cs.AIstat.ML
keywords frameworkregretalgorithmsboundsepisodicguaranteeslearningperformance
0
0 comments X
read the original abstract

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs

    cs.LG 2026-05 unverdicted novelty 6.0

    An actor-critic RL algorithm for low-rank MDPs achieves improved sample efficiency using solely a policy evaluation oracle.