pith. machine review for the scientific record.

arxiv: 2509.08660 · v3 · submitted 2025-09-10 · 💻 cs.LG

Recognition: unknown

Replicable Reinforcement Learning with Linear Function Approximation

Authors on Pith: no claims yet
classification 💻 cs.LG
keywords: replicable, algorithms, learning, linear, function, approximation, settings, efficient
read the original abstract

Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized replicability as the demand that an algorithm produce identical outcomes when executed twice on different samples from the same distribution. Provably replicable algorithms are especially interesting for reinforcement learning (RL), where algorithms are known to be unstable in practice. While replicable algorithms exist for tabular RL settings, extending these guarantees to more practical function approximation settings has remained an open problem. In this work, we make progress by developing replicable methods for linear function approximation in RL. We first introduce two efficient algorithms for replicable random design regression and uncentered covariance estimation, each of independent interest. We then leverage these tools to provide the first provably efficient replicable RL algorithms for linear Markov decision processes in both the generative model and episodic settings. Finally, we evaluate our algorithms experimentally and show how they can inspire more consistent neural policies.
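The abstract's notion of replicability — identical outputs when the algorithm is run twice on different samples from the same distribution — is typically achieved by sharing the algorithm's internal randomness across the two runs. A minimal illustrative sketch of that idea (not the paper's method; the function name and parameters are hypothetical): replicable mean estimation by rounding the empirical mean to a randomly shifted grid, where the grid offset is the shared randomness.

```python
import random

def replicable_mean(samples, grid_width=0.5, seed=0):
    # Shared internal randomness: the grid offset is derived from the
    # seed, so two runs with the same seed use the same shifted grid.
    offset = random.Random(seed).uniform(0, grid_width)
    empirical = sum(samples) / len(samples)
    # Snap the empirical mean to the nearest point on the shifted grid.
    # Two runs whose empirical means fall in the same grid cell return
    # the exact same value, even though the samples differ.
    return offset + grid_width * round((empirical - offset) / grid_width)
```

With a shared seed, two runs on close-but-different samples land on the same grid point with high probability; the grid width trades off accuracy against replicability, which is the flavor of guarantee the paper's regression and covariance-estimation subroutines formalize.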

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Replicable Composition

    cs.LG · 2026-04 · unverdicted · novelty 8.0

    Replicable algorithms for heterogeneous problems can be composed with O(sum n_i) samples at constant replicability via conversion to perfectly generalizing algorithms, privacy-style composition, and correlated sampling.