
arxiv: 2602.02098 · v2 · pith:DLPU32CFnew · submitted 2026-02-02 · 💻 cs.LG · cs.AI

Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning

classification 💻 cs.LG cs.AI
keywords guarantees · multi-task · tasks · performance · finitely · generalisation · high-confidence · learning

Multi-task reinforcement learning trains generalist policies that can execute multiple tasks. While recent years have seen significant progress, existing approaches rarely provide formal performance guarantees, which are indispensable when deploying policies in safety-critical settings. We present an approach for computing high-confidence guarantees on the performance of a multi-task policy on tasks not seen during training. Concretely, we introduce a new generalisation bound that composes (i) per-task lower confidence bounds from finitely many rollouts with (ii) task-level generalisation from finitely many sampled tasks, yielding a high-confidence guarantee for new tasks drawn from the same arbitrary and unknown distribution. Across state-of-the-art multi-task RL methods, we show that the guarantees are theoretically sound and informative at realistic sample sizes.
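The two-level composition described in the abstract can be illustrated with a minimal sketch. This is not the paper's bound: it assumes returns normalised to [0, 1], uses Hoeffding's inequality at both levels, and splits the failure probability with a union bound. The names `hoeffding_lcb`, `multitask_guarantee`, `threshold`, and `delta` are hypothetical and introduced only for illustration.

```python
import math
from typing import Sequence


def hoeffding_lcb(returns: Sequence[float], delta: float) -> float:
    """Lower confidence bound on the mean return of one task.

    Assumes i.i.d. returns in [0, 1]; by Hoeffding's inequality the
    bound holds with probability at least 1 - delta.
    """
    m = len(returns)
    mean = sum(returns) / m
    return mean - math.sqrt(math.log(1.0 / delta) / (2.0 * m))


def multitask_guarantee(
    returns_per_task: Sequence[Sequence[float]],
    threshold: float,
    delta: float,
) -> float:
    """Lower bound on P(new task has expected return >= threshold).

    Assumes the tasks are i.i.d. draws from the (unknown) task
    distribution. The bound holds with probability at least 1 - delta.
    """
    n = len(returns_per_task)
    # Illustrative budget split: half for task-level generalisation,
    # half shared equally across the n per-task bounds (union bound).
    delta_gen = delta / 2.0
    delta_task = delta / (2.0 * n)

    # (i) Count tasks whose per-task lower bound certifies the threshold.
    certified = sum(
        hoeffding_lcb(r, delta_task) >= threshold for r in returns_per_task
    )

    # (ii) Hoeffding on the fraction of certified tasks among n samples.
    eps = math.sqrt(math.log(1.0 / delta_gen) / (2.0 * n))
    return max(0.0, certified / n - eps)


if __name__ == "__main__":
    # Toy data: 50 tasks, 200 rollouts each, every rollout succeeds.
    data = [[1.0] * 200 for _ in range(50)]
    print(multitask_guarantee(data, threshold=0.8, delta=0.1))
```

In this sketch, more rollouts per task tighten step (i) and more sampled tasks tighten step (ii). The even split of `delta` is a design choice of the example, not something the abstract specifies.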



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Multitask LQG Control: Performance and Generalization Bounds

    math.OC 2026-04 unverdicted novelty 5.0

    Multitask LQG control via history-dependent lifting to LQR yields generalization bounds tied to bisimulation heterogeneity and reduces policy gradient variance proportionally to the number of training tasks.