arXiv preprint arXiv:2510.13512

Offline, Online KL-Regularized RLHF under Differential Privacy · arXiv 2510.13512

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

The paper establishes the first Õ(ε⁻¹) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability, in both the tabular and general function approximation settings.

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

Offline KL-regularized MABs require sample complexity scaling as O(η S A C^π*/ε) for large regularization and Ω(S A C^π*/ε²) for small regularization, with matching lower bounds across the full range.

citing papers explorer

Showing 2 of 2 citing papers.

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

cs.LG · 2026-05-09 · unverdicted · none · ref 107

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

cs.LG · 2026-05-04 · unverdicted · none · ref 55
