Logarithmic regret for online KL-regularized reinforcement learning, 2025a

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

fields

cs.LG 3

years

2026 3

verdicts

unverdicted 3

representative citing papers

Efficient Exploration for Iterative Nash Preference Optimization

cs.LG · 2026-05-31 · unverdicted · novelty 7.0

An explicitly exploratory iterative NLHF method achieves O(sqrt(T)) regret for Nash equilibria under general preference models, removing the exponential KL dependence that plagues standard iterative approaches.
