ReversedQ applies three targeted changes to RandomizedQ—update order, frequency, and initialization—to raise scaled mean cumulative reward from 9.53% to 78.78% in BDCL and from 21.76% to 61.81% in chain MDPs.
Miskala-Dinc, Amedeo Ercole, and Aviva Prins
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
ReversedQ: Opportunities for Faster Q-Learning in Episodic Online Reinforcement Learning