Quantile Q-Learning estimates the temperature coefficient β via quantile regression and adds value regularization to Extreme Q-Learning, yielding stable training and competitive performance on D4RL and NeoRL2 benchmarks with fixed hyperparameters.
To address this issue, existing methods like Advantage-Weighted Regression (AWR) aim to mitigate such inherent conservativeness
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2025 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression