To address this issue, existing methods like Advantage-Weighted Regression (AWR) aim to mitigate such inherent conservativeness

A Derivation of Policy Objective
A common challenge in policy learning arises when the sampling policy μ(·|·) is suboptimal, which often results in overly conservative estimates from in-sample approaches

1 Pith paper cites this work.

representative citing papers

Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression

cs.LG · 2025-11-15 · unverdicted · novelty 5.0 · ref 4

Quantile Q-Learning estimates the temperature coefficient β via quantile regression and adds value regularization to Extreme Q-Learning, yielding stable training and competitive performance on D4RL and NeoRL2 benchmarks with fixed hyperparameters.
