Based on these setups, we conducted several ablation studies to better understand the effects of CPQL and λ

We continued training until the normalized score reached 100, designating this policy as the optimal policy · 2026 · arXiv 2101.0100

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

CPQL adapts the multi-step Peng's Q(λ) operator for conservative offline value estimation, achieving performance guarantees and empirical gains over single-step baselines on D4RL while supporting offline-to-online fine-tuning.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

cs.LG · 2026-05-14 · unverdicted · none · ref 38

CPQL adapts the multi-step Peng's Q(λ) operator for conservative offline value estimation, achieving performance guarantees and empirical gains over single-step baselines on D4RL while supporting offline-to-online fine-tuning.
