Strategically Conservative Q-Learning
One Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing paper:
Generalization in offline RL: The structure is more important than the amount of pessimism
In offline RL, the structure of pessimism (set by dataset coverage) matters more for generalization than its amount; a symmetric, overly pessimistic value function can outperform a non-symmetric, mildly pessimistic one.