AdamO modifies Adam with an orthogonality correction to ensure the spectral radius of the TD update operator stays below one, providing a theoretical stability guarantee for offline RL.
Advances in Neural Information Processing Systems
3 Pith papers cite this work.
Fields: cs.LG (3)
Years: 2026 (3)
Representative citing papers
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-Mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.