Beyond the Rainbow: High Performance Deep Reinforcement Learning on a Desktop PC. arXiv preprint arXiv:2411.03820

Clark, T · 2024 · arXiv 2411.03820

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling

stat.ML · 2026-05-20 · unverdicted · novelty 5.0

Adaptive Batch Scaling dynamically increases batch size in on-policy RL as policy volatility drops, measured by a new Behavioral Divergence metric, and shows larger networks plus larger batches outperform on ALE with PQN.

Distributional Value Estimation Without Target Networks for Robust Quality-Diversity

cs.LG · 2026-04-22 · unverdicted · novelty 5.0

QDHUAC is a distributional, target-free QD-RL method that enables stable high-UTD training and competitive performance on Brax locomotion tasks using far fewer environment steps than prior approaches.

citing papers explorer

Showing 2 of 2 citing papers.

Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling · stat.ML · 2026-05-20 · unverdicted · none · ref 5
Distributional Value Estimation Without Target Networks for Robust Quality-Diversity · cs.LG · 2026-04-22 · unverdicted · none · ref 8
