Simplicial embeddings improve sample efficiency in actor-critic agents
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.LG 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
citing papers explorer
- FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
- FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
FastDSAC enables state-of-the-art maximum entropy RL for high-dimensional humanoid control via entropy redistribution per dimension and improved continuous value estimation.