
arxiv: 2510.13704 · v2 · pith:SZVWTJZ4new · submitted 2025-10-15 · 💻 cs.LG · cs.AI · cs.RO

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

classification 💻 cs.LG · cs.AI · cs.RO
keywords: embeddings, simplicial, efficiency, improve, sample, actor-critic, agents, environment

Recent works have proposed accelerating the wall-clock training time of actor-critic methods through large-scale environment parallelization; unfortunately, these methods can still require a large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
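
To make the idea of constraining embeddings to simplicial structures concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which a feature vector is split into equal-sized groups and a temperature-scaled softmax is applied within each group, so each group lies on a probability simplex. The function name `simplicial_embedding` and the parameters `num_groups` and `temperature` are illustrative choices, not taken from the paper, and the authors' layer may differ in its details.

```python
import math


def simplicial_embedding(features, num_groups, temperature=1.0):
    """Project a flat feature vector onto a product of simplices.

    Hypothetical helper for illustration: splits `features` into
    `num_groups` equal chunks and applies a softmax to each chunk.
    """
    dim = len(features)
    if dim % num_groups != 0:
        raise ValueError("feature dimension must be divisible by num_groups")
    group_size = dim // num_groups

    embedded = []
    for g in range(num_groups):
        group = features[g * group_size:(g + 1) * group_size]
        # A lower temperature sharpens the softmax, giving sparser features.
        scaled = [v / temperature for v in group]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(scaled)
        exps = [math.exp(v - peak) for v in scaled]
        total = sum(exps)
        embedded.extend(e / total for e in exps)
    return embedded


# Example: an 8-dimensional feature vector mapped onto 2 simplices of size 4.
z = [0.5, -1.0, 2.0, 0.0, 1.0, 1.0, -3.0, 0.25]
e = simplicial_embedding(z, num_groups=2, temperature=0.5)
```

In this sketch every group of the output is non-negative and sums to one, and lowering `temperature` pushes each group toward a one-hot vector, which is one way the sparse, near-discrete features described in the abstract can arise. In an actor-critic agent, a layer like this would typically sit on the encoder output before the actor and critic heads; that placement is an assumption here, not a detail confirmed by the abstract.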



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

    cs.LG 2026-04 unverdicted novelty 6.0

    FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of...

  2. FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

    cs.LG 2026-04 unverdicted novelty 6.0

    FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.

  3. FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

    cs.LG 2026-03 unverdicted novelty 6.0

    FastDSAC enables state-of-the-art maximum entropy RL for high-dimensional humanoid control via entropy redistribution per dimension and improved continuous value estimation.