FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

Seo, Younggyo; Sferrazza, Carmelo; Geng, Haoran; Nauman, Michal; Yin, Zhao-Heng; Abbeel, Pieter · 2025 · arXiv 2505.22642

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · baseline: 1

citation-polarity summary

background: 2 · baseline: 1

representative citing papers

ACPO: Agent-Chained Policy Optimization for Multi-Agent Reinforcement Learning

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

ACPO decomposes the joint policy gradient into per-agent terms allowing independent actor training that collectively forms a joint gradient step in CTDE-based MARL.

AnyBody: Free-Form Whole-Body Humanoid Control from Arbitrary Keypoint Guidance

cs.RO · 2026-06-28 · unverdicted · novelty 6.0

AnyBody distills a privileged teacher tracker into a latent unit-sphere representation and uses a masked transformer to drive humanoid control from arbitrary keypoint subsets.

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

cs.RO · 2026-05-28 · unverdicted · novelty 6.0

UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.

Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

cs.CE · 2026-04-07 · unverdicted · novelty 6.0

Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

cs.LG · 2026-04-06 · unverdicted · novelty 6.0 · 2 refs

FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

cs.LG · 2026-03-13 · unverdicted · novelty 6.0

FastDSAC enables state-of-the-art maximum entropy RL for high-dimensional humanoid control via entropy redistribution per dimension and improved continuous value estimation.

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation

cs.RO · 2025-06-19 · unverdicted · novelty 6.0

ViTacFormer learns a cross-modal visuo-tactile latent space with autoregressive tactile prediction and an easy-to-hard curriculum, then uses the representation for imitation learning that yields ~50% higher success and the first reported 11-stage, 2.5-minute autonomous dexterous tasks.

When Does Non-Uniform Replay Matter in Reinforcement Learning?

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 3 refs

Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.

Dyna-Style Safety Augmented Reinforcement Learning: Staying Safe in the Face of Uncertainty

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

Dyna-SAuR learns scalable safety filters and policies from an uncertainty-aware model, cutting failures by two orders of magnitude on CartPole and MuJoCo Walker tasks.

Relative Entropy Pathwise Policy Optimization

cs.LG · 2025-07-15 · unverdicted · novelty 5.0

REPPO is an on-policy RL method that combines pathwise policy gradients with relative entropy constraints to achieve stable training and high sample efficiency without replay buffers.
