FastTD3: Simple, fast, and capable reinforcement learning for humanoid control

2025 · arXiv 2505.22642

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · baseline: 1

citation-polarity summary

background: 2 · baseline: 1

representative citing papers

ACPO: Agent-Chained Policy Optimization for Multi-Agent Reinforcement Learning

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

ACPO decomposes the joint policy gradient into per-agent terms allowing independent actor training that collectively forms a joint gradient step in CTDE-based MARL.

AnyBody: Free-Form Whole-Body Humanoid Control from Arbitrary Keypoint Guidance

cs.RO · 2026-06-28 · unverdicted · novelty 6.0

AnyBody distills a privileged teacher tracker into a latent unit-sphere representation and uses a masked transformer to drive humanoid control from arbitrary keypoint subsets.

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

cs.RO · 2026-05-28 · unverdicted · novelty 6.0

UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.

Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

cs.CE · 2026-04-07 · unverdicted · novelty 6.0

Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

cs.LG · 2026-04-06 · unverdicted · novelty 6.0 · 2 refs

FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

cs.LG · 2026-03-13 · unverdicted · novelty 6.0

FastDSAC enables state-of-the-art maximum entropy RL for high-dimensional humanoid control via entropy redistribution per dimension and improved continuous value estimation.

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation

cs.RO · 2025-06-19 · unverdicted · novelty 6.0

ViTacFormer learns a cross-modal visuo-tactile latent space with autoregressive tactile prediction and an easy-to-hard curriculum, then uses the representation for imitation learning that yields ~50% higher success and the first reported 11-stage, 2.5-minute autonomous dexterous tasks.

When Does Non-Uniform Replay Matter in Reinforcement Learning?

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 3 refs

Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.

Dyna-Style Safety Augmented Reinforcement Learning: Staying Safe in the Face of Uncertainty

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

Dyna-SAuR learns scalable safety filters and policies from an uncertainty-aware model, cutting failures by two orders of magnitude on CartPole and MuJoCo Walker tasks.

Relative Entropy Pathwise Policy Optimization

cs.LG · 2025-07-15 · unverdicted · novelty 5.0

REPPO is an on-policy RL method that combines pathwise policy gradients with relative entropy constraints to achieve stable training and high sample efficiency without replay buffers.

FastDSAC: Enhancing Policy Plasticity via Constrained Exploration for Scalable Humanoid Locomotion

cs.RO · 2026-06-30 · unverdicted · novelty 4.0

FastDSAC adds a truncated Gaussian policy constraint to distributional actor-critic methods to preserve network plasticity and accelerate training for scalable humanoid locomotion in parallel sampling setups.

citing papers explorer


ACPO: Agent-Chained Policy Optimization for Multi-Agent Reinforcement Learning
cs.AI · 2026-06-29 · unverdicted · none · ref 100

AnyBody: Free-Form Whole-Body Humanoid Control from Arbitrary Keypoint Guidance
cs.RO · 2026-06-28 · unverdicted · none · ref 2

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms
cs.RO · 2026-05-28 · unverdicted · none · ref 30

Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs
cs.CE · 2026-04-07 · unverdicted · none · ref 39

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
cs.LG · 2026-04-06 · unverdicted · none · ref 75 · 2 links

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors
cs.RO · 2026-03-16 · conditional · none · ref 27

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
cs.LG · 2026-03-13 · unverdicted · none · ref 5

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
cs.RO · 2025-06-19 · unverdicted · none · ref 25

When Does Non-Uniform Replay Matter in Reinforcement Learning?
cs.LG · 2026-05-11 · unverdicted · none · ref 32 · 3 links

Dyna-Style Safety Augmented Reinforcement Learning: Staying Safe in the Face of Uncertainty
cs.LG · 2026-04-28 · unverdicted · none · ref 5

Relative Entropy Pathwise Policy Optimization
cs.LG · 2025-07-15 · unverdicted · none · ref 11

FastDSAC: Enhancing Policy Plasticity via Constrained Exploration for Scalable Humanoid Locomotion
cs.RO · 2026-06-30 · unverdicted · none · ref 21
