Hyperspherical normalization for scalable deep reinforcement learning

Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo · 2025 · arXiv 2502.15280

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

Intentional Updates for Streaming Reinforcement Learning

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Intentional TD and Intentional Policy Gradient select step sizes for fixed fractional TD error reduction and bounded policy KL divergence, yielding stable streaming deep RL performance on par with batch methods.

Extending Differential Temporal Difference Methods for Episodic Problems

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

A generalization of differential TD extends it to episodic settings while preserving policy ordering, inheriting linear TD guarantees, and improving sample efficiency.

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

cs.LG · 2026-04-06 · unverdicted · novelty 6.0 · 2 refs

FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.

What Does Flow Matching Bring To TD Learning?

cs.LG · 2026-03-04 · conditional · novelty 6.0

Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

cs.LG · 2025-10-02 · unverdicted · novelty 6.0

MINTO sets bootstrapped targets to the minimum of online and target network estimates, yielding faster stable value learning across online/offline RL and discrete/continuous actions.

When Does Non-Uniform Replay Matter in Reinforcement Learning?

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 3 refs

Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.

EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control

cs.LG · 2026-05-15 · unverdicted · novelty 4.0

EfficientTDMPC extends the TD-MPC family with model ensembles, return averaging, and uncertainty penalties to reach SOTA sample efficiency on hard continuous control benchmarks in low-data regimes.

citing papers explorer

Showing 7 of 7 citing papers.

Intentional Updates for Streaming Reinforcement Learning · cs.LG · 2026-04-21 · unverdicted · none · ref 3
Intentional TD and Intentional Policy Gradient select step sizes for fixed fractional TD error reduction and bounded policy KL divergence, yielding stable streaming deep RL performance on par with batch methods.
Extending Differential Temporal Difference Methods for Episodic Problems · cs.LG · 2026-05-06 · unverdicted · none · ref 3
A generalization of differential TD extends it to episodic settings while preserving policy ordering, inheriting linear TD guarantees, and improving sample efficiency.
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control · cs.LG · 2026-04-06 · unverdicted · none · ref 40 · 2 links
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
What Does Flow Matching Bring To TD Learning? · cs.LG · 2026-03-04 · conditional · none · ref 31
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning · cs.LG · 2025-10-02 · unverdicted · none · ref 3
MINTO sets bootstrapped targets to the minimum of online and target network estimates, yielding faster stable value learning across online/offline RL and discrete/continuous actions.
When Does Non-Uniform Replay Matter in Reinforcement Learning? · cs.LG · 2026-05-11 · unverdicted · none · ref 18 · 3 links
Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.
EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control · cs.LG · 2026-05-15 · unverdicted · none · ref 23
EfficientTDMPC extends the TD-MPC family with model ensembles, return averaging, and uncertainty penalties to reach SOTA sample efficiency on hard continuous control benchmarks in low-data regimes.
