pith. machine review for the scientific record.

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine

7 Pith papers cite this work. Polarity classification is still indexing.

years

2026: 6 · 2023: 1

roles

method: 2

polarities

use method: 2

representative citing papers

Bounded Ratio Reinforcement Learning

cs.LG · 2026-04-20 · conditional · novelty 7.0

BRRL derives an analytic optimal policy for regularized, constrained RL that guarantees monotonic improvement; the resulting BPO algorithm matches or exceeds PPO.

ANO: A Principled Approach to Robust Policy Optimization

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

ANO derives a robust policy optimizer from geometric principles that replaces clipping with a smooth redescending gradient, showing better performance and stability than PPO, SPO, and GRPO in MuJoCo, Atari, and RLHF experiments.
