Title resolution pending

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine · 2018

4 Pith papers cite this work. Polarity classification is still being indexed.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

representative citing papers

Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights

cs.LG · 2026-05-18 · conditional · novelty 6.0

A maximum entropy reinforcement learning framework generates realistic customer trajectories in retail spaces that match real data better than TSP or PNN heuristics and support more accurate layout optimization decisions.

CAPSULE: Control-Theoretic Action Perturbations for Safe Uncertainty-Aware Reinforcement Learning

cs.LG · 2026-04-26 · unverdicted · novelty 5.0

CAPSULE learns probabilistic control-affine dynamics offline to construct uncertainty-incorporating control barrier functions that enforce conservative safety constraints via online action correction in reinforcement learning.

Distributional Value Estimation Without Target Networks for Robust Quality-Diversity

cs.LG · 2026-04-22 · unverdicted · novelty 5.0

QDHUAC is a distributional, target-free QD-RL method that enables stable training at a high update-to-data (UTD) ratio and competitive performance on Brax locomotion tasks using far fewer environment steps than prior approaches.

RAMP: Hybrid DRL for Online Learning of Numeric Action Models

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

RAMP learns numeric action models online via a DRL-planning feedback loop and outperforms PPO on numeric domains from the International Planning Competition (IPC) in both solvability and plan quality.

citing papers explorer

Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights
cs.LG · 2026-05-18 · conditional · none · ref 23
CAPSULE: Control-Theoretic Action Perturbations for Safe Uncertainty-Aware Reinforcement Learning
cs.LG · 2026-04-26 · unverdicted · none · ref 6
Distributional Value Estimation Without Target Networks for Robust Quality-Diversity
cs.LG · 2026-04-22 · unverdicted · none · ref 19
RAMP: Hybrid DRL for Online Learning of Numeric Action Models
cs.AI · 2026-04-09 · unverdicted · none · ref 11