Human preference scaling with demonstrations for deep reinforcement learning

2020 · arXiv 2007.12904

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach

cs.LG · 2024-11-01 · unverdicted · novelty 6.0

DIPPER uses bi-level optimization and DPO to train the higher-level policy from stationary preference comparisons and value regularization, claiming up to 40% gains on robotic navigation and manipulation tasks while introducing metrics for non-stationarity and infeasible subgoals.

Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

cs.AI · 2026-06-23 · unverdicted · novelty 3.0

Themis is an XAI-enabled framework for RL from human feedback that supports 200+ environments and includes a scalable cloud platform for collecting human preferences.

citing papers explorer

Showing 2 of 2 citing papers.

Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach

cs.LG · 2024-11-01 · unverdicted · none · ref 4
Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

cs.AI · 2026-06-23 · unverdicted · none · ref 20
