pith. the verified trust layer for science. sign in

arxiv: 1810.11748 · v1 · pith:HCPGADKWnew · submitted 2018-10-28 · 💻 cs.HC · cs.LG

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

classification 💻 cs.HC cs.LG
keywords agentfeedbackdqn-tamerhumanhuman-in-the-looprewardsactionsapplication
0
0 comments X p. Extension
Add this Pith Number to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{HCPGADKW}

Prints a linked pith:HCPGADKW badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

Exploration has been one of the greatest challenges in reinforcement learning (RL), which is a large obstacle in the application of RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly due to the difficulty of matching its actions with rewards in the distant future. A remedy for this is to train an agent with real-time feedback from a human observer who immediately gives rewards for some actions. This study tackles a series of challenges for introducing such a human-in-the-loop RL scheme. The first contribution of this work is our experiments with a precisely modeled human observer: binary, delay, stochasticity, unsustainability, and natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in Maze and Taxi simulated environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application where a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions

    eess.SY 2025-08 unverdicted novelty 2.0

    A literature review of safe RL using Lyapunov and barrier functions that identifies a shift to model-free methods since 2017, well-defined open problems per approach class, and high-dimensional scalability as the main...