Implicit Quantile Networks for Distributional Reinforcement Learning

2018 · cs.LG · arXiv:1806.06923

4 Pith papers cite this work. Polarity classification is still indexing.


abstract

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
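
The quantile-regression idea in the abstract can be illustrated with a small toy. This is a sketch under stated assumptions, not the paper's network or training procedure: the "return distribution" is a fixed batch of Gaussian samples, each quantile is a single scalar fitted by subgradient descent on the plain pinball loss, and the helper names (`pinball_loss`, `fit_quantile`, `distorted_expectation`) are hypothetical. The last helper shows one way a risk-sensitive value could be read off a quantile function, by averaging only over quantile levels in (0, eta), which corresponds to a CVaR-style distortion.

```python
import numpy as np

def pinball_loss(theta, samples, tau):
    """Mean quantile-regression (pinball) loss of estimate theta at level tau."""
    u = samples - theta
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

def fit_quantile(samples, tau, lr=0.1, steps=2000):
    """Fit a single quantile by subgradient descent (hypothetical helper)."""
    theta = 0.0
    for _ in range(steps):
        # Subgradient of the mean pinball loss with respect to theta:
        # -tau for samples above theta, (1 - tau) for samples at or below it.
        grad = np.mean(np.where(samples > theta, -tau, 1.0 - tau))
        theta -= lr * grad
    return theta

def distorted_expectation(quantile_fn, eta, n=64):
    """Average the quantile function over levels in (0, eta).

    eta = 1 approximates the ordinary mean; eta < 1 keeps only the
    lower tail, giving a risk-averse (CVaR-style) value.
    """
    taus = (np.arange(n) + 0.5) / n * eta
    return float(np.mean(quantile_fn(taus)))

rng = np.random.default_rng(0)
# Stand-in for sampled returns of one state-action pair (assumption).
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)

taus = np.array([0.1, 0.5, 0.9])
fitted = np.array([fit_quantile(returns, t) for t in taus])
empirical = np.quantile(returns, taus)

# Here the empirical quantile function plays the role of a learned one.
quantile_fn = lambda t: np.quantile(returns, t)
risk_neutral = distorted_expectation(quantile_fn, eta=1.0)
risk_averse = distorted_expectation(quantile_fn, eta=0.25)

print("fitted   :", fitted)
print("empirical:", empirical)
print("risk-neutral value:", risk_neutral, "risk-averse value:", risk_averse)
```

In this toy the fitted values should land close to the empirical quantiles, and the risk-averse value should sit below the risk-neutral one because it ignores the upper part of the distribution. An agent comparing actions by the risk-averse value rather than the mean would behave more conservatively.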

representative citing papers

Mastering Atari with Discrete World Models

cs.LG · 2020-10-05 · accept · novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

What Does Flow Matching Bring To TD Learning?

cs.LG · 2026-03-04 · conditional · novelty 6.0

Flow matching critics outperform monolithic critics in RL, with 2x the performance and 5x the sample efficiency, via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training

cs.LG · 2025-12-03 · unverdicted · novelty 5.0

DVPO learns token-level value distributions and uses asymmetric risk regularization to contract lower tails while expanding upper tails, outperforming PPO and GRPO under noisy supervision in dialogue, math, and QA tasks.

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

cs.AI · 2019-07-09 · unverdicted · novelty 3.0

A neural network scheme approximates risk and policies for dynamic risk-sensitive MDPs using synthetic data based on mean-variance risk estimation.

