Policy distillation

Canonical reference. 80% of citing Pith papers cite this work as background.

18 Pith papers cite this work.

abstract

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
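The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to distill a Q-value teacher into a smaller student: soften the teacher's Q-values with a temperature softmax and minimize the KL divergence to the student's action distribution. The function names (`softmax`, `distillation_kl`), the temperature value, and the use of plain Python lists are illustrative assumptions, not details taken from this page.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_q, student_logits, temperature=0.01):
    """KL(teacher || student) for a single state.

    teacher_q:      Q-values from the (hypothetical) teacher network.
    student_logits: unnormalized action scores from the student network.
    temperature:    assumed value; a small temperature sharpens the teacher
                    distribution toward its greedy action.
    """
    p = softmax(teacher_q, temperature)   # softened teacher policy
    q = softmax(student_logits, 1.0)      # student policy
    # Terms with p_i == 0 contribute nothing to the KL sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# A student that prefers the teacher's best action incurs a lower loss.
teacher_q = [1.0, 2.0, 0.5]
loss_agree = distillation_kl(teacher_q, [0.0, 5.0, 0.0])
loss_disagree = distillation_kl(teacher_q, [5.0, 0.0, 0.0])
```

In a multi-task setting, the same loss could be averaged over states drawn from each task-specific teacher in turn, which is one plausible reading of how several policies are consolidated into a single network.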

citation-role summary

background: 4 · method: 1

representative citing papers

Policy-DRIFT: Dynamic Reward-Informed Flow Trajectory Steering

physics.flu-dyn · 2026-05-13 · unverdicted · novelty 6.0

Policy-DRIFT combines conditional flow matching with terminal reward guidance and decoupled DRL to achieve 49% drag reduction in Re_tau=180 channel flow, 16% above DRL benchmarks and with 37 times less actuation energy.

Continual Domain Randomization

cs.RO · 2024-03-18 · unverdicted · novelty 6.0

Continual Domain Randomization trains RL policies sequentially on randomization parameter subsets with continual learning to achieve robust sim-to-real transfer in robotic reaching and grasping.

Attentive Multi-Task Deep Reinforcement Learning

cs.LG · 2019-07-05 · unverdicted · novelty 6.0

Attention mechanism dynamically groups task knowledge at state granularity in multi-task DRL to enable positive transfer and avoid negative transfer, matching or exceeding prior methods with fewer parameters.

Precise Aggressive Aerial Maneuvers with Sensorimotor Policies

cs.RO · 2026-04-07 · unverdicted · novelty 6.0

Reinforcement learning sensorimotor policies enable quadrotors to traverse narrow gaps at extreme tilts with 5 cm clearance using only vision and proprioception, including reactive traversal of moving gaps.

MiniLLM: On-Policy Distillation of Large Language Models

cs.CL · 2023-06-14 · conditional · novelty 6.0

MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.

Progressive Neural Networks

cs.LG · 2016-06-15 · unverdicted · novelty 6.0

Progressive neural networks learn sequences of RL tasks without catastrophic forgetting by freezing prior columns and adding lateral connections for knowledge transfer.

VISD: Enhancing Video Reasoning via Structured Self-Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 5.0 · 4 refs

VISD proposes structured self-distillation with a multi-dimensional judge model and direction-magnitude decoupling to improve token-level credit assignment and convergence speed in VideoLLM reasoning training.

Combining Trained Models in Reinforcement Learning

cs.LG · 2026-05-04 · accept · novelty 5.0

A review of 15 studies finds positive transfer in DRL mainly when source and target tasks share structure or include alignment mechanisms, but compute-matched comparisons against from-scratch baselines remain rare.
