{"work":{"id":"19e844ce-da43-4244-b2fd-0cef95384b68","openalex_id":null,"doi":null,"arxiv_id":"1511.06295","raw_key":null,"title":"Policy Distillation","authors":null,"authors_text":"Andrei A Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, and Raia Hadsell","year":2015,"venue":"cs.LG","abstract":"Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. 
We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.","external_url":"https://arxiv.org/abs/1511.06295","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T06:10:23.790482+00:00","pith_arxiv_id":"1511.06295","created_at":"2026-05-09T05:55:30.228803+00:00","updated_at":"2026-05-25T06:10:23.790482+00:00","title_quality_ok":false,"display_title":"Policy distillation","render_title":"Policy distillation"},"hub":{"state":{"work_id":"19e844ce-da43-4244-b2fd-0cef95384b68","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":18,"external_cited_by_count":null,"distinct_field_count":9,"first_pith_cited_at":"2016-06-15T08:20:51+00:00","last_pith_cited_at":"2026-05-20T19:56:30+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-02T01:44:04.758776+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":4},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":4},{"context_polarity":"use_method","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}