{"work":{"id":"64019d00-0b11-4bbd-b173-b46c8fad0157","openalex_id":null,"doi":null,"arxiv_id":"2503.14476","raw_key":null,"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","authors":null,"authors_text":"Qiying Yu et al","year":2025,"venue":"cs.LG","abstract":"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the $\\textbf{D}$ecoupled Clip and $\\textbf{D}$ynamic s$\\textbf{A}$mpling $\\textbf{P}$olicy $\\textbf{O}$ptimization ($\\textbf{DAPO}$) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using Qwen2.5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-source system enhance reproducibility and support future research in large-scale LLM RL.","external_url":"https://arxiv.org/abs/2503.14476","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T07:05:26.253801+00:00","pith_arxiv_id":"2503.14476","created_at":"2026-05-08T17:23:40.809434+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","render_title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale"},"hub":{"state":{"work_id":"64019d00-0b11-4bbd-b173-b46c8fad0157","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":414,"external_cited_by_count":null,"distinct_field_count":16,"first_pith_cited_at":"2024-05-20T01:04:40+00:00","last_pith_cited_at":"2026-05-22T16:29:51+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-12T04:59:08.457161+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":59},{"context_role":"method","n":26},{"context_role":"baseline","n":11},{"context_role":"dataset","n":10},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":59},{"context_polarity":"use_method","n":24},{"context_polarity":"baseline","n":11},{"context_polarity":"use_dataset","n":10},{"context_polarity":"unclear","n":3}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","claims":[{"claim_text":"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the $\\textbf{D}$ecoupled Clip and $\\textbf{D}$ynamic s$\\textbf{A}$mpling $\\textbf{P}$olicy $\\textbf{O}$ptimization ($\\textbf{DAPO}$) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks DAPO: An Open-Source LLM Reinforcement Learning System at Scale because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-13T19:33:30.297931+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"b8758931-2570-4a5c-afe9-757bfcf9d8dc","orcid":null,"display_name":"Qiying Yu et al"}]},"error":null,"updated_at":"2026-05-13T19:33:30.295386+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-13T19:33:29.971225+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":176},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":118},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":114},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":103},{"title":"Group Sequence Policy Optimization","work_id":"3a98b53b-9f52-4d95-adf7-89353c0a9a65","shared_citers":60},{"title":"Understanding R1-Zero-Like Training: A Critical Perspective","work_id":"ec354f3b-9484-4a0c-94c8-92d4d0260835","shared_citers":49},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":47},{"title":"Measuring Mathematical Problem Solving With the MATH Dataset","work_id":"50652ac6-fb7c-4675-a2c2-159c241feb17","shared_citers":38},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":36},{"title":"HybridFlow: A Flexible and Efficient RLHF Framework","work_id":"7eb9c9f4-b322-4bba-8011-09ff8d6ad801","shared_citers":35},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":33},{"title":"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?","work_id":"d854765a-e664-41c0-8655-21c4bf2e0cc4","shared_citers":31},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":29},{"title":"Kimi k1.5: Scaling Reinforcement Learning with LLMs","work_id":"bff96ab1-bd6a-4585-be23-74fdb51969c7","shared_citers":29},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":29},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":28},{"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","shared_citers":27},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":25},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":23},{"title":"Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model","work_id":"763e0e44-40dd-4bdd-8414-21f8f9ce6d10","shared_citers":23},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":22},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":22},{"title":"Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement","work_id":"a097c5d4-6d32-46ee-9826-57d532bbfc9c","shared_citers":20},{"title":"The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models","work_id":"d4b4aee4-d20f-4572-886a-4ba9ea6c9b81","shared_citers":19}],"time_series":[{"n":9,"year":2025},{"n":213,"year":2026}]},"error":null,"updated_at":"2026-05-13T19:33:30.121225+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"fixed":1,"items":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-13T19:33:29.312053+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","claims":[{"claim_text":"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the $\\textbf{D}$ecoupled Clip and $\\textbf{D}$ynamic s$\\textbf{A}$mpling $\\textbf{P}$olicy $\\textbf{O}$ptimization ($\\textbf{DAPO}$) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks DAPO: An Open-Source LLM Reinforcement Learning System at Scale because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-13T19:33:29.976415+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","claims":[{"claim_text":"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the $\\textbf{D}$ecoupled Clip and $\\textbf{D}$ynamic s$\\textbf{A}$mpling $\\textbf{P}$olicy $\\textbf{O}$ptimization ($\\textbf{DAPO}$) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks DAPO: An Open-Source LLM Reinforcement Learning System at Scale because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-13T19:33:29.974461+00:00"}},"summary":{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","claims":[{"claim_text":"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the $\\textbf{D}$ecoupled Clip and $\\textbf{D}$ynamic s$\\textbf{A}$mpling $\\textbf{P}$olicy $\\textbf{O}$ptimization ($\\textbf{DAPO}$) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks DAPO: An Open-Source LLM Reinforcement Learning System at Scale because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":176},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":118},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":114},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":103},{"title":"Group Sequence Policy Optimization","work_id":"3a98b53b-9f52-4d95-adf7-89353c0a9a65","shared_citers":60},{"title":"Understanding R1-Zero-Like Training: A Critical Perspective","work_id":"ec354f3b-9484-4a0c-94c8-92d4d0260835","shared_citers":49},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":47},{"title":"Measuring Mathematical Problem Solving With the MATH Dataset","work_id":"50652ac6-fb7c-4675-a2c2-159c241feb17","shared_citers":38},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":36},{"title":"HybridFlow: A Flexible and Efficient RLHF Framework","work_id":"7eb9c9f4-b322-4bba-8011-09ff8d6ad801","shared_citers":35},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":33},{"title":"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?","work_id":"d854765a-e664-41c0-8655-21c4bf2e0cc4","shared_citers":31},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":29},{"title":"Kimi k1.5: Scaling Reinforcement Learning with LLMs","work_id":"bff96ab1-bd6a-4585-be23-74fdb51969c7","shared_citers":29},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":29},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":28},{"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","shared_citers":27},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":25},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":23},{"title":"Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model","work_id":"763e0e44-40dd-4bdd-8414-21f8f9ce6d10","shared_citers":23},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":22},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":22},{"title":"Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement","work_id":"a097c5d4-6d32-46ee-9826-57d532bbfc9c","shared_citers":20},{"title":"The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models","work_id":"d4b4aee4-d20f-4572-886a-4ba9ea6c9b81","shared_citers":19}],"time_series":[{"n":9,"year":2025},{"n":213,"year":2026}]},"authors":[{"id":"b8758931-2570-4a5c-afe9-757bfcf9d8dc","orcid":null,"display_name":"Qiying Yu et al","source":"manual","import_confidence":0.72}]}}