{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:Y724IUS3R43ZC5N6FQA2E34LPF","short_pith_number":"pith:Y724IUS3","schema_version":"1.0","canonical_sha256":"c7f5c4525b8f379175be2c01a26f8b797279fcbc3c81d5a12b78a0690de43b95","source":{"kind":"arxiv","id":"2504.01296","version":1},"attestation_state":"computed","paper":{"title":"ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Reinforcement learning with token limits can cut LLM chain-of-thought length in half while dropping accuracy by only two percent on math benchmarks.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Bairu Hou, Jacob Andreas, Jiabao Ji, Kaizhi Qian, Shiyu Chang, Yang Zhang, Yujian Liu","submitted_at":"2025-04-02T01:59:26Z","abstract_excerpt":"We present ThinkPrune, a simple yet effective method for pruning the thinking length for long-thinking LLMs, which has been found to often produce inefficient and redundant thinking processes. Existing preliminary explorations of reducing thinking length primarily focus on forcing the thinking process to early exit, rather than adapting the LLM to optimize and consolidate the thinking process, and therefore the length-performance tradeoff observed so far is sub-optimal. To fill this gap, ThinkPrune offers a simple solution that continuously trains the long-thinking LLMs via reinforcement learn"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2504.01296","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2025-04-02T01:59:26Z","cross_cats_sorted":[],"title_canon_sha256":"8ac750f31e605e586bf46d52ef3b42890890427fa911befb0f7ce9074a4bcba2","abstract_canon_sha256":"ac003bb650b4416d87af90a5efc31ec072ca7afd6947d4adfcc624afa51623a6"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:14.733747Z","signature_b64":"vG5+DsBNtixFjkmuZGbrdOp6/XGveNd/3TNoECxW6bwuXnZVMwpsJrRJ3FTQINtt23C3u/qG7Q+rPjy22CvkCA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"c7f5c4525b8f379175be2c01a26f8b797279fcbc3c81d5a12b78a0690de43b95","last_reissued_at":"2026-05-17T23:38:14.733060Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:14.733060Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Reinforcement learning with token limits can cut LLM chain-of-thought length in half while dropping accuracy by only two percent on math benchmarks.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Bairu Hou, Jacob Andreas, Jiabao Ji, Kaizhi Qian, Shiyu Chang, Yang Zhang, Yujian Liu","submitted_at":"2025-04-02T01:59:26Z","abstract_excerpt":"We present ThinkPrune, a simple yet effective method for pruning the thinking length for long-thinking LLMs, which has been found to often produce inefficient and redundant thinking processes. Existing preliminary explorations of reducing thinking length primarily focus on forcing the thinking process to early exit, rather than adapting the LLM to optimize and consolidate the thinking process, and therefore the length-performance tradeoff observed so far is sub-optimal. To fill this gap, ThinkPrune offers a simple solution that continuously trains the long-thinking LLMs via reinforcement learn"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ThinkPrune results in a remarkable performance-length tradeoff -- on the AIME24 dataset, the reasoning length of DeepSeek-R1-Distill-Qwen-1.5B can be reduced by half with only 2% drop in performance.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the RL objective with the added token-limit penalty will converge to a policy that preserves core reasoning capability rather than learning superficial shortcuts that only work on the training distribution.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ThinkPrune halves reasoning length on AIME24 for DeepSeek-R1-Distill-Qwen-1.5B with only 2% performance drop by applying iterative RL under token limits.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Reinforcement learning with token limits can cut LLM chain-of-thought length in half while dropping accuracy by only two percent on math benchmarks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"6edb23556a7e21cd27f4951a96f64fd2f1f7206d8f38e0c7432a0821528d78cb"},"source":{"id":"2504.01296","kind":"arxiv","version":1},"verdict":{"id":"c13209b5-4e8f-4c51-aefe-76d27a1bd202","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T07:11:17.367727Z","strongest_claim":"ThinkPrune results in a remarkable performance-length tradeoff -- on the AIME24 dataset, the reasoning length of DeepSeek-R1-Distill-Qwen-1.5B can be reduced by half with only 2% drop in performance.","one_line_summary":"ThinkPrune halves reasoning length on AIME24 for DeepSeek-R1-Distill-Qwen-1.5B with only 2% performance drop by applying iterative RL under token limits.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the RL objective with the added token-limit penalty will converge to a policy that preserves core reasoning capability rather than learning superficial shortcuts that only work on the training distribution.","pith_extraction_headline":"Reinforcement learning with token limits can cut LLM chain-of-thought length in half while dropping accuracy by only two percent on math benchmarks."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"4756eb27130b8a59e9b62e464b120b3e5aa2c678d970c22064d2f1deb476d005"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2504.01296","created_at":"2026-05-17T23:38:14.733164+00:00"},{"alias_kind":"arxiv_version","alias_value":"2504.01296v1","created_at":"2026-05-17T23:38:14.733164+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2504.01296","created_at":"2026-05-17T23:38:14.733164+00:00"},{"alias_kind":"pith_short_12","alias_value":"Y724IUS3R43Z","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"Y724IUS3R43ZC5N6","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"Y724IUS3","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2504.10368","citing_title":"Exploring the System 1 Thinking Capability of Large Reasoning Models","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2505.13975","citing_title":"DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22211","citing_title":"CLORE: Content-Level Optimization for Reasoning Efficiency","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2508.10164","citing_title":"Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2509.02547","citing_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","ref_index":209,"is_internal_anchor":true},{"citing_arxiv_id":"2510.08483","citing_title":"DeepPrune: Parallel Scaling without Inter-trace Redundancy","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2510.19669","citing_title":"DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2512.01925","citing_title":"Rectifying LLM Thought from Lens of Optimization","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2512.19995","citing_title":"Schoenfeld's Anatomy of Mathematical Reasoning by Language Models","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2601.11340","citing_title":"Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2602.09953","citing_title":"ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2603.05433","citing_title":"CRISP: Compressed Reasoning via Iterative Self-Policy Distillation","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2603.08659","citing_title":"CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10195","citing_title":"Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13165","citing_title":"STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2503.16419","citing_title":"Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2503.09567","citing_title":"Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models","ref_index":268,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10195","citing_title":"Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09806","citing_title":"LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24881","citing_title":"Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24003","citing_title":"Stabilizing Efficient Reasoning with Step-Level Advantage Selection","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06165","citing_title":"Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost","ref_index":214,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01111","citing_title":"When Less is Enough: Efficient Inference via Collaborative Reasoning","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09852","citing_title":"MEMENTO: Teaching LLMs to Manage Their Own Context","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17312","citing_title":"A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions","ref_index":5,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF","json":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF.json","graph_json":"https://pith.science/api/pith-number/Y724IUS3R43ZC5N6FQA2E34LPF/graph.json","events_json":"https://pith.science/api/pith-number/Y724IUS3R43ZC5N6FQA2E34LPF/events.json","paper":"https://pith.science/paper/Y724IUS3"},"agent_actions":{"view_html":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF","download_json":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF.json","view_paper":"https://pith.science/paper/Y724IUS3","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2504.01296&json=true","fetch_graph":"https://pith.science/api/pith-number/Y724IUS3R43ZC5N6FQA2E34LPF/graph.json","fetch_events":"https://pith.science/api/pith-number/Y724IUS3R43ZC5N6FQA2E34LPF/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF/action/timestamp_anchor","attest_storage":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF/action/storage_attestation","attest_author":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF/action/author_attestation","sign_citation":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF/action/citation_signature","submit_replication":"https://pith.science/pith/Y724IUS3R43ZC5N6FQA2E34LPF/action/replication_record"}},"created_at":"2026-05-17T23:38:14.733164+00:00","updated_at":"2026-05-17T23:38:14.733164+00:00"}