{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:W6AUY5MMPVV5I44UCQMODSCQO4","short_pith_number":"pith:W6AUY5MM","schema_version":"1.0","canonical_sha256":"b7814c758c7d6bd473941418e1c850771ed65b4dc0fb596c301f813c7b3ea475","source":{"kind":"arxiv","id":"2510.24701","version":2},"attestation_state":"computed","paper":{"title":"Tongyi DeepResearch Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Tongyi DeepResearch, a sparsely activated 30.5-billion-parameter agentic model, achieves state-of-the-art performance on long-horizon deep research benchmarks.","cross_cats":["cs.AI","cs.IR","cs.LG","cs.MA"],"primary_cat":"cs.CL","authors_text":"Bo Zhang, Chenxi Wang, Dingchu Zhang, Donglei Yu, Fei Huang, Gang Fu, Guangyu Li, Guoxin Chen, Hailong Yin, Haiyang Shen, Huifeng Yin, Jialong Wu, Jiayin Yang, Jingren Zhou, Junkai Zhang, Jun Lin, Kuan Li, Kui Zeng, Liangcai Su, Litu Ou, Liwen Zhang, Li Yang, Maojia Song, Ming Yan, Minpeng Liao, Pengjun Xie, Peng Xia, Qian Xiao, Rui Min, Ruixue Ding, Rui Ye, Runnan Fang, Shaowei Chen, Shen Huang, Shihang Wang, Shihao Cai, Tongyi DeepResearch Team: Baixuan Li, Weizhou Shen, Wenbiao Yin, Xiaobin Wang, Xin Guan, Xinmiao Yu, Xinyu Geng, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Yingcheng Shi, Yong Jiang, Yuning Wu, Zhengwei Tao, Zhen Zhang, Zhongwang Zhang, Zhuo Chen, Zijian Li, Zile Qiao","submitted_at":"2025-10-28T17:53:02Z","abstract_excerpt":"We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized envir"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2510.24701","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2025-10-28T17:53:02Z","cross_cats_sorted":["cs.AI","cs.IR","cs.LG","cs.MA"],"title_canon_sha256":"fdaf53b70fa1410648c148bcc3a9a3008bdccf98a7fc4bb6c71c4aef6f3c0a63","abstract_canon_sha256":"f4e88d3fdfb0fb53f4253b20c5a1f0d8d4c9b9924ffb31617fd6730f9489c434"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:52.957370Z","signature_b64":"hZ1onrvBrNX1pT+HtWu4njx8aYDNojwshZNecUL5BWqlJL5lz5EKhyXMYOhhbhn0m8m24IBvks3JPtTvLd3bCQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"b7814c758c7d6bd473941418e1c850771ed65b4dc0fb596c301f813c7b3ea475","last_reissued_at":"2026-05-17T23:38:52.956713Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:52.956713Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Tongyi DeepResearch Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Tongyi DeepResearch, a sparsely activated 30.5-billion-parameter agentic model, achieves state-of-the-art performance on long-horizon deep research benchmarks.","cross_cats":["cs.AI","cs.IR","cs.LG","cs.MA"],"primary_cat":"cs.CL","authors_text":"Bo Zhang, Chenxi Wang, Dingchu Zhang, Donglei Yu, Fei Huang, Gang Fu, Guangyu Li, Guoxin Chen, Hailong Yin, Haiyang Shen, Huifeng Yin, Jialong Wu, Jiayin Yang, Jingren Zhou, Junkai Zhang, Jun Lin, Kuan Li, Kui Zeng, Liangcai Su, Litu Ou, Liwen Zhang, Li Yang, Maojia Song, Ming Yan, Minpeng Liao, Pengjun Xie, Peng Xia, Qian Xiao, Rui Min, Ruixue Ding, Rui Ye, Runnan Fang, Shaowei Chen, Shen Huang, Shihang Wang, Shihao Cai, Tongyi DeepResearch Team: Baixuan Li, Weizhou Shen, Wenbiao Yin, Xiaobin Wang, Xin Guan, Xinmiao Yu, Xinyu Geng, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Yingcheng Shi, Yong Jiang, Yuning Wu, Zhengwei Tao, Zhen Zhang, Zhongwang Zhang, Zhuo Chen, Zijian Li, Zile Qiao","submitted_at":"2025-10-28T17:53:02Z","abstract_excerpt":"We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized envir"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the fully automatic data synthesis pipeline produces training data of sufficient quality and diversity to enable genuine long-horizon research agency without human annotation or verification.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Tongyi DeepResearch is a new agentic LLM that reaches state-of-the-art results on deep research benchmarks including Humanity's Last Exam and BrowseComp through fully automatic data synthesis and specialized training environments.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Tongyi DeepResearch, a sparsely activated 30.5-billion-parameter agentic model, achieves state-of-the-art performance on long-horizon deep research benchmarks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"950ce73e1442f9615485c8d15db812a83a6008dd575c9570ef4b123fcc9a1507"},"source":{"id":"2510.24701","kind":"arxiv","version":2},"verdict":{"id":"df79380e-9bc2-47b5-859f-c8c210d76962","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T08:52:19.962797Z","strongest_claim":"Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510.","one_line_summary":"Tongyi DeepResearch is a new agentic LLM that reaches state-of-the-art results on deep research benchmarks including Humanity's Last Exam and BrowseComp through fully automatic data synthesis and specialized training environments.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the fully automatic data synthesis pipeline produces training data of sufficient quality and diversity to enable genuine long-horizon research agency without human annotation or verification.","pith_extraction_headline":"Tongyi DeepResearch, a sparsely activated 30.5-billion-parameter agentic model, achieves state-of-the-art performance on long-horizon deep research benchmarks."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":3,"snapshot_sha256":"6a38d56333850261e55b28b7d7686e4e80fa935a96a0e9b21c176af365286f6c"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2510.24701","created_at":"2026-05-17T23:38:52.956837+00:00"},{"alias_kind":"arxiv_version","alias_value":"2510.24701v2","created_at":"2026-05-17T23:38:52.956837+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2510.24701","created_at":"2026-05-17T23:38:52.956837+00:00"},{"alias_kind":"pith_short_12","alias_value":"W6AUY5MMPVV5","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"W6AUY5MMPVV5I44U","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"W6AUY5MM","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":32,"internal_anchor_count":32,"sample":[{"citing_arxiv_id":"2605.22138","citing_title":"Efficient Agentic Reasoning Through Self-Regulated Simulative Planning","ref_index":97,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16217","citing_title":"Argus: Evidence Assembly for Scalable Deep Research Agents","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07022","citing_title":"Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale","ref_index":65,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14133","citing_title":"ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents","ref_index":82,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16217","citing_title":"Argus: Evidence Assembly for Scalable Deep Research Agents","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19196","citing_title":"Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?","ref_index":50,"is_internal_anchor":true},{"citing_arxiv_id":"2511.11793","citing_title":"MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2602.09514","citing_title":"EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2602.12705","citing_title":"MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2603.04751","citing_title":"Evaluating the Search Agent in a Parallel World","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2603.15262","citing_title":"Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11611","citing_title":"CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14133","citing_title":"ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents","ref_index":82,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12518","citing_title":"TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13034","citing_title":"ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13534","citing_title":"Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04949","citing_title":"Learning to Retrieve from Agent Trajectories","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12004","citing_title":"Learning Agentic Policy from Action Guidance","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12481","citing_title":"ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11611","citing_title":"CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10832","citing_title":"Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07202","citing_title":"Towards Autonomous Business Intelligence via Data-to-Insight Discovery Agent","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08283","citing_title":"HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2604.25256","citing_title":"AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05191","citing_title":"LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents","ref_index":10,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4","json":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4.json","graph_json":"https://pith.science/api/pith-number/W6AUY5MMPVV5I44UCQMODSCQO4/graph.json","events_json":"https://pith.science/api/pith-number/W6AUY5MMPVV5I44UCQMODSCQO4/events.json","paper":"https://pith.science/paper/W6AUY5MM"},"agent_actions":{"view_html":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4","download_json":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4.json","view_paper":"https://pith.science/paper/W6AUY5MM","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2510.24701&json=true","fetch_graph":"https://pith.science/api/pith-number/W6AUY5MMPVV5I44UCQMODSCQO4/graph.json","fetch_events":"https://pith.science/api/pith-number/W6AUY5MMPVV5I44UCQMODSCQO4/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4/action/timestamp_anchor","attest_storage":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4/action/storage_attestation","attest_author":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4/action/author_attestation","sign_citation":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4/action/citation_signature","submit_replication":"https://pith.science/pith/W6AUY5MMPVV5I44UCQMODSCQO4/action/replication_record"}},"created_at":"2026-05-17T23:38:52.956837+00:00","updated_at":"2026-05-17T23:38:52.956837+00:00"}