{"paper":{"title":"OpenThoughts-Agent: Data Recipes for Agentic Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Alexander Glenn Shaw, Alex Dimakis, Anurag Kashyap, Artem Gazizov, Ashima Suvarna, Atula Tejaswi, Benjamin Feuer, Boxuan Li, Charlie F. Ruan, Chinmay Hegde, Daanish Khazi, E. Kelly Buchanan, Emmanouil Koukoumidis, Erica Zhang, Etash Guha, Ethan Shen, Hange Liu, Hanwen Xing, Harsh Raj, Hritik Bansal, Jenia Jitsev, Ke Sun, Leon Liangyu Chen, Lin Shi, Ludwig Schmidt, Marianna Nezhurina, Michael Siu, Minh Pham, Negin Raoof, Nicholas Roberts, Nishad Singhi, Patrick Yubeaton, Reinhard Heckel, Richard Zhuang, Robert Zhang, Ryan Marten, Saadia Gabriel, Sankalp Jajee, Shlok Natarajan, Siyan Zhao, Steven Dillmann, Sujay Sanghavi, Tyler Griggs, Wanjia Zhao, Xiangyi Li, Xiaokun Chen, Xunyi Jiang, Yein Park, Yixin Wang, Zhiwei Xu","submitted_at":"2026-06-23T17:34:29Z","abstract_excerpt":"Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline for training agentic models. We conduct more than 100 controlled ablation experiments to systematically investigate each stage of the pipel"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2606.24855","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2606.24855/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}