{"paper":{"title":"AIDE: AI-Driven Exploration in the Space of Code","license":"http://creativecommons.org/licenses/by/4.0/","headline":"AIDE uses large language models to perform tree search in code space and reaches state-of-the-art results on Kaggle, OpenAI MLE-Bench, and METR RE-Bench.","cross_cats":["cs.LG"],"primary_cat":"cs.AI","authors_text":"Deniss Jacenko, Dhruv Srikanth, Dixing Xu, Dominik Schmidt, Ian Kaplan, Yuxiang WU, Zhengyao Jiang","submitted_at":"2025-02-18T18:57:21Z","abstract_excerpt":"Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet, behind advancements lies a complex and often tedious process requiring labor and compute intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI MLE-Bench and METRs RE-Bench.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the tree search guided by LLMs can reliably identify and improve upon promising code variants without the search space becoming intractable or the evaluations becoming unreliable.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"AIDE uses large language models to perform tree search in code space and reaches state-of-the-art results on Kaggle, OpenAI MLE-Bench, and METR RE-Bench.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"a7a736928e39dd1d21318f06b558abc94b39c80ce541c3a3920bfa620a4dd389"},"source":{"id":"2502.13138","kind":"arxiv","version":1},"verdict":{"id":"7ba5a638-9b0b-4040-9396-88e88438a4cd","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T18:16:23.603996Z","strongest_claim":"By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI MLE-Bench and METRs RE-Bench.","one_line_summary":"AIDE uses large language models to perform tree search in code space and reaches state-of-the-art results on Kaggle, OpenAI MLE-Bench, and METR RE-Bench.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the tree search guided by LLMs can reliably identify and improve upon promising code variants without the search space becoming intractable or the evaluations becoming unreliable.","pith_extraction_headline":""},"references":{"count":14,"sample":[{"doi":"10.1126/science.abq1158","year":2019,"title":"Li, Y., Choi, D.H., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., et al., 2022","work_id":"cc452f34-3d34-41ff-9206-8edad6625ce6","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","ref_index":2,"cited_arxiv_id":"2305.16291","is_internal_anchor":true},{"doi":"","year":null,"title":"Distributed Random Forest (DRF) and Extremely Randomized Trees (XRT)","work_id":"db2145b6-e1e8-4ce3-a44c-c900ebb7293f","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Generalized Linear Model (GLM) with regularization","work_id":"f095d396-55ea-4ee6-b3d3-89474fcae80e","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"H2O Gradient Boosting Machines","work_id":"89aee919-4405-4246-bae7-3854966a8126","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":14,"snapshot_sha256":"098c479e261841ca43a655028e78ffc90d602a830a9ea7264f463b467e7ac2fc","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"c0441559ff1bf371acd3626b67e14ca2eb2fc1ccc17cfc5cba518557fcd889d9"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}