{"work":{"id":"f1762ea0-e382-4f38-a28c-adc643789859","openalex_id":null,"doi":null,"arxiv_id":"2407.16741","raw_key":null,"title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents","authors":null,"authors_text":"Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge","year":2024,"venue":"cs.SE","abstract":"Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.","external_url":"https://arxiv.org/abs/2407.16741","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T04:05:20.343405+00:00","pith_arxiv_id":"2407.16741","created_at":"2026-05-09T06:40:40.882508+00:00","updated_at":"2026-05-25T04:05:20.343405+00:00","title_quality_ok":true,"display_title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents","render_title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents"},"hub":{"state":{"work_id":"f1762ea0-e382-4f38-a28c-adc643789859","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":113,"external_cited_by_count":null,"distinct_field_count":13,"first_pith_cited_at":"2024-10-09T17:34:27+00:00","last_pith_cited_at":"2026-05-22T15:03:13+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-01T12:53:26.471607+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":33},{"context_role":"baseline","n":2},{"context_role":"method","n":2},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":30},{"context_polarity":"unclear","n":4},{"context_polarity":"baseline","n":2},{"context_polarity":"use_method","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents","claims":[{"claim_text":"Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"both safe and unsafe trajectories but focuses on safety-specific tasks and supplies only trajectory-level labels. We therefore constructAFTRAJ- 2K, a unified corpus of multi-agent trajectories collected, filtered, and annotated for online auditing. Figure 2(a) illustrates the construction pipeline. Trajectory Collection.We instantiate multi-agent systems on a suite of off-the-shelf frame- works [53, 17, 41] and run them on tasks spanning mathematical reasoning [16], code generation [29], and ope","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"AI2-THOR [40] Unity✗++✗ ✗ ✗ MineDojo [22] Minecraft✗+✓ ✗ ✗ ProcTHOR [16] Unity✓++✗ ✗ ✗ Habitat 3.0 [56] Habitat-Sim✗++✓ ✗ ✗ MetaUrban [81] PyBullet✓++✓ ✗ ✗ GRUtopia [73] Isaac Sim✓++✓ ✗ ✗ EmbodiedCity [26] UE4✗+++✗ ✗ ✗ UnrealZoo [102] UE4/5✗+++✓ ✗ ✗ Virtual Community [103] Genesis✓++✗ ✗ ✗ VirtualEnv [68] UE5✗+++✗ ✗ ✗ Holodeck [89] AI2-THOR/Unity✓++✗ ✗ ✗ SAGE [83] Isaac Sim✓+++✗ ✓ ✗ GenEnv [28] AlfWorld/Text-only✗+✗ ✗ ✓ SIMWORLDSTUDIOUE5✓+++✓ ✓ ✓ 4 Related Work Embodied simulation platforms.Embod","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"5: visual agentic intelligence. arXiv:2602.02276, 2026. [23] Haoyu Wang, Christopher M. Poskitt, and Jun Sun. Agentspec: Customizable runtime enforce- ment for safe and reliable LLM agents. InICSE, 2026. [24] Haoyu Wang, Christopher M. Poskitt, Jiali Wei, and Jun Sun. Probguard: Probabilistic runtime monitoring for llm agent safety. arXiv:2508.00500, 2025. [25] Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. O","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Code and workspace agent benchmarks.Tool and code benchmarks provide the closest prece- dent for the workspace-repair side of Claw-Eval-Live. API-Bank [20], ToolBench/ToolLLM [33], Gorilla [31], MINT [40], τ-bench [48], and MCP-Bench [41] focus on API or tool manipulation. HumanEval [3], MBPP [ 2], DS-1000 [ 18], CRUXEval [11], RepoBench [ 22], SWE-bench [ 15], OpenHands [39], and Terminal-Bench [25] move from function-level code generation toward reposi- tory or command-line execution. Within t","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Most existing automated repair systems, however, are built upon a different modeling assumption. Repository-level repair pipelines typically treat available validation signals as fixed during repair: candidate patches are generated first and subsequently filtered based on whether they satisfy these signals. Representative approaches, including Agentless [47], SpecRover [40], Moatless Tools [52], KGCompass [29], and DARS [ 1], largely follow this paradigm, using tests primarily as static acceptan","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Narasimhan, and Ofir Press. Swe-agent: Agent-computer interfaces enable automated software engineering, 2024. URLhttps://arxiv.org/abs/2405.15793. [37] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents, 2024. URL https://arxiv. org/abs/2403.02691. [38] Ziling Zhou. Governing dynamic capabilities: Cryptographic binding and reproducibility verification for ai agent tool use, 2026. URLhttps://a","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks OpenHands: An Open Platform for AI Software Developers as Generalist Agents because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (30 contexts).","role_counts":[{"n":30,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-19T18:11:51.671230+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"4aaf9ea2-a317-4cde-926b-c2ea3ad19a96","orcid":null,"display_name":"Xingyao Wang"},{"id":"1ae322de-b126-40d4-94de-b8b3d210d411","orcid":null,"display_name":"Boxuan Li"},{"id":"26adcd6c-d4c5-42b9-a37d-ab9fc0ac5bd7","orcid":null,"display_name":"Yufan Song"},{"id":"6cb2fd15-51ef-4235-b5bd-7bdf58e8dd06","orcid":null,"display_name":"Frank F. Xu"},{"id":"bc54dc04-88ea-4046-91ba-fc249dc76c7d","orcid":null,"display_name":"Xiangru Tang"},{"id":"f82dff1a-1631-4b05-9808-50e6ca97d03e","orcid":null,"display_name":"Mingchen Zhuge"}]},"error":null,"updated_at":"2026-05-19T18:11:40.843436+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T10:39:20.291681+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":20},{"title":"SWE-bench: Can Language Models Resolve Real-World GitHub Issues?","work_id":"d0effe15-a689-441a-8e3f-ea35f1c4e4b1","shared_citers":20},{"title":"SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering","work_id":"01826cd9-a652-403c-a2ec-531da9fe2b6a","shared_citers":18},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":14},{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":12},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":11},{"title":"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation","work_id":"92b7eb9c-c3d8-4518-a376-06fa15dd895b","shared_citers":10},{"title":"Agentless: Demystifying LLM-based Software Engineering Agents","work_id":"71c901c4-3c83-4e10-af54-3daef7fff397","shared_citers":9},{"title":"WebArena: A Realistic Web Environment for Building Autonomous Agents","work_id":"7058ffd2-a339-4102-89eb-248eeb074652","shared_citers":9},{"title":"SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?","work_id":"a561c78a-4b02-4053-a92a-bc5c7c5f6b9b","shared_citers":8},{"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","shared_citers":7},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":6},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":6},{"title":"Mle-bench: Evaluating machine learning agents on machine learning engineering","work_id":"a671e43f-ceab-49e7-adc3-473d802a97ca","shared_citers":6},{"title":"Qwen2.5-Coder Technical Report","work_id":"09ba463d-6377-4017-9801-444ffb94b056","shared_citers":6},{"title":"Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces","work_id":"0624be05-1d97-4fd6-8300-b04b8a3ab04b","shared_citers":6},{"title":"The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery","work_id":"56b6b58d-e73a-4317-896e-36ac5f84e957","shared_citers":6},{"title":"Trae agent: An LLM-based agent for software engineering with test-time scaling.arXiv preprint arXiv:2507.23370","work_id":"9bbaf3fb-3f46-415d-bc2c-ecf1cfdd0924","shared_citers":6},{"title":"AlphaEvolve: A coding agent for scientific and algorithmic discovery","work_id":"76a0f850-d490-4e4f-ab98-8d25df82cd23","shared_citers":5},{"title":"Code Llama: Open Foundation Models for Code","work_id":"e73bffa4-7620-47ac-9327-259a60db52ca","shared_citers":5},{"title":"DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines","work_id":"d490f594-f5fc-47b0-ae6a-6550e50fe095","shared_citers":5},{"title":"Gorilla: Large Language Model Connected with Massive APIs","work_id":"126a464a-4a73-495f-b669-de1e44aa8f09","shared_citers":5},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":5},{"title":"Kimi K2.5: Visual Agentic Intelligence","work_id":"d690be8f-5d53-49b0-b1e7-79668eb8fcdb","shared_citers":5}],"time_series":[{"n":1,"year":2024},{"n":3,"year":2025},{"n":62,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T10:39:24.466973+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T10:39:18.368803+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents","claims":[{"claim_text":"Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"both safe and unsafe trajectories but focuses on safety-specific tasks and supplies only trajectory-level labels. We therefore constructAFTRAJ- 2K, a unified corpus of multi-agent trajectories collected, filtered, and annotated for online auditing. Figure 2(a) illustrates the construction pipeline. Trajectory Collection.We instantiate multi-agent systems on a suite of off-the-shelf frame- works [53, 17, 41] and run them on tasks spanning mathematical reasoning [16], code generation [29], and ope","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"AI2-THOR [40] Unity✗++✗ ✗ ✗ MineDojo [22] Minecraft✗+✓ ✗ ✗ ProcTHOR [16] Unity✓++✗ ✗ ✗ Habitat 3.0 [56] Habitat-Sim✗++✓ ✗ ✗ MetaUrban [81] PyBullet✓++✓ ✗ ✗ GRUtopia [73] Isaac Sim✓++✓ ✗ ✗ EmbodiedCity [26] UE4✗+++✗ ✗ ✗ UnrealZoo [102] UE4/5✗+++✓ ✗ ✗ Virtual Community [103] Genesis✓++✗ ✗ ✗ VirtualEnv [68] UE5✗+++✗ ✗ ✗ Holodeck [89] AI2-THOR/Unity✓++✗ ✗ ✗ SAGE [83] Isaac Sim✓+++✗ ✓ ✗ GenEnv [28] AlfWorld/Text-only✗+✗ ✗ ✓ SIMWORLDSTUDIOUE5✓+++✓ ✓ ✓ 4 Related Work Embodied simulation platforms.Embod","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"5: visual agentic intelligence. arXiv:2602.02276, 2026. [23] Haoyu Wang, Christopher M. Poskitt, and Jun Sun. Agentspec: Customizable runtime enforce- ment for safe and reliable LLM agents. InICSE, 2026. [24] Haoyu Wang, Christopher M. Poskitt, Jiali Wei, and Jun Sun. Probguard: Probabilistic runtime monitoring for llm agent safety. arXiv:2508.00500, 2025. [25] Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. O","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Code and workspace agent benchmarks.Tool and code benchmarks provide the closest prece- dent for the workspace-repair side of Claw-Eval-Live. API-Bank [20], ToolBench/ToolLLM [33], Gorilla [31], MINT [40], τ-bench [48], and MCP-Bench [41] focus on API or tool manipulation. HumanEval [3], MBPP [ 2], DS-1000 [ 18], CRUXEval [11], RepoBench [ 22], SWE-bench [ 15], OpenHands [39], and Terminal-Bench [25] move from function-level code generation toward reposi- tory or command-line execution. Within t","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Most existing automated repair systems, however, are built upon a different modeling assumption. Repository-level repair pipelines typically treat available validation signals as fixed during repair: candidate patches are generated first and subsequently filtered based on whether they satisfy these signals. Representative approaches, including Agentless [47], SpecRover [40], Moatless Tools [52], KGCompass [29], and DARS [ 1], largely follow this paradigm, using tests primarily as static acceptan","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Narasimhan, and Ofir Press. Swe-agent: Agent-computer interfaces enable automated software engineering, 2024. URLhttps://arxiv.org/abs/2405.15793. [37] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents, 2024. URL https://arxiv. org/abs/2403.02691. [38] Ziling Zhou. Governing dynamic capabilities: Cryptographic binding and reproducibility verification for ai agent tool use, 2026. URLhttps://a","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks OpenHands: An Open Platform for AI Software Developers as Generalist Agents because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (30 contexts).","role_counts":[{"n":30,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-19T18:11:40.323908+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents","claims":[{"claim_text":"Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks OpenHands: An Open Platform for AI Software Developers as Generalist Agents because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T10:39:16.044041+00:00"}},"summary":{"title":"OpenHands: An Open Platform for AI Software Developers as Generalist Agents","claims":[{"claim_text":"Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks OpenHands: An Open Platform for AI Software Developers as Generalist Agents because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":20},{"title":"SWE-bench: Can Language Models Resolve Real-World GitHub Issues?","work_id":"d0effe15-a689-441a-8e3f-ea35f1c4e4b1","shared_citers":20},{"title":"SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering","work_id":"01826cd9-a652-403c-a2ec-531da9fe2b6a","shared_citers":18},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":14},{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":12},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":11},{"title":"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation","work_id":"92b7eb9c-c3d8-4518-a376-06fa15dd895b","shared_citers":10},{"title":"Agentless: Demystifying LLM-based Software Engineering Agents","work_id":"71c901c4-3c83-4e10-af54-3daef7fff397","shared_citers":9},{"title":"WebArena: A Realistic Web Environment for Building Autonomous Agents","work_id":"7058ffd2-a339-4102-89eb-248eeb074652","shared_citers":9},{"title":"SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?","work_id":"a561c78a-4b02-4053-a92a-bc5c7c5f6b9b","shared_citers":8},{"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","shared_citers":7},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":6},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":6},{"title":"Mle-bench: Evaluating machine learning agents on machine learning engineering","work_id":"a671e43f-ceab-49e7-adc3-473d802a97ca","shared_citers":6},{"title":"Qwen2.5-Coder Technical Report","work_id":"09ba463d-6377-4017-9801-444ffb94b056","shared_citers":6},{"title":"Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces","work_id":"0624be05-1d97-4fd6-8300-b04b8a3ab04b","shared_citers":6},{"title":"The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery","work_id":"56b6b58d-e73a-4317-896e-36ac5f84e957","shared_citers":6},{"title":"Trae agent: An LLM-based agent for software engineering with test-time scaling.arXiv preprint arXiv:2507.23370","work_id":"9bbaf3fb-3f46-415d-bc2c-ecf1cfdd0924","shared_citers":6},{"title":"AlphaEvolve: A coding agent for scientific and algorithmic discovery","work_id":"76a0f850-d490-4e4f-ab98-8d25df82cd23","shared_citers":5},{"title":"Code Llama: Open Foundation Models for Code","work_id":"e73bffa4-7620-47ac-9327-259a60db52ca","shared_citers":5},{"title":"DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines","work_id":"d490f594-f5fc-47b0-ae6a-6550e50fe095","shared_citers":5},{"title":"Gorilla: Large Language Model Connected with Massive APIs","work_id":"126a464a-4a73-495f-b669-de1e44aa8f09","shared_citers":5},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":5},{"title":"Kimi K2.5: Visual Agentic Intelligence","work_id":"d690be8f-5d53-49b0-b1e7-79668eb8fcdb","shared_citers":5}],"time_series":[{"n":1,"year":2024},{"n":3,"year":2025},{"n":62,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"1ae322de-b126-40d4-94de-b8b3d210d411","orcid":null,"display_name":"Boxuan Li","source":"manual","import_confidence":0.72},{"id":"6cb2fd15-51ef-4235-b5bd-7bdf58e8dd06","orcid":null,"display_name":"Frank F. Xu","source":"manual","import_confidence":0.72},{"id":"f82dff1a-1631-4b05-9808-50e6ca97d03e","orcid":null,"display_name":"Mingchen Zhuge","source":"manual","import_confidence":0.72},{"id":"bc54dc04-88ea-4046-91ba-fc249dc76c7d","orcid":null,"display_name":"Xiangru Tang","source":"manual","import_confidence":0.72},{"id":"4aaf9ea2-a317-4cde-926b-c2ea3ad19a96","orcid":null,"display_name":"Xingyao Wang","source":"manual","import_confidence":0.72},{"id":"26adcd6c-d4c5-42b9-a37d-ab9fc0ac5bd7","orcid":null,"display_name":"Yufan Song","source":"manual","import_confidence":0.72}]}}