{"work":{"id":"fcc1bceb-335d-4236-be5f-8ded6c3e1131","openalex_id":null,"doi":null,"arxiv_id":"2501.16150","raw_key":null,"title":"A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions","authors":null,"authors_text":null,"year":2025,"venue":"cs.AI","abstract":"Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices -- such as desktops, mobile phones, and web platforms -- given instructions in natural language. These agents can automate tasks by controlling software via low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for everyday use. In this survey, we investigate the state-of-the-art, trends, and research gaps in the development of practical ACUs. We provide a comprehensive review of the ACU landscape, introducing a unifying taxonomy spanning three dimensions: (I) the domain perspective, characterizing agent operating contexts; (II) the interaction perspective, describing observation modalities (e.g., screenshots, HTML) and action modalities (e.g., mouse, keyboard, code execution); and (III) the agent perspective, detailing how agents perceive, reason, and learn. We review 87 ACUs and 33 datasets across foundation model-based and classical approaches through this taxonomy. Our analysis identifies six major research gaps: insufficient generalization, inefficient learning, limited planning, low task complexity in benchmarks, non-standardized evaluation, and a disconnect between research and practical conditions. To address these gaps, we advocate for: (a) vision-based observations and low-level control to enhance generalization; (b) adaptive learning beyond static prompting; (c) effective planning and reasoning methods and models; (d) benchmarks that reflect real-world task complexity; (e) standardized evaluation based on task success; (f) aligning agent design with real-world deployment constraints. Together, our taxonomy and analysis establish a foundation for advancing ACU research toward general-purpose agents for robust and scalable computer use.","external_url":"https://arxiv.org/abs/2501.16150","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T08:15:33.558674+00:00","pith_arxiv_id":"2501.16150","created_at":"2026-05-10T22:20:46.967971+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions","render_title":"A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions"},"hub":{"state":{"work_id":"fcc1bceb-335d-4236-be5f-8ded6c3e1131","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":10,"external_cited_by_count":null,"distinct_field_count":5,"first_pith_cited_at":"2024-11-27T12:13:39+00:00","last_pith_cited_at":"2026-05-11T20:27:54+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-26T01:06:16.239448+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":1}],"polarity_counts":[{"context_polarity":"background","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}