{"paper":{"title":"Agent AI: Surveying the Horizons of Multimodal Interaction","license":"http://creativecommons.org/licenses/by/4.0/","headline":"","cross_cats":["cs.HC","cs.LG"],"primary_cat":"cs.AI","authors_text":"Bidipta Sarkar, Demetri Terzopoulos, Hoi Vo, Jae Sung Park, Jianfeng Gao, Katsushi Ikeuchi, Li Fei-Fei, Naoki Wake, Qiuyuan Huang, Ran Gong, Rohan Taori, Yejin Choi, Yusuke Noda, Zane Durante","submitted_at":"2024-01-07T19:11:18Z","abstract_excerpt":"Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive use"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2401.03568","kind":"arxiv","version":2},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}