{"work":{"id":"92bf7a73-2bef-4de4-8957-a9233d60b416","openalex_id":null,"doi":null,"arxiv_id":"2107.14795","raw_key":null,"title":"Perceiver IO: A General Architecture for Structured Inputs & Outputs","authors":null,"authors_text":"Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding","year":2021,"venue":"cs.LG","abstract":"A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.","external_url":"https://arxiv.org/abs/2107.14795","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T07:20:28.838364+00:00","pith_arxiv_id":"2107.14795","created_at":"2026-05-08T18:44:01.680501+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"Perceiver IO: A General Architecture for Structured Inputs & Outputs","render_title":"Perceiver IO: A General Architecture for Structured Inputs & Outputs"},"hub":{"state":{"work_id":"92bf7a73-2bef-4de4-8957-a9233d60b416","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":29,"external_cited_by_count":null,"distinct_field_count":11,"first_pith_cited_at":"2021-12-20T18:55:25+00:00","last_pith_cited_at":"2026-05-22T14:56:05+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-09T09:14:58.927542+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":3},{"context_role":"method","n":2}],"polarity_counts":[{"context_polarity":"background","n":3},{"context_polarity":"use_method","n":2}],"runs":{},"summary":{},"graph":{},"authors":[]}}