Pith Number

pith:UOOVP6KN

pith:2021:UOOVP6KNB3N3PH6Q6NKB4P653J
not attested · not anchored · not stored · refs resolved

Perceiver IO: A General Architecture for Structured Inputs & Outputs

Andrew Brock, Andrew Jaegle, Andrew Zisserman, Carl Doersch, Catalin Ionescu, Daniel Zoran, David Ding, Evan Shelhamer, Jean-Baptiste Alayrac, João Carreira, Matthew M. Botvinick, Olivier Hénaff, Oriol Vinyals, Sebastian Borgeaud, Skanda Koppula

Perceiver IO adds a flexible querying mechanism to the Perceiver, so that one architecture can process arbitrary structured inputs and produce outputs of any size or type while scaling linearly with input and output size.

arxiv:2107.14795 v3 · 2021-07-30 · cs.LG · cs.CL · cs.CV · cs.SD · eess.AS

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UOOVP6KNB3N3PH6Q6NKB4P653J}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.

C2 weakest assumption

That the added flexible querying mechanism can produce outputs of arbitrary sizes and semantics across domains without introducing hidden task-specific assumptions or requiring per-task architectural changes that undermine the generality claim.

C3 one line summary

Perceiver IO is a general architecture that processes arbitrary structured inputs and produces structured outputs, scales linearly with input and output size, and achieves strong results on GLUE, Sintel optical flow, multi-task reasoning, and StarCraft II without task-specific components.
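
To make the querying mechanism in the claims above concrete, here is a minimal, dependency-free sketch of the encode / process / decode pattern. It is an illustration under simplifying assumptions, not the authors' implementation: attention is single-head with no learned projections, and residual connections, MLPs, and normalization are omitted. The names `cross_attend`, `perceiver_io_sketch`, and `random_array` are hypothetical helpers introduced for this example.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Each query row attends over every row of keys_values.

    Simplification (assumption): keys and values are the same array and
    there are no learned projection matrices.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, kv)) / math.sqrt(dim)
            for kv in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(dim)]
        )
    return out


def perceiver_io_sketch(inputs, latents, output_queries, num_latent_layers=2):
    # Encode: a small latent array attends to the inputs.
    # Cost grows with (number of latents) x (number of inputs).
    z = cross_attend(latents, inputs)
    # Process: self-attention among latents only; cost is independent of input size.
    for _ in range(num_latent_layers):
        z = cross_attend(z, z)
    # Decode: one query per desired output element attends to the latents.
    # Cost grows with (number of outputs) x (number of latents).
    return cross_attend(output_queries, z)


def random_array(rows, dim, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


rng = random.Random(0)
dim = 8
inputs = random_array(50, dim, rng)   # 50 input elements
latents = random_array(4, dim, rng)   # 4 latent vectors
queries = random_array(7, dim, rng)   # 7 requested output elements
outputs = perceiver_io_sketch(inputs, latents, queries)
print(len(outputs), len(outputs[0]))  # prints: 7 8
```

The output has one row per query, so changing the number or content of the queries changes the output size without touching the encoder or the latent stack. With a fixed number of latents, the encode and decode costs grow linearly in the number of inputs and outputs respectively.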

References

103 extracted · 103 resolved · 7 Pith anchors

[1] Imitating interactive intelligence, 2021 · arXiv:2012.05672 · http://arxiv.org/abs/2012.05672
[2] VATT: Transformers for multimodal self-supervised learning from raw video, audio and text, 2021
[3] Self-supervised multimodal versatile networks, 2020
[4] The DeepMind JAX Ecosystem, 2020
[5] Longformer: The Long-Document Transformer, 2020 · arXiv:2004.05150

Formal links

1 machine-checked theorem link

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.367308Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a39d57f94d0edbb79fd0f3541e3fddda713de0c62bb0e3d9d235df20a1976d1a

Aliases

arxiv: 2107.14795 · arxiv_version: 2107.14795v3 · doi: 10.48550/arxiv.2107.14795 · pith_short_12: UOOVP6KNB3N3 · pith_short_16: UOOVP6KNB3N3PH6Q · pith_short_8: UOOVP6KN
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UOOVP6KNB3N3PH6Q6NKB4P653J \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a39d57f94d0edbb79fd0f3541e3fddda713de0c62bb0e3d9d235df20a1976d1a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c328ae9fb67839fb9efacc3ad0e9364bcf672c3b4a0802fcf8968ada843530e7",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CV",
      "cs.SD",
      "eess.AS"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2021-07-30T17:53:34Z",
    "title_canon_sha256": "51e3957195f6b7855e23f45785d8f55e645d412b477cb4d44eb01f3219901580"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2107.14795",
    "kind": "arxiv",
    "version": 3
  }
}
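
The shell pipeline under "Verify this Pith Number yourself" can also be reproduced offline. The sketch below applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) to the canonical record shown above. It assumes the JSON printed on this page is exactly what the endpoint returns under `canonical_record`; `canonical_hash` is a hypothetical helper name, not part of any Pith tooling.

```python
import hashlib
import json


def canonical_hash(record):
    """SHA-256 over the compact, key-sorted JSON serialization of a record."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page (assumed identical to the API response).
record = {
    "metadata": {
        "abstract_canon_sha256": "c328ae9fb67839fb9efacc3ad0e9364bcf672c3b4a0802fcf8968ada843530e7",
        "cross_cats_sorted": ["cs.CL", "cs.CV", "cs.SD", "eess.AS"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2021-07-30T17:53:34Z",
        "title_canon_sha256": "51e3957195f6b7855e23f45785d8f55e645d412b477cb4d44eb01f3219901580",
    },
    "schema_version": "1.0",
    "source": {"id": "2107.14795", "kind": "arxiv", "version": 3},
}

# Hash published on this page under "Canonical hash".
EXPECTED = "a39d57f94d0edbb79fd0f3541e3fddda713de0c62bb0e3d9d235df20a1976d1a"

digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest; list order (for example in `cross_cats_sorted`) does.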