Pith Number

pith:EXLCQCNE

pith:2023:EXLCQCNEKLTKZELHJSQHJDQ5YC
not attested · not anchored · not stored · refs resolved

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

Chenfei Wu, Nan Duan, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang

Visual ChatGPT lets users chat with images by linking ChatGPT to visual foundation models through prompts.

arxiv:2303.04671 v1 · 2023-03-08 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EXLCQCNEKLTKZELHJSQHJDQ5YC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
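
The page does not describe the merge algorithm itself, so the following is only a minimal sketch of the general property it relies on: if events are deduplicated and applied in a total order that ignores arrival order, every mirror computes the same state. The event fields (`id`, `seq`, `field`, `value`) and the last-writer-wins rule are hypothetical, and signature checking is omitted.

```python
from typing import Any


def merge_state(base: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the base state in a fixed, arrival-independent order.

    Hypothetical event shape: {"id": str, "seq": int, "field": str, "value": Any}.
    Assumes an event id uniquely identifies its content (for example, a content hash).
    """
    state = dict(base)
    # Drop duplicate deliveries of the same event.
    unique = {event["id"]: event for event in events}
    # Sort by (seq, id) so ties are broken the same way on every mirror.
    for event in sorted(unique.values(), key=lambda e: (e["seq"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins (assumption)
    return state


# Hypothetical events, delivered in two different orders.
base = {"status": "draft"}
events = [
    {"id": "e2", "seq": 2, "field": "status", "value": "final"},
    {"id": "e1", "seq": 1, "field": "status", "value": "review"},
    {"id": "e1", "seq": 1, "field": "status", "value": "review"},  # duplicate delivery
]
print(merge_state(base, events))
print(merge_state(base, list(reversed(events))))
```

Both calls print the same state, which is the property a mirror needs in order to recompute the current state from the bundle alone.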

Claims

C1 strongest claim

We build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps.

C2 weakest assumption

That prompt-based injection of visual model capabilities into ChatGPT enables reliable multi-step collaboration without frequent errors in task decomposition or model selection.

C3 one line summary

Visual ChatGPT integrates visual foundation models with ChatGPT via prompts to enable multi-step image understanding, generation, and editing in conversational interactions.

References

58 extracted · 58 resolved · 9 Pith anchors

[1] Flamingo: a visual language model for few-shot learning 2022
[2] VQA: Visual question answering 2015
[3] VLMo: Unified vision-language pre-training with mixture-of-modality-experts 2021
[4] InstructPix2Pix: Learning to follow image editing instructions 2022
[5] Language models are few-shot learners 2020

Formal links

2 machine-checked theorem links

Cited by

45 papers in Pith

Receipt and verification
First computed 2026-05-18T04:00:28.484742Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

25d62809a452e6ac91674ca0748e1dc094b385ad72923e3828febcfba8cf321f

Aliases

arxiv: 2303.04671 · arxiv_version: 2303.04671v1 · doi: 10.48550/arxiv.2303.04671 · pith_short_12: EXLCQCNEKLTK · pith_short_16: EXLCQCNEKLTKZELH · pith_short_8: EXLCQCNE
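
The identifiers on this page appear to be related in a simple way. The three `pith_short_*` aliases are prefixes of the full Pith Number, and the Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. This is an observation from the values shown here, not a documented rule, and the function names below are illustrative.

```python
import base64

# Canonical hash as published on this page.
CANONICAL_HASH = "25d62809a452e6ac91674ca0748e1dc094b385ad72923e3828febcfba8cf321f"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the short aliases as plain prefixes of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
print(pith_number)  # EXLCQCNEKLTKZELHJSQHJDQ5YC
print(short_aliases(pith_number))
```

If the relationship holds for other records, a client could check a Pith Number against a recomputed canonical hash without a lookup.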
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EXLCQCNEKLTKZELHJSQHJDQ5YC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 25d62809a452e6ac91674ca0748e1dc094b385ad72923e3828febcfba8cf321f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "eee4f9578aa0ad14064cdf31d8f7c34541abf7367a11f22bab2cab92b014d4f3",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-03-08T15:50:02Z",
    "title_canon_sha256": "990bdc7e3e647b036962fbb157f950c841665d609e987e7d95d9fedb63867a44"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2303.04671",
    "kind": "arxiv",
    "version": 1
  }
}
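
The canonicalization step of the verification command can also be run offline against the record above. The sketch below uses the same `json.dumps` settings as the one-liner (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes). The function names are illustrative. The digest it prints should be compared with the published canonical hash, which this sketch does not assert.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "eee4f9578aa0ad14064cdf31d8f7c34541abf7367a11f22bab2cab92b014d4f3",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-03-08T15:50:02Z",
    "title_canon_sha256": "990bdc7e3e647b036962fbb157f950c841665d609e987e7d95d9fedb63867a44"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2303.04671",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_bytes(record) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, then UTF-8 encode."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
print(canonical_bytes(record).decode("utf-8"))
print(canonical_sha256(record))
```

Because keys are sorted and whitespace is fixed, the digest does not depend on how the JSON was formatted or on the key order in which it was delivered.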