Pith Number

pith:C757O2MR

pith:2023:C757O2MRLHQBAEEYAMLEYHRIXC
not attested · not anchored · not stored · refs resolved

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

Ce Liu, Ehsan Azarnasab, Faisal Ahmed, Jianfeng Wang, Kevin Lin, Lijuan Wang, Linjie Li, Michael Zeng, Zhengyuan Yang, Zicheng Liu

A textual prompt design lets ChatGPT collaborate with vision experts to handle advanced multimodal reasoning and action in zero-shot settings.

arxiv:2303.11381 v1 · 2023-03-20 · cs.CV · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{C757O2MRLHQBAEEYAMLEYHRIXC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
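The page does not spell out the merge algorithm itself. As a rough illustration of why a deterministic merge lets any mirror reach the same state, here is a generic last-writer-wins sketch. Everything in it is hypothetical and not Pith's method: the `Event` fields, the `merge` function, the `(timestamp, event_id)` ordering rule, and the sample values. Signature verification is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Event:
    """A hypothetical signed event; signature checking is omitted in this sketch."""
    event_id: str   # assumed unique, e.g. a hash of the signed payload
    timestamp: str  # ISO-8601 UTC, so string order matches time order
    field: str      # the part of the record state this event sets
    value: Any


def merge(canonical: Dict[str, Any], *event_logs: Iterable[Event]) -> Dict[str, Any]:
    """Deterministically fold events from any number of logs onto the canonical record."""
    # Union all logs, dropping duplicates that several mirrors may have received.
    unique: Dict[str, Event] = {}
    for log in event_logs:
        for event in log:
            unique[event.event_id] = event
    # Impose a total order that does not depend on arrival order.
    ordered = sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id))
    # Replay onto a copy of the record; a later event overwrites an earlier one.
    state = dict(canonical)
    for event in ordered:
        state[event.field] = event.value
    return state


if __name__ == "__main__":
    record = {"pith": "C757O2MRLHQBAEEYAMLEYHRIXC"}
    # Illustrative events only; these values are made up for the demo.
    a = Event("e1", "2026-05-18T03:00:13Z", "author_claim", "open")
    b = Event("e2", "2026-05-19T10:00:00Z", "author_claim", "claimed")
    c = Event("e3", "2026-05-18T12:00:00Z", "replications", "open")
    # Two mirrors that saw the events in different orders and batches agree.
    print(merge(record, [b, a], [c]) == merge(record, [c, a, b, a]))
```

The key design point is that the result depends only on the set of events, not on the order or batching in which a mirror received them.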

Claims

C1 · strongest claim

Zero-shot experiments demonstrate MM-REACT's effectiveness in addressing the specified capabilities of interest and its wide application in different scenarios that require advanced visual understanding.

C2 · weakest assumption

The textual prompt design can faithfully represent and allow language models to process dense visual signals such as images and videos without loss of critical information.

C3 · one-line summary

MM-REACT uses textual prompts to let ChatGPT collaborate with external vision experts for zero-shot multimodal reasoning and action on advanced visual tasks.

References

60 extracted · 60 resolved · 18 Pith anchors

[1] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances 2022 · arXiv:2204.01691
[2] Flamingo: a Visual Language Model for Few-Shot Learning 2022 · arXiv:2204.14198
[3] Language Models are Few-Shot Learners 2020
[4] End-to-End Object Detection with Transformers 2020
[5] LangChain (Harrison Chase) 2023 · https://langchain.readthedocs.io/

Formal links

2 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed 2026-05-18T03:00:13.087518Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7

Aliases

arxiv: 2303.11381 · arxiv_version: 2303.11381v1 · doi: 10.48550/arxiv.2303.11381 · pith_short_12: C757O2MRLHQB · pith_short_16: C757O2MRLHQBAEEY · pith_short_8: C757O2MR
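The short aliases are prefixes of the full Pith Number. The values on this page are also consistent with the full number being the first 26 characters of the RFC 4648 base32 encoding of the raw bytes of the canonical hash. That relationship is inferred from this one record, not taken from Pith documentation, so treat the sketch below as a consistency check under that assumption. The helper names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    # Assumption inferred from this record: base32 of the SHA-256 bytes, truncated.
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    # The pith_short_N aliases are the first N characters of the full number.
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)  # C757O2MRLHQBAEEYAMLEYHRIXC
    print(short_aliases(number))
```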
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/C757O2MRLHQBAEEYAMLEYHRIXC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ad55e24432253d2a7fd679fd3f5d8e67b783e447d9f8098fe700f8965e46239c",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-03-20T18:31:47Z",
    "title_canon_sha256": "d8fc08a05575b94e41ebafb15d423ba6457301a1ecb6e571ce2d3f5d4f47bbb0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2303.11381",
    "kind": "arxiv",
    "version": 1
  }
}
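For an offline check, the same canonicalization used by the python3 one-liner can be applied to the record JSON above without calling the API. This sketch assumes the JSON printed on this page is the complete `canonical_record`. If it is, the printed digest should match the canonical hash listed above, and a mismatch points to a difference in the record or its encoding. The names `CANONICAL_RECORD`, `canonical_bytes` and `canonical_hash` are illustrative.

```python
import hashlib
import json

# The canonical_record shown above, copied by hand (assumed to be complete).
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ad55e24432253d2a7fd679fd3f5d8e67b783e447d9f8098fe700f8965e46239c",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-03-20T18:31:47Z",
        "title_canon_sha256": "d8fc08a05575b94e41ebafb15d423ba6457301a1ecb6e571ce2d3f5d4f47bbb0",
    },
    "schema_version": "1.0",
    "source": {"id": "2303.11381", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    # Same canonical form as the one-liner: sorted keys, no whitespace, raw UTF-8.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: dict) -> str:
    # SHA-256 over the canonical bytes, as a lowercase hex string.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Compare this against the "Canonical hash" value on this page.
    print(canonical_hash(CANONICAL_RECORD))
```

Sorting keys and fixing the separators is what makes the hash independent of how a server happens to order or space its JSON.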