Pith Number

pith:C757O2MR

pith:2023:C757O2MRLHQBAEEYAMLEYHRIXC
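This page shows two identifier forms: a short one (pith:C757O2MR) and a full one that carries a year (pith:2023:C757O2MRLHQBAEEYAMLEYHRIXC). The sketch below splits either form into its parts. The grammar it uses is an assumption inferred from these two examples and is not a published Pith specification: a four-digit year followed by a code drawn from the RFC 4648 base32 alphabet. The name parse_pith is hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar inferred from the two forms on this page:
#   pith:<CODE>          e.g. pith:C757O2MR (short form)
#   pith:<YEAR>:<CODE>   e.g. pith:2023:C757O2MRLHQBAEEYAMLEYHRIXC (full form)
# CODE is assumed to use the RFC 4648 base32 alphabet (A-Z, 2-7).
_PITH_RE = re.compile(r"^pith:(?:(\d{4}):)?([A-Z2-7]+)$")


def parse_pith(identifier: str) -> Tuple[Optional[int], str]:
    """Split a pith: identifier into (year or None, code)."""
    match = _PITH_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a pith identifier: {identifier!r}")
    year = int(match.group(1)) if match.group(1) else None
    return year, match.group(2)


year, full_code = parse_pith("pith:2023:C757O2MRLHQBAEEYAMLEYHRIXC")
_, short_code = parse_pith("pith:C757O2MR")
print(year, full_code.startswith(short_code))  # 2023 True
```

The short code is a prefix of the full code, so a short form can be checked against a full one with a simple prefix test.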

not attested · not anchored · not stored · refs resolved

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

Ce Liu, Ehsan Azarnasab, Faisal Ahmed, Jianfeng Wang, Kevin Lin, Lijuan Wang, Linjie Li, Michael Zeng, Zhengyuan Yang, Zicheng Liu

A textual prompt design lets ChatGPT collaborate with vision experts to handle advanced multimodal reasoning and action in zero-shot settings.

arxiv:2303.11381 v1 · 2023-03-20 · cs.CV · cs.CL · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{C757O2MRLHQBAEEYAMLEYHRIXC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Zero-shot experiments demonstrate MM-REACT's effectiveness in addressing the specified capabilities of interest and its wide applicability across scenarios that require advanced visual understanding.

C2 · weakest assumption

The textual prompt design can faithfully represent and allow language models to process dense visual signals such as images and videos without loss of critical information.

C3 · one-line summary

MM-REACT uses textual prompts to let ChatGPT collaborate with external vision experts for zero-shot multimodal reasoning and action on advanced visual tasks.

References

60 extracted · 60 resolved · 18 Pith anchors

[1] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances · 2022 · arXiv:2204.01691

[2] Flamingo: a Visual Language Model for Few-Shot Learning · 2022 · arXiv:2204.14198

[3] Language Models are Few-Shot Learners · 2020

[4] End-to-End Object Detection with Transformers · 2020

[5] Harrison Chase. LangChain · 2023 · https://langchain.readthedocs.io/

Formal links

2 machine-checked theorem links

Cited by

42 papers in Pith

FireScope: Wildfire Risk Raster Prediction with a Chain-of-Thought Oracle

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation

A Comprehensive Overview of Large Language Models

FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Receipt and verification

First computed	2026-05-18T03:00:13.087518Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7
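The Pith Number on this page matches the first 26 characters of the RFC 4648 base32 encoding of this canonical hash's raw bytes. That relationship is inferred from the values shown here rather than stated by the page, so treat the sketch below as a consistency check under that assumption, not as the builder's documented algorithm. The function name pith_number_from_hash is hypothetical.

```python
import base64

# Canonical hash exactly as shown on this page.
CANONICAL_HASH = "17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this derivation is inferred from the values on this page,
    not taken from a published Pith specification.
    """
    raw = bytes.fromhex(hex_digest)                  # 32 raw SHA-256 bytes
    encoded = base64.b32encode(raw).decode("ascii")  # RFC 4648: uppercase, '=' padded
    return encoded[:length]


print(pith_number_from_hash(CANONICAL_HASH))  # C757O2MRLHQBAEEYAMLEYHRIXC
```

Each base32 character carries 5 bits, so 26 characters cover 130 of the digest's 256 bits. Passing a smaller `length` yields the 8-, 12- and 16-character short aliases.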

Aliases

arxiv: 2303.11381 · arxiv_version: 2303.11381v1 · doi: 10.48550/arxiv.2303.11381 · pith_short_12: C757O2MRLHQB · pith_short_16: C757O2MRLHQBAEEY · pith_short_8: C757O2MR
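Apart from the arXiv identifiers, each alias above is a fixed-length prefix of the full Pith Number, and the DOI follows arXiv's DataCite pattern (prefix 10.48550). The sketch below rebuilds the alias table from the full number and the arXiv metadata. It assumes prefixes are the only rule behind the short aliases, and the name build_aliases is hypothetical.

```python
from typing import Dict


def build_aliases(pith_number: str, arxiv_id: str, version: int) -> Dict[str, str]:
    """Rebuild the alias table from the full Pith Number and arXiv metadata.

    Assumption: short aliases are plain prefixes of the full Pith Number.
    """
    aliases = {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        # 10.48550 is arXiv's DataCite DOI prefix; DOIs compare case-insensitively.
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    for n in (8, 12, 16):
        aliases[f"pith_short_{n}"] = pith_number[:n]
    return aliases


aliases = build_aliases("C757O2MRLHQBAEEYAMLEYHRIXC", "2303.11381", 1)
print(aliases["pith_short_12"])  # C757O2MRLHQB
```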

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/C757O2MRLHQBAEEYAMLEYHRIXC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7
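Where curl or jq is not available, the same check can be sketched with the Python standard library alone. This assumes the resolver returns the same JSON-LD document to urllib as it does to curl. The canonical serialization mirrors the one-liner: sorted keys, compact separators, and UTF-8 output. The function names are illustrative.

```python
import hashlib
import json
import urllib.request

# Resolver URL and expected hash from the command above.
RECORD_URL = "https://pith.science/pith/C757O2MRLHQBAEEYAMLEYHRIXC"
EXPECTED = "17fbf7699159e010109803164c1e28b8be7a9d986cdbd49dc4371790e0fd38f7"


def canonical_sha256(record: dict) -> str:
    """Hash a record with the same canonical JSON settings as the one-liner."""
    blob = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(url: str = RECORD_URL) -> dict:
    """Fetch the JSON-LD document and return its canonical_record field."""
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["canonical_record"]


def verify(url: str = RECORD_URL, expected: str = EXPECTED) -> bool:
    """Return True when the recomputed hash matches the published one."""
    return canonical_sha256(fetch_canonical_record(url)) == expected
```

Calling verify() requires network access. To check offline, apply canonical_sha256 to the canonical record JSON reproduced at the end of this page.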

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ad55e24432253d2a7fd679fd3f5d8e67b783e447d9f8098fe700f8965e46239c",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-03-20T18:31:47Z",
    "title_canon_sha256": "d8fc08a05575b94e41ebafb15d423ba6457301a1ecb6e571ce2d3f5d4f47bbb0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2303.11381",
    "kind": "arxiv",
    "version": 1
  }
}