Pith Number

pith:XYRW52VE

pith:2022:XYRW52VEY4L236AVZCPU5VWAUR
not attested · not anchored · not stored · refs resolved

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

Adrian Wong, Andy Zeng, Aveek Purohit, Brian Ichter, Federico Tombari, Johnny Lee, Krzysztof Choromanski, Maria Attarian, Michael Ryoo, Pete Florence, Stefan Welker, Vikas Sindhwani, Vincent Vanhoucke

Pretrained models can be composed zero-shot through multimodal prompting to exchange information and gain new multimodal capabilities without finetuning.

arxiv:2204.00598 v2 · 2022-04-01 · cs.CV · cs.AI · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XYRW52VEY4L236AVZCPU5VWAUR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Multiple pretrained models may be composed zero-shot, i.e., via multimodal-informed prompting, to exchange information with each other and capture new multimodal capabilities, without requiring finetuning.

C2 · weakest assumption

That distinct capabilities stored in separately trained foundation models can be reliably accessed and combined through prompting alone, without finetuning or task-specific adaptation that would break the zero-shot property.

C3 · one-line summary

Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.

References

142 extracted · 142 resolved · 17 Pith anchors


Formal links

1 machine-checked theorem link

Cited by

26 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:48.286458Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

be236eeaa4c717adf815c89f4ed6c0a4718b32bca204501fa8988d1d841daea7

Aliases

arxiv: 2204.00598 · arxiv_version: 2204.00598v2 · doi: 10.48550/arxiv.2204.00598 · pith_short_12: XYRW52VEY4L2 · pith_short_16: XYRW52VEY4L236AV · pith_short_8: XYRW52VE
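The short aliases above are plain prefixes of the full identifier. The identifier itself also appears to be the first 26 characters of the RFC 4648 base32 encoding of the canonical hash's raw bytes. That second relationship is inferred from the values shown on this page, not from a published specification, so the sketch below is an observation rather than the builder's actual procedure. The helper name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "be236eeaa4c717adf815c89f4ed6c0a4718b32bca204501fa8988d1d841daea7"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the digest bytes and keep a prefix.

    Assumption: the 26-character length is read off the identifier shown
    on this page; it is not taken from a documented rule.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# The short aliases are prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
for name, value in aliases.items():
    print(name, value)
```

For the hash on this page, `full` matches the identifier in the header and the three prefixes match the listed aliases. A match for one record does not establish how the builder derives identifiers in general.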
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XYRW52VEY4L236AVZCPU5VWAUR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: be236eeaa4c717adf815c89f4ed6c0a4718b32bca204501fa8988d1d841daea7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "577a8a3b32c6a40980cc97108847a1f07fcfe408cc83e3df4277ff16066621c8",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2022-04-01T17:43:13Z",
    "title_canon_sha256": "6c6ab2fbb224c4add86a606e514200ffee80459270e2a6a79154866871bd8d58"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2204.00598",
    "kind": "arxiv",
    "version": 2
  }
}
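The verification one-liner in the Agent API section can be unpacked into a small offline Python sketch. It applies the same canonicalization as the shell command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the record as it is displayed here. Whether the displayed record is byte-for-byte the full `canonical_record` returned by the API is an assumption, so compare the printed digest against the canonical hash above rather than taking a match for granted. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "577a8a3b32c6a40980cc97108847a1f07fcfe408cc83e3df4277ff16066621c8",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2022-04-01T17:43:13Z",
        "title_canon_sha256": "6c6ab2fbb224c4add86a606e514200ffee80459270e2a6a79154866871bd8d58",
    },
    "schema_version": "1.0",
    "source": {"id": "2204.00598", "kind": "arxiv", "version": 2},
}

EXPECTED = "be236eeaa4c717adf815c89f4ed6c0a4718b32bca204501fa8988d1d841daea7"


def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print("computed:", digest)
print("expected:", EXPECTED)
print("match:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the serialization independent of key order and whitespace in the input, so any mirror that holds the same record should compute the same digest.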