Pith Number

pith:PJOWPOFF

pith:2024:PJOWPOFFWLEMB7JYLTWCWOGEE5
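The page shows two identifier forms: a short `pith:PJOWPOFF` and a long `pith:2024:PJOWPOFFWLEMB7JYLTWCWOGEE5`. Below is a minimal parsing sketch. It assumes a grammar inferred only from these two examples (a 4-digit year plus 26 characters from the RFC 4648 base32 alphabet, or an 8-character short form). The names `PithId` and `parse_pith_id` are hypothetical, not part of any published Pith API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar inferred from the two identifiers on this page:
#   long form:  pith:<4-digit year>:<26 base32 characters>
#   short form: pith:<first 8 base32 characters>
# "Base32" here means the RFC 4648 alphabet A-Z and 2-7 (an assumption).
_LONG = re.compile(r"pith:(\d{4}):([A-Z2-7]{26})")
_SHORT = re.compile(r"pith:([A-Z2-7]{8})")


@dataclass(frozen=True)
class PithId:
    number: str            # full 26-character number, or the 8-character short form
    year: Optional[int]    # only the long form carries a year


def parse_pith_id(text: str) -> PithId:
    """Parse a long or short pith identifier; raise ValueError otherwise."""
    m = _LONG.fullmatch(text)
    if m:
        return PithId(number=m.group(2), year=int(m.group(1)))
    m = _SHORT.fullmatch(text)
    if m:
        return PithId(number=m.group(1), year=None)
    raise ValueError(f"not a pith identifier: {text!r}")


long_id = parse_pith_id("pith:2024:PJOWPOFFWLEMB7JYLTWCWOGEE5")
short_id = parse_pith_id("pith:PJOWPOFF")
# The short form should be a prefix of the full number.
print(long_id.year, long_id.number.startswith(short_id.number))  # 2024 True
```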

not attested · not anchored · not stored · refs resolved

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Hao Li, Lichao Sun, Li Yuan, Peng Jin, Yibing Song, Ziang Wu

By training on structured four-stage annotations, LLaVA-CoT lets vision-language models reason autonomously and outperform larger models with only 100k samples.

arxiv:2411.10440 v6 · 2024-11-15 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{PJOWPOFFWLEMB7JYLTWCWOGEE5}

This prints a linked badge after your title and injects PDF metadata; it compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
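The page does not spell out the deterministic merge algorithm. The following is a purely hypothetical sketch of one way a mirror could reach the same state regardless of the order in which it receives events. It sorts events into a total order by `(timestamp, event_id)` and lets the latest event per key win. The `Event` fields, the `merge` function, and the example keys and values are all illustrative assumptions, not Pith's schema. Signature checking is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real signed-event schema is not shown on this page.
    timestamp: str  # ISO 8601 UTC in one fixed format, so string order == time order
    event_id: str   # unique id that breaks ties between equal timestamps
    key: str        # state field the event sets
    value: str


def merge(canonical: Dict[str, str], events: Iterable[Event]) -> Dict[str, str]:
    """Fold events onto a copy of the canonical record; the latest event per key wins."""
    state = dict(canonical)
    # set() drops exact duplicates (e.g. the same event fetched from two mirrors);
    # sorting by (timestamp, event_id) gives every mirror the same total order.
    for ev in sorted(set(events), key=lambda e: (e.timestamp, e.event_id)):
        state[ev.key] = ev.value
    return state


base = {"title": "LLaVA-CoT"}
evs = [
    Event("2026-05-18T00:00:00Z", "e2", "author_claim", "claimed"),
    Event("2026-05-17T00:00:00Z", "e1", "author_claim", "open"),
]
# Arrival order and duplicates do not matter: both calls yield the same state.
print(merge(base, evs) == merge(base, evs[::-1] + evs[:1]))  # True
```

Using a total order over events, rather than arrival order, is what makes the result reproducible across mirrors in this sketch.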

Claims

C1 · strongest claim

With only 100k training samples and test-time scaling, LLaVA-CoT not only outperforms its base model by 9.4% on a wide range of multimodal reasoning benchmarks but also surpasses larger and even closed-source models, such as Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.

C2 · weakest assumption

That the structured reasoning annotations in the LLaVA-CoT-100k dataset faithfully capture effective multistage reasoning without introducing systematic biases or annotation artifacts that the model simply memorizes.

C3 · one-line summary

LLaVA-CoT adds autonomous multistage reasoning to vision-language models, gaining 9.4% over its base model and outperforming larger models such as Gemini-1.5-pro on reasoning benchmarks, using a 100k-sample annotated dataset and SWIRES test-time scaling.

References

68 extracted · 68 resolved · 3 Pith anchors

[1] https://opencompass

[2] Available at: https://www, 2024

[3] GPT-4o system card, 2024

[4] Variational best-of-n alignment, 2024

[5] Neuro-symbolic visual reasoning: Disentangling, 2020

Formal links

2 machine-checked theorem links

Cited by

39 papers in Pith

Efficient Reasoning with Hidden Thinking

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

Toward Generalizable Forgery Detection and Reasoning

Grounded Reinforcement Learning for Visual Reasoning

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Receipt and verification

First computed	2026-05-17T23:38:48.018188Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

7a5d67b8a5b2c8c0fd385cec2b38c4275c7481a89b5dba265e12bb5c41fff2e1
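For this record, the 26-character Pith Number is exactly the first 26 characters of the RFC 4648 base32 encoding of the canonical hash above, that is, the leading 130 bits (26 × 5) of the 256-bit SHA-256 digest. The sketch below reproduces that. Whether Pith always derives numbers this way is an assumption drawn from this single record. The page itself does not say so, and `pith_number_from_hash` is a hypothetical helper name.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode (RFC 4648) a hex SHA-256 digest and keep the first `length` chars.

    Inferred from this one record, not from a published Pith specification.
    """
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


canonical_hash = "7a5d67b8a5b2c8c0fd385cec2b38c4275c7481a89b5dba265e12bb5c41fff2e1"
print(pith_number_from_hash(canonical_hash))  # PJOWPOFFWLEMB7JYLTWCWOGEE5
```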

Aliases

arxiv: 2411.10440 · arxiv_version: 2411.10440v6 · doi: 10.48550/arxiv.2411.10440 · pith_short_12: PJOWPOFFWLEM · pith_short_16: PJOWPOFFWLEMB7JY · pith_short_8: PJOWPOFF
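The aliases above follow simple patterns. The short Pith forms are prefixes of the full number, the versioned arXiv id appends `v<version>`, and the DOI uses arXiv's `10.48550` prefix. The sketch below rebuilds the alias set from those inputs. The prefix lengths (8, 12, 16) are read off this page, and treating them as a general rule is an assumption. `build_aliases` is a hypothetical helper.

```python
def build_aliases(pith_number: str, arxiv_id: str, version: int) -> dict:
    """Rebuild the alias set from the full Pith Number and the arXiv id.

    The prefix lengths (8, 12, 16) come from this page; generalizing them is an assumption.
    """
    aliases = {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        # arXiv-issued DOIs use the 10.48550 prefix followed by "arxiv.<id>".
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    for n in (8, 12, 16):
        aliases[f"pith_short_{n}"] = pith_number[:n]  # short forms are plain prefixes
    return aliases


for key, value in build_aliases("PJOWPOFFWLEMB7JYLTWCWOGEE5", "2411.10440", 6).items():
    print(f"{key}: {value}")
```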


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/PJOWPOFFWLEMB7JYLTWCWOGEE5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7a5d67b8a5b2c8c0fd385cec2b38c4275c7481a89b5dba265e12bb5c41fff2e1
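The shell pipeline above can also be written in Python using only the standard library, for environments without `curl` or `jq`. The following sketch mirrors each stage. It sends the same `Accept` header, takes `.canonical_record`, and serializes with sorted keys, compact separators, and `ensure_ascii=False` before hashing with SHA-256. The helper names are hypothetical, and it assumes the endpoint keeps returning the JSON-LD document with a `canonical_record` field.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/PJOWPOFFWLEMB7JYLTWCWOGEE5"
EXPECTED = "7a5d67b8a5b2c8c0fd385cec2b38c4275c7481a89b5dba265e12bb5c41fff2e1"


def canonical_sha256(record) -> str:
    """Hash the record the way the shell pipeline does: sorted keys, no whitespace, UTF-8."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    """Equivalent of `curl -sH 'Accept: application/ld+json' URL | jq '.canonical_record'`."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["canonical_record"]


def verify(url: str = PITH_URL, expected: str = EXPECTED) -> bool:
    """Fetch the record over the network and compare its hash to the published one."""
    return canonical_sha256(fetch_canonical_record(url)) == expected
```

Calling `verify()` performs the network request and returns `True` when the recomputed hash matches the published one. Because key order and whitespace are normalized before hashing, any JSON serializer that produces the same parsed object yields the same digest.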

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "24b193d28ef5af944ab35cb2be4e90913f09b547ee2d6b7a86d57d3933323322",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-11-15T18:58:31Z",
    "title_canon_sha256": "bc7d3a69bb86e42ea12f690bae4d1046c5a3e7378c8f824482aa58f70d6e11b9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.10440",
    "kind": "arxiv",
    "version": 6
  }
}
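Several fields of the canonical record can be cross-checked against identifiers shown elsewhere on the page. The versioned arXiv alias `2411.10440v6` should equal `source.id` plus `source.version`. The year segment of `pith:2024:...` matches the year of `submitted_at`. Both metadata hashes should be 64-character hex digests. Below is a small consistency-check sketch. Reading the `pith:<year>` segment as the submission year is an assumption, and `consistency_problems` is a hypothetical helper.

```python
import json
import re

# The canonical record printed above, loaded as data (whitespace does not affect its hash).
RECORD = json.loads("""
{
  "metadata": {
    "abstract_canon_sha256": "24b193d28ef5af944ab35cb2be4e90913f09b547ee2d6b7a86d57d3933323322",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-11-15T18:58:31Z",
    "title_canon_sha256": "bc7d3a69bb86e42ea12f690bae4d1046c5a3e7378c8f824482aa58f70d6e11b9"
  },
  "schema_version": "1.0",
  "source": {"id": "2411.10440", "kind": "arxiv", "version": 6}
}
""")


def consistency_problems(record: dict, pith_id: str, arxiv_version_alias: str) -> list:
    """Cross-check the record against identifiers shown elsewhere on the page."""
    problems = []
    src = record["source"]
    if f"{src['id']}v{src['version']}" != arxiv_version_alias:
        problems.append("arxiv_version alias does not match source id/version")
    # Assumption: the year in "pith:<year>:<number>" is the submission year.
    if pith_id.split(":")[1] != record["metadata"]["submitted_at"][:4]:
        problems.append("pith year differs from submission year")
    # Every *_sha256 field should be 64 lowercase hex characters.
    for key, value in record["metadata"].items():
        if key.endswith("_sha256") and not re.fullmatch(r"[0-9a-f]{64}", value):
            problems.append(f"{key} is not a SHA-256 hex digest")
    return problems


print(consistency_problems(RECORD, "pith:2024:PJOWPOFFWLEMB7JYLTWCWOGEE5", "2411.10440v6"))  # []
```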