Pith Number

pith:VITK3VBU

pith:2025:VITK3VBU5MEPLZ4MOALEG5VYZ6

not attested · not anchored · not stored · refs resolved

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Bo Zhang, Dacheng Yin, Fengyun Rao, Haoyu Lu, Hongkun Pan, Minfeng Zhu, Wei Chen, Xiaoxuan He, Xingtao Yang, Xiyan Jiang, Yan Deng, Yi Yang

Converting images to formal textual representations lets a new model reason more precisely about visual content and outperform GPT-4o on multimodal benchmarks.

arxiv:2503.10615 v2 · 2025-03-13 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{VITK3VBU5MEPLZ4MOALEG5VYZ6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
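
This page does not describe Pith's actual merge rule, so the following is only a generic sketch of how a deterministic merge over events can work. The event shape (`id`, `ts`, `field`, `value`) and the function name `merge_events` are hypothetical, and signature verification is omitted. The idea shown is that sorting events into one canonical order before applying them makes the result independent of the order in which a mirror received them.

```python
from typing import Any


def merge_events(base: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of `base` in a canonical order.

    Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": Any}.
    Assumes that two events with the same id are identical copies.
    """
    # Drop duplicate deliveries of the same event, keyed by event id.
    unique = {event["id"]: event for event in events}
    # Sort by (timestamp, id) so every mirror applies events in the same order,
    # regardless of the order in which they arrived.
    ordered = sorted(unique.values(), key=lambda event: (event["ts"], event["id"]))
    state = dict(base)
    for event in ordered:
        state[event["field"]] = event["value"]  # last writer wins per field
    return state


# Illustrative events only; none of these come from the real bundle.
events = [
    {"id": "e1", "ts": "2026-05-17T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "closed"},
    {"id": "e3", "ts": "2026-05-18T00:00:00Z", "field": "replications", "value": "open"},
]
state_a = merge_events({}, events)
state_b = merge_events({}, list(reversed(events)) + events[:1])  # reordered, with a duplicate
```

Under these assumptions both calls produce the same state, which is the property a mirror would rely on.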

Claims

C1 · strongest claim

Experimental results show that R1-Onevision achieves state-of-the-art performance, outperforming models such as GPT-4o and Qwen2.5-VL on multiple challenging multimodal reasoning benchmarks.

C2 · weakest assumption

The cross-modal reasoning pipeline that transforms images into formal textual representations enables precise language-based reasoning without loss of critical visual information.

C3 · one line summary

R1-Onevision turns images into structured text for multimodal reasoning, trains on a custom dataset with RL, and claims SOTA results on an educational benchmark.

References

52 extracted · 52 resolved · 12 Pith anchors

[1] GPT-4 Technical Report · arXiv:2303.08774

[2] Large language models for mathematical reasoning: Progresses and challenges 2024

[3] Qwen2.5-VL Technical Report 2025 · arXiv:2502.13923

[4] Evaluating Large Language Models Trained on Code 2021 · arXiv:2107.03374

[5] Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling 2024 · arXiv:2412.05271

Formal links

2 machine-checked theorem links

Cited by

44 papers in Pith

DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection

GRIT: Teaching MLLMs to Think with Images

REC-RL: Referring expression counting via Gaussian and range-based reward optimization

Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination

CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models

Receipt and verification

First computed	2026-05-17T23:38:49.635056Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

aa26add434eb08f5e78c70164376b8cfb7ad667f77886f2511bf8d2db77f60c0

Aliases

arxiv: 2503.10615 · arxiv_version: 2503.10615v2 · doi: 10.48550/arxiv.2503.10615 · pith_short_12: VITK3VBU5MEP · pith_short_16: VITK3VBU5MEPLZ4M · pith_short_8: VITK3VBU
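
The three `pith_short_*` aliases listed above are prefixes of the full identifier. In this record the full identifier also matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That second relationship is an observation about this one record, not a rule stated on the page, so treat the helper `base32_prefix` (a name introduced here for illustration) as a sketch under that assumption.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "aa26add434eb08f5e78c70164376b8cfb7ad667f77886f2511bf8d2db77f60c0"
PITH_NUMBER = "VITK3VBU5MEPLZ4MOALEG5VYZ6"


def short_aliases(pith_number: str) -> dict[str, str]:
    """Return the 8-, 12- and 16-character prefix aliases of an identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: the identifier is derived this way; the page does not say so.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = short_aliases(PITH_NUMBER)
derived = base32_prefix(CANONICAL_HASH)
```

Comparing `derived` with `PITH_NUMBER` gives a quick offline consistency check between the identifier and the hash shown under "Canonical hash".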

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/VITK3VBU5MEPLZ4MOALEG5VYZ6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: aa26add434eb08f5e78c70164376b8cfb7ad667f77886f2511bf8d2db77f60c0

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d5c091666b3b7ee6e36740e3f46ddaba83dc06f3c49fa818841cd1f10fe06639",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-03-13T17:56:05Z",
    "title_canon_sha256": "a834e22b8a0f74a52fd27e91bfd289fbbba20a0f8d0cae4148e740dde2f22644"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.10615",
    "kind": "arxiv",
    "version": 2
  }
}
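
The one-liner under "Verify this Pith Number yourself" can be unpacked into a small offline helper, which may be easier to read and reuse. The sketch below applies the same serialization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to a record that has already been saved locally. The function names are introduced here for illustration, and the `source` fragment is used only to show that key order does not affect the result.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record with sorted keys and no insignificant whitespace."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare a recomputed digest with a published canonical hash."""
    return canonical_sha256(record) == expected_hex.lower()


# The "source" object from the record above, written with its keys in two orders.
fragment_a = {"id": "2503.10615", "kind": "arxiv", "version": 2}
fragment_b = {"version": 2, "kind": "arxiv", "id": "2503.10615"}
same_digest = canonical_sha256(fragment_a) == canonical_sha256(fragment_b)
```

To check the full record, load the JSON shown above with `json.loads` and pass it to `matches` together with the value listed under "Canonical hash".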