pith:VPF5YSOV
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
A single unified model can recognize texts, formulas, tables, charts and more by treating them all as characters.
arxiv:2409.01704 v1 · 2024-09-03 · cs.CV
Claims
we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as 'characters' and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT ... can handle all the above 'characters' under various OCR tasks.
This rests on the assumption that a single 580M-parameter end-to-end model with prompt-based output formatting can maintain high accuracy across all listed character types and input styles, without requiring task-specific components or suffering from interference between them.
GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.
Receipt and verification
| First computed | 2026-05-17T23:38:13.114827Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
abcbdc49d51daad954dc97f26cb7d88cb26f4e9f5be16636c71bdb3d83838c56
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VPF5YSOVDWVNSVG4S7ZGZN6YRS \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: abcbdc49d51daad954dc97f26cb7d88cb26f4e9f5be16636c71bdb3d83838c56
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "42b6d6c11459ec2495155196491af0a58729ade575db81501be3ff3c46cd15c5",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-09-03T08:41:31Z",
    "title_canon_sha256": "3f2bbec1951d819bd39a2296e8c2b4200d4a4fea581a6aefc7dfd61787c8bda4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.01704",
    "kind": "arxiv",
    "version": 1
  }
}
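The verification one-liner above can also be run offline against the record as displayed here. The following is a minimal sketch, assuming the canonical form is exactly what that one-liner produces: keys sorted, compact separators, non-ASCII characters kept as-is, UTF-8 encoded, then hashed with SHA-256. The names `RECORD`, `canonical_bytes` and `canonical_hash` are illustrative and not part of any Pith API. Whether the printed digest matches the published canonical hash depends on the displayed record being identical to the one the endpoint serves, which this sketch does not check.

```python
import hashlib
import json

# Canonical record copied from the JSON shown above (illustrative constant name).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "42b6d6c11459ec2495155196491af0a58729ade575db81501be3ff3c46cd15c5",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-09-03T08:41:31Z",
        "title_canon_sha256": "3f2bbec1951d819bd39a2296e8c2b4200d4a4fea581a6aefc7dfd61787c8bda4",
    },
    "schema_version": "1.0",
    "source": {"id": "2409.01704", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same json.dumps options as the shell one-liner."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return text.encode()         # str.encode() defaults to UTF-8


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Compare this value by eye with the canonical hash listed above.
    print(canonical_hash(RECORD))
```

Sorting keys and fixing the separators is what makes the digest reproducible: two JSON documents that differ only in key order or whitespace serialize to the same bytes and therefore hash to the same value.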