pith. machine review for the scientific record.
Pith Number

pith:VPF5YSOV

pith:2024:VPF5YSOVDWVNSVG4S7ZGZN6YRS
not attested · not anchored · not stored · refs resolved

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Chenglong Liu, Chunrui Han, Haoran Wei, Jianjian Sun, Jia Wang, Jinyue Chen, Liang Zhao, Lingyu Kong, Xiangyu Zhang, Yanming Xu, Yuang Peng, Zheng Ge

A single unified model can recognize texts, formulas, tables, charts and more by treating them all as characters.

arxiv:2409.01704 v1 · 2024-09-03 · cs.CV

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1: strongest claim

we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as 'characters' and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT ... can handle all the above 'characters' under various OCR tasks.

C2: weakest assumption

That a single 580M-parameter end-to-end model with prompt-based output formatting can maintain high accuracy across all listed character types and input styles without requiring task-specific components or suffering from interference between them.

C3: one-line summary

GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.

Formal links

2 machine-checked theorem links

Cited by

17 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:13.114827Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

abcbdc49d51daad954dc97f26cb7d88cb26f4e9f5be16636c71bdb3d83838c56

Aliases

arxiv: 2409.01704 · arxiv_version: 2409.01704v1 · doi: 10.48550/arxiv.2409.01704
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VPF5YSOVDWVNSVG4S7ZGZN6YRS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: abcbdc49d51daad954dc97f26cb7d88cb26f4e9f5be16636c71bdb3d83838c56
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "42b6d6c11459ec2495155196491af0a58729ade575db81501be3ff3c46cd15c5",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-09-03T08:41:31Z",
    "title_canon_sha256": "3f2bbec1951d819bd39a2296e8c2b4200d4a4fea581a6aefc7dfd61787c8bda4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.01704",
    "kind": "arxiv",
    "version": 1
  }
}
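
The verification one-liner above can also be run offline against the record as printed here. The following is a minimal sketch, assuming the JSON shown under "Canonical record JSON" is exactly what the endpoint returns in `canonical_record`; the names `RECORD`, `EXPECTED`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith API. It applies the same canonicalization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) and then SHA-256.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
# Assumption: this matches the `canonical_record` field served by the API.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "42b6d6c11459ec2495155196491af0a58729ade575db81501be3ff3c46cd15c5",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-09-03T08:41:31Z",
        "title_canon_sha256": "3f2bbec1951d819bd39a2296e8c2b4200d4a4fea581a6aefc7dfd61787c8bda4",
    },
    "schema_version": "1.0",
    "source": {"id": "2409.01704", "kind": "arxiv", "version": 1},
}

# Canonical hash as published on this page.
EXPECTED = "abcbdc49d51daad954dc97f26cb7d88cb26f4e9f5be16636c71bdb3d83838c56"


def canonical_bytes(record) -> bytes:
    """Serialize with sorted keys and compact separators, then UTF-8 encode."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON fields arrive; a mismatch would suggest either that the record changed or that the served record differs from the copy printed on this page.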