pith:WP3BBLDL
Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models
OCR information routes into vision-language models at architecture-specific layers, forming a low-dimensional signal that transfers across datasets.
arxiv:2602.22918 v3 · 2026-02-26 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WP3BBLDLKSDFOXPGWIHW7UNOJR}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The OCR signal is remarkably low-dimensional: PC1 captures 72.9% of variance. Crucially, principal component analysis (PCA) directions learned on one dataset transfer to others, demonstrating shared text-processing pathways. Surprisingly, in models with modular OCR circuits (notably Qwen3-VL-4B), OCR removal can improve counting performance (up to +6.9 percentage points).
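To make the "PC1 captures most of the variance" and "directions transfer across datasets" claims concrete, here is a minimal NumPy sketch on synthetic data. It assumes the OCR signal is measured as the per-example difference between hidden activations for an original image and its text-inpainted version at a single layer. The hidden size, the two "datasets", and the single planted shared direction are all hypothetical, so the printed numbers will not reproduce the paper's 72.9%.

```python
import numpy as np

def fit_pc1(diffs: np.ndarray) -> tuple[np.ndarray, float]:
    """Top principal direction of activation differences and its explained-variance ratio."""
    centered = diffs - diffs.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    return vt[0], float(var[0] / var.sum())

def variance_captured(diffs: np.ndarray, direction: np.ndarray) -> float:
    """Fraction of the (centered) variance of diffs that lies along a unit direction."""
    centered = diffs - diffs.mean(axis=0)
    proj = centered @ direction
    return float((proj ** 2).sum() / (centered ** 2).sum())

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden size
shared = rng.normal(size=d)
shared /= np.linalg.norm(shared)  # hypothetical shared "OCR direction" planted in both datasets

def synthetic_diffs(n: int, scale: float) -> np.ndarray:
    # Stand-in for h(original) - h(text inpainted): a random strength along
    # the shared direction plus small isotropic noise in every dimension.
    strength = rng.normal(scale=scale, size=(n, 1))
    return strength * shared + rng.normal(scale=0.2, size=(n, d))

diffs_a = synthetic_diffs(500, scale=3.0)  # plays the role of "dataset A"
diffs_b = synthetic_diffs(500, scale=2.0)  # plays the role of "dataset B"

pc1_a, ratio_a = fit_pc1(diffs_a)              # PCA learned on A only
transfer = variance_captured(diffs_b, pc1_a)   # A's PC1 evaluated on B

# Baseline: a random unit direction should capture very little of B's variance.
random_dir = rng.normal(size=d)
random_dir /= np.linalg.norm(random_dir)
baseline = variance_captured(diffs_b, random_dir)

print(f"PC1 explained-variance ratio on A: {ratio_a:.3f}")
print(f"A's PC1 on B: {transfer:.3f} (random direction: {baseline:.3f})")
```

Comparing against a random direction is what makes the transfer number meaningful: a direction learned on A only counts as "shared" if it captures far more of B's variance than chance would.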
Assumption: inpainting the text in an image affects only the OCR pathway, without introducing other unintended changes to the visual input that could confound the activation differences.
Causal interventions identify architecture-specific OCR bottlenecks in VLMs at mid or early layers, with low-dimensional shared pathways and potential performance benefits from OCR removal.
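The summary does not say which causal interventions were used. One common intervention of this kind, shown here as an assumption rather than as the paper's procedure, is directional ablation: project a candidate direction (for instance, a PC1 estimate) out of the hidden states at one layer, then measure how downstream outputs change. Repeating that per layer is one way to localize where a signal is read. The activations and direction below are random placeholders, and `ablate_direction` is a hypothetical helper name.

```python
import numpy as np

def ablate_direction(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project a single direction out of activations of shape (..., d)."""
    u = direction / np.linalg.norm(direction)
    coeff = h @ u                    # component of each vector along u, shape (...,)
    return h - coeff[..., None] * u  # subtract (h . u) u from every vector

rng = np.random.default_rng(1)
tokens = rng.normal(size=(10, 64))  # hypothetical (num_tokens, hidden_size) activations at one layer
ocr_dir = rng.normal(size=64)       # hypothetical OCR direction, e.g. a PC1 estimate
ablated = ablate_direction(tokens, ocr_dir)

u = ocr_dir / np.linalg.norm(ocr_dir)
print(np.abs(ablated @ u).max())  # close to 0, up to floating-point error
```

Because the projection removes only one direction, every component orthogonal to it is left untouched, so any change in downstream behaviour can be attributed to that direction alone.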
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:35.208847Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
b3f610ac6b5486575de6b20f6fd1ae4c4e9565b6e22d4dd46c45961fd0fddbd4
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WP3BBLDLKSDFOXPGWIHW7UNOJR \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b3f610ac6b5486575de6b20f6fd1ae4c4e9565b6e22d4dd46c45961fd0fddbd4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a7dc2563348038b7712030ef79d94e915b77d304f57d01eb0c8c5a525ff2c84c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-26T12:06:02Z",
    "title_canon_sha256": "0e39d9d87f55beb0bb2e47a6db5c08075fbd593200c20273a65c7ec7d32c6516"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.22918",
    "kind": "arxiv",
    "version": 3
  }
}
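The hashing step of the verification one-liner can also be run offline in plain Python: serialize with sorted keys, no whitespace, and non-ASCII characters kept as-is, encode as UTF-8, then take SHA-256. This sketch embeds the record shown above; whether it is byte-for-byte identical to the `canonical_record` served by the API is an assumption, so the digest is expected to match the canonical hash only if that holds. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """SHA-256 of the record as sorted-key, whitespace-free, UTF-8 JSON."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Copied from the canonical record JSON above (assumed identical to the API's canonical_record).
record = {
    "metadata": {
        "abstract_canon_sha256": "a7dc2563348038b7712030ef79d94e915b77d304f57d01eb0c8c5a525ff2c84c",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-26T12:06:02Z",
        "title_canon_sha256": "0e39d9d87f55beb0bb2e47a6db5c08075fbd593200c20273a65c7ec7d32c6516",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.22918", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed above
```

Sorting keys and fixing the separators is what makes the hash independent of how the JSON happens to be indented or ordered on the page.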