Pith Number

pith:WP3BBLDL

pith:2026:WP3BBLDLKSDFOXPGWIHW7UNOJR
Status: not attested · not anchored · not stored · refs pending

Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models

Jonathan Steinberg, Oren Gal

In vision-language models, OCR information is routed through architecture-specific layers as a low-dimensional signal that transfers across datasets.

arxiv:2602.22918 v3 · 2026-02-26 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WP3BBLDLKSDFOXPGWIHW7UNOJR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
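The page does not spell out the merge algorithm. As a hypothetical illustration of what "deterministic" buys a mirror, the sketch below folds events onto the canonical record in a fixed total order (timestamp, then event ID), so any mirror holding the same set of events reaches the same state regardless of the order in which it received them. The `Event` fields, the last-writer-wins rule, and the ordering key are all assumptions, and signature verification is omitted.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical event shape; Pith's real signed-event schema is not shown on this page.
@dataclass(frozen=True)
class Event:
    timestamp: str  # ISO-8601 UTC in one fixed format, so text order == time order
    event_id: str   # unique tie-breaker, e.g. a hash of the signed payload
    key: str        # record field the event updates
    value: str


def merge(canonical: dict, events: list) -> dict:
    """Fold events onto a copy of the canonical record in a fixed total order.

    Deduplicating and sorting by (timestamp, event_id) makes the result
    independent of arrival order and of duplicate deliveries.
    """
    ordered = sorted(set(events), key=lambda e: (e.timestamp, e.event_id))

    def apply(state: dict, e: Event) -> dict:
        new_state = dict(state)
        new_state[e.key] = e.value  # last writer in the total order wins
        return new_state

    return reduce(apply, ordered, dict(canonical))


if __name__ == "__main__":
    base = {"author_claim": "open"}
    a = Event("2026-05-21T10:00:00Z", "e2", "author_claim", "claimed")
    b = Event("2026-05-21T09:00:00Z", "e1", "citations", "1")
    print(merge(base, [a, b]) == merge(base, [b, a]))
```

Sorting on a key that every mirror can compute from the events themselves, rather than on local receipt time, is what makes the recomputed state reproducible.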

Claims

C1 · strongest claim

The OCR signal is remarkably low-dimensional: under principal component analysis (PCA), the first principal component (PC1) captures 72.9% of the variance. Crucially, PCA directions learned on one dataset transfer to others, demonstrating shared text-processing pathways. Surprisingly, in models with modular OCR circuits (notably Qwen3-VL-4B), removing OCR information can improve counting performance by up to 6.9 percentage points.
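A PC1 share like the 72.9% above is the explained-variance ratio λ1 / trace(C): the largest eigenvalue of the signal's covariance matrix C divided by its total variance. One way to probe transfer is to measure how much of a second dataset's variance lies along the first dataset's PC1. The pure-Python sketch below shows both computations on synthetic data. The data, dimensionality, and transfer metric are assumptions for illustration; the paper's actual inputs (which layers, how activation differences are pooled) and its transfer evaluation are not described on this page.

```python
import random


def covariance(rows):
    """Sample covariance matrix (d x d) of a list of d-dimensional rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]


def top_eigenvector(cov, iters=200, seed=0):
    """Leading eigenvector (PC1 direction) of a symmetric PSD matrix via power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(cov))]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


def variance_along(rows, direction):
    """Fraction of the total variance of `rows` lying along unit vector `direction`."""
    cov = covariance(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = sum of all PC variances
    along = sum(direction[i] * cov[i][j] * direction[j] for i in range(d) for j in range(d))
    return along / total


def synthetic_diffs(n, direction, signal_sd, noise_sd, rng):
    """Hypothetical activation differences: one shared direction plus isotropic noise."""
    rows = []
    for _ in range(n):
        a = rng.gauss(0.0, signal_sd)  # one coefficient per example, shared across dims
        rows.append([a * u + rng.gauss(0.0, noise_sd) for u in direction])
    return rows


def demo(seed=0):
    rng = random.Random(seed)
    raw = [3.0, -1.0, 2.0, 0.0, 1.0, 0.0, -2.0, 1.0]
    norm = sum(c * c for c in raw) ** 0.5
    u = [c / norm for c in raw]  # the shared synthetic "OCR direction"
    data_a = synthetic_diffs(500, u, signal_sd=3.0, noise_sd=0.5, rng=rng)
    data_b = synthetic_diffs(500, u, signal_sd=2.5, noise_sd=0.6, rng=rng)
    pc1_a = top_eigenvector(covariance(data_a))
    explained_a = variance_along(data_a, pc1_a)  # PC1 explained-variance ratio on A
    transfer_b = variance_along(data_b, pc1_a)   # A's PC1 applied to dataset B
    alignment = abs(sum(p * q for p, q in zip(pc1_a, u)))
    return explained_a, transfer_b, alignment


if __name__ == "__main__":
    e, t, al = demo()
    print(f"PC1 share on A: {e:.3f}, A's PC1 on B: {t:.3f}, |cos(PC1_A, u)|: {al:.3f}")
```

If the direction learned on A captured little of B's variance (close to the 1/d expected for an unrelated direction), that would argue against a shared pathway; a high share is what a transfer claim like C1 predicts.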

C2 · weakest assumption

That inpainting text out of images affects only the OCR pathway, without introducing other changes to the visual input that could confound the measured activation differences.

C3 · one-line summary

Causal interventions identify architecture-specific OCR bottlenecks in VLMs at mid or early layers, with low-dimensional shared pathways and potential performance benefits from OCR removal.
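C2 and C3 both turn on activation differences between an image and its text-inpainted copy. A minimal, hypothetical way to see where such differences concentrate is to average their relative size per layer, as below. This is a descriptive heuristic, not the causal-intervention procedure the paper uses; the data layout, the pooling into one vector per example, and the function names are assumptions. It also inherits the C2 caveat: any non-OCR change introduced by inpainting is counted too.

```python
import math


def layerwise_relative_diff(orig_acts, inpainted_acts):
    """Mean relative change ||h_orig - h_inpainted|| / ||h_orig|| per layer.

    Hypothetical layout: acts[layer][example] is one pooled hidden-state vector.
    Dividing by ||h_orig|| compensates for activation norms that often grow with depth.
    """
    scores = []
    for layer_orig, layer_inp in zip(orig_acts, inpainted_acts):
        rel = [math.dist(h, g) / math.sqrt(sum(x * x for x in h))
               for h, g in zip(layer_orig, layer_inp)]
        scores.append(sum(rel) / len(rel))
    return scores


def peak_layer(scores):
    """Index of the layer where removing rendered text changes activations most."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy example: 3 layers, 2 examples, 3-dim activations (made-up numbers).
    orig = [[[1, 0, 0], [0, 1, 0]], [[2, 0, 0], [0, 2, 0]], [[4, 0, 0], [0, 4, 0]]]
    inp = [[[1, 0.1, 0], [0.1, 1, 0]], [[1, 1, 0], [1, 1, 0]], [[4, 0.4, 0], [0.4, 4, 0]]]
    scores = layerwise_relative_diff(orig, inp)
    print([round(s, 3) for s in scores], "peak layer:", peak_layer(scores))
```

A causal test would go further, for example by patching or removing the signal at a candidate layer and measuring the effect on OCR outputs, which this sketch does not attempt.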

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-20T00:00:35.208847Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

b3f610ac6b5486575de6b20f6fd1ae4c4e9565b6e22d4dd46c45961fd0fddbd4

Aliases

arxiv: 2602.22918 · arxiv_version: 2602.22918v3 · doi: 10.48550/arxiv.2602.22918 · pith_short_12: WP3BBLDLKSDF · pith_short_16: WP3BBLDLKSDFOXPG · pith_short_8: WP3BBLDL
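Judging from this record alone (not from any published Pith specification), the identifiers appear to be derived from the canonical hash: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the 32 hash bytes, and each pith_short_N alias is its first N characters. The standard-library sketch below checks that observation; the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "b3f610ac6b5486575de6b20f6fd1ae4c4e9565b6e22d4dd46c45961fd0fddbd4"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """First `length` chars of the base32 encoding of the digest bytes (inferred rule)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """pith_short_N aliases as plain prefixes of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)  # prints WP3BBLDLKSDFOXPGWIHW7UNOJR
    print(short_aliases(number))
```

Twenty-six base32 characters carry 130 bits, so under this reading the Pith Number is a 130-bit truncation of the SHA-256 digest.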
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WP3BBLDLKSDFOXPGWIHW7UNOJR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b3f610ac6b5486575de6b20f6fd1ae4c4e9565b6e22d4dd46c45961fd0fddbd4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a7dc2563348038b7712030ef79d94e915b77d304f57d01eb0c8c5a525ff2c84c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-26T12:06:02Z",
    "title_canon_sha256": "0e39d9d87f55beb0bb2e47a6db5c08075fbd593200c20273a65c7ec7d32c6516"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.22918",
    "kind": "arxiv",
    "version": 3
  }
}
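For checking without network access, the Python step of the curl pipeline can be reproduced on the record shown above: the same `json.dumps` settings (sorted keys, compact separators, no ASCII escaping) followed by SHA-256 over the UTF-8 bytes. Whether the digest equals the canonical hash depends on the API's canonical_record being identical to the JSON displayed here, which this sketch does not assume; it only prints the comparison.

```python
import hashlib
import json

# The canonical record as displayed above.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "a7dc2563348038b7712030ef79d94e915b77d304f57d01eb0c8c5a525ff2c84c",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-26T12:06:02Z",
        "title_canon_sha256": "0e39d9d87f55beb0bb2e47a6db5c08075fbd593200c20273a65c7ec7d32c6516",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.22918", "kind": "arxiv", "version": 3},
}

EXPECTED = "b3f610ac6b5486575de6b20f6fd1ae4c4e9565b6e22d4dd46c45961fd0fddbd4"


def canonical_bytes(obj) -> bytes:
    # Same settings as the python3 step of the curl pipeline:
    # sorted keys, no insignificant whitespace, UTF-8 output.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canonical_sha256(obj) -> str:
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches canonical hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order still produce identical bytes and therefore the same digest.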