pith:4ZAXKMRQ
Deep Pre-Alignment for VLMs
Deep Pre-Alignment (DPA) replaces the ViT encoder with a small VLM perceiver so that visual features are deeply aligned with the LLM's text space before they reach it.
arxiv:2605.15300 v1 · 2026-05-14 · cs.CV
Add to your LaTeX paper:

```latex
\usepackage{pith}
\pithnumber{4ZAXKMRQEX3UBNINNER7WWVLJ3}
```

This prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.
Claims
At the 4B-parameter scale, DPA outperforms the baselines by 1.9 points across 8 multimodal benchmarks, and the gain widens to 3.0 points at the 32B scale. By offloading alignment to the perceiver, DPA also reduces language-capability forgetting by 32.9% across 3 text benchmarks.
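The claim does not spell out how the two headline numbers are computed. One plausible reading, stated here as an assumption rather than the paper's definition, is an average per-benchmark score gain and a relative reduction in text-benchmark degradation. Here $s_i^{m}$ is the score of recipe $m \in \{\mathrm{base}, \mathrm{DPA}\}$ on multimodal benchmark $i$, $t_j^{\mathrm{LLM}}$ is the original LLM's score on text benchmark $j$, and $t_j^{m}$ is the score after multimodal training with recipe $m$:

$$
\Delta = \frac{1}{8}\sum_{i=1}^{8}\left(s_i^{\mathrm{DPA}} - s_i^{\mathrm{base}}\right),
\qquad
F_m = \frac{1}{3}\sum_{j=1}^{3}\left(t_j^{\mathrm{LLM}} - t_j^{m}\right),
\qquad
R = \frac{F_{\mathrm{base}} - F_{\mathrm{DPA}}}{F_{\mathrm{base}}}
$$

Under this reading, $R = 0.329$ means $F_{\mathrm{DPA}} = 0.671\,F_{\mathrm{base}}$, so the DPA model would lose roughly two thirds as much text performance as the baseline.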
Key premise: feeding the LLM with features from a small VLM perceiver, rather than from a standard ViT plus projector, yields pre-alignment deep enough that the LLM's initial layers no longer perform superficial modality matching. The paper states this in its motivation, citing prior alignment analyses.
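The premise is an empirical statement about what the LLM's early layers do. One hypothetical way to probe it, assuming access to per-layer hidden states for the visual and text tokens, is to track how similar the two modalities look at each layer. The function names, the input layout, and the mean-pool-then-cosine metric below are illustrative choices, not the protocol of the paper or of the prior analyses it cites.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Layer = Sequence[Vector]  # the token vectors at one layer (hypothetical input format)


def mean_pool(tokens: Layer) -> list[float]:
    """Average a layer's token vectors into a single vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def layerwise_modality_similarity(visual: Sequence[Layer], text: Sequence[Layer]) -> list[float]:
    """Per-layer cosine similarity between mean-pooled visual and text hidden states.

    visual[l] and text[l] hold the token vectors at layer l.
    """
    return [cosine(mean_pool(v), mean_pool(t)) for v, t in zip(visual, text)]


# Toy states: orthogonal at layer 0, pointing the same way at layer 1
visual_states = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 1.0]]]
text_states = [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 1.0], [2.0, 2.0]]]
print(layerwise_modality_similarity(visual_states, text_states))
```

Comparing such curves for features from a standard ViT plus projector and from a DPA-style perceiver is one way to check whether the first layers are still closing a modality gap.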
Deep Pre-Alignment uses a small VLM perceiver instead of a ViT to pre-align visual features with the LLM's text space, yielding gains of 1.9 to 3.0 points on multimodal benchmarks and 32.9% less language forgetting.
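The summary contrasts two ways of producing the visual tokens the LLM consumes. The following is a toy data-flow sketch under loud assumptions: fixed random linear maps stand in for trained modules, every dimension is arbitrary, and the perceiver's internals (its own ViT, projector, and a stack of residual layers) as well as the final `bridge` map into the large LLM's width are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(d_in: int, d_out: int):
    """Hypothetical stand-in for a trained module: a fixed random linear map."""
    w = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    return lambda x: x @ w


# Arbitrary toy sizes, not taken from the paper
NUM_PATCHES, PATCH_DIM = 16, 48
VIT_DIM, SMALL_LM_DIM, LLM_DIM = 64, 96, 128

# Baseline: ViT encoder followed by a projector into the large LLM's embedding width
vit = linear(PATCH_DIM, VIT_DIM)
projector = linear(VIT_DIM, LLM_DIM)


def baseline_visual_tokens(patches: np.ndarray) -> np.ndarray:
    return projector(vit(patches))


# DPA-style perceiver (hypothetical internals): a small VLM, i.e. its own ViT and
# projector feeding a stack of small-LM layers, whose output states go to the large LLM
small_vit = linear(PATCH_DIM, VIT_DIM)
small_projector = linear(VIT_DIM, SMALL_LM_DIM)
small_lm_layers = [linear(SMALL_LM_DIM, SMALL_LM_DIM) for _ in range(2)]
bridge = linear(SMALL_LM_DIM, LLM_DIM)  # hypothetical connector to the large LLM


def dpa_visual_tokens(patches: np.ndarray) -> np.ndarray:
    h = small_projector(small_vit(patches))
    for layer in small_lm_layers:
        h = h + np.tanh(layer(h))  # residual update, a crude stand-in for a transformer block
    return bridge(h)


patches = rng.standard_normal((NUM_PATCHES, PATCH_DIM))
print(baseline_visual_tokens(patches).shape)  # (16, 128)
print(dpa_visual_tokens(patches).shape)       # (16, 128)
```

In both paths the large LLM receives a sequence of shape (tokens, width). The difference the sketch is meant to highlight is that, in the DPA-style path, visual features have already passed through language-model layers before the large LLM sees them.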
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:51.496070Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
e64175323025f740b50d6923fb5aab4ee0584fd4493f7cdf02af0e3e2a6b087d
Verify this Pith Number yourself
```shell
# Fetch the record, extract .canonical_record, re-serialize it with sorted keys and
# compact separators (UTF-8), then print its SHA-256 digest
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4ZAXKMRQEX3UBNINNER7WWVLJ3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e64175323025f740b50d6923fb5aab4ee0584fd4493f7cdf02af0e3e2a6b087d
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "017f88a96ca487ffa8bf0917da4183da73b1aac9cfb192fa146f4ce820b65e5e",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T18:14:15Z",
    "title_canon_sha256": "7e175ea771a9c586b737a13198eb361d0be7b9493deaa788087f9f5b5b77380d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15300",
    "kind": "arxiv",
    "version": 1
  }
}
```
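For readers without `jq`, the same check can be sketched offline in Python. The helper name `canonical_sha256` is hypothetical; it mirrors the shell pipeline's serialization settings (sorted keys, `(',', ':')` separators, `ensure_ascii=False`, UTF-8). The record literal is copied from the canonical record JSON on this page. Whether its digest equals the published canonical hash depends on that display being exactly the record the API returns, so the comparison is printed rather than assumed.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the shell pipeline does: sorted keys, compact separators, UTF-8."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Copied from the canonical record JSON displayed on this page
record = {
    "metadata": {
        "abstract_canon_sha256": "017f88a96ca487ffa8bf0917da4183da73b1aac9cfb192fa146f4ce820b65e5e",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T18:14:15Z",
        "title_canon_sha256": "7e175ea771a9c586b737a13198eb361d0be7b9493deaa788087f9f5b5b77380d",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15300", "kind": "arxiv", "version": 1},
}

EXPECTED = "e64175323025f740b50d6923fb5aab4ee0584fd4493f7cdf02af0e3e2a6b087d"
digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted at every nesting level and whitespace is fixed, the digest does not depend on how the JSON was indented or ordered for display.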