Pith Number

pith:4ZAXKMRQ

pith:2026:4ZAXKMRQEX3UBNINNER7WWVLJ3
not attested · not anchored · not stored · refs resolved

Deep Pre-Alignment for VLMs

Bo Zheng, Jun Song, Kaidong Zhang, Kechen Fang, Tianyu Yu, Yicheng Zhang, Yuan Yao, Zihao Wan

Deep Pre-Alignment replaces the ViT encoder with a small VLM perceiver to align visual features deeply with the LLM's text space.

arxiv:2605.15300 v1 · 2026-05-14 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4ZAXKMRQEX3UBNINNER7WWVLJ3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

At the 4B-parameter scale, DPA outperforms baselines by 1.9 points across 8 multimodal benchmarks, with gains widening to 3.0 points at the 32B scale; by offloading alignment to the perceiver, DPA achieves a 32.9% reduction in language capability forgetting over 3 text benchmarks.

C2 weakest assumption

That feeding the LLM with features from a small VLM perceiver (rather than a standard ViT plus projector) produces sufficiently deep pre-alignment so that the LLM's initial layers no longer perform superficial modality matching, as stated in the motivation citing prior alignment analyses.

C3 one-line summary

Deep Pre-Alignment uses a small VLM perceiver instead of a ViT encoder to pre-align visual features with the LLM's text space, yielding gains of 1.9 to 3.0 points on multimodal benchmarks and 32.9% less language forgetting.

References

172 extracted · 172 resolved · 29 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:00:51.496070Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

e64175323025f740b50d6923fb5aab4ee0584fd4493f7cdf02af0e3e2a6b087d

Aliases

arxiv: 2605.15300 · arxiv_version: 2605.15300v1 · doi: 10.48550/arxiv.2605.15300 · pith_short_12: 4ZAXKMRQEX3U · pith_short_16: 4ZAXKMRQEX3UBNIN · pith_short_8: 4ZAXKMRQ
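The short aliases listed here appear to be plain prefixes of the full identifier, and in this record the full identifier coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The sketch below checks both relationships for this record only. The helper name `pith_from_hash` and the 26-character length are assumptions inferred from the values shown on this page, not a documented Pith rule.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "e64175323025f740b50d6923fb5aab4ee0584fd4493f7cdf02af0e3e2a6b087d"
PITH_NUMBER = "4ZAXKMRQEX3UBNINNER7WWVLJ3"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the digest bytes, then truncate.

    The truncation length is an assumption based on this one record.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_from_hash(CANONICAL_HASH)

# The short aliases (pith_short_8, _12, _16) as prefixes of the full identifier.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

If the relationship holds for other records as well, an identifier could be sanity-checked offline from its canonical hash alone; that generalization is untested here.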
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4ZAXKMRQEX3UBNINNER7WWVLJ3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e64175323025f740b50d6923fb5aab4ee0584fd4493f7cdf02af0e3e2a6b087d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "017f88a96ca487ffa8bf0917da4183da73b1aac9cfb192fa146f4ce820b65e5e",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T18:14:15Z",
    "title_canon_sha256": "7e175ea771a9c586b737a13198eb361d0be7b9493deaa788087f9f5b5b77380d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15300",
    "kind": "arxiv",
    "version": 1
  }
}
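The verification one-liner above serializes the record with sorted keys and compact separators before hashing. A minimal offline sketch of that same serialization, applied to the JSON shown on this page, follows. The function name `canonical_sha256` is illustrative, and whether the printed digest equals the canonical hash depends on this page's JSON being the complete `canonical_record`, which is assumed rather than confirmed here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "017f88a96ca487ffa8bf0917da4183da73b1aac9cfb192fa146f4ce820b65e5e",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T18:14:15Z",
        "title_canon_sha256": "7e175ea771a9c586b737a13198eb361d0be7b9493deaa788087f9f5b5b77380d",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15300", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)

# Same content with top-level keys in a different insertion order.
shuffled = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

print(digest)  # compare by hand against the canonical hash listed above
print(canonical_sha256(shuffled) == digest)  # key order does not affect the digest
```

Sorting keys and fixing the separators is what makes the digest independent of how a mirror happens to order or pretty-print the JSON.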