pith:GCGY5GYL
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
ReAlign aligns text embeddings to image distributions via a training-free three-step process using unpaired data, letting MLLMs pretrain without paired image-text examples.
arxiv:2602.07026 v3 · 2026-02-02 · cs.CV · cs.AI · cs.MM
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GCGY5GYL2WNFEH7SUV3CPVEPVC}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
ReAlign, a training-free three-step procedure (Anchor, Trace, Centroid Alignment) that uses statistics from massive unpaired data, explicitly rectifies geometric misalignment so that unpaired text can substitute for paired image-text data in MLLM pretraining.
The Fixed-frame Modality Gap Theory assumes that the decomposition into stable biases and anisotropic residuals remains valid when the reference frame is frozen and that the statistics computed from unpaired data accurately capture the target image distribution without introducing new distortions.
ReAlign corrects the modality gap in unpaired data to let MLLMs learn visual distributions from text alone before instruction tuning, reducing dependence on expensive paired corpora.
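This page does not define the Anchor, Trace, and Centroid Alignment steps, so the block below is not ReAlign. It is a generic, hypothetical moment-matching sketch of the kind of correction the claims describe. It estimates each modality's centroid (the stable bias) and per-dimension spread from unpaired samples, then maps text embeddings onto the image statistics. The function names and the diagonal rescaling, a simplification of an anisotropic-residual correction, are assumptions. A fuller version would use a covariance (whitening and re-coloring) transform instead.

```python
import math
import random
from statistics import fmean


def column_stats(vectors):
    """Per-dimension mean and (population) standard deviation of equal-length vectors."""
    dims = len(vectors[0])
    means = [fmean(v[d] for v in vectors) for d in range(dims)]
    stds = [math.sqrt(fmean((v[d] - means[d]) ** 2 for v in vectors)) for d in range(dims)]
    return means, stds


def align_text_to_image(text_vecs, image_vecs, eps=1e-8):
    """Hypothetical moment-matching alignment computed from unpaired samples.

    1. Subtract the text centroid, removing the constant offset between modalities.
    2. Rescale each dimension so its spread matches the image spread
       (a diagonal simplification of an anisotropic-residual correction).
    3. Add the image centroid so aligned text sits on the image distribution.
    """
    t_mean, t_std = column_stats(text_vecs)
    i_mean, i_std = column_stats(image_vecs)
    return [
        [(x - tm) / (ts + eps) * isd + im
         for x, tm, ts, isd, im in zip(v, t_mean, t_std, i_std, i_mean)]
        for v in text_vecs
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Synthetic, unpaired clouds standing in for encoder outputs:
    # text is offset from images and has a much smaller spread.
    images = [[rng.gauss(1.0, 2.0) for _ in range(dim)] for _ in range(500)]
    texts = [[rng.gauss(-3.0, 0.5) for _ in range(dim)] for _ in range(500)]
    aligned = align_text_to_image(texts, images)
    a_mean, a_std = column_stats(aligned)
    i_mean, i_std = column_stats(images)
    print("aligned means:", [round(m, 3) for m in a_mean])
    print("image means:  ", [round(m, 3) for m in i_mean])
    print("aligned stds: ", [round(s, 3) for s in a_std])
    print("image stds:   ", [round(s, 3) for s in i_std])
```

Only per-modality statistics are used, so no text-image pairs are needed. The synthetic Gaussian clouds in the demo stand in for real encoder outputs.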
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-08T01:03:58.232218Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
308d8e9b0bd59a521ff2a57627d48fa89f245c840d23165effe84d74f3930f9e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GCGY5GYL2WNFEH7SUV3CPVEPVC \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 308d8e9b0bd59a521ff2a57627d48fa89f245c840d23165effe84d74f3930f9e
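The canonicalization step in the one-liner above can also be reproduced offline. The sketch below assumes that the record shown under "Canonical record JSON" is the complete `canonical_record` object served by the endpoint. It applies the same serialization (sorted keys, compact `,`/`:` separators, `ensure_ascii=False`, UTF-8 bytes) and compares the SHA-256 digest with the published canonical hash. The names `RECORD`, `PUBLISHED_HASH`, `canonical_bytes`, and `canonical_sha256` are illustrative.

```python
import hashlib
import json

# Record as displayed on this page; assumed to equal the served canonical_record.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "396849b50901b5c2d18b42f454bf27b4f63b4e4caa7b19452aa641d8a0379828",
        "cross_cats_sorted": ["cs.AI", "cs.MM"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-02-02T13:59:39Z",
        "title_canon_sha256": "a3a004a856edd2a7c835034529bccf638be46badc1f9285774d0bc71a3b3d631",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.07026", "kind": "arxiv", "version": 3},
}

PUBLISHED_HASH = "308d8e9b0bd59a521ff2a57627d48fa89f245c840d23165effe84d74f3930f9e"


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys, no whitespace, and raw UTF-8, mirroring the one-liner."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the source JSON. If the digest does not match, the displayed record may differ from what the endpoint serves, and the command above, which hashes the live response, is the check to rely on.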
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "396849b50901b5c2d18b42f454bf27b4f63b4e4caa7b19452aa641d8a0379828",
"cross_cats_sorted": [
"cs.AI",
"cs.MM"
],
"license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
"primary_cat": "cs.CV",
"submitted_at": "2026-02-02T13:59:39Z",
"title_canon_sha256": "a3a004a856edd2a7c835034529bccf638be46badc1f9285774d0bc71a3b3d631"
},
"schema_version": "1.0",
"source": {
"id": "2602.07026",
"kind": "arxiv",
"version": 3
}
}