pith:KD37OJPH
DiM³: Bridging Multilingual and Multimodal Models via Direction- and Magnitude-Aware Merging
Selective merging of direction- and magnitude-aware residual updates injects multilingual capability into multimodal models without training.
arxiv:2605.12960 v1 · 2026-05-13 · cs.CL
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{KD37OJPHLG7ECMVMI4CUQI777Z}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experiments on multilingual benchmarks in both text-only and vision-language settings, covering 57 languages across LLaVA- and Qwen-based backbones, show that DiM3 consistently outperforms existing merging baselines, substantially improves multilingual performance over the original multimodal model, and remains competitive with dedicated multilingual multimodal fine-tuning while largely retaining general multimodal ability.
The method assumes that multilingual and multimodal residual updates are heterogeneous in a way that allows them to be selectively composed per parameter dimension, using direction and magnitude awareness, without unintended interference in the shared backbone.
DiM3 merges multilingual and multimodal model updates in a direction- and magnitude-aware way to enhance multilingual performance in vision-language models while preserving original multimodal abilities.
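This page does not state DiM3's exact per-parameter selection rule. As a rough illustration of what direction- and magnitude-aware merging of residual updates can look like, the sketch applies a simplified TIES-style rule to two models fine-tuned from the same backbone: where the multilingual and multimodal residuals agree in sign they are averaged, and where they conflict the larger-magnitude update wins so the two do not cancel. This is a swapped-in, simplified technique, not the authors' method; the function name `ties_style_merge`, the flat-vector representation, and the `trim` and `scale` parameters are all hypothetical.

```python
from typing import List, Sequence


def ties_style_merge(
    base: Sequence[float],
    multilingual: Sequence[float],
    multimodal: Sequence[float],
    trim: float = 0.0,
    scale: float = 1.0,
) -> List[float]:
    """Merge two fine-tuned parameter vectors that share one base backbone.

    Hypothetical illustration only: a simplified TIES-style rule for two
    models, not the DiM3 selection rule.
    """
    if not (len(base) == len(multilingual) == len(multimodal)):
        raise ValueError("parameter vectors must have equal length")
    merged: List[float] = []
    for b, p_l, p_m in zip(base, multilingual, multimodal):
        # Residual (task-vector) updates relative to the shared backbone.
        tau_l, tau_m = p_l - b, p_m - b
        # Magnitude-aware trimming: treat very small updates as noise.
        tau_l = tau_l if abs(tau_l) >= trim else 0.0
        tau_m = tau_m if abs(tau_m) >= trim else 0.0
        if tau_l * tau_m > 0:
            # Same direction: average the two agreeing updates.
            delta = (tau_l + tau_m) / 2
        else:
            # Conflicting direction, or one update is zero: keep the
            # larger-magnitude update (ties go to the multilingual one).
            delta = tau_l if abs(tau_l) >= abs(tau_m) else tau_m
        merged.append(b + scale * delta)
    return merged
```

For example, `ties_style_merge([0.0, 0.0], [0.5, 0.25], [0.25, -0.75])` returns `[0.375, -0.75]`: the agreeing first coordinate is averaged, while in the conflicting second coordinate the larger multimodal update is kept. A real merge would apply such a rule tensor by tensor over full model checkpoints.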
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:09.204085Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
50f7f725e759be4132ac47054823fffe45fae4028a59fa662ff78cdd107f01a5
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KD37OJPHLG7ECMVMI4CUQI777Z \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 50f7f725e759be4132ac47054823fffe45fae4028a59fa662ff78cdd107f01a5
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "7fcbce155f5a653bc4d9cb8f097abf3f8237354f11cd20731ddaa486a6d9045c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-13T03:50:54Z",
    "title_canon_sha256": "1373ba8a1da4cf3771021660241b7d85437ab61e60bb136f615a8630a94fcbb4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12960",
    "kind": "arxiv",
    "version": 1
  }
}
```
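The verification command hashes a canonical serialization of the record: keys sorted, no insignificant whitespace, and UTF-8 output without ASCII escaping. The same steps can be reproduced offline as a small sketch, assuming the canonical record JSON displayed on this page is exactly the `canonical_record` served by the API. The helper name `canonical_sha256` and the `RECORD` constant are hypothetical; the serialization arguments mirror those in the command.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 of a record serialized like the verification command:
    sorted keys, compact separators, non-ASCII characters kept as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on this page (assumed identical to the
# server's canonical_record).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "7fcbce155f5a653bc4d9cb8f097abf3f8237354f11cd20731ddaa486a6d9045c",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-13T03:50:54Z",
        "title_canon_sha256": "1373ba8a1da4cf3771021660241b7d85437ab61e60bb136f615a8630a94fcbb4",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12960", "kind": "arxiv", "version": 1},
}

if __name__ == "__main__":
    print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the source JSON. If the displayed record matches the served one byte for byte after canonicalization, the printed digest should equal the canonical hash listed on this page; the sketch itself does not check that.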