pith:QT2OC6LC
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Language serves as a semantic anchor to align video, audio, depth, and infrared into one shared feature space.
arxiv:2310.01852 v7 · 2023-10-03 · cs.CV · cs.AI
Claims
LanguageBind has achieved superior performance on a wide range of 15 benchmarks covering video, audio, depth, and infrared. Moreover, multiple experiments have provided evidence for the effectiveness of LanguageBind in achieving indirect alignment and complementarity among diverse modalities.
The approach rests on the premise that a language encoder trained only on video-text pairs already contains sufficiently rich semantics to serve as an effective binding anchor for infrared, depth, and audio, without direct cross-modal supervision between those modalities.
LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.
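As a rough illustration of the contrastive alignment described in the claim above, here is a minimal sketch of a generic symmetric InfoNCE loss between a batch of modality embeddings and their paired text embeddings. It is not the paper's implementation: the function names, the temperature of 0.07, and the toy two-dimensional embeddings are assumptions made for illustration. The text embeddings are treated as fixed inputs, which loosely mirrors a frozen language encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_on_diagonal(rows):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(rows):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(rows)


def symmetric_info_nce(modality_emb, text_emb, temperature=0.07):
    """Generic symmetric contrastive loss over paired embeddings.

    modality_emb[i] and text_emb[i] are assumed to describe the same sample.
    The temperature value is a hypothetical default, not taken from the paper.
    """
    m = [l2_normalize(v) for v in modality_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(m)
    # logits[i][j]: scaled cosine similarity between modality i and text j
    logits = [
        [sum(a * b for a, b in zip(m[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the modality-to-text and text-to-modality directions
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )


# Toy batch: correctly paired embeddings versus deliberately swapped ones
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

In a training setup of this kind, only the encoder that produces `modality_emb` would receive gradient updates. Each modality is pulled toward the same text space, so modalities that are never paired with each other can still end up comparable.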
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:15.255731Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7
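The identifiers on this page appear to be a Base32 encoding of the canonical hash: the 26-character identifier in the verification URL matches the leading Base32 characters of the digest, and the short form `QT2OC6LC` is its first eight characters. This relationship is inferred from the values shown here rather than from a published specification, so treat the sketch below as an assumption; the function name and the default length of 26 are hypothetical.

```python
import base64

CANONICAL_HASH = "84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Return the first `length` RFC 4648 Base32 characters of a hex digest."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full_id = base32_prefix(CANONICAL_HASH)
short_id = full_id[:8]
print(full_id)   # QT2OC6LCZXLTZ47EOLBAFJSMSP
print(short_id)  # QT2OC6LC
```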
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QT2OC6LCZXLTZ47EOLBAFJSMSP \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7
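The one-liner above can be unpacked into a small offline script. The sketch below applies the same canonicalization (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes) and hashes the result with SHA-256. The `record` dictionary is retyped from the canonical record JSON shown on this page and the function name is hypothetical. Whether the digest reproduces the expected value depends on the retyped record matching the served one exactly, so the script only reports the comparison.

```python
import hashlib
import json

EXPECTED = "84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7"


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,           # key order must not affect the hash
        separators=(",", ":"),    # no whitespace between tokens
        ensure_ascii=False,       # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Retyped from the canonical record JSON on this page (assumed to be exact)
record = {
    "metadata": {
        "abstract_canon_sha256": "a89d3f1b0310aacfd85d336f1ea4af2f879bd89dae84a6100731d407e7c398cc",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-10-03T07:33:27Z",
        "title_canon_sha256": "7e6e9944d2ba0e8600d523445e1059096ea4896ecd671d4aa18fb88ebe4f6991",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.01852", "kind": "arxiv", "version": 7},
}

digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the hash reproducible: two JSON documents with the same content but different key order or whitespace serialize to the same bytes.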
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a89d3f1b0310aacfd85d336f1ea4af2f879bd89dae84a6100731d407e7c398cc",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-10-03T07:33:27Z",
    "title_canon_sha256": "7e6e9944d2ba0e8600d523445e1059096ea4896ecd671d4aa18fb88ebe4f6991"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.01852",
    "kind": "arxiv",
    "version": 7
  }
}