pith. machine review for the scientific record.
Pith Number

pith:QT2OC6LC

pith:2023:QT2OC6LCZXLTZ47EOLBAFJSMSP
not attested · not anchored · not stored · refs resolved

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Bin Lin, Bin Zhu, Hongfa Wang, Jiaxi Cui, Junwu Zhang, Li Yuan, Munan Ning, Wancai Zhang, Wei Liu, Wenhao Jiang, Yang Yan, Yatian Pang, ZhiFeng Li, Zongwei Li

Language serves as a semantic anchor to align video, audio, depth, and infrared into one shared feature space.

arxiv:2310.01852 v7 · 2023-10-03 · cs.CV · cs.AI

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 strongest claim

LanguageBind has achieved superior performance on a wide range of 15 benchmarks covering video, audio, depth, and infrared. Moreover, multiple experiments have provided evidence for the effectiveness of LanguageBind in achieving indirect alignment and complementarity among diverse modalities.

C2 weakest assumption

That a language encoder trained only on video-text pairs already contains sufficiently rich semantics to serve as an effective binding anchor for infrared, depth, and audio without direct cross-modal supervision between those modalities.

C3 one line summary

LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.
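
To make the idea of contrastive alignment against a fixed language embedding concrete, here is a minimal sketch of a generic symmetric InfoNCE loss in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all hypothetical, and the text embeddings are simply passed in as fixed inputs to stand in for a frozen language encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(modality_emb, text_emb, temperature=0.07):
    """Generic symmetric InfoNCE between a batch of modality embeddings
    (e.g. depth or audio) and their paired text embeddings.

    Assumption: text_emb comes from a frozen encoder, so it is treated as a
    constant here; only the modality side would receive gradients in training.
    """
    m = [l2_normalize(v) for v in modality_emb]
    t = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine similarity of modality i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(mi, tj)) / temperature for tj in t] for mi in m]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_transposed))


# Toy batch: two text embeddings and two modality embeddings (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]      # each modality sample is close to its own caption
misaligned = [[0.1, 1.0], [1.0, 0.1]]   # pairs swapped

loss_aligned = symmetric_info_nce(aligned, text)
loss_misaligned = symmetric_info_nce(misaligned, text)
print(loss_aligned < loss_misaligned)
```

In this toy setup the correctly paired batch yields the smaller loss, which is the signal a modality encoder would be trained to minimize against the shared language space.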

References

202 extracted · 202 resolved · 13 Pith anchors

[2] Localizing moments in video with natural language 2017
[3] Convolutional neural networks for static and dynamic breast infrared imaging classification 2018
[4] Interactive intrinsic video editing 2014
[5] Activitynet: A large-scale video benchmark for human activity understanding 2015
[6] Estimating depth from monocular images as classification using deep fully convolutional residual networks 2017

Formal links

2 machine-checked theorem links

Cited by

22 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:15.255731Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7

Aliases

arxiv: 2310.01852 · arxiv_version: 2310.01852v7 · doi: 10.48550/arxiv.2310.01852
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QT2OC6LCZXLTZ47EOLBAFJSMSP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a89d3f1b0310aacfd85d336f1ea4af2f879bd89dae84a6100731d407e7c398cc",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-10-03T07:33:27Z",
    "title_canon_sha256": "7e6e9944d2ba0e8600d523445e1059096ea4896ecd671d4aa18fb88ebe4f6991"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.01852",
    "kind": "arxiv",
    "version": 7
  }
}
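
The shell one-liner above can also be unpacked into a small offline script. The sketch below applies the same canonicalization the one-liner uses (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256) to the record as displayed on this page. The function name `canonical_sha256` and the `PUBLISHED_HASH` constant are illustrative names, and the script assumes the JSON shown here is exactly what the API returns under `canonical_record`; if the live record differs in any field, the digest will differ too.

```python
import hashlib
import json

# Hash published on this page; used only for comparison.
PUBLISHED_HASH = "84f4e17962cdd73cf3e472c202a64c93dc2fd8ff7032b1327a7be4af869fdfc7"


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(canonical).hexdigest()


# Canonical record copied from the page (assumed to match the API response).
record = {
    "metadata": {
        "abstract_canon_sha256": "a89d3f1b0310aacfd85d336f1ea4af2f879bd89dae84a6100731d407e7c398cc",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-10-03T07:33:27Z",
        "title_canon_sha256": "7e6e9944d2ba0e8600d523445e1059096ea4896ecd671d4aa18fb88ebe4f6991",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.01852", "kind": "arxiv", "version": 7},
}

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == PUBLISHED_HASH)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which fields appear in the fetched JSON, only on their names and values.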