pith:XUD5VQBE
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
A vision-language distillation method trains object detectors to recognize arbitrary text-described objects, including categories never seen in training.
arxiv:2104.13921 v3 · 2021-04-28 · cs.CV · cs.AI · cs.LG
Claims
On LVIS, with the rare categories held out as novel, ViLD obtains 16.1 mask AP_r with a ResNet-50 backbone, outperforming its supervised counterpart by 3.8 AP_r. With the stronger ALIGN teacher model, ViLD reaches 26.3 AP_r.
Assumption: the embeddings the teacher produces for image regions and category texts stay sufficiently aligned with the student's region proposals, even for categories never seen during detector training.
ViLD distills region and text embeddings from a teacher vision-language model into a student detector, enabling open-vocabulary detection that outperforms supervised baselines on held-out rare classes in LVIS and transfers to COCO, VOC, and Objects365.
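The distillation summarized above can be illustrated with a toy sketch. This is a simplified, hypothetical illustration, not the authors' implementation. It assumes plain Python lists as embeddings, a learned background embedding `bg_emb`, a temperature of 0.01, a loss weight `w`, and the label convention 0 = background. The text term scores a region embedding against the teacher's category text embeddings by cosine similarity and trains with cross-entropy. The image term pulls the region embedding toward the teacher's image embedding of the cropped region with an L1 loss. All function names are hypothetical.

```python
import math

# Embeddings are plain lists of floats (assumption: a real detector would use tensors).

def l2_normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def text_distill_loss(region_emb, text_embs, bg_emb, label, temperature=0.01):
    """Cross-entropy over cosine similarities between one region embedding and
    [background embedding] + the teacher's category text embeddings.
    Hypothetical label convention: 0 = background, k >= 1 = text_embs[k - 1]."""
    logits = [cosine(region_emb, bg_emb) / temperature]
    logits += [cosine(region_emb, t) / temperature for t in text_embs]
    return -math.log(softmax(logits)[label])

def image_distill_loss(region_emb, teacher_img_emb):
    """L1 distance between the normalized student region embedding and the
    normalized teacher image embedding of the same cropped region."""
    r, t = l2_normalize(region_emb), l2_normalize(teacher_img_emb)
    return sum(abs(x - y) for x, y in zip(r, t))

def total_loss(region_emb, text_embs, bg_emb, label, teacher_img_emb, w=1.0):
    # w is a hypothetical weight balancing the two distillation terms.
    return (text_distill_loss(region_emb, text_embs, bg_emb, label)
            + w * image_distill_loss(region_emb, teacher_img_emb))

def classify(region_emb, text_embs):
    """Open-vocabulary inference: index of the most similar category text embedding."""
    scores = [cosine(region_emb, t) for t in text_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy usage with made-up 3-d text embeddings for "cat", "dog", "zebra".
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region = [0.1, 0.9, 0.0]
print(classify(region, text_embs))  # → 1 ("dog")
```

Categories enter only through their text embeddings. Recognizing an unseen category at inference therefore amounts to appending one more text embedding to `text_embs`, with no change to the detector's weights.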
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:13.952040Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
bd07dac0246ec5b18a630bf7b7b1ee9cc56c013b7525f6fd7c3add0f3ee26684
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XUD5VQBEN3C3DCTDBP33PMPOTT \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bd07dac0246ec5b18a630bf7b7b1ee9cc56c013b7525f6fd7c3add0f3ee26684
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "98315c67feae8fab3e6d382b299a6260eb382798eaff40304ac8565486c2c684",
"cross_cats_sorted": [
"cs.AI",
"cs.LG"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.CV",
"submitted_at": "2021-04-28T17:58:57Z",
"title_canon_sha256": "14e6e262ae9f9fd75c3b326bb712106491e0ea98bba9e020cfd7a299b938345f"
},
"schema_version": "1.0",
"source": {
"id": "2104.13921",
"kind": "arxiv",
"version": 3
}
}
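The check performed by the `curl | jq | python3` pipeline above can also be run offline. The sketch below applies the same canonicalization (sorted keys, compact `,`/`:` separators, `ensure_ascii=False`, UTF-8 bytes) to the record as displayed on this page and compares its SHA-256 digest with the canonical hash. It assumes that the JSON shown here is exactly the `canonical_record` served by the API. If the served record differs in any field, the digest will not match. The function names are illustrative.

```python
import hashlib
import json

EXPECTED = "bd07dac0246ec5b18a630bf7b7b1ee9cc56c013b7525f6fd7c3add0f3ee26684"

# Record as displayed on this page (assumed identical to the served canonical_record).
record = {
    "metadata": {
        "abstract_canon_sha256": "98315c67feae8fab3e6d382b299a6260eb382798eaff40304ac8565486c2c684",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2021-04-28T17:58:57Z",
        "title_canon_sha256": "14e6e262ae9f9fd75c3b326bb712106491e0ea98bba9e020cfd7a299b938345f",
    },
    "schema_version": "1.0",
    "source": {"id": "2104.13921", "kind": "arxiv", "version": 3},
}

def canonical_bytes(obj):
    # Same settings as the one-liner: sorted keys, no whitespace, non-ASCII kept as UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def canonical_sha256(obj):
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()

if __name__ == "__main__":
    digest = canonical_sha256(record)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys and fixing separators makes the byte string, and therefore the hash, independent of the key order and whitespace used by whichever JSON serializer produced the record.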