pith. machine review for the scientific record.
Pith Number

pith:XUD5VQBE

pith:2021:XUD5VQBEN3C3DCTDBP33PMPOTT
not attested · not anchored · not stored · refs resolved

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

Tsung-Yi Lin, Weicheng Kuo, Xiuye Gu, Yin Cui

A vision-language distillation method trains object detectors to recognize arbitrary text-described objects, including categories never seen in training.

arxiv:2104.13921 v3 · 2021-04-28 · cs.CV · cs.AI · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
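
As an illustration of what a deterministic merge can look like, here is a minimal sketch. It is not Pith's algorithm: the event fields (`time`, `field`, `value`), the content-hash event identifier, and the last-writer-wins rule are all hypothetical, chosen only to show how a mirror could reach the same state regardless of the order in which events arrive.

```python
import hashlib
import json


def event_id(event):
    # Hypothetical identifier: SHA-256 of the event's canonical JSON.
    data = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def merge(record, events):
    # Deduplicate events by content hash, then apply them in a total order
    # (time, then event id) that does not depend on arrival order.
    unique = {event_id(e): e for e in events}
    state = dict(record)
    for eid in sorted(unique, key=lambda k: (unique[k]["time"], k)):
        event = unique[eid]
        state[event["field"]] = event["value"]  # assumed last-writer-wins rule
    return state


# Two mirrors that receive the same events in different orders agree.
a = {"time": "2026-05-18T00:00:00Z", "field": "citations", "value": 1}
b = {"time": "2026-05-19T00:00:00Z", "field": "citations", "value": 2}
print(merge({}, [a, b]) == merge({}, [b, a, a]))
```

Sorting on a key derived only from event content is what makes the result independent of the hosting mirror.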

Claims

C1 strongest claim

With a ResNet-50 backbone, ViLD obtains 16.1 mask AP_r on held-out rare LVIS categories, outperforming its supervised counterpart by 3.8. When trained with the stronger teacher model ALIGN, ViLD achieves 26.3 AP_r.

C2 weakest assumption

The embeddings the teacher produces for image regions and category texts remain sufficiently aligned with the student's region proposals, even for categories never seen during detector training.

C3 one-line summary

ViLD distills region and text embeddings from a teacher vision-language model into a student detector, enabling open-vocabulary detection that outperforms supervised baselines on held-out rare classes in LVIS and transfers to COCO, VOC, and Objects365.
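
The two ingredients named in the summary can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the 3-dimensional embeddings, the temperature value, and the L1 form of the distillation loss are all chosen here for illustration. A region is classified by comparing its embedding with text embeddings of category names, and the student's region embedding is pulled toward the teacher's embedding of the same region.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def classify_region(region_emb, text_embs, temperature=0.01):
    # Softmax over cosine similarities between one region embedding and
    # each category's text embedding (temperature is an assumed value).
    logits = [cosine(region_emb, t) / temperature for t in text_embs]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_l1(student_emb, teacher_emb):
    # Mean absolute difference between the student's region embedding and
    # the teacher's embedding of the same region (assumed loss form).
    return sum(abs(s - t) for s, t in zip(student_emb, teacher_emb)) / len(student_emb)


# Toy text embeddings; "zebra" stands in for a category unseen in training.
categories = ["cat", "dog", "zebra"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = classify_region([0.1, 0.2, 0.9], text_embs)
print(categories[probs.index(max(probs))])
print(distill_l1([0.5, 0.5], [1.0, 0.0]))
```

Because categories enter only through their text embeddings, adding a new category at inference time means adding one more vector to `text_embs`, with no change to the detector itself.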

References

15 extracted · 15 resolved · 0 Pith anchors

[1] LVIS: A dataset for large vocabulary instance segmentation 2022
[2] Zero-shot recognition with unreliable attributes 2014
[3] Faster R-CNN: Towards real-time object detection with region proposal networks 2022
[4] Technical report: A good box is not a guarantee of a good mask. Joint COCO and LVIS workshop at ECCV 2020: LVIS Challenge Track, 2020
[5] The Caltech-UCSD Birds-200-2011 dataset 2011

Formal links

2 machine-checked theorem links

Cited by

19 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.952040Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

bd07dac0246ec5b18a630bf7b7b1ee9cc56c013b7525f6fd7c3add0f3ee26684

Aliases

arxiv: 2104.13921 · arxiv_version: 2104.13921v3 · doi: 10.48550/arxiv.2104.13921 · pith_short_12: XUD5VQBEN3C3 · pith_short_16: XUD5VQBEN3C3DCTD · pith_short_8: XUD5VQBE
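
The `pith_short_*` aliases above are prefixes of the full 26-character identifier. For the values shown in this record, the identifier itself matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The record does not state that rule, so the sketch below treats it as an observation about this one record rather than a documented definition; the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "bd07dac0246ec5b18a630bf7b7b1ee9cc56c013b7525f6fd7c3add0f3ee26684"


def pith_id_from_hash(hex_digest, length=26):
    # Assumed rule: base32-encode the raw digest bytes and keep a prefix.
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id, lengths=(8, 12, 16)):
    # Each short alias is a plain prefix of the full identifier.
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)
print(short_aliases(pith_id))
```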
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XUD5VQBEN3C3DCTDBP33PMPOTT \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bd07dac0246ec5b18a630bf7b7b1ee9cc56c013b7525f6fd7c3add0f3ee26684
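
The hashing step of the one-liner above can also be written as a standalone Python function, which is easier to reuse in scripts. This sketch mirrors the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8); the function name `canonical_sha256` is hypothetical, and the small record in the usage lines is a placeholder, not the real canonical record.

```python
import hashlib
import json


def canonical_sha256(record):
    # Serialize with sorted keys and no whitespace so that logically equal
    # records always produce the same bytes, then hash those bytes.
    data = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(data).hexdigest()


# Key order in the input does not affect the digest.
first = {"schema_version": "1.0", "source": {"id": "2104.13921", "kind": "arxiv"}}
second = {"source": {"kind": "arxiv", "id": "2104.13921"}, "schema_version": "1.0"}
print(canonical_sha256(first) == canonical_sha256(second))
```

To verify a record, parse its `canonical_record` JSON with `json.loads`, pass the result to `canonical_sha256`, and compare the digest with the canonical hash listed in the record.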
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "98315c67feae8fab3e6d382b299a6260eb382798eaff40304ac8565486c2c684",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2021-04-28T17:58:57Z",
    "title_canon_sha256": "14e6e262ae9f9fd75c3b326bb712106491e0ea98bba9e020cfd7a299b938345f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2104.13921",
    "kind": "arxiv",
    "version": 3
  }
}