Toward unified token learning for vision-language tracking. IEEE Transactions on Circuits and Systems for Video Technology, 34(4):2125–2135

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

citation-role summary

baseline 1

citation-polarity summary

fields

cs.CV 1

years

2026 1

verdicts

UNVERDICTED 1

roles

baseline 1

polarities

baseline 1

representative citing papers

Learning to Track Instance from Single Nature Language Description

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

Tracker is a self-supervised VL tracker that uses a Dynamic Token Aggregation Module to learn instance tracking from single language descriptions in unlabeled videos and outperforms prior self-supervised methods.

citing papers explorer

Showing 1 of 1 citing paper.

  • Learning to Track Instance from Single Nature Language Description cs.CV · 2026-05-08 · unverdicted · none · ref 59
