Diff-Tracking learns and updates text prompts for diffusion models so that cross-attention maps locate arbitrary targets across video frames without any ground-truth annotations.
Chase: Robust visual tracking via cell-level differentiable neural architecture search
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Leveraging Text-to-Image Diffusion Models for Unsupervised Visual Object Tracking