Orthogonal temporal interpolation for zero-shot video recognition
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
TACO: Towards Task-Consistent Open-Vocabulary Adaptation in Video Recognition
TACO proposes Relative Structure Distillation and a lightweight specialization projection to mitigate inconsistency between fine-tuning and evaluation objectives in open-vocabulary video recognition, claiming state-of-the-art results on cross-dataset and base-to-novel benchmarks.