OneTrackerV2 unifies multimodal tracking via Meta Merger and Dual Mixture-of-Experts to reach state-of-the-art results on five tasks and 12 benchmarks with efficiency and robustness when modalities are missing.
arXiv preprint arXiv:2304.14394
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 4
verdicts
UNVERDICTED 4
representative citing papers
RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 57.5% AUC on LaSOT_ext without template updates.
GOLA reduces redundancy in low-rank adaptation for RGB-T tracking by using SVD-based partitioning and inter-group orthogonal constraints to enable complementary feature learning, outperforming prior methods on four benchmarks.
A dual-stage self-supervised tracker learns robust representations by first using semantic prompts on forward and backward branches and then injecting contextual noise to handle complex feature spaces.
citing papers explorer
- Unified Multimodal Visual Tracking with Dual Mixture-of-Experts
- RELO: Reinforcement Learning to Localize for Visual Object Tracking
- Group Orthogonal Low-Rank Adaptation for RGB-T Tracking
- Boosting Self-Supervised Tracking with Contextual Prompts and Noise Learning