pith. machine review for the scientific record.

SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend: recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream multimodal tracker that tackles this performance-efficiency dilemma from two complementary perspectives. We first prioritize cross-modal alignment of matching responses, an underexplored yet pivotal factor that we argue is essential for breaking the trade-off. Specifically, we observe that modality-specific biases in existing two-stream methods generate conflicting matching attention maps, thereby hindering effective joint representation learning. To mitigate this, we propose AMG-LoRA, which seamlessly integrates Low-Rank Adaptation (LoRA) for domain adaptation with Adaptive Mutual Guidance (AMG) to dynamically refine and align attention maps across modalities. We then depart from conventional local fusion approaches by introducing a Hierarchical Mixture of Experts (HMoE) that enables efficient global relation modeling, effectively balancing expressiveness and computational efficiency in cross-modal fusion. Equipped with these innovations, SEATrack makes notable progress over state-of-the-art methods in balancing performance with efficiency across RGB-T, RGB-D, and RGB-E tracking tasks. Code is available at https://github.com/AutoLab-SAI-SJTU/SEATrack.
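The abstract's AMG-LoRA builds on standard Low-Rank Adaptation, in which a frozen pretrained weight W is augmented with a trainable low-rank update scaled by alpha/r. The paper's specific AMG mechanism is not detailed here, so the following is only a minimal NumPy sketch of the generic LoRA forward pass; the function name `lora_forward` and the dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a LoRA update: y = xW + (alpha/r) * xAB.

    x : (batch, d_in) input activations
    W : (d_in, d_out) frozen pretrained weight
    A : (d_in, r) trainable down-projection, r << min(d_in, d_out)
    B : (r, d_out) trainable up-projection, initialized to zero
    """
    r = A.shape[1]
    scale = alpha / r
    return x @ W + scale * (x @ A @ B)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2
x = rng.normal(size=(4, d_in))
W = rng.normal(size=(d_in, d_out))        # frozen backbone weight
A = rng.normal(size=(d_in, r)) * 0.01     # small random init
B = np.zeros((r, d_out))                  # zero init: adapter starts as a no-op

y = lora_forward(x, W, A, B)
```

Only A and B (2 * 8 * 2 = 32 parameters here) would be trained, versus 64 for the full weight; this gap is the source of LoRA's parameter efficiency and the budget that the abstract argues recent trackers have been inflating.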

fields

cs.CV 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Unified Multimodal Visual Tracking with Dual Mixture-of-Experts

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

OneTrackerV2 unifies multimodal tracking via Meta Merger and Dual Mixture-of-Experts to reach state-of-the-art results on five tasks and 12 benchmarks with efficiency and robustness when modalities are missing.

citing papers explorer

Showing 1 of 1 citing paper.

  • Unified Multimodal Visual Tracking with Dual Mixture-of-Experts cs.CV · 2026-05-05 · unverdicted · none · ref 6
