CVPR
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3
years
2026 3
verdicts
UNVERDICTED 3
representative citing papers
Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.
SpecPL introduces spectral decomposition via frozen VAE and counterfactual high-frequency permutation to bridge modality asymmetry in VLM prompt learning, reaching 81.51% harmonic-mean accuracy on 11 benchmarks.
citing papers explorer
- Beyond Detection: A Structure-Aware Framework for Scene Text Tracking
  SymTrack is the first systematic detection-free framework for scene text tracking that constructs benchmarks from video text spotting datasets and reports up to 11.97% AUC gains over prior trackers.
- Learning to See What You Need: Gaze Attention for Multimodal Large Language Models
  Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.
- SpecPL: Disentangling Spectral Granularity for Prompt Learning
  SpecPL introduces spectral decomposition via frozen VAE and counterfactual high-frequency permutation to bridge modality asymmetry in VLM prompt learning, reaching 81.51% harmonic-mean accuracy on 11 benchmarks.