pith. machine review for the scientific record. sign in

arxiv: 2604.02654 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

Drift-Resilient Temporal Priors for Visual Tracking

Authors on Pith no claims yet

Pith reviewed 2026-05-13 19:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords visual trackingmodel drifttemporal priorsreliability calibrationDTPTrackLaSOTGOT-10k
0
0 comments X

The pith

DTPTrack reduces model drift in visual trackers by learning reliability scores and synthesizing dynamic temporal priors from history.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DTPTrack as a lightweight module that can be added to existing visual trackers to prevent drift from noisy past predictions. It works through a Temporal Reliability Calibrator that scores each historical frame for usefulness and a Temporal Guidance Synthesizer that turns the reliable ones into compact predictive priors. These priors anchor to the ground-truth template while filtering noise, and the module integrates into trackers like OSTrack, ODTrack, and LoRAT. The strongest version reaches new state-of-the-art numbers on standard benchmarks.

Core claim

DTPTrack suppresses drift by assigning per-frame reliability scores to historical states to filter noise and synthesizing the calibrated history into a compact set of dynamic temporal priors that supply predictive guidance beyond the baseline tracker.

What carries the argument

The DTPTrack module built from a Temporal Reliability Calibrator (TRC) that learns per-frame reliability scores and a Temporal Guidance Synthesizer (TGS) that produces compact dynamic temporal priors from reliable history.

If this is right

  • DTPTrack integrates into three different tracking architectures and delivers consistent accuracy gains across all of them.
  • The best-performing version sets new state-of-the-art numbers of 77.5% Success on LaSOT and 80.3% AO on GOT-10k.
  • The priors anchor to the ground-truth template while discarding noisy historical states.
  • The same module works across OSTrack, ODTrack, and LoRAT without architecture-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar reliability calibration could be tested in video object detection or action recognition to handle temporal noise.
  • Varying the number of historical frames fed into the synthesizer might reveal an optimal window size for long-term tracking.
  • Isolating the contribution of the synthesized priors versus the reliability scores alone would clarify which component drives the gains.

Load-bearing premise

The learned reliability scores genuinely separate useful signal from noise in historical predictions and the resulting priors supply predictive information not already available to the baseline tracker.

What would settle it

An ablation that replaces the learned reliability scores with uniform or random weights and still obtains the same accuracy gains on LaSOT and GOT-10k would show that the calibration step is not carrying the claimed benefit.

Figures

Figures reproduced from arXiv: 2604.02654 by Liting Lin, Weijun Zhuang, Xin Li, Yuqing Huang, Zhenyu He.

Figure 1
Figure 1. Figure 1: Comparison of temporal modeling strategies in visual tracking. (a) Autoregressive trackers propagate historical predictions through a sequence model, making them vulnerable to cumulative errors. (b) Dynamic memory trackers update an internal memory over time, but noisy predictions can contaminate the memory state. (c) Online spatial–temporal trackers process short video clips jointly but treat all historic… view at source ↗
Figure 2
Figure 2. Figure 2: Architectural Overview of the DTPTrack Module within our Extended LoRATv2 Backbone. Our module operates in two stages before the main Transformer blocks. First, the Temporal Reliability Calibrator (TRC) block summarizes the feature embeddings of the template (z0) and three historical reference frames (z1, z2, z3) and computes a reliability weight for each. The confidence for the initial ground-truth frame … view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of the gating mechanism. The model as [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison with state-of-the-art trackers on challenging scenarios [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
read the original abstract

Temporal information is crucial for visual tracking, but existing multi-frame trackers are vulnerable to model drift caused by naively aggregating noisy historical predictions. In this paper, we introduce DTPTrack, a lightweight and generalizable module designed to be seamlessly integrated into existing trackers to suppress drift. Our framework consists of two core components: (1) a Temporal Reliability Calibrator (TRC) mechanism that learns to assign a per-frame reliability score to historical states, filtering out noise while anchoring on the ground-truth template; and (2) a Temporal Guidance Synthesizer (TGS) module that synthesizes this calibrated history into a compact set of dynamic temporal priors to provide predictive guidance. To demonstrate its versatility, we integrate DTPTrack into three diverse tracking architectures--OSTrack, ODTrack, and LoRAT-and show consistent, significant performance gains across all baselines. Our best-performing model, built upon an extended LoRATv2 backbone, sets a new state-of-the-art on several benchmarks, achieving a 77.5% Success rate on LaSOT and an 80.3% AO on GOT-10k.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DTPTrack, a lightweight plug-in module for visual tracking consisting of a Temporal Reliability Calibrator (TRC) that learns per-frame reliability scores to filter noisy historical predictions while anchoring to the ground-truth template, and a Temporal Guidance Synthesizer (TGS) that converts the calibrated history into compact dynamic temporal priors. The module is inserted at the feature level and evaluated by integration into OSTrack, ODTrack, and LoRAT backbones under standard supervised training on tracking datasets, yielding consistent gains and new state-of-the-art results (77.5% Success on LaSOT, 80.3% AO on GOT-10k with extended LoRATv2).

Significance. If the empirical improvements hold under rigorous validation, the work supplies a generalizable, drift-resilient mechanism for exploiting temporal information in multi-frame trackers. The consistent gains across three architecturally distinct baselines and the reported SOTA numbers on standard benchmarks indicate practical utility for the tracking community.

major comments (2)
  1. [Experiments] Experiments section: the claim of 'consistent, significant performance gains' across OSTrack, ODTrack, and LoRAT is only partially supported because the manuscript provides no error bars, number of runs, or statistical significance tests; without these, it is impossible to determine whether the reported deltas exceed run-to-run variance.
  2. [§4.2] §4.2 (TRC description): the assertion that the learned reliability scores 'genuinely separate signal from noise' rests on the weakest assumption in the paper; the current ablations do not isolate whether the scores supply predictive information beyond what the baseline already extracts from the same history.
minor comments (2)
  1. [Figure 2] The integration diagram (Figure 2) would be clearer if it explicitly marked the feature-level insertion point of DTPTrack relative to the backbone's temporal aggregation layers.
  2. [§3] Notation: the symbols for the reliability score r_t and the synthesized prior P_t are introduced without a compact table of definitions; a short notation table would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. We address each major comment point-by-point below, indicating the changes we will incorporate into the revised manuscript.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the claim of 'consistent, significant performance gains' across OSTrack, ODTrack, and LoRAT is only partially supported because the manuscript provides no error bars, number of runs, or statistical significance tests; without these, it is impossible to determine whether the reported deltas exceed run-to-run variance.

    Authors: We agree that the absence of error bars and statistical analysis weakens the strength of the 'consistent, significant' claim. In the revised manuscript we will rerun the three backbone integrations with three different random seeds, report mean and standard deviation for Success, AO, and Precision on LaSOT and GOT-10k, and add a brief statistical comparison (paired t-test) between baseline and DTPTrack-augmented results. The updated Experiments section and tables will reflect these additions. revision: yes

  2. Referee: [§4.2] §4.2 (TRC description): the assertion that the learned reliability scores 'genuinely separate signal from noise' rests on the weakest assumption in the paper; the current ablations do not isolate whether the scores supply predictive information beyond what the baseline already extracts from the same history.

    Authors: We thank the referee for identifying this gap. The existing Table 3 ablations show gains from calibrated versus raw history, but do not fully isolate the contribution of the learned scores. In the revision we will add a controlled ablation that replaces the learned reliability scores with (i) uniform scores and (ii) random scores drawn from the same distribution, while keeping the rest of the pipeline identical. We will also include qualitative visualizations of per-frame reliability scores on representative sequences to illustrate correlation with tracking quality. These additions will appear in §4.2 and the supplementary material. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper presents DTPTrack as an architectural module (TRC for per-frame reliability scoring and TGS for synthesizing dynamic priors) inserted into existing trackers like OSTrack, ODTrack, and LoRAT. All claims rest on standard supervised training on public tracking datasets followed by empirical comparisons and ablations on benchmarks such as LaSOT and GOT-10k. No equations, derivations, or self-citations appear that reduce any prediction or performance gain to a quantity defined by the same inputs or fitted parameters. The reported improvements are externally falsifiable via public benchmark scores and remain independent of the module definitions themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard deep-learning assumptions about learned reliability scores being meaningful and on the existence of public tracking benchmarks; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5502 in / 1163 out tokens · 34569 ms · 2026-05-13T19:46:17.874334+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

  1. [1]

    Ar- trackv2: Prompting autoregressive tracker where to look and how to describe

    Yifan Bai, Zeyang Zhao, Yihong Gong, and Xing Wei. Ar- trackv2: Prompting autoregressive tracker where to look and how to describe. InCVPR, 2024. 6

  2. [2]

    Learning discriminative model prediction for track- ing

    Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Learning discriminative model prediction for track- ing. InICCV, 2019. 3

  3. [3]

    Hiptrack: Vi- sual tracking with historical prompts

    Wenrui Cai, Qingjie Liu, and Yunhong Wang. Hiptrack: Vi- sual tracking with historical prompts. InCVPR, 2024. 2, 6

  4. [4]

    Spmtrack: Spatio-temporal parameter-efficient fine-tuning with mixture of experts for scalable visual tracking

    Wenrui Cai, Qingjie Liu, and Yunhong Wang. Spmtrack: Spatio-temporal parameter-efficient fine-tuning with mixture of experts for scalable visual tracking. InCVPR, 2025. 3, 5, 6

  5. [5]

    Robust object modeling for visual tracking

    Yidong Cai, Jie Liu, Jie Tang, and Gangshan Wu. Robust object modeling for visual tracking. InICCV, 2023. 2, 6

  6. [6]

    Backbone is all your need: A simplified architecture for visual object tracking

    Boyu Chen, Peixia Li, Lei Bai, Lei Qiao, Qiuhong Shen, Bo Li, Weihao Gan, Wei Wu, and Wanli Ouyang. Backbone is all your need: A simplified architecture for visual object tracking. InECCV, 2022. 2, 6

  7. [7]

    Transformer tracking

    Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, and Huchuan Lu. Transformer tracking. InCVPR, 2021. 2

  8. [8]

    High-performance transformer tracking

    Xin Chen, Bin Yan, Jiawen Zhu, Huchuan Lu, Xiang Ruan, and Dong Wang. High-performance transformer tracking. IEEE TPAMI, 45(7):8507–8523, 2022. 2

  9. [9]

    SeqTrack: Sequence to sequence learning for visual ob- ject tracking

    Xin Chen, Houwen Peng, Dong Wang, Huchuan Lu, and Han Hu. SeqTrack: Sequence to sequence learning for visual ob- ject tracking. InCVPR, 2023. 1, 6

  10. [10]

    MixFormer: End-to-end tracking with iterative mixed atten- tion

    Yutao Cui, Cheng Jiang, Limin Wang, and Gangshan Wu. MixFormer: End-to-end tracking with iterative mixed atten- tion. InCVPR, 2022. 1, 3

  11. [11]

    MixFormer: End-to-end tracking with iterative mixed atten- tion.IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 4129 – 4146, 2024

    Yutao Cui, Cheng Jiang, Gangshan Wu, and Limin Wang. MixFormer: End-to-end tracking with iterative mixed atten- tion.IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 4129 – 4146, 2024. 3

  12. [12]

    Proba- bilistic regression for visual tracking

    Martin Danelljan, Luc Van Gool, and Radu Timofte. Proba- bilistic regression for visual tracking. InCVPR, 2020. 3

  13. [13]

    FlashAttention-2: Faster attention with better par- allelism and work partitioning

    Tri Dao. FlashAttention-2: Faster attention with better par- allelism and work partitioning. InICLR, 2024. 1

  14. [14]

    Fu, Stefano Ermon, Atri Rudra, and Christopher R´e

    Tri Dao, Daniel Y . Fu, Stefano Ermon, Atri Rudra, and Christopher R´e. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. InNeurIPS, 2022. 1

  15. [15]

    Vision transformers need registers

    Timoth ´ee Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Vision transformers need registers. InICLR,

  16. [16]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021. 1, 2

  17. [17]

    LaSOT: A high-quality benchmark for large-scale single ob- ject tracking

    Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, and Haibin Ling. LaSOT: A high-quality benchmark for large-scale single ob- ject tracking. InCVPR, 2019. 5

  18. [18]

    AiATrack: Attention in attention for transformer visual tracking

    Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, and Junsong Yuan. AiATrack: Attention in attention for transformer visual tracking. InECCV, 2022. 2, 6

  19. [19]

    Generalized relation modeling for transformer tracking

    Shenyuan Gao, Chunluan Zhou, and Jun Zhang. Generalized relation modeling for transformer tracking. InCVPR, 2023. 2, 6

  20. [20]

    Dreamtrack: Dreaming the future for mul- timodal visual object tracking

    Mingzhe Guo, Weiping Tan, Wenyu Ran, Liping Jing, and Zhipeng Zhang. Dreamtrack: Dreaming the future for mul- timodal visual object tracking. InCVPR, 2025. 2

  21. [21]

    Target-aware tracking with long-term context attention

    Kaijie He, Canlong Zhang, Sheng Xie, Zhixin Li, and Zhi- wen Wang. Target-aware tracking with long-term context attention. InAAAI, 2023. 1, 3, 6

  22. [22]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR, 2022. 3, 1

  23. [23]

    Got-10k: A large high-diversity benchmark for generic object tracking in the wild.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 43(5):1562–1577, 2021

    Lianghua Huang, Xin Zhao, and Kaiqi Huang. Got-10k: A large high-diversity benchmark for generic object tracking in the wild.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 43(5):1562–1577, 2021. 5

  24. [24]

    Rtracker: Recoverable tracking via pn tree structured memory

    Yuqing Huang, Xin Li, Zikun Zhou, Yaowei Wang, Zhenyu He, and Ming-Hsuan Yang. Rtracker: Recoverable tracking via pn tree structured memory. InCVPR, 2024. 2

  25. [25]

    Exploring enhanced contextual information for video-level object tracking

    Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, and Dong Wang. Exploring enhanced contextual information for video-level object tracking. InAAAI, 2025. 3

  26. [26]

    The tenth visual object tracking vot2022 challenge re- sults

    Matej Kristan, Ale ˇs Leonardis, Jiˇr´ı Matas, Michael Felsberg, Roman Pflugfelder, Joni-Kristian K ¨am¨ar¨ainen, Hyung Jin Chang, Martin Danelljan, Luka ˇCehovin Zajc, Alan Lukeˇziˇc, et al. The tenth visual object tracking vot2022 challenge re- sults. InECCV Workshops, 2022. 2

  27. [27]

    The second visual object tracking segmentation vots2024 challenge results

    Matej Kristan, Ji ˇr´ı Matas, Pavel Tokmakov, Michael Fels- berg, Luka ˇCehovin Zajc, Alan Luke ˇziˇc, Khanh-Tung Tran, Xuan-Son Vu, Johanna Bj ¨orklund, Hyung Jin Chang, et al. The second visual object tracking segmentation vots2024 challenge results. InECCV Workshops, pages 357–383. Springer, 2024. 2

  28. [28]

    FractalNet: Ultra-deep neural networks without residuals

    Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. InICLR, 2016. 1

  29. [29]

    CiteTracker: Correlating image and text for visual tracking

    Xin Li, Yuqing Huang, Zhenyu He, Yaowei Wang, Huchuan Lu, and Ming-Hsuan Yang. CiteTracker: Correlating image and text for visual tracking. InICCV, 2023. 2

  30. [30]

    SwinTrack: A simple and strong baseline for trans- former tracking

    Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, and Haibin Ling. SwinTrack: A simple and strong baseline for trans- former tracking. InNeurIPS, 2022. 2

  31. [31]

    Tracking meets lora: Faster training, larger model, stronger performance

    Liting Lin, Heng Fan, Zhipeng Zhang, Yaowei Wang, Yong Xu, and Haibin Ling. Tracking meets lora: Faster training, larger model, stronger performance. InECCV, 2024. 2, 5, 6, 7, 1

  32. [32]

    Loratv2: En- abling low-cost temporal modeling in one-stream trackers

    Liting Lin, Heng Fan, Zhipeng Zhang, Yuqing Huang, Yaowei Wang, Yong Xu, and Haibin Ling. Loratv2: En- abling low-cost temporal modeling in one-stream trackers. InNeurIPS, 2025. 3, 5, 6

  33. [33]

    Microsoft COCO: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014. 5

  34. [34]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. InICLR, 2019. 1

  35. [35]

    A benchmark and simulator for uav tracking

    Matthias Mueller, Neil Smith, and Bernard Ghanem. A benchmark and simulator for uav tracking. InECCV, 2016. 5

  36. [36]

    TrackingNet: A large-scale dataset and benchmark for object tracking in the wild

    Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al- subaihi, and Bernard Ghanem. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In ECCV, 2018. 5

  37. [37]

    Learning multi-domain convolutional neural networks for visual tracking

    Hyeonseob Nam and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. InCVPR,

  38. [38]

    DINOv2: Learning robust visual features without supervi- sion

    Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervi- sion. InTMLR, 2024. 5, 1

  39. [39]

    Vast- track: Vast category visual object tracking

    Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, and Libo Zhang. Vast- track: Vast category visual object tracking. InNeurIPS,

  40. [40]

    arXiv preprint arXiv:2112.05682 , year=

    Markus N Rabe and Charles Staats. Self-attention does not needo(n 2)memory.arXiv preprint arXiv:2112.05682,

  41. [41]

    Generalized in- tersection over union: A metric and a loss for bounding box regression

    Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized in- tersection over union: A metric and a loss for bounding box regression. InCVPR, 2019. 1

  42. [42]

    DeiT III: Revenge of the ViT

    Hugo Touvron, Matthieu Cord, and Herv ´e J´egou. DeiT III: Revenge of the ViT. InECCV. Springer, 2022. 1

  43. [43]

    Towards more flexible and accurate object tracking with natural language: Algo- rithms and benchmark

    Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, and Feng Wu. Towards more flexible and accurate object tracking with natural language: Algo- rithms and benchmark. InCVPR, 2021. 5

  44. [44]

    Autoregressive visual tracking

    Xing Wei, Yifan Bai, Yongchao Zheng, Dahu Shi, and Yi- hong Gong. Autoregressive visual tracking. InCVPR, 2023. 2, 3, 6

  45. [45]

    DropMAE: Masked autoen- coders with spatial-attention dropout for tracking tasks

    Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, and Antoni B Chan. DropMAE: Masked autoen- coders with spatial-attention dropout for tracking tasks. In CVPR, 2023. 2, 6

  46. [46]

    Object track- ing benchmark.IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1834–1848, 2015

    Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object track- ing benchmark.IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1834–1848, 2015. 5

  47. [47]

    Motiontrack: Learning motion predictor for multiple object tracking.Neu- ral Networks, 179:106539, 2024

    Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xi- ang Zhang, Zhigang Luo, and Dacheng Tao. Motiontrack: Learning motion predictor for multiple object tracking.Neu- ral Networks, 179:106539, 2024. 2

  48. [48]

    Video- track: Learning to track objects via video transformer

    Fei Xie, Lei Chu, Jiahao Li, Yan Lu, and Chao Ma. Video- track: Learning to track objects via video transformer. In CVPR, 2023

  49. [49]

    Diffusiontrack: Point set diffusion model for visual object tracking

    Fei Xie, Zhongdao Wang, and Chao Ma. Diffusiontrack: Point set diffusion model for visual object tracking. In CVPR, 2024. 2

  50. [50]

    Autore- gressive queries for adaptive tracking with spatio-temporal transformers

    Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, and Rongrong Ji. Autore- gressive queries for adaptive tracking with spatio-temporal transformers. InCVPR, 2024. 3, 6

  51. [51]

    Learning spatio-temporal transformer for vi- sual tracking

    Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, and Huchuan Lu. Learning spatio-temporal transformer for vi- sual tracking. InICCV, 2021. 1, 2, 3

  52. [52]

    Joint feature learning and relation modeling for tracking: A one-stream framework

    Botao Ye, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. Joint feature learning and relation modeling for tracking: A one-stream framework. InECCV, 2022. 1, 2, 5, 6, 7

  53. [53]

    Odtrack: Online dense temporal token learning for visual tracking

    Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, and Xianxian Li. Odtrack: Online dense temporal token learning for visual tracking. InAAAI, 2024. 1, 2, 3, 6, 7

  54. [54]

    Two-stream beats one-stream: asymmetric siamese network for efficient visual tracking

    Jiawen Zhu, Huayi Tang, Xin Chen, Xinying Wang, Dong Wang, and Huchuan Lu. Two-stream beats one-stream: asymmetric siamese network for efficient visual tracking. In AAAI, 2025. 2 Drift-Resilient Temporal Priors for Visual Tracking Supplementary Material In this supplementary material, we provide more imple- mentation details and visualizations for the pro...