Drift-Resilient Temporal Priors for Visual Tracking
Pith reviewed 2026-05-13 19:46 UTC · model grok-4.3
The pith
DTPTrack reduces model drift in visual trackers by learning reliability scores and synthesizing dynamic temporal priors from history.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DTPTrack suppresses drift by assigning per-frame reliability scores to historical states to filter out noise, then synthesizing the calibrated history into a compact set of dynamic temporal priors that supply predictive guidance beyond what the baseline tracker extracts on its own.
What carries the argument
The DTPTrack module, built from a Temporal Reliability Calibrator (TRC) that learns per-frame reliability scores and a Temporal Guidance Synthesizer (TGS) that produces compact dynamic temporal priors from the reliable history.
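To make the mechanism concrete, here is a minimal sketch of what a TRC/TGS pair could look like, assuming pooled per-frame history features. The module names follow the paper's TRC/TGS, but the layer choices, dimensions, and attention-based synthesis below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the TRC -> TGS pipeline; layers and shapes assumed.
import torch
import torch.nn as nn

class TemporalReliabilityCalibrator(nn.Module):
    """Assigns a per-frame reliability score in (0, 1) to each historical state."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (B, T, D) pooled features of past predictions
        return torch.sigmoid(self.scorer(history))  # (B, T, 1)

class TemporalGuidanceSynthesizer(nn.Module):
    """Compresses reliability-weighted history into K dynamic temporal priors."""
    def __init__(self, dim: int, num_priors: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_priors, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, history: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        weighted = history * scores                   # down-weight unreliable frames
        q = self.queries.unsqueeze(0).expand(history.size(0), -1, -1)
        priors, _ = self.attn(q, weighted, weighted)  # (B, K, D) compact priors
        return priors

# Usage: the priors would be concatenated with (or cross-attended by) the
# baseline tracker's search-region features to provide predictive guidance.
trc, tgs = TemporalReliabilityCalibrator(256), TemporalGuidanceSynthesizer(256)
hist = torch.randn(2, 8, 256)                         # 8 historical frames
priors = tgs(hist, trc(hist))                         # (2, 4, 256)
```

The property this sketch mirrors is that unreliable frames are down-weighted before synthesis, and the output is a fixed-size summary regardless of history length.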
If this is right
- DTPTrack integrates into three different tracking architectures and delivers consistent accuracy gains across all of them.
- The best-performing version sets new state-of-the-art numbers of 77.5% Success on LaSOT and 80.3% AO on GOT-10k.
- The priors anchor to the ground-truth template while discarding noisy historical states.
- The same module works across OSTrack, ODTrack, and LoRAT without architecture-specific redesign.
Where Pith is reading between the lines
- Similar reliability calibration could be tested in video object detection or action recognition to handle temporal noise.
- Varying the number of historical frames fed into the synthesizer might reveal an optimal window size for long-term tracking.
- Isolating the contribution of the synthesized priors versus the reliability scores alone would clarify which component drives the gains.
Load-bearing premise
The learned reliability scores genuinely separate useful signal from noise in historical predictions, and the resulting priors supply predictive information not already available to the baseline tracker.
What would settle it
An ablation that replaces the learned reliability scores with uniform or random weights and still obtains the same accuracy gains on LaSOT and GOT-10k would show that the calibration step is not carrying the claimed benefit.
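A hedged sketch of that control, reusing the hypothetical trc, tgs, and hist objects from the sketch above: only the source of the scores changes, everything downstream stays fixed.

```python
# Score-replacement control: learned vs. uniform vs. random reliability scores.
# `trc`, `tgs`, and `hist` are the hypothetical objects from the earlier sketch.
import torch

def scores_for_ablation(history: torch.Tensor, mode: str) -> torch.Tensor:
    B, T, _ = history.shape
    if mode == "learned":
        return trc(history)                   # the component under test
    if mode == "uniform":
        return torch.full((B, T, 1), 0.5)     # no frame is preferred
    if mode == "random":
        return torch.rand(B, T, 1)            # same range, no information
    raise ValueError(mode)

for mode in ("learned", "uniform", "random"):
    priors = tgs(hist, scores_for_ablation(hist, mode))
    # ...run the full benchmark with these priors; if 'uniform' or 'random'
    # match 'learned' on LaSOT/GOT-10k, calibration is not carrying the gain.
```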
Original abstract
Temporal information is crucial for visual tracking, but existing multi-frame trackers are vulnerable to model drift caused by naively aggregating noisy historical predictions. In this paper, we introduce DTPTrack, a lightweight and generalizable module designed to be seamlessly integrated into existing trackers to suppress drift. Our framework consists of two core components: (1) a Temporal Reliability Calibrator (TRC) mechanism that learns to assign a per-frame reliability score to historical states, filtering out noise while anchoring on the ground-truth template; and (2) a Temporal Guidance Synthesizer (TGS) module that synthesizes this calibrated history into a compact set of dynamic temporal priors to provide predictive guidance. To demonstrate its versatility, we integrate DTPTrack into three diverse tracking architectures (OSTrack, ODTrack, and LoRAT) and show consistent, significant performance gains across all baselines. Our best-performing model, built upon an extended LoRATv2 backbone, sets a new state-of-the-art on several benchmarks, achieving a 77.5% Success rate on LaSOT and an 80.3% AO on GOT-10k.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DTPTrack, a lightweight plug-in module for visual tracking consisting of a Temporal Reliability Calibrator (TRC) that learns per-frame reliability scores to filter noisy historical predictions while anchoring to the ground-truth template, and a Temporal Guidance Synthesizer (TGS) that converts the calibrated history into compact dynamic temporal priors. The module is inserted at the feature level and evaluated by integration into OSTrack, ODTrack, and LoRAT backbones under standard supervised training on tracking datasets, yielding consistent gains and new state-of-the-art results (77.5% Success on LaSOT, 80.3% AO on GOT-10k with extended LoRATv2).
Significance. If the empirical improvements hold under rigorous validation, the work supplies a generalizable, drift-resilient mechanism for exploiting temporal information in multi-frame trackers. The consistent gains across three architecturally distinct baselines and the reported SOTA numbers on standard benchmarks indicate practical utility for the tracking community.
major comments (2)
- [Experiments] The claim of 'consistent, significant performance gains' across OSTrack, ODTrack, and LoRAT is only partially supported: the manuscript provides no error bars, number of runs, or statistical significance tests, so it is impossible to determine whether the reported deltas exceed run-to-run variance.
- [§4.2, TRC description] The assertion that the learned reliability scores 'genuinely separate signal from noise' rests on the weakest assumption in the paper; the current ablations do not isolate whether the scores supply predictive information beyond what the baseline already extracts from the same history.
minor comments (2)
- [Figure 2] The integration diagram (Figure 2) would be clearer if it explicitly marked the feature-level insertion point of DTPTrack relative to the backbone's temporal aggregation layers.
- [§3] Notation: the symbols for the reliability score r_t and the synthesized prior P_t are introduced without a compact table of definitions; a short notation table would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation for minor revision. We address each major comment point-by-point below, indicating the changes we will incorporate into the revised manuscript.
Point-by-point responses
- Referee [Experiments]: The claim of 'consistent, significant performance gains' across OSTrack, ODTrack, and LoRAT is only partially supported: the manuscript provides no error bars, number of runs, or statistical significance tests, so it is impossible to determine whether the reported deltas exceed run-to-run variance.
Authors: We agree that the absence of error bars and statistical analysis weakens the 'consistent, significant' claim. In the revised manuscript we will rerun the three backbone integrations with three different random seeds, report mean and standard deviation for Success, AO, and Precision on LaSOT and GOT-10k, and add a brief statistical comparison (paired t-test) between baseline and DTPTrack-augmented results. The updated Experiments section and tables will reflect these additions. Revision: yes.
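As a sketch of the promised analysis (the per-seed numbers below are made up purely for illustration, not results from the paper), the comparison could look like:

```python
# Seed-averaged comparison with a paired t-test across matched seeds.
import numpy as np
from scipy.stats import ttest_rel

baseline = np.array([75.1, 74.8, 75.0])   # hypothetical LaSOT Success, 3 seeds
dtptrack = np.array([77.4, 77.6, 77.5])   # hypothetical DTPTrack runs, same seeds

print(f"baseline {baseline.mean():.2f} ± {baseline.std(ddof=1):.2f}")
print(f"DTPTrack {dtptrack.mean():.2f} ± {dtptrack.std(ddof=1):.2f}")
t, p = ttest_rel(dtptrack, baseline)       # pairs runs by shared seed
print(f"paired t-test: t={t:.2f}, p={p:.4f}")
```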
- Referee [§4.2, TRC description]: The assertion that the learned reliability scores 'genuinely separate signal from noise' rests on the weakest assumption in the paper; the current ablations do not isolate whether the scores supply predictive information beyond what the baseline already extracts from the same history.
Authors: We thank the referee for identifying this gap. The existing Table 3 ablations show gains from calibrated versus raw history but do not fully isolate the contribution of the learned scores. In the revision we will add a controlled ablation that replaces the learned reliability scores with (i) uniform scores and (ii) random scores drawn from the same distribution, while keeping the rest of the pipeline identical. We will also include qualitative visualizations of per-frame reliability scores on representative sequences to illustrate their correlation with tracking quality. These additions will appear in §4.2 and the supplementary material. Revision: yes.
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper presents DTPTrack as an architectural module (TRC for per-frame reliability scoring and TGS for synthesizing dynamic priors) inserted into existing trackers like OSTrack, ODTrack, and LoRAT. All claims rest on standard supervised training on public tracking datasets followed by empirical comparisons and ablations on benchmarks such as LaSOT and GOT-10k. No equations, derivations, or self-citations appear that reduce any prediction or performance gain to a quantity defined by the same inputs or fitted parameters. The reported improvements are externally falsifiable via public benchmark scores and remain independent of the module definitions themselves.
Reference graph
Works this paper leans on
- [1] Yifan Bai, Zeyang Zhao, Yihong Gong, and Xing Wei. ARTrackV2: Prompting autoregressive tracker where to look and how to describe. In CVPR, 2024.
- [2] Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Learning discriminative model prediction for tracking. In ICCV, 2019.
- [3] Wenrui Cai, Qingjie Liu, and Yunhong Wang. HIPTrack: Visual tracking with historical prompts. In CVPR, 2024.
- [4] Wenrui Cai, Qingjie Liu, and Yunhong Wang. SPMTrack: Spatio-temporal parameter-efficient fine-tuning with mixture of experts for scalable visual tracking. In CVPR, 2025.
- [5] Yidong Cai, Jie Liu, Jie Tang, and Gangshan Wu. Robust object modeling for visual tracking. In ICCV, 2023.
- [6] Boyu Chen, Peixia Li, Lei Bai, Lei Qiao, Qiuhong Shen, Bo Li, Weihao Gan, Wei Wu, and Wanli Ouyang. Backbone is all your need: A simplified architecture for visual object tracking. In ECCV, 2022.
- [7] Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, and Huchuan Lu. Transformer tracking. In CVPR, 2021.
- [8] Xin Chen, Bin Yan, Jiawen Zhu, Huchuan Lu, Xiang Ruan, and Dong Wang. High-performance transformer tracking. IEEE TPAMI, 45(7):8507–8523, 2022.
- [9] Xin Chen, Houwen Peng, Dong Wang, Huchuan Lu, and Han Hu. SeqTrack: Sequence to sequence learning for visual object tracking. In CVPR, 2023.
- [10] Yutao Cui, Cheng Jiang, Limin Wang, and Gangshan Wu. MixFormer: End-to-end tracking with iterative mixed attention. In CVPR, 2022.
- [11] Yutao Cui, Cheng Jiang, Gangshan Wu, and Limin Wang. MixFormer: End-to-end tracking with iterative mixed attention. IEEE TPAMI, pages 4129–4146, 2024.
- [12] Martin Danelljan, Luc Van Gool, and Radu Timofte. Probabilistic regression for visual tracking. In CVPR, 2020.
- [13] Tri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning. In ICLR, 2024.
- [14] Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In NeurIPS, 2022.
- [15] Timothée Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Vision transformers need registers. In ICLR, 2024.
- [16] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
- [17] Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, and Haibin Ling. LaSOT: A high-quality benchmark for large-scale single object tracking. In CVPR, 2019.
- [18] Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, and Junsong Yuan. AiATrack: Attention in attention for transformer visual tracking. In ECCV, 2022.
- [19] Shenyuan Gao, Chunluan Zhou, and Jun Zhang. Generalized relation modeling for transformer tracking. In CVPR, 2023.
- [20] Mingzhe Guo, Weiping Tan, Wenyu Ran, Liping Jing, and Zhipeng Zhang. DreamTrack: Dreaming the future for multimodal visual object tracking. In CVPR, 2025.
- [21] Kaijie He, Canlong Zhang, Sheng Xie, Zhixin Li, and Zhiwen Wang. Target-aware tracking with long-term context attention. In AAAI, 2023.
- [22] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR, 2022.
- [23] Lianghua Huang, Xin Zhao, and Kaiqi Huang. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE TPAMI, 43(5):1562–1577, 2021.
- [24] Yuqing Huang, Xin Li, Zikun Zhou, Yaowei Wang, Zhenyu He, and Ming-Hsuan Yang. RTracker: Recoverable tracking via PN tree structured memory. In CVPR, 2024.
- [25] Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, and Dong Wang. Exploring enhanced contextual information for video-level object tracking. In AAAI, 2025.
- [26] Matej Kristan, Aleš Leonardis, Jiří Matas, Michael Felsberg, Roman Pflugfelder, Joni-Kristian Kämäräinen, Hyung Jin Chang, Martin Danelljan, Luka Čehovin Zajc, Alan Lukežič, et al. The tenth visual object tracking VOT2022 challenge results. In ECCV Workshops, 2022.
- [27] Matej Kristan, Jiří Matas, Pavel Tokmakov, Michael Felsberg, Luka Čehovin Zajc, Alan Lukežič, Khanh-Tung Tran, Xuan-Son Vu, Johanna Björklund, Hyung Jin Chang, et al. The second visual object tracking segmentation VOTS2024 challenge results. In ECCV Workshops, pages 357–383. Springer, 2024.
- [28] Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. In ICLR, 2016.
- [29] Xin Li, Yuqing Huang, Zhenyu He, Yaowei Wang, Huchuan Lu, and Ming-Hsuan Yang. CiteTracker: Correlating image and text for visual tracking. In ICCV, 2023.
- [30] Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, and Haibin Ling. SwinTrack: A simple and strong baseline for transformer tracking. In NeurIPS, 2022.
- [31] Liting Lin, Heng Fan, Zhipeng Zhang, Yaowei Wang, Yong Xu, and Haibin Ling. Tracking meets LoRA: Faster training, larger model, stronger performance. In ECCV, 2024.
- [32] Liting Lin, Heng Fan, Zhipeng Zhang, Yuqing Huang, Yaowei Wang, Yong Xu, and Haibin Ling. LoRATv2: Enabling low-cost temporal modeling in one-stream trackers. In NeurIPS, 2025.
- [33] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- [34] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019.
- [35] Matthias Mueller, Neil Smith, and Bernard Ghanem. A benchmark and simulator for UAV tracking. In ECCV, 2016.
- [36] Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, and Bernard Ghanem. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In ECCV, 2018.
- [37] Hyeonseob Nam and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, 2016.
- [38] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. TMLR, 2024.
- [39] Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, and Libo Zhang. VastTrack: Vast category visual object tracking. In NeurIPS, 2024.
- [40] Markus N Rabe and Charles Staats. Self-attention does not need O(n^2) memory. arXiv preprint arXiv:2112.05682, 2021.
- [41] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In CVPR, 2019.
- [42] Hugo Touvron, Matthieu Cord, and Hervé Jégou. DeiT III: Revenge of the ViT. In ECCV. Springer, 2022.
- [43] Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, and Feng Wu. Towards more flexible and accurate object tracking with natural language: Algorithms and benchmark. In CVPR, 2021.
- [44] Xing Wei, Yifan Bai, Yongchao Zheng, Dahu Shi, and Yihong Gong. Autoregressive visual tracking. In CVPR, 2023.
- [45] Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, and Antoni B Chan. DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks. In CVPR, 2023.
- [46] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object tracking benchmark. IEEE TPAMI, 37(9):1834–1848, 2015.
- [47] Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, and Dacheng Tao. MotionTrack: Learning motion predictor for multiple object tracking. Neural Networks, 179:106539, 2024.
- [48] Fei Xie, Lei Chu, Jiahao Li, Yan Lu, and Chao Ma. VideoTrack: Learning to track objects via video transformer. In CVPR, 2023.
- [49] Fei Xie, Zhongdao Wang, and Chao Ma. DiffusionTrack: Point set diffusion model for visual object tracking. In CVPR, 2024.
- [50] Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, and Rongrong Ji. Autoregressive queries for adaptive tracking with spatio-temporal transformers. In CVPR, 2024.
- [51] Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, and Huchuan Lu. Learning spatio-temporal transformer for visual tracking. In ICCV, 2021.
- [52] Botao Ye, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. Joint feature learning and relation modeling for tracking: A one-stream framework. In ECCV, 2022.
- [53] Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, and Xianxian Li. ODTrack: Online dense temporal token learning for visual tracking. In AAAI, 2024.
- [54] Jiawen Zhu, Huayi Tang, Xin Chen, Xinying Wang, Dong Wang, and Huchuan Lu. Two-stream beats one-stream: Asymmetric siamese network for efficient visual tracking. In AAAI, 2025.