Not All Starting Points Are Equal: Pre-trained Priors and Their Outsized Impact on Person Identification
Pith reviewed 2026-05-22 12:51 UTC · model grok-4.3
The pith
Large pre-trained foundation models reach state-of-the-art person re-identification performance through simple fine-tuning that leaves solutions close to their initial weights.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under equated domain adaptation pipelines, pre-trained weights function as a strong prior; large foundation models therefore yield state-of-the-art re-identification accuracy on Market, PRCC, DeepChange, and BTS while the final weights stay close in parameter space to the starting values.
What carries the argument
Pre-trained weights acting as the prior in a maximum-probability point estimate of the Gibbs posterior under fixed domain-adaptation steps.
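One way to write this framing down is sketched below. The notation is introduced here for illustration and is not taken from the paper: $\theta_0$ stands for the pre-trained weights, $\hat{L}_D$ for the empirical adaptation loss on the transfer data $D$, $\lambda$ for an inverse temperature, and $\sigma$ for the width of an assumed Gaussian prior. The Gaussian form in the last step is an assumption chosen only to make the role of the prior concrete.

```latex
% Gibbs posterior over weights \theta given adaptation data D and prior \pi_0
\pi(\theta \mid D) \;\propto\; \exp\!\bigl(-\lambda\, \hat{L}_D(\theta)\bigr)\, \pi_0(\theta)

% Maximum-probability point estimate of that posterior
\hat{\theta} \;=\; \arg\max_{\theta}\, \pi(\theta \mid D)
            \;=\; \arg\min_{\theta}\, \Bigl[\lambda\, \hat{L}_D(\theta) - \log \pi_0(\theta)\Bigr]

% Assumed example: isotropic Gaussian prior centered on the pre-trained weights
\pi_0(\theta) = \mathcal{N}(\theta_0, \sigma^2 I)
\;\Longrightarrow\;
\hat{\theta} \;=\; \arg\min_{\theta}\, \Bigl[\lambda\, \hat{L}_D(\theta)
            + \tfrac{1}{2\sigma^2}\, \lVert \theta - \theta_0 \rVert_2^2\Bigr]
```

Under the assumed Gaussian form, a narrow prior (small $\sigma$) keeps $\hat{\theta}$ near $\theta_0$, which matches the reported closeness of adapted weights to their starting values. The prior in the paper may be implicit in the optimization rather than an explicit penalty term.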
If this is right
- Large foundation models with direct fine-tuning set new performance levels on the listed re-id datasets.
- High-performing solutions lie close in parameter space to the original pre-trained weights.
- Comparable accuracy is reachable with small transfer sets and with different transfer datasets.
- Results are sensitive to optimizer, weight-decay value, and loss function.
- Direct fine-tuning of large vision foundation models should become a standard baseline in future re-id studies.
Where Pith is reading between the lines
- The same prior-strength argument may apply to other transfer-learning settings where adaptation data are limited.
- Measuring Euclidean or cosine distance in weight space could serve as a cheap diagnostic for how much a given pre-training run helps a downstream task.
- Future work could test whether deliberately moving the starting weights farther from the pre-trained point reduces final accuracy under the same adaptation budget.
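The weight-space diagnostic suggested in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's measurement code: the function name `weight_distance`, the dictionary layout (parameter name mapped to a flat list of floats), and the toy weights are all assumptions made here.

```python
import math


def weight_distance(initial, final):
    """Return (L2 distance, relative L2 distance, cosine similarity).

    `initial` and `final` map parameter names to flat lists of floats and
    are treated as one long concatenated vector each.
    """
    sq_diff = sq_init = sq_final = dot = 0.0
    for name, w0 in initial.items():
        w1 = final[name]
        if len(w0) != len(w1):
            raise ValueError(f"size mismatch for {name}")
        for a, b in zip(w0, w1):
            sq_diff += (b - a) ** 2
            sq_init += a * a
            sq_final += b * b
            dot += a * b
    l2 = math.sqrt(sq_diff)
    # Relative distance normalizes by the norm of the starting weights.
    relative = l2 / math.sqrt(sq_init)
    cosine = dot / (math.sqrt(sq_init) * math.sqrt(sq_final))
    return l2, relative, cosine


# Toy weights, made up for illustration (not values from the paper).
pretrained = {"block.weight": [3.0, 4.0], "block.bias": [0.0]}
finetuned = {"block.weight": [3.0, 4.0], "block.bias": [1.0]}
l2, relative, cosine = weight_distance(pretrained, finetuned)
print(l2, relative, cosine)
```

The relative distance is reported alongside the raw L2 distance because models of different sizes have very different weight norms, so raw distances are hard to compare across starting points. The sketch assumes neither weight vector is all zeros.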
Load-bearing premise
The domain adaptation pipelines are kept identical across every starting model so that performance gaps can be attributed directly to differences in the pre-trained weights.
What would settle it
Run the identical adaptation pipeline on several foundation models, then repeat with pipelines tuned per model, and check whether the ranking of final accuracies stays stable or collapses.
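Ranking stability of this kind can be quantified with a rank correlation. The sketch below computes Spearman's coefficient in plain Python. The helper names and the accuracy numbers are placeholders invented for illustration; they are not results from the paper.

```python
def ranks(values):
    """Rank values from 1 (largest) downward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder accuracies for four hypothetical starting models.
shared_pipeline = [0.90, 0.80, 0.70, 0.60]
tuned_pipeline = [0.85, 0.88, 0.72, 0.65]
rho = spearman(shared_pipeline, tuned_pipeline)
print(rho)
```

A coefficient near 1 would indicate that the model ranking survives per-model tuning, while a value near 0 would suggest the ranking depends on the recipe. The function assumes at least two distinct values in each list.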
Original abstract
Recent years have seen an explosion of diverse general purpose pre-training methodologies for computer vision. However, the impact that these pre-training methodologies have on person identification tasks (re-id) remains under-explored. We show that under equated domain adaptation pipelines, there is dramatic variance in person identification outcomes using different starting models (architectures and pre-trained weights). We show that a range of intuitive explanations for differing downstream performance on a range of re-id tests are insufficient and propose that pre-trained weights serve as a strong prior to the weights learned during domain adaptation. This framework allows for domain adapted solutions to be viewed as a maximum probability point estimate of the Gibbs posterior with the pre-trained weights acting as a prior. Under this framework, we show that large, pre-trained foundation models with simple domain adaptation achieve SOTA solutions on a range of re-id datasets (Market, PRCC, DeepChange, BTS) with solutions that are very close in the parameter space to the starting parameters. Moreover, we perform ablations on these solutions and show that they can be reached with small transfer sets and with varying transfer datasets but are sensitive to choice of optimizer, weight-decay, and loss function. Ultimately, we propose that the simple approach of direct fine-tuning using large vision foundation models (CLIP, Dino, EVA, AIM, etc.) needs to serve as an important baseline for future work in re-id.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical study on the impact of different pre-trained vision models on person re-identification (re-id) tasks. It argues that under equated domain adaptation pipelines, there is dramatic variance in performance across starting models (e.g., CLIP, DINO, EVA, AIM). Intuitive explanations for these differences are deemed insufficient, and instead, pre-trained weights are proposed to act as strong priors. This is framed using the Gibbs posterior, where domain-adapted solutions are maximum probability point estimates. The paper reports that large foundation models achieve SOTA performance on re-id datasets such as Market, PRCC, DeepChange, and BTS, with adapted parameters remaining close to the initial ones. Ablations indicate that these solutions can be reached with small transfer sets and varying datasets but are sensitive to optimizer, weight-decay, and loss function choices.
Significance. Should the results be confirmed, this paper makes a valuable contribution by highlighting the outsized influence of pre-trained priors in re-id and recommending that simple fine-tuning of large models serve as a strong baseline for future work. The Gibbs posterior framing provides an interesting interpretive tool, and the empirical demonstrations on multiple datasets with ablations add to the evidence base. This could encourage the community to focus more on initialization effects rather than solely on novel adaptation techniques.
major comments (1)
- [Abstract and Experimental Setup] The equivalence of the domain adaptation pipelines across different starting models is load-bearing for the central claim that performance differences are due to the pre-trained priors. The abstract states that results hold 'under equated domain adaptation pipelines' and reports sensitivity to optimizer, weight-decay, and loss function. However, it is not clear whether other key hyperparameters (learning rate schedules, epoch counts, augmentation strength) were held strictly fixed for all initializations or re-optimized per model. If a single fixed recipe was applied without per-model tuning, superior performance for certain models (e.g., CLIP vs. EVA) may reflect better alignment with that recipe rather than prior strength alone. Explicit confirmation and a table listing the shared hyperparameter values used for every starting model are required to support the attribution.
minor comments (2)
- The abstract refers to 'a range of intuitive explanations' being insufficient; listing the specific explanations considered (and why they fail) in the introduction or related work section would improve transparency.
- [Ablations] The statement that solutions are 'very close in the parameter space to the starting parameters' would be strengthened by reporting a quantitative metric such as mean L2 distance or cosine similarity between initial and final weights, ideally in a results table.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the experimental details supporting our central claims. We address the major comment point by point below.
-
Referee: [Abstract and Experimental Setup] The equivalence of the domain adaptation pipelines across different starting models is load-bearing for the central claim that performance differences are due to the pre-trained priors. The abstract states that results hold 'under equated domain adaptation pipelines' and reports sensitivity to optimizer, weight-decay, and loss function. However, it is not clear whether other key hyperparameters (learning rate schedules, epoch counts, augmentation strength) were held strictly fixed for all initializations or re-optimized per model. If a single fixed recipe was applied without per-model tuning, superior performance for certain models (e.g., CLIP vs. EVA) may reflect better alignment with that recipe rather than prior strength alone. Explicit confirmation and a table listing the shared hyperparameter values used for every starting model are required to support the attribution.
Authors: We confirm that a single fixed hyperparameter recipe was used uniformly across all starting models (CLIP, DINO, EVA, AIM, etc.) with no per-model re-optimization of learning rate schedules, epoch counts, or augmentation strength. This fixed recipe was applied to isolate the effect of the pre-trained priors as the source of performance variance. The sensitivities to optimizer, weight-decay, and loss function noted in the abstract were explored in dedicated ablation studies (where those elements were varied while holding the rest of the pipeline fixed). To make the equivalence explicit, we will add a table in the revised manuscript listing all shared hyperparameter values applied to every initialization.
Revision: yes
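The design described in this response, one shared recipe for every starting model plus ablations that vary a single element, can be sketched as a frozen configuration object. Everything concrete below is a placeholder: the `Recipe` class, its field names, and all hyperparameter values are assumptions for illustration and are not the paper's settings.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Recipe:
    # All values are placeholders, not the paper's hyperparameters.
    optimizer: str = "adamw"
    learning_rate: float = 1e-5
    weight_decay: float = 0.01
    epochs: int = 10
    loss: str = "triplet"
    augmentation: str = "default"


SHARED = Recipe()
STARTING_MODELS = ["CLIP", "DINO", "EVA", "AIM"]

# Main comparison: every starting model receives the identical recipe.
main_runs = {model: SHARED for model in STARTING_MODELS}


def ablation(base, **changes):
    """Vary only the named fields and hold the rest of the recipe fixed."""
    return replace(base, **changes)


def changed_fields(base, variant):
    """List the field names on which two recipes differ."""
    a, b = asdict(base), asdict(variant)
    return [key for key in a if a[key] != b[key]]


wd_ablation = ablation(SHARED, weight_decay=0.0)
print(changed_fields(SHARED, wd_ablation))
```

Freezing the dataclass makes accidental per-model edits raise an error, and `changed_fields` gives a quick check that an ablation touches only the element it claims to vary.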
Circularity Check
No significant circularity in derivation chain
The paper's central claims rest on direct empirical comparisons of performance variance and parameter-space proximity across different pre-trained initializations (CLIP, DINO, EVA, etc.) under a single fixed domain-adaptation recipe on multiple re-id benchmarks. These outcomes are measured quantities, not quantities derived from the Gibbs-posterior framing. The posterior view is explicitly offered as an interpretive lens for the observed closeness of adapted solutions to starting weights rather than a mathematical step that presupposes or constructs those measurements. No equation or claim reduces the reported SOTA results, ablation findings, or sensitivity analyses to a fitted parameter renamed as a prediction or to a self-referential definition. The derivation chain is therefore self-contained against the external experimental benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Domain adaptation pipelines can be equated across different pre-trained starting models for fair comparison.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
unclear: Relation between the paper passage and the cited Recognition theorem.
large, pre-trained foundation models with simple domain adaptation achieve SOTA solutions on a range of re-id datasets ... with solutions that are very close in the parameter space to the starting parameters
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
pre-trained weights serve as a strong prior to the weights learned during domain adaptation
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.