Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 19:09 UTC · model grok-4.3
The pith
A pretrain-then-adapt approach uses bidirectional retrieval disagreements to estimate uncertainty and recalibrate text-based person search models on unlabeled test data alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a bidirectional retrieval disagreement mechanism supplies a usable proxy for uncertainty in cross-modal person search. Pairs that rank highly in both retrieval directions receive low uncertainty and are treated as reliable alignments; this uncertainty indicator then drives label-free model recalibration in an offline test-time step. Applied after pretraining, the procedure mitigates domain shift without any target labels and yields consistent gains across CLIP-based and XVLM-based frameworks on CUHK-PEDES, ICFG-PEDES, RSTPReid, and PAB.
What carries the argument
The Uncertainty-Aware Test-Time Adaptation (UATTA) framework, whose core component is the bidirectional retrieval disagreement score that labels image-text pairs as low- or high-uncertainty to guide label-free recalibration.
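The disagreement score can be sketched as a mutual top-k test over a precomputed similarity matrix. This is a minimal illustration, not the paper's exact formulation: the function name `mutual_rank_uncertainty`, the threshold `k`, and the hard boolean mask are assumptions for the sketch.

```python
import numpy as np

def mutual_rank_uncertainty(sim, k=10):
    """Bidirectional retrieval agreement sketch.

    sim: (n_images, n_texts) similarity matrix from a pretrained
    cross-modal encoder. A pair (i, j) gets LOW uncertainty only if
    text j is in image i's top-k (image-to-text) AND image i is in
    text j's top-k (text-to-image); all other pairs are treated as
    high-uncertainty. Names and the hard cutoff are illustrative.
    """
    # double argsort turns scores into ranks (0 = best match)
    i2t_rank = np.argsort(np.argsort(-sim, axis=1), axis=1)  # per image, over texts
    t2i_rank = np.argsort(np.argsort(-sim, axis=0), axis=0)  # per text, over images
    low_uncertainty = (i2t_rank < k) & (t2i_rank < k)
    return low_uncertainty  # boolean mask of pairs deemed reliable
```

On a well-aligned batch the mask concentrates on the matching diagonal; under domain shift, off-diagonal entries surviving the mutual test are exactly the failure mode the referee report below flags.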
If this is right
- The method removes dependence on large-scale target-domain labels for practical deployment.
- Adaptation incurs only minimal post-training time cost while still handling domain shift.
- Performance gains hold for both single-stage CLIP-style and two-stage XVLM-style retrieval pipelines.
- The same offline procedure establishes a new baseline for label-efficient person search systems.
Where Pith is reading between the lines
- The same disagreement-based uncertainty signal could serve as a lightweight proxy in other cross-modal retrieval settings where labeled target data is scarce.
- If the recalibration step proves robust, future systems might rely less on expensive synthetic pretraining corpora.
- Applying the method to streaming test data rather than a fixed offline batch would test whether continuous domain drift can be tracked without labels.
Load-bearing premise
That disagreement between the two retrieval directions reliably indicates uncertainty, and that recalibrating the model guided by this uncertainty signal improves alignment without introducing new errors.
What would settle it
The claim would be falsified if, on any of the four benchmarks, the model after UATTA recalibration showed lower top-1 accuracy or mAP than the same pretrained model left unchanged.
Original abstract
Text-based person search faces inherent limitations due to data scarcity, driven by stringent privacy constraints and the high cost of manual annotation. To mitigate this, existing methods usually rely on a Pretrain-then-Finetune paradigm, where models are first pretrained on synthetic person-caption data to establish cross-modal alignment, followed by fine-tuning on labeled real-world datasets. However, this paradigm lacks practicality in real-world deployment scenarios, where large-scale annotated target-domain data is typically inaccessible. In this work, we propose a new Pretrain-then-Adapt paradigm that eliminates reliance on extensive target-domain supervision through an offline test-time adaptation manner, enabling dynamic model adaptation using only unlabeled test data with minimal post-train time cost. To mitigate overconfidence with false positives of previous entropy-based test-time adaptation, we propose an Uncertainty-Aware Test-Time Adaptation (UATTA) framework, which introduces a bidirectional retrieval disagreement mechanism to estimate uncertainty, i.e., low uncertainty is assigned when an image-text pair ranks highly in both image-to-text and text-to-image retrieval, indicating high alignment; otherwise, high uncertainty is detected. This indicator drives offline test-time model recalibration without labels, effectively mitigating domain shift. We validate UATTA on four benchmarks, i.e., CUHK-PEDES, ICFG-PEDES, RSTPReid, and PAB, showing consistent improvements across both CLIP-based (one-stage) and XVLM-based (two-stage) frameworks. Ablation studies confirm that UATTA outperforms existing offline test-time adaptation strategies, establishing a new benchmark for label-efficient, deployable person search systems. Our code is available at https://github.com/nkuzjh/UATTA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Pretrain-then-Adapt paradigm for text-based person search that replaces labeled fine-tuning with offline test-time adaptation on unlabeled target data. It introduces an Uncertainty-Aware Test-Time Adaptation (UATTA) framework whose core component is a bidirectional retrieval disagreement mechanism: an image-text pair is assigned low uncertainty (and used for recalibration) when it ranks highly in both image-to-text and text-to-image retrieval. The method is evaluated on CUHK-PEDES, ICFG-PEDES, RSTPReid and PAB, reporting consistent gains for both one-stage (CLIP) and two-stage (XVLM) backbones, with ablations against prior offline TTA baselines and public code release.
Significance. If the central claim holds, the work addresses a practically important gap: privacy-constrained deployment of cross-modal retrieval where target labels are unavailable. The offline, low-cost adaptation and evaluation across two model families and four benchmarks are positive features; public code further supports reproducibility.
major comments (1)
- The bidirectional high-rank agreement mechanism (described in the UATTA framework) implicitly assumes that mutual top ranking is diagnostic of correct cross-modal alignment. Under domain shift this can fail if the pretrained model shares the same systematic error in both retrieval directions, causing incorrect pairs to receive low uncertainty scores and be used for recalibration. The manuscript provides no error analysis, confusion-matrix breakdown, or verification that the uncertainty signal improves alignment rather than reinforcing noise; this assumption is load-bearing for the claim that UATTA safely mitigates shift without labels or new errors.
minor comments (1)
- Abstract states 'consistent improvements' and 'outperforms existing strategies' but supplies no numerical deltas, tables, or metrics; adding at least headline numbers would improve readability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The major comment raises an important point about the core assumption in our bidirectional retrieval disagreement mechanism. We address it point-by-point below and commit to strengthening the manuscript with additional analysis.
Point-by-point responses
- Referee: The bidirectional high-rank agreement mechanism (described in the UATTA framework) implicitly assumes that mutual top ranking is diagnostic of correct cross-modal alignment. Under domain shift this can fail if the pretrained model shares the same systematic error in both retrieval directions, causing incorrect pairs to receive low uncertainty scores and be used for recalibration. The manuscript provides no error analysis, confusion-matrix breakdown, or verification that the uncertainty signal improves alignment rather than reinforcing noise; this assumption is load-bearing for the claim that UATTA safely mitigates shift without labels or new errors.
- Authors: We agree that the manuscript currently lacks a direct error analysis or verification of the selected low-uncertainty pairs. While the empirical gains on four benchmarks and two backbones (CLIP and XVLM) suggest the mechanism is effective in practice, we acknowledge this does not fully rule out reinforcement of systematic errors. In the revision we will add: (1) post-hoc precision of the top-ranked image-text pairs used for recalibration (computed against ground-truth labels available in the test sets), (2) a breakdown showing how uncertainty scores correlate with alignment quality, and (3) an ablation comparing adaptation performance when using only high-agreement pairs versus random or entropy-based selection. These additions will directly address whether the signal improves alignment or risks reinforcing noise.
revision: yes
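The post-hoc precision check the authors commit to in point (1) amounts to a few lines once ground-truth matches are available for analysis. `selection_precision` and its boolean-matrix inputs are illustrative names, not the paper's implementation.

```python
import numpy as np

def selection_precision(low_uncertainty_mask, gt_pairs):
    """Post-hoc precision of pairs selected for recalibration.

    low_uncertainty_mask: boolean (n_images, n_texts) mask of pairs
    the adaptation step would treat as reliable.
    gt_pairs: boolean matrix of true image-text matches; test labels
    used only for this analysis, never during adaptation itself.
    """
    selected = int(low_uncertainty_mask.sum())
    if selected == 0:
        return 0.0
    correct = int((low_uncertainty_mask & gt_pairs).sum())
    return correct / selected
```

A precision well above the base match rate would indicate the agreement signal selects genuinely aligned pairs rather than shared systematic errors.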
Circularity Check
No significant circularity detected
Full rationale
The derivation chain introduces a bidirectional retrieval disagreement heuristic to estimate uncertainty for offline test-time recalibration on unlabeled data. This proxy is computed from the model's current rankings and used to drive adaptation, with performance then measured against external ground-truth benchmarks (CUHK-PEDES, ICFG-PEDES, RSTPReid, PAB). No step reduces a claimed prediction or result to its own inputs by definition, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests solely on self-citation. The method is a proposed heuristic validated empirically rather than a self-referential construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Pretrained cross-modal models can be recalibrated at test time using uncertainty estimates derived from retrieval rankings, without target-domain labels.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "bidirectional retrieval disagreement mechanism to estimate uncertainty, i.e., low uncertainty is assigned when an image-text pair ranks highly in both image-to-text and text-to-image retrieval"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "uncertainty-weighted gradient re-calibration: L_UATTA = Σ … −p log(p) / D"
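The quoted loss is only partially recovered from the page. As a rough illustration of its shape, an uncertainty-weighted entropy objective can be sketched as follows; the temperature `tau`, the mean reduction, and the weighting scheme are assumptions, not the paper's exact L_UATTA.

```python
import numpy as np

def weighted_entropy_loss(sim, weights, tau=1.0):
    """Uncertainty-weighted entropy minimization sketch.

    sim: (n, n) similarity logits for a batch of image-text pairs.
    weights: per-pair reliability weights derived from the
    bidirectional agreement signal (1.0 = low uncertainty).
    Each row is softmaxed into an image-to-text retrieval
    distribution; confident (low-entropy) rows contribute little.
    """
    z = sim / tau
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ent = -(p * np.log(p + 1e-8)).sum(axis=1)     # row-wise entropy
    return float((weights * ent).mean())
```

Minimizing this quantity only where weights are high is the usual way entropy-based test-time adaptation is restricted to pairs the uncertainty signal deems reliable.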
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.