An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval
Pith reviewed 2026-05-22 22:52 UTC · model grok-4.3
The pith
A fully synthetic data pipeline can serve as a standalone replacement or augmentation to real data for training text-based person retrieval models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a unified synthesis pipeline operating entirely without real person data produces training examples whose practical utility is demonstrated through experiments: an inter-class module creates diverse identity-centric images via automatic prompt construction, an intra-class module increases identity variation via text-driven image editing, and automatic text generation supplies the paired descriptions, allowing the resulting data to function either as a complete replacement for real data or as a complementary augmentation.
What carries the argument
The unified data synthesis pipeline that combines inter-class image generation via automatic prompt construction with intra-class augmentation via text-driven image editing, plus automatic textual description generation.
If this is right
- Models trained solely on the synthetic data achieve competitive retrieval accuracy on real test sets.
- Mixing the synthetic data with real data produces higher performance than real data alone in the tested scenarios.
- The method removes the requirement to collect real person images or obtain manual textual annotations.
- The pipeline supports systematic testing of synthetic data effectiveness across a range of real-world retrieval conditions.
- Automatic generation enables production of arbitrarily large training sets without additional human effort.
Where Pith is reading between the lines
- The same generation approach could be adapted to reduce data collection needs in other vision-language retrieval tasks.
- If the synthetic data preserves or increases diversity, it might help address dataset biases that appear in real collections.
- Scaling the pipeline with different image generators could further improve the quality gap to real data.
- Widespread adoption would lower the infrastructure cost of deploying text-based person retrieval systems.
Load-bearing premise
The generated synthetic images and descriptions are realistic and diverse enough that models trained on them achieve performance on real test data that reflects actual usefulness.
What would settle it
If models trained only on the synthetic data achieve substantially lower accuracy than real-data baselines on standard real-world benchmarks such as CUHK-PEDES, the claim of practical utility as a replacement would be falsified.
Figures
read the original abstract
Data plays a pivotal role in Text-Based Person Retrieval (TBPR) research. Mainstream research paradigm necessitates real-world person images with manual textual annotations for training models, posing privacy concerns and annotation burdens. Several pioneering efforts explore synthetic data generation, and yet still depend on real data as a foundation, inheriting the same limitations. The feasibility of purely synthetic TBPR data remains unexplored, and there is currently no systematic study on the effectiveness boundaries of synthetic data across various real-world scenarios. In this work, we present the first comprehensive empirical study of synthetic data for TBPR, with two key aspects. (1) We propose a unified data synthesis pipeline that can operate entirely without real person data. It combines an inter-class image generation module that produces diverse identity-centric images by means of an automatic prompt construction strategy, and an intra-class augmentation module that enhances identity variation through text-driven image editing. (2) Leveraging this pipeline and an automatic textual description generation, we explore the effectiveness of synthetic data in diverse scenarios through extensive experiments, to reveal its practical utility as either a standalone replacement or a complementary augmentation to real data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first comprehensive empirical study of synthetic data for Text-Based Person Retrieval (TBPR). It proposes a unified synthesis pipeline that generates both images and descriptions entirely without real person data, via an inter-class module using automatic prompt construction for diverse identities and an intra-class module using text-driven editing for variation. Automatic textual description generation is also included. Extensive experiments across scenarios are used to assess whether synthetic data can serve as a standalone replacement or complementary augmentation to real data.
Significance. If the results hold, the work could meaningfully reduce privacy and annotation burdens in TBPR by establishing empirical boundaries for synthetic-data utility. Credit is given for conducting the first systematic study of a fully synthetic pipeline and for running experiments across multiple scenarios rather than a single setting.
major comments (2)
- [§3 (unified data synthesis pipeline)] The central claim that the pipeline produces data 'sufficiently realistic and diverse' for practical utility rests solely on downstream retrieval metrics on real test sets. No independent validation of the synthetic distribution (FID, attribute-level fidelity, or human realism ratings) is described for the inter-class generation or intra-class augmentation modules.
- [§4 (experiments)] Because the generative models are pre-trained on real distributions, end-task performance alone cannot isolate whether success stems from the proposed automatic prompt and editing strategy or from the base generators. An ablation or control (e.g., random prompts or non-person-specific editing) is needed to support the 'practical utility' conclusion.
minor comments (2)
- [Abstract] The abstract states that experiments explore 'diverse scenarios' but does not enumerate them; a brief list would improve clarity.
- [§3.1] Notation for the automatic prompt construction strategy and text-driven editing operations should be introduced with explicit symbols or pseudocode for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below, agreeing where the manuscript requires strengthening and outlining the planned changes.
read point-by-point responses
-
Referee: [§3 (unified data synthesis pipeline)] The central claim that the pipeline produces data 'sufficiently realistic and diverse' for practical utility rests solely on downstream retrieval metrics on real test sets. No independent validation of the synthetic distribution (FID, attribute-level fidelity, or human realism ratings) is described for the inter-class generation or intra-class augmentation modules.
Authors: We agree that the original manuscript relies exclusively on downstream TBPR metrics to support claims of sufficient realism and diversity. Independent metrics such as FID, attribute-level analysis, or human ratings are absent. In the revised version we will add FID comparisons between the synthetic images (both inter-class and intra-class) and real distributions, together with attribute-level fidelity checks where automatic attribute labels can be obtained from the generation process. revision: yes
-
Referee: [§4 (experiments)] Because the generative models are pre-trained on real distributions, end-task performance alone cannot isolate whether success stems from the proposed automatic prompt and editing strategy or from the base generators. An ablation or control (e.g., random prompts or non-person-specific editing) is needed to support the 'practical utility' conclusion.
Authors: The referee correctly identifies that the current experiments do not isolate the contribution of the automatic prompt construction and text-driven editing modules from the capabilities of the underlying pre-trained generators. We will add the requested control ablations in the revision: (i) inter-class generation with random prompts instead of our structured identity-centric prompts, and (ii) intra-class augmentation with non-person-specific editing instructions, reporting the resulting retrieval performance to demonstrate the incremental benefit of the proposed strategies. revision: yes
Circularity Check
No circularity: empirical study relies on experimental comparisons, not derivations or self-referential definitions
full rationale
The paper is a purely empirical study of synthetic data for TBPR. It proposes a data synthesis pipeline and evaluates it via experiments on real test sets, with no equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations. All claims rest on direct performance comparisons between synthetic and real data setups. No step reduces by construction to its own inputs, satisfying the criteria for score 0 with an empty steps list.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We propose an inter-class image generation pipeline... automatic prompt construction strategy... intra-class image augmentation pipeline... three types of edits... automatic text generation
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Study on the effectiveness of synthetic data... three scenarios (S1: No data, S2: Limited data, S3: Abundant data)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966, 2023. 1, 5, 4
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
Rasa: Relation and sensitivity aware representation learning for text-based person search
Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, and Min Zhang. Rasa: Relation and sensitivity aware representation learning for text-based person search. arXiv preprint arXiv:2305.13653, 2023. 2, 5, 6, 7
-
[3]
Looking be- yond appearances: Synthetic training data for deep cnns in re-identification
Igor Barros Barbosa, Marco Cristani, Barbara Caputo, Alek- sander Rognhaugen, and Theoharis Theoharis. Looking be- yond appearances: Synthetic training data for deep cnns in re-identification. Computer Vision and Image Understand- ing, 167:50–62, 2018. 2
work page 2018
-
[4]
In- structpix2pix: Learning to follow image editing instructions
Tim Brooks, Aleksander Holynski, and Alexei A Efros. In- structpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 18392–18402, 2023. 7
work page 2023
-
[5]
Masactrl: Tuning-free mu- tual self-attention control for consistent image synthesis and editing
Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xi- aohu Qie, and Yinqiang Zheng. Masactrl: Tuning-free mu- tual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 22560–22570, 2023. 5, 7, 1
work page 2023
-
[6]
An empirical study of clip for text-based person search
Min Cao, Yang Bai, Ziyin Zeng, Mang Ye, and Min Zhang. An empirical study of clip for text-based person search. In Proceedings of the AAAI Conference on Artificial Intelli- gence, pages 465–473, 2024. 1, 2, 5, 6
work page 2024
-
[7]
PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, et al. Pixart- α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. arXiv preprint arXiv:2310.00426, 2023. 7
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[8]
Improving text-based person search by spatial matching and adaptive threshold
Tianlang Chen, Chenliang Xu, and Jiebo Luo. Improving text-based person search by spatial matching and adaptive threshold. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1879–1887. IEEE, 2018. 2
work page 2018
-
[9]
Internvl: Scaling up vision foundation mod- els and aligning for generic visual-linguistic tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation mod- els and aligning for generic visual-linguistic tasks. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24185–24198, 2024. 5, 8, 2, 4
work page 2024
-
[10]
Noise map guidance: Inversion with spatial context for real image editing
Hansam Cho, Jonghyun Lee, Seoung Bum Kim, Tae-Hyun Oh, and Yonghyun Jeong. Noise map guidance: Inversion with spatial context for real image editing. arXiv preprint arXiv:2402.04625, 2024. 4, 1
-
[11]
Semantically self-aligned network for text-to- image part-aware person re-identification
Zefeng Ding, Changxing Ding, Zhiyin Shao, and Dacheng Tao. Semantically self-aligned network for text-to- image part-aware person re-identification. arXiv preprint arXiv:2107.12666, 2021. 2, 6
-
[12]
Using language to extend to unseen do- mains
Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E Gonzalez, Aditi Raghunathan, and Anna Rohrbach. Using language to extend to unseen do- mains. International Conference on Learning Representa- tions (ICLR), 2023. 1, 2
work page 2023
-
[13]
Scaling laws of synthetic images for model training
Lijie Fan, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, and Yonglong Tian. Scaling laws of synthetic images for model training... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7382–7392, 2024. 2, 7
work page 2024
-
[14]
Axm-net: Implicit cross-modal fea- ture alignment for person re-identification
Ammarah Farooq, Muhammad Awais, Josef Kittler, and Syed Safwan Khalid. Axm-net: Implicit cross-modal fea- ture alignment for person re-identification. InProceedings of the AAAI Conference on Artificial Intelligence, pages 4477– 4485, 2022. 2
work page 2022
-
[15]
Unsuper- vised pre-training for person re-identification
Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, and Dong Chen. Unsuper- vised pre-training for person re-identification. In Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14750–14759, 2021. 1, 3
work page 2021
-
[16]
Bilma: Bidirectional local-matching for text-based person re-identification
Takuro Fujii and Shuhei Tarashima. Bilma: Bidirectional local-matching for text-based person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2786–2790, 2023. 7
work page 2023
-
[17]
Semi-supervised text-based person search
Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, and Min Zhang. Semi-supervised text-based person search. arXiv preprint arXiv:2404.18106, 2024. 6
-
[18]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014. 2
work page 2014
-
[19]
Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, and Xiaojuan Qi. Is synthetic data from generative models ready for image recognition? arXiv preprint arXiv:2210.07574, 2022. 1, 2
-
[20]
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt im- age editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022. 4, 5, 7, 1
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[21]
Gans trained by a two time-scale update rule converge to a local nash equilib- rium
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium. Advances in Neural Information Processing Systems , 30, 2017. 7
work page 2017
-
[22]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021. 4
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[23]
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, et al. Minicpm: Unveiling the potential of small language models with scalable training strategies. arXiv preprint arXiv:2404.06395, 2024. 5, 4
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[24]
Tag2text: Guiding vision-language model via image tagging
Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, and Lei Zhang. Tag2text: Guiding vision-language model via image tagging. arXiv preprint arXiv:2303.05657, 2023. 1
-
[25]
Cross-modal implicit relation rea- soning and aligning for text-to-image person retrieval
Ding Jiang and Mang Ye. Cross-modal implicit relation rea- soning and aligning for text-to-image person retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 2787–2797, 2023. 1, 2, 5, 6, 7
work page 2023
-
[26]
Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification
In `es Hyeonsu Kim, JoungBin Lee, Soowon Son, Woo- jeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, et al. Pose- diversified augmentation with diffusion model for person re- identification. arXiv preprint arXiv:2406.16042, 2024. 2
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[27]
Person search with natural lan- guage description
Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, and Xiaogang Wang. Person search with natural lan- guage description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 1970– 1979, 2017. 1, 2, 6
work page 1970
-
[28]
Learning semantic- aligned feature representation for text-based person search
Shiping Li, Min Cao, and Min Zhang. Learning semantic- aligned feature representation for text-based person search. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages 2724–2728. IEEE, 2022. 2, 7
work page 2022
-
[29]
Adaptive uncertainty-based learning for text-based person retrieval
Shenshen Li, Chen He, Xing Xu, Fumin Shen, Yang Yang, and Heng Tao Shen. Adaptive uncertainty-based learning for text-based person retrieval. In Proceedings of the AAAI Con- ference on Artificial Intelligence, pages 3172–3180, 2024. 1, 6, 7, 5
work page 2024
-
[30]
Cross-modal adaptive dual association for text-to-image per- son retrieval
Dixuan Lin, Yixing Peng, Jingke Meng, and Wei-Shi Zheng. Cross-modal adaptive dual association for text-to-image per- son retrieval. IEEE Transactions on Multimedia, 2024. 7
work page 2024
-
[31]
Causality-inspired invariant representation learning for text-based person retrieval
Yu Liu, Guihe Qin, Haipeng Chen, Zhiyong Cheng, and Xun Yang. Causality-inspired invariant representation learning for text-based person retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence , pages 14052–14060,
-
[32]
Prodigy: An expeditiously adaptive parameter-free learner.arXiv preprint arXiv:2306.06101, 2023
Konstantin Mishchenko and Aaron Defazio. Prodigy: An expeditiously adaptive parameter-free learner.arXiv preprint arXiv:2306.06101, 2023. 4
-
[33]
Synthesizing efficient data with diffu- sion models for person re-identification pre-training
Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, and Xiangyang Xue. Synthesizing efficient data with diffu- sion models for person re-identification pre-training. arXiv preprint arXiv:2406.06045, 2024. 2
-
[34]
Noisy-correspondence learning for text-to-image person re-identification
Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, and Peng Hu. Noisy-correspondence learning for text-to-image person re-identification. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27197–27206, 2024. 6
work page 2024
-
[35]
Learn- ing transferable visual models from natural language super- vision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. In International Conference on Machine Learning , pages 8748–8763. PMLR, 2021. 2
work page 2021
-
[36]
Real-time flying object detection with yolov8.arXiv preprint arXiv:2305.09972, 2023
Dillon Reis, Jordan Kupec, Jacqueline Hong, and Ahmad Daoudi. Real-time flying object detection with yolov8.arXiv preprint arXiv:2305.09972, 2023. 5
-
[37]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 1, 4, 7
work page 2022
-
[38]
Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500– 22510, 2023. 4
work page 2023
-
[39]
Learning granularity-unified representations for text-to-image person re-identification
Zhiyin Shao, Xinyu Zhang, Meng Fang, Zhifeng Lin, Jian Wang, and Changxing Ding. Learning granularity-unified representations for text-to-image person re-identification. In Proceedings of the 30th Acm International Conference on Multimedia, pages 5566–5574, 2022. 2
work page 2022
-
[40]
Unified pre-training with pseudo texts for text-to-image person re-identification
Zhiyin Shao, Xinyu Zhang, Changxing Ding, Jian Wang, and Jingdong Wang. Unified pre-training with pseudo texts for text-to-image person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 11174–11184, 2023. 1, 2, 3, 6, 7
work page 2023
-
[41]
See finer, see more: Implicit modality alignment for text-based person retrieval
Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, and Xiao Wang. See finer, see more: Implicit modality alignment for text-based person retrieval. In European Conference on Computer Vision , pages 624–
-
[42]
Springer, 2022. 2, 7
work page 2022
-
[43]
Diverse person: Customize your own dataset for text-based person search
Zifan Song, Guosheng Hu, and Cairong Zhao. Diverse person: Customize your own dataset for text-based person search. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4943–4951, 2024. 1, 2, 3, 4
work page 2024
-
[44]
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan Sun, Yuxin Fang, Ledell Wu, Xinlong Wang, and Yue Cao. Eva-clip: Improved training techniques for clip at scale. arXiv preprint arXiv:2303.15389, 2023. 5
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[45]
Dissecting person re- identification from the viewpoint of viewpoint
Xiaoxiao Sun and Liang Zheng. Dissecting person re- identification from the viewpoint of viewpoint. In Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 608–617, 2019. 2, 6, 9
work page 2019
-
[46]
Harnessing the power of mllms for transferable text-to-image person reid
Wentan Tan, Changxing Ding, Jiayu Jiang, Fei Wang, Yib- ing Zhan, and Dapeng Tao. Harnessing the power of mllms for transferable text-to-image person reid. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17127–17137, 2024. 1, 2, 3, 5, 6, 7
work page 2024
-
[47]
Learning attention-guided pyramidal features for few-shot fine-grained recognition
Hao Tang, Chengcheng Yuan, Zechao Li, and Jinhui Tang. Learning attention-guided pyramidal features for few-shot fine-grained recognition. Pattern Recognition, 130:108792,
-
[48]
Stablerep: Synthetic images from text-to- image models make strong visual representation learners
Yonglong Tian, Lijie Fan, Phillip Isola, Huiwen Chang, and Dilip Krishnan. Stablerep: Synthetic images from text-to- image models make strong visual representation learners. Advances in Neural Information Processing Systems , 36,
-
[49]
Yanan Wang, Shengcai Liao, and Ling Shao. Surpassing real-world source training data: Random 3d characters for generalizable person re-identification. In Proceedings of the 28th ACM International Conference on Multimedia , pages 3422–3430, 2020. 2, 6
work page 2020
-
[50]
Cloning outfits from real-world images to 3d characters for gen- eralizable person re-identification
Yanan Wang, Xuezhi Liang, and Shengcai Liao. Cloning outfits from real-world images to 3d characters for gen- eralizable person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4900–4909, 2022. 2
work page 2022
-
[51]
Vi- taa: Visual-textual attributes alignment in person search by natural language
Zhe Wang, Zhiyuan Fang, Jun Wang, and Yezhou Yang. Vi- taa: Visual-textual attributes alignment in person search by natural language. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16, pages 402–420. Springer, 2020. 2, 7
work page 2020
-
[52]
Zijie Wang, Aichun Zhu, Jingyi Xue, Xili Wan, Chao Liu, Tian Wang, and Yifeng Li. Look before you leap: Improv- ing text-based person retrieval by learning a consistent cross- modal common manifold. In Proceedings of the 30th ACM International Conference on Multimedia, pages 1984–1992,
work page 1984
-
[53]
Person transfer gan to bridge domain gap for person re- identification
Longhui Wei, Shiliang Zhang, Wen Gao, and Qi Tian. Person transfer gan to bridge domain gap for person re- identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 79–88,
-
[54]
Contrastive transformer learning with proximity data generation for text-based per- son search
Hefeng Wu, Weifeng Chen, Zhibin Liu, Tianshui Chen, Zhiguang Chen, and Liang Lin. Contrastive transformer learning with proximity data generation for text-based per- son search. IEEE Transactions on Circuits and Systems for Video Technology, 2023. 1, 2, 3, 4
work page 2023
-
[55]
Lapscore: language- guided person search via color reasoning
Yushuang Wu, Zizheng Yan, Xiaoguang Han, Guanbin Li, Changqing Zou, and Shuguang Cui. Lapscore: language- guided person search via color reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 1624–1633, 2021. 7
work page 2021
-
[56]
Laip: Learning local alignment from image-phrase modeling for text-based person search
Yu Wu, Haiguang Wang, Mengxia Wu, Min Cao, and Min Zhang. Laip: Learning local alignment from image-phrase modeling for text-based person search. In 2024 IEEE Inter- national Conference on Multimedia and Expo (ICME), pages 1–10. IEEE, 2024. 7
work page 2024
-
[57]
Refined knowledge transfer for language-based person search
Ziqiang Wu, Bingpeng Ma, Hong Chang, and Shiguang Shan. Refined knowledge transfer for language-based person search. IEEE Transactions on Multimedia , 25:9315–9329,
-
[58]
Less is more: Learning from synthetic data with fine-grained attributes for person re-identification
Suncheng Xiang, Dahong Qian, Mengyuan Guan, Binjie Yan, Ting Liu, Yuzhuo Fu, and Guanjie You. Less is more: Learning from synthetic data with fine-grained attributes for person re-identification. ACM Transactions on Multime- dia Computing, Communications and Applications , 19(5s): 1–20, 2023. 2
work page 2023
-
[59]
Image-specific information suppression and implicit local alignment for text-based person search
Shuanglin Yan, Hao Tang, Liyan Zhang, and Jinhui Tang. Image-specific information suppression and implicit local alignment for text-based person search. IEEE Transactions on Neural Networks and Learning Systems, 2023. 2
work page 2023
-
[60]
An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, et al. Qwen2 technical report.arXiv preprint arXiv:2407.10671, 2024. 1, 2, 4
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[61]
Shuyu Yang, Yinan Zhou, Zhedong Zheng, Yaxiong Wang, Li Zhu, and Yujiao Wu. Towards unified text-based person retrieval: A large-scale multi-attribute and language search benchmark. In Proceedings of the 31st ACM International Conference on Multimedia, pages 4492–4501, 2023. 1, 2, 3, 6, 7, 5
work page 2023
-
[62]
Unrealperson: An adaptive pipeline towards costless person re-identification
Tianyu Zhang, Lingxi Xie, Longhui Wei, Zijie Zhuang, Yongfei Zhang, Bo Li, and Qi Tian. Unrealperson: An adaptive pipeline towards costless person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11506–11515, 2021. 2
work page 2021
-
[63]
Deep cross-modal projection learning for image-text matching
Ying Zhang and Huchuan Lu. Deep cross-modal projection learning for image-text matching. In Proceedings of the Eu- ropean Conference on Computer Vision (ECCV), pages 686– 701, 2018. 2
work page 2018
-
[64]
Joint discriminative and genera- tive learning for person re-identification
Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, and Jan Kautz. Joint discriminative and genera- tive learning for person re-identification. In proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2138–2147, 2019. 6, 9
work page 2019
-
[65]
Camstyle: A novel data augmentation method for person re-identification
Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, and Yi Yang. Camstyle: A novel data augmentation method for person re-identification. IEEE Transactions on Image Pro- cessing, 28(3):1176–1190, 2018. 2
work page 2018
-
[66]
Dssl: Deep surroundings-person separation learning for text-based per- son retrieval
Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang, Fangqiang Hu, and Gang Hua. Dssl: Deep surroundings-person separation learning for text-based per- son retrieval. In Proceedings of the 29th ACM International Conference on Multimedia, pages 209–217, 2021. 2, 6, 3 An Empirical Study of Validating Synthetic Data for Text-Based Person Retriev...
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.