Recognition: 2 Lean theorem links
Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
Pith reviewed 2026-05-08 17:52 UTC · model grok-4.3
The pith
Anchoring vision models to a frozen text encoder mitigates semantic drift in lifelong person re-identification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PAD is an asymmetric vision-text framework where the frozen text encoder serves as a stable semantic anchor across domains. Prompts are distilled on the textual side to preserve vision-text alignment in a fixed space. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones.
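The EMA teacher here is a standard momentum-averaged copy of the student. A minimal sketch in PyTorch, assuming the momentum α = 0.997 quoted from the paper's hyperparameters; the function names are illustrative, not the authors' code:

    # Minimal sketch of an EMA (exponential moving average) teacher update.
    # `student` and `teacher` are assumed to be architecturally identical
    # torch modules; alpha = 0.997 follows the hyperparameter quoted from
    # the paper. Illustrative, not the authors' implementation.
    import copy
    import torch

    def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
        """The teacher starts as a frozen copy of the student."""
        teacher = copy.deepcopy(student)
        for p in teacher.parameters():
            p.requires_grad_(False)
        return teacher

    @torch.no_grad()
    def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
                   alpha: float = 0.997) -> None:
        """Blend student weights into the teacher after each optimizer step."""
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

Because α is close to 1, the teacher changes slowly, which is what lets it serve as a stable distillation target while the student adapts to the current domain.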
What carries the argument
Prompt-Anchored Vision-Text Distillation (PAD), an asymmetric framework that decouples vision and text roles using a frozen text encoder as semantic anchor and adaptive prompts for vision.
If this is right
- Performance is maintained on previously seen domains without storing exemplar images.
- New domains are incorporated by adding slots to the prompt pool while keeping past slots frozen (a slot-allocation sketch follows this list).
- Improved generalization to unseen domains compared to visual-only methods.
- The balance between stability and plasticity is achieved through asymmetric distillation.
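A minimal sketch of the slot-allocation pattern referenced above, assuming a key-query Top-K prompt pool in the style of L2P/DualPrompt and the sizes quoted from the paper (pool of 36 slots, Top-K = 4, 6 tokens per prompt). Gradient masking stands in for slot freezing here; the authors' exact mechanism may differ:

    # Sketch of an adaptive prompt pool: new slots are opened per domain,
    # earlier slots are frozen, and each image picks its Top-K prompts by
    # key similarity. Sizes follow the quoted configuration; everything
    # else is illustrative.
    import torch
    import torch.nn.functional as F

    class AdaptivePromptPool(torch.nn.Module):
        def __init__(self, pool_size=36, prompt_len=6, dim=512, top_k=4):
            super().__init__()
            self.prompts = torch.nn.Parameter(0.02 * torch.randn(pool_size, prompt_len, dim))
            self.keys = torch.nn.Parameter(0.02 * torch.randn(pool_size, dim))
            self.top_k = top_k
            self.active = 0  # slots opened so far; call allocate_domain first
            self.frozen = 0  # slots belonging to earlier domains

        def allocate_domain(self, n_new):
            """Open n_new fresh slots for the incoming domain."""
            self.frozen = self.active  # everything older is now frozen
            self.active = min(self.active + n_new, self.prompts.shape[0])

        def mask_frozen_grads(self):
            """Call after backward(): zero gradients of frozen slots so
            only the newly allocated slots keep learning."""
            if self.prompts.grad is not None:
                self.prompts.grad[: self.frozen] = 0
            if self.keys.grad is not None:
                self.keys.grad[: self.frozen] = 0

        def forward(self, query):
            """query: (B, dim) image feature; returns (B, K*prompt_len, dim)."""
            sim = F.cosine_similarity(query.unsqueeze(1),
                                      self.keys[: self.active].unsqueeze(0), dim=-1)
            idx = sim.topk(min(self.top_k, self.active), dim=1).indices
            return self.prompts[idx].flatten(1, 2)

Freezing by gradient masking keeps past slots byte-identical across domains, which is the property the stability claims above depend on.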
Where Pith is reading between the lines
- Similar text anchoring could extend to other continual vision tasks like object detection or segmentation when descriptions are available.
- Avoiding storage of old images could support privacy-sensitive lifelong systems in surveillance settings.
- Scaling the prompt pool to many more domains would test whether slot management remains efficient.
Load-bearing premise
The frozen text encoder in pretrained vision-language models serves as a stable semantic anchor across different visual domains.
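A minimal sketch of what this premise means operationally, using the openai/CLIP package API (clip.load, clip.tokenize, model.encode_text); the prompt template and identity count are illustrative:

    # Sketch of the load-bearing premise: text embeddings from a frozen
    # CLIP text encoder form a fixed reference space that no visual
    # domain shift can move.
    import torch
    import clip

    model, _ = clip.load("ViT-B/16")
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)  # the anchor stays frozen across all domains

    prompts = clip.tokenize([f"a photo of person {i}" for i in range(4)])
    with torch.no_grad():
        anchors = model.encode_text(prompts)                      # (4, 512)
        anchors = anchors / anchors.norm(dim=-1, keepdim=True)
    # Visual features from any domain are aligned against `anchors`;
    # because the text side never updates, the reference space cannot drift.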
What would settle it
Testing PAD on a sequence of domains where text captions no longer match the visual content, then measuring whether accuracy on old domains drops as sharply as in visual-only baselines.
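A minimal sketch of the bookkeeping such a test requires; train_on and evaluate_map are hypothetical hooks, not an existing API:

    # Train through a domain sequence, re-evaluate mAP on every earlier
    # domain after each stage, and compare the forgetting profile of PAD
    # against a visual-only baseline.
    def run_sequence(model, domains, train_on, evaluate_map):
        history = {d.name: [] for d in domains}
        for stage, domain in enumerate(domains):
            train_on(model, domain)
            for seen in domains[: stage + 1]:  # only domains seen so far
                history[seen.name].append(evaluate_map(model, seen.test_set))
        return history

    def forgetting(history):
        """Gap between the best mAP ever reached and the final mAP, per domain."""
        return {d: max(s) - s[-1] for d, s in history.items() if s}

If PAD's forgetting values stay small even when captions and images are mismatched, the anchor premise survives; if they collapse to baseline levels, the premise fails.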
read the original abstract
Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the frozen text encoder in pretrained vision-language models can serve as a stable semantic anchor across domains. To decouple the roles of vision and text, we propose Prompt-Anchored vision-text Distillation (PAD), an asymmetric vision-text framework for semantic alignment and cross-domain generalization. On the textual side, we distill prompts to preserve vision-text alignment under a fixed semantic space, acting as a global semantic reference rather than a dominant learning signal. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones. Extensive experiments show that PAD substantially outperforms state-of-the-art methods across seen and unseen domains, achieving a strong balance between stability and plasticity. Project page is available at https://github.com/zu-zi/PAD.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Prompt-Anchored Vision-Text Distillation (PAD) for lifelong person re-identification (LReID). It observes that a frozen text encoder from pretrained vision-language models can act as a domain-invariant semantic anchor. The method uses asymmetric distillation: on the text side, prompts are distilled to preserve vision-text alignment under a fixed semantic space; on the visual side, an EMA teacher combined with an adaptive prompt pool enables incremental domain adaptation by allocating new prompt slots while freezing prior ones. The central claim is that this framework substantially outperforms existing exemplar-free LReID methods on both seen and unseen domains while achieving a favorable stability-plasticity trade-off.
Significance. If the empirical results hold under rigorous evaluation, the work would represent a meaningful advance in continual learning for person re-identification by demonstrating the utility of multi-modal (vision-text) distillation with frozen anchors to mitigate semantic drift and catastrophic forgetting. It extends prior visual-only regularization approaches and could influence subsequent research on leveraging pretrained VLMs for domain-generalizable incremental tasks.
major comments (1)
- Abstract: The central empirical claim that 'PAD substantially outperforms state-of-the-art methods across seen and unseen domains' is asserted without any supporting metrics (e.g., mAP or Rank-1), dataset names, baseline comparisons, or ablation results. This is load-bearing because the paper is an empirical contribution whose value rests entirely on the strength of the experimental evidence; the absence of even summary numbers in the abstract leaves the claim uninspectable from the provided text.
minor comments (2)
- §3 (method description): The adaptive prompt pool mechanism is described at a high level; a concrete algorithm or pseudocode for slot allocation and freezing would improve reproducibility.
- The project page URL is given but no supplementary material or code release details are mentioned in the text; confirming open-source availability would strengthen the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address the concern point by point below and agree that revisions are warranted to strengthen the presentation of our empirical results.
read point-by-point responses
- Referee: Abstract: The central empirical claim that 'PAD substantially outperforms state-of-the-art methods across seen and unseen domains' is asserted without any supporting metrics (e.g., mAP or Rank-1), dataset names, baseline comparisons, or ablation results. This is load-bearing because the paper is an empirical contribution whose value rests entirely on the strength of the experimental evidence; the absence of even summary numbers in the abstract leaves the claim uninspectable from the provided text.
Authors: We agree that the abstract would be strengthened by including concrete quantitative support for the performance claim. In the revised manuscript, we will update the abstract to concisely incorporate key results from our experiments, such as representative mAP and Rank-1 scores on seen domains (e.g., standard benchmarks like Market-1501) and unseen domains, along with brief indications of outperformance relative to leading baselines. This will make the central empirical contribution immediately verifiable while preserving the abstract's length and focus. We note that the full paper already contains detailed tables, ablations, and comparisons; the revision simply surfaces summary evidence at the abstract level.
Revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper proposes an empirical method (PAD) that builds on external pretrained vision-language models and introduces an asymmetric distillation framework with EMA teacher and adaptive prompt pool. No equations, derivations, or first-principles predictions are presented that reduce by construction to fitted parameters or self-referential definitions. The central claims rest on experimental validation across domains rather than any load-bearing self-citation chain or ansatz smuggled via prior work. The frozen text encoder is treated as an external stable anchor, not derived internally. This is a standard non-circular method paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The frozen text encoder serves as a stable semantic anchor across domains.
Lean theorems connected to this paper
- IndisputableMonolith/Cost (J-cost J(x) = ½(x + x⁻¹) − 1) · washburn_uniqueness_aczel · tag: unclear
Unclear relation between the paper passage and the cited Recognition theorem.
Paper passage: "The overall training objective integrates contrastive alignment, identity classification, metric learning, and vision–text knowledge distillation: L_overall = L_supcon + L_ID + L_triplet + L_KD ... λ_text = 0.5, τ = 0.07, γ initialized to 7.0 ... α = 0.997 ... λ_feat = λ_logit = 0.5, τ = 4.0."
- IndisputableMonolith/Foundation/DimensionForcing (8-tick period, D=3) · alexander_duality_circle_linking · tag: unclear
Unclear relation between the paper passage and the cited Recognition theorem.
Paper passage: "PAD adopts 6 general + 6 expert tokens per layer (pool size 36, Top-K = 4). New expert slots are activated per domain, while all previous slots are frozen."
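The training objective quoted in the first entry above combines four standard terms. A minimal sketch of how the distillation piece and the sum might be assembled, assuming the quoted weights λ_feat = λ_logit = 0.5 and distillation temperature τ = 4.0, with the contrastive, ID, and triplet terms passed in precomputed; illustrative, not the authors' implementation:

    # Sketch of L_overall = L_supcon + L_ID + L_triplet + L_KD.
    # The KD term uses Hinton-style temperature-scaled KL on logits plus
    # an MSE feature-matching term; exact formulations are assumptions.
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, tau: float = 4.0):
        """Temperature-scaled KL distillation between student and teacher."""
        return F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                        F.softmax(teacher_logits / tau, dim=-1),
                        reduction="batchmean") * tau * tau

    def overall_loss(l_supcon, l_id, l_triplet,
                     s_logits, t_logits, s_feat, t_feat,
                     lam_feat: float = 0.5, lam_logit: float = 0.5):
        l_kd = (lam_logit * kd_loss(s_logits, t_logits)
                + lam_feat * F.mse_loss(s_feat, t_feat))
        return l_supcon + l_id + l_triplet + l_kd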
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Anti-forgetting adaptation for unsupervised person re-identification
Hao Chen, Francois Bremond, Nicu Sebe, and Shiliang Zhang. Anti-forgetting adaptation for unsupervised person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(2):1056–1072, 2025
2025
-
[2]
Learning continual compatible representation for re-indexing free lifelong person re-identification
Zhenyu Cui, Jiahuan Zhou, Xun Wang, Manyu Zhu, and Yuxin Peng. Learning continual compatible representation for re-indexing free lifelong person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16614–16623, 2024
2024
-
[3]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020
2020
-
[4]
Viewpoint invariant pedestrian recognition with an ensemble of localized features
Douglas Gray and Hai Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, 2008
2008
-
[5]
In defense of the triplet loss for person re-identification
Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017
2017
-
[6]
Person Re-Identification by Descriptive and Discriminative Classification
Martin Hirzer, Csaba Beleznai, Peter M. Roth, and Horst Bischof. Person Re-Identification by Descriptive and Discriminative Classification. In Proc. Scandinavian Conference on Image Analysis (SCIA), 2011
2011
-
[7]
Rainbowprompt: Diversity-enhanced prompt-evolving for continual learning
Kiseong Hong, Gyeong-hyeon Kim, and Eunwoo Kim. Rainbowprompt: Diversity-enhanced prompt-evolving for continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1130–1140, 2025
2025
-
[8]
Lifelong unsupervised domain adaptive person re-identification with coordinated anti-forgetting and adaptation
Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, and Zheng-Jun Zha. Lifelong unsupervised domain adaptive person re-identification with coordinated anti-forgetting and adaptation. In CVPR, 2022
2022
-
[9]
Supervised contrastive learning
Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673, 2020
2020
-
[10]
Fcs: Feature calibration and separation for non-exemplar class incremental learning
Qiwei Li, Yuxin Peng, and Jiahuan Zhou. Fcs: Feature calibration and separation for non-exemplar class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28495–28504, 2024
2024
-
[11]
Exemplar-free lifelong person re-identification via prompt-guided adaptive knowledge consolidation
Qiwei Li, Kunlun Xu, Yuxin Peng, and Jiahuan Zhou. Exemplar-free lifelong person re-identification via prompt-guided adaptive knowledge consolidation. International Journal of Computer Vision, pages 1–16, 2024
2024
-
[12]
Clip-reid: Exploiting vision-language model for image re-identification without concrete text labels
Siyuan Li, Li Sun, and Qingli Li. Clip-reid: Exploiting vision-language model for image re-identification without concrete text labels. In AAAI, 2023
2023
-
[13]
Locally aligned feature transforms across views
W. Li and Xiaogang Wang. Locally aligned feature transforms across views. CVPR, 2013
2013
-
[14]
Human reidentification with transferred metric learning
W. Li, Rui Zhao, and Xiaogang Wang. Human reidentification with transferred metric learning. In ACCV, 2012
2012
-
[15]
DeepReID: Deep filter pairing neural network for person re-identification
Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. DeepReID: Deep filter pairing neural network for person re-identification. CVPR, 2014
2014
-
[16]
Learning without forgetting
Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelli- gence, 40(12):2935–2947, 2017
2017
-
[17]
Distribution-aware forgetting compensation for exemplar-free lifelong person re-identification
Shiben Liu, Huijie Fan, Qiang Wang, Baojie Fan, Yandong Tang, and Liangqiong Qu. Distribution-aware forgetting compensation for exemplar-free lifelong person re-identification. arXiv preprint arXiv:2504.15041, 2025
2025
-
[18]
Multi-camera activity correlation analysis
Chen Change Loy, T. Xiang, and S. Gong. Multi-camera activity correlation analysis. In CVPR, 2009
2009
-
[19]
Lifelong person re-identification via adaptive knowledge accumulation
Nan Pu, Wei Chen, Yu Liu, Erwin M. Bakker, and Michael S. Lew. Lifelong person re-identification via adaptive knowledge accumulation. In CVPR, 2021
2021
-
[20]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021
2021
-
[21]
Performance measures and a data set for multi-target, multi-camera tracking
Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In ECCV workshops, 2016
2016
-
[22]
Coda-prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning
James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. Coda-prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11909–11919, 2023
2023
-
[23]
Patch-based knowledge distillation for lifelong person re-identification
Zhicheng Sun and Yadong Mu. Patch-based knowledge distillation for lifelong person re-identification. In Proceedings of the 30th ACM International Conference on Multimedia, pages 696–707, 2022
2022
-
[24]
Attriclip: A non-incremental learner for incremental knowledge learning
Runqi Wang, Xiaoyue Duan, Guoliang Kang, Jianzhuang Liu, Shaohui Lin, Songcen Xu, Jinhu Lü, and Baochang Zhang. Attriclip: A non-incremental learner for incremental knowledge learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3654–3663, 2023
2023
-
[25]
S-prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning
Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. S-prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning. Advances in Neural Information Processing Systems, 35:5682–5695, 2022
2022
-
[26]
Dualprompt: Complementary prompting for rehearsal-free continual learning
Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, et al. Dualprompt: Complementary prompting for rehearsal-free continual learning. In ECCV, 2022
2022
-
[27]
Learning to prompt for continual learning
Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In CVPR, 2022
2022
-
[28]
Person transfer GAN to bridge domain gap for person re-identification
Longhui Wei, Shiliang Zhang, Wen Gao, and Qi Tian. Person transfer GAN to bridge domain gap for person re-identification. In CVPR, 2018
2018
-
[29]
Generalising without forgetting for lifelong person re-identification
Guile Wu and Shaogang Gong. Generalising without forgetting for lifelong person re-identification. In AAAI, 2021
2021
-
[30]
Joint detection and identification feature learning for person search
Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, and Xiaogang Wang. Joint detection and identification feature learning for person search. CVPR, 2017
2017
-
[31]
Distribution-aware knowledge prototyping for non-exemplar lifelong person re-identification
Kunlun Xu, Xu Zou, Yuxin Peng, and Jiahuan Zhou. Distribution-aware knowledge prototyping for non-exemplar lifelong person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16604–16613, 2024
2024
-
[32]
Lstkc: Long short-term knowledge consolidation for lifelong person re-identification
Kunlun Xu, Xu Zou, and Jiahuan Zhou. Lstkc: Long short-term knowledge consolidation for lifelong person re-identification. In AAAI, 2024
2024
-
[33]
Dask: Distribution rehearsing via adaptive style kernel learning for exemplar-free lifelong person re-identification
Kunlun Xu, Chenghao Jiang, Peixi Xiong, Yuxin Peng, and Jiahuan Zhou. Dask: Distribution rehearsing via adaptive style kernel learning for exemplar-free lifelong person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8915–8923, 2025
2025
-
[34]
Self-reinforcing prototype evolution with dual-knowledge cooperation for semi-supervised lifelong person re-identification
Kunlun Xu, Fan Zhuo, Jiangmeng Li, Xu Zou, and Jiahuan Zhou. Self-reinforcing prototype evolution with dual-knowledge cooperation for semi-supervised lifelong person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
2025
-
[35]
A pedestrian is worth one prompt: Towards language guidance person re-identification
Zexian Yang, Dayan Wu, Chenming Wu, Zheng Lin, Jingzi Gu, and Weiping Wang. A pedestrian is worth one prompt: Towards language guidance person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17343–17353, 2024
2024
-
[36]
Lifelong person re-identification via knowledge refreshing and consolidation
Chunlin Yu, Ye Shi, Zimo Liu, Shenghua Gao, and Jingya Wang. Lifelong person re-identification via knowledge refreshing and consolidation. In AAAI, 2023
2023
-
[37]
TF-CLIP: Learning text-free CLIP for video-based person re-identification
Chenyang Yu, Xuehu Liu, Yingquan Wang, Pingping Zhang, and Huchuan Lu. TF-CLIP: Learning text-free CLIP for video-based person re-identification. In AAAI, 2024
2024
-
[38]
Multi-prompts learning with cross-modal alignment for attribute-based person re-identification
Yajing Zhai, Yawen Zeng, Zhiyong Huang, Zheng Qin, Xin Jin, and Da Cao. Multi-prompts learning with cross-modal alignment for attribute-based person re-identification. In AAAI, 2024
2024
-
[39]
Spindle net: Person re-identification with human body region guided feature decomposition and fusion
Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, and Xiaoou Tang. Spindle net: Person re-identification with human body region guided feature decomposition and fusion. CVPR, 2017
2017
-
[40]
Scalable person re-identification: A benchmark
Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian. Scalable person re-identification: A benchmark. ICCV, 2015
2015
-
[41]
Associating groups of people
Wei-Shi Zheng, Shaogang Gong, and Tao Xiang. Associating groups of people. In BMVC, 2009
2009
-
[42]
External knowledge injection for clip-based class-incremental learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, and De-Chuan Zhan. External knowledge injection for clip-based class-incremental learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3314–3325, 2025
2025
-
[43]
Distribution-aware knowledge aligning and prototyping for non-exemplar lifelong person re-identification
Jiahuan Zhou, Kunlun Xu, Fan Zhuo, Xu Zou, and Yuxin Peng. Distribution-aware knowledge aligning and prototyping for non-exemplar lifelong person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[44]
Learning conditional space-time prompt distributions for video class-incremental learning
Xiaohan Zou, Wenchao Ma, and Shu Zhao. Learning conditional space-time prompt distributions for video class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4862–4873, 2025
2025