pith. machine review for the scientific record.

arxiv: 2605.05027 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 3 theorem links


Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification


Pith reviewed 2026-05-08 17:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords lifelong person re-identification · vision-language models · prompt distillation · continual learning · catastrophic forgetting · semantic drift · domain adaptation

The pith

Anchoring vision models to frozen text encoders solves semantic drift in lifelong person re-identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that lifelong person re-identification can be improved by anchoring visual learning to a fixed text semantic space from pretrained models. This addresses catastrophic forgetting and semantic drift when models encounter new data domains over time. By distilling prompts asymmetrically, the approach maintains alignment without letting text dominate, while allowing visual adaptation through an adaptive prompt pool. A sympathetic reader would care because real-world person tracking systems must handle evolving camera networks without losing accuracy on prior scenes or requiring massive data storage.

Core claim

PAD is an asymmetric vision-text framework where the frozen text encoder serves as a stable semantic anchor across domains. Prompts are distilled on the textual side to preserve vision-text alignment in a fixed space. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones.
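The EMA-based teacher mentioned above is, read literally, an exponential-moving-average copy of the student's weights. A minimal sketch, assuming a generic momentum value (the paper's actual setting is not quoted in this review):

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.999):
    """One EMA step: teacher <- m * teacher + (1 - m) * student.

    The momentum of 0.999 is illustrative, not the paper's value.
    """
    return {
        name: momentum * teacher_params[name] + (1.0 - momentum) * student_params[name]
        for name in teacher_params
    }

# Toy run: the teacher drifts slowly toward a fixed student.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(100):
    teacher = ema_update(teacher, student)
```

After 100 steps each teacher weight equals 1 - 0.999**100, i.e. the teacher lags the student by design, which is what makes it usable as a stable distillation target.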

What carries the argument

Prompt-Anchored Vision-Text Distillation (PAD), an asymmetric framework that decouples vision and text roles using a frozen text encoder as semantic anchor and adaptive prompts for vision.
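The text-side distillation can be sketched as a temperature-scaled, logit-level KL term against the frozen anchor. The weight (λ_text = 0.5) and temperature (τ = 0.07) echo settings quoted in the figure captions below; everything else here is an illustrative sketch, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, tau):
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def text_distill_kl(student_logits, anchor_logits, tau=0.07, weight=0.5):
    """KL(anchor || student) over temperature-scaled logits.

    anchor_logits: from the frozen text encoder (stable semantic anchor).
    student_logits: from the learnable-prompt branch being distilled.
    """
    p = softmax(anchor_logits, tau)
    q = softmax(student_logits, tau)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return weight * kl.mean()

anchor = np.array([[2.0, 1.0, 0.1]])
student = np.array([[1.8, 1.1, 0.2]])
loss = text_distill_kl(student, anchor)
```

Because the anchor side is frozen, gradients flow only into the student prompts, which is what keeps the text space from drifting while still penalizing misalignment.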

If this is right

  • Performance is maintained on previously seen domains without storing exemplar images.
  • New domains are incorporated by adding slots to the prompt pool while keeping past slots frozen.
  • Improved generalization to unseen domains compared to visual-only methods.
  • The balance between stability and plasticity is achieved through asymmetric distillation.
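The prompt-pool mechanics in these bullets suggest a query-to-key routing over the pool. A hedged sketch: the pool size (36) and Top-K (4) echo settings quoted in the figure captions, but the cosine-similarity matching rule is an assumption, not the paper's stated routing:

```python
import numpy as np

def topk_prompt_select(query, prompt_keys, k=4):
    """Pick the k prompt slots whose keys best match the query feature.

    Cosine similarity is assumed here; the paper's routing may differ.
    """
    q = query / np.linalg.norm(query)
    keys = prompt_keys / np.linalg.norm(prompt_keys, axis=1, keepdims=True)
    sims = keys @ q                       # one similarity per slot
    idx = np.argsort(-sims)[:k]           # indices of the k best slots
    return idx, sims[idx]

rng = np.random.default_rng(0)
pool_keys = rng.normal(size=(36, 8))      # 36 slots with 8-dim keys (dims illustrative)
query = rng.normal(size=8)
idx, sims = topk_prompt_select(query, pool_keys, k=4)
```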

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar text anchoring could extend to other continual vision tasks like object detection or segmentation when descriptions are available.
  • Avoiding storage of old images could support privacy-sensitive lifelong systems in surveillance settings.
  • Scaling the prompt pool to many more domains would test whether slot management remains efficient.

Load-bearing premise

The frozen text encoder in pretrained vision-language models serves as a stable semantic anchor across different visual domains.

What would settle it

Testing PAD on a sequence of domains where text captions no longer match the visual content, then measuring whether accuracy on old domains drops as sharply as in visual-only baselines.

Figures

Figures reproduced from arXiv: 2605.05027 by Hao Chen, Shiliang Zhang, Wen Wen.

Figure 1: Comparison of lifelong ReID paradigms. (a) Exemplar…
Figure 2: Overview of the proposed PAD framework. The framework consists of a textual branch (left) and a visual branch (right) that evolve across domains. On the textual side, we use a frozen text encoder and distill the learnable textual prompts (TA-Prompt). On the visual side, we construct a visual prompt (VA-Prompt) pool and train the last layers of the image encoder with a two-term visual distillation loss.
Figure 3: Performance tendency on seen domains (AKA-order1). After each training step, the model is evaluated on the already-seen domains. Adjacent body text spilled into this caption records the settings: text-side distillation applies only a logit-level KL loss with weight λ_text = 0.5, temperature τ = 0.07, and a learnable scaling factor γ initialized to 7.0; VA-Prompt adopts 6 general + 6 expert tokens per layer (pool size 36, Top-K = 4).
Figure 4: Performance tendency on unseen domains (AKA-order1). After each training step, the performance on all unseen domains is evaluated. Adjacent ablation text: starting from a fully fine-tuned CLIP-ReID [12] baseline without any prompt or distillation, the freezing scheme, VA-Prompt, textual distillation, and visual distillation are introduced progressively.
Figure 5: Summary of the trainable ratios on both sides. Only the TA-Prompt is updated on the textual branch, whereas the visual branch optimizes the VA-Prompt, classifier head, and selectively unfrozen backbone blocks. Although the architecture is fixed, the effective trainable ratio varies across domains because identity distributions affect the classifier and prompt-routing parameters that receive gradients.
read the original abstract

Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the frozen text encoder in pretrained vision-language models can serve as a stable semantic anchor across domains. To decouple the roles of vision and text, we propose Prompt-Anchored vision-text Distillation (PAD), an asymmetric vision-text framework for semantic alignment and cross-domain generalization. On the textual side, we distill prompts to preserve vision-text alignment under a fixed semantic space, acting as a global semantic reference rather than a dominant learning signal. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones. Extensive experiments show that PAD substantially outperforms state-of-the-art methods across seen and unseen domains, achieving a strong balance between stability and plasticity. Project page is available at https://github.com/zu-zi/PAD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Prompt-Anchored Vision-Text Distillation (PAD) for lifelong person re-identification (LReID). It observes that a frozen text encoder from pretrained vision-language models can act as a domain-invariant semantic anchor. The method uses asymmetric distillation: on the text side, prompts are distilled to preserve vision-text alignment under a fixed semantic space; on the visual side, an EMA teacher combined with an adaptive prompt pool enables incremental domain adaptation by allocating new prompt slots while freezing prior ones. The central claim is that this framework substantially outperforms existing exemplar-free LReID methods on both seen and unseen domains while achieving a favorable stability-plasticity trade-off.

Significance. If the empirical results hold under rigorous evaluation, the work would represent a meaningful advance in continual learning for person re-identification by demonstrating the utility of multi-modal (vision-text) distillation with frozen anchors to mitigate semantic drift and catastrophic forgetting. It extends prior visual-only regularization approaches and could influence subsequent research on leveraging pretrained VLMs for domain-generalizable incremental tasks.

major comments (1)
  1. Abstract: The central empirical claim that 'PAD substantially outperforms state-of-the-art methods across seen and unseen domains' is asserted without any supporting metrics (e.g., mAP or Rank-1), dataset names, baseline comparisons, or ablation results. This is load-bearing because the paper is an empirical contribution whose value rests entirely on the strength of the experimental evidence; the absence of even summary numbers in the abstract leaves the claim uninspectable from the provided text.
minor comments (2)
  1. §3 (method description): The adaptive prompt pool mechanism is described at a high level; a concrete algorithm or pseudocode for slot allocation and freezing would improve reproducibility.
  2. The project page URL is given but no supplementary material or code release details are mentioned in the text; confirming open-source availability would strengthen the contribution.
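For concreteness, the slot allocation and freezing that the first minor comment asks to see spelled out might look like the following sketch. The class name, the six-slots-per-domain default (loosely following the "6 expert tokens per layer" quoted in the figure captions), and the zero initialization are all hypothetical, not the authors' algorithm:

```python
import numpy as np

class PromptPool:
    """Hypothetical sketch: per-domain slot allocation with past-slot freezing."""

    def __init__(self, dim, slots_per_domain=6):
        self.dim = dim
        self.slots_per_domain = slots_per_domain
        self.slots = []  # list of (prompt vector, frozen flag)

    def start_domain(self):
        # Freeze every slot learned on earlier domains,
        # then append fresh trainable slots for the new one.
        self.slots = [(p, True) for p, _ in self.slots]
        for _ in range(self.slots_per_domain):
            self.slots.append((np.zeros(self.dim), False))

    def trainable(self):
        return [p for p, frozen in self.slots if not frozen]

pool = PromptPool(dim=8)
pool.start_domain()  # domain 1: 6 trainable slots
pool.start_domain()  # domain 2: 6 new trainable slots, 6 frozen
```

The invariant the sketch encodes is the one the paper's description implies: the trainable set is always exactly the current domain's slots, so gradients cannot overwrite past domains.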

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the concern point by point below and agree that revisions are warranted to strengthen the presentation of our empirical results.

read point-by-point responses
  1. Referee: Abstract: The central empirical claim that 'PAD substantially outperforms state-of-the-art methods across seen and unseen domains' is asserted without any supporting metrics (e.g., mAP or Rank-1), dataset names, baseline comparisons, or ablation results. This is load-bearing because the paper is an empirical contribution whose value rests entirely on the strength of the experimental evidence; the absence of even summary numbers in the abstract leaves the claim uninspectable from the provided text.

    Authors: We agree that the abstract would be strengthened by including concrete quantitative support for the performance claim. In the revised manuscript, we will update the abstract to concisely incorporate key results from our experiments, such as representative mAP and Rank-1 scores on seen domains (e.g., standard benchmarks like Market-1501) and unseen domains, along with brief indications of outperformance relative to leading baselines. This will make the central empirical contribution immediately verifiable while preserving the abstract's length and focus. We note that the full paper already contains detailed tables, ablations, and comparisons; the revision simply surfaces summary evidence at the abstract level. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes an empirical method (PAD) that builds on external pretrained vision-language models and introduces an asymmetric distillation framework with EMA teacher and adaptive prompt pool. No equations, derivations, or first-principles predictions are presented that reduce by construction to fitted parameters or self-referential definitions. The central claims rest on experimental validation across domains rather than any load-bearing self-citation chain or ansatz smuggled via prior work. The frozen text encoder is treated as an external stable anchor, not derived internally. This is a standard non-circular method paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that pretrained vision-language models provide a fixed semantic space usable as an anchor, with no free parameters or invented entities explicitly listed in the abstract.

axioms (1)
  • Domain assumption: the frozen text encoder serves as a stable semantic anchor across domains.
    Invoked to justify decoupling vision and text roles and using text as a global reference.

pith-pipeline@v0.9.0 · 5514 in / 1111 out tokens · 41998 ms · 2026-05-08T17:52:54.783821+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: the paper's claim is directly supported by a theorem in the formal canon.
supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: the paper appears to rely on the theorem as machinery.
contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 3 canonical work pages · 1 internal anchor

  1. Hao Chen, Francois Bremond, Nicu Sebe, and Shiliang Zhang. Anti-forgetting adaptation for unsupervised person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(2):1056–1072, 2025.
  2. Zhenyu Cui, Jiahuan Zhou, Xun Wang, Manyu Zhu, and Yuxin Peng. Learning continual compatible representation for re-indexing free lifelong person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16614–16623, 2024.
  3. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  4. Douglas Gray and Hai Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, 2008.
  5. Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
  6. Martin Hirzer, Csaba Beleznai, Peter M. Roth, and Horst Bischof. Person re-identification by descriptive and discriminative classification. In Proc. Scandinavian Conference on Image Analysis (SCIA), 2011.
  7. Kiseong Hong, Gyeong-hyeon Kim, and Eunwoo Kim. RainbowPrompt: Diversity-enhanced prompt-evolving for continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1130–1140, 2025.
  8. Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, and Zheng-Jun Zha. Lifelong unsupervised domain adaptive person re-identification with coordinated anti-forgetting and adaptation. In CVPR, 2022.
  9. Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673, 2020.
  10. Qiwei Li, Yuxin Peng, and Jiahuan Zhou. FCS: Feature calibration and separation for non-exemplar class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28495–28504, 2024.
  11. Qiwei Li, Kunlun Xu, Yuxin Peng, and Jiahuan Zhou. Exemplar-free lifelong person re-identification via prompt-guided adaptive knowledge consolidation. International Journal of Computer Vision, pages 1–16, 2024.
  12. Siyuan Li, Li Sun, and Qingli Li. CLIP-ReID: Exploiting vision-language model for image re-identification without concrete text labels. In AAAI, 2023.
  13. W. Li and Xiaogang Wang. Locally aligned feature transforms across views. CVPR, 2013.
  14. W. Li, Rui Zhao, and Xiaogang Wang. Human reidentification with transferred metric learning. In ACCV, 2012.
  15. Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. DeepReID: Deep filter pairing neural network for person re-identification. CVPR, 2014.
  16. Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.
  17. Shiben Liu, Huijie Fan, Qiang Wang, Baojie Fan, Yandong Tang, and Liangqiong Qu. Distribution-aware forgetting compensation for exemplar-free lifelong person re-identification. arXiv preprint arXiv:2504.15041, 2025.
  18. Chen Change Loy, T. Xiang, and S. Gong. Multi-camera activity correlation analysis. In CVPR, 2009.
  19. Nan Pu, Wei Chen, Yu Liu, Erwin M. Bakker, and Michael S. Lew. Lifelong person re-identification via adaptive knowledge accumulation. In CVPR, 2021.
  20. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.
  21. Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In ECCV Workshops, 2016.
  22. James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. CODA-Prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11909–11919, 2023.
  23. Zhicheng Sun and Yadong Mu. Patch-based knowledge distillation for lifelong person re-identification. In Proceedings of the 30th ACM International Conference on Multimedia, pages 696–707, 2022.
  24. Runqi Wang, Xiaoyue Duan, Guoliang Kang, Jianzhuang Liu, Shaohui Lin, Songcen Xu, Jinhu Lü, and Baochang Zhang. AttriCLIP: A non-incremental learner for incremental knowledge learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3654–3663, 2023.
  25. Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. S-Prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning. Advances in Neural Information Processing Systems, 35:5682–5695, 2022.
  26. Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, et al. DualPrompt: Complementary prompting for rehearsal-free continual learning. In ECCV, 2022.
  27. Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In CVPR, 2022.
  28. Longhui Wei, Shiliang Zhang, Wen Gao, and Qi Tian. Person transfer GAN to bridge domain gap for person re-identification. In CVPR, 2018.
  29. Guile Wu and Shaogang Gong. Generalising without forgetting for lifelong person re-identification. In AAAI, 2021.
  30. Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, and Xiaogang Wang. Joint detection and identification feature learning for person search. CVPR, 2017.
  31. Kunlun Xu, Xu Zou, Yuxin Peng, and Jiahuan Zhou. Distribution-aware knowledge prototyping for non-exemplar lifelong person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16604–16613, 2024.
  32. Kunlun Xu, Xu Zou, and Jiahuan Zhou. LSTKC: Long short-term knowledge consolidation for lifelong person re-identification. In AAAI, 2024.
  33. Kunlun Xu, Chenghao Jiang, Peixi Xiong, Yuxin Peng, and Jiahuan Zhou. DASK: Distribution rehearsing via adaptive style kernel learning for exemplar-free lifelong person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8915–8923, 2025.
  34. Kunlun Xu, Fan Zhuo, Jiangmeng Li, Xu Zou, and Jiahuan Zhou. Self-reinforcing prototype evolution with dual-knowledge cooperation for semi-supervised lifelong person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025.
  35. Zexian Yang, Dayan Wu, Chenming Wu, Zheng Lin, Jingzi Gu, and Weiping Wang. A pedestrian is worth one prompt: Towards language guidance person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17343–17353, 2024.
  36. Chunlin Yu, Ye Shi, Zimo Liu, Shenghua Gao, and Jingya Wang. Lifelong person re-identification via knowledge refreshing and consolidation. In AAAI, 2023.
  37. Chenyang Yu, Xuehu Liu, Yingquan Wang, Pingping Zhang, and Huchuan Lu. TF-CLIP: Learning text-free CLIP for video-based person re-identification. In AAAI, 2024.
  38. Yajing Zhai, Yawen Zeng, Zhiyong Huang, Zheng Qin, Xin Jin, and Da Cao. Multi-prompts learning with cross-modal alignment for attribute-based person re-identification. In AAAI, 2024.
  39. Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, and Xiaoou Tang. Spindle Net: Person re-identification with human body region guided feature decomposition and fusion. CVPR, 2017.
  40. Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian. Scalable person re-identification: A benchmark. ICCV, 2015.
  41. Wei-Shi Zheng, Shaogang Gong, and Tao Xiang. Associating groups of people. In BMVC, 2009.
  42. Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, and De-Chuan Zhan. External knowledge injection for CLIP-based class-incremental learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3314–3325, 2025.
  43. Jiahuan Zhou, Kunlun Xu, Fan Zhuo, Xu Zou, and Yuxin Peng. Distribution-aware knowledge aligning and prototyping for non-exemplar lifelong person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  44. Xiaohan Zou, Wenchao Ma, and Shu Zhao. Learning conditional space-time prompt distributions for video class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4862–4873, 2025.
    Xiaohan Zou, Wenchao Ma, and Shu Zhao. Learning con- ditional space-time prompt distributions for video class- incremental learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4862–4873, 2025. Prompt-Anchored Vision–Text Distillation for Lifelong Person Re-identification Supplementary Material Overvie...