SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification

Huchuan Lu; Lixin Wang; Pingping Zhang; Xiang Hu; Yuhao Wang

arXiv: 2504.09549 · v3 · pith:OIPX7FVG · submitted 2025-04-13 · 💻 cs.CV


Yuhao Wang, Xiang Hu, Lixin Wang, Pingping Zhang, Huchuan Lu

Pith reviewed 2026-05-22 19:31 UTC · model grok-4.3

classification 💻 cs.CV

keywords: aerial-ground person re-identification · stable diffusion · generative models · view-aware features · person re-id · view refinement · computer vision


The pith

Fine-tuning Stable Diffusion on identity and view conditions from a ViT model generates view-mimicking features that improve aerial-ground person re-identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SD-ReID, a generative framework that trains a ViT-based extractor to capture identity and view conditions, then fine-tunes Stable Diffusion to produce features mimicking different camera perspectives while keeping identity information intact. This contrasts with prior methods that focus only on making representations invariant to viewpoint changes. A View-Refined Decoder integrates instance-level details with global features, and the combined representations are used for retrieval. Experiments across five benchmarks show gains in matching persons between aerial and ground views. If the approach holds, it offers a way to leverage generative models to handle extreme viewpoint gaps without discarding view-specific information.
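A minimal sketch of the final retrieval step follows. It assumes, purely for illustration, that each image yields one person representation plus one generated feature per view, that these are fused by normalized concatenation (`combine` is a hypothetical helper, not the authors' code), and that the gallery is ranked by cosine similarity. The released code may fuse and compare features differently.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(person: Sequence[float], view_feats: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion: concatenate the normalized person representation
    with each normalized generated view feature, then renormalize."""
    parts = l2_normalize(person)
    for feat in view_feats:
        parts.extend(l2_normalize(feat))
    return l2_normalize(parts)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy example: 2-D person features and one generated view feature per image.
query = combine([1.0, 0.0], [[0.0, 1.0]])
gallery = [
    combine([0.0, 1.0], [[1.0, 0.0]]),  # different identity
    combine([1.0, 0.1], [[0.0, 1.0]]),  # same identity, slightly perturbed
]
print(rank_gallery(query, gallery))  # [1, 0]
```

Normalizing each part before concatenation keeps one feature from dominating the similarity simply because it has a larger norm; whether SD-ReID weights the parts equally is not stated here.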

Core claim

The authors claim that extracting controllable identity and view conditions via a ViT-based model, using those conditions to fine-tune Stable Diffusion for enhanced person representations, and applying a View-Refined Decoder to merge instance-level and global-level features yields improved retrieval of specific persons across aerial and ground cameras on the CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR, and G2APS-ReID datasets.

What carries the argument

The fine-tuned Stable Diffusion model guided by identity and view conditions extracted from a ViT-based model, together with the View-Refined Decoder that integrates instance-level and global-level features.

Load-bearing premise

Fine-tuning Stable Diffusion with identity and view conditions extracted by a ViT-based model produces view-mimicking features that improve rather than degrade identity discrimination, and the View-Refined Decoder integrates instance-level and global-level features without introducing new inconsistencies.
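To make the second half of the premise concrete, here is a hypothetical stand-in for a decoder that refines a global feature with instance-level tokens: single-head, projection-free cross-attention with a residual connection. It is not the paper's VRD, whose exact variants are only shown in Figure 7; it illustrates one way the two feature levels can be fused, and why noisy instance tokens could in principle leak into the refined global feature.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def refine_global(global_feat: Sequence[float],
                  instance_tokens: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical VRD-like fusion (not the authors' module).

    The global feature acts as the query; instance-level tokens act as keys
    and values. Scaled dot-product weights select the most compatible tokens,
    and their weighted average is added back to the global feature.
    """
    dim = len(global_feat)
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(g * t for g, t in zip(global_feat, tok)) for tok in instance_tokens]
    weights = softmax(scores)
    attended = [sum(w * tok[k] for w, tok in zip(weights, instance_tokens)) for k in range(dim)]
    return [g + a for g, a in zip(global_feat, attended)]


print(refine_global([1.0, 0.0], [[0.5, 0.5]]))  # [1.5, 0.5]
```

Testing the premise then amounts to comparing retrieval metrics with and without such a refinement step.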

What would settle it

If adding the generated view-mimicking features and View-Refined Decoder outputs lowers retrieval accuracy on the five AG-ReID benchmarks relative to the ViT baseline alone, the central claim would be falsified.
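The test above reduces to comparing standard retrieval metrics for two feature sets. Below is a minimal sketch of Rank-1 accuracy and mean average precision (mAP), assuming a precomputed query-by-gallery similarity matrix and integer identity labels. It deliberately omits any camera- or view-based filtering that protocols such as G→A may impose, so its numbers would not match official benchmark scripts.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(sim: Sequence[Sequence[float]],
                  query_ids: Sequence[int],
                  gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Compute Rank-1 accuracy and mAP from a query-by-gallery
    similarity matrix (higher = more similar)."""
    hits = 0
    aps: List[float] = []
    for qi, row in enumerate(sim):
        order = sorted(range(len(gallery_ids)), key=lambda j: row[j], reverse=True)
        matches = [gallery_ids[j] == query_ids[qi] for j in order]
        if not any(matches):
            continue  # skip queries that have no true match in the gallery
        hits += int(matches[0])
        found = 0
        precisions: List[float] = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each true match
        aps.append(sum(precisions) / found)
    if not aps:
        return 0.0, 0.0
    return hits / len(aps), sum(aps) / len(aps)


sim = [[0.9, 0.1, 0.8],   # query of identity 1
       [0.2, 0.5, 0.6]]   # query of identity 2
print(rank1_and_map(sim, query_ids=[1, 2], gallery_ids=[1, 2, 1]))  # (0.5, 0.75)
```

Running it once with baseline ViT features and once with the combined features, on the same split, gives the comparison the criterion calls for.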

Figures

Figures reproduced from arXiv: 2504.09549 by Huchuan Lu, Lixin Wang, Pingping Zhang, Xiang Hu, Yuhao Wang.

**Figure 2.** Overall framework of the proposed SD-ReID. In the first stage, a view-aware Transformer encoder extracts person representations …

**Figure 3.** Details of the condition learner based on aerial input.

**Figure 4.** Inference process from aerial input to ground-view feature generation.

**Figure 6.** Performance comparison with different numbers of identity conditions.

**Figure 7.** Detailed structures of different VRD mechanisms.

**Figure 10.** Performance with different timesteps τ under the G→A protocol.

**Figure 11.** Comparison of trainable parameters across existing baselines and …

**Figure 12.** Comparison of feature distributions with t-SNE [66] …

**Figure 13.** Rank list comparison among VDT, SD-ReID's stage 1, and SD-ReID's stage 2 on challenging examples. Green boxes indicate correct matches, while …

**Figure 14.** Visualization of activation maps and feature similarities. (a) and …

Abstract

Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic changes in camera viewpoints. The core idea behind these methods is quite natural, but designing a view-robust model is a very challenging task. Moreover, they overlook the contribution of view-specific features in enhancing the model's ability to represent persons. To address these issues, we propose a novel generative framework named SD-ReID for AG-ReID, which leverages generative models to mimic the feature distribution of different views while extracting robust identity representations. More specifically, we first train a ViT-based model to extract person representations along with controllable conditions, including identity and view conditions. We then fine-tune the Stable Diffusion (SD) model to enhance person representations guided by these controllable conditions. Furthermore, we introduce the View-Refined Decoder (VRD) to bridge the gap between instance-level and global-level features. Finally, both person representations and all-view features are employed to retrieve target persons. Extensive experiments on five AG-ReID benchmarks (i.e., CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR and G2APS-ReID) demonstrate the effectiveness of our proposed method. The source code and pre-trained models are available at https://github.com/924973292/SD-ReID.
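For readers less familiar with the diffusion side, the sketch below shows the standard DDPM-style noise-prediction objective [31] that conditional fine-tuning of a diffusion model typically minimizes: noise a clean sample x0 to timestep t, then penalize the squared error between the true noise and the model's prediction given a condition. It assumes a linear beta schedule and treats x0 as a plain feature vector; `eps_model` and `cond` are hypothetical placeholders for the conditioned denoiser and the identity/view conditions. It is not the authors' training code.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical denoiser signature: (noisy sample, timestep, condition) -> predicted noise.
Denoiser = Callable[[List[float], int, object], List[float]]


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) over the first t steps of a linear schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noisy_sample(x0: Sequence[float], t: int, eps: Sequence[float]) -> List[float]:
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]


def denoising_loss(x0: Sequence[float], cond: object, t: int,
                   eps_model: Denoiser, rng: random.Random) -> float:
    """Mean squared error between sampled Gaussian noise and the conditioned prediction."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = noisy_sample(x0, t, eps)
    pred = eps_model(xt, t, cond)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


# Toy usage: a trivial "denoiser" that always predicts zero noise, with a made-up condition.
zero_model: Denoiser = lambda xt, t, cond: [0.0] * len(xt)
loss = denoising_loss([0.3, -1.2, 0.7], cond={"identity": 5, "view": "aerial"},
                      t=500, eps_model=zero_model, rng=random.Random(0))
print(loss >= 0.0)  # True
```

In Stable Diffusion proper, x0 is a VAE latent and the denoiser is a conditioned U-Net; how SD-ReID adapts this setup to person features is specified in the paper, not in this sketch.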

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a diffusion-based view synthesis step to aerial-ground re-ID, but the gains are asserted without numbers or ablations in the visible sections.


The main thing here is a pipeline that extracts identity and view conditions with a ViT, fine-tunes Stable Diffusion to produce view-mimicking features, and adds a View-Refined Decoder to mix instance and global cues before retrieval. That combination is not just plugging diffusion into re-ID; it targets the specific mismatch between aerial and ground views by generating rather than only suppressing view differences. The code release is a plus for anyone who wants to test it directly on the five listed benchmarks. What stands out is the decision to keep view-specific information instead of forcing full invariance, which matches the practical needs of surveillance setups where both views matter. The abstract is internally consistent, and the steps follow logically from prior conditional diffusion work.

The soft spot is that the central performance claim rests on an experimental section that is not shown. There are no tables, no ablation on the diffusion fine-tuning, and no direct comparison showing that the generated features improve identity discrimination rather than add noise. Without those numbers it is hard to tell whether the extra machinery pays off or whether simpler view-augmentation baselines would do similar work. The assumption that the View-Refined Decoder cleanly integrates the two feature levels also needs failure cases to be convincing.

This is the kind of paper that belongs in a computer-vision venue focused on applied re-ID or generative vision. A reader already working on cross-view matching or on controllable diffusion for recognition tasks will find the architecture worth examining. It deserves serious refereeing because the problem is well-defined, the method is reproducible in principle, and the idea is a clear step beyond standard invariant-feature approaches, even if the current evidence is thin.

Referee Report

2 major / 2 minor

Summary. The paper proposes SD-ReID, a generative framework for Aerial-Ground Person Re-Identification (AG-ReID). It first trains a ViT-based model to extract identity and view conditions from person images. These conditions then guide fine-tuning of a Stable Diffusion model to mimic feature distributions across views. A View-Refined Decoder (VRD) is introduced to integrate instance-level and global-level features. The resulting person representations and all-view features are used together for retrieval. The authors assert that this approach improves robustness to viewpoint changes and report effectiveness on five AG-ReID benchmarks (CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR, G2APS-ReID), with code and models released publicly.

Significance. If the empirical claims hold, the work has moderate significance for AG-ReID by extending conditional diffusion models to synthesize view-specific features while preserving identity discrimination, moving beyond purely discriminative view-robust designs. The public release of source code and pre-trained models at the cited GitHub repository strengthens reproducibility and allows direct verification of the pipeline.

major comments (2)

[Abstract] The central claim that the method 'demonstrate[s] the effectiveness' on five benchmarks rests on experimental outcomes, yet the manuscript text supplies no quantitative results, performance tables, ablation studies, or error analysis. Without these, the improvement over prior discriminative models, which is load-bearing for the contribution, cannot be assessed.
[Method] View-Refined Decoder description: The VRD is asserted to successfully bridge instance-level and global-level features without introducing inconsistencies, but no architecture diagram, equations for feature fusion, or training objective for the decoder are provided. This leaves the weakest assumption unverified and directly affects whether the combined representations improve rather than degrade discrimination.

minor comments (2)

[Introduction] The transition in the introduction from limitations of prior work to the proposed generative approach could be tightened for clarity.
Notation for the controllable conditions (identity and view) extracted by the ViT could be formalized with explicit symbols to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to improve clarity and completeness while preserving the core contributions.


Referee: [Abstract] The central claim that the method 'demonstrate[s] the effectiveness' on five benchmarks rests on experimental outcomes, yet the manuscript text supplies no quantitative results, performance tables, ablation studies, or error analysis. Without these, the improvement over prior discriminative models, which is load-bearing for the contribution, cannot be assessed.

Authors: We agree that the abstract would be strengthened by including key quantitative results. The full manuscript contains detailed performance tables, ablation studies, and comparisons in the Experiments section. In the revision, we will update the abstract to summarize the main empirical gains (e.g., average Rank-1 improvements across the five benchmarks) so the effectiveness claim is directly supported. (revision: yes)
Referee: [Method] View-Refined Decoder description: The VRD is asserted to successfully bridge instance-level and global-level features without introducing inconsistencies, but no architecture diagram, equations for feature fusion, or training objective for the decoder are provided. This leaves the weakest assumption unverified and directly affects whether the combined representations improve rather than degrade discrimination.

Authors: We acknowledge that the current description of the View-Refined Decoder would benefit from additional technical detail. We will add an architecture diagram, explicit equations for the instance-to-global feature fusion, and the precise training objective for the decoder in the revised Method section. This will allow readers to verify that the fusion improves rather than degrades discrimination. (revision: yes)

Circularity Check

0 steps flagged

No significant circularity detected


The paper defines a procedural pipeline: train a ViT-based extractor for identity and view conditions, fine-tune Stable Diffusion under those conditions, insert a View-Refined Decoder, and combine instance- and global-level features for retrieval. All performance claims are obtained by standard supervised training and evaluation on five external public benchmarks (CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR, G2APS-ReID). No equation equates a claimed improvement to a fitted parameter by construction, no uniqueness theorem is imported from prior self-work, and no ansatz is smuggled via self-citation. The derivation therefore remains self-contained against external data and does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on one new architectural component and on standard deep-learning assumptions about diffusion-model fine-tuning; it introduces no free parameters beyond ordinary training hyperparameters and no invented entities beyond the decoder.

free parameters (1)

hyperparameters for ViT and Stable Diffusion fine-tuning
Standard training choices that are selected to optimize performance on the target benchmarks.

axioms (1)

Domain assumption: Stable Diffusion can be fine-tuned to synthesize view-specific feature distributions when conditioned on identity and view signals.
Invoked when describing the fine-tuning stage that mimics different camera viewpoints.

invented entities (1)

View-Refined Decoder (VRD) · no independent evidence
purpose: Bridge instance-level and global-level features
New module introduced to connect per-person and view-wide representations.

pith-pipeline@v0.9.0 · 5796 in / 1439 out tokens · 74387 ms · 2026-05-22T19:31:22.942046+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 7 internal anchors

[1]

Illumination-invariant person re-identification,

Y . Huang, Z.-J. Zha, X. Fu, and W. Zhang, “Illumination-invariant person re-identification,” inACMMM, 2019, pp. 365–373

work page 2019
[2]

Multi-scale learning for low-resolution person re-identification,

X. Li, W.-S. Zheng, X. Wang, T. Xiang, and S. Gong, “Multi-scale learning for low-resolution person re-identification,” inICCV, 2015, pp. 3765–3773

work page 2015
[3]

Adversarially occluded samples for person re-identification,

H. Huang, D. Li, Z. Zhang, X. Chen, and K. Huang, “Adversarially occluded samples for person re-identification,” inCVPR, 2018, pp. 5098–5107

work page 2018
[4]

Aerial-ground person re-id,

H. Nguyen, K. Nguyen, S. Sridharan, and C. Fookes, “Aerial-ground person re-id,” inICME, 2023, pp. 2585–2590

work page 2023
[5]

View-decoupled transformer for person re-identification under aerial-ground camera network,

Q. Zhang, L. Wang, V . M. Patel, X. Xie, and J. Lai, “View-decoupled transformer for person re-identification under aerial-ground camera network,” inCVPR, 2024, pp. 22 000–22 009

work page 2024
[6]

Ag-reid. v2: Bridging aerial and ground views for person re-identification,

H. Nguyen, K. Nguyen, S. Sridharan, and C. Fookes, “Ag-reid. v2: Bridging aerial and ground views for person re-identification,”TIFS, pp. 2896 – 2908, 2024

work page 2024
[7]

High- resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inCVPR, 2022, pp. 10 684–10 695

work page 2022
[8]

Diffusiondet: Diffusion model for object detection,

S. Chen, P. Sun, Y . Song, and P. Luo, “Diffusiondet: Diffusion model for object detection,” inICCV, 2023, pp. 19 830–19 843

work page 2023
[9]

A generalist framework for panoptic segmentation of images and videos,

T. Chen, L. Li, S. Saxena, G. Hinton, and D. J. Fleet, “A generalist framework for panoptic segmentation of images and videos,” inICCV, 2023, pp. 909–919

work page 2023
[10]

Deep metric learning for person re-identification,

D. Yi, Z. Lei, S. Liao, and S. Z. Li, “Deep metric learning for person re-identification,” inICPR, 2014, pp. 34–39

work page 2014
[11]

Omni-scale feature learning for person re-identification,

K. Zhou, Y . Yang, A. Cavallaro, and T. Xiang, “Omni-scale feature learning for person re-identification,” inICCV, 2019, pp. 3702–3712

work page 2019
[12]

Auto-reid: Searching for a part-aware convnet for person re-identification,

R. Quan, X. Dong, Y . Wu, L. Zhu, and Y . Yang, “Auto-reid: Searching for a part-aware convnet for person re-identification,” inICCV, 2019, pp. 3750–3759

work page 2019
[13]

Transreid: Transformer-based object re-identification,

S. He, H. Luo, P. Wang, F. Wang, H. Li, and W. Jiang, “Transreid: Transformer-based object re-identification,” inICCV, 2021, pp. 15 013– 15 022

work page 2021
[14]

Clip-reid: exploiting vision-language model for image re-identification without concrete text labels,

S. Li, L. Sun, and Q. Li, “Clip-reid: exploiting vision-language model for image re-identification without concrete text labels,” inAAAI, vol. 37, no. 1, 2023, pp. 1405–1413. IEEE TRANSACTIONS ON IMAGE PROCESSING 11

work page 2023
[15]

Rgb-infrared cross-modality person re-identification,

A. Wu, W.-S. Zheng, H.-X. Yu, S. Gong, and J. Lai, “Rgb-infrared cross-modality person re-identification,” inICCV, 2017, pp. 5380–5389

work page 2017
[16]

Hierarchical discriminative learning for visible thermal person re-identification,

M. Ye, X. Lan, J. Li, and P. Yuen, “Hierarchical discriminative learning for visible thermal person re-identification,” inAAAI, vol. 32, no. 1, 2018

work page 2018
[17]

Learning progressive modality-shared transformers for effective visible-infrared person re-identification,

H. Lu, X. Zou, and P. Zhang, “Learning progressive modality-shared transformers for effective visible-infrared person re-identification,” in AAAI, vol. 37, no. 2, 2023, pp. 1835–1843

work page 2023
[18]

Top-reid: Multi-spectral object re-identification with token permutation,

Y . Wang, X. Liu, P. Zhang, H. Lu, Z. Tu, and H. Lu, “Top-reid: Multi-spectral object re-identification with token permutation,” inAAAI, vol. 38, no. 6, 2024, pp. 5758–5766

work page 2024
[19]

Magic tokens: Select diverse tokens for multi-modal object re-identification,

P. Zhang, Y . Wang, Y . Liu, Z. Tu, and H. Lu, “Magic tokens: Select diverse tokens for multi-modal object re-identification,” inCVPR, 2024, pp. 17 117–17 126

work page 2024
[20]

Mam- bapro: Multi-modal object re-identification with mamba aggregation and synergistic prompt,

Y . Wang, X. Liu, T. Yan, Y . Liu, A. Zheng, P. Zhang, and H. Lu, “Mam- bapro: Multi-modal object re-identification with mamba aggregation and synergistic prompt,” inAAAI, vol. 39, no. 8, 2025, pp. 8150–8158

work page 2025
[21]

Decoupled feature-based mixture of experts for multi-modal object re-identification,

Y . Wang, Y . Liu, A. Zheng, and P. Zhang, “Decoupled feature-based mixture of experts for multi-modal object re-identification,” inAAAI, vol. 39, no. 8, 2025, pp. 8141–8149

work page 2025
[22]

Idea: Inverted text with cooper- ative deformable aggregation for multi-modal object re-identification,

Y . Wang, Y . Lv, P. Zhang, and H. Lu, “Idea: Inverted text with cooper- ative deformable aggregation for multi-modal object re-identification,” inCVPR, 2025, pp. 29 701–29 710

work page 2025
[23]

Secap: Self- calibrating and adaptive prompts for cross-view person re-identification in aerial-ground networks,

S. Wang, Y . Wang, R. Wu, B. Jiao, W. Wang, and P. Wang, “Secap: Self- calibrating and adaptive prompts for cross-view person re-identification in aerial-ground networks,” inCVPR, 2025, pp. 22 119–22 128

work page 2025
[24]

Cross-platform video person reid: A new benchmark dataset and adaptation approach,

S. Zhang, W. Luo, D. Cheng, Q. Yang, L. Ran, Y . Xing, and Y . Zhang, “Cross-platform video person reid: A new benchmark dataset and adaptation approach,” inECCV, 2024, pp. 270–287

work page 2024
[25]

Detreidx: A stress-test dataset for real-world uav-based person recognition,

K. A. Hambarde, N. Mbongo, P. K. MP, S. Mekewad, C. Fernandes, G. Silahtaro ˘glu, A. Nithya, P. Wasnik, M. Rashidunnabi, P. Samale et al., “Detreidx: A stress-test dataset for real-world uav-based person recognition,”arXiv preprint arXiv:2505.04793, 2025

work page arXiv 2025
[26]

Multi-modal multi-platform person re-identification: Benchmark and method,

R. Ha, S. Jiang, B. Li, B. Pan, Y . Zhu, J. Zhang, X. Zhu, S. Gong, and J. Wang, “Multi-modal multi-platform person re-identification: Benchmark and method,”arXiv preprint arXiv:2503.17096, 2025

work page arXiv 2025
[27]

Ag-vpreid: A challenging large-scale benchmark for aerial-ground video-based person re-identification,

H. Nguyen, K. Nguyen, A. Pemasiri, F. Liu, S. Sridharan, and C. Fookes, “Ag-vpreid: A challenging large-scale benchmark for aerial-ground video-based person re-identification,” inCVPR, 2025, pp. 1241–1251

work page 2025
[28]

Ag-vpreid. vir: Bridging aerial and ground platforms for video- based visible-infrared person re-id,

H. Nguyen, K. Nguyen, A. Pemasiri, A. Jahan, C. Fookes, and S. Srid- haran, “Ag-vpreid. vir: Bridging aerial and ground platforms for video- based visible-infrared person re-id,”arXiv preprint arXiv:2507.17995, 2025

work page arXiv 2025
[29]

Dynamic token selective transformer for aerial-ground person re-identification,

Y . Wang and M. Pishgar, “Dynamic token selective transformer for aerial-ground person re-identification,”arXiv preprint arXiv:2412.00433v2, 2024

work page arXiv 2024
[30]

Latex: Leveraging attribute- based text knowledge for aerial-ground person re-identification,

X. Hu, Y . Wang, P. Zhang, and H. Lu, “Latex: Leveraging attribute- based text knowledge for aerial-ground person re-identification,”arXiv preprint arXiv:2503.23722, 2025

work page arXiv 2025
[31]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” NeurIPS, vol. 33, pp. 6840–6851, 2020

work page 2020
[32]

Denoising Diffusion Implicit Models

J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[33]

Diffusion models beat gans on image synthesis,

P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,”NeurIPS, vol. 34, pp. 8780–8794, 2021

work page 2021
[34]

Classifier-Free Diffusion Guidance

J. Ho and T. Salimans, “Classifier-free diffusion guidance,”arXiv preprint arXiv:2207.12598, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[35]

Diffusevae: Efficient, controllable and high-fidelity generation from low-dimensional latents,

K. Pandey, A. Mukherjee, P. Rai, and A. Kumar, “Diffusevae: Efficient, controllable and high-fidelity generation from low-dimensional latents,” arXiv preprint arXiv:2201.00308, 2022

work page arXiv 2022
[36]

Photorealistic text-to-image diffusion models with deep language understanding,

C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans et al., “Photorealistic text-to-image diffusion models with deep language understanding,”NeurIPS, vol. 35, pp. 36 479–36 494, 2022

work page 2022
[37]

Hierarchical Text-Conditional Image Generation with CLIP Latents

A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with clip latents,”arXiv preprint arXiv:2204.06125, vol. 1, no. 2, p. 3, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[38]

Cascaded diffusion models for high fidelity image generation,

J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, and T. Salimans, “Cascaded diffusion models for high fidelity image generation,”JMLR, vol. 23, no. 47, pp. 1–33, 2022

work page 2022
[39]

Feature erasing and diffusion network for occluded person re-identification,

Z. Wang, F. Zhu, S. Tang, R. Zhao, L. He, and J. Song, “Feature erasing and diffusion network for occluded person re-identification,” inCVPR, 2022, pp. 4754–4763

work page 2022
[40]

Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

I. H. Kim, J. Lee, W. Jin, S. Son, K. Cho, J. Seo, M.-S. Kwak, S. Cho, J. Baek, B. Leeet al., “Pose-dive: Pose-diversified augmen- tation with diffusion model for person re-identification,”arXiv preprint arXiv:2406.16042, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[41]

Rethinking the inception architecture for computer vision,

C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” inCVPR, 2016, pp. 2818–2826

work page 2016
[42]

In Defense of the Triplet Loss for Person Re-Identification

A. Hermans, L. Beyer, and B. Leibe, “In defense of the triplet loss for person re-identification,”arXiv preprint arXiv:1703.07737, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[43]

Coarse-to-fine latent diffusion for pose-guided person image synthesis,

Y . Lu, M. Zhang, A. J. Ma, X. Xie, and J. Lai, “Coarse-to-fine latent diffusion for pose-guided person image synthesis,” inCVPR, 2024, pp. 6420–6429

work page 2024
[44]

Fastreid: A pytorch toolbox for general instance re-identification,

L. He, X. Liao, W. Liu, X. Liu, P. Cheng, and T. Mei, “Fastreid: A pytorch toolbox for general instance re-identification,” inACMMM, 2023, pp. 9664–9667

work page 2023
[45]

Learning part-based convolutional features for person re-identification,

Y . Sun, L. Zheng, Y . Li, Y . Yang, Q. Tian, and S. Wang, “Learning part-based convolutional features for person re-identification,”TPAMI, vol. 43, no. 3, pp. 902–917, 2019

work page 2019
[46]

Bag of tricks and a strong baseline for deep person re-identification,

H. Luo, Y . Gu, X. Liao, S. Lai, and W. Jiang, “Bag of tricks and a strong baseline for deep person re-identification,” inCVPR workshops, 2019, pp. 0–0

work page 2019
[47]

Learning discriminative features with multiple granularities for person re-identification,

G. Wang, Y . Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identification,” in ACMMM, 2018, pp. 274–282

work page 2018
[48]

A strong and efficient baseline for vehicle re-identification using deep triplet embedding,

R. Kumar, E. Weill, F. Aghdasi, and P. Sriram, “A strong and efficient baseline for vehicle re-identification using deep triplet embedding,” JAISCR, vol. 10, no. 1, pp. 27–45, 2020

work page 2020
[49]

Deep learning for person re-identification: A survey and outlook,

M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. Hoi, “Deep learning for person re-identification: A survey and outlook,”TPAMI, vol. 44, no. 6, pp. 2872–2893, 2021

work page 2021
[50]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[51]

Learning generalisable omni-scale representations for person re-identification,

K. Zhou, Y . Yang, A. Cavallaro, and T. Xiang, “Learning generalisable omni-scale representations for person re-identification,”TPAMI, vol. 44, no. 9, pp. 5056–5069, 2021

work page 2021
[52]

Unity is strength: Unifying convolutional and transformeral features for better person re- identification,

Y . Wang, P. Zhang, X. Liu, Z. Tu, and H. Lu, “Unity is strength: Unifying convolutional and transformeral features for better person re- identification,”TITS, 2025

work page 2025
[53]

Prototypical contrastive learning-based clip fine- tuning for object re-identification,

J. Li and X. Gong, “Prototypical contrastive learning-based clip fine- tuning for object re-identification,”arXiv preprint arXiv:2310.17218, 2023

work page arXiv 2023
[54]

Swin transformer: Hierarchical vision transformer using shifted windows,

Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inICCV, 2021, pp. 10 012–10 022

work page 2021
[55]

Deep high-resolution representation learning for visual recognition,

J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y . Zhao, D. Liu, Y . Mu, M. Tan, X. Wanget al., “Deep high-resolution representation learning for visual recognition,”TPAMI, vol. 43, no. 10, pp. 3349–3364, 2020

work page 2020
[56]

Swin transformer v2: Scaling up capacity and resolution,

Z. Liu, H. Hu, Y . Lin, Z. Yao, Z. Xie, Y . Wei, J. Ning, Y . Cao, Z. Zhang, L. Donget al., “Swin transformer v2: Scaling up capacity and resolution,” inCVPR, 2022, pp. 12 009–12 019

work page 2022
[57]

Ag-reid 2023: Aerial-ground person re-identification challenge results,

K. Nguyen, C. Fookes, S. Sridharan, F. Liu, X. Liu, A. Ross, D. Michal- ski, H. Nguyen, D. Deb, M. Kothariet al., “Ag-reid 2023: Aerial-ground person re-identification challenge results,” inIJCB, 2023, pp. 1–10

work page 2023
[58]

Enhancing visible- infrared person re-identification with modality-and instance-aware visual prompt learning,

R. Wu, B. Jiao, W. Wang, M. Liu, and P. Wang, “Enhancing visible- infrared person re-identification with modality-and instance-aware visual prompt learning,” inICMR, 2024, pp. 579–588

work page 2024
[59]

Ground-to-aerial person search: Benchmark dataset and approach,

S. Zhang, Q. Yang, D. Cheng, Y . Xing, G. Liang, P. Wang, and Y . Zhang, “Ground-to-aerial person search: Benchmark dataset and approach,” in ACM MM, 2023, pp. 789–799

work page 2023
[60]

Computational and performance aspects of pca-based face-recognition algorithms,

H. Moon and P. J. Phillips, “Computational and performance aspects of pca-based face-recognition algorithms,”Perception, vol. 30, no. 3, pp. 303–321, 2001

work page 2001
[61]

Scalable person re-identification: A benchmark,

L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” inICCV, 2015, pp. 1116–1124

work page 2015
[62]

Diffusers: State-of-the-art diffusion mod- els,

P. V on Platen, S. Patil, A. Lozhkov, P. Cuenca, N. Lambert, K. Rasul, M. Davaadorj, and T. Wolf, “Diffusers: State-of-the-art diffusion mod- els,” 2022

work page 2022
[63]

Random erasing data augmentation,

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y . Yang, “Random erasing data augmentation,” inAAAI, vol. 34, no. 07, 2020, pp. 13 001–13 008

work page 2020
[64]

Large-scale machine learning with stochastic gradient de- scent,

L. Bottou, “Large-scale machine learning with stochastic gradient de- scent,” inICCS, 2010, pp. 177–186

work page 2010
[65]

Adam: A Method for Stochastic Optimization

D. P. Kingma, “Adam: A method for stochastic optimization,”arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[66]

Visualizing data using t-sne

L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.”JMLR, vol. 9, no. 11, 2008

