Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Byeongwon Lee; Inès Hyeonsu Kim; JeongYeol Baek; JoungBin Lee; Junyoung Seo; Seokju Cho; Seungryong Kim; Soowon Son; Woojeong Jin

arXiv: 2406.16042 · v3 · submitted 2024-06-23 · 💻 cs.CV


Pith reviewed 2026-05-23 23:49 UTC · model grok-4.3

classification 💻 cs.CV

keywords: person re-identification · data augmentation · diffusion model · pose variation · viewpoint variation · SMPL model · bias reduction


The pith

A diffusion model conditioned on SMPL-derived poses and viewpoints augments Re-ID training data to reduce pose and camera bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Person re-identification models often overfit to the limited poses and camera angles present in standard training sets. Pose-dIVE generates new images by running a diffusion model whose inputs include both a target human pose and a target camera viewpoint, both derived from the SMPL body model. The generated images keep the original identity while introducing previously rare poses and viewpoints. Re-ID models trained on the enlarged set are then expected to rely on identity cues rather than on pose- or viewpoint-specific appearance patterns.

Core claim

By conditioning the diffusion model on both the human pose and camera viewpoint through the SMPL model, the framework generates augmented training data with diverse human poses and camera viewpoints so that existing Re-ID models learn features unbiased by these variations and generalize better to new camera systems.

What carries the argument

Diffusion model conditioned on SMPL pose and viewpoint parameters, used to synthesize new training images that preserve identity while varying only pose and viewpoint.

If this is right

Re-ID models trained on the augmented data learn features independent of pose and viewpoint.
Generalization improves on datasets collected from previously unseen camera setups.
The method outperforms prior data-augmentation techniques for Re-ID on standard benchmarks.
The training distribution gains explicit coverage of sparse pose and viewpoint combinations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conditioning approach could be tested on other recognition tasks where viewpoint or pose imbalance limits performance.
Targeted generation of rare poses might reduce reliance on large-scale real-world data collection for Re-ID.
Measuring the entropy of pose and viewpoint distributions before and after augmentation would quantify the claimed diversification effect.
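The entropy measurement suggested in the last item could be computed along the following lines. This is a minimal sketch, not part of the paper: it assumes a yaw angle (in degrees) has already been estimated for every image, for example from an SMPL fit, and bins the angles into eight hypothetical 45° sectors. The helper `histogram_entropy` and the example angles are illustrative only.

```python
import math
from collections import Counter


def histogram_entropy(angles_deg, num_bins=8):
    """Shannon entropy (in nats) of angles binned into equal sectors of [0, 360)."""
    width = 360.0 / num_bins
    counts = Counter(int((a % 360.0) // width) for a in angles_deg)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical yaw angles: a camera network dominated by front and back views.
original = [0, 5, 10, 175, 180, 185, 2, 178]
# The same images plus synthetic samples at rarely observed side and oblique views.
augmented = original + [45, 90, 135, 225, 270, 315]

# The last value is the maximum possible entropy for 8 bins (a uniform histogram).
print(histogram_entropy(original), histogram_entropy(augmented), math.log(8))
```

A value that moves toward log(8) after augmentation would indicate flatter viewpoint coverage; the same function applies unchanged to any other angle-valued descriptor binned this way.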

Load-bearing premise

The generated images must keep the original person's identity intact and change only pose and viewpoint without creating artifacts that Re-ID models can exploit as shortcuts.

What would settle it

The claim would be undermined if the same identity-verification network, applied to original-versus-generated image pairs, showed substantially higher identity-mismatch rates than on real same-identity pairs, or if Re-ID accuracy on pose-diverse test sets failed to rise after augmentation.
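The first test can be made concrete by computing a mismatch rate on two sets of same-identity pairs, real-versus-real and original-versus-generated, using one fixed encoder. The sketch below is an illustration under stated assumptions: the embeddings stand in for the output of some pretrained Re-ID encoder (not specified here), the cosine-similarity threshold of 0.5 is hypothetical, and `mismatch_rate` plus the synthetic data are illustrative, not measurements.

```python
import numpy as np


def mismatch_rate(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of same-identity pairs whose cosine similarity is below `threshold`.

    Row i of `emb_a` and row i of `emb_b` embed one pair, e.g. an original
    image and its generated counterpart.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cosine = np.sum(a * b, axis=1)
    return float(np.mean(cosine < threshold))


# Synthetic stand-ins for encoder outputs (hypothetical, not real data).
rng = np.random.default_rng(0)
anchor = rng.normal(size=(200, 64))
real_pairs = anchor + 0.1 * rng.normal(size=anchor.shape)  # small appearance change
generated = anchor + 2.0 * rng.normal(size=anchor.shape)   # heavy identity drift

real_rate = mismatch_rate(anchor, real_pairs)
gen_rate = mismatch_rate(anchor, generated)
print(f"real pairs: {real_rate:.2f}, generated pairs: {gen_rate:.2f}")
```

A generated-pair rate well above the real-pair rate is exactly the identity drift that the load-bearing premise rules out.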

Figures

Figures reproduced from arXiv: 2406.16042 by Byeongwon Lee, Inès Hyeonsu Kim, JeongYeol Baek, JoungBin Lee, Junyoung Seo, Seokju Cho, Seungryong Kim, Soowon Son, Woojeong Jin.

**Figure 1.** Pose-dIVE diversifies the viewpoint and human pose of the Re-ID dataset to help generalize and improve the performance of arbitrary Re-ID models.

**Figure 2.** Visualization of the effect of viewpoint and human pose augmentation. We compare visualizations of camera viewpoint and human pose distributions for the Market-1501 dataset (Zheng et al. 2015). The left figures (i) display the camera viewpoint distribution derived from SMPL, while the right figures (ii) illustrate the pose distribution. In (i), from left to right, we show the viewpoint distributions of the traini…

**Figure 3.** Pose-dIVE framework. Upon observing the highly biased viewpoint and human pose distributions in the training dataset, we augment the dataset by manipulating SMPL body shapes and feeding the rendered shapes into a generative model to fill in sparsely distributed poses and viewpoints. With this augmented dataset, we can train a Re-ID model that is robust to viewpoint and human pose biases.
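The caption does not say how target viewpoints are chosen beyond filling sparsely populated regions. One plausible rule, assumed here rather than taken from the paper, is inverse-frequency sampling over binned yaw angles: bins that are rare in the training set are drawn most often. The function `sample_sparse_bins`, the eight bins, and the smoothing constant `eps` are all hypothetical.

```python
import numpy as np


def sample_sparse_bins(train_yaw_deg, num_samples, num_bins=8, eps=1.0, seed=0):
    """Draw target yaw angles (degrees), favouring bins rare in the training set.

    Bin probabilities are proportional to 1 / (count + eps), so empty or
    sparsely populated bins are sampled most often.
    """
    rng = np.random.default_rng(seed)
    width = 360.0 / num_bins
    bins = (np.asarray(train_yaw_deg, dtype=float) % 360.0 // width).astype(int)
    counts = np.bincount(bins, minlength=num_bins)
    weights = 1.0 / (counts + eps)
    probs = weights / weights.sum()
    chosen = rng.choice(num_bins, size=num_samples, p=probs)
    # Uniform jitter inside each chosen bin yields a continuous target angle.
    return chosen * width + rng.uniform(0.0, width, size=num_samples)


# Hypothetical training set containing only front (0°) and back (180°) views.
train_yaw = [0.0] * 50 + [180.0] * 50
targets = sample_sparse_bins(train_yaw, num_samples=2000)
```

Inverse-frequency weighting is only one option; sampling uniformly from the empty bins, or from a density estimated on SMPL parameters rather than a single yaw angle, would serve the same purpose.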

**Figure 4.** Qualitative comparison. We compare our generated output with DPTN (Zhang et al. 2022), showing that Pose-dIVE can generate more realistic images while better preserving identity and accurately following the target pose.

**Figure 6.** Visualization of the split data. To validate the generalization power of our framework, we split the MSMT17 dataset into train/test sets using two distinct approaches: 1) splitting based on viewpoint, and 2) splitting based on human pose. The visualization clearly illustrates the separation between the train and test distributions.
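The caption does not give the split rule. As an illustration only, a viewpoint-based split could route images whose estimated yaw falls inside a frontal range to training and send all others to testing. The function `split_by_viewpoint` and the ±45° range are assumptions; a real Re-ID protocol would presumably also keep identities disjoint between the splits, which this sketch does not handle.

```python
import numpy as np


def split_by_viewpoint(yaw_deg, train_range=(-45.0, 45.0)):
    """Return (train_idx, test_idx) index arrays.

    Yaw is wrapped to [-180, 180); images with yaw inside `train_range`
    go to the training split and all other images to the test split.
    """
    yaw = (np.asarray(yaw_deg, dtype=float) + 180.0) % 360.0 - 180.0
    in_train = (yaw >= train_range[0]) & (yaw < train_range[1])
    return np.flatnonzero(in_train), np.flatnonzero(~in_train)


# Hypothetical per-image yaw estimates in degrees.
train_idx, test_idx = split_by_viewpoint([0, 30, 90, 180, -30, 350])
```

Here 350° wraps to -10° and lands in training, while the side (90°) and back (180°) views form the test split. A pose-based split would follow the same pattern with a pose descriptor in place of yaw.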

**Figure 5.** Qualitative results. Example images from the augmented MSMT17 and Market-1501 datasets demonstrate how the generated images preserve original identities while maintaining realism and consistency with the Re-ID dataset.

**Figure 7.** Overall architecture of the generative model in Pose-dIVE. Given the viewpoint and pose distributions, we first render the body shape sampled from the distribution using SMPL, generating the corresponding skeleton, depth map, and normal maps. These conditions, along with a reference image for identity preservation, are then fed into the generative module, which consists of two branches: the reference U-Net process…
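Figure 7 lists a skeleton among the SMPL-rendered conditions. The sketch below shows only why changing the camera viewpoint changes that condition: it rotates a few 3D joints about the vertical axis and projects them through a pinhole camera. The function name, focal length, camera distance, and image centre are hypothetical, and the actual pipeline renders full SMPL meshes into skeleton, depth, and normal maps rather than projecting a handful of points.

```python
import numpy as np


def project_joints(joints_3d, yaw_deg, focal=500.0, cam_dist=5.0, center=(64.0, 128.0)):
    """Project (J, 3) body-frame joints to (J, 2) pixel coordinates.

    Uses an OpenCV-style camera convention (x right, y down, z forward).
    The body is rotated by `yaw_deg` about the vertical y-axis, then placed
    `cam_dist` units in front of a pinhole camera.
    """
    t = np.deg2rad(yaw_deg)
    rot_y = np.array([[np.cos(t), 0.0, np.sin(t)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(t), 0.0, np.cos(t)]])
    cam = np.asarray(joints_3d, dtype=float) @ rot_y.T + np.array([0.0, 0.0, cam_dist])
    uv = focal * cam[:, :2] / cam[:, 2:3]  # perspective divide
    return uv + np.asarray(center)


# Two hypothetical shoulder joints, 0.4 units apart along x.
shoulders = np.array([[-0.2, 0.0, 0.0], [0.2, 0.0, 0.0]])
frontal = project_joints(shoulders, yaw_deg=0.0)
side = project_joints(shoulders, yaw_deg=90.0)
print(abs(frontal[1, 0] - frontal[0, 0]), abs(side[1, 0] - side[0, 0]))
```

The projected shoulder width is 40 pixels in the frontal view and collapses to roughly zero in the side view, so the same SMPL pose yields visibly different skeleton conditions under different viewpoints.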

**Figure 8.** Additional qualitative results. Examples of generated images from the Pose-dIVE augmented datasets. The results demonstrate realistic rendering while preserving the identity of the reference images and aligning accurately with the target poses.

**Figure 9.** Impact of the number of generated images per PID. Experiments are conducted on the Pose-dIVE augmented CUHK03 (L) dataset using the CLIP-ReID baseline.
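Figure 9 varies how many generated images are added per identity. A minimal sketch of assembling such a training list follows, assuming (hypothetically) that real and generated samples are stored as `(image_path, pid)` tuples. The function name, file names, and random-subsampling rule are illustrative assumptions, not the paper's procedure.

```python
import random
from collections import defaultdict


def build_augmented_index(real, generated, k, seed=0):
    """Return the real samples plus at most `k` generated samples per PID.

    `real` and `generated` are lists of (image_path, pid) tuples.
    """
    by_pid = defaultdict(list)
    for path, pid in generated:
        by_pid[pid].append((path, pid))
    rng = random.Random(seed)
    extra = []
    for pid in sorted(by_pid):
        pool = by_pid[pid]
        extra.extend(rng.sample(pool, min(k, len(pool))))  # subsample without replacement
    return list(real) + extra


# Hypothetical file names: two identities with five and two generated images.
real = [("r0.jpg", 0), ("r1.jpg", 1)]
generated = [(f"g0_{i}.jpg", 0) for i in range(5)] + [(f"g1_{i}.jpg", 1) for i in range(2)]
train_index = build_augmented_index(real, generated, k=3)
```

Sweeping `k` while holding the Re-ID model fixed reproduces the kind of ablation the figure describes; identities with fewer than `k` generated images simply contribute all of them.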

**Figure 10.** Example SMPL, skeleton, depth and normal maps from an external dataset. Examples of generated images from the Pose-dIVE augmented datasets. The results demonstrate realistic rendering while preserving the identity of the reference images and aligning accurately with the target poses.

Original abstract

Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems or environments. To overcome this, we propose Pose-dIVE, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment the training dataset to enable existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. By conditioning the diffusion model on both the human pose and camera viewpoint through the SMPL model, our framework generates augmented training data with diverse human poses and camera viewpoints. Experimental results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, circularity audit, and axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Pose-dIVE adds SMPL-conditioned diffusion for Re-ID pose diversity but leaves identity preservation unaddressed in the abstract.


The core idea is to condition a diffusion model on SMPL pose and viewpoint parameters so that it can synthesize additional training images that fill gaps in existing Re-ID datasets. This targets a real limitation: most public datasets have narrow pose and camera distributions, which hurts generalization to new viewpoints. The pipeline is a direct attempt to scale diversity without new camera rigs, a practical goal for anyone running Re-ID in the wild. The abstract does not attribute the combination of explicit SMPL conditioning and diffusion for this downstream task to prior work, so that framing counts as new even if the individual components are established. The paper also earns credit for framing its objective clearly around bias reduction rather than claiming generic augmentation gains.

The main soft spot is identity preservation. SMPL supplies 3D body parameters but carries no texture, clothing, or face information, so nothing in the conditioning described in the abstract prevents the diffusion model from altering appearance details. The abstract states that the generated data helps Re-ID models learn unbiased features, yet it makes no mention of reference-image conditioning, perceptual losses, or post-generation identity checks. If those controls are missing, any reported accuracy lift could come from the model exploiting new artifacts instead of true pose invariance. No quantitative results, baselines, or ablation numbers appear in the abstract, which makes it impossible to judge from the abstract alone whether the method beats standard augmentations or random pose jitter.

This work is aimed at Re-ID practitioners who already use generative augmentation and want to try an SMPL-driven variant. A reader who needs concrete numbers or identity safeguards will get limited value until the full experiments are checked. It deserves peer review because the problem is well posed and the proposed mechanism is testable; a referee can ask for the missing identity controls and the actual performance deltas.

Referee Report

2 major / 1 minor

Summary. The paper proposes Pose-dIVE, a data augmentation method for person re-identification that employs a diffusion model conditioned on human pose and camera viewpoint through the SMPL model. The goal is to generate training samples with diverse poses and viewpoints to mitigate bias in Re-ID models caused by limited diversity in existing datasets. The abstract states that experiments demonstrate the method's effectiveness relative to other augmentation-based Re-ID approaches.

Significance. If the generated images preserve source identity while varying only pose and viewpoint, the approach could provide a scalable way to diversify Re-ID training data and improve model generalization across camera systems. The choice to leverage SMPL for explicit 3D control is a reasonable technical direction for pose-conditioned generation.

major comments (2)

[Abstract] Abstract: The central claim requires that diffusion outputs retain source identity (clothing texture, facial details, appearance) while varying only pose and viewpoint. SMPL supplies 3D body parameters but encodes neither surface texture nor identity-specific cues; the manuscript describes no explicit identity-preserving mechanism such as reference-image cross-attention, perceptual loss, or feature-matching regularizer. This is load-bearing because without it the generated samples can introduce spurious identity cues that the downstream Re-ID model exploits, undermining the bias-reduction objective.
[Abstract] Abstract: The claim that 'experimental results demonstrate the effectiveness' is made without any quantitative results, baselines, controls, or metrics for identity preservation (e.g., Re-ID feature similarity before/after augmentation or comparison against standard augmentations). This prevents verification of whether gains exceed trivial augmentation or whether identity is actually preserved.

minor comments (1)

The abstract would be clearer if it included at least one key quantitative result (e.g., rank-1 accuracy improvement on a standard Re-ID benchmark) to support the effectiveness statement.
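Rank-1 accuracy, named in the comment above, and mean average precision (mAP) are the standard Re-ID retrieval metrics. A simplified evaluator is sketched below for reference; it is an illustration under stated assumptions, not the evaluation code used in the paper. It takes a precomputed query-by-gallery distance matrix and, unlike the standard Market-1501/MSMT17 protocol, does not discard same-camera gallery matches.

```python
import numpy as np


def rank1_and_map(dist, query_pids, gallery_pids):
    """Simplified Rank-1 accuracy and mean average precision (mAP).

    dist: (num_query, num_gallery) distance matrix, smaller = more similar.
    Queries with no matching identity in the gallery are skipped.
    """
    query_pids = np.asarray(query_pids)
    gallery_pids = np.asarray(gallery_pids)
    order = np.argsort(dist, axis=1)                      # gallery indices, nearest first
    matches = gallery_pids[order] == query_pids[:, None]  # (Q, G) boolean hit matrix
    valid = matches.any(axis=1)
    rank1 = float(np.mean(matches[valid, 0]))
    aps = []
    for row in matches[valid]:
        hit_ranks = np.flatnonzero(row)                   # 0-based ranks of true matches
        precision = np.arange(1, hit_ranks.size + 1) / (hit_ranks + 1)
        aps.append(precision.mean())                      # average precision for this query
    return rank1, float(np.mean(aps))


# Toy example: two queries, three gallery images (hypothetical values).
dist = np.array([[0.1, 0.5, 0.9],
                 [0.2, 0.1, 0.3]])
r1, mAP = rank1_and_map(dist, query_pids=[1, 2], gallery_pids=[1, 2, 1])
```

In the toy example both queries retrieve a correct match first, so Rank-1 is perfect, but the first query's second true match sits at rank 3, which pulls mAP below 1. This is why the two metrics are usually reported together.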

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and outline revisions to improve clarity and rigor.


Referee: [Abstract] Abstract: The central claim requires that diffusion outputs retain source identity (clothing texture, facial details, appearance) while varying only pose and viewpoint. SMPL supplies 3D body parameters but encodes neither surface texture nor identity-specific cues; the manuscript describes no explicit identity-preserving mechanism such as reference-image cross-attention, perceptual loss, or feature-matching regularizer. This is load-bearing because without it the generated samples can introduce spurious identity cues that the downstream Re-ID model exploits, undermining the bias-reduction objective.

Authors: We agree that explicit identity preservation is essential to ensure the augmentation varies only pose and viewpoint without introducing spurious cues. The abstract mentions only the SMPL-based pose and viewpoint conditioning and omits the identity-preserving component, in which a reference image is fed, together with the SMPL-rendered conditions, into a generative module with a reference U-Net branch (Figure 7). We will revise the abstract and method section to describe this identity-preservation strategy explicitly and add quantitative verification of identity retention. revision: yes
Referee: [Abstract] Abstract: The claim that 'experimental results demonstrate the effectiveness' is made without any quantitative results, baselines, controls, or metrics for identity preservation (e.g., Re-ID feature similarity before/after augmentation or comparison against standard augmentations). This prevents verification of whether gains exceed trivial augmentation or whether identity is actually preserved.

Authors: The abstract summarizes the experimental outcomes at a high level, while the full paper presents quantitative results, baselines, and comparisons in the experiments section. However, we acknowledge that the abstract lacks specific metrics for identity preservation. In the revision we will update the abstract to include key quantitative highlights and ensure identity-preservation metrics (such as feature similarity) are reported and discussed. revision: yes

Circularity Check

0 steps flagged

No significant circularity; generative pipeline evaluated externally


The paper describes a conditional diffusion pipeline for pose/viewpoint augmentation in Re-ID, with success measured by downstream model accuracy on held-out datasets rather than by internal consistency with its own outputs. No equations, fitted parameters, or predictions are presented that reduce by construction to the inputs (e.g., no self-definitional ratios or renamed empirical patterns). Self-citations, if present, are not load-bearing for any uniqueness claim. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the assumption that a pre-trained diffusion model can be steered by SMPL parameters without identity leakage and that the resulting images are distributionally useful for Re-ID training. No free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: SMPL parameters estimated from 2D images accurately encode human pose and camera viewpoint
The conditioning step presupposes that SMPL parameters extracted from source images faithfully represent the desired pose and viewpoint variations.
domain assumption: diffusion models can generate identity-preserving images when conditioned on SMPL parameters
The core generation step assumes the diffusion model respects identity while obeying the SMPL conditioning.

pith-pipeline@v0.9.0 · 5735 in / 1327 out tokens · 16786 ms · 2026-05-23T23:49:58.486802+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


By conditioning the diffusion model on both the human pose and camera viewpoint through the SMPL model, our framework generates augmented training data with diverse human poses and camera viewpoints.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


we leverage the knowledge of pre-trained large-scale diffusion models... reference U-Net... pose guider network

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval
cs.CV 2025-03 unverdicted novelty 7.0

Empirical study of a fully synthetic data generation pipeline for text-based person retrieval that tests its use as a replacement or augmentation for real data across scenarios.
SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
cs.CV 2025-04 unverdicted novelty 6.0

SD-ReID trains a ViT to extract identity and view conditions, fine-tunes Stable Diffusion to generate view-mimicking features, adds a View-Refined Decoder, and combines both identity and all-view features for retrieva...
ID-Sim: An Identity-Focused Similarity Metric
cs.CV 2026-04 unverdicted novelty 5.0

ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retri...

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · cited by 3 Pith papers · 4 internal anchors

[3]

Bak, S.; Zaidenberg, S.; Boulay, B.; and Bremond, F. 2014. Improving person re-identification by viewpoint cues. In 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 175--180. IEEE

work page 2014
[4]

K.; Khan, S.; Cholakkal, H.; Anwer, R

Bhunia, A. K.; Khan, S.; Cholakkal, H.; Anwer, R. M.; Laaksonen, J.; Shah, M.; and Khan, F. S. 2023. Person image synthesis via denoising diffusion model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5968--5976

work page 2023
[5]

Chan, C.; Ginosar, S.; Zhou, T.; and Efros, A. A. 2019. Everybody dance now. In Proceedings of the IEEE/CVF international conference on computer vision, 5933--5942

work page 2019
[6]

Chen, W.; Xu, X.; Jia, J.; Luo, H.; Wang, Y.; Wang, F.; Jin, R.; and Sun, X. 2023. Beyond appearance: a semantic controllable self-supervised learning framework for human-centric visual tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15050--15061

work page 2023
[7]

Chen, X.; Fu, C.; Zhao, Y.; Zheng, F.; Song, J.; Ji, R.; and Yang, Y. 2020. Salience-guided cascaded suppression network for person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 3300--3310

work page 2020
[8]

Chen, Y.-C.; Zhu, X.; Zheng, W.-S.; and Lai, J.-H. 2018. Person Re-Identification by Camera Correlation Aware Feature Augmentation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 40(2)

work page 2018
[9]

Cho, Y.-J.; and Yoon, K.-J. 2016. Improving person re-identification via Pose-aware Multi-shot Matching. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, 1354--1362. IEEE Computer Society and the Computer Vision Foundation (CVF)

work page 2016
[10]

Co s ar, S.; and Bellotto, N. 2020. Human Re-identification with a robot thermal camera using entropy-based sampling. Journal of Intelligent & Robotic Systems, 98(1): 85--102

work page 2020
[11]

Dai, Z.; Chen, M.; Gu, X.; Zhu, S.; and Tan, P. 2019. Batch DropBlock Network for Person Re-Identification and Beyond. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 3690--3700. IEEE Computer Society

work page 2019
[12]

Ding, C.; Wang, K.; Wang, P.; and Tao, D. 2020. Multi-task learning with coarse priors for robust part-aware person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3): 1474--1488

work page 2020
[13]

Fu, D.; Chen, D.; Bao, J.; Yang, H.; Yuan, L.; Zhang, L.; Li, H.; and Chen, D. 2021. Unsupervised pre-training for person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 14750--14759

work page 2021
[14]

Ge, Y.; Li, Z.; Zhao, H.; Yin, G.; Yi, S.; Wang, X.; et al. 2018. Fd-gan: Pose-guided feature distilling gan for robust person re-identification. Advances in neural information processing systems, 31

work page 2018
[16]

Gong, Y.; Huang, L.; and Chen, L. 2021 b . Eliminate deviation with deviation for data augmentation and a general multi-modal data learning method. arXiv preprint arXiv:2101.08533

work page arXiv 2021
[17]

Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2020. Generative adversarial networks. Communications of the ACM, 63(11): 139--144

work page 2020
[18]

Gu, J.; Wang, K.; Luo, H.; Chen, C.; Jiang, W.; Fang, Y.; Zhang, S.; You, Y.; and Zhao, J. 2023. Msinet: Twins contrastive search of multi-scale interaction for object reid. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19243--19253

work page 2023
[19]

Han, X.; Zhu, X.; Deng, J.; Song, Y.-Z.; and Xiang, T. 2023. Controllable person image synthesis with pose-constrained latent diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 22768--22777

work page 2023
[20]

He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; and Jiang, W. 2021 a . TransReID: Transformer-Based Object Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 15013--15022

work page 2021
[21]

He, T.; Jin, X.; Shen, X.; Huang, J.; Chen, Z.; and Hua, X.-S. 2021 b . Dense interaction learning for video-based person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 1490--1501

work page 2021
[22]

M.; K \"o stinger, M.; and Bischof, H

Hirzer, M.; Roth, P. M.; K \"o stinger, M.; and Bischof, H. 2012. Relaxed pairwise learned metric for person re-identification. In Computer Vision--ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12, 780--793. Springer

work page 2012
[23]

Ho, J.; Jain, A.; and Abbeel, P. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33: 6840--6851

work page 2020
[24]

Hoffer, E.; and Ailon, N. 2015. Deep metric learning using triplet network. In Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3, 84--92. Springer

work page 2015
[25]

Hu, L.; Gao, X.; Zhang, P.; Sun, K.; Zhang, B.; and Bo, L. 2023. Animate anyone: Consistent and controllable image-to-video synthesis for character animation. arXiv preprint arXiv:2311.17117

work page arXiv 2023
[26]

Huang, H.; Li, D.; Zhang, Z.; Chen, X.; and Huang, K. 2018. Adversarially Occluded Samples for Person Re-identification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5098--5107. IEEE Computer Society

work page 2018
[27]

Jin, X.; Lan, C.; Zeng, W.; Wei, G.; and Chen, Z. 2020. Semantics-aligned representation learning for person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 11173--11180

work page 2020
[28]

Kapil, S. 2021. Locally Aware Transformer for Person Re-Identification. Master's thesis, University of Maryland, Baltimore County

work page 2021
[29]

Karanam, S.; Li, Y.; and Radke, R. J. 2015. Person Re-Identification with Discriminatively Trained Viewpoint Invariant Dictionaries. In 2015 IEEE International Conference on Computer Vision (ICCV), 4516--4524. IEEE

work page 2015
[30]

Karras, J.; Holynski, A.; Wang, T.-C.; and Kemelmacher-Shlizerman, I. 2023. Dreampose: Fashion image-to-video synthesis via stable diffusion. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 22623--22633. IEEE

work page 2023
[31]

Adam: A Method for Stochastic Optimization

Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2014
[32]

M.; and Bischof, H

Koestinger, M.; Hirzer, M.; Wohlhart, P.; Roth, P. M.; and Bischof, H. 2012. Large scale metric learning from equivalence constraints. In 2012 IEEE conference on computer vision and pattern recognition, 2288--2295. IEEE

work page 2012
[33]

Li, S.; Sun, L.; and Li, Q. 2023. CLIP-ReID: exploiting vision-language model for image re-identification without concrete text labels. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 1405--1413

work page 2023
[34]

Li, W.; Zhao, R.; Xiao, T.; and Wang, X. 2014. Deepreid: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, 152--159

work page 2014
[35]

Liao, S.; and Li, S. Z. 2015. Efficient psd constrained asymmetric metric learning for person re-identification. In Proceedings of the IEEE international conference on computer vision, 3685--3693

work page 2015
[36]

Liu, J.; Ni, B.; Yan, Y.; Zhou, P.; Cheng, S.; and Hu, J. 2018. Pose Transferrable Person Re-identification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE

work page 2018
[37]

Liu, X.; Song, M.; Tao, D.; Zhou, X.; Chen, C.; and Bu, J. 2014. Semi-supervised Coupled Dictionary Learning for Person Re-identification. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3550--3557. IEEE Computer Society

work page 2014
[38]

Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; and Black, M. J. 2015. SMPL: A Skinned Multi-Person Linear Model. Acm Transactions on Graphics, 34(Article 248)

work page 2015
[39]

Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; and Black, M. J. 2023. SMPL: A skinned multi-person linear model. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 851--866

work page 2023
[40]

Luo, C.; Song, C.; and Zhang, Z. 2020. Generalizing person re-identification by camera-aware invariance learning and cross-domain mixup. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XV 16, 224--241. Springer

work page 2020
[41]

Luo, H.; Jiang, W.; Gu, Y.; Liu, F.; Liao, X.; Lai, S.; and Gu, J. 2019. A strong baseline and batch normneuralization neck for deep person reidentification. arXiv preprint arXiv:1906.08332

work page arXiv 2019
[42]

M.; and Miller, P

McLaughlin, N.; Del Rincon, J. M.; and Miller, P. 2015. Data-augmentation for reducing dataset bias in person re-identification. In 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 1--6. IEEE Computer Society

work page 2015
[43]

Ni, X.; and Rahtu, E. 2021. Flipreid: closing the gap between training and inference in person re-identification. In 2021 9th European Workshop on Visual Information Processing (EUVIP), 1--6. IEEE

work page 2021
[44]

Qian, X.; Fu, Y.; Xiang, T.; Wang, W.; Qiu, J.; Wu, Y.; Jiang, Y.-G.; and Xue, X. 2018. Pose-normalized image generation for person re-identification. In Proceedings of the European conference on computer vision (ECCV), 650--667

work page 2018
[45]

W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al

Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning, 8748--8763. PMLR

work page 2021
[46]

Rao, Y.; Chen, G.; Lu, J.; and Zhou, J. 2021. Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1025--1034

work page 2021
[47]

Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; and Ommer, B. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 10684--10695

work page 2022
[48]

Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention--MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, 234--241. Springer

work page 2015
[49] Sarfraz, M. S.; Schumann, A.; Eberle, A.; and Stiefelhagen, R. 2018. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking. In Proceedings of the IEEE conference on computer vision and pattern recognition, 420--429

[50] Somers, V.; De Vleeschouwer, C.; and Alahi, A. 2023. Body Part-Based Representation Learning for Occluded Person Re-Identification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1613--1623

[51] Song, J.; Meng, C.; and Ermon, S. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502

[52] Tang, H.; Bai, S.; Zhang, L.; Torr, P. H.; and Sebe, N. 2020. Xinggan for person image generation. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXV 16, 717--734. Springer

[53] Van der Maaten, L.; and Hinton, G. 2008. Visualizing data using t-SNE. Journal of machine learning research, 9(11)

[54] Wang, G.; Lai, J.; Huang, P.; and Xie, X. 2019. Spatial-temporal person re-identification. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 8933--8940

[55] Wang, T.; Liu, H.; Song, P.; Guo, T.; and Shi, W. 2022. Pose-guided feature disentangling for occluded person re-identification based on transformer. In Proceedings of the AAAI conference on artificial intelligence, volume 36, 2540--2549

[56] Wang, X. 2013. Intelligent multi-camera video surveillance: A review. Pattern recognition letters, 34(1): 3--19

[57] Wei, L.; Zhang, S.; Gao, W.; and Tian, Q. 2018. Person transfer gan to bridge domain gap for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, 79--88

[58] Wieczorek, M.; Rychalska, B.; and Dabrowski, J. 2021. On the unreasonable effectiveness of centroids in image retrieval. In Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8--12, 2021, Proceedings, Part IV 28, 212--223. Springer

[59] Xiong, F.; Gou, M.; Camps, O.; and Sznaier, M. 2014. Person re-identification using kernel-based metric learning methods. In Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII 13, 1--16. Springer

[60] Xu, Z.; Zhang, J.; Liew, J. H.; Yan, H.; Liu, J.-W.; Zhang, C.; Feng, J.; and Shou, M. Z. 2023. Magicanimate: Temporally consistent human image animation using diffusion model. arXiv preprint arXiv:2311.16498

[61] Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; and Hoi, S. C. 2021. Deep learning for person re-identification: A survey and outlook. IEEE transactions on pattern analysis and machine intelligence, 44(6): 2872--2893

[62] Yu, H.-X.; Wu, A.; and Zheng, W.-S. 2018. Unsupervised person re-identification by deep asymmetric metric embedding. IEEE transactions on pattern analysis and machine intelligence, 42(4): 956--973

[63] Yu, J.; Li, X.; Koh, J. Y.; Zhang, H.; Pang, R.; Qin, J.; Ku, A.; Xu, Y.; Baldridge, J.; and Wu, Y. 2021. Vector-quantized image modeling with improved vqgan. arXiv preprint arXiv:2110.04627

[64] Zablotskaia, P.; Siarohin, A.; Zhao, B.; and Sigal, L. 2019. Dwnet: Dense warp-based network for pose-guided human video generation. arXiv preprint arXiv:1910.09139

[65] Zang, X.; Li, G.; Gao, W.; and Shu, X. 2021. Learning to disentangle scenes for person re-identification. Image and Vision Computing, 116: 104330

[66] Zhang, L.; Rao, A.; and Agrawala, M. 2023. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3836--3847

[67] Zhang, P.; Yang, L.; Lai, J.-H.; and Xie, X. 2022. Exploring dual-task correlation for pose guided person image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7713--7722

[68] Zhao, H.; Tian, M.; Sun, S.; Shao, J.; Yan, J.; Yi, S.; Wang, X.; and Tang, X. 2017. Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE

[69] Zhao, R.; Ouyang, W.; and Wang, X. 2013. Unsupervised Salience Learning for Person Re-identification. In 2013 IEEE Conference on Computer Vision and Pattern Recognition

[70] Zheng, L.; Bie, Z.; Sun, Y.; Wang, J.; Su, C.; Wang, S.; and Tian, Q. 2016. Mars: A video benchmark for large-scale person re-identification. In Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI 14, 868--884. Springer

[71] Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; and Tian, Q. 2015. Scalable Person Re-identification: A Benchmark. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1116--1124

[72] Zheng, L.; Yang, Y.; and Hauptmann, A. G. 2016. Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984

[73] Zheng, L.; Yang, Y.; and Tian, Q. 2017. SIFT meets CNN: A decade survey of instance retrieval. IEEE transactions on pattern analysis and machine intelligence, 40(5): 1224--1244

[74] Zheng, W.-S.; Gong, S.; and Xiang, T. 2011. Person re-identification by probabilistic relative distance comparison. In CVPR 2011, 649--656. IEEE

[75] Zheng, Z.; Yang, X.; Yu, Z.; Zheng, L.; Yang, Y.; and Kautz, J. 2019. Joint discriminative and generative learning for person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2138--2147

[76] Zheng, Z.; Zheng, L.; and Yang, Y. 2017. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE

[77] Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; and Yang, Y. 2020. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 13001--13008

[78] Zhong, Z.; Zheng, L.; Zheng, Z.; Li, S.; and Yang, Y. 2018. Camera Style Adaptation for Person Re-identification. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

[79] Zhu, K.; Guo, H.; Zhang, S.; Wang, Y.; Liu, J.; Wang, J.; and Tang, M. 2023. Aaformer: Auto-aligned transformer for person re-identification. IEEE Transactions on Neural Networks and Learning Systems

[80] Zhu, S.; Chen, J. L.; Dai, Z.; Xu, Y.; Cao, X.; Yao, Y.; Zhu, H.; and Zhu, S. 2024. Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance. arXiv preprint arXiv:2403.14781

work page arXiv 2024

[1] [1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Bak, S.; Zaidenberg, S.; Boulay, B.; and Bremond, F. 2014. Improving person re-identification by viewpoint cues. In 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 175--180. IEEE

work page 2014

[4] [4]

K.; Khan, S.; Cholakkal, H.; Anwer, R

Bhunia, A. K.; Khan, S.; Cholakkal, H.; Anwer, R. M.; Laaksonen, J.; Shah, M.; and Khan, F. S. 2023. Person image synthesis via denoising diffusion model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5968--5976

work page 2023

[5] [5]

Chan, C.; Ginosar, S.; Zhou, T.; and Efros, A. A. 2019. Everybody dance now. In Proceedings of the IEEE/CVF international conference on computer vision, 5933--5942

work page 2019

[6] [6]

Chen, W.; Xu, X.; Jia, J.; Luo, H.; Wang, Y.; Wang, F.; Jin, R.; and Sun, X. 2023. Beyond appearance: a semantic controllable self-supervised learning framework for human-centric visual tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15050--15061

work page 2023

[7] [7]

Chen, X.; Fu, C.; Zhao, Y.; Zheng, F.; Song, J.; Ji, R.; and Yang, Y. 2020. Salience-guided cascaded suppression network for person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 3300--3310

work page 2020

[8] [8]

Chen, Y.-C.; Zhu, X.; Zheng, W.-S.; and Lai, J.-H. 2018. Person Re-Identification by Camera Correlation Aware Feature Augmentation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 40(2)

work page 2018

[9] [9]

Cho, Y.-J.; and Yoon, K.-J. 2016. Improving person re-identification via Pose-aware Multi-shot Matching. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, 1354--1362. IEEE Computer Society and the Computer Vision Foundation (CVF)

work page 2016

[10] [10]

Co s ar, S.; and Bellotto, N. 2020. Human Re-identification with a robot thermal camera using entropy-based sampling. Journal of Intelligent & Robotic Systems, 98(1): 85--102

work page 2020

[11] [11]

Dai, Z.; Chen, M.; Gu, X.; Zhu, S.; and Tan, P. 2019. Batch DropBlock Network for Person Re-Identification and Beyond. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 3690--3700. IEEE Computer Society

work page 2019

[12] [12]

Ding, C.; Wang, K.; Wang, P.; and Tao, D. 2020. Multi-task learning with coarse priors for robust part-aware person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3): 1474--1488

work page 2020

[13] [13]

Fu, D.; Chen, D.; Bao, J.; Yang, H.; Yuan, L.; Zhang, L.; Li, H.; and Chen, D. 2021. Unsupervised pre-training for person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 14750--14759

work page 2021

[14] [14]

Ge, Y.; Li, Z.; Zhao, H.; Yin, G.; Yi, S.; Wang, X.; et al. 2018. Fd-gan: Pose-guided feature distilling gan for robust person re-identification. Advances in neural information processing systems, 31

work page 2018

[15] [16]

Gong, Y.; Huang, L.; and Chen, L. 2021 b . Eliminate deviation with deviation for data augmentation and a general multi-modal data learning method. arXiv preprint arXiv:2101.08533

work page arXiv 2021

[16] [17]

Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2020. Generative adversarial networks. Communications of the ACM, 63(11): 139--144

work page 2020

[17] [18]

Gu, J.; Wang, K.; Luo, H.; Chen, C.; Jiang, W.; Fang, Y.; Zhang, S.; You, Y.; and Zhao, J. 2023. Msinet: Twins contrastive search of multi-scale interaction for object reid. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19243--19253

work page 2023

[18] [19]

Han, X.; Zhu, X.; Deng, J.; Song, Y.-Z.; and Xiang, T. 2023. Controllable person image synthesis with pose-constrained latent diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 22768--22777

work page 2023

[19] [20]

He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; and Jiang, W. 2021 a . TransReID: Transformer-Based Object Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 15013--15022

work page 2021

[20] [21]

He, T.; Jin, X.; Shen, X.; Huang, J.; Chen, Z.; and Hua, X.-S. 2021 b . Dense interaction learning for video-based person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 1490--1501

work page 2021

[21] [22]

M.; K \"o stinger, M.; and Bischof, H

Hirzer, M.; Roth, P. M.; K \"o stinger, M.; and Bischof, H. 2012. Relaxed pairwise learned metric for person re-identification. In Computer Vision--ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12, 780--793. Springer

work page 2012

[22] [23]

Ho, J.; Jain, A.; and Abbeel, P. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33: 6840--6851

work page 2020

[23] [24]

Hoffer, E.; and Ailon, N. 2015. Deep metric learning using triplet network. In Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3, 84--92. Springer

work page 2015

[24] [25]

Hu, L.; Gao, X.; Zhang, P.; Sun, K.; Zhang, B.; and Bo, L. 2023. Animate anyone: Consistent and controllable image-to-video synthesis for character animation. arXiv preprint arXiv:2311.17117

work page arXiv 2023

[25] [26]

Huang, H.; Li, D.; Zhang, Z.; Chen, X.; and Huang, K. 2018. Adversarially Occluded Samples for Person Re-identification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5098--5107. IEEE Computer Society

work page 2018

[26] [27]

Jin, X.; Lan, C.; Zeng, W.; Wei, G.; and Chen, Z. 2020. Semantics-aligned representation learning for person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 11173--11180

work page 2020

[27] [28]

Kapil, S. 2021. Locally Aware Transformer for Person Re-Identification. Master's thesis, University of Maryland, Baltimore County

work page 2021

[28] [29]

Karanam, S.; Li, Y.; and Radke, R. J. 2015. Person Re-Identification with Discriminatively Trained Viewpoint Invariant Dictionaries. In 2015 IEEE International Conference on Computer Vision (ICCV), 4516--4524. IEEE

work page 2015

[29] [30]

Karras, J.; Holynski, A.; Wang, T.-C.; and Kemelmacher-Shlizerman, I. 2023. Dreampose: Fashion image-to-video synthesis via stable diffusion. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 22623--22633. IEEE

work page 2023

[30] [31]

Adam: A Method for Stochastic Optimization

Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2014

[31] [32]

M.; and Bischof, H

Koestinger, M.; Hirzer, M.; Wohlhart, P.; Roth, P. M.; and Bischof, H. 2012. Large scale metric learning from equivalence constraints. In 2012 IEEE conference on computer vision and pattern recognition, 2288--2295. IEEE

work page 2012

[32] [33]

Li, S.; Sun, L.; and Li, Q. 2023. CLIP-ReID: exploiting vision-language model for image re-identification without concrete text labels. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 1405--1413

work page 2023

[33] [34]

Li, W.; Zhao, R.; Xiao, T.; and Wang, X. 2014. Deepreid: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, 152--159

work page 2014

[34] [35]

Liao, S.; and Li, S. Z. 2015. Efficient psd constrained asymmetric metric learning for person re-identification. In Proceedings of the IEEE international conference on computer vision, 3685--3693

work page 2015

[35] [36]

Liu, J.; Ni, B.; Yan, Y.; Zhou, P.; Cheng, S.; and Hu, J. 2018. Pose Transferrable Person Re-identification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE

work page 2018

[36] [37]

Liu, X.; Song, M.; Tao, D.; Zhou, X.; Chen, C.; and Bu, J. 2014. Semi-supervised Coupled Dictionary Learning for Person Re-identification. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3550--3557. IEEE Computer Society

work page 2014

[37] [38]

Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; and Black, M. J. 2015. SMPL: A Skinned Multi-Person Linear Model. Acm Transactions on Graphics, 34(Article 248)

work page 2015

[38] [39]

Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; and Black, M. J. 2023. SMPL: A skinned multi-person linear model. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 851--866

work page 2023

[39] [40]

Luo, C.; Song, C.; and Zhang, Z. 2020. Generalizing person re-identification by camera-aware invariance learning and cross-domain mixup. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XV 16, 224--241. Springer

work page 2020

[40] [41]

Luo, H.; Jiang, W.; Gu, Y.; Liu, F.; Liao, X.; Lai, S.; and Gu, J. 2019. A strong baseline and batch normneuralization neck for deep person reidentification. arXiv preprint arXiv:1906.08332

work page arXiv 2019

[41] [42]

M.; and Miller, P

McLaughlin, N.; Del Rincon, J. M.; and Miller, P. 2015. Data-augmentation for reducing dataset bias in person re-identification. In 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 1--6. IEEE Computer Society

work page 2015

[42] [43]

Ni, X.; and Rahtu, E. 2021. Flipreid: closing the gap between training and inference in person re-identification. In 2021 9th European Workshop on Visual Information Processing (EUVIP), 1--6. IEEE

work page 2021

[43] [44]

Qian, X.; Fu, Y.; Xiang, T.; Wang, W.; Qiu, J.; Wu, Y.; Jiang, Y.-G.; and Xue, X. 2018. Pose-normalized image generation for person re-identification. In Proceedings of the European conference on computer vision (ECCV), 650--667

work page 2018

[44] [45]

W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al

Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning, 8748--8763. PMLR

work page 2021

[45] [46]

Rao, Y.; Chen, G.; Lu, J.; and Zhou, J. 2021. Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1025--1034

work page 2021

[46] [47]

Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; and Ommer, B. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 10684--10695

work page 2022

[47] [48]

Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention--MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, 234--241. Springer

work page 2015

[48] [49]

S.; Schumann, A.; Eberle, A.; and Stiefelhagen, R

Sarfraz, M. S.; Schumann, A.; Eberle, A.; and Stiefelhagen, R. 2018. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking. In Proceedings of the IEEE conference on computer vision and pattern recognition, 420--429

work page 2018

[49] [50]

Somers, V.; De Vleeschouwer, C.; and Alahi, A. 2023. Body Part-Based Representation Learning for Occluded Person Re-Identification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1613--1623

work page 2023

[50] [51]

Song, J.; Meng, C.; and Ermon, S. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502

work page internal anchor Pith review Pith/arXiv arXiv 2020

[51] [52]

H.; and Sebe, N

Tang, H.; Bai, S.; Zhang, L.; Torr, P. H.; and Sebe, N. 2020. Xinggan for person image generation. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXV 16, 717--734. Springer

work page 2020

[52] [53]

Van der Maaten, L.; and Hinton, G. 2008. Visualizing data using t-SNE. Journal of machine learning research, 9(11)

work page 2008

[53] [54]

Wang, G.; Lai, J.; Huang, P.; and Xie, X. 2019. Spatial-temporal person re-identification. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 8933--8940

work page 2019

[54] [55]

Wang, T.; Liu, H.; Song, P.; Guo, T.; and Shi, W. 2022. Pose-guided feature disentangling for occluded person re-identification based on transformer. In Proceedings of the AAAI conference on artificial intelligence, volume 36, 2540--2549

work page 2022

[55] [56]

Wang, X. 2013. Intelligent multi-camera video surveillance: A review. Pattern recognition letters, 34(1): 3--19

work page 2013

[56] [57]

Wei, L.; Zhang, S.; Gao, W.; and Tian, Q. 2018. Person transfer gan to bridge domain gap for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, 79--88

work page 2018

[57] [58]

Wieczorek, M.; Rychalska, B.; and Dabrowski, J. 2021. On the unreasonable effectiveness of centroids in image retrieval. In Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8--12, 2021, Proceedings, Part IV 28, 212--223. Springer

work page 2021

[58] [59]

Xiong, F.; Gou, M.; Camps, O.; and Sznaier, M. 2014. Person re-identification using kernel-based metric learning methods. In Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII 13, 1--16. Springer

work page 2014

[59] [60]

H.; Yan, H.; Liu, J.-W.; Zhang, C.; Feng, J.; and Shou, M

Xu, Z.; Zhang, J.; Liew, J. H.; Yan, H.; Liu, J.-W.; Zhang, C.; Feng, J.; and Shou, M. Z. 2023. Magicanimate: Temporally consistent human image animation using diffusion model. arXiv preprint arXiv:2311.16498

work page arXiv 2023

[60] [61]

Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; and Hoi, S. C. 2021. Deep learning for person re-identification: A survey and outlook. IEEE transactions on pattern analysis and machine intelligence, 44(6): 2872--2893

work page 2021

[61] [62]

Yu, H.-X.; Wu, A.; and Zheng, W.-S. 2018. Unsupervised person re-identification by deep asymmetric metric embedding. IEEE transactions on pattern analysis and machine intelligence, 42(4): 956--973

work page 2018

[62] [63]

Vector-quantized Image Modeling with Improved VQGAN

Yu, J.; Li, X.; Koh, J. Y.; Zhang, H.; Pang, R.; Qin, J.; Ku, A.; Xu, Y.; Baldridge, J.; and Wu, Y. 2021. Vector-quantized image modeling with improved vqgan. arXiv preprint arXiv:2110.04627

work page internal anchor Pith review Pith/arXiv arXiv 2021

[63] [64]

Zablotskaia, P.; Siarohin, A.; Zhao, B.; and Sigal, L. 2019. Dwnet: Dense warp-based network for pose-guided human video generation. arXiv preprint arXiv:1910.09139

work page arXiv 2019

[64] [65]

Zang, X.; Li, G.; Gao, W.; and Shu, X. 2021. Learning to disentangle scenes for person re-identification. Image and Vision Computing, 116: 104330

work page 2021

[65] [66]

Zhang, L.; Rao, A.; and Agrawala, M. 2023. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3836--3847

work page 2023

[66] [67]

Zhang, P.; Yang, L.; Lai, J.-H.; and Xie, X. 2022. Exploring dual-task correlation for pose guided person image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7713--7722

work page 2022

[67] [68]

Zhao, H.; Tian, M.; Sun, S.; Shao, J.; Yan, J.; Yi, S.; Wang, X.; and Tang, X. 2017. Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE

work page 2017

[68] [69]

Zhao, R.; Ouyang, W.; and Wang, X. 2013. Unsupervised Salience Learning for Person Re-identification. In 2013 IEEE Conference on Computer Vision and Pattern Recognition

work page 2013

[69] [70]

Zheng, L.; Bie, Z.; Sun, Y.; Wang, J.; Su, C.; Wang, S.; and Tian, Q. 2016. Mars: A video benchmark for large-scale person re-identification. In Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI 14, 868--884. Springer

work page 2016

[70] [71]

Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; and Tian, Q. 2015. Scalable Person Re-identification: A Benchmark. In Computer Vision, IEEE International Conference on Computer Vision, 1116--1124

work page 2015

[71] [72]

Zheng, L.; Yang, Y.; and Hauptmann, A. G. 2016. Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984

work page internal anchor Pith review Pith/arXiv arXiv 2016

[72] [73]

Zheng, L.; Yang, Y.; and Tian, Q. 2017. SIFT meets CNN: A decade survey of instance retrieval. IEEE transactions on pattern analysis and machine intelligence, 40(5): 1224--1244

work page 2017

[73] [74]

Zheng, W.-S.; Gong, S.; and Xiang, T. 2011. Person re-identification by probabilistic relative distance comparison. In CVPR 2011, 649--656. IEEE

work page 2011

[74] [75]

Zheng, Z.; Yang, X.; Yu, Z.; Zheng, L.; Yang, Y.; and Kautz, J. 2019. Joint discriminative and generative learning for person re-identification. In proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2138--2147

work page 2019

[75] [76]

Zheng, Z.; Zheng, L.; and Yang, Y. 2017. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE

work page 2017

[76] [77]

Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; and Yang, Y. 2020. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 13001--13008

work page 2020

[77] [78]

Zhong, Z.; Zheng, L.; Zheng, Z.; Li, S.; and Yang, Y. 2018. Camera Style Adaptation for Person Re-identification. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

work page 2018

[78] [79]

Zhu, K.; Guo, H.; Zhang, S.; Wang, Y.; Liu, J.; Wang, J.; and Tang, M. 2023. Aaformer: Auto-aligned transformer for person re-identification. IEEE Transactions on Neural Networks and Learning Systems

work page 2023

[79] [80]

L.; Dai, Z.; Xu, Y.; Cao, X.; Yao, Y.; Zhu, H.; and Zhu, S

Zhu, S.; Chen, J. L.; Dai, Z.; Xu, Y.; Cao, X.; Yao, Y.; Zhu, H.; and Zhu, S. 2024. Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance. arXiv preprint arXiv:2403.14781

work page arXiv 2024