SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
Pith reviewed 2026-05-22 19:31 UTC · model grok-4.3
The pith
Fine-tuning Stable Diffusion on identity and view conditions from a ViT model generates view-mimicking features that improve aerial-ground person re-identification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that a three-stage pipeline improves retrieval of specific persons across aerial and ground cameras on the CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR, and G2APS-ReID datasets: (1) a ViT-based model extracts controllable identity and view conditions; (2) those conditions guide the fine-tuning of Stable Diffusion to enhance person representations; and (3) a View-Refined Decoder merges instance-level and global-level features.
What carries the argument
The fine-tuned Stable Diffusion model guided by identity and view conditions extracted from a ViT-based model, together with the View-Refined Decoder that integrates instance-level and global-level features.
Load-bearing premise
Fine-tuning Stable Diffusion with identity and view conditions extracted by a ViT-based model produces view-mimicking features that improve rather than degrade identity discrimination, and the View-Refined Decoder integrates instance-level and global-level features without introducing new inconsistencies.
What would settle it
If adding the generated view-mimicking features and View-Refined Decoder outputs lowers retrieval accuracy on the five AG-ReID benchmarks relative to the ViT baseline alone, the central claim would be falsified.
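The test described above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, cosine similarity is assumed as the matching score, and the same-camera gallery filtering used by standard ReID protocols is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose most similar gallery item has the same identity."""
    hits = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        best = max(
            range(len(gallery_feats)),
            key=lambda j: cosine(q_feat, gallery_feats[j]),
        )
        if gallery_ids[best] == q_id:
            hits += 1
    return hits / len(query_ids)


def claim_survives(baseline_rank1, augmented_rank1):
    """Assumed decision rule: the claim is contradicted on a benchmark
    when the augmented features score lower than the ViT baseline."""
    return augmented_rank1 >= baseline_rank1
```

Running `rank1_accuracy` once with baseline features and once with the augmented features on each benchmark, then passing the two scores to `claim_survives`, gives a per-benchmark verdict.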
Original abstract
Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic changes in camera viewpoints. The core idea behind these methods is quite natural, but designing a view-robust model is a very challenging task. Moreover, they overlook the contribution of view-specific features in enhancing the model's ability to represent persons. To address these issues, we propose a novel generative framework named SD-ReID for AG-ReID, which leverages generative models to mimic the feature distribution of different views while extracting robust identity representations. More specifically, we first train a ViT-based model to extract person representations along with controllable conditions, including identity and view conditions. We then fine-tune the Stable Diffusion (SD) model to enhance person representations guided by these controllable conditions. Furthermore, we introduce the View-Refined Decoder (VRD) to bridge the gap between instance-level and global-level features. Finally, both person representations and all-view features are employed to retrieve target persons. Extensive experiments on five AG-ReID benchmarks (i.e., CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR and G2APS-ReID) demonstrate the effectiveness of our proposed method. The source code and pre-trained models are available at https://github.com/924973292/SD-ReID.
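The final retrieval step, in which both person representations and all-view features are used, might look like the sketch below. The abstract does not say how the two kinds of features are combined, so the concatenation of L2-normalized vectors and the dot-product score are assumptions, and `fuse` and `rank_gallery` are hypothetical names.

```python
import math


def l2_normalize(v):
    """Return a unit-length copy of v (or an unchanged copy if v is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def fuse(person_feat, view_feats):
    """Assumed fusion: concatenate the normalized person feature
    with each normalized view-specific feature."""
    fused = l2_normalize(person_feat)
    for vf in view_feats:
        fused += l2_normalize(vf)
    return fused


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

Normalizing each part before concatenation keeps any single feature from dominating the score; a weighted sum of per-part similarities would be an equally plausible choice.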
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SD-ReID, a generative framework for Aerial-Ground Person Re-Identification (AG-ReID). It first trains a ViT-based model to extract identity and view conditions from person images. These conditions then guide fine-tuning of a Stable Diffusion model to mimic feature distributions across views. A View-Refined Decoder (VRD) is introduced to integrate instance-level and global-level features. The resulting person representations and all-view features are used together for retrieval. The authors assert that this approach improves robustness to viewpoint changes and report effectiveness on five AG-ReID benchmarks (CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR, G2APS-ReID), with code and models released publicly.
Significance. If the empirical claims hold, the work has moderate significance for AG-ReID by extending conditional diffusion models to synthesize view-specific features while preserving identity discrimination, moving beyond purely discriminative view-robust designs. The public release of source code and pre-trained models at the cited GitHub repository strengthens reproducibility and allows direct verification of the pipeline.
major comments (2)
- [Abstract] The central claim that the method 'demonstrate[s] the effectiveness' on five benchmarks rests on experimental outcomes, yet the manuscript text supplies no quantitative results, performance tables, ablation studies, or error analysis. Without these, the improvement over prior discriminative models cannot be assessed and is load-bearing for the contribution.
- [Method, View-Refined Decoder description] The VRD is asserted to successfully bridge instance-level and global-level features without introducing inconsistencies, but no architecture diagram, equations for feature fusion, or training objective for the decoder are provided. This leaves the weakest assumption unverified and directly affects whether the combined representations improve rather than degrade discrimination.
minor comments (2)
- [Introduction] The transition in the introduction from limitations of prior work to the proposed generative approach could be tightened for clarity.
- Notation for the controllable conditions (identity and view) extracted by the ViT could be formalized with explicit symbols to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to improve clarity and completeness while preserving the core contributions.
- Referee: [Abstract] The central claim that the method 'demonstrate[s] the effectiveness' on five benchmarks rests on experimental outcomes, yet the manuscript text supplies no quantitative results, performance tables, ablation studies, or error analysis. Without these, the improvement over prior discriminative models cannot be assessed and is load-bearing for the contribution.
  Authors: We agree that the abstract would be strengthened by including key quantitative results. The full manuscript contains detailed performance tables, ablation studies, and comparisons in the Experiments section. In the revision, we will update the abstract to summarize the main empirical gains (e.g., average Rank-1 improvements across the five benchmarks) so the effectiveness claim is directly supported.
  Revision: yes
- Referee: [Method, View-Refined Decoder description] The VRD is asserted to successfully bridge instance-level and global-level features without introducing inconsistencies, but no architecture diagram, equations for feature fusion, or training objective for the decoder are provided. This leaves the weakest assumption unverified and directly affects whether the combined representations improve rather than degrade discrimination.
  Authors: We acknowledge that the current description of the View-Refined Decoder would benefit from additional technical detail. We will add an architecture diagram, explicit equations for the instance-to-global feature fusion, and the precise training objective for the decoder in the revised Method section. This will allow readers to verify that the fusion improves rather than degrades discrimination.
  Revision: yes
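The average Rank-1 improvement that the rebuttal proposes to report could be computed as in the sketch below. The function name and the dictionary layout (benchmark name mapped to Rank-1 in percent) are assumptions for illustration; no scores from the paper are reproduced here.

```python
def mean_rank1_gain(baseline, proposed):
    """Return (mean gain, per-benchmark gains) in percentage points,
    where gain = proposed Rank-1 minus baseline Rank-1."""
    if not baseline:
        raise ValueError("no benchmarks given")
    mismatched = set(baseline) ^ set(proposed)
    if mismatched:
        raise ValueError(f"benchmarks missing from one side: {sorted(mismatched)}")
    gains = {name: proposed[name] - baseline[name] for name in baseline}
    return sum(gains.values()) / len(gains), gains
```

Reporting the per-benchmark gains alongside the mean shows whether a positive average hides a regression on an individual dataset.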
Circularity Check
No significant circularity detected
The paper defines a procedural pipeline: train a ViT-based extractor for identity and view conditions, fine-tune Stable Diffusion under those conditions, insert a View-Refined Decoder, and combine instance- and global-level features for retrieval. All performance claims are obtained by standard supervised training and evaluation on five external public benchmarks (CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR, G2APS-ReID). No equation equates a claimed improvement to a fitted parameter by construction, no uniqueness theorem is imported from prior self-work, and no ansatz is smuggled via self-citation. The derivation therefore remains self-contained against external data and does not reduce to its own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- hyperparameters for ViT and Stable Diffusion fine-tuning
axioms (1)
- domain assumption: Stable Diffusion can be fine-tuned to synthesize view-specific feature distributions when conditioned on identity and view signals.
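For reference, the standard conditional diffusion objective that this assumption builds on can be written as below. The symbols c_id and c_view for the identity and view conditions, and the reading of z_0 as a clean person feature, are assumptions made for illustration; the source does not state the paper's exact loss.

```latex
% Forward noising of a clean sample z_0 at timestep t (standard DDPM form)
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Noise-prediction loss with conditioning on identity and view (assumed symbols)
\mathcal{L} = \mathbb{E}_{z_0,\, \epsilon,\, t}
\left[ \left\| \epsilon - \epsilon_\theta\!\left(z_t,\, t,\, c_{\mathrm{id}},\, c_{\mathrm{view}}\right) \right\|_2^2 \right]
```

Under this form, the assumption amounts to saying that the denoiser, once fine-tuned, produces samples whose distribution shifts with c_view while identity information carried by c_id is preserved.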
invented entities (1)
- View-Refined Decoder (VRD): no independent evidence