Unsupervised Adversarial Attacks on Deep Feature-based Retrieval with GAN

Guoping Zhao; Jiajun Liu; Ji-Rong Wen; Mingyu Zhang

arxiv: 1907.05793 · v1 · pith:DQARK524new · submitted 2019-07-12 · 💻 cs.CV


Pith reviewed 2026-05-24 22:13 UTC · model grok-4.3

classification 💻 cs.CV

keywords: adversarial attacks · image retrieval · GAN · unsupervised learning · deep features · person re-identification


The pith

An unsupervised GAN produces query-specific perturbations that push images far from their original deep features and cripple retrieval performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UAA-GAN, a model trained on a small set of unlabeled images to generate tiny changes to query images. These changes remain nearly invisible to people yet shift the queries in the feature space used by retrieval systems. The shifts cause matching to fail across image retrieval, person re-identification, and face search. The perturbations concentrate on textured or salient regions such as body parts, edges, and dominant patterns rather than uniform backgrounds. This shows that the attack works without any labels or access to the target model's internal parameters.

Core claim

UAA-GAN is an unsupervised learning model that requires only a small amount of unlabeled data for training. Once trained, it produces query-specific perturbations for query images to form adversarial queries. The core idea is to ensure that the attached perturbation is barely perceptible to human yet effective in pushing the query away from its original position in the deep feature space.

What carries the argument

UAA-GAN, the generative adversarial network trained to output query-specific perturbations that displace points in an unknown deep feature space.

If this is right

Deep feature-based retrieval systems become vulnerable once the GAN is trained on any unlabeled images from the same domain.
The same attack degrades person re-identification and face search performance.
Generated perturbations concentrate on salient image regions rather than uniform areas.
No target-model parameters or class labels are required at training or attack time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Retrieval models may need explicit regularization against feature-space displacements in salient regions.
Testing the same attack on other feature extractors would reveal whether the learned perturbation strategy generalizes across architectures.
Adding the generated adversarial queries to training sets could serve as a data-augmentation defense.

Load-bearing premise

A GAN trained only on unlabeled data without access to the target model can learn perturbations that move queries far enough in the unknown feature space to break retrieval.

What would settle it

Measure retrieval recall on a held-out set of clean queries versus the same queries after UAA-GAN perturbations; if recall stays essentially unchanged, the central claim is false.
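The check described above can be sketched as a recall@K comparison between clean and perturbed queries. This is only an illustrative sketch: the function name `recall_at_k`, the use of Euclidean distance, and the toy feature vectors are assumptions made for the example, not details taken from the paper. In practice the features would come from the target retrieval model, once for clean queries and once for the perturbed ones.

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, query_ids, gallery_ids, k=1):
    """Fraction of queries with at least one same-identity gallery item in the top k.

    Hypothetical helper; Euclidean distance is an assumed choice of metric.
    """
    # Pairwise distances, shape (num_queries, num_gallery).
    diffs = query_feats[:, None, :] - gallery_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Indices of the k nearest gallery items for each query.
    top_k = np.argsort(dists, axis=1)[:, :k]
    # A query counts as a hit if any of its top-k items shares its identity.
    hits = (gallery_ids[top_k] == query_ids[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data (made up): two gallery items, two queries with matching identities.
gallery_feats = np.array([[0.0, 0.0], [10.0, 10.0]])
gallery_ids = np.array([0, 1])
query_ids = np.array([0, 1])
clean_feats = np.array([[0.1, 0.0], [9.9, 10.0]])  # close to the correct items
adv_feats = np.array([[9.0, 9.0], [1.0, 1.0]])     # displaced toward the wrong items

clean_recall = recall_at_k(clean_feats, gallery_feats, query_ids, gallery_ids, k=1)
adv_recall = recall_at_k(adv_feats, gallery_feats, query_ids, gallery_ids, k=1)
print(clean_recall, adv_recall)
```

If the two numbers are essentially equal on a real held-out set, the attack has not changed what the system retrieves, whatever the perturbations look like.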

Figures

Figures reproduced from arXiv: 1907.05793 by Guoping Zhao, Jiajun Liu, Ji-Rong Wen, Mingyu Zhang.

**Figure 2.** The overall architecture of the UAA-GAN framework. The design goal of this framework is to attack a retrieval system built on a specific target feature

**Figure 2.** Let $\langle x, \tilde{x}, x' \rangle$ denote a triplet, where $x$ is the original query image, $\tilde{x}$ is the generated adversarial query image and $x'$ is the hardest example (i.e. the real neighbor of $x$ with the largest distance from $x$ in the batch). We aim to make the distance between $x$ and $\tilde{x}$ greater than that of $x$ and $x'$ by a given margin $m$. The constraint can be written as: $d(f_x, f_{x'}) + m \le d(f_x, f_{\tilde{x}})$ (7) where $m$ is a give…

**Figure 3.** Examples of adversarial attack results of UAA-GAN trained on VGG16 with GeM aggregate function.

**Figure 4.** Examples of adversarial attack results on Market1501.

**Figure 5.** Examples of perturbations and adversarial query images generated
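The triplet constraint quoted in the captions above (equation 7) asks that the adversarial query end up farther from the original query than the hardest real neighbor in the batch, by a margin m. One common way to turn such a constraint into a trainable penalty is a hinge term, sketched below. The hinge form, the Euclidean distance, and the name `displacement_hinge_loss` are assumptions made for illustration; the page does not state the exact loss the authors use.

```python
import numpy as np

def displacement_hinge_loss(f_x, f_adv, f_batch, margin):
    """Penalty for violating d(f_x, f_hard) + margin <= d(f_x, f_adv).

    f_x:     (D,) feature of the original query.
    f_adv:   (D,) feature of the adversarial query.
    f_batch: (N, D) features of the real neighbors of x in the batch.
    Hypothetical helper; Euclidean distance and the hinge form are assumed.
    """
    # Distance from the original query to each real neighbor in the batch.
    d_neighbors = np.linalg.norm(f_batch - f_x, axis=1)
    # Hardest example: the real neighbor with the largest distance from x.
    d_hard = d_neighbors.max()
    # Distance from the original query to its adversarial counterpart.
    d_adv = np.linalg.norm(f_adv - f_x)
    # Zero when the constraint holds, positive (and growing) when it is violated.
    return float(max(0.0, d_hard + margin - d_adv))

# Toy features (made up) to show both regimes.
f_x = np.array([0.0, 0.0])
f_batch = np.array([[1.0, 0.0], [0.0, 2.0]])  # hardest neighbor is at distance 2

satisfied = displacement_hinge_loss(f_x, np.array([3.0, 0.0]), f_batch, margin=0.5)
violated = displacement_hinge_loss(f_x, np.array([1.0, 0.0]), f_batch, margin=0.5)
print(satisfied, violated)
```

With the adversarial feature at distance 3 the constraint 2 + 0.5 <= 3 holds and the penalty is zero; at distance 1 it is violated by 1.5, which is the value the hinge returns.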

Original abstract

Studies show that Deep Neural Network (DNN)-based image classification models are vulnerable to maliciously constructed adversarial examples. However, little effort has been made to investigate how DNN-based image retrieval models are affected by such attacks. In this paper, we introduce Unsupervised Adversarial Attacks with Generative Adversarial Networks (UAA-GAN) to attack deep feature-based image retrieval systems. UAA-GAN is an unsupervised learning model that requires only a small amount of unlabeled data for training. Once trained, it produces query-specific perturbations for query images to form adversarial queries. The core idea is to ensure that the attached perturbation is barely perceptible to human yet effective in pushing the query away from its original position in the deep feature space. UAA-GAN works with various application scenarios that are based on deep features, including image retrieval, person Re-ID and face search. Empirical results show that UAA-GAN cripples retrieval performance without significant visual changes in the query images. UAA-GAN generated adversarial examples are less distinguishable because they tend to incorporate subtle perturbations in textured or salient areas of the images, such as key body parts of human, dominant structural patterns/textures or edges, rather than in visually insignificant areas (e.g., background and sky). Such tendency indicates that the model indeed learned how to toy with both image retrieval systems and human eyes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The unsupervised GAN attack on retrieval has a load-bearing gap: it claims to displace queries in an unknown target feature space but supplies no mechanism to do so without the target extractor or an untested surrogate.


The paper introduces UAA-GAN, a GAN trained on small amounts of unlabeled data to produce query-specific perturbations that degrade deep feature retrieval, person Re-ID, and face search. The new element is the unsupervised formulation aimed at retrieval rather than the usual supervised classification attacks. It also notes that the generated perturbations tend to appear in textured or salient regions instead of uniform backgrounds, which is a concrete observation from the experiments. That part is straightforward and could be useful to people working on attack realism.

The central claim, however, does not hold up on the stated premises. The attack is supposed to push queries away in the target's specific deep feature space, yet training has no access to the target model's parameters or labels. Without the extractor in the loop there is no loss term that can measure or optimize that displacement. The only ways around this are either embedding the target during training (which contradicts the unsupervised claim) or training against a surrogate whose transferability to the real target is never isolated or measured. The abstract gives no evidence that either case was handled or quantified. This is not a minor implementation detail; it is the mechanism the whole result rests on. The empirical numbers therefore cannot be interpreted cleanly.

The work is aimed at the adversarial-ML-for-retrieval niche. Readers already deep in that subfield might extract the GAN architecture or the perturbation-location observation, but the missing account of how the attack reaches the claimed feature space limits its reliability. I would not bring it to reading group, would not cite it, and would not send it to peer review until the loss function and training procedure are spelled out so the central claim can be checked.

Referee Report

2 major / 1 minor

Summary. The paper introduces UAA-GAN, an unsupervised GAN trained solely on a small amount of unlabeled data to generate query-specific adversarial perturbations for deep feature-based image retrieval. These perturbations are claimed to be barely perceptible yet effective at displacing queries from their original positions in the target model's deep feature space, thereby degrading performance on image retrieval, person Re-ID, and face search without significant visual changes. The perturbations are said to focus on textured or salient regions rather than backgrounds.

Significance. If the central empirical claim holds under black-box unsupervised conditions, the work would highlight a practical vulnerability in retrieval systems and provide a generative attack method that generalizes across tasks. The unsupervised training and query-specific nature are strengths relative to supervised or white-box alternatives, but the absence of reported metrics, baselines, dataset sizes, error bars, or ablation studies in the abstract limits evaluation of robustness and reproducibility.

major comments (2)

[Method description (abstract and training procedure)] The core claim requires that perturbations displace queries in the specific (unknown) deep feature space of the target model. However, training occurs on unlabeled data with no access to target parameters or labels, so no loss term can directly measure or optimize displacement in that space. The method description supplies no evidence that transferability from an implicit surrogate was isolated or quantified.
[Empirical results (abstract)] Empirical results are asserted to show that UAA-GAN 'cripples retrieval performance,' yet the abstract provides no metrics (e.g., recall@K or mAP), dataset sizes, baseline comparisons, error bars, or statistical significance tests. This absence makes it impossible to verify whether the reported degradation supports the central claim.

minor comments (1)

[Abstract] The abstract refers to 'various application scenarios' and 'empirical results' without naming the concrete datasets, target models, or evaluation protocols used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.


Referee: [Method description (abstract and training procedure)] The core claim requires that perturbations displace queries in the specific (unknown) deep feature space of the target model. However, training occurs on unlabeled data with no access to target parameters or labels, so no loss term can directly measure or optimize displacement in that space. The method description supplies no evidence that transferability from an implicit surrogate was isolated or quantified.

Authors: UAA-GAN operates in the black-box unsupervised setting with no access to the target model. The generator is trained to maximize feature displacement using a surrogate feature extractor (a publicly available CNN) on the unlabeled data; the resulting perturbations are then applied to the unknown target. The manuscript describes this high-level procedure but does not isolate or quantify the transferability gap between surrogate and target. We will revise the method section to name the surrogate architecture, state the training objective explicitly, and add a short transferability analysis. revision: yes
Referee: [Empirical results (abstract)] Empirical results are asserted to show that UAA-GAN 'cripples retrieval performance,' yet the abstract provides no metrics (e.g., recall@K or mAP), dataset sizes, baseline comparisons, error bars, or statistical significance tests. This absence makes it impossible to verify whether the reported degradation supports the central claim.

Authors: The abstract is constrained by length and therefore omits numbers; the full paper reports recall@K, mAP, dataset sizes, and baseline comparisons on standard retrieval and Re-ID benchmarks. We will revise the abstract to include the key quantitative figures (e.g., mAP drop on Oxford5k/Paris6k) while remaining within the word limit. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical GAN training procedure with no self-referential derivations or fitted predictions


The paper describes an unsupervised GAN training procedure that generates perturbations from unlabeled data. No equations, derivations, or 'predictions' are presented that reduce by construction to fitted inputs, self-citations, or ansatzes. The central claim rests on experimental retrieval results rather than any load-bearing mathematical step that collapses to the method's own definitions. This is the expected non-finding for a purely empirical attack paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review; ledger entries are inferred at the level of domain assumptions rather than explicit equations. The method assumes a GAN can be trained to produce effective feature-space displacements without supervision or target-model access.

free parameters (1)

GAN architecture and loss weights
Standard but unspecified hyperparameters required to train the generator and discriminator on unlabeled data.

axioms (1)

Domain assumption: small perturbations in pixel space can produce large displacements in the deep feature space of an unknown retrieval model.
Core premise that allows the unsupervised generator to succeed without labels or model gradients.



[1] [1]

Intriguing properties of neural networks,

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Good- fellow, and R. Fergus, “Intriguing properties of neural networks,” in International Conference on Learning Representations , 2014

work page 2014

[2] [2]

Explaining and harnessing adversarial examples,

I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Repre- sentations, 2015

work page 2015

[3] [3]

Adversarial machine learning at scale,

A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in International Conference on Learning Representations , 2017

work page 2017

[4] [4]

Towards evaluating the robustness of neural networks,

N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP) , 2017, pp. 39–57

work page 2017

[5] [5]

Deepfool: a simple and accurate method to fool deep neural networks,

S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple and accurate method to fool deep neural networks,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2016, pp. 2574–2582

work page 2016

[6] [6]

Univer- sal adversarial perturbations,

S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Univer- sal adversarial perturbations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2017, pp. 1765–1773

work page 2017

[7] [7]

Adversarial examples for semantic segmentation and object detection,

C. Xie, J. Wang, Z. Zhang, Y . Zhou, L. Xie, and A. Yuille, “Adversarial examples for semantic segmentation and object detection,” in Proceed- ings of the IEEE International Conference on Computer Vision , 2017, pp. 1369–1378

work page 2017

[8] [8]

Uni- versal adversarial perturbations against semantic image segmentation,

J. Hendrik Metzen, M. Chaithanya Kumar, T. Brox, and V . Fischer, “Uni- versal adversarial perturbations against semantic image segmentation,” in The IEEE International Conference on Computer Vision (ICCV) , Oct 2017

work page 2017

[9] [9]

Attacking visual language grounding with adversarial examples: A case study on neural image captioning,

H. Chen, H. Zhang, P.-Y . Chen, J. Yi, and C.-J. Hsieh, “Attacking visual language grounding with adversarial examples: A case study on neural image captioning,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . Association for Computational Linguistics, 2018, pp. 2587–2597

work page 2018

[10] [10]

Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,

M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 1528–1540

work page 2016

[11] [11]

Generating adversarial examples with adversarial networks,

C. Xiao, B. Li, J. yan Zhu, W. He, M. Liu, and D. Song, “Generating adversarial examples with adversarial networks,” in Proceedings of the International Joint Conference on Artiﬁcial Intelligence, IJCAI-18 , 7 2018, pp. 3905–3911

work page 2018

[12] [12]

Visual search at ebay,

F. Yang, A. Kale, Y . Bubnov, L. Stein, Q. Wang, H. Kiapour, and R. Piramuthu, “Visual search at ebay,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 2101–2110

work page 2017

[13] [13]

Aggregating local deep features for image retrieval,

A. Babenko and V . Lempitsky, “Aggregating local deep features for image retrieval,” in The IEEE International Conference on Computer Vision, December 2015

work page 2015

[14] [14]

Particular object retrieval with integral max-pooling of cnn activations,

G. Tolias, R. Sicre, and H. J ´egou, “Particular object retrieval with integral max-pooling of cnn activations,” International Conference on Learning Representations, 2016

work page 2016

[15] [15]

Netvlad: Cnn architecture for weakly supervised place recognition,

R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” in Pro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5297–5307

work page 2016

[16] [16]

Fine-tuning cnn image retrieval with no human annotation,

F. Radenovi ´c, G. Tolias, and O. Chum, “Fine-tuning cnn image retrieval with no human annotation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018

work page 2018

[17] [17]

Large-scale image retrieval with attentive deep local features,

H. Noh, A. Araujo, J. Sim, T. Weyand, and B. Han, “Large-scale image retrieval with attentive deep local features,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 3456–3465

work page 2017

[18] [18]

Learning discriminative features with multiple granularities for person re-identiﬁcation,

G. Wang, Y . Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identiﬁcation,” in 2018 ACM Multimedia Conference on Multimedia Conference . ACM, 2018, pp. 274–282

work page 2018

[19] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface: Deep hypersphere embedding for face recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 212–220.

[20] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, 2014, pp. 2672–2680.

[21] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” in International Conference on Learning Representations, ICLR 2016, 2016.

[22] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.

[23] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4681–4690.

[24] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[25] R. Huang, S. Zhang, T. Li, and R. He, “Beyond face rotation: Global and local perception gan for photorealistic and identity preserving frontal view synthesis,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2439–2448.

[26] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, vol. 70. PMLR, 06–11 Aug 2017, pp. 214–223.

[27] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” in Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.

[28] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2794–2802.

[29] A. Liu, X. Liu, J. Fan, Y. Ma, A. Zhang, H. Xie, and D. Tao, “Perceptual-sensitive gan for generating adversarial patches,” in AAAI, 2019.

[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[31] A. Gordo, J. Almazan, J. Revaud, and D. Larlus, “End-to-end learning of deep visual representations for image retrieval,” International Journal of Computer Vision, vol. 124, no. 2, pp. 237–254, 2017.

[32] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision, 2016, pp. 630–645.

[33] A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, 2016. [Online]. Available: http://distill.pub/2016/deconv-checkerboard

[34] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 815–823.

[35] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu, “Learning fine-grained image similarity with deep ranking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1386–1393.

[36] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007, pp. 1–8.

[37] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Lost in quantization: Improving particular object retrieval in large scale image databases,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008.

[38] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in IEEE International Conference on Computer Vision, 2015.

[39] Z. Zheng, L. Zheng, and Y. Yang, “Unlabeled samples generated by gan improve the person re-identification baseline in vitro,” in Proceedings of the IEEE International Conference on Computer Vision, 2017.

[40] H.-W. Ng and S. Winkler, “A data-driven approach to cleaning large face datasets,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 343–347.

[41] F. Radenović, G. Tolias, and O. Chum, “Cnn image retrieval learns from bow: Unsupervised fine-tuning with hard examples,” in European conference on computer vision. Springer, 2016, pp. 3–20.

[42] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.