Semi-supervised Feature-Level Attribute Manipulation for Fashion Image Retrieval

Minchul Shin; Sanghyuk Park; Taeksoo Kim

arxiv: 1907.05007 · v1 · pith:HRMRMTAInew · submitted 2019-07-11 · 💻 cs.CV


Pith reviewed 2026-05-24 23:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords: fashion image retrieval · attribute manipulation · feature-level manipulation · fashion attribute manipulation · distribution matching · semi-supervised learning · instance retrieval


The pith

Feature-level attribute manipulation lets existing fashion retrieval methods edit traits like color without losing search accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes performing fashion attribute manipulation directly on learned feature representations rather than on pixels or images. This is achieved by aligning the distribution of the edited features with the distribution of actual features from the data. The separation means that strong existing systems for finding identical fashion items can now also return similar items with user-specified changes to one attribute. A reader would care because real-world search often requires both exact matches and controlled variations, and prior approaches forced a choice between the two tasks. The method claims this works in a semi-supervised setting without retraining the core representation.

Core claim

The paper claims that attribute manipulation can be performed independently at the feature level by matching the distribution of manipulated features with real features, enabling prior methods for fashion instance-level image retrieval to perform fashion attribute manipulation without sacrificing their retrieval performance.

What carries the argument

Feature-level attribute manipulation via distribution matching between manipulated and real features, which decouples the editing step from image representation learning.
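
One way to make the distribution-matching idea concrete is a GAN-style objective written over features instead of pixels. The formulation below is an assumption for illustration only: the abstract does not state the loss, and the symbols are hypothetical. Here $f$ is a feature produced by the frozen retrieval network, $a$ is the target attribute, $G$ is the manipulation module, and $D$ is a discriminator that tries to tell real features from manipulated ones.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{f \sim p_{\text{real}}}\big[\log D(f)\big]
+ \mathbb{E}_{f \sim p_{\text{real}},\, a}\big[\log\big(1 - D(G(f, a))\big)\big]
```

Under this reading only $G$ and $D$ are trained, which is what would leave the retrieval representation untouched.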

If this is right

Previous FIR methods gain FAM capability without joint retraining.
Retrieval performance on the original task stays intact after adding manipulation.
Attribute changes can occur independently from the representation learning stage.
Users can retrieve items similar to a query but with targeted modifications such as color or pattern.
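
The consequences listed above can be illustrated with a toy retrieval loop in which only the query feature is edited and the gallery features stay exactly as the retrieval model produced them. This is a minimal sketch under stated assumptions: `manipulate` is a hypothetical stand-in that adds a fixed offset (the paper's module is learned), and the four-dimensional features are disentangled by construction, which real features need not be.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def manipulate(feature, offset):
    """Hypothetical stand-in for a learned manipulation module:
    shifts an already-extracted feature by an attribute-specific offset."""
    return [f + o for f, o in zip(feature, offset)]

def retrieve(query, gallery, k=3):
    """Return the ids of the k gallery items most similar to the query."""
    ranked = sorted(gallery, key=lambda item_id: cosine(query, gallery[item_id]),
                    reverse=True)
    return ranked[:k]

# Toy gallery (made up): dims 0-1 encode shape, dims 2-3 encode color.
gallery = {
    "red_dress":  [1.0, 0.0, 1.0, 0.0],
    "blue_dress": [1.0, 0.0, 0.0, 1.0],
    "red_shirt":  [0.0, 1.0, 1.0, 0.0],
    "blue_shirt": [0.0, 1.0, 0.0, 1.0],
}

query = gallery["red_dress"]
red_to_blue = [0.0, 0.0, -1.0, 1.0]  # illustrative color edit

# Instance retrieval uses the raw feature; attribute search uses the edited one.
plain_top = retrieve(query, gallery, k=1)
edited_top = retrieve(manipulate(query, red_to_blue), gallery, k=1)
```

Because the gallery is never re-encoded, the unedited query path behaves exactly as it did before the manipulation step was added.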

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The distribution-matching step could be applied to pretrained retrieval models in other visual domains to add partial editing without full retraining.
This separation suggests interactive search interfaces where users adjust one attribute and immediately see updated results.
If the matching is done with unlabeled data, the method may lower the need for expensive attribute-labeled pairs.

Load-bearing premise

That matching the distribution of manipulated features with real features is sufficient to preserve a query's unique characteristics while allowing independent attribute changes.

What would settle it

A measurable drop in retrieval accuracy on standard FIR benchmarks when the manipulated features are used instead of the original features would show the approach fails to preserve performance.
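
A check of this kind could be scripted as a recall@k comparison between two retrieval runs. The sketch below is illustrative only: the query ids, item ids, and ranked lists are invented placeholders, not results from the paper, and `recall_at_k` is a generic helper, not the authors' evaluation code.

```python
def recall_at_k(rankings, ground_truth, k):
    """Fraction of queries whose correct item appears in the top-k results."""
    hits = sum(1 for q, ranked in rankings.items() if ground_truth[q] in ranked[:k])
    return hits / len(rankings)

# Placeholder data: the correct gallery item for each query.
ground_truth = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}

# Placeholder ranked lists from a run with the original features...
original_rankings = {
    "q1": ["a", "x", "y"],
    "q2": ["x", "b", "y"],
    "q3": ["c", "x", "y"],
    "q4": ["x", "y", "d"],
}
# ...and from a run with manipulated features.
manipulated_rankings = {
    "q1": ["a", "x", "y"],
    "q2": ["x", "y", "b"],
    "q3": ["x", "c", "y"],
    "q4": ["x", "y", "z"],
}

# A positive drop means the manipulated run retrieves worse at that cutoff.
drops = {
    k: recall_at_k(original_rankings, ground_truth, k)
       - recall_at_k(manipulated_rankings, ground_truth, k)
    for k in (1, 3)
}
```

On real benchmarks the same comparison would need to be repeated across seeds or splits before a drop is treated as meaningful.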

Figures

Figures reproduced from arXiv: 1907.05007 by Minchul Shin, Sanghyuk Park, Taeksoo Kim.

**Figure 2.** Top-3 retrieval results after the query attribute manipulation. The green-bordered [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]

**Figure 3.** t-SNE visualization of the attribute-specific embedding vectors on the embedding [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]

Abstract

With a growing demand for search by image, many works have studied the task of fashion instance-level image retrieval (FIR). Furthermore, recent works introduce the concept of fashion attribute manipulation (FAM), which manipulates a specific attribute (e.g., color) of a fashion item while maintaining the rest of the attributes (e.g., shape and pattern). In this way, users can search not only "the same" items but also "similar" items with the desired attributes. FAM is a challenging task in that the attributes are hard to define, and the unique characteristics of a query are hard to preserve. Although both FIR and FAM are important in real-life applications, most of the previous studies have focused on only one of these problems. In this study, we aim to achieve competitive performance on both FIR and FAM. To do so, we propose a novel method that converts a query into a representation with the desired attributes. We introduce a new idea of attribute manipulation at the feature level, by matching the distribution of manipulated features with real features. In this fashion, the attribute manipulation can be done independently from learning a representation from the image. By introducing the feature-level attribute manipulation, the previous methods for FIR can perform attribute manipulation without sacrificing their retrieval performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They decouple FAM from FIR by attaching a feature-level distribution matching module so existing retrieval networks can handle attribute edits without retraining.


The main thing here is a modular way to add attribute manipulation to any existing fashion image retrieval model. They run the manipulation after the features are already extracted, using distribution matching so the edited features stay plausible while only the target attribute changes. This keeps the original retrieval network untouched and avoids joint optimization of both tasks. The claim is that prior FIR methods can now support the similar-but-different-attribute searches that matter in e-commerce without losing accuracy on the unchanged parts. That separation is the concrete step they take beyond treating the two problems in isolation. It is a practical engineering move rather than a new representation or generative technique. The modularity is useful if you already have a strong retrieval backbone and just need to bolt on editing.

The soft spot is whether distribution matching alone preserves instance identity. It enforces aggregate statistics across the feature set, not closeness of each manipulated vector to its original query vector. If the base features entangle attributes, the edit can leak into non-target properties and degrade retrieval even if the module is added post-hoc. The abstract does not mention cycle consistency, reconstruction, or per-instance fidelity terms that would directly counter this, so the no-sacrifice claim rests on how well the matching works in practice. Experiments would need to show retrieval metrics before and after manipulation plus controls for identity drift.

This is for groups building fashion search systems who need both exact and attribute-modified results. It is not aimed at readers looking for general advances in disentanglement or generative modeling. The idea is clear enough on its own terms to warrant referee time so the experimental details can be checked.

Referee Report

1 major / 1 minor

Summary. The paper claims that performing attribute manipulation at the feature level—by matching the distribution of manipulated features to real features in a semi-supervised setup—allows existing fashion instance retrieval (FIR) methods to also perform fashion attribute manipulation (FAM) without degrading retrieval performance. The approach decouples manipulation from representation learning so that prior FIR techniques can be extended to support attribute changes (e.g., color) while preserving other attributes and query identity.

Significance. If empirically validated, the result would be useful for practical fashion search systems that need both exact-instance retrieval and controlled attribute editing. The feature-level, post-hoc nature of the manipulation is a conceptual strength because it avoids joint retraining of the representation. The semi-supervised framing could also reduce labeling costs. However, significance is tempered by the fact that the central claim rests on an untested assumption about distribution matching being sufficient for identity preservation.

major comments (1)

[Abstract / Proposed Method] Abstract and method description: the claim that 'matching the distribution of manipulated features with real features' suffices to change one attribute while retaining the query's unique characteristics (and thus FIR performance) is load-bearing, yet the description supplies no instance-level fidelity term (cycle consistency, reconstruction loss, or per-query similarity constraint). Distribution matching enforces only aggregate statistics; when base FIR features are entangled this risks altering non-target attributes, directly undermining the 'without sacrificing their retrieval performance' assertion.
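
The gap between set-level matching and per-instance fidelity can be shown with a deliberately degenerate example. The sketch below is an assumption-laden illustration, not a model of the paper's trained module: the "manipulator" simply hands each query the feature of a different real item, so its outputs are indistinguishable from real features as a set while no query keeps its own identity. The feature dimension, sample count, and seed are arbitrary.

```python
import random

def nearest_index(vector, candidates):
    """Index of the candidate closest to `vector` in squared Euclidean distance."""
    return min(
        range(len(candidates)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, candidates[i])),
    )

random.seed(0)
# Synthetic "real" features: 50 items, 8 dimensions each.
real = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]

# Degenerate manipulator: query i receives the feature of item i + 1.
# The output set is exactly the real set, so any criterion that only
# compares the two distributions is perfectly satisfied.
shift = 1
manipulated = real[shift:] + real[:shift]

same_set = sorted(manipulated) == sorted(real)

# Per-instance check: does each manipulated vector still retrieve its own source?
identity_kept = sum(
    nearest_index(m, real) == i for i, m in enumerate(manipulated)
) / len(real)
```

Here `same_set` is true while `identity_kept` is zero, which is the worst case the comment describes; a per-query similarity or reconstruction term is the usual way to rule it out.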

minor comments (1)

[Abstract] The abstract states the intended benefit but supplies no experimental results, baselines, or validation details; the Experiments section (if present) should be cross-referenced in the abstract for immediate assessment of the central claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and indicate where revisions will be made.


Referee: [Abstract / Proposed Method] Abstract and method description: the claim that 'matching the distribution of manipulated features with real features' suffices to change one attribute while retaining the query's unique characteristics (and thus FIR performance) is load-bearing, yet the description supplies no instance-level fidelity term (cycle consistency, reconstruction loss, or per-query similarity constraint). Distribution matching enforces only aggregate statistics; when base FIR features are entangled this risks altering non-target attributes, directly undermining the 'without sacrificing their retrieval performance' assertion.

Authors: We agree that the abstract and method overview emphasize distribution matching at the aggregate level without an explicit instance-level fidelity term such as cycle consistency or per-query reconstruction. The core design decouples manipulation from the base FIR representation, which is already trained to preserve identity; the semi-supervised distribution matching is then applied only to shift the target attribute. Our experiments demonstrate maintained retrieval performance, providing empirical support that non-target attributes are largely preserved. However, the referee's point is valid regarding the description: we will revise the method section to explicitly discuss the reliance on the base feature extractor for identity preservation, add a limitations paragraph addressing potential entanglement risks, and clarify that no additional per-instance constraint is used.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is architectural separation without self-referential reduction


The paper describes a semi-supervised approach to feature-level attribute manipulation for fashion image retrieval by matching distributions of manipulated features to real ones, allowing the manipulation module to operate independently of the base FIR representation learner. No equations, derivations, or fitted parameters are shown that would make any claimed performance preservation equivalent to the inputs by construction. The central claim—that prior FIR methods can add FAM without sacrificing retrieval performance—follows from the stated independence of the modules rather than from any self-definition, self-citation load-bearing uniqueness theorem, or renaming of known results. The approach is presented as building on existing FIR techniques with an added distribution-matching component, which remains an empirical design choice rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are described in the abstract; the method implicitly relies on standard assumptions of deep feature learning and distribution matching but none are itemized.

pith-pipeline@v0.9.0 · 5757 in / 1011 out tokens · 17435 ms · 2026-05-24T23:24:30.554728+00:00 · methodology



[1] [1]

https://shopping.naver.com/

work page

[2] [2]

Learning attribute representations with localization for ﬂexible fashion search

Kenan E Ak, Ashraf A Kassim, Joo Hwee Lim, and Jo Yew Tham. Learning attribute representations with localization for ﬂexible fashion search. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7708–7717, 2018

work page 2018

[3] [3]

Efﬁcient multi- attribute similarity learning towards attribute-based fashion search

Kenan E Ak, Joo Hwee Lim, Jo Yew Tham, and Ashraf A Kassim. Efﬁcient multi- attribute similarity learning towards attribute-based fashion search. In 2018 IEEE Win- ter Conference on Applications of Computer Vision (WACV), pages 1671–1679. IEEE, 2018

work page 2018

[4] [4]

Aggregating Deep Convolutional Features for Image Retrieval

Artem Babenko and Victor Lempitsky. Aggregating deep convolutional features for image retrieval. arXiv preprint arXiv:1510.07493, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[5] [5]

BEGAN: Boundary Equilibrium Generative Adversarial Networks

David Berthelot, Thomas Schumm, and Luke Metz. Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[6] [6]

Describing clothing by seman- tic attributes

Huizhong Chen, Andrew Gallagher, and Bernd Girod. Describing clothing by seman- tic attributes. In European Conference on Computer Vision (ECCV) , pages 609–623. Springer, 2012

work page 2012

[7] [7]

Describing clothing by semantic attributes

Huizhong Chen, Andrew Gallagher, and Bernd Girod. Describing clothing by semantic attributes. In European conference on computer vision, pages 609–623. Springer, 2012

work page 2012

[8] [8]

Leveraging weakly annotated data for fashion image retrieval and label prediction

Charles Corbiere, Hedi Ben-Younes, Alexandre Ramé, and Charles Ollion. Leveraging weakly annotated data for fashion image retrieval and label prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 2268–2274, 2017

work page 2017

[9] [9]

Style ﬁnder: Fine-grained clothing style detection and retrieval

Wei Di, Catherine Wah, Anurag Bhardwaj, Robinson Piramuthu, and Neel Sundaresan. Style ﬁnder: Fine-grained clothing style detection and retrieval. In IEEE Conference on computer vision and pattern recognition workshops, pages 8–13, 2013

work page 2013

[10] [10]

Cross-domain fashion image retrieval

Bojana Gajic and Ramon Baldrich. Cross-domain fashion image retrieval. In Proceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1869–1871, 2018

work page 2018

[11] [11]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014

work page 2014

[12] [12]

End-to-end learning of deep visual representations for image retrieval

Albert Gordo, Jon Almazan, Jerome Revaud, and Diane Larlus. End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision, 124(2):237–254, 2017

work page 2017

[13] [13]

Where to buy it: Matching street clothing photos in online shops

M Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C Berg, and Tamara L Berg. Where to buy it: Matching street clothing photos in online shops. InInternational Conference on Computer Vision (ICCV), pages 3343–3351, 2015

work page 2015

[14] [14]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. SHIN ET AL.: FEA TURE-LEVEL A TTRIBUTE MANIPULA TION FOR FASHION 11

work page 2016

[15] [15]

Squeeze-and-excitation networks

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 7132–7141, 2018

work page 2018

[16] [16]

Densely connected convolutional networks

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

work page 2017

[17] [17]

Cross-domain image retrieval with a dual attribute-aware ranking network

Junshi Huang, Rogerio S Feris, Qiang Chen, and Shuicheng Yan. Cross-domain image retrieval with a dual attribute-aware ranking network. In Proceedings of the IEEE international conference on computer vision, pages 1062–1070, 2015

work page 2015

[18] [18]

Combination of multiple global descriptors for image retrieval

HeeJae Jun, ByungSoo Ko, Youngjoon Kim, Insik Kim, and Jongtack Kim. Combination of multiple global descriptors for image retrieval. arXiv preprint arXiv:1903.10663, 2019

work page arXiv 1903

[19] [19]

Getting the look: clothing recog- nition and segmentation for automatic product suggestions in everyday photos

Yannis Kalantidis, Lyndon Kennedy, and Li-Jia Li. Getting the look: clothing recog- nition and segmentation for automatic product suggestions in everyday photos. In Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, pages 105–112, 2013

work page 2013

[20] [20]

Hipster wars: Discovering elements of fashion styles

M Hadi Kiapour, Kota Yamaguchi, Alexander C Berg, and Tamara L Berg. Hipster wars: Discovering elements of fashion styles. In European Conference on Computer Vision (ECCV), pages 472–488, 2014

work page 2014

[21] [21]

Learn- ing to discover cross-domain relations with generative adversarial networks

Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. Learn- ing to discover cross-domain relations with generative adversarial networks. In Pro- ceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1857–1865. JMLR. org, 2017

work page 2017

[22] [22]

Attribute pivots for guiding relevance feed- back in image search

Adriana Kovashka and Kristen Grauman. Attribute pivots for guiding relevance feed- back in image search. In Proceedings of the IEEE International Conference on Com- puter Vision, pages 297–304, 2013

work page 2013

[23] [23]

Whittlesearch: Image search with relative attribute feedback

Adriana Kovashka, Devi Parikh, and Kristen Grauman. Whittlesearch: Image search with relative attribute feedback. In Computer Vision and Pattern Recognition (CVPR), pages 2973–2980, 2012

work page 2012

[24] [24]

Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set

Si Liu, Zheng Song, Guangcan Liu, Changsheng Xu, Hanqing Lu, and Shuicheng Yan. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In Computer Vision and Pattern Recognition (CVPR), pages 3330–3337. IEEE, 2012

work page 2012

[25] [25]

Deepfashion: Powering robust clothes recognition and retrieval with rich annotations

Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1096–1104, 2016

work page 2016

[26] [26]

Visualizing data using t-sne

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008

work page 2008

[27] [27]

Confidence and diversity for active selection of feedback in image retrieval

Bhavin Modi and Adriana Kovashka. Confidence and diversity for active selection of feedback in image retrieval. In British Machine Vision Conference (BMVC), 2017.

work page 2017

[28] [28]

Give me a hint! navigating image databases using human-in-the-loop feedback

Bryan Plummer, Hadi Kiapour, Shuai Zheng, and Robinson Piramuthu. Give me a hint! navigating image databases using human-in-the-loop feedback. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 2048–2057. IEEE, 2019

work page 2019

[29] [29]

Improved techniques for training gans

Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in neural information processing systems, pages 2234–2242, 2016

work page 2016

[30] [30]

Facenet: A unified embedding for face recognition and clustering

Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015

work page 2015

[31] [31]

End-to-end localization and ranking for relative attributes

Krishna Kumar Singh and Yong Jae Lee. End-to-end localization and ranking for relative attributes. In European Conference on Computer Vision (ECCV), pages 753–769. Springer, 2016

work page 2016

[32] [32]

Improved deep metric learning with multi-class n-pair loss objective

Kihyuk Sohn. Improved deep metric learning with multi-class n-pair loss objective. In Advances in Neural Information Processing Systems, pages 1857–1865, 2016

work page 2016

[33] [33]

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014

work page 2014

[34] [34]

Inception-v4, inception-resnet and the impact of residual connections on learning

Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017

work page 2017

[35] [35]

Learning type-aware embeddings for fashion compatibility

Mariya I Vasileva, Bryan A Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, and David Forsyth. Learning type-aware embeddings for fashion compatibility. In Proceedings of the European Conference on Computer Vision (ECCV), pages 390–405, 2018

work page 2018

[36] [36]

Aggregated residual transformations for deep neural networks

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1492–1500, 2017

work page 2017

[37] [37]

Mix and match: Joint model for clothing and attribute recognition

Kota Yamaguchi, Takayuki Okatani, Kyoko Sudo, Kazuhiko Murasaki, and Yukinobu Taniguchi. Mix and match: Joint model for clothing and attribute recognition. In British Machine Vision Conference (BMVC), volume 1, page 4, 2015

work page 2015

[38] [38]

Articulated pose estimation with flexible mixtures-of-parts

Yi Yang and Deva Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR 2011, pages 1385–1392. IEEE, 2011

work page 2011

[39] [39]

Hard-aware point-to-set deep metric for person re-identification

Rui Yu, Zhiyong Dou, Song Bai, Zhaoxiang Zhang, Yongchao Xu, and Xiang Bai. Hard-aware point-to-set deep metric for person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 188–204, 2018

work page 2018

[40] [40]

Memory-augmented attribute manipulation networks for interactive fashion search

Bo Zhao, Jiashi Feng, Xiao Wu, and Shuicheng Yan. Memory-augmented attribute manipulation networks for interactive fashion search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1520–1528, 2017.

work page 2017

[41] [41]

Unpaired image-to-image translation using cycle-consistent adversarial networks

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017

work page 2017