KiseKloset for Fashion Retrieval and Recommendation

Khoi-Nguyen Nguyen-Ngoc; Minh-Triet Tran; Tam V. Nguyen; Thanh-Tung Phan-Nguyen; Trung-Nghia Le

arxiv: 2506.23471 · v2 · submitted 2025-06-30 · 💻 cs.IR · cs.CV

Pith reviewed 2026-05-19 08:11 UTC · model grok-4.3

keywords fashion retrieval · outfit recommendation · virtual try-on · transformer architecture · complementary items · e-commerce · image search · recommendation system

The pith

KiseKloset pairs a new transformer for cross-category complementary fashion recommendations with a lightweight real-time virtual try-on module.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KiseKloset as an end-to-end system for outfit retrieval and recommendation in fashion e-commerce. It supports two retrieval paths, similar-item matching and text-feedback guidance, while introducing a transformer that suggests items from different categories that work together as outfits. The system also adds a virtual try-on component built to run quickly, use limited memory, and generate realistic images of garments on a person. These pieces together aim to help online shoppers explore options more effectively and visualize purchases before buying.

Core claim

KiseKloset integrates similar-item and text-guided retrieval, a novel transformer architecture for recommending complementary garments across diverse categories, approximate algorithms for faster search, and a lightweight virtual try-on framework that operates in real time with low memory use while preserving output realism. The claim is supported by a deployment in which 84 percent of participants rated the system highly useful.

What carries the argument

The novel transformer architecture that takes fashion item features and generates recommendations for complementary pieces drawn from multiple product categories to form coherent outfits.

If this is right

Approximate algorithms reduce search time over large fashion catalogs while preserving retrieval quality.
Text feedback allows users to refine searches beyond visual similarity alone.
Real-time virtual try-on lets shoppers preview how specific garments appear on their own body, supporting more confident purchase decisions.
Deployment feedback indicates that the combined retrieval, recommendation, and visualization tools raise overall user satisfaction in online fashion shopping.
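The trade-off behind approximate search can be illustrated with a toy index. The sketch below is a minimal example, assuming random-hyperplane hashing as a stand-in technique; it is not the paper's method (the reference list points to Annoy, product quantization, and HNSW). The names `HyperplaneIndex`, `exact_nearest`, and `cosine` are hypothetical and exist only for this illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Cosine similarity; assumes both vectors are non-zero.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def exact_nearest(query, items):
    """Brute-force baseline: scan every item, return the index of the best match."""
    return max(range(len(items)), key=lambda i: cosine(query, items[i]))


class HyperplaneIndex:
    """Toy random-hyperplane hashing index (illustrative assumption only)."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        # Each plane is a random normal vector; its sign pattern forms the hash key.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = {}
        self.items = []

    def _key(self, vec):
        return tuple(dot(plane, vec) >= 0 for plane in self.planes)

    def add(self, vec):
        self.items.append(vec)
        self.buckets.setdefault(self._key(vec), []).append(len(self.items) - 1)

    def query(self, vec):
        # Only items sharing the query's bucket are scored, so the true
        # nearest neighbour can be missed; that is the cost of the speed-up.
        candidates = self.buckets.get(self._key(vec), [])
        if not candidates:
            return None
        return max(candidates, key=lambda i: cosine(vec, self.items[i]))
```

A query scores only the items in one bucket instead of the whole catalog, which is where the time saving comes from. Production libraries add multiple trees, graphs, or probes to recover the neighbours a single bucket would miss.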

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The cross-category recommendation approach could extend to other visual product domains such as furniture or accessories where items must coordinate.
Widespread use of the virtual try-on module might lower return volumes by giving shoppers clearer expectations before purchase.
Mobile versions of the lightweight try-on framework could enable in-store augmented-reality previews without heavy hardware requirements.

Load-bearing premise

The new transformer and virtual try-on module deliver measurable improvements in recommendation quality and visualization realism over prior methods. The support offered for this is user satisfaction rather than detailed benchmark numbers.

What would settle it

A side-by-side A/B test that records purchase rates, return rates, or session completion times for shoppers using KiseKloset versus a baseline recommendation system without the transformer or virtual try-on components.
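One way such an A/B comparison of purchase or return rates could be analyzed is a pooled two-proportion z-test. The sketch below is a minimal example under that assumption; the function name and any counts passed to it are hypothetical and do not come from the paper.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Arm A could be shoppers using the full system and arm B the baseline;
    all counts are supplied by the caller and are hypothetical here.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; both arms have degenerate rates")
    z = (p_a - p_b) / se
    # Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The normal approximation is reasonable only when each arm has enough conversions and non-conversions; for small samples an exact test would be a safer choice.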

Figures

Figures reproduced from arXiv: 2506.23471 by Khoi-Nguyen Nguyen-Ngoc, Minh-Triet Tran, Tam V. Nguyen, Thanh-Tung Phan-Nguyen, Trung-Nghia Le.

**Figure 1.** Interface of the proposed KiseKloset system, integrating outfit retrieval, recommendation, and virtual try-on capabilities.

**Figure 2.** Our complementary item recommendation is more generalized than fill-in-the-blank outfit recommendation [

**Figure 3.** Our system supports various types of ORR.

**Figure 4.** Examples of text feedback-guided item retrieval. The first item is the reference; the remaining items are retrieval results.

**Figure 5.** We augment retrieval results to enhance users’ experience.

**Figure 6.** Our proposed inter-category complementary item recommendation pipeline.

**Figure 7.** Two settings of inter-category complementary item recommendation: Tone sur tone (top) and Mix and match (bottom).

**Figure 8.** Overview of the DM-VTON architecture used [

**Figure 9.** Interaction flow of the proposed KiseKloset system.

**Figure 10.** Query time of nearest neighbor methods. Subsequently, users must upload or pick the garment they want to try on from our provided list on the right side (as shown in Figure 9b). The left panel also displays the model image chosen in the previous step. Now users can press the Next button to view the try-on result. Once users have chosen both the input person and garment image, they are presented with the t…

**Figure 11.** Rating scores on ORR quality (1: very dissatisfied, 5: very satisfied).

**Figure 12.** Issues of existing parser-free VTON methods (i.e., PF-AFN [11], FS-VTON [17], DM-VTON [30]).

Original abstract

The global fashion e-commerce industry has become integral to people's daily lives, leveraging technological advancements to offer personalized shopping experiences, primarily through recommendation systems that enhance customer engagement through personalized suggestions. To improve customers' experience in online shopping, we propose a novel comprehensive KiseKloset system for outfit retrieval and recommendation. We explore two approaches for outfit retrieval: similar item retrieval and text feedback-guided item retrieval. Notably, we introduce a novel transformer architecture designed to recommend complementary items from diverse categories. Furthermore, we enhance the overall performance of the search pipeline by integrating approximate algorithms to optimize the search process. Additionally, addressing the crucial needs of online shoppers, we employ a lightweight yet efficient virtual try-on framework capable of real-time operation, memory efficiency, and maintaining realistic outputs compared to its predecessors. This virtual try-on module empowers users to visualize specific garments on themselves, enhancing the customers' experience and reducing costs associated with damaged items for retailers. We deployed our end-to-end system for online users to test and provide feedback, enabling us to measure their satisfaction levels. The results of our user study revealed that 84% of participants found our comprehensive system highly useful, significantly improving their online shopping experience.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper describes a practical end-to-end fashion system with a transformer recommender and lightweight virtual try-on, but its performance claims rest on an under-specified user study.

This paper is a system description that combines a transformer for recommending complementary fashion items across categories, approximate nearest-neighbor search for speed, and a memory-efficient virtual try-on module that runs in real time. The authors deployed the whole thing and report that 84% of users found it highly useful for online shopping. That integration for e-commerce is the core contribution, and it is presented clearly enough that someone building a similar pipeline could pick up implementation ideas from the description.

The virtual try-on part is positioned as lighter than prior work while keeping realistic output, which is a reasonable engineering goal for deployment. The approximate search addition also makes sense for scaling retrieval. Those pieces show some attention to practical constraints like memory and latency.

The main weakness is the evaluation. The user study is the only quantitative support offered, yet it gives no participant count, recruitment details, exact questions asked, baseline comparisons, or any statistical measures around the 84% figure. Without those, it is difficult to tell whether the system actually outperforms existing fashion platforms or simply reflects general satisfaction with any modern interface. The abstract and system overview do not supply the missing methodology either.

This work is aimed at practitioners in fashion e-commerce or applied IR who want to see how retrieval, recommendation, and try-on can be wired together in one product. Readers looking for new theoretical results or large-scale benchmark improvements will not find them here. If the authors add proper experimental controls and comparisons in a revision, the paper could be worth sending to referees for its engineering details; right now the evidence is too thin to judge the claimed advantages.

Referee Report

1 major / 2 minor

Summary. The manuscript presents the KiseKloset end-to-end system for fashion outfit retrieval and recommendation. It describes two retrieval approaches (similar-item and text-feedback-guided), a novel transformer architecture for recommending complementary items across diverse categories, integration of approximate search algorithms to optimize the pipeline, a lightweight real-time virtual try-on module claimed to be memory-efficient and more realistic than predecessors, and a deployed user study in which 84% of participants found the comprehensive system highly useful.

Significance. If the architectural contributions and user-study results can be substantiated with proper controls and comparisons, the work could demonstrate a practical integration of cross-category recommendation, efficient search, and visualization that improves online fashion shopping experiences and reduces return costs. At present the lack of methodological detail in the evaluation limits the ability to gauge its incremental contribution over existing retrieval and try-on systems.

major comments (1)

[User Study] User Study section: the central claim that the deployed system delivers a practically superior experience rests on the statement that '84% of participants found our comprehensive system highly useful.' No participant count, recruitment method, questionnaire items, baseline interface, statistical test, or confidence interval is supplied, rendering the percentage uninterpretable as evidence of superiority.
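To see why the missing participant count matters, consider how a confidence interval around 84% changes with sample size. The sketch below is a minimal example that computes a Wilson score interval; the sample sizes 25, 100, and 400 are hypothetical, chosen only for illustration, since no participant count is given here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical sample sizes; the actual number of participants is unknown.
for n in (25, 100, 400):
    low, high = wilson_interval(round(0.84 * n), n)
    print(f"n={n}: [{low:.3f}, {high:.3f}]")
```

The interval narrows as the sample grows, so the same headline percentage can carry very different evidential weight depending on how many people answered.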

minor comments (2)

[Abstract] Abstract: the phrase 'significantly improving their online shopping experience' is asserted without any quantitative comparison or baseline metric.
[Methodology] The description of the transformer architecture for complementary-item recommendation would benefit from an explicit statement of its input/output format and loss function to allow comparison with prior cross-category models.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the concern regarding the user study below and commit to revising the paper to strengthen the evaluation section.

Referee: [User Study] User Study section: the central claim that the deployed system delivers a practically superior experience rests on the statement that '84% of participants found our comprehensive system highly useful.' No participant count, recruitment method, questionnaire items, baseline interface, statistical test, or confidence interval is supplied, rendering the percentage uninterpretable as evidence of superiority.

Authors: We agree that the current description of the user study lacks the methodological details needed for proper interpretation and comparison. In the revised manuscript we will expand this section to report the exact number of participants, the recruitment approach via the deployed platform, the questionnaire items administered, any baseline interfaces used for comparison, and the results of statistical tests including confidence intervals. These additions will allow readers to better assess the practical impact of the system.

revision: yes

Circularity Check

0 steps flagged

No circularity: system description without derivations or self-referential fits

full rationale

The paper introduces a KiseKloset system with a novel transformer for cross-category complementary recommendation, approximate search integration, and a lightweight virtual try-on module. Claims rest on architectural descriptions and a deployed user study reporting 84% satisfaction. No equations, parameters, first-principles derivations, or predictive models appear in the provided text. There are no self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations that reduce the central claims to inputs by construction. This is a standard applied systems paper whose evidence is empirical and descriptive rather than mathematically circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on domain assumptions about user study validity and the practical utility of the described architectures without introducing new mathematical entities or free parameters.

axioms (1)

domain assumption: Deployed user feedback provides a reliable measure of overall system usefulness and satisfaction.
The validation of the end-to-end system depends on this empirical claim from the user study.

pith-pipeline@v0.9.0 · 5758 in / 1258 out tokens · 43128 ms · 2026-05-19T08:11:29.712524+00:00 · methodology

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

[1] [n. d.]. Ace Your Product Recommendations to Grow Revenue. https://www.visenze.com/blog/2023/07/19/ace-your-product-recommendations-to-grow-revenue. Accessed: 2023-07-30.
[2] [n. d.]. Fashion e-commerce market value worldwide from 2023 to 2027. https://www.statista.com/topics/9288/fashion-e-commerce-worldwide. Accessed: 2023-06-29.
[3] Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, and Hongxia Yang. 2022. Single stage virtual try-on via deformable attention flows. In European Conference on Computer Vision (ECCV). 409–425.
[4] Alberto Baldrati, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. 2022. Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4959–4968.
[5] Alberto Baldrati, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. 2022. Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21466–21474.
[6] Erik Bernhardsson. 2018. Annoy: Approximate Nearest Neighbors in C++/Python. https://pypi.org/project/annoy/ Python package version 1.13.0.
[7] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2017. Realtime multi-person 2d pose estimation using part affinity fields. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7291–7299.
[8] Wen Chen, Pipei Huang, Jiaming Xu, Xin Guo, Cheng Guo, Fei Sun, Chao Li, Andreas Pfadler, Huan Zhao, and Binqiang Zhao. 2019. POG: personalized outfit generation for fashion recommendation at Alibaba iFashion. In International Conference on Knowledge Discovery & Data Mining (SIGKDD). 2662–2670.
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018).
[10] Benjamin Fele, Ajda Lampe, Peter Peer, and Vitomir Struc. 2022. C-vton: Context-driven image-based virtual try-on network. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 3144–3153.
[11] Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, and Ping Luo. 2021. Parser-free virtual try-on via distilling appearance flows. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8485–8493.
[12] Ke Gong, Xiaodan Liang, Dongyu Zhang, Xiaohui Shen, and Liang Lin. 2017. Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 932–940.
[13] Rıza Alp Güler, Natalia Neverova, and Iasonas Kokkinos. 2018. DensePose: Dense human pose estimation in the wild. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7297–7306.
[14] Xintong Han, Xiaojun Hu, Weilin Huang, and Matthew R Scott. 2019. Clothflow: A flow-based model for clothed person generation. In IEEE/CVF International Conference on Computer Vision (ICCV). 10471–10480.
[15] Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S Davis. 2018. Viton: An image-based virtual try-on network. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7543–7552.
[16] Xiao Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song, and Tao Xiang. 2022. FashionViL: Fashion-Focused Vision-and-Language Representation Learning. In European Conference on Computer Vision (ECCV). 634–651.
[17] Sen He, Yi-Zhe Song, and Tao Xiang. 2022. Style-based global appearance flow for virtual try-on. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3470–3479.
[18] Thibaut Issenhuth, Jérémie Mary, and Clément Calauzenes. 2020. Do not mask what you do not need to mask: a parser-free virtual try-on. In European Conference on Computer Vision (ECCV). 619–635.
[19] Junkyu Jang, Eugene Hwang, and Sung-Hyuk Park. 2024. Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval. In Winter Conference on Applications of Computer Vision (WACV). 8066–8075.
[20] Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (2010), 117–128.
[21] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7, 3 (2019), 535–547.
[22] Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4401–4410.
[23] Cheng-I Lai. 2019. Contrastive Predictive Coding Based Feature for Automatic Speaker Verification. arXiv preprint arXiv:1904.01575 (2019).
[24] Peike Li, Yunqiu Xu, Yunchao Wei, and Yi Yang. 2020. Self-correction for human parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 6 (2020), 3260–3271.
[25] Chao Lin, Zhao Li, Sheng Zhou, Shichang Hu, Jialun Zhang, Linhao Luo, Jiarun Zhang, Longtao Huang, and Yuan He. 2022. RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on. In International Joint Conference on Artificial Intelligence (IJCAI). 1151–1158.
[26] Yen-Liang Lin, Son Tran, and Larry Davis. 2020. Fashion Outfit Complementary Item Retrieval. In Conference on Computer Vision and Pattern Recognition. 3311–3319.
[27] Si Liu, Tam V. Nguyen, Jiashi Feng, Meng Wang, and Shuicheng Yan. 2012. Hi, Magic Closet, Tell Me What to Wear!. In International Conference on Multimedia (ACM MM). 619–628.
[28] Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. 2016. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. In IEEE Conference on Computer Vision and Pattern Recognition. 1096–1104.
[29] Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
[30] Khoi-Nguyen Nguyen-Ngoc, Thanh-Tung Phan-Nguyen, Khanh-Duy Le, Tam V Nguyen, Minh-Triet Tran, and Trung-Nghia Le. 2023. DM-VTON: Distilled Mobile Real-time Virtual Try-On. In IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). 695–700.
[31] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic Image Synthesis with Spatially-Adaptive Normalization. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2337–2346.
[32] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
[33] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4510–4520.
[34] Rohan Sarkar, Navaneeth Bodla, Mariya Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, and Gerard Medioni. 2022. OutfitTransformer: Outfit Representations for Fashion Recommendation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2262–2266.
[35] Josef Sivic and Andrew Zisserman. 2003. Video Google: A text retrieval approach to object matching in videos. In International Conference on Computer Vision. 1470–1477.
[36] Pongsate Tangseng, Kota Yamaguchi, and Takayuki Okatani. 2017. Recommending outfits from personal closet. In International Conference on Computer Vision Workshops (ICCV Workshops). 2275–2279.
[37] Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, and David Forsyth. 2018. Learning Type-Aware Embeddings for Fashion Compatibility. In European Conference on Computer Vision (ECCV). 405–421.
[38] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems (2017), 6000–6010.
[39] Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, and Meng Yang. 2018. Toward characteristic-preserving image-based virtual try-on network. In European Conference on Computer Vision (ECCV). 589–604.
[40] Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, and Rogerio Feris. 2021. Fashion IQ: A new dataset towards retrieving images by natural language feedback. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11307–11317.
[41] Ying Wu, Hongbing Liu, Pengzhen Lu, Lihua Zhang, and Fangjian Yuan. 2022. Design and implementation of virtual fitting system based on gesture recognition and clothing transfer algorithm. Scientific Reports 12, 1 (2022), 18356.
[42] Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, Wangmeng Zuo, and Ping Luo. 2020. Towards photo-realistic virtual try-on by adaptively generating-preserving image content. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7850–7859.
[43] Zhenglong Zhou, Bo Shu, Shaojie Zhuo, Xiaoming Deng, Ping Tan, and Stephen Lin. 2012. Image-based clothes animation for virtual fitting. In SIGGRAPH Asia Technical Briefs. 1–4.
