arxiv: 2408.09847 · v3 · pith:AZDWP34Unew · submitted 2024-08-19 · 💻 cs.IR

Fashion Image-to-Image Translation for Complementary Item Retrieval

Pith reviewed 2026-05-23 22:20 UTC · model grok-4.3

classification 💻 cs.IR
keywords fashion retrieval · image-to-image translation · conditional GAN · compatibility modeling · complementary items · top-bottom retrieval · composed image retrieval · generative models

The pith

A two-stage model generates complementary fashion images with conditional GANs and feeds them into retrieval to raise top-bottom matching accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GeCo to improve fashion compatibility modeling by first running a Complementary Item Generation Model that translates a seed item image into a compatible target item image. These synthetic images then act as extra conditioning inputs inside the retrieval stage. The authors argue that earlier generative methods lost performance because they did not verify the quality of the images they created, and that explicit attention to this quality plus the use of paired translation solves the problem even when training data is scarce. Experiments on three datasets, including a newly released Fashion Taobao collection, show higher accuracy than Bayesian ranking baselines and prior generative approaches. The work matters for online retail because better automatic matching can reduce the need for large labeled datasets while still producing usable recommendations.

Core claim

The central claim is that the Generative Compatibility Model (GeCo) improves fashion item retrieval by first using the Complementary Item Generation Model (CIGM), a conditional GAN performing paired image-to-image translation, to produce target-item images from seed items and then incorporating those generated images as conditioning signals inside the compatibility scoring step of composed image retrieval.
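The two-stage flow in the core claim can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `cosine` and `rank_candidates`, the weighted blend of two cosine similarities, the weight `alpha`, and the hand-written vectors standing in for image embeddings. The paper's actual compatibility scorer is not described on this page, so this should not be read as its implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(top_vec, template_vec, candidates, alpha=0.5):
    """Rank candidate bottoms by a blend of two similarities.

    alpha weights similarity to the seed top; (1 - alpha) weights similarity
    to the generated template. The blend and alpha are illustrative
    assumptions, not the paper's scoring function.
    """
    scored = [
        (name, alpha * cosine(top_vec, vec) + (1 - alpha) * cosine(template_vec, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings standing in for image features (hypothetical values).
top = [1.0, 0.0, 0.0]
template = [0.0, 1.0, 0.0]  # stage-1 output; a fixed vector here, not a real cGAN sample
candidates = {"jeans": [0.1, 0.9, 0.0], "skirt": [0.0, 0.1, 0.9]}

ranking = rank_candidates(top, template, candidates)
print(ranking[0][0])
```

In this toy setup the candidate closest to the template ranks first, which is the intuition behind using a generated image as an extra conditioning signal.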

What carries the argument

The Complementary Item Generation Model (CIGM), a conditional GAN that performs paired image-to-image translation to create complementary-item images used as conditioning signals for retrieval.
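For orientation, the standard paired image-to-image translation objective from Pix2Pix [26] can be written as follows. The symbols are assumptions chosen to match this page: t is a seed top, b its paired ground-truth bottom, z a noise input, G the generator, D the discriminator, and λ a loss weight. CIGM may use additional or different terms; this is the generic conditional GAN formulation, not a transcription of the paper's equations.

```latex
\mathcal{L}_{\mathrm{cGAN}}(G, D) =
  \mathbb{E}_{t, b}\big[\log D(t, b)\big]
  + \mathbb{E}_{t, z}\big[\log\big(1 - D(t, G(t, z))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{t, b, z}\big[\lVert b - G(t, z) \rVert_1\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{\mathrm{cGAN}}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The L1 term is what makes the translation "paired": the generator is penalized for deviating from the specific bottom that was observed with each top.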

If this is right

  • The GeCo model outperforms state-of-the-art baselines on three top-bottom retrieval datasets.
  • Paired image-to-image translation inside the composed image retrieval framework supplies effective conditioning signals.
  • The approach mitigates the need for very large training sets that typical generative models require.
  • Release of the Fashion Taobao dataset provides a new benchmark for top-bottom compatibility research.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-stage pattern of generating conditioning images before retrieval could be tested on non-fashion item pairing tasks such as furniture or accessory matching.
  • If generation quality fluctuates across items, an explicit quality filter or uncertainty estimate on the synthetic images might further stabilize results.
  • Extending the method from pairs to sets of three or more mutually compatible items would be a direct next measurement of the same conditioning mechanism.

Load-bearing premise

The images produced by the CIGM component are high-quality enough to supply useful conditioning signals that raise rather than lower retrieval performance.

What would settle it

Retraining the retrieval stage on the same three datasets once with and once without the CIGM-generated images and observing no gain or a drop in accuracy metrics would falsify the claim.
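The comparison described above needs a metric computed identically for both runs. Figure 8 of the paper refers to AUC and MRR, so a minimal sketch of those two metrics is given here. The function names and input conventions are assumptions; the paper's exact evaluation code is not shown on this page.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) score pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def mrr(ranks):
    """Mean reciprocal rank, given the 1-based rank of each ground-truth item."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```

An ablation would call these once on scores produced with the generated templates and once on scores produced without them, using the same test pairs in both cases.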

Figures

Figures reproduced from arXiv: 2408.09847 by Claudio Pomo, Dietmar Jannach, Matteo Attimonelli, Tommaso Di Noia.

Figure 1: In the proposed architecture the CIGM model generates bottom templates. Subsequently, the GeCo model leverages the top, the generated template, and the candidate bottom images to evaluate their compatibility. This approach facilitates both compatibility modeling and complementary item retrieval tasks. In the remainder of this paper, we review related work in fashion image retrieval and generative models. …
Figure 2: An example of generated images from [14] illustrates the differences in generation quality: (a) presents images generated by a VAE, while (b) showcases images sampled from a GAN. … training of GANs can be formulated as a min-max game with the objective function, shown in Equation (1), derived from the Jensen-Shannon (JS) divergence, where p_data(x) represents the data distribution, and p_z(z) represents the …
Figure 3: Pix2Pix original generator [26]. … distributions over the sets of tops T and bottoms B. Our approach first involves learning a mapping between the probability distributions of tops P_data(T) and bottoms P_data(B) using the CIGM model. Then we use this mapping to generate meaningful templates given a top t ∈ T. Ideally, the CIGM model should learn to generate samples b ∈ B. However, in practice, CIGM learns …
Figure 4: Complementary Item Generation Model. … the capture of high-frequency details and textures with greater effectiveness. This design allows the discriminator to identify which specific areas of the generated image should be improved to deceive the discriminator effectively. As foreshadowed in the previous section, in order to overcome the mode collapse phenomenon [40, 53] and to produce more realistic template…
Figure 5: Top: conditioning tops. Middle: ground-truth bottoms. Bottom: generated bottoms with the proposed …
Figure 6: The complete two-stage architecture, highlighting the Generative Compatibility Model (GeCo) and describing the overall structure. While the BPR loss focuses on optimizing pairwise rankings by ensuring preferred items are ranked higher than non-preferred items, the InfoNCE loss operates in a self-supervised manner, aiming to bring positive sample pairs closer together in the latent space while pushing negat…
Figure 7: Distribution of pairs across all datasets.
Figure 8: Scatter plots illustrating variations in terms of AUC and MRR in response to adjustments of the loss …
Figure 9: The templates generated by the CIGM, MGCM, and Pix2PixCM models, given the same top image, highlighting the superior quality of our templates. The images are taken from the FashionTaobaoTB dataset. We note that the templates from the baseline models (MGCM and Pix2PixCM) are scaled to match the higher resolution of our templates. … three top image inputs. Evidently, the templates produced by CIGM exhibit sign…
Figure 10: Retrieval performance of the GeCo model on the FashionTaobaoTB dataset, compared to various baseline models, all using the same input top. It can be observed that our model exhibits superior retrieval performance and generates more realistic templates with higher resolution. The first row displays the input top, while the second row shows the bottom template generated by the corresponding model in each co…
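The Figure 6 caption contrasts a BPR loss with an InfoNCE loss. A minimal sketch of the textbook forms of both is given here, assuming scalar compatibility scores and similarities as inputs. The function names, the temperature default, and the single-positive InfoNCE variant are assumptions; how the paper combines or weights the two losses is not stated on this page.

```python
import math

def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss: -log(sigmoid(pos_score - neg_score)).

    Computed as softplus(neg_score - pos_score) for numerical stability.
    """
    x = neg_score - pos_score
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def info_nce_loss(pos_sim, neg_sims, temperature=0.1):
    """InfoNCE loss for one positive and a list of negatives.

    Equals -log(exp(pos/T) / sum_j exp(s_j/T)), where the sum runs over the
    positive and all negatives. Uses the log-sum-exp trick for stability.
    """
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

BPR looks at one positive and one negative at a time, while InfoNCE normalizes the positive against the whole set of negatives, which is the contrast the caption draws.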
Original abstract

The increasing demand for online fashion retail has boosted research in fashion compatibility modeling and item retrieval, focusing on matching user queries (textual descriptions or reference images) with compatible fashion items. A key challenge is top-bottom retrieval, where precise compatibility modeling is essential. Traditional methods, often based on Bayesian Personalized Ranking (BPR), have shown limited performance. Recent efforts have explored using generative models in compatibility modeling and item retrieval, where generated images serve as additional inputs. However, these approaches often overlook the quality of generated images, which could be crucial for model performance. Additionally, generative models typically require large datasets, posing challenges when such data is scarce. To address these issues, we introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation. First, the Complementary Item Generation Model (CIGM), built on Conditional Generative Adversarial Networks (GANs), generates target item images (e.g., bottoms) from seed items (e.g., tops), offering conditioning signals for retrieval. These generated samples are then integrated into GeCo, enhancing compatibility modeling and retrieval accuracy. Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines. Key contributions include: (i) the GeCo model utilizing paired image-to-image translation within the Composed Image Retrieval framework, (ii) comprehensive evaluations on benchmark datasets, and (iii) the release of a new Fashion Taobao dataset designed for top-bottom retrieval, promoting further research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Generative Compatibility Model (GeCo), a two-stage approach for fashion complementary item retrieval. The first stage, Complementary Item Generation Model (CIGM), employs Conditional Generative Adversarial Networks (cGANs) to perform paired image-to-image translation, generating images of complementary items (e.g., bottoms from tops). These generated images serve as conditioning signals in the second stage for improved compatibility modeling and retrieval. The paper reports that GeCo outperforms state-of-the-art baselines on three datasets and releases a new Fashion Taobao dataset for top-bottom retrieval.

Significance. If the empirical claims hold with proper controls, the work is significant for highlighting the importance of generated image quality in generative approaches to compatibility modeling, which prior work overlooked. The release of the new dataset is a positive contribution that could facilitate further research in the field. The two-stage design directly targets the identified limitation in existing methods.

major comments (2)
  1. [Experiments] Experiments section: the central claim that GeCo outperforms baselines via CIGM-generated conditioning signals requires an ablation isolating the contribution of the generated images (e.g., retrieval performance with vs. without CIGM outputs, or with real vs. generated conditioning). Without this, it is impossible to confirm that the generated samples supply high-quality signals rather than noise, which is the load-bearing assumption flagged in the abstract.
  2. [Evaluation protocol] Evaluation protocol (likely §4 or §5): the abstract asserts outperformance on three datasets but the manuscript must report exact baselines, metrics (e.g., Recall@K, NDCG), data splits, and statistical significance tests; absence of these details prevents verification of the empirical superiority claim.
minor comments (2)
  1. [Abstract] Abstract: the description of the new Fashion Taobao dataset should include basic statistics (number of pairs, train/test split sizes) to allow immediate assessment of its scale and utility.
  2. [Model description] Notation: the distinction between CIGM and GeCo could be clarified with a single diagram or explicit statement of how the generated image is fed into the compatibility scorer.
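Major comment 2 names Recall@K and NDCG as example metrics. A minimal sketch of both under binary relevance follows. The function names and the item identifiers in the usage note are hypothetical, and whether the paper reports these particular metrics is not confirmed on this page.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Share of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(position + 2) for position in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```

For example, with the ranking `["b3", "b1", "b7", "b2"]` and relevant set `{"b1", "b2"}`, `recall_at_k(..., k=2)` counts one of the two relevant items, and `ndcg_at_k` additionally discounts that hit because it sits at position 2 rather than position 1.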

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the empirical validation.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: the central claim that GeCo outperforms baselines via CIGM-generated conditioning signals requires an ablation isolating the contribution of the generated images (e.g., retrieval performance with vs. without CIGM outputs, or with real vs. generated conditioning). Without this, it is impossible to confirm that the generated samples supply high-quality signals rather than noise, which is the load-bearing assumption flagged in the abstract.

    Authors: We agree that an explicit ablation isolating the CIGM contribution is required to substantiate the central claim. The current two-stage design assumes the generated images provide useful conditioning, but without direct comparison the source of gains remains unclear. In the revision we will add ablation results comparing retrieval performance with vs. without CIGM outputs and, where feasible, real vs. generated conditioning signals on the three datasets. revision: yes

  2. Referee: [Evaluation protocol] Evaluation protocol (likely §4 or §5): the abstract asserts outperformance on three datasets but the manuscript must report exact baselines, metrics (e.g., Recall@K, NDCG), data splits, and statistical significance tests; absence of these details prevents verification of the empirical superiority claim.

    Authors: We acknowledge that the evaluation details must be reported with full precision to allow verification. The revised manuscript will explicitly enumerate all baselines, list the complete set of metrics (including Recall@K and any NDCG), detail the train/validation/test splits for each of the three datasets, and add statistical significance tests (e.g., paired t-tests across runs) supporting the reported improvements. revision: yes
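The paired t-test mentioned in the second response can be sketched with the standard library alone. The function name is hypothetical, and the inputs are assumed to be one metric value per run for each of two models, aligned by run (same seed or split). Turning the statistic into a p-value needs a Student-t distribution, for instance from a statistics package, which is left out here.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are the per-run differences.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two aligned samples with at least two runs each")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (spread / math.sqrt(n)), n - 1
```

Pairing by run matters: it removes run-to-run variance shared by both models, so small but consistent gains are easier to detect than with an unpaired test.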

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper introduces an empirical two-stage architecture (CIGM using Conditional GANs to generate conditioning images, then integrated into GeCo for top-bottom retrieval) and reports performance gains on three datasets versus baselines. No equations, parameter-fitting steps, or derivation chain appear in the abstract or described contributions. The central claim is an external empirical comparison rather than any internal reduction to fitted inputs or self-citations, rendering the argument self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

Based on abstract only; the approach rests on the domain assumption that cGANs can produce usable complementary fashion images and on standard GAN training procedures whose hyperparameters are not enumerated.

free parameters (1)
  • cGAN training hyperparameters and loss weights
    Standard in conditional GAN models; values are fitted during training but not reported in abstract.
axioms (1)
  • domain assumption: Conditional GANs conditioned on fashion images can generate images of compatible items at sufficient quality to aid retrieval
    Invoked as the justification for the CIGM stage.
invented entities (2)
  • GeCo (no independent evidence)
    purpose: Two-stage compatibility model that consumes generated images
    Newly proposed model name and architecture.
  • CIGM (no independent evidence)
    purpose: cGAN component that performs the image-to-image translation
    Newly proposed component name.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 4 internal anchors

  1. [1]

    Martín Arjovsky and Léon Bottou. 2017. Towards Principled Methods for Training Generative Adversarial Networks. In ICLR. OpenReview.net

  2. [2]

    Martín Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein Generative Adversarial Networks. In ICML (Proceedings of Machine Learning Research, Vol. 70). PMLR, 214–223

  3. [3]

    Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, and Alberto Del Bimbo. 2023. Zero-Shot Composed Image Retrieval with Textual Inversion. In ICCV. IEEE, 15292–15301

  4. [4]

    Alberto Baldrati, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. 2022. Effective conditioned and composed image retrieval combining CLIP-based features. In CVPR. IEEE, 21434–21442

  5. [5]

    Alberto Baldrati, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. 2024. Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features. ACM Trans. Multim. Comput. Commun. Appl. 20, 3 (2024), 62:1–62:24

  6. [6]

    Adrien Berthelot, Eddy Caron, Mathilde Jay, and Laurent Lefèvre. 2024. Estimating the environmental impact of Generative-AI services using an LCA-based methodology. Procedia CIRP 122 (2024), 707–712

  7. [7]

    Koby Bibas, Oren Sar Shalom, and Dietmar Jannach. 2023. Semi-supervised Adversarial Learning for Complementary Item Recommendation. In WWW. ACM, 1804–1812

  8. [8]

    Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In SIGIR. ACM, 335–344

  9. [9]

    Wen Chen, Pipei Huang, Jiaming Xu, Xin Guo, Cheng Guo, Fei Sun, Chao Li, Andreas Pfadler, Huan Zhao, and Binqiang Zhao. 2019. POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion. In KDD. ACM, 2662–2670

  10. [10]

    Zeyu Cui, Zekun Li, Shu Wu, Xiaoyu Zhang, and Liang Wang. 2019. Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks. In WWW. ACM, 307–317

  11. [11]

    Yashar Deldjoo, Fatemeh Nazary, Arnau Ramisa, Julian J. McAuley, Giovanni Pellegrini, Alejandro Bellogín, and Tommaso Di Noia. 2024. A Review of Modern Fashion Recommender Systems. ACM Comput. Surv. 56, 4 (2024), 87:1–87:37

  12. [12]

    Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, and Felice Antonio Merra. 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967

  13. [13]

    Prafulla Dhariwal and Alexander Quinn Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In NeurIPS. 8780–8794

  14. [14]

    Mohamed El-Kaddoury, Abdelhak Mahmoudi, and Mohamed Majid Himmi. 2019. Deep Generative Models for Image Generation: A Practical Comparison Between Variational Autoencoders and Generative Adversarial Networks. In MSPN. Springer

  15. [15]

    Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 8 (2006), 861–874

  16. [16]

    Chun-Mei Feng, Yang Bai, Tao Luo, Zhen Li, Salman Khan, Wangmeng Zuo, Xinxing Xu, Rick Siow Mong Goh, and Yong Liu. 2023. VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering. CoRR abs/2312.12273 (2023)

  17. [17]

    Zhangchi Feng, Richong Zhang, and Zhijie Nie. 2024. Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. arXiv preprint arXiv:2404.11317 (2024)

  18. [18]

    Ian J. Goodfellow. 2017. NIPS 2016 Tutorial: Generative Adversarial Networks. CoRR abs/1701.00160 (2017)

  19. [19]

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. CoRR abs/1406.2661 (2014)

  20. [20]

    Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved Training of Wasserstein GANs. In NIPS. 5767–5777

  21. [21]

    Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun, and Wei Wang. 2020. P-Companion: A Principled Framework for Diversified Complementary Product Recommendation. In CIKM. ACM, 2517–2524

  22. [22]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. IEEE Computer Society, 770–778

  23. [23]

    Ruining He and Julian J. McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI. AAAI Press, 144–150

  24. [24]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. In NeurIPS

  25. [25]

    David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, and Mehdi Noroozi. 2022. Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives. In AAAI. AAAI Press, 897–905

  26. [26]

    Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR. IEEE Computer Society, 5967–5976

  27. [27]

    Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian J. McAuley. 2017. Visually-Aware Fashion Recommendation and Design with Generative Image Models. In ICDM. IEEE Computer Society, 207–216

  28. [28]

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. 2022. Elucidating the Design Space of Diffusion-Based Generative Models. In NeurIPS

  29. [29]

    Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In ICLR

  30. [30]

    Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37

  31. [31]

    Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew P. Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. 2017. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In CVPR. IEEE Computer Society, 105–114

  32. [32]

    Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng Chua. 2020. Hierarchical Fashion Graph Network for Personalized Outfit Recommendation. In SIGIR. ACM, 159–168

  33. [33]

    Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2019. Improving Outfit Recommendation with Co-supervision of Fashion Generation. In WWW. ACM, 1095–1105

  34. [34]

    Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2020. Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation. IEEE TKDE 32, 8 (2020), 1502–1516

  35. [35]

    Jinhuan Liu, Xuemeng Song, Zhumin Chen, and Jun Ma. 2020. MGCM: Multi-modal generative compatibility modeling for clothing matching. Neurocomputing 414 (2020), 215–224

  36. [36]

    Jinhuan Liu, Xuemeng Song, Zhaochun Ren, Liqiang Nie, Zhaopeng Tu, and Jun Ma. 2020. Auxiliary Template-Enhanced Generative Compatibility Modeling. In IJCAI. ijcai.org, 3508–3514

  37. [37]

    Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. 2022. Pseudo Numerical Methods for Diffusion Models on Manifolds. In ICLR. OpenReview.net

  38. [38]

    Qiang Liu, Shu Wu, and Liang Wang. 2017. DeepStyle: Learning User Preferences for Visual Recommendation. In SIGIR. ACM, 841–844

  39. [39]

    Zheyuan Liu, Cristian Rodriguez Opazo, Damien Teney, and Stephen Gould. 2021. Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models. In ICCV. IEEE, 2105–2114

  40. [40]

    Lars M. Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which Training Methods for GANs do actually Converge? In ICML (Proceedings of Machine Learning Research, Vol. 80). PMLR, 3478–3487

  41. [41]

    Mehdi Mirza and Simon Osindero. 2014. Conditional Generative Adversarial Nets. CoRR abs/1411.1784 (2014)

  42. [42]

    Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In ICLR. OpenReview.net

  43. [43]

    Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic Image Synthesis With Spatially-Adaptive Normalization. In CVPR. Computer Vision Foundation / IEEE, 2337–2346

  44. [44]

    Razvan Pascanu, Tomás Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In ICML (3) (JMLR Workshop and Conference Proceedings, Vol. 28). JMLR.org, 1310–1318

  45. [45]

    Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461

  46. [46]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In CVPR. IEEE, 10674–10685

  47. [47]

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI (3) (Lecture Notes in Computer Science, Vol. 9351). Springer, 234–241

  48. [48]

    Rohan Sarkar, Navaneeth Bodla, Mariya I. Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, and Gerard Medioni. OutfitTransformer: Learning Outfit Representations for Fashion Recommendation. In WACV. IEEE, 3590–3598

  50. [50]

    Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising Diffusion Implicit Models. In ICLR. OpenReview.net

  51. [51]

    Xuemeng Song, Fuli Feng, Jinhuan Liu, Zekun Li, Liqiang Nie, and Jun Ma. 2017. NeuroStylist: Neural Compatibility Modeling for Clothing Matching. In ACM Multimedia. ACM, 753–761

  52. [52]

    Xuemeng Song, Xianjing Han, Yunkai Li, Jingyuan Chen, Xin-Shun Xu, and Liqiang Nie. 2019. GP-BPR: Personalized Compatibility Modeling for Clothing Matching. In ACM Multimedia. ACM, 320–328

  53. [53]

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-Based Generative Modeling through Stochastic Differential Equations. In ICLR. OpenReview.net

  54. [54]

    Hoang Thanh-Tung and Truyen Tran. 2020. Catastrophic forgetting and mode collapse in GANs. In IJCNN

  55. [55]

    Yuxin Tian, Shawn D. Newsam, and Kofi Boakye. 2023. Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning. In WACV. IEEE, 1011–1021

  56. [56]

    Aäron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. CoRR abs/1807.03748 (2018)

  57. [57]

    Feng Wang and Huaping Liu. 2021. Understanding the Behaviour of Contrastive Loss. In CVPR. Computer Vision Foundation / IEEE, 2495–2504

  58. [58]

    Jianfeng Wang, Xiaochun Cheng, Ruomei Wang, and Shaohui Liu. 2021. Learning Outfit Compatibility with Graph Attention Network and Visual-Semantic Embedding. In ICME. IEEE, 1–6

  59. [59]

    Jui-Chieh Wu, José Antonio Sánchez Rodríguez, and Humberto Jesús Corona Pampín. 2019. Session-based Complementary Fashion Recommendations. CoRR abs/1908.08327 (2019)

  60. [60]

    Huijing Zhan and Jie Lin. 2021. PAN: Personalized Attention Network For Outfit Recommendation. In 2021 IEEE International Conference on Image Processing, ICIP 2021. IEEE, 2663–2667

  61. [61]

    Han Zhang, Tao Xu, and Hongsheng Li. 2017. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In ICCV. IEEE Computer Society, 5908–5916

  62. [62]

    Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In ICCV. IEEE Computer Society, 2242–2251