Fashion Image-to-Image Translation for Complementary Item Retrieval

Claudio Pomo; Dietmar Jannach; Matteo Attimonelli; Tommaso Di Noia

arxiv: 2408.09847 · v3 · pith:AZDWP34Unew · submitted 2024-08-19 · 💻 cs.IR

Pith reviewed 2026-05-23 22:20 UTC · model grok-4.3

classification 💻 cs.IR

keywords: fashion retrieval, image-to-image translation, conditional GAN, compatibility modeling, complementary items, top-bottom retrieval, composed image retrieval, generative models

The pith

A two-stage model generates complementary fashion images with conditional GANs and feeds them into retrieval to raise top-bottom matching accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GeCo to improve fashion compatibility modeling. Its first stage, the Complementary Item Generation Model (CIGM), translates a seed item image into a compatible target item image, and these synthetic images then act as extra conditioning inputs inside the retrieval stage. The authors argue that earlier generative methods often overlooked the quality of the images they generated, which could be crucial for performance, and that generative models typically need large datasets, which is a problem when such data is scarce. GeCo is presented as addressing both issues through paired image-to-image translation. Experiments on three datasets, including a newly released Fashion Taobao collection, are reported to show higher accuracy than Bayesian ranking baselines and prior generative approaches. The work matters for online retail because better automatic matching can reduce the need for large labeled datasets while still producing usable recommendations.

Core claim

The central claim is that the Generative Compatibility Model (GeCo) improves fashion item retrieval by first using the Complementary Item Generation Model (CIGM), a conditional GAN performing paired image-to-image translation, to produce target-item images from seed items and then incorporating those generated images as conditioning signals inside the compatibility scoring step of composed image retrieval.

What carries the argument

The Complementary Item Generation Model (CIGM), a conditional GAN that performs paired image-to-image translation to create complementary-item images used as conditioning signals for retrieval.
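The two-stage flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the inputs are assumed to be precomputed feature vectors, and `rank_candidates`, the cosine scoring, and the weighted-sum fusion with `alpha` are hypothetical stand-ins, since this summary does not say how GeCo combines the seed image and the generated template.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(top_vec, template_vec, candidate_vecs, alpha=0.5):
    """Rank candidate bottoms for one seed top.

    top_vec        features of the seed top (hypothetical encoder output)
    template_vec   features of the bottom generated in stage one (hypothetical)
    candidate_vecs features of the real candidate bottoms
    alpha          assumed fusion weight between the two conditioning signals
    """
    scores = [
        alpha * cosine(top_vec, c) + (1.0 - alpha) * cosine(template_vec, c)
        for c in candidate_vecs
    ]
    # Indices of candidates sorted from most to least compatible.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    order, scores = rank_candidates([1.0, 0.0], [0.0, 1.0],
                                    [[1.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])
    print(order, [round(s, 3) for s in scores])
```

The point of the sketch is only the data flow: the generated template enters scoring as a second query signal next to the seed item, so a poor template degrades the score directly.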

If this is right

The GeCo model outperforms state-of-the-art baselines on three top-bottom retrieval datasets.
Paired image-to-image translation inside the composed image retrieval framework supplies effective conditioning signals.
The approach mitigates the need for very large training sets that typical generative models require.
Release of the Fashion Taobao dataset provides a new benchmark for top-bottom compatibility research.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-stage pattern of generating conditioning images before retrieval could be tested on non-fashion item pairing tasks such as furniture or accessory matching.
If generation quality fluctuates across items, an explicit quality filter or uncertainty estimate on the synthetic images might further stabilize results.
Extending the method from pairs to sets of three or more mutually compatible items would be a direct next measurement of the same conditioning mechanism.

Load-bearing premise

The images produced by the CIGM component are high-quality enough to supply useful conditioning signals that raise rather than lower retrieval performance.

What would settle it

Retraining the retrieval stage on the same three datasets once with and once without the CIGM-generated images and observing no gain or a drop in accuracy metrics would falsify the claim.

Figures

Figures reproduced from arXiv: 2408.09847 by Claudio Pomo, Dietmar Jannach, Matteo Attimonelli, Tommaso Di Noia.

**Figure 1.** In the proposed architecture, the CIGM model generates bottom templates. Subsequently, the GeCo model leverages the top, the generated template, and the candidate bottom images to evaluate their compatibility. This approach facilitates both compatibility modeling and complementary item retrieval tasks. In the remainder of this paper, we review related work in fashion image retrieval and generative models. …

**Figure 2.** An example of generated images from [14] illustrates the differences in generation quality: (a) presents images generated by a VAE, while (b) showcases images sampled from a GAN. … Training of GANs can be formulated as a min-max game with the objective function, shown in Equation (1), derived from the Jensen-Shannon (JS) divergence, where $p_{data}(\mathbf{x})$ represents the data distribution, and $p_{\mathbf{z}}(\mathbf{z})$ represents the …
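For orientation, the min-max objective that the excerpt above refers to is usually written as follows. This is the standard formulation from Goodfellow et al. [19]; the paper's own Equation (1) is not reproduced on this page, so its exact notation is an assumption here.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})} \left[ \log D(\mathbf{x}) \right]
  + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} \left[ \log \left( 1 - D(G(\mathbf{z})) \right) \right]
```

Here $D$ is the discriminator and $G$ the generator; for the optimal discriminator, minimizing over $G$ is equivalent to minimizing the JS divergence between the data distribution and the generator distribution, up to constants.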

**Figure 3.** Pix2Pix original generator [26]. … distributions over the sets of tops $T$ and bottoms $B$. Our approach first involves learning a mapping between the probability distributions of tops $P_{data}(T)$ and bottoms $P_{data}(B)$ using the CIGM model. Then we use this mapping to generate meaningful templates given a top $t \in T$. Ideally, the CIGM model should learn to generate samples $b \in B$. However, in practice, CIGM learns …

**Figure 4.** Complementary Item Generation Model. … the capture of high-frequency details and textures with greater effectiveness. This design allows the discriminator to identify which specific areas of the generated image should be improved to deceive the discriminator effectively. As foreshadowed in the previous section, in order to overcome the mode collapse phenomenon [40, 53] and to produce more realistic template…

**Figure 5.** Top: conditioning tops. Middle: ground-truth bottoms. Bottom: generated bottoms with the proposed …

**Figure 6.** The complete two-stage architecture, highlighting the Generative Compatibility Model (GeCo) and describing the overall structure. … While the BPR loss focuses on optimizing pairwise rankings by ensuring preferred items are ranked higher than non-preferred items, the InfoNCE loss operates in a self-supervised manner, aiming to bring positive sample pairs closer together in the latent space while pushing negat…
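The two losses contrasted in the excerpt above can be illustrated with their textbook forms. This is a sketch, assuming scalar compatibility scores and similarities as inputs; the function names, the `temperature` default, and how the paper weights or combines the two terms are assumptions not taken from the source.

```python
import math


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (positive, negative) pair: -log(sigmoid(pos - neg)).

    Written as softplus(neg - pos) in a numerically stable form.
    """
    diff = pos_score - neg_score
    return math.log1p(math.exp(-abs(diff))) + max(-diff, 0.0)


def info_nce_loss(pos_sim, neg_sims, temperature=0.1):
    """InfoNCE loss for one anchor: cross-entropy of picking the positive
    among the positive plus all negatives, with similarities scaled by 1/temperature."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # Ranking the positive above the negative gives a smaller BPR loss.
    print(bpr_loss(2.0, 0.5), bpr_loss(0.5, 2.0))
    # A positive that is closer than the negatives gives a smaller InfoNCE loss.
    print(info_nce_loss(0.9, [0.1, 0.2]), info_nce_loss(0.1, [0.9, 0.2]))
```

BPR compares one positive against one negative at a time, whereas InfoNCE normalizes over all negatives in the batch, which is why the temperature matters for the latter only.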

**Figure 7.** Distribution of pairs across all datasets.

**Figure 8.** Scatter plots illustrating variations in terms of AUC and MRR in response to adjustments of the loss …
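The two metrics named in this caption can be computed per query as sketched below. This assumes one ground-truth bottom per top scored against a list of negative candidates; the paper's exact evaluation protocol (number of negatives, tie handling) is not given on this page, and the helper names are hypothetical.

```python
def pairwise_auc(pos_score, neg_scores):
    """Fraction of negatives scored below the positive (ties count as half)."""
    wins = sum(
        1.0 if pos_score > s else 0.5 if pos_score == s else 0.0
        for s in neg_scores
    )
    return wins / len(neg_scores)


def reciprocal_rank(pos_score, neg_scores):
    """1 / rank of the positive, where rank 1 means no negative scores higher."""
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return 1.0 / rank


def mean_metric(metric, queries):
    """Average a per-query metric over (pos_score, neg_scores) pairs."""
    return sum(metric(pos, negs) for pos, negs in queries) / len(queries)


if __name__ == "__main__":
    queries = [(0.9, [0.1, 0.5]), (0.3, [0.6, 0.2])]
    print("AUC:", mean_metric(pairwise_auc, queries))
    print("MRR:", mean_metric(reciprocal_rank, queries))
```

AUC rewards every negative pushed below the positive, while MRR only cares about how close the positive is to the top of the list, so the two can move differently when loss weights change.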

**Figure 9.** The templates generated by the CIGM, MGCM, and Pix2PixCM models, given the same top image, highlighting the superior quality of our templates. The images are taken from the FashionTaobaoTB dataset. We note that the templates from the baseline models (MGCM and Pix2PixCM) are scaled to match the higher resolution of our templates. … three top image inputs. Evidently, the templates produced by CIGM exhibit sign…

**Figure 10.** Retrieval performance of the GeCo model on the FashionTaobaoTB dataset, compared to various baseline models, all using the same input top. It can be observed that our model exhibits superior retrieval performance and generates more realistic templates with higher resolution. The first row displays the input top, while the second row shows the bottom template generated by the corresponding model in each co…

Original abstract

The increasing demand for online fashion retail has boosted research in fashion compatibility modeling and item retrieval, focusing on matching user queries (textual descriptions or reference images) with compatible fashion items. A key challenge is top-bottom retrieval, where precise compatibility modeling is essential. Traditional methods, often based on Bayesian Personalized Ranking (BPR), have shown limited performance. Recent efforts have explored using generative models in compatibility modeling and item retrieval, where generated images serve as additional inputs. However, these approaches often overlook the quality of generated images, which could be crucial for model performance. Additionally, generative models typically require large datasets, posing challenges when such data is scarce. To address these issues, we introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation. First, the Complementary Item Generation Model (CIGM), built on Conditional Generative Adversarial Networks (GANs), generates target item images (e.g., bottoms) from seed items (e.g., tops), offering conditioning signals for retrieval. These generated samples are then integrated into GeCo, enhancing compatibility modeling and retrieval accuracy. Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines. Key contributions include: (i) the GeCo model utilizing paired image-to-image translation within the Composed Image Retrieval framework, (ii) comprehensive evaluations on benchmark datasets, and (iii) the release of a new Fashion Taobao dataset designed for top-bottom retrieval, promoting further research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GeCo slots a standard cGAN image-to-image step into a two-stage fashion retrieval pipeline and ships a new top-bottom dataset; the empirical gains need the full numbers to judge.

The paper's core move is to generate a complementary bottom image from a top via conditional GAN, then feed the generated image as extra conditioning into a retrieval model. This is packaged as GeCo and tested on three datasets, with a new Fashion Taobao collection released for the task. The abstract flags that earlier generative work skipped checking whether the synthetic images actually helped, and the two-stage design is meant to fix that. That framing is reasonable and the dataset release is a concrete plus for anyone working on this narrow slice of e-commerce retrieval. The approach itself is an application of existing cGAN techniques rather than a new framework, so the novelty sits in the integration and the data contribution. On the evidence side, the abstract asserts outperformance but gives no baselines, metrics, significance tests, or ablation on the generated-image quality, which leaves the size of the lift unclear. If the full paper shows clean ablations and reasonable controls on the GAN outputs, that would strengthen the case; if the gains are small or sensitive to hyper-parameters, the contribution shrinks. No obvious internal contradictions jump out from the description, and the circularity burden is low since the claim is empirical. This is useful reading for people already focused on fashion compatibility modeling or composed image retrieval. It is not broad enough to interest a general IR or generative-model audience. I would send it to review so the experimental details can be checked properly rather than desk-rejecting it outright.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Generative Compatibility Model (GeCo), a two-stage approach for fashion complementary item retrieval. The first stage, Complementary Item Generation Model (CIGM), employs Conditional Generative Adversarial Networks (cGANs) to perform paired image-to-image translation, generating images of complementary items (e.g., bottoms from tops). These generated images serve as conditioning signals in the second stage for improved compatibility modeling and retrieval. The paper reports that GeCo outperforms state-of-the-art baselines on three datasets and releases a new Fashion Taobao dataset for top-bottom retrieval.

Significance. If the empirical claims hold with proper controls, the work is significant for highlighting the importance of generated image quality in generative approaches to compatibility modeling, which prior work overlooked. The release of the new dataset is a positive contribution that could facilitate further research in the field. The two-stage design directly targets the identified limitation in existing methods.

major comments (2)

[Experiments] Experiments section: the central claim that GeCo outperforms baselines via CIGM-generated conditioning signals requires an ablation isolating the contribution of the generated images (e.g., retrieval performance with vs. without CIGM outputs, or with real vs. generated conditioning). Without this, it is impossible to confirm that the generated samples supply high-quality signals rather than noise, which is the load-bearing assumption flagged in the abstract.
[Evaluation protocol] Evaluation protocol (likely §4 or §5): the abstract asserts outperformance on three datasets but the manuscript must report exact baselines, metrics (e.g., Recall@K, NDCG), data splits, and statistical significance tests; absence of these details prevents verification of the empirical superiority claim.
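The metrics the referee names as examples can be made concrete with a short sketch. It assumes binary relevance and a ranked list of item identifiers per query; the function names and the item ids in the usage example are illustrative and not taken from the paper.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of the relevant items that appear in the top-k of the ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG@k: discounted gain of the hits in the top-k,
    normalized by the gain of an ideal ranking."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked_ids[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


if __name__ == "__main__":
    ranking = ["b3", "b1", "b7"]   # hypothetical candidate bottoms, best first
    ground_truth = ["b1"]          # hypothetical compatible bottom
    print(recall_at_k(ranking, ground_truth, k=2))
    print(ndcg_at_k(ranking, ground_truth, k=2))
```

Reporting both is useful because Recall@K ignores where in the top-K the hit lands, while NDCG@K penalizes hits that appear lower in the list.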

minor comments (2)

[Abstract] Abstract: the description of the new Fashion Taobao dataset should include basic statistics (number of pairs, train/test split sizes) to allow immediate assessment of its scale and utility.
[Model description] Notation: the distinction between CIGM and GeCo could be clarified with a single diagram or explicit statement of how the generated image is fed into the compatibility scorer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the empirical validation.

Referee: [Experiments] Experiments section: the central claim that GeCo outperforms baselines via CIGM-generated conditioning signals requires an ablation isolating the contribution of the generated images (e.g., retrieval performance with vs. without CIGM outputs, or with real vs. generated conditioning). Without this, it is impossible to confirm that the generated samples supply high-quality signals rather than noise, which is the load-bearing assumption flagged in the abstract.

Authors: We agree that an explicit ablation isolating the CIGM contribution is required to substantiate the central claim. The current two-stage design assumes the generated images provide useful conditioning, but without direct comparison the source of gains remains unclear. In the revision we will add ablation results comparing retrieval performance with vs. without CIGM outputs and, where feasible, real vs. generated conditioning signals on the three datasets.

revision: yes

Referee: [Evaluation protocol] Evaluation protocol (likely §4 or §5): the abstract asserts outperformance on three datasets but the manuscript must report exact baselines, metrics (e.g., Recall@K, NDCG), data splits, and statistical significance tests; absence of these details prevents verification of the empirical superiority claim.

Authors: We acknowledge that the evaluation details must be reported with full precision to allow verification. The revised manuscript will explicitly enumerate all baselines, list the complete set of metrics (including Recall@K and any NDCG), detail the train/validation/test splits for each of the three datasets, and add statistical significance tests (e.g., paired t-tests across runs) supporting the reported improvements.

revision: yes
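A paired t-test across runs, as mentioned in the response above, reduces to a statistic on per-run differences. The sketch below computes only the t statistic and degrees of freedom using the standard library; the score lists in the usage example are made-up placeholders, not results from the paper, and turning the statistic into a p-value would need a t-distribution table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for matched runs: mean difference divided by its standard error.

    Returns (t, degrees_of_freedom). Requires at least two pairs and
    non-identical differences (otherwise the standard error is zero).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need the same number of runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired runs")
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std (n - 1 denominator)
    if std_err == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / std_err, n - 1


if __name__ == "__main__":
    with_templates = [0.80, 0.82, 0.79]     # placeholder metric values per run
    without_templates = [0.70, 0.75, 0.72]  # placeholder metric values per run
    t, dof = paired_t_statistic(with_templates, without_templates)
    print(f"t = {t:.3f} with {dof} degrees of freedom")
```

Pairing by run (same seed and split for both systems) removes run-to-run variance from the comparison, which is why it suits the with-versus-without-CIGM ablation the referee asks for.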

Circularity Check

0 steps flagged

No significant circularity identified

The paper introduces an empirical two-stage architecture (CIGM using Conditional GANs to generate conditioning images, then integrated into GeCo for top-bottom retrieval) and reports performance gains on three datasets versus baselines. No equations, parameter-fitting steps, or derivation chain appear in the abstract or described contributions. The central claim is an external empirical comparison rather than any internal reduction to fitted inputs or self-citations, rendering the argument self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

Based on abstract only; the approach rests on the domain assumption that cGANs can produce usable complementary fashion images and on standard GAN training procedures whose hyperparameters are not enumerated.

free parameters (1)

cGAN training hyperparameters and loss weights
Standard in conditional GAN models; values are tuned for training but not reported in the abstract.

axioms (1)

domain assumption: Conditional GANs conditioned on fashion images can generate images of compatible items at sufficient quality to aid retrieval.
Invoked as the justification for the CIGM stage.

invented entities (2)

GeCo (no independent evidence)
purpose: Two-stage compatibility model that consumes generated images
Newly proposed model name and architecture.
CIGM (no independent evidence)
purpose: cGAN component that performs the image-to-image translation
Newly proposed component name.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 4 internal anchors

[1] Martín Arjovsky and Léon Bottou. 2017. Towards Principled Methods for Training Generative Adversarial Networks. In ICLR. OpenReview.net.
[2] Martín Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein Generative Adversarial Networks. In ICML (Proceedings of Machine Learning Research, Vol. 70). PMLR, 214–223.
[3] Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, and Alberto Del Bimbo. 2023. Zero-Shot Composed Image Retrieval with Textual Inversion. In ICCV. IEEE, 15292–15301.
[4] Alberto Baldrati, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. 2022. Effective conditioned and composed image retrieval combining CLIP-based features. In CVPR. IEEE, 21434–21442.
[5] Alberto Baldrati, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. 2024. Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features. ACM Trans. Multim. Comput. Commun. Appl. 20, 3 (2024), 62:1–62:24.
[6] Adrien Berthelot, Eddy Caron, Mathilde Jay, and Laurent Lefèvre. 2024. Estimating the environmental impact of Generative-AI services using an LCA-based methodology. Procedia CIRP 122 (2024), 707–712.
[7] Koby Bibas, Oren Sar Shalom, and Dietmar Jannach. 2023. Semi-supervised Adversarial Learning for Complementary Item Recommendation. In WWW. ACM, 1804–1812.
[8] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In SIGIR. ACM, 335–344.
[9] Wen Chen, Pipei Huang, Jiaming Xu, Xin Guo, Cheng Guo, Fei Sun, Chao Li, Andreas Pfadler, Huan Zhao, and Binqiang Zhao. 2019. POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion. In KDD. ACM, 2662–2670.
[10] Zeyu Cui, Zekun Li, Shu Wu, Xiaoyu Zhang, and Liang Wang. 2019. Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks. In WWW. ACM, 307–317.
[11] Yashar Deldjoo, Fatemeh Nazary, Arnau Ramisa, Julian J. McAuley, Giovanni Pellegrini, Alejandro Bellogín, and Tommaso Di Noia. 2024. A Review of Modern Fashion Recommender Systems. ACM Comput. Surv. 56, 4 (2024), 87:1–87:37.
[12] Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, and Felice Antonio Merra. 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
[13] Prafulla Dhariwal and Alexander Quinn Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In NeurIPS. 8780–8794.
[14] Mohamed El-Kaddoury, Abdelhak Mahmoudi, and Mohamed Majid Himmi. 2019. Deep Generative Models for Image Generation: A Practical Comparison Between Variational Autoencoders and Generative Adversarial Networks. In MSPN. Springer.
[15] Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 8 (2006), 861–874.
[16] Chun-Mei Feng, Yang Bai, Tao Luo, Zhen Li, Salman Khan, Wangmeng Zuo, Xinxing Xu, Rick Siow Mong Goh, and Yong Liu. 2023. VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering. CoRR abs/2312.12273 (2023).
[17] Zhangchi Feng, Richong Zhang, and Zhijie Nie. 2024. Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. arXiv preprint arXiv:2404.11317 (2024).
[18] Ian J. Goodfellow. 2017. NIPS 2016 Tutorial: Generative Adversarial Networks. CoRR abs/1701.00160 (2017).
[19] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. CoRR abs/1406.2661 (2014).
[20] Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved Training of Wasserstein GANs. In NIPS. 5767–5777.
[21] Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun, and Wei Wang. 2020. P-Companion: A Principled Framework for Diversified Complementary Product Recommendation. In CIKM. ACM, 2517–2524.
[22] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. IEEE Computer Society, 770–778.
[23] Ruining He and Julian J. McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI. AAAI Press, 144–150.
[24] Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. In NeurIPS.
[25] David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, and Mehdi Noroozi. 2022. Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives. In AAAI. AAAI Press, 897–905.
[26] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR. IEEE Computer Society, 5967–5976.
[27] Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian J. McAuley. 2017. Visually-Aware Fashion Recommendation and Design with Generative Image Models. In ICDM. IEEE Computer Society, 207–216.
[28] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. 2022. Elucidating the Design Space of Diffusion-Based Generative Models. In NeurIPS.
[29] Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In ICLR.
[30] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
[31] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew P. Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. 2017. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In CVPR. IEEE Computer Society, 105–114.
[32] Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng Chua. 2020. Hierarchical Fashion Graph Network for Personalized Outfit Recommendation. In SIGIR. ACM, 159–168.
[33] Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2019. Improving Outfit Recommendation with Co-supervision of Fashion Generation. In WWW. ACM, 1095–1105.
[34] Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2020. Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation. IEEE TKDE 32, 8 (2020), 1502–1516.
[35] Jinhuan Liu, Xuemeng Song, Zhumin Chen, and Jun Ma. 2020. MGCM: Multi-modal generative compatibility modeling for clothing matching. Neurocomputing 414 (2020), 215–224.
[36] Jinhuan Liu, Xuemeng Song, Zhaochun Ren, Liqiang Nie, Zhaopeng Tu, and Jun Ma. 2020. Auxiliary Template-Enhanced Generative Compatibility Modeling. In IJCAI. ijcai.org, 3508–3514.
[37] Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. 2022. Pseudo Numerical Methods for Diffusion Models on Manifolds. In ICLR. OpenReview.net.
[38] Qiang Liu, Shu Wu, and Liang Wang. 2017. DeepStyle: Learning User Preferences for Visual Recommendation. In SIGIR. ACM, 841–844.
[39] Zheyuan Liu, Cristian Rodriguez Opazo, Damien Teney, and Stephen Gould. 2021. Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models. In ICCV. IEEE, 2105–2114.
[40] Lars M. Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which Training Methods for GANs do actually Converge? In ICML (Proceedings of Machine Learning Research, Vol. 80). PMLR, 3478–3487.
[41] Mehdi Mirza and Simon Osindero. 2014. Conditional Generative Adversarial Nets. CoRR abs/1411.1784 (2014).
[42] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In ICLR. OpenReview.net.
[43] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic Image Synthesis With Spatially-Adaptive Normalization. In CVPR. Computer Vision Foundation / IEEE, 2337–2346.
[44] Razvan Pascanu, Tomás Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In ICML (3) (JMLR Workshop and Conference Proceedings, Vol. 28). JMLR.org, 1310–1318.
[45] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461.
[46] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In CVPR. IEEE, 10674–10685.
[47] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI (3) (Lecture Notes in Computer Science, Vol. 9351). Springer, 234–241.
[48] Rohan Sarkar, Navaneeth Bodla, Mariya I. Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, and Gerard Medioni. OutfitTransformer: Learning Outfit Representations for Fashion Recommendation. In WACV. IEEE, 3590–3598.
[49] Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising Diffusion Implicit Models. In ICLR. OpenReview.net.
[50] Xuemeng Song, Fuli Feng, Jinhuan Liu, Zekun Li, Liqiang Nie, and Jun Ma. 2017. NeuroStylist: Neural Compatibility Modeling for Clothing Matching. In ACM Multimedia. ACM, 753–761.
[51] Xuemeng Song, Xianjing Han, Yunkai Li, Jingyuan Chen, Xin-Shun Xu, and Liqiang Nie. 2019. GP-BPR: Personalized Compatibility Modeling for Clothing Matching. In ACM Multimedia. ACM, 320–328.
[52] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-Based Generative Modeling through Stochastic Differential Equations. In ICLR. OpenReview.net.
[53] Hoang Thanh-Tung and Truyen Tran. 2020. Catastrophic forgetting and mode collapse in GANs. In IJCNN.
[54] Yuxin Tian, Shawn D. Newsam, and Kofi Boakye. 2023. Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning. In WACV. IEEE, 1011–1021.
[55] Aäron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. CoRR abs/1807.03748 (2018).
[56] Feng Wang and Huaping Liu. 2021. Understanding the Behaviour of Contrastive Loss. In CVPR. Computer Vision Foundation / IEEE, 2495–2504.
[57] Jianfeng Wang, Xiaochun Cheng, Ruomei Wang, and Shaohui Liu. 2021. Learning Outfit Compatibility with Graph Attention Network and Visual-Semantic Embedding. In ICME. IEEE, 1–6.
[58] Jui-Chieh Wu, José Antonio Sánchez Rodríguez, and Humberto Jesús Corona Pampín. 2019. Session-based Complementary Fashion Recommendations. CoRR abs/1908.08327 (2019).
[59] Huijing Zhan and Jie Lin. 2021. PAN: Personalized Attention Network For Outfit Recommendation. In 2021 IEEE International Conference on Image Processing, ICIP 2021. IEEE, 2663–2667.
[60] Han Zhang, Tao Xu, and Hongsheng Li. 2017. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In ICCV. IEEE Computer Society, 5908–5916.
[61] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In ICCV. IEEE Computer Society, 2242–2251.


work page arXiv 2023

[17] [17]

Zhangchi Feng, Richong Zhang, and Zhijie Nie. 2024. Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. arXiv preprint arXiv:2404.11317 (2024)

work page arXiv 2024

[18] [18]

NIPS 2016 Tutorial: Generative Adversarial Networks

Ian J. Goodfellow. 2017. NIPS 2016 Tutorial: Generative Adversarial Networks. CoRR abs/1701.00160 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

Generative Adversarial Networks

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. CoRR abs/1406.2661 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[20] [20]

Courville

Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved Training of Wasserstein GANs. In NIPS. 5767–5777

work page 2017

[21] [21]

Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun, and Wei Wang. 2020. P-Companion: A Principled Framework for Diversified Complementary Product Recommendation. In CIKM. ACM, 2517–2524

work page 2020

[22] [22]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. IEEE Computer Society, 770–778

work page 2016

[23] [23]

Ruining He and Julian J. McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI. AAAI Press, 144–150

work page 2016

[24] [24]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. In NeurIPS

work page 2020

[25] [25]

Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, and Mehdi Noroozi

David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, and Mehdi Noroozi. 2022. Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives. In AAAI. AAAI Press, 897–905

work page 2022

[26] [26]

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR. IEEE Computer Society, 5967–5976

work page 2017

[27] [27]

Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian J. McAuley. 2017. Visually-Aware Fashion Recommendation and Design with Generative Image Models. In ICDM. IEEE Computer Society, 207–216

work page 2017

[28] [28]

Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. 2022. Elucidating the Design Space of Diffusion-Based Generative Models. In NeurIPS

work page 2022

[29] [29]

Kingma and Max Welling

Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In ICLR

work page 2014

[30] [30]

Bell, and Chris Volinsky

Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37

work page 2009

[31] [31]

Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi

Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew P. Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. 2017. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In CVPR. IEEE Computer Society, 105–114

work page 2017

[32] [32]

Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng Chua. 2020. Hierarchical Fashion Graph Network for Personalized Outfit Recommendation. In SIGIR. ACM, 159–168

work page 2020

[33] [33]

Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2019. Improving Outfit Recom- mendation with Co-supervision of Fashion Generation. In WWW. ACM, 1095–1105

work page 2019

[34] [34]

Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke. 2020. Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation. IEEE TKDE 32, 8 (2020), 1502–1516

work page 2020

[35] [35]

Jinhuan Liu, Xuemeng Song, Zhumin Chen, and Jun Ma. 2020. MGCM: Multi-modal generative compatibility modeling for clothing matching. Neurocomputing 414 (2020), 215–224

work page 2020

[36] [36]

Jinhuan Liu, Xuemeng Song, Zhaochun Ren, Liqiang Nie, Zhaopeng Tu, and Jun Ma. 2020. Auxiliary Template-Enhanced Generative Compatibility Modeling. In IJCAI. ijcai.org, 3508–3514

work page 2020

[37] [37]

Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. 2022. Pseudo Numerical Methods for Diffusion Models on Manifolds. In ICLR. OpenReview.net

work page 2022

[38] [38]

Qiang Liu, Shu Wu, and Liang Wang. 2017. DeepStyle: Learning User Preferences for Visual Recommendation. In SIGIR. ACM, 841–844

work page 2017

[39] [39]

Zheyuan Liu, Cristian Rodriguez Opazo, Damien Teney, and Stephen Gould. 2021. Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models. In ICCV. IEEE, 2105–2114

work page 2021

[40] [40]

Mescheder, Andreas Geiger, and Sebastian Nowozin

Lars M. Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which Training Methods for GANs do actually Converge?. In ICML (Proceedings of Machine Learning Research, Vol. 80) . PMLR, 3478–3487

work page 2018

[41] [41]

Mehdi Mirza and Simon Osindero. 2014. Conditional Generative Adversarial Nets. CoRR abs/1411.1784 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[42] [42]

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In ICLR. OpenReview.net

work page 2018

[43] [43]

Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic Image Synthesis With Spatially- Adaptive Normalization. In CVPR. Computer Vision Foundation / IEEE, 2337–2346

work page 2019

[44] [44]

Razvan Pascanu, Tomás Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In ICML (3) (JMLR Workshop and Conference Proceedings, Vol. 28) . JMLR.org, 1310–1318

work page 2013

[45] [45]

Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461

work page 2009

[46] [46]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In CVPR. IEEE, 10674–10685. Fashion Image-to-Image Translation for Complementary Item Retrieval 23

work page 2022

[47] [47]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI (3) (Lecture Notes in Computer Science, Vol. 9351) . Springer, 234–241

work page 2015

[48] [48]

Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, and Gerard Medioni

Rohan Sarkar, Navaneeth Bodla, Mariya I. Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, and Gerard Medioni

work page

[49] [49]

In W ACV

OutfitTransformer: Learning Outfit Representations for Fashion Recommendation. In W ACV. IEEE, 3590–3598

work page

[50] [50]

Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising Diffusion Implicit Models. InICLR. OpenReview.net

work page 2021

[51] [51]

Xuemeng Song, Fuli Feng, Jinhuan Liu, Zekun Li, Liqiang Nie, and Jun Ma. 2017. NeuroStylist: Neural Compatibility Modeling for Clothing Matching. In ACM Multimedia. ACM, 753–761

work page 2017

[52] [52]

Xuemeng Song, Xianjing Han, Yunkai Li, Jingyuan Chen, Xin-Shun Xu, and Liqiang Nie. 2019. GP-BPR: Personalized Compatibility Modeling for Clothing Matching. In ACM Multimedia. ACM, 320–328

work page 2019

[53] [53]

Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score- Based Generative Modeling through Stochastic Differential Equations. In ICLR. OpenReview.net

work page 2021

[54] [54]

Hoang Thanh-Tung and Truyen Tran. 2020. Catastrophic forgetting and mode collapse in GANs. In IJCNN

work page 2020

[55] [55]

Newsam, and Kofi Boakye

Yuxin Tian, Shawn D. Newsam, and Kofi Boakye. 2023. Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning. In W ACV. IEEE, 1011–1021

work page 2023

[56] [56]

Aäron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. CoRR abs/1807.03748 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[57] [57]

Feng Wang and Huaping Liu. 2021. Understanding the Behaviour of Contrastive Loss. In CVPR. Computer Vision Foundation / IEEE, 2495–2504

work page 2021

[58] [58]

Jianfeng Wang, Xiaochun Cheng, Ruomei Wang, and Shaohui Liu. 2021. Learning Outfit Compatibility with Graph Attention Network and Visual-Semantic Embedding. In ICME. IEEE, 1–6

work page 2021

[59] [59]

Jui-Chieh Wu, José Antonio Sánchez Rodríguez, and Humberto Jesús Corona Pampín. 2019. Session-based Comple- mentary Fashion Recommendations. CoRR abs/1908.08327 (2019)

work page arXiv 2019

[60] [60]

Huijing Zhan and Jie Lin. 2021. PAN: Personalized Attention Network For Outfit Recommendation. In 2021 IEEE International Conference on Image Processing, ICIP 2021 . IEEE, 2663–2667

work page 2021

[61] [61]

Han Zhang, Tao Xu, and Hongsheng Li. 2017. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In ICCV. IEEE Computer Society, 5908–5916

work page 2017

[62] [62]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In ICCV. IEEE Computer Society, 2242–2251

work page 2017