Fashion Image-to-Image Translation for Complementary Item Retrieval
Pith reviewed 2026-05-23 22:20 UTC · model grok-4.3
The pith
A two-stage model generates complementary fashion images with conditional GANs and feeds them into retrieval to raise top-bottom matching accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Generative Compatibility Model (GeCo) improves fashion item retrieval by first using the Complementary Item Generation Model (CIGM), a conditional GAN performing paired image-to-image translation, to produce target-item images from seed items and then incorporating those generated images as conditioning signals inside the compatibility scoring step of composed image retrieval.
What carries the argument
The Complementary Item Generation Model (CIGM), a conditional GAN that performs paired image-to-image translation to create complementary-item images used as conditioning signals for retrieval.
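One way to picture the two-stage flow is as a generate-then-rank pipeline. The sketch below is a minimal illustration, not the authors' implementation. It assumes the seed image, the CIGM-generated image, and every candidate item have already been encoded as vectors, and it fuses the two similarity signals with a hypothetical weight `alpha`. The function names, the cosine scorer, and the linear fusion are all assumptions made for illustration; the source does not specify how GeCo combines these signals.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(seed_emb, generated_emb, candidates, alpha=0.5):
    """Rank candidate items for one seed item.

    seed_emb:      embedding of the seed item (e.g. a top)
    generated_emb: embedding of the generated complementary image (assumed given)
    candidates:    dict mapping item id -> embedding
    alpha:         hypothetical weight on the generated-image signal
    """
    scored = []
    for item_id, emb in candidates.items():
        # Assumed fusion: weighted sum of seed similarity and generated-image similarity.
        score = (1 - alpha) * cosine(seed_emb, emb) + alpha * cosine(generated_emb, emb)
        scored.append((score, item_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]
```

Setting `alpha=0.0` in this sketch recovers a retrieval run that ignores the generated image, which is the comparison the ablation discussion later on the page asks for.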
If this is right
- The GeCo model outperforms state-of-the-art baselines on three top-bottom retrieval datasets.
- Paired image-to-image translation inside the composed image retrieval framework supplies effective conditioning signals.
- The approach mitigates the need for very large training sets that typical generative models require.
- Release of the Fashion Taobao dataset provides a new benchmark for top-bottom compatibility research.
Where Pith is reading between the lines
- The same two-stage pattern of generating conditioning images before retrieval could be tested on non-fashion item pairing tasks such as furniture or accessory matching.
- If generation quality fluctuates across items, an explicit quality filter or uncertainty estimate on the synthetic images might further stabilize results.
- Extending the method from pairs to sets of three or more mutually compatible items would be a direct next measurement of the same conditioning mechanism.
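The quality-filter idea in the second bullet could take the form of a simple gate on the fusion weight. This is a sketch under strong assumptions: `quality` is a hypothetical per-image score in [0, 1] (for example from a discriminator or a separate image-quality model), and the linear fusion with weight `alpha` is an illustrative choice, not something the paper describes.

```python
def gated_alpha(quality, threshold=0.5, alpha=0.5):
    """Return the fusion weight for the generated image; 0.0 disables it.

    quality:   hypothetical quality score of the generated image, in [0, 1]
    threshold: assumed cut-off below which the generated image is ignored
    """
    return alpha if quality >= threshold else 0.0


def fused_score(seed_score, generated_score, quality, threshold=0.5, alpha=0.5):
    """Combine two compatibility scores, falling back to the seed-only score
    when the generated image is judged too poor to trust."""
    weight = gated_alpha(quality, threshold, alpha)
    return (1 - weight) * seed_score + weight * generated_score
```

With this gate, a low-quality generation leaves the ranking equal to the seed-only ranking, so a poor image cannot lower retrieval performance in the sketch.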
Load-bearing premise
The images produced by the CIGM component are high-quality enough to supply useful conditioning signals that raise rather than lower retrieval performance.
What would settle it
Retraining the retrieval stage on the same three datasets once with and once without the CIGM-generated images and observing no gain or a drop in accuracy metrics would falsify the claim.
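This test amounts to comparing a retrieval metric across two runs. A minimal sketch follows, assuming rankings are stored as dictionaries from query id to an ordered list of item ids and using Recall@K as a stand-in for whichever accuracy metric the paper reports. The function names and data layout are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k of one ranking."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(rankings, relevant, k):
    """Average Recall@k over all queries; both arguments are dicts keyed by query id."""
    total = sum(recall_at_k(rankings[q], relevant[q], k) for q in relevant)
    return total / len(relevant)


def ablation_gain(with_generated, without_generated, relevant, k=10):
    """Mean Recall@k with the generated conditioning images minus the value without them."""
    return (mean_recall_at_k(with_generated, relevant, k)
            - mean_recall_at_k(without_generated, relevant, k))
```

A gain at or below zero on the three datasets would be the falsifying outcome. A positive gain would still need a significance test before it counts as support.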
Original abstract
The increasing demand for online fashion retail has boosted research in fashion compatibility modeling and item retrieval, focusing on matching user queries (textual descriptions or reference images) with compatible fashion items. A key challenge is top-bottom retrieval, where precise compatibility modeling is essential. Traditional methods, often based on Bayesian Personalized Ranking (BPR), have shown limited performance. Recent efforts have explored using generative models in compatibility modeling and item retrieval, where generated images serve as additional inputs. However, these approaches often overlook the quality of generated images, which could be crucial for model performance. Additionally, generative models typically require large datasets, posing challenges when such data is scarce. To address these issues, we introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation. First, the Complementary Item Generation Model (CIGM), built on Conditional Generative Adversarial Networks (GANs), generates target item images (e.g., bottoms) from seed items (e.g., tops), offering conditioning signals for retrieval. These generated samples are then integrated into GeCo, enhancing compatibility modeling and retrieval accuracy. Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines. Key contributions include: (i) the GeCo model utilizing paired image-to-image translation within the Composed Image Retrieval framework, (ii) comprehensive evaluations on benchmark datasets, and (iii) the release of a new Fashion Taobao dataset designed for top-bottom retrieval, promoting further research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Generative Compatibility Model (GeCo), a two-stage approach for fashion complementary item retrieval. The first stage, Complementary Item Generation Model (CIGM), employs Conditional Generative Adversarial Networks (cGANs) to perform paired image-to-image translation, generating images of complementary items (e.g., bottoms from tops). These generated images serve as conditioning signals in the second stage for improved compatibility modeling and retrieval. The paper reports that GeCo outperforms state-of-the-art baselines on three datasets and releases a new Fashion Taobao dataset for top-bottom retrieval.
Significance. If the empirical claims hold with proper controls, the work is significant for highlighting the importance of generated image quality in generative approaches to compatibility modeling, which prior work overlooked. The release of the new dataset is a positive contribution that could facilitate further research in the field. The two-stage design directly targets the identified limitation in existing methods.
major comments (2)
- [Experiments] Experiments section: the central claim that GeCo outperforms baselines via CIGM-generated conditioning signals requires an ablation isolating the contribution of the generated images (e.g., retrieval performance with vs. without CIGM outputs, or with real vs. generated conditioning). Without this, it is impossible to confirm that the generated samples supply high-quality signals rather than noise, which is the load-bearing assumption flagged in the abstract.
- [Evaluation protocol] Evaluation protocol (likely §4 or §5): the abstract asserts outperformance on three datasets but the manuscript must report exact baselines, metrics (e.g., Recall@K, NDCG), data splits, and statistical significance tests; absence of these details prevents verification of the empirical superiority claim.
minor comments (2)
- [Abstract] Abstract: the description of the new Fashion Taobao dataset should include basic statistics (number of pairs, train/test split sizes) to allow immediate assessment of its scale and utility.
- [Model description] Notation: the distinction between CIGM and GeCo could be clarified with a single diagram or explicit statement of how the generated image is fed into the compatibility scorer.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the empirical validation.
- Referee: [Experiments] Experiments section: the central claim that GeCo outperforms baselines via CIGM-generated conditioning signals requires an ablation isolating the contribution of the generated images (e.g., retrieval performance with vs. without CIGM outputs, or with real vs. generated conditioning). Without this, it is impossible to confirm that the generated samples supply high-quality signals rather than noise, which is the load-bearing assumption flagged in the abstract.
  Authors: We agree that an explicit ablation isolating the CIGM contribution is required to substantiate the central claim. The current two-stage design assumes the generated images provide useful conditioning, but without direct comparison the source of gains remains unclear. In the revision we will add ablation results comparing retrieval performance with vs. without CIGM outputs and, where feasible, real vs. generated conditioning signals on the three datasets.
  Revision: yes
- Referee: [Evaluation protocol] Evaluation protocol (likely §4 or §5): the abstract asserts outperformance on three datasets but the manuscript must report exact baselines, metrics (e.g., Recall@K, NDCG), data splits, and statistical significance tests; absence of these details prevents verification of the empirical superiority claim.
  Authors: We acknowledge that the evaluation details must be reported with full precision to allow verification. The revised manuscript will explicitly enumerate all baselines, list the complete set of metrics (including Recall@K and any NDCG), detail the train/validation/test splits for each of the three datasets, and add statistical significance tests (e.g., paired t-tests across runs) supporting the reported improvements.
  Revision: yes
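The ranking metric and the significance test named in this exchange can be sketched in a few lines. The block below is illustrative only. It assumes binary relevance for NDCG@K and metric values paired between GeCo and a baseline across runs; the function names are hypothetical, and the paper's actual protocol is not given in the source.

```python
import math
import statistics


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_ids[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two matched score lists.

    scores_a / scores_b: one metric value per run for each system, in the same order.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = statistics.stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_value = statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_value, len(diffs) - 1
```

The statistic would then be compared against a Student t distribution with the returned degrees of freedom. A library routine such as SciPy's `scipy.stats.ttest_rel` performs the same paired test and also returns the p-value.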
Circularity Check
No significant circularity identified
The paper introduces an empirical two-stage architecture (CIGM uses conditional GANs to generate conditioning images, which GeCo then uses for top-bottom retrieval) and reports performance gains over baselines on three datasets. No equations, parameter-fitting steps, or derivation chain appear in the abstract or the described contributions. The central claim rests on an empirical comparison against external baselines, not on quantities fitted from the model's own outputs or on self-citation, so no circular dependency arises.
Axiom & Free-Parameter Ledger
free parameters (1)
- cGAN training hyperparameters and loss weights
axioms (1)
- domain assumption: Conditional GANs conditioned on fashion images can generate images of compatible items at sufficient quality to aid retrieval
invented entities (2)
- GeCo: no independent evidence
- CIGM: no independent evidence