A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce

Abdul-Saboor Sheikh; Evgenii Koriagin; Reza Shirvany; Roland Vollgraf; Romain Guigoures; Urs Bergmann; Yuen King Ho

arxiv: 1907.09844 · v1 · pith:AUB5IGG6new · submitted 2019-07-23 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 17:33 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: size recommendation, fit prediction, deep learning, fashion e-commerce, collaborative filtering, embeddings, personalized recommendations, return reduction

The pith

A deep learning model learns customer and article embeddings to predict personalized size and fit from sparse interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning method for size and fit recommendations that combines content features with collaborative signals to address extreme sparsity in customer purchase data. It learns a global set of parameters for population-level patterns while using specific embeddings for customers and articles, plus mappings for additional attributes. The approach also handles multiple individuals or intents behind one account. On public datasets it improves over published results, and on proprietary ones it beats comparable methods including a Bayesian size recommender. If the model works as described, platforms could cut returns from wrong sizes and raise customer satisfaction.

Core claim

The central claim is that a neural network ingesting arbitrary customer and article attributes, learning entity-specific embeddings, and optimizing shared parameters on observed interactions can model size and fit preferences more accurately than prior collaborative filtering or Bayesian methods.

What carries the argument

A content-collaborative deep network that maps attributes into a latent space via learned embeddings and global parameters to derive fit predictions.
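As a rough illustration of the structure described above, the sketch below combines per-entity embeddings with attribute projections whose weights are shared across all customers and articles, then produces a fit probability. It is not the paper's architecture: the names (`customer_emb`, `W_customer_attr`, `fit_probability`), the latent dimension, the additive combination, and the single `tanh` layer are all assumptions chosen for brevity, and the weights are random rather than trained.

```python
import math
import random

random.seed(0)
DIM = 4  # latent dimensionality (hypothetical)

def rand_vec(n):
    return [random.gauss(0.0, 0.1) for _ in range(n)]

def rand_matrix(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Entity-specific parameters: one embedding per customer and per article.
customer_emb = {"c1": rand_vec(DIM), "c2": rand_vec(DIM)}
article_emb = {"a1": rand_vec(DIM), "a2": rand_vec(DIM)}

# Global parameters shared by every customer-article pair (assumed shapes).
W_customer_attr = rand_matrix(DIM, 2)  # projects 2 customer attributes
W_article_attr = rand_matrix(DIM, 3)   # projects 3 article attributes
w_out = rand_vec(2 * DIM)
b_out = 0.0

def fit_probability(customer_id, article_id, customer_attrs, article_attrs):
    """Map embeddings plus projected attributes to a probability of a good fit."""
    c = [e + p for e, p in zip(customer_emb[customer_id],
                               matvec(W_customer_attr, customer_attrs))]
    a = [e + p for e, p in zip(article_emb[article_id],
                               matvec(W_article_attr, article_attrs))]
    hidden = [math.tanh(x) for x in c + a]  # concatenated latent representation
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

p = fit_probability("c1", "a2", [0.5, -1.0], [1.0, 0.0, 0.3])
```

The point of the split is that a customer with few orders still gets a usable representation from the shared attribute projection, while the embedding absorbs whatever is specific to that account.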

If this is right

Higher accuracy on two public datasets than state-of-the-art published results.
Outperformance of other methods on proprietary expert-fit and purchase datasets.
Ability to model multiple intents per account through the embedding structure.
Direct reduction in size-related return costs for e-commerce platforms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding-plus-attribute structure could apply to other sparse recommendation settings such as furniture or electronics sizing.
Shared-account modeling might extend to group or family recommendations in non-fashion domains.
Adding real-time feedback loops from post-purchase surveys could further refine the latent space.

Load-bearing premise

That arbitrary customer and article attributes can be mapped into embeddings and global parameters to predict fit accurately from observed interactions without overfitting to sparse or biased data.

What would settle it

Application of the model to a held-out set of customer purchase records where its accuracy does not exceed that of a standard matrix factorization baseline or the cited Bayesian method.
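A minimal sketch of such a check, assuming binary fit labels and ROC AUC as the metric (the text above only says "accuracy", so the metric is an assumption). The helper names `roc_auc` and `beats_baseline` and the toy scores are hypothetical and not taken from the paper.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def beats_baseline(labels, model_scores, baseline_scores, margin=0.0):
    """True if the model's held-out AUC exceeds the baseline's by more than `margin`."""
    return roc_auc(labels, model_scores) > roc_auc(labels, baseline_scores) + margin

# Toy held-out records: 1 = article fit, 0 = it did not (made-up numbers).
held_out = [1, 0, 1, 0]
model = [0.9, 0.2, 0.7, 0.4]
baseline = [0.6, 0.5, 0.4, 0.3]

model_auc = roc_auc(held_out, model)        # 1.0
baseline_auc = roc_auc(held_out, baseline)  # 0.75
```

In practice the comparison would be repeated over several splits, since a single held-out set can favor either model by chance.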

Figures

Figures reproduced from arXiv: 1907.09844 by Abdul-Saboor Sheikh, Evgenii Koriagin, Reza Shirvany, Roland Vollgraf, Romain Guigoures, Urs Bergmann, Yuen King Ho.

**Figure 2.** The ROC curves for one of the best runs of …

Original abstract

Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size and fit relevant information from observed customer-article interactions. It further employs customer and article specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract claims a new content-collaborative deep learning model for size/fit prediction that beats SOTA on public and proprietary data, but supplies zero metrics, baselines, or validation details so the improvements cannot be checked.

The paper presents a deep learning model that combines customer and article attributes with learned embeddings to predict fit, while also modeling multiple intents per account through global parameters plus entity-specific variables. That combination is the main technical step beyond standard collaborative filtering for this domain, and it targets a clear practical pain point in fashion e-commerce where returns from bad size matches are expensive. The method is described as ingesting arbitrary attributes and optimizing a shared parameter set for population-level size abstractions, which is a reasonable way to address the extreme sparsity of purchase data. On that narrow point the work is straightforward and domain-appropriate.

The abstract states that the approach improves over published results on two public datasets and outperforms a recent Bayesian baseline on two proprietary ones (expert fit feedback and customer purchases). If the full paper actually shows those gains with proper controls, the contribution would be a usable system for this vertical. The soft spot is exactly what the stress-test note flags: purchase data is sparse and selection-biased, so without reported regularization, negative sampling strategy, or constraints on the latent space it is easy for the model to pick up account-level purchase habits instead of generalizable fit signals. The abstract gives no evidence on those controls, and the lack of any numbers, tables, or experimental setup means the central empirical claim cannot be evaluated.

This is the kind of paper that belongs in an applied recsys venue if the experiments hold up, but the current text is too thin to judge. I would not cite it on the basis of the abstract alone. A serious editor should send the full version to review rather than desk-reject, because the application area is commercially relevant and the modeling ideas are coherent even if the evidence is still missing.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a deep learning content-collaborative method for personalized size and fit recommendation in fashion e-commerce. It ingests arbitrary customer and article attributes, learns customer- and article-specific embeddings plus a global set of parameters from observed interactions, and claims to model multiple individuals or intents per account while alleviating sparsity. The method is reported to improve over published state-of-the-art results on two public datasets and to outperform comparable methods (including a recent Bayesian baseline) on two proprietary datasets, one with expert fit feedback and one with purchase records.

Significance. If the empirical gains prove robust, the hybrid embedding approach could meaningfully reduce size-related returns and improve customer satisfaction on e-commerce platforms. The ability to incorporate arbitrary attributes and to optimize population-level abstractions alongside entity-specific variables addresses a practical sparsity challenge. The work does not, however, supply the quantitative evidence needed to evaluate whether these gains reflect genuine size/fit modeling or dataset-specific correlations.

major comments (2)

[Abstract] Abstract: the central empirical claim—that the method improves over state-of-the-art on public datasets and outperforms baselines (including a Bayesian approach) on proprietary datasets—is stated without any metrics, baselines, error bars, statistical tests, or validation protocol, rendering the primary contribution unverifiable.
[Method] Method section (description of embedding and global-parameter training): no details are supplied on regularization, negative sampling, or latent-space constraints that would prevent the learned embeddings from capturing account-level purchase biases or expert-label correlations rather than generalizable size/fit signals; this assumption is load-bearing for the claim that the model learns “population-level abstractions.”

minor comments (1)

[Abstract] Abstract: the phrase “optimizes a global set of parameters” is repeated without clarifying whether these parameters are shared across all entities or include per-entity terms; a single clarifying sentence would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each major comment below and outline the revisions we will make to improve the manuscript's clarity and verifiability.

Referee: [Abstract] Abstract: the central empirical claim—that the method improves over state-of-the-art on public datasets and outperforms baselines (including a Bayesian approach) on proprietary datasets—is stated without any metrics, baselines, error bars, statistical tests, or validation protocol, rendering the primary contribution unverifiable.

Authors: We agree that the abstract would benefit from greater specificity to allow readers to immediately assess the claims. In the revised version we will incorporate concise quantitative results (e.g., relative improvements in AUC or accuracy), name the main baselines, and briefly reference the evaluation protocol and datasets while preserving the abstract's length and readability. revision: yes

Referee: [Method] Method section (description of embedding and global-parameter training): no details are supplied on regularization, negative sampling, or latent-space constraints that would prevent the learned embeddings from capturing account-level purchase biases or expert-label correlations rather than generalizable size/fit signals; this assumption is load-bearing for the claim that the model learns “population-level abstractions.”

Authors: We acknowledge that additional technical details would strengthen the exposition. The original manuscript describes the joint optimization of global parameters and entity embeddings, but we will expand the method section to explicitly cover the regularization terms, negative-sampling procedure, and any latent-space normalization or constraints used during training. These additions will clarify the mechanisms intended to promote generalizable size/fit signals over dataset-specific biases. revision: yes
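To make concrete what a regularization term of this kind could look like, here is a minimal sketch of one stochastic gradient step on a logistic loss with an L2 penalty on both embeddings. It is a generic textbook formulation, not the training procedure from the manuscript: the dot-product scorer, the function name `sgd_step`, and the `lr` and `l2` values are assumptions.

```python
import math

def sgd_step(cust, art, label, lr=0.1, l2=0.01):
    """One SGD update on logistic loss + (l2 / 2) * (|cust|^2 + |art|^2).

    Returns new (customer, article) vectors; the inputs are left unmodified.
    """
    logit = sum(c * a for c, a in zip(cust, art))
    pred = 1.0 / (1.0 + math.exp(-logit))
    err = pred - label  # gradient of the logistic loss w.r.t. the logit
    new_cust = [c - lr * (err * a + l2 * c) for c, a in zip(cust, art)]
    new_art = [a - lr * (err * c + l2 * a) for c, a in zip(cust, art)]
    return new_cust, new_art

# Same positive example, with and without a strong penalty (toy values).
free_cust, _ = sgd_step([1.0, 0.0], [1.0, 0.0], 1, l2=0.0)
shrunk_cust, _ = sgd_step([1.0, 0.0], [1.0, 0.0], 1, l2=10.0)
```

The `l2 * c` term pulls every embedding toward zero on each update, so an account with only a handful of purchases cannot drift far from the origin and its predictions lean on the shared parameters instead.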

Circularity Check

0 steps flagged

No significant circularity; standard empirical ML training and held-out evaluation

The paper presents a neural network architecture that ingests customer and article attributes, learns embeddings and global parameters from observed interactions, and reports accuracy improvements on held-out portions of public and proprietary datasets. No equations or claims reduce a prediction to a fitted input by construction, no self-citation chains bear the central result, and no ansatz or uniqueness theorem is smuggled in; the derivation consists of standard supervised optimization whose outputs are measured against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described. The approach relies on standard neural network training assumptions and learned embeddings.
