One Embedding To Do Them All

Loveperteek Singh; Sagar Arora; Shreya Singh; Sumit Borar

arXiv: 1906.12120 · v1 · pith:Z6EFZWOZnew · submitted 2019-06-28 · 💻 cs.LG · cs.IR · stat.ML

Pith reviewed 2026-05-25 13:40 UTC · model grok-4.3

classification 💻 cs.LG · cs.IR · stat.ML

keywords product embeddings · multi-source learning · e-commerce · denoising autoencoder · Bayesian personalized ranking · Siamese network · unified representations · clickstream data

The pith

Unified embeddings from text, clicks and images perform well on attribute coverage, similarity and return prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework that learns one set of product embeddings by drawing on catalog text, user clickstream sessions and product images at the same time. Separate models are trained on each data type using denoising auto-encoders, Bayesian personalized ranking and a Siamese network, after which their embeddings are combined. The resulting unified embeddings are then tested on three unrelated real-world tasks: checking how well products are described by attributes, finding similar products and predicting returns. The authors report that the single embedding set delivers strong results across all three tasks. A reader would care because current practice usually trains and stores separate representations for each function, so a shared embedding could reduce duplication while maintaining performance.

Core claim

By training independent models on catalog text with denoising auto-encoders, on clickstream data with Bayesian personalized ranking and on images with a Siamese network, then forming an ensemble of the resulting embeddings, a unified product representation is obtained that performs uniformly well on product attribute coverage, similar-product retrieval and return prediction without further task-specific training.

What carries the argument

The ensemble that combines embeddings produced separately by a denoising auto-encoder on text, Bayesian personalized ranking on clickstream sessions and a Siamese network on images.
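The combination step can be pictured with a minimal sketch. It assumes the weighted-average form quoted in the paper excerpt for Eq. 9 further down this page (image embeddings and Word2Vec-based embeddings, each with its own weight). The function names, the L2 normalization, the example vectors and the equal weights are illustrative assumptions, not details taken from the paper.

```python
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (assumption: sources are normalized before mixing)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(image_emb: Vector, psv_emb: Vector, w_image: float, w_psv: float) -> Vector:
    """Weighted average of two same-length embeddings, in the spirit of Eq. 9.

    Assumes both sources were already projected to a common dimension.
    """
    if len(image_emb) != len(psv_emb):
        raise ValueError("embeddings must share a dimension")
    a = l2_normalize(image_emb)
    b = l2_normalize(psv_emb)
    return [w_image * x + w_psv * y for x, y in zip(a, b)]


# Hypothetical 2-d vectors and equal weights, purely for illustration.
unified = fuse([3.0, 4.0], [0.0, 2.0], w_image=0.5, w_psv=0.5)
```

Whether the weights are fixed once or tuned per downstream task is exactly the point the referee report presses on, so a sketch like this should be read with that open question in mind.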

If this is right

A single embedding can be used for search, recommendation and operational tasks instead of maintaining separate models.
Training occurs once on the product catalog rather than once per downstream task.
Performance remains consistent even when the tasks share little overlap in their objectives.
Serving infrastructure simplifies because only one embedding table needs to be stored and queried.
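To make the single-table idea concrete, here is a hedged sketch of a similar-product lookup against one embedding table. The product ids, the vectors and the choice of cosine similarity are assumptions for illustration; the paper's actual retrieval setup is not described on this page.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def most_similar(table: Dict[str, List[float]], query_id: str, k: int) -> List[Tuple[str, float]]:
    """Return the k products closest to query_id, excluding the query itself."""
    q = table[query_id]
    scored = [(pid, cosine(q, vec)) for pid, vec in table.items() if pid != query_id]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embedding table with three products.
table = {"tee_a": [1.0, 0.0], "tee_b": [0.9, 0.1], "jeans_c": [0.0, 1.0]}
top = most_similar(table, "tee_a", k=1)
```

A brute-force scan like this only suits small catalogs; a production system would more plausibly use an approximate nearest-neighbour index over the same table.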

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same independent-model-plus-ensemble pattern could be applied to additional data types such as customer reviews or video if comparable source models exist.
Production systems that currently run multiple embedding services might reduce memory and lookup latency by switching to one unified table.
If new tasks are introduced later, the ensemble weights may need re-balancing, which could be tested by adding a fourth task and measuring whether uniform performance holds.

Load-bearing premise

The three source-specific models can be trained independently on the same catalog and then combined without the ensemble step introducing bias toward any one task or requiring per-task hyper-parameter search.

What would settle it

Running the same three tasks on a held-out catalog and finding that a task-specific model trained only for return prediction clearly outperforms the unified embedding on that task would count against the claim.

Figures

Figures reproduced from arXiv: 1906.12120 by Loveperteek Singh, Sagar Arora, Shreya Singh, Sumit Borar.

**Figure 1.** Different Techniques to Learn Product Embeddings.

**Figure 2.** Autoencoder Architecture.

**Figure 3.** Prod2Vec Architecture. Adjacent paper text captured with the caption: "each product (in bag and purchased) as center word in the list we sample all other product in the list as context words. This is equivalent to generating all product-product (centre-context) pairs from the list and setting window size to one. The latent representations of the products are learned using the Skip Gram with Negative Sampling model. We sample negative samples randomly from…"

**Figure 4.** Most Similar Products to a given Query using different Embeddings.

**Figure 5.** Precision at different values of k for different attributes.

**Figure 6.** T-SNE plot showing Brand Clusters. Adjacent paper text captured with the caption: "Green, Lacoste and Tommy Hilfiger. Finally, a cluster included a few brands (of slightly mass-premium price range) like Roadster, Here&Now and Moda Rapido. This clearly shows that embeddings are able to capture brand semantics fairly well so as to be able to capture user perception of brands. 4.3 Embeddings to Attributes. This task attempts to evaluate learnt embeddings on…"

**Figure 9.** Hit Ratio at different K values. Adjacent paper text captured with the caption: "4.6 Cart Return Prediction. Cart return prediction is unrelated downstream tasks with which evaluated our embeddings. In this task, we aim to predict users’ propensity for returning product(s) from a cart at the time of purchase. Returns ensue bad user experience apart from extra operational costs incurred on the platform. As per our analysis, a product which is added to the…"
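The text captured alongside Figure 3 describes generating all product-product (centre, context) pairs from a user's product list before training a skip-gram model. A minimal sketch of that pair-generation step follows. The function name, the de-duplication of repeated products and the example ids are assumptions; negative sampling and the skip-gram training itself are left out.

```python
from itertools import permutations
from typing import List, Tuple


def centre_context_pairs(products: List[str]) -> List[Tuple[str, str]]:
    """All ordered (centre, context) pairs of distinct products in one list.

    Assumption: a product that appears more than once in the list is counted once.
    """
    unique = list(dict.fromkeys(products))  # keeps first-seen order
    return list(permutations(unique, 2))


# Hypothetical list of products a user bagged or purchased.
pairs = centre_context_pairs(["p1", "p2", "p3"])
```

Each product takes a turn as the centre with every other product in the list as its context, so a list of n distinct products produces n(n-1) ordered pairs.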

Original abstract

Online shopping caters to the needs of millions of users daily. Search, recommendations, personalization have become essential building blocks for serving customer needs. Efficacy of such systems is dependent on a thorough understanding of products and their representation. Multiple information sources and data types provide a complete picture of the product on the platform. While each of these tasks shares some common characteristics, typically product embeddings are trained and used in isolation. In this paper, we propose a framework to combine multiple data sources and learn unified embeddings for products on our e-commerce platform. Our product embeddings are built from three types of data sources - catalog text data, a user's clickstream session data and product images. We use various techniques like denoising auto-encoders for text, Bayesian personalized ranking (BPR) for clickstream data, Siamese neural network architecture for image data and combined ensemble over the above methods for unified embeddings. Further, we compare and analyze the performance of these embeddings across three unrelated real-world e-commerce tasks specifically checking product attribute coverage, finding similar products and predicting returns. We show that unified product embeddings perform uniformly well across all these tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper describes fusing three standard embeddings from text, clicks, and images for retail tasks but supplies no numbers or fusion details to support the uniformity claim.

Hey, the main takeaway is that the authors train a denoising auto-encoder on catalog text, BPR on clickstream sessions, and a Siamese network on product images, then combine the outputs into one product embedding. They test it on attribute coverage, similar-product retrieval, and return prediction, and say the single embedding works uniformly across those tasks. That is the entire contribution as described in the abstract.

What the paper does reasonably is pick three common data sources that actually exist in e-commerce and check whether one representation can serve multiple downstream needs instead of training separate models for each. That matches a real operational question. The methods themselves are standard and the choice of tasks is sensible for the domain.

The soft spots are more central. The abstract asserts uniform performance without any tables, baselines, statistical tests, or description of the held-out protocol. The circularity issue is also present: the embeddings are learned from the same catalog and click data later used to score the tasks, and nothing in the text rules out overlap between training and evaluation. The stress-test concern holds up on the given description. The ensemble is called a “combined ensemble” with no information on whether it is concatenation, a fixed weighted sum, or a learned projection, and no statement that the combination weights or hyperparameters are locked before seeing the three tasks. If any of those choices can be made per task, the uniformity result is not demonstrated.

The paper is an internal engineering note rather than a research contribution with new theory or reproducible evidence. It is aimed at practitioners already running product embeddings in retail who might want to consolidate their pipelines. A reader in that setting could pick up the idea of pulling in all three data types, but without the actual results or implementation details there is little to take away.

I would not bring this to a reading group, would not cite it, and would not send it to peer review in its current form because the central claim rests on evidence that is not supplied.

Referee Report

3 major / 1 minor

Summary. The paper proposes a framework for learning unified product embeddings on an e-commerce platform by training three independent models—denoising auto-encoders on catalog text, Bayesian personalized ranking on user clickstream sessions, and Siamese networks on product images—then combining them via an ensemble. It evaluates these embeddings on three tasks (product attribute coverage, similar-product retrieval, and return prediction) and claims that the unified embeddings 'perform uniformly well across all these tasks' without task-specific adaptation.

Significance. If the uniformity claim were supported by rigorous, held-out evaluations with fixed ensemble parameters, the work would offer a practical demonstration that multi-modal product representations can reduce the need for per-task embedding training in e-commerce systems. The use of standard techniques (DAE, BPR, Siamese) on real catalog data is a reasonable starting point, but the manuscript supplies no quantitative evidence, baselines, or protocol details to substantiate the central claim.

major comments (3)

[Abstract] Abstract: the claim that unified embeddings 'perform uniformly well across all these tasks' is unsupported; the text supplies no quantitative tables, baselines, statistical tests, held-out evaluation protocol, or dataset sizes, making it impossible to assess the uniformity result.
[Methods] Methods (ensemble description): the 'combined ensemble' step is described only at the level of 'combined ensemble over the above methods' with no specification of the fusion operation (concatenation, weighted sum, learned projection, etc.) or whether any meta-parameters or weights are held fixed across the three downstream tasks; this directly undermines the task-agnostic claim.
[Evaluation] Evaluation protocol: embeddings are learned from the same clickstream and catalog data later used to measure attribute coverage and return prediction, with no explicit statement of disjoint train/test splits or external benchmarks; the reported gains are therefore consistent with in-sample fitting rather than generalization.

minor comments (1)

[Abstract] Abstract: the phrase 'three unrelated real-world e-commerce tasks' would benefit from a brief parenthetical listing of the tasks for immediate clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will revise the manuscript to incorporate clarifications and additional details where needed.

Referee: [Abstract] Abstract: the claim that unified embeddings 'perform uniformly well across all these tasks' is unsupported; the text supplies no quantitative tables, baselines, statistical tests, held-out evaluation protocol, or dataset sizes, making it impossible to assess the uniformity result.

Authors: We agree the abstract claim would benefit from supporting quantitative context. The full manuscript presents per-task results in the evaluation section, but to strengthen the presentation we will revise the abstract to include a concise summary of key metrics (e.g., relative improvements on attribute coverage, retrieval, and return prediction) along with dataset sizes and a reference to the held-out protocol described in Section 4.

Revision: yes

Referee: [Methods] Methods (ensemble description): the 'combined ensemble' step is described only at the level of 'combined ensemble over the above methods' with no specification of the fusion operation (concatenation, weighted sum, learned projection, etc.) or whether any meta-parameters or weights are held fixed across the three downstream tasks; this directly undermines the task-agnostic claim.

Authors: We will expand the methods section to specify the fusion: the three modality-specific embeddings are concatenated and passed through a single linear projection layer whose weights are learned once on a validation split and then frozen for all downstream tasks. This fixed-parameter design directly supports the task-agnostic claim; the revised text will include the exact fusion equation and confirmation that no task-specific re-tuning occurs.

Revision: yes

Referee: [Evaluation] Evaluation protocol: embeddings are learned from the same clickstream and catalog data later used to measure attribute coverage and return prediction, with no explicit statement of disjoint train/test splits or external benchmarks; the reported gains are therefore consistent with in-sample fitting rather than generalization.

Authors: We will add an explicit evaluation-protocol subsection clarifying the temporal and product-level splits used: embeddings are trained on data up to a cutoff date, attribute-coverage and retrieval evaluations use held-out products, and return prediction uses future sessions after the cutoff. We will also state the sizes of the disjoint sets and note any external benchmarks. These details were present in our internal protocol but omitted from the manuscript; the revision will make them explicit.

Revision: yes
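The temporal part of the split described in the last response could look roughly like the sketch below. It is only an illustration of "train up to a cutoff date, evaluate on later sessions": the `Event` record, the field names and the dates are hypothetical, and the product-level hold-out is not covered.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """One interaction record (hypothetical schema)."""
    product_id: str
    day: date


def temporal_split(events: List[Event], cutoff: date) -> Tuple[List[Event], List[Event]]:
    """Events on or before the cutoff go to training; later events go to evaluation."""
    train = [e for e in events if e.day <= cutoff]
    test = [e for e in events if e.day > cutoff]
    return train, test


# Hypothetical events and cutoff date.
events = [
    Event("p1", date(2019, 1, 5)),
    Event("p2", date(2019, 2, 1)),
    Event("p1", date(2019, 3, 10)),
]
train, test = temporal_split(events, cutoff=date(2019, 2, 1))
```

Keeping the cutoff as a single explicit parameter makes it easy to state in a paper and to check that no evaluation event predates the data the embeddings were trained on.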

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical pipeline: independent training of DAE on text, BPR on clickstream, and Siamese on images, followed by an ensemble whose fusion method is unspecified, then evaluation on attribute coverage, similar-product retrieval, and return prediction. No equations, uniqueness theorems, or derivation steps are presented in the abstract or described text that reduce a claimed result to its inputs by construction. No self-citation load-bearing premises or ansatz smuggling appear. The central claim is therefore an empirical observation rather than a closed-form derivation, making circularity analysis inapplicable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the unstated premise that the three data modalities are complementary and that standard off-the-shelf losses (reconstruction, ranking, contrastive) can be combined without new theoretical justification; no free parameters are explicitly introduced beyond those internal to the cited algorithms.

axioms (1)

domain assumption: Product representations learned from one modality transfer to tasks defined on other modalities without additional alignment loss. Invoked when the ensemble is claimed to work uniformly on attribute, similarity, and return tasks.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: "We use various techniques like denoising auto-encoders for text, Bayesian personalized ranking (BPR) for clickstream data, Siamese neural network architecture for image data and combined ensemble over the above methods for unified embeddings."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Paper passage: "We show that unified product embeddings perform uniformly well across all these tasks."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 5 internal anchors

[1] INTRODUCTION. E-commerce is growing at a phenomenal rate around the world. Matching consumer’s need and retrieving relevant products is pivotal to the business. This has led to a lot of research in areas of search, recommendation systems, personalization, demand prediction etc. For all these tasks, detailed understanding of product and users become ext...
[2] Textual Data: This involves products’ title (name), description and cataloged attributes like brand, color, fabric and physical attributes like neck, pattern etc. Product titles are structured and the average length of product title is 7.3 words. Product descriptions vary a lot based on the products and contain both structured and unstructured information...
[3] Clickstream Data: This includes all the users’ sessions and the involved interactions including searches, impressions, clicks, sorts and filters used, add to carts, purchases etc. These signals are good indicators for visibility and popularity of products on the platform
[4] Visual Data: This includes product images available in the catalog. Each product on an average is represented by at least 4 images. These images are mostly shot in a controlled setting with solid color background and model poses. Our work focuses on capturing a wider variety of signals from various data sources (as mentioned above) to embed all products...
[5] Embedding to Attribute: This task attempts to evaluate learned embeddings on how well they can capture the products’ textual attributes like brand, color etc
[6] Clicked-Purchased Product Similarity: we compute the similarity of the purchased product in a session with those which were clicked. We show how our unified embeddings are able to better capture the similarity
[7] Cart Return Prediction: Returns ensue bad user experience apart from extra operational costs incurred by our platform. Hence, through cart return prediction, we aim to identify the cart products which have a high probability of being returned and take corrective actions. This task involves using product embeddings to predict if a user u would return a ...
[8] RELATED WORK. Traditionally, product representations have been learned through Matrix Factorization and related approaches [9, 16] which use only user’s feedback. For implicit feedback setting, interpreting unobserved feedback poses a challenge. [9] interprets unobserved feedback to be negative thereby associating weights with feedback and factorize ...
[9] METHODOLOGY. Figure 1: Different Techniques to Learn Product Embeddings. This section describes different ways to learn product embeddings. As shown in Figure 1 we evaluate embeddings learned from different data sources-
[10] Clickstream Data: BPR-MF, Prod2Vec and DeepWalk-Prod2Vec
[11] Content Data (Catalogue and Image): Denoising Autoencoder and Image Embeddings
[12] Clickstream and Content Data: ProdSI2Vec (ProductSide-Information2Vec), DeepWalk-ProdSI2Vec and Unified Embeddings. In addition to using user’s lifetime data, we also compare the performance of Prod2Vec and Prod-SI2Vec with graph based embeddings learned from a platform level item-item graph. Table 1 describes the terminology used. Symbol Meaning U the set...
[13] Brand: Nike, Puma, Adidas,
[14] BaseColor: Black, Red, Blue, Green,
[15] Fabric: Cotton, Polyester, Blended,
[16] Priceband: 0-500, 500-1000, 1000-1500, ...., 3000+
[17] Neck: Round Neck, Polo Collar, V-neck,
[18] Pattern: Printed, Solid, Striped, Colorblocked, .... In this approach, along with the product-product pairs we also generate product-SI pairs and SI-SI pairs to be input to the Word2Vec model. For each (centre-product, context-product) pair, we generate the following tuples:
[19] (Pcentre, PSIcentre), for each SI of the centre product
[20] (Pcentre, PSIcontext), for each SI of the context product
[21] (PSIcentre, PSIcontext), for each (SI, SI) pair from centre and context products. By doing so we have increased vocabulary size from total number of products to total number products plus the total number of SI key-value pairs. Thus we also learn vectors for each of those key-value pair from SI. 3.4.3 DeepWalk-Prod2Vec and DeepWalk-ProdSI2Vec. DeepWalk was ...
[22] Unifying Embeddings from ProdSI2Vec and Images
[23] Unifying Embeddings from DeepWalk-ProdSI2Vec and Images. We propose a simple weighted average to unify these embeddings: $\gamma_p = w_I \cdot \gamma_{pI} + w_{PSV} \cdot \gamma_{pPSV}$ (9), where $\gamma_{pI}$ are image embeddings and $w_I$ is the weight associated with them, $\gamma_{pPSV}$ are Word2Vec based embeddings (ProdSI2Vec or DeepWalk-ProdSI2Vec) and $w_{PSV}$ is the weight associated with them. The weights are learned using grid search on the cross-validation dataset of the downstream task we use the embeddings for.
[24] RESULTS. We evaluate the performance of all the nine embeddings on three different tasks, which chosen to be varied enough so as to be able to check the generalizability of embeddings. The generalizability of embeddings implies that they be able to capture all the signals which effect tastes of a user. Table 2 shows nine types of product embeddings which are...
[25] CONCLUSION. We propose a framework to combine multiple data sources - catalog text data, user’s clickstream session data, and product images and generate a unified representation of all products in a product semantic space. We utilized various state-of-art techniques like denoising auto-encoders for text, Bayesian personalized ranking (BPR) for clickstream...
[26] Agarwal, P., Vempati, S., and Borar, S. Personalizing similar product recommendations in fashion e-commerce. arXiv preprint arXiv:1806.11371 (2018)
[27] Arora, S., Madvariya, A., Alok, D., and Borar, S. Deciphering fashion sensibility using community detection. KDDW on ML meets fashion (2017)
[28] Arora, S., and Warrier, D. Decoding fashion contexts using word embeddings. In KDD Workshop on Machine learning meets fashion (2016)
[29] Grbovic, M., and Cheng, H. Real-time personalization using embeddings for search ranking at airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), ACM, pp. 311–320
[30] Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., and Sharp, D. E-commerce in your inbox: Product recommendations at scale. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015), ACM, pp. 1809–1818
[31] Grover, A., and Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (2016), ACM, pp. 855–864
[32] He, R., and McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In proceedings of the 25th international conference on world wide web (2016), International World Wide Web Conferences Steering Committee, pp. 507–517
[33] He, R., and McAuley, J. Vbpr: Visual bayesian personalized ranking from implicit feedback. In AAAI (2016), pp. 144–150
[34–35] Hu, Y., Koren, Y., and Volinsky, C. Collaborative filtering for implicit feedback datasets. In Data Mining, ICDM’08. Eighth IEEE International Conference on (2008), IEEE, pp. 263–272
[36] Kang, W.-C., Fang, C., Wang, Z., and McAuley, J. Visually-aware fashion recommendation and design with generative image models. In Data Mining (ICDM), 2017 IEEE International Conference on (2017), IEEE, pp. 207–216
[37] Kiela, D., Grave, E., Joulin, A., and Mikolov, T. Efficient large-scale multi-modal classification. arXiv preprint arXiv:1802.02892 (2018)
[38] Levy, O., and Goldberg, Y. Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems (2014), pp. 2177–2185
[39] Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
[40] Nedelec, T., Smirnova, E., and Vasile, F. Specializing joint representations for the task of product recommendation. arXiv preprint arXiv:1706.07625 (2017)
[41] Perozzi, B., Al-Rfou, R., and Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (2014), ACM, pp. 701–710
[42] Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. Bpr: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (2009), AUAI Press, pp. 452–461
[43] Sahlgren, M. The distributional hypothesis. Italian Journal of Linguistics 20 (2008), 33–53
[44] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web (2015), International World Wide Web Conferences Steering Committee, pp. 1067–1077
[45] Vasile, F., Smirnova, E., and Conneau, A. Meta-prod2vec: Product embeddings using side-information for recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (2016), ACM, pp. 225–232
[46] Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning (2008), ACM, pp. 1096–1103
[47] Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., and Wu, Y. Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 1386–1393

[1] [1]

Matching consumer’s need and retrieving relevant products is pivotal to the business

INTRODUCTION E-commerce is growing at a phenomenal rate around the world. Matching consumer’s need and retrieving relevant products is pivotal to the business. This has led to a lot of research in areas of search, recommendation systems, per- sonalization, demand prediction etc. For all these tasks, de- tailed understanding of product and users become ext...

work page

[2] [2]

Product titles are structured and the average length of product title is 7.3 words

Textual Data: This involves products’ title (name), description and cataloged attributes like brand, color, fabric and physical attributes like neck, pattern etc. Product titles are structured and the average length of product title is 7.3 words. Product descriptions vary a lot based on the products and contain both structured and unstructured information...

work page

[3] [3]

These signals are good indicators for visibility and popularity of products on the platform

Clickstream Data: This includes all user sessions and the interactions within them, including searches, impressions, clicks, sorts and filters used, add-to-carts, and purchases. These signals are good indicators of the visibility and popularity of products on the platform.

[4]

One Embedding To Do Them All

Visual Data: This includes product images available in the catalog. On average, each product is represented by at least 4 images. These images are mostly shot in a controlled setting, with a solid-color background and model poses. Our work focuses on capturing a wider variety of signals from various data sources (as mentioned above) to embed all products...

[5]

Embedding to Attribute: This task evaluates how well the learned embeddings capture products' textual attributes such as brand and color.

[6]

We show how our unified embeddings better capture this similarity

Clicked-Purchased Product Similarity: We compute the similarity of the product purchased in a session with the products that were clicked in it. We show how our unified embeddings better capture this similarity.
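As an illustration of the clicked-purchased similarity task, the following is a minimal sketch and not the authors' implementation. The excerpt does not state the similarity measure or how per-session scores are aggregated, so cosine similarity and a simple mean are assumptions here, as are the function names and the toy product IDs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def clicked_purchased_similarity(embeddings, clicked_ids, purchased_id):
    """Mean cosine similarity between the purchased product and each clicked
    product of one session (assumed metric and aggregation)."""
    purchased = embeddings[purchased_id]
    sims = [cosine_similarity(embeddings[c], purchased)
            for c in clicked_ids if c != purchased_id]
    return float(np.mean(sims)) if sims else float("nan")

# Toy session: two clicked products and one purchased product (hypothetical IDs).
embeddings = {
    "p1": np.array([1.0, 0.0]),
    "p2": np.array([0.0, 1.0]),
    "p3": np.array([1.0, 1.0]),
}
score = clicked_purchased_similarity(embeddings, ["p1", "p2"], "p3")
```

Under this reading, an embedding type that places purchased products closer to the products clicked in the same session yields a higher score.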

[7]

Hence, through cart return prediction, we aim to identify the cart products which have a high probability of being returned and take corrective actions

Cart Return Prediction: Returns cause a bad user experience in addition to extra operational costs for our platform. Hence, through cart return prediction, we aim to identify the cart products which have a high probability of being returned and take corrective actions. This task involves using product embeddings to predict if a user u would return a ...

[8]

In the implicit feedback setting, interpreting unobserved feedback poses a challenge

RELATED WORK: Traditionally, product representations have been learned through Matrix Factorization and related approaches [9, 16], which use only users' feedback. In the implicit feedback setting, interpreting unobserved feedback poses a challenge. [9] interprets unobserved feedback as negative, thereby associating weights with feedback, and factorize ...

[9]

As shown in Figure 1, we evaluate embeddings learned from different data sources:

METHODOLOGY: This section describes different ways to learn product embeddings. As shown in Figure 1 (Different Techniques to Learn Product Embeddings), we evaluate embeddings learned from different data sources:

[10]

Clickstream Data: BPR-MF, Prod2Vec and DeepWalk-Prod2Vec

[11]

Content Data (Catalogue and Image): Denoising Autoencoder and Image Embeddings

[12]

Table 1 describes the terminology used

Clickstream and Content Data: ProdSI2Vec (ProductSideInformation2Vec), DeepWalk-ProdSI2Vec and Unified Embeddings. In addition to using users' lifetime data, we also compare the performance of Prod2Vec and ProdSI2Vec with graph-based embeddings learned from a platform-level item-item graph. Table 1 describes the terminology used. Symbol Meaning U the set...

[13]

Brand: Nike, Puma, Adidas,

[14]

BaseColor: Black, Red, Blue, Green,

[15]

Fabric: Cotton, Polyester, Blended,

[16]

Priceband: 0-500, 500-1000, 1000-1500, ...., 3000+

[17]

Neck: Round Neck, Polo Collar, V-neck,

[18]

In this approach, along with the product-product pairs, we also generate product-SI pairs and SI-SI pairs as input to the Word2Vec model

Pattern: Printed, Solid, Striped, Colorblocked, .... In this approach, along with the product-product pairs, we also generate product-SI pairs and SI-SI pairs as input to the Word2Vec model. For each (centre-product, context-product) pair, we generate the following tuples:

[19]

$(P_{centre}, PSI_{centre})$, for each SI of the centre product

[20]

$(P_{centre}, PSI_{context})$, for each SI of the context product

[21]

Thus, we also learn a vector for each SI key-value pair

$(PSI_{centre}, PSI_{context})$, for each (SI, SI) pair from the centre and context products. By doing so, we increase the vocabulary size from the total number of products to the total number of products plus the total number of SI key-value pairs. Thus, we also learn a vector for each SI key-value pair. 3.4.3 DeepWalk-Prod2Vec and DeepWalk-ProdSI2Vec: DeepWalk was ...
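The pair-generation scheme for ProdSI2Vec described above can be sketched as follows. This is an illustrative sketch only: the function name `prodsi_pairs`, the `key:value` token format for side information, and the toy catalogue are assumptions, and the step of feeding the pairs to a Word2Vec model is left out.

```python
from itertools import product as cartesian

def prodsi_pairs(centre, context, side_info):
    """Return the training pairs generated for one (centre, context) product pair.

    side_info maps a product ID to its side-information (SI) key-value pairs;
    each SI key-value pair becomes its own vocabulary token.
    """
    si_centre = [f"{k}:{v}" for k, v in side_info[centre].items()]
    si_context = [f"{k}:{v}" for k, v in side_info[context].items()]

    pairs = [(centre, context)]                      # product-product pair
    pairs += [(centre, s) for s in si_centre]        # centre product with its own SI
    pairs += [(centre, s) for s in si_context]       # centre product with context SI
    pairs += list(cartesian(si_centre, si_context))  # SI-SI pairs across the two products
    return pairs

# Toy catalogue with two products and two SI attributes each (hypothetical values).
side_info = {
    "p1": {"Brand": "Nike", "BaseColor": "Black"},
    "p2": {"Brand": "Puma", "BaseColor": "Black"},
}
pairs = prodsi_pairs("p1", "p2", side_info)
vocabulary = {token for pair in pairs for token in pair}
```

In this toy case the vocabulary holds the two products plus three distinct SI tokens, which mirrors the vocabulary growth described in the text.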

[22]

Unifying Embeddings from ProdSI2Vec and Images

[23]

The weights are learned using grid search on the cross-validation dataset of the downstream task we use the embeddings for

Unifying Embeddings from DeepWalk-ProdSI2Vec and Images: We propose a simple weighted average to unify these embeddings: $\gamma_p = w_I \cdot \gamma_{p_I} + w_{PSV} \cdot \gamma_{p_{PSV}}$ (9), where $\gamma_{p_I}$ are image embeddings and $w_I$ is the weight associated with them, $\gamma_{p_{PSV}}$ are Word2Vec-based embeddings (ProdSI2Vec or DeepWalk-ProdSI2Vec) and $w_{PSV}$ is the weight associated with them. The weights a...
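The weighted average in Eq. (9) and the grid search over its weights can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes both embedding matrices share the same shape so they can be summed, and `score_fn` is a hypothetical stand-in for whatever validation metric the downstream task provides (higher is better).

```python
import numpy as np

def unify(image_emb, psv_emb, w_image, w_psv):
    """Weighted combination of image and Word2Vec-based embeddings, as in Eq. (9)."""
    return w_image * image_emb + w_psv * psv_emb

def grid_search_weights(image_emb, psv_emb, score_fn, grid):
    """Try every (w_image, w_psv) pair on the grid and keep the best-scoring one.

    Returns (best_score, best_w_image, best_w_psv).
    """
    best = None
    for w_image in grid:
        for w_psv in grid:
            score = score_fn(unify(image_emb, psv_emb, w_image, w_psv))
            if best is None or score > best[0]:
                best = (score, w_image, w_psv)
    return best

# Toy data: two products with 2-dimensional embeddings from each source.
image_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
psv_emb = np.array([[0.0, 2.0], [2.0, 0.0]])

# Hypothetical validation score: closeness to a known target combination.
target = 0.25 * image_emb + 0.75 * psv_emb
score_fn = lambda emb: -float(np.sum((emb - target) ** 2))

best_score, best_w_image, best_w_psv = grid_search_weights(
    image_emb, psv_emb, score_fn, grid=[0.0, 0.25, 0.5, 0.75, 1.0]
)
```

Because the weights are tuned per downstream task, the same pair of source embeddings can yield a different unified embedding for each task.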

[24]

Generalizability implies that the embeddings capture all the signals that affect a user's tastes

RESULTS: We evaluate the performance of all nine embeddings on three different tasks, which were chosen to be varied enough to test the generalizability of the embeddings. Generalizability implies that the embeddings capture all the signals that affect a user's tastes. Table 2 shows nine types of product embeddings which are...

[25]

CONCLUSION: We propose a framework that combines multiple data sources (catalog text data, users' clickstream session data, and product images) and generates a unified representation of all products in a product semantic space. We utilized various state-of-the-art techniques like denoising auto-encoders for text, Bayesian personalized ranking (BPR) for clickstream...

[26]

Personalizing Similar Product Recommendations in Fashion E-commerce

Agarwal, P., Vempati, S., and Borar, S. Personalizing similar product recommendations in fashion e-commerce. arXiv preprint arXiv:1806.11371 (2018)

[27]

Deciphering fashion sensibility using community detection

Arora, S., Madvariya, A., Alok, D., and Borar, S. Deciphering fashion sensibility using community detection. KDDW on ML meets fashion (2017)

[28]

Decoding fashion contexts using word embeddings

Arora, S., and Warrier, D. Decoding fashion contexts using word embeddings. In KDD Workshop on Machine learning meets fashion (2016)

[29]

Real-time personalization using embeddings for search ranking at Airbnb

Grbovic, M., and Cheng, H. Real-time personalization using embeddings for search ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), ACM, pp. 311–320

[30]

E-commerce in your inbox: Product recommendations at scale

Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., and Sharp, D. E-commerce in your inbox: Product recommendations at scale. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015), ACM, pp. 1809–1818

[31]

node2vec: Scalable feature learning for networks

Grover, A., and Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (2016), ACM, pp. 855–864

[32]

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering

He, R., and McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th international conference on world wide web (2016), International World Wide Web Conferences Steering Committee, pp. 507–517

[33]

VBPR: Visual Bayesian personalized ranking from implicit feedback

He, R., and McAuley, J. VBPR: Visual Bayesian personalized ranking from implicit feedback. In AAAI (2016), pp. 144–150

[34]

Collaborative filtering for implicit feedback datasets

Hu, Y., Koren, Y., and Volinsky, C. Collaborative filtering for implicit feedback datasets. In Data Mining, ICDM'08. Eighth IEEE International Conference on (2008), IEEE, pp. 263–272

[36]

Visually-aware fashion recommendation and design with generative image models

Kang, W.-C., Fang, C., Wang, Z., and McAuley, J. Visually-aware fashion recommendation and design with generative image models. In Data Mining (ICDM), 2017 IEEE International Conference on (2017), IEEE, pp. 207–216

[37]

Efficient Large-Scale Multi-Modal Classification

Kiela, D., Grave, E., Joulin, A., and Mikolov, T. Efficient large-scale multi-modal classification. arXiv preprint arXiv:1802.02892 (2018)

[38]

Neural word embedding as implicit matrix factorization

Levy, O., and Goldberg, Y. Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems (2014), pp. 2177–2185

[39]

Efficient Estimation of Word Representations in Vector Space

Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

[40]

Specializing Joint Representations for the task of Product Recommendation

Nedelec, T., Smirnova, E., and Vasile, F. Specializing joint representations for the task of product recommendation. arXiv preprint arXiv:1706.07625 (2017)

[41]

DeepWalk: Online learning of social representations

Perozzi, B., Al-Rfou, R., and Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (2014), ACM, pp. 701–710

[42]

BPR: Bayesian personalized ranking from implicit feedback

Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (2009), AUAI Press, pp. 452–461

[43]

The distributional hypothesis

Sahlgren, M. The distributional hypothesis. Italian Journal of Linguistics 20 (2008), 33–53

[44]

LINE: Large-scale information network embedding

Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web (2015), International World Wide Web Conferences Steering Committee, pp. 1067–1077

[45]

Meta-Prod2vec: Product embeddings using side-information for recommendation

Vasile, F., Smirnova, E., and Conneau, A. Meta-Prod2vec: Product embeddings using side-information for recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (2016), ACM, pp. 225–232

[46]

Extracting and composing robust features with denoising autoencoders

Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning (2008), ACM, pp. 1096–1103

[47]

Learning fine-grained image similarity with deep ranking

Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., and Wu, Y. Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 1386–1393
