Learning to Rank Broad and Narrow Queries in E-Commerce

Sagar Arora; Siddhartha Devapujula; Sumit Borar

arXiv: 1907.01549 · v2 · pith:QVBMCLWQnew · submitted 2019-07-01 · 💻 cs.IR · cs.CL · cs.LG · stat.ML

Pith reviewed 2026-05-25 11:23 UTC · model grok-4.3

keywords: learning to rank · e-commerce search · query segmentation · broad queries · narrow queries · fashion search · LETOR

The pith

Specialized models for broad and narrow queries outperform a combined model in fashion search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a learning-to-rank framework for e-commerce product search. It first segments queries into broad and narrow categories according to inferred user intent. Separate ranking models are then trained on each segment and shown to deliver higher performance than one model trained on the full set of queries. The work also shows how denoising auto-encoders and word embeddings address sparsity in product and query features. These steps matter because different query types reflect distinct shopping goals that a single model struggles to serve equally well.

Core claim

The central claim is that, on fashion-category data, distinct pointwise and pairwise LETOR models trained on broad queries alone and on narrow queries alone outperform a single combined model trained on all queries. Query segmentation is performed by analyzing user intent, features are drawn from query, product, and query-product sources, and sparsity is mitigated with a denoising auto-encoder for product features plus skip-gram embeddings for query-product matching. Multiple target metrics are compared for robustness.

What carries the argument

A query-segmentation mechanism that divides queries into broad versus narrow categories on the basis of user intent, used to train separate pointwise and pairwise learning-to-rank models.

If this is right

Feature importance patterns differ between broad-query and narrow-query models.
Target metrics can be evaluated for stability when ranking is split by query type.
Sparsity-handling techniques enable the use of otherwise unusable product and query features.
Pointwise and pairwise training both benefit from the segmentation step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The segmentation step could be applied to non-fashion verticals if intent signals remain consistent.
Real-time query classification would be required for the specialized models to be deployed at scale.
Conversion or revenue metrics might improve if the ranking objective is aligned with the same broad-narrow split.

Load-bearing premise

The proposed way of dividing queries into broad versus narrow categories correctly reflects user intent and remains stable across product categories and time periods.

What would settle it

Train a single combined model on the full fashion query set and show that its ranking quality on a held-out test set equals or exceeds the quality of the two specialized models.

Figures

Figures reproduced from arXiv: 1907.01549 by Sagar Arora, Siddhartha Devapujula, Sumit Borar.

**Figure 1.** Query distribution by coherency score. Surrounding text: For such queries on our platform we observe a more skewed distribution of traffic across queries, a 90-10 distribution. (Page stamp: arXiv:1907.01549v2 [cs.IR] 15 Jul 2019.)

**Figure 2.** Coherency score vs. recall set size. Surrounding text: Section 4 describes the system architecture and designs and Section 5 discusses the feature engineering and target variables. Finally, Section 6 describes our modeling approach, results and analysis. Our paper makes the following contributions: 1. We provide an approach to segment our queries into broad and narrow basis how coherent the downstream sessions are. We show …

**Figure 3.** Architecture. Surrounding text: When a user issues a query, the retrieval layer renders a set of top-K (typically 1000) products based on BM-25 score. Traditional BM25 based approaches are quite effective for retrieval; however in broad queries like “tshirts”, it becomes extremely significant to include business metrics like CTR and conversion to optimize the ranking. Not just the demand, it becomes critical for our platfor…

**Figure 4.** Denoising autoencoder architecture. Surrounding text: we compare this with normal autoencoder (without the noise layer). The autoencoder based approach would greatly help to reduce sparsity in features and further assist LETOR models to optimise ranking by learning weights (parameters) for different features. 3. Query Product Features: The query product features are features which involve both query and product - eg ctr of …

**Figure 5.** Comparison of different models.

**Figure 6.** Broad vs. narrow. Surrounding text: Clearly, LambdaMART performs the best across different feature combinations (except YYYN where RankNet performs […]

| Query | Similar attributes with cosine similarity |
| --- | --- |
| nike | dri-fit (0.72), adidas (0.64), puma (0.58), sportwear (0.55) |
| baniyan (Hindi word for vest) | vests white (0.57), sando (0.513), cotton vest (0.51), innerwear (0.504) |
| swimwear | swimsuit (0.915), swimdress (0.718), tankini (0.701), bikini (0.6… |
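The Figure 6 excerpt lists, for a few query terms, the attributes closest to them by cosine similarity. A minimal sketch of that lookup is shown here, assuming each term already has an embedding vector. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_vec, attribute_vecs, top_n=3):
    """Return the top_n (attribute, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in attribute_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Made-up vectors for illustration only; real skip-gram embeddings
# would have many more dimensions.
toy_vectors = {
    "swimsuit": [0.9, 0.1, 0.0],
    "bikini": [0.7, 0.3, 0.1],
    "vests": [0.0, 0.2, 0.9],
}
ranked = most_similar([1.0, 0.0, 0.0], toy_vectors)
```

With these toy vectors, `ranked` orders the attributes from most to least aligned with the query vector.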

Original abstract

Search is a prominent channel for discovering products on an e-commerce platform. Ranking the products retrieved by search is crucial for addressing customers' needs and optimizing business metrics. While Learning to Rank (LETOR) models have been extensively studied and have demonstrated efficacy in web search, they remain a relatively new research area in e-commerce. In this paper, we present a framework for building a LETOR model for an e-commerce platform. We analyze user queries and propose a mechanism to segment queries into broad and narrow based on the user's intent. We discuss different types of features (query, product and query-product) and the challenges in using them. We show that sparsity in product features can be tackled through a denoising auto-encoder, while skip-gram based word embeddings help solve the query-product sparsity issues. We also present various target metrics that can be employed for evaluating search results and compare their robustness. Further, we build and compare the performance of both pointwise and pairwise LETOR models on a fashion category data set. We also build and compare distinct models for broad and narrow queries, analyze feature importance across these, and show that these specialized models perform better than a combined model in the fashion world.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports that separate LETOR models for broad and narrow queries beat a single model on fashion e-commerce data, but the segmentation step has no validation shown.

The main point is that splitting queries by intent into broad and narrow categories, then training distinct ranking models, gives better results than one combined model on their fashion dataset. They also describe using a denoising auto-encoder on product features and skip-gram embeddings to deal with sparsity in query-product pairs, plus a comparison of pointwise and pairwise approaches and some notes on evaluation metrics.

The practical angle on e-commerce LETOR is the clearest addition here, since most prior work stays in web search. The feature handling steps are straightforward and address real issues that come up when product catalogs are large and queries vary widely.

The segmentation itself is the load-bearing piece, yet the description supplies no human agreement numbers, stability checks over time, or tests across categories. If the split relies mainly on surface signals like length or frequency, the reported lift could simply follow from how the data was divided rather than from genuine differences in intent. The abstract also omits dataset size, any performance numbers, error bars, or significance tests, which leaves the strength of the claim hard to judge.

This is aimed at engineers running search on retail platforms who need ideas for handling query variety and sparse features. An applied reader could borrow the auto-encoder and embedding tactics as a starting point. It is worth sending to peer review because the setting is concrete and the techniques are reproducible in principle, but the authors would need to add validation for the segmentation and full experimental details before it could be taken as settled.

Referee Report

2 major / 1 minor

Summary. The paper presents a LETOR framework for e-commerce search ranking. It proposes a mechanism to segment queries into broad versus narrow based on user intent, discusses query/product/query-product features and sparsity mitigation via denoising auto-encoders and skip-gram embeddings, compares target metrics, and evaluates pointwise and pairwise models on fashion data. The central empirical claim is that distinct models trained on broad and narrow queries outperform a single combined model.

Significance. If the segmentation is shown to be valid and the gains are robust, the result would offer a practical way to improve ranking quality in e-commerce by tailoring models to query breadth, with direct business relevance for fashion verticals. The sparsity-handling techniques are standard but appropriately applied; the comparison of evaluation metrics is a secondary contribution.

major comments (2)

[Query segmentation mechanism (described in abstract and methods)] The headline result (specialized broad/narrow models outperform the combined model) is load-bearing on the claim that the proposed segmentation accurately captures user intent. The manuscript describes a mechanism but supplies no quantitative validation such as agreement with human labels, temporal stability, or cross-vertical consistency; without this, any reported lift could be an artifact of the partition rule rather than genuine intent differences.
[Experimental results and evaluation (abstract and results sections)] The abstract states that comparative experiments were performed on fashion data and that specialized models perform better, yet reports no performance numbers, dataset sizes, error bars, or statistical tests. This absence prevents verification of effect sizes or robustness to metric selection and post-hoc choices.

minor comments (1)

[Abstract] The abstract would benefit from a one-sentence description of the segmentation heuristic and the magnitude of the observed improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation and claims. We respond to each major comment below and commit to revisions where appropriate.

Referee: [Query segmentation mechanism (described in abstract and methods)] The headline result (specialized broad/narrow models outperform the combined model) is load-bearing on the claim that the proposed segmentation accurately captures user intent. The manuscript describes a mechanism but supplies no quantitative validation such as agreement with human labels, temporal stability, or cross-vertical consistency; without this, any reported lift could be an artifact of the partition rule rather than genuine intent differences.

Authors: We agree that the segmentation's validity is central to the interpretation of results. The manuscript describes a rule-based mechanism using query characteristics to approximate intent differences, and the observed performance gains provide supporting evidence. However, we acknowledge the absence of direct quantitative validation. In the revised manuscript we will add an analysis of agreement between the segmentation and human labels on a sampled set of queries, along with checks for temporal stability.

Revision: yes

Referee: [Experimental results and evaluation (abstract and results sections)] The abstract states that comparative experiments were performed on fashion data and that specialized models perform better, yet reports no performance numbers, dataset sizes, error bars, or statistical tests. This absence prevents verification of effect sizes or robustness to metric selection and post-hoc choices.

Authors: We agree that the experimental reporting requires more detail for verifiability. While the results section contains model comparisons, specific numerical values, dataset sizes, error bars, and statistical tests are not presented with sufficient prominence. In the revision we will incorporate concrete performance numbers, dataset statistics, standard errors, and significance tests into both the abstract and results sections.

Revision: yes
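The agreement analysis mentioned in the first response could be computed along the following lines. This is a minimal sketch under stated assumptions: Cohen's kappa is one common chance-corrected agreement statistic, chosen here purely for illustration, since neither the report nor the rebuttal names a specific statistic. The function name and the two label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two labelings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if both labelings were independent draws
    # from their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c]
                   for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings use a single identical label
    return (observed - expected) / (1.0 - expected)


# Made-up labels: segmentation output vs. a human annotator.
rule = ["broad", "broad", "narrow", "narrow", "narrow", "narrow"]
human = ["broad", "narrow", "narrow", "narrow", "narrow", "narrow"]
kappa = cohens_kappa(rule, human)
```

A value near 1 would indicate that the segmentation matches human judgement well beyond chance, while a value near 0 would indicate agreement no better than chance.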

Circularity Check

0 steps flagged

No significant circularity; empirical model comparison on held-out data

The paper describes an empirical workflow: a segmentation heuristic for broad vs. narrow queries, feature engineering (including auto-encoders and embeddings), and training/comparison of pointwise and pairwise LETOR models on fashion data, with performance evaluated on held-out sets. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim (specialized models outperform a combined model) is a direct empirical result rather than a reduction to inputs by construction. The segmentation step is an input assumption whose validity is external to the reported metrics, but this does not create definitional or self-referential circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that query intent can be reliably partitioned into broad and narrow categories and that standard sparsity-handling techniques transfer to product and query-product features.

axioms (1)

Domain assumption: User queries can be meaningfully segmented into broad and narrow based on intent.
This assumption is central to the proposed framework and to the claim that specialized models outperform a combined one.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

[1]

Learning to Rank Broad and Narrow Queries in E-Commerce

INTRODUCTION. Users on an e-commerce platform typically discover products through search, browsing categories or marketing campaigns. On our platform, search functionality is key to product discovery as each of these channels translates to a search query in the back-end. Search ranking is a critical aspect of our business. Hence any improvement in th...

[2]

We show that segmenting queries and training different models for each can be a better approach than training a single model across the board

We provide an approach to segment our queries into broad and narrow based on how coherent the downstream sessions are. We show that segmenting queries and training different models for each can be a better approach than training a single model across the board

[3]

Apart from using typical query features, product features and query-product features, we propose a denoising autoencoder based architecture to reduce sparsity of product features and skip gram based word embeddings for query-product features

[4]

Further, we study the behaviour of various target variables - CTR, Add to cart ratio, conversion and Revenue Per Impression

We demonstrate the impact of various combinations of different types of features on the model’s performance. Further, we study the behaviour of various target variables - CTR, Add to cart ratio, conversion and Revenue Per Impression

[5]

We also show how our model significantly improves NDCG compared to the baseline model built upon style popularity

We highlight the differences between broad and narrow queries in terms of modelling approach, feature importance etc. We also show how our model significantly improves NDCG compared to the baseline model built upon style popularity. Footnote 1: The mean search engine ranking position for all the clicked products

[6]

Various LETOR models like RankNet, LambdaMART, AdaRank and RankBoost have been compared on Web Search data [3, 5–7]

RELATED WORK. LETOR methods have demonstrated their success in web search [3, 4]. Various LETOR models like RankNet, LambdaMART, AdaRank and RankBoost have been compared on Web Search data [3, 5–7]. Moreover search has been extensively studied in the e-commerce primarily from the retrieval perspective [8–12]. Karmaker et al. [2] attempted to apply LETO...

[7]

We have divided the queries into train and test in the ratio of 70:30

BROAD AND NARROW QUERIES. We randomly sampled 100k queries and labelled them as Broad/Narrow. We have divided the queries into train and test in the ratio of 70:30. Later we have trained an SVM classifier with radial-basis kernel on the train queries. We have considered multiple sets of features - Word2Vec of query, result set size and identified query attr...

[8]

Clearly, it can be attributed to unnamed queries

This corresponds to bin number 92, which further corresponds to a recall set size of 1910-2300. Clearly, it can be attributed to unnamed queries. So, a query with coherency score ≤ 0.58 is referred to as a broad query while a query with coherency score > 0.58 is referred to as a narrow query. Table 1 shows various statistics regarding both the segments. It...

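The threshold rule quoted in excerpt [8] can be sketched in a few lines. Only the 0.58 cut-off and the "at or below is broad, above is narrow" convention come from the excerpt. The function name, the example queries and their scores are hypothetical, and how the coherency score itself is computed is not shown in the excerpt and is not attempted here.

```python
# Cut-off quoted in the excerpt; everything else below is illustrative.
COHERENCY_THRESHOLD = 0.58


def segment_query(coherency_score: float) -> str:
    """Label a query 'broad' or 'narrow' from its coherency score.

    Scores at or below the threshold are broad; scores above it are narrow.
    """
    return "broad" if coherency_score <= COHERENCY_THRESHOLD else "narrow"


# Hypothetical scores, not values reported in the paper.
example_scores = {"tshirts": 0.31, "batman printed tshirt": 0.84}
labels = {query: segment_query(score) for query, score in example_scores.items()}
```

A boundary score of exactly 0.58 falls in the broad segment under this convention.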
[9]

We use SOLR as the underlying search engine with over 3M fashion products indexed

SYSTEM DESIGN. This section discusses the architecture (Figure 3) involved in retrieval and ranking of search products on our platform. We use SOLR as the underlying search engine with over 3M fashion products indexed. Figure 3: Architecture. When a user issues a query, the retrieval layer renders a set of top-K (typically 1000) products based on BM-25 score...

[10]

Footnote 3: The primary focus of this paper would be LETOR, instead

Catalogue Data: Structured information regarding each product’s physical features like brand, color, MRP etc. Footnote 3: The primary focus of this paper would be LETOR, instead

[11]

Transactional Data: Product’s output business metrics like daywise revenue, CTR etc

[12]

Query-Clickstream logs: Logs each query and information regarding query’s downstream sessions like products seen (impressions), clicked, added to cart, wishlisted, liked, purchased etc

[13]

This section focuses on various features and target variables we used for training our LETOR models

MODEL. Learning to rank is a popular approach that provides a principled way to optimize ranking of search results given various features. This section focuses on various features and target variables we used for training our LETOR models. From modelling perspective, we tried 2 pointwise models - Random Forests and Gradient Boosting Model and 2 pairwis...

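Excerpt [13] distinguishes pointwise from pairwise models. The difference shows up most clearly in the loss each family optimizes, sketched below. Squared error is used as a stand-in pointwise loss (the excerpt does not state which loss the Random Forest and Gradient Boosting models used), and the pairwise loss is the logistic form commonly associated with RankNet. Function names and scores are hypothetical.

```python
import math


def pointwise_squared_error(score, target):
    """Pointwise loss: each (query, product) row is judged against its own target."""
    return (score - target) ** 2


def pairwise_logistic_loss(score_preferred, score_other):
    """RankNet-style loss for a pair in which the first product should rank higher.

    Equals log(1 + exp(-(s_i - s_j))): small when the preferred product
    already has the larger score, large when the order is inverted.
    """
    margin = score_preferred - score_other
    return math.log1p(math.exp(-margin))


# Hypothetical model scores for two products returned for the same query.
correct_order = pairwise_logistic_loss(2.0, 0.5)
wrong_order = pairwise_logistic_loss(0.5, 2.0)
tie = pairwise_logistic_loss(1.0, 1.0)
```

The pairwise loss depends only on the score difference within a query, which is why it targets relative order rather than the absolute value of the target metric.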
[14]

Query Features: These are features specific to query like total length of query, number of words, is brand (eg Nike, Tommy Hilfiger) present in query, is article type (eg Dresses, Shoes) present in query, the identified article type, brand etc

[15]

Product Features: These are features specific to the products (documents). They can either be popularity related or physical features. The popularity features include features involving past performance of the product’s brand or article type (hereafter referred to as entity) like revenue in 15 days, quantity sold in 15 days etc. It is worth mentioning ...

[16]

batman printed tshirt

Query Product Features: The query product features are features which involve both query and product - eg CTR of a tshirt product when the query is “batman printed tshirt”. The query product features can again be of 2 different types: popularity based and relevance based. The popularity features include past performance of product’s entity as a result...

[17]

Better the ranking, higher would be the CTR: $\mathrm{CTR}_{qp} = C_{qp} / I_{qp}$ (5)

Click-Through Rate: It is the probability of clicking on the listpage (the page as a result of search query). Better the ranking, higher would be the CTR.

$$\mathrm{CTR}_{qp} = \frac{C_{qp}}{I_{qp}} \quad (5)$$

[18]

It is the perceived utility of click page

Add to cart Rate: It is the probability of adding a product to the cart post the click. It is the perceived utility of click page.

$$\mathrm{ATCR}_{qp} = \frac{B_{qp}}{C_{qp}} \quad (6)$$

[19]

This can be considered as the overall satisfaction of the user

Conversion: It is the probability of purchasing a product from listpage. This can be considered as the overall satisfaction of the user.

$$\mathrm{Conv}_{qp} = \frac{B_{qp}}{I_{qp}} \quad (7)$$

[20]

$\mathrm{RPI}_{qp} = R_{qp} / I_{qp}$ (8)

Revenue per Impression: It refers to the overall business value (revenue) from each impression as a result of the query.

$$\mathrm{RPI}_{qp} = \frac{R_{qp}}{I_{qp}} \quad (8)$$

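The four target metrics in excerpts [17] to [20] are simple ratios of logged counts, which the following sketch computes for one query-product pair. The argument names are a reading of the prose definitions (clicks, impressions, add-to-cart events, purchases, revenue) and are assumptions, since the excerpts do not define the symbols. Returning 0.0 when a denominator is zero is also a choice made here for illustration, not something the excerpts specify.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def target_metrics(impressions, clicks, add_to_carts, purchases, revenue):
    """Compute the four target metrics for one (query, product) pair."""
    return {
        # Equation (5): clicks per impression.
        "ctr": safe_ratio(clicks, impressions),
        # Equation (6): add-to-cart events per click (assumed numerator).
        "atcr": safe_ratio(add_to_carts, clicks),
        # Equation (7): purchases per impression (assumed numerator).
        "conversion": safe_ratio(purchases, impressions),
        # Equation (8): revenue per impression.
        "rpi": safe_ratio(revenue, impressions),
    }


# Hypothetical counts for a single query-product pair.
metrics = target_metrics(impressions=1000, clicks=50, add_to_carts=10,
                         purchases=4, revenue=200.0)
```

Because ATCR is normalized by clicks while the other three are normalized by impressions, it can be much noisier for pairs with few clicks.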
[21]

We have classified them into broad and narrow using the model described in Section 3

RESULTS. 6.1 Dataset. We randomly sampled 100k queries. We have classified them into broad and narrow using the model described in Section 3. This resulted in 13k broad queries and 87k narrow queries. We sampled queries in 80-20 proportion in stratified manner to collate train (broad and narrow) and test (broad and narrow) queries. 6.1.1 Train and Test Data W...

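The stratified 80-20 split described in excerpt [21] can be sketched as follows. The idea is to shuffle and cut each segment separately so that train and test keep roughly the same broad-to-narrow mix. The function name, the fixed seed and the toy query names are hypothetical; the toy data only mirrors the 13:87 ratio quoted in the excerpt at a smaller scale.

```python
import random
from collections import defaultdict


def stratified_split(labelled_queries, train_fraction=0.8, seed=0):
    """Split (query, segment) pairs so each segment keeps the same train share."""
    by_segment = defaultdict(list)
    for query, segment in labelled_queries:
        by_segment[segment].append(query)

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    train, test = [], []
    for segment, queries in sorted(by_segment.items()):
        shuffled = queries[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(q, segment) for q in shuffled[:cut]]
        test += [(q, segment) for q in shuffled[cut:]]
    return train, test


# Toy data: 13 broad and 87 narrow queries.
toy = ([(f"broad-{i}", "broad") for i in range(13)]
       + [(f"narrow-{i}", "narrow") for i in range(87)])
train, test = stratified_split(toy)
```

Splitting within each segment avoids a test set that happens to contain very few broad queries, which matters when the broad segment is the minority.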
[22]

Let $i_p$ be the predicted rank position for each product and $i_i$ be the ideal rank position

products for the query and compute relevance scores for each query-product as above. Let $i_p$ be the predicted rank position for each product and $i_i$ be the ideal rank position. Now for each query $q$, compute

$$\mathrm{DCG} = \sum_{p} \frac{2^{\mathrm{rel}_{qp}} - 1}{\log_2(i_p + 1)} \quad (10)$$

$$\mathrm{IDCG} = \sum_{p} \frac{2^{\mathrm{rel}_{qp}} - 1}{\log_2(i_i + 1)} \quad (11)$$

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}} \quad (12)$$

6.2 Cross Target Learning. In e-commerce, choosing one targ...

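Equations (10) to (12) in excerpt [22] translate directly into code. The sketch below computes NDCG for a single query, assuming rank positions start at 1 and that the model's ordering covers exactly the products that have relevance scores. The function names, product ids and relevance values are hypothetical.

```python
import math


def dcg(relevances_in_rank_order):
    """DCG for relevance scores listed from rank position 1 downwards."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 1)
        for position, rel in enumerate(relevances_in_rank_order, start=1)
    )


def ndcg(relevance_by_product, predicted_order):
    """NDCG for one query.

    relevance_by_product: dict mapping product id to relevance score.
    predicted_order: product ids sorted by the model, best first.
    """
    predicted = [relevance_by_product[p] for p in predicted_order]
    # The ideal ranking places the most relevant products first.
    ideal = sorted(relevance_by_product.values(), reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(predicted) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance scores for three products of one query.
relevance = {"p1": 3, "p2": 1, "p3": 0}
perfect = ndcg(relevance, ["p1", "p2", "p3"])
reversed_order = ndcg(relevance, ["p3", "p2", "p1"])
```

Averaging this per-query value over a test set gives a single number that can be compared between a combined model and the segment-specific models.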
[23]

We proposed a notion of coherency score and used it to segment queries into broad and narrow

CONCLUSIONS. We presented a framework for building LETOR models for an e-commerce platform - specifically for the unnamed queries. We proposed a notion of coherency score and used it to segment queries into broad and narrow. We discussed the challenges involved in feature representation (query, product and query-product) and target metrics (ctr,atcr,conv...

[24]

Did We Get It Right? Predicting Query Performance in E-commerce Search

R. Kumar, M. Kumar, N. Shah, and C. Faloutsos, “Did we get it right? predicting query performance in e-commerce search,” arXiv preprint arXiv:1808.00239, 2018

[25]

On application of learning to rank for e-commerce search

S. K. Karmaker Santu, P. Sondhi, and C. Zhai, “On application of learning to rank for e-commerce search,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017, pp. 475–484

[26]

Yahoo! learning to rank challenge overview

O. Chapelle and Y. Chang, “Yahoo! learning to rank challenge overview,” in Proceedings of the Learning to Rank Challenge, 2011, pp. 1–24

[27]

Advances in formal models of search and search behaviour

L. Azzopardi and G. Zuccon, “Advances in formal models of search and search behaviour,” in Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval. ACM, 2016, pp. 1–4

[28]

Learning to rank for information retrieval and natural language processing

H. Li, “Learning to rank for information retrieval and natural language processing,” Synthesis Lectures on Human Language Technologies, vol. 7, no. 3, pp. 1–121, 2014

[29]

Learning to rank for information retrieval

T.-Y. Liu et al., “Learning to rank for information retrieval,” Foundations and Trends® in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009

[30]

From ranknet to lambdarank to lambdamart: An overview

C. J. Burges, “From ranknet to lambdarank to lambdamart: An overview.”

[31]

Diversifying search results

R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, “Diversifying search results,” in Proceedings of the second ACM international conference on web search and data mining. ACM, 2009, pp. 5–14

[32]

Towards a theory model for product search

B. Li, A. Ghose, and P. G. Ipeirotis, “Towards a theory model for product search,” in Proceedings of the 20th international conference on World wide web. ACM, 2011, pp. 327–336

[33]

Enhancing product search by best-selling prediction in e-commerce

B. Long, J. Bian, A. Dong, and Y. Chang, “Enhancing product search by best-selling prediction in e-commerce,” in Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012, pp. 2479–2482

[34]

Learning latent vector spaces for product search

C. Van Gysel, M. de Rijke, and E. Kanoulas, “Learning latent vector spaces for product search,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 165–174

[35]

Latent dirichlet allocation based diversified retrieval for e-commerce search

J. Yu, S. Mohan, D. P. Putthividhya, and W.-K. Wong, “Latent dirichlet allocation based diversified retrieval for e-commerce search,” in Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 2014, pp. 463–472

[36]

Narrow or broad?: Estimating subjective specificity in exploratory search

K. Athukorala, A. Oulasvirta, D. Głowacka, J. Vreeken, and G. Jacucci, “Narrow or broad?: Estimating subjective specificity in exploratory search,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 2014, pp. 819–828

[37]

Query ambiguity identification based on user behavior information

C. Luo, Y. Liu, M. Zhang, and S. Ma, “Query ambiguity identification based on user behavior information,” in Asia Information Retrieval Symposium. Springer, 2014, pp. 36–47

[38]

Decoding fashion contexts using word embeddings

S. Arora and D. Warrier, “Decoding fashion contexts using word embeddings.”

[39]

Wordnet: a lexical database for english

G. A. Miller, “Wordnet: a lexical database for english,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995

[1] [1]

Learning to Rank Broad and Narrow Queries in E-Commerce

INTRODUCTION Users on an e-commerce platform typically discover prod- ucts through search, browsing categories or marketing cam- paigns. On our platform, search functionality is key to prod- uct discovery as each of these channels translates to a search query in the back-end. Search ranking is a critical aspect of our business. Hence any improvement in th...

work page internal anchor Pith review Pith/arXiv arXiv 1907

[2] [2]

We show that segmenting queries and training diﬀerent models for each can be a better ap- proach than training single model across the board

We provide an approach to segment our queries into broad and narrow basis how coherent the downstream sessions are. We show that segmenting queries and training diﬀerent models for each can be a better ap- proach than training single model across the board

work page

[3] [3]

Apart from using typical query features, product fea- tures and query-product features, we propose a denois- ing autoencoder based architecture to reduce sparsity of product features and skip gram based word embed- dings for query-product features

work page

[4] [4]

Further, we study the behaviour of various target variables - CTR, Add to cart ratio, conversion and Revenue Per Impression

We demonstrate the impact of various combinations of diﬀerent types of features on the model’s perfor- mance. Further, we study the behaviour of various target variables - CTR, Add to cart ratio, conversion and Revenue Per Impression

work page

[5] [5]

We also show how our model signiﬁcantly improves NDCG compared to the baseline model built upon style popularity

We highlight the diﬀerences between broad and narrow queries in terms of modelling approach, feature impor- tance etc. We also show how our model signiﬁcantly improves NDCG compared to the baseline model built upon style popularity. 1The mean search engine ranking position for all the clicked products

work page

[6] [6]

Various LETOR models like RankNet, Lam- daMart, AdaRank and RankBoost have been compared on Web Search data [3, 5–7]

RELA TED WORK LETOR methods have demonstrated their success in web search [3, 4]. Various LETOR models like RankNet, Lam- daMart, AdaRank and RankBoost have been compared on Web Search data [3, 5–7]. Moreover search has been exten- sively studied in the e-commerce primarily from the retrieval perspective [8–12]. Karmaker et al. [2] attempted to apply LETO...

work page

[7] [7]

We have divided the queries into train and test in the ratio of 70:30

BROAD AND NARROW QUERIES We randomly sampled 100k queries and labelled them as Broad/Narrow. We have divided the queries into train and test in the ratio of 70:30. Later we have trained a SVM clas- siﬁer with radial-basis kernel on the train queries. We have considered multiple sets of features - Word2Vec of query, result set size and identiﬁed query attr...

work page

[8] [8]

Clearly, it can be attributed to unnamed queries

This corresponds to bin number 92; which further corre- sponds to a recall set size of 1910-2300. Clearly, it can be attributed to unnamed queries. So, a query with coherency score≤ 0.58 is referred to as broad query while query with coherency score > 0.58 is referred to as narrow query. The table 1 shows various statistics regarding both the segments. It...

work page 1910

[9] [9]

We use SOLR as the underlying search engine with over 3M fashion products indexed

SYSTEM DESIGN This section discusses the architecture (ﬁgure 3) involved in retrieval and ranking of search products on our platform. We use SOLR as the underlying search engine with over 3M fashion products indexed. Figure 3: Architecture When a user issues a query, the retrieval layer renders a set of top-K (typically 1000) products based on BM-25 score...

work page

[10] [10]

3The primary focus of this paper would be LETOR, instead

Catalogue Data : Structured information regarding each product’s physical features like brand, color, mrp etc. 3The primary focus of this paper would be LETOR, instead

work page

[11] [11]

Transactional Data: Product’s output business met- rics like daywise revenue, CTR etc

work page

[12] [12]

Query-Clickstream logs : Logs each query and infor- mation regarding query’s downstream sessions like prod- ucts seen (impressions), clicked, added to cart, wish- listed, liked, purchased etc

work page

[13]

MODEL Learning to rank is a popular approach that provides a principled way to optimize the ranking of search results given various features. This section focuses on the features and target variables we used for training our LETOR models. From a modelling perspective, we tried 2 pointwise models (Random Forests and Gradient Boosting Model) and 2 pairwis...
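The difference between the two model families shows up in how training examples are formed. A pointwise model fits each query-product pair against its own target value, whereas a pairwise model learns from ordered pairs of products under the same query. The sketch below shows one common way such pairs are built from graded labels; the function name and the input layout are hypothetical, and nothing here is specific to the particular pairwise models used in the paper.

```python
from itertools import combinations


def pairwise_preferences(labels):
    """Build (preferred, other) product pairs for ONE query.

    `labels` maps product id -> graded relevance (an assumed layout).
    Pairs with equal relevance carry no preference and are skipped.
    """
    pairs = []
    for (p1, r1), (p2, r2) in combinations(sorted(labels.items()), 2):
        if r1 > r2:
            pairs.append((p1, p2))
        elif r2 > r1:
            pairs.append((p2, p1))
    return pairs
```

Pairs are only formed within a query, since relevance labels are not comparable across queries.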

[14]

Query Features: These are features specific to the query, like total length of the query, number of words, whether a brand (e.g. Nike, Tommy Hilfiger) is present in the query, whether an article type (e.g. Dresses, Shoes) is present in the query, the identified article type, brand etc.
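The query features listed above could be extracted along the following lines. This is a minimal sketch: the two lexicons are tiny hypothetical stand-ins built from the examples in the text, and the exact-phrase matching is an assumption rather than the attribute identification method used on the platform.

```python
KNOWN_BRANDS = {"nike", "tommy hilfiger"}    # hypothetical lexicon
KNOWN_ARTICLE_TYPES = {"dresses", "shoes"}   # hypothetical lexicon


def _find(query, lexicon):
    """Return the first lexicon entry that occurs as a whole phrase, else None."""
    padded = f" {query} "
    for entry in sorted(lexicon):
        if f" {entry} " in padded:
            return entry
    return None


def query_features(query):
    """Compute simple query-level features for a raw query string."""
    q = " ".join(query.lower().split())  # lowercase and collapse whitespace
    brand = _find(q, KNOWN_BRANDS)
    article_type = _find(q, KNOWN_ARTICLE_TYPES)
    return {
        "length": len(q),
        "num_words": len(q.split()),
        "has_brand": brand is not None,
        "has_article_type": article_type is not None,
        "brand": brand,
        "article_type": article_type,
    }
```

For a query such as "Tommy Hilfiger shoes", both the brand flag and the article-type flag would be set, while a query such as "red kurta" would set neither under these toy lexicons.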

[15]

Product Features: These are features specific to the products (documents). They can be either popularity-related or physical features. The popularity features include features involving past performance of the product’s brand or article type (hereafter referred to as entity), like revenue in 15 days, quantity sold in 15 days etc. It is worth mentioning ...

[16]

Query Product Features: The query-product features are features which involve both query and product, e.g. the CTR of a tshirt product when the query is “batman printed tshirt”. The query-product features can again be of 2 different types: popularity based and relevance based. The popularity features include past performance of the product’s entity as a result...

[17]

Click-Through Rate: It is the probability of clicking on the list page (the page returned as a result of a search query). The better the ranking, the higher the CTR.

\[ \mathrm{CTR}_{qp} = \frac{C_{qp}}{I_{qp}} \tag{5} \]

[18]

Add to Cart Rate: It is the probability of adding a product to the cart after the click. It reflects the perceived utility of the clicked page.

\[ \mathrm{ATCR}_{qp} = \frac{B_{qp}}{C_{qp}} \tag{6} \]

[19]

Conversion: It is the probability of purchasing a product from the list page. This can be considered as the overall satisfaction of the user.

\[ \mathrm{Conv}_{qp} = \frac{B_{qp}}{I_{qp}} \tag{7} \]

[20]

Revenue per Impression: It refers to the overall business value (revenue) from each impression as a result of the query.

\[ \mathrm{RPI}_{qp} = \frac{R_{qp}}{I_{qp}} \tag{8} \]
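The four target variables in equations (5) to (8) are simple ratios of per query-product counts. The sketch below computes them together. Two assumptions are worth stating: the source uses the symbol B in both (6) and (7), and the code follows the prose definitions by taking add-to-cart events for (6) and purchases for (7); and a zero denominator is mapped to 0.0, which is a convention chosen here rather than something the text specifies.

```python
def target_metrics(impressions, clicks, carts, purchases, revenue):
    """Compute CTR, ATCR, Conv and RPI for one query-product pair.

    Argument names are illustrative. A zero denominator yields 0.0,
    which is an assumed convention.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "ctr": ratio(clicks, impressions),      # equation (5)
        "atcr": ratio(carts, clicks),           # equation (6), add-to-cart count assumed
        "conv": ratio(purchases, impressions),  # equation (7), purchase count assumed
        "rpi": ratio(revenue, impressions),     # equation (8)
    }
```

Note that ATCR is conditioned on clicks, while the other three are normalised by impressions, so a product with few clicks can have a noisy ATCR even when its CTR is well estimated.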

[21]

RESULTS 6.1 Dataset We randomly sampled 100k queries. We classified them into broad and narrow using the model described in Section 3. This resulted in 13k broad queries and 87k narrow queries. We sampled queries in an 80-20 proportion in a stratified manner to collate train (broad and narrow) and test (broad and narrow) queries. 6.1.1 Train and Test Data W...
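A stratified 80-20 split keeps the broad-to-narrow proportion the same in the train and test sets. The following is a minimal sketch using only the standard library; the function name, the dictionary layout and the fixed seed are assumptions made for illustration.

```python
import random


def stratified_split(queries_by_segment, train_frac=0.8, seed=0):
    """Split each segment's queries into train and test independently.

    `queries_by_segment` maps a segment name (e.g. 'broad', 'narrow')
    to a list of queries. Splitting per segment preserves the segment mix.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for segment, queries in queries_by_segment.items():
        shuffled = list(queries)
        rng.shuffle(shuffled)
        cut = int(round(train_frac * len(shuffled)))
        train[segment] = shuffled[:cut]
        test[segment] = shuffled[cut:]
    return train, test
```

With 13k broad and 87k narrow queries, a split of this kind would give roughly 10.4k/2.6k broad and 69.6k/17.4k narrow queries in train/test.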

[22]

products for the query and compute relevance scores for each query-product as above. Let $i_p$ be the predicted rank position for each product and $i_i$ be the ideal rank position. Now for each query $q$, compute

\[ \mathrm{DCG} = \sum_{p} \frac{2^{rel_{qp}} - 1}{\log_2(i_p + 1)} \tag{10} \]

\[ \mathrm{IDCG} = \sum_{p} \frac{2^{rel_{qp}} - 1}{\log_2(i_i + 1)} \tag{11} \]

\[ \mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}} \tag{12} \]

6.2 Cross Target Learning In e-commerce, choosing one targ...
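Equations (10) to (12) can be checked with a short per-query routine. This sketch assumes the relevance values and the model scores arrive as two equal-length lists for the products of one query; the function name and the choice to return 0.0 when no product has positive relevance are illustrative assumptions.

```python
import math


def ndcg(relevance, predicted_scores):
    """NDCG for one query, following equations (10) to (12).

    relevance[p] is rel_qp for product p; predicted_scores[p] is the
    model score used to derive the predicted rank position i_p.
    """
    def dcg(order):
        # `order` lists product indices from rank 1 downwards.
        return sum((2 ** relevance[p] - 1) / math.log2(rank + 1)
                   for rank, p in enumerate(order, start=1))

    n = len(relevance)
    predicted_order = sorted(range(n), key=lambda p: -predicted_scores[p])
    ideal_order = sorted(range(n), key=lambda p: -relevance[p])
    ideal = dcg(ideal_order)
    return dcg(predicted_order) / ideal if ideal > 0 else 0.0
```

A ranking that orders products exactly by relevance scores 1.0, and any misordering of products with different relevance lowers the value.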

[23]

CONCLUSIONS We presented a framework for building LETOR models for an e-commerce platform, specifically for the unnamed queries. We proposed a notion of coherency score and used it to segment queries into broad and narrow. We discussed the challenges involved in feature representation (query, product and query-product) and target metrics (ctr, atcr, conv...

[24]

R. Kumar, M. Kumar, N. Shah, and C. Faloutsos, “Did we get it right? Predicting query performance in e-commerce search,” arXiv preprint arXiv:1808.00239, 2018

[25]

S. K. Karmaker Santu, P. Sondhi, and C. Zhai, “On application of learning to rank for e-commerce search,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017, pp. 475–484

[26]

O. Chapelle and Y. Chang, “Yahoo! learning to rank challenge overview,” in Proceedings of the Learning to Rank Challenge, 2011, pp. 1–24

[27]

L. Azzopardi and G. Zuccon, “Advances in formal models of search and search behaviour,” in Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval. ACM, 2016, pp. 1–4

[28]

H. Li, “Learning to rank for information retrieval and natural language processing,” Synthesis Lectures on Human Language Technologies, vol. 7, no. 3, pp. 1–121, 2014

[29]

T.-Y. Liu et al., “Learning to rank for information retrieval,” Foundations and Trends® in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009

[30]

C. J. Burges, “From ranknet to lambdarank to lambdamart: An overview.”

[31]

R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, “Diversifying search results,” in Proceedings of the second ACM international conference on web search and data mining. ACM, 2009, pp. 5–14

[32]

B. Li, A. Ghose, and P. G. Ipeirotis, “Towards a theory model for product search,” in Proceedings of the 20th international conference on World wide web. ACM, 2011, pp. 327–336

[33]

B. Long, J. Bian, A. Dong, and Y. Chang, “Enhancing product search by best-selling prediction in e-commerce,” in Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012, pp. 2479–2482

[34]

C. Van Gysel, M. de Rijke, and E. Kanoulas, “Learning latent vector spaces for product search,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 165–174

[35]

J. Yu, S. Mohan, D. P. Putthividhya, and W.-K. Wong, “Latent dirichlet allocation based diversified retrieval for e-commerce search,” in Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 2014, pp. 463–472

[36]

K. Athukorala, A. Oulasvirta, D. Głowacka, J. Vreeken, and G. Jacucci, “Narrow or broad?: Estimating subjective specificity in exploratory search,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 2014, pp. 819–828

[37]

C. Luo, Y. Liu, M. Zhang, and S. Ma, “Query ambiguity identification based on user behavior information,” in Asia Information Retrieval Symposium. Springer, 2014, pp. 36–47

[38]

S. Arora and D. Warrier, “Decoding fashion contexts using word embeddings.”

[39]

G. A. Miller, “Wordnet: a lexical database for english,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995