Learning to Rank Broad and Narrow Queries in E-Commerce
Pith reviewed 2026-05-25 11:23 UTC · model grok-4.3
The pith
Specialized models for broad and narrow queries outperform a combined model in fashion search.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that, on fashion-category data, distinct pointwise and pairwise LETOR models trained on broad queries alone and on narrow queries alone outperform a single combined model trained on all queries. Query segmentation is performed by analyzing user intent, features are drawn from query, product, and query-product sources, and sparsity is mitigated with a denoising auto-encoder for product features plus skip-gram embeddings for query-product matching. Multiple target metrics are compared for robustness.
What carries the argument
A query-segmentation mechanism that divides queries into broad versus narrow categories on the basis of user intent, used to train separate pointwise and pairwise learning-to-rank models.
If this is right
- Feature importance patterns differ between broad-query and narrow-query models.
- Target metrics can be evaluated for stability when ranking is split by query type.
- Sparsity-handling techniques enable the use of otherwise unusable product and query features.
- Pointwise and pairwise training both benefit from the segmentation step.
Where Pith is reading between the lines
- The segmentation step could be applied to non-fashion verticals if intent signals remain consistent.
- Real-time query classification would be required for the specialized models to be deployed at scale.
- Conversion or revenue metrics might improve if the ranking objective is aligned with the same broad-narrow split.
Load-bearing premise
The proposed way of dividing queries into broad versus narrow categories correctly reflects user intent and remains stable across product categories and time periods.
What would settle it
Train a single combined model on the full fashion query set and show that its ranking quality on a held-out test set equals or exceeds the quality of the two specialized models.
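One way to run that comparison with an uncertainty estimate is a paired bootstrap over per-query scores. The following is a minimal sketch, not something taken from the paper: the function name `paired_bootstrap_ci`, its parameters, and the choice of a percentile bootstrap are all assumptions made for illustration, and the inputs are assumed to be one metric value (for example NDCG) per held-out query for each system.

```python
import random


def paired_bootstrap_ci(specialized, combined, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean per-query score difference.

    specialized[i] and combined[i] are the scores of the two systems on the
    same held-out query i (hypothetical inputs, e.g. per-query NDCG).
    Returns (mean_difference, lower_bound, upper_bound).
    """
    if len(specialized) != len(combined) or not specialized:
        raise ValueError("need two equal-length, non-empty score lists")

    # Positive difference means the specialized models scored higher.
    diffs = [s - c for s, c in zip(specialized, combined)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Resample queries with replacement and record the mean difference.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()

    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lower, upper
```

If the resulting interval for the difference includes zero or lies below it, the combined model equals or exceeds the specialized ones at that confidence level, which is the outcome that would undercut the headline claim.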
Original abstract
Search is a prominent channel for discovering products on an e-commerce platform. Ranking products retrieved from search becomes crucial to address customers' needs and optimize for business metrics. While Learning to Rank (LETOR) models have been extensively studied and have demonstrated efficacy in the context of web search, they remain a relatively new research area in e-commerce. In this paper, we present a framework for building a LETOR model for an e-commerce platform. We analyze user queries and propose a mechanism to segment queries into broad and narrow based on the user's intent. We discuss different types of features (query, product and query-product) and the challenges in using them. We show that sparsity in product features can be tackled through a denoising auto-encoder, while skip-gram based word embeddings help solve the query-product sparsity issues. We also present various target metrics that can be employed for evaluating search results and compare their robustness. Further, we build and compare the performance of both pointwise and pairwise LETOR models on a fashion-category data set. We also build and compare distinct models for broad and narrow queries, analyze feature importance across these, and show that these specialized models perform better than a combined model in the fashion world.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a LETOR framework for e-commerce search ranking. It proposes a mechanism to segment queries into broad versus narrow based on user intent, discusses query/product/query-product features and sparsity mitigation via denoising auto-encoders and skip-gram embeddings, compares target metrics, and evaluates pointwise and pairwise models on fashion data. The central empirical claim is that distinct models trained on broad and narrow queries outperform a single combined model.
Significance. If the segmentation is shown to be valid and the gains are robust, the result would offer a practical way to improve ranking quality in e-commerce by tailoring models to query breadth, with direct business relevance for fashion verticals. The sparsity-handling techniques are standard but appropriately applied; the comparison of evaluation metrics is a secondary contribution.
major comments (2)
- [Query segmentation mechanism (described in abstract and methods)] The headline result (specialized broad/narrow models outperform the combined model) is load-bearing on the claim that the proposed segmentation accurately captures user intent. The manuscript describes a mechanism but supplies no quantitative validation such as agreement with human labels, temporal stability, or cross-vertical consistency; without this, any reported lift could be an artifact of the partition rule rather than genuine intent differences.
- [Experimental results and evaluation (abstract and results sections)] The abstract states that comparative experiments were performed on fashion data and that specialized models perform better, yet reports no performance numbers, dataset sizes, error bars, or statistical tests. This absence prevents verification of effect sizes or robustness to metric selection and post-hoc choices.
minor comments (1)
- [Abstract] The abstract would benefit from a one-sentence description of the segmentation heuristic and the magnitude of the observed improvements.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation and claims. We respond to each major comment below and commit to revisions where appropriate.
point-by-point responses
-
Referee: [Query segmentation mechanism (described in abstract and methods)] The headline result (specialized broad/narrow models outperform the combined model) is load-bearing on the claim that the proposed segmentation accurately captures user intent. The manuscript describes a mechanism but supplies no quantitative validation such as agreement with human labels, temporal stability, or cross-vertical consistency; without this, any reported lift could be an artifact of the partition rule rather than genuine intent differences.
Authors: We agree that the segmentation's validity is central to the interpretation of results. The manuscript describes a rule-based mechanism using query characteristics to approximate intent differences, and the observed performance gains provide supporting evidence. However, we acknowledge the absence of direct quantitative validation. In the revised manuscript we will add an analysis of agreement between the segmentation and human labels on a sampled set of queries, along with checks for temporal stability. revision: yes
-
Referee: [Experimental results and evaluation (abstract and results sections)] The abstract states that comparative experiments were performed on fashion data and that specialized models perform better, yet reports no performance numbers, dataset sizes, error bars, or statistical tests. This absence prevents verification of effect sizes or robustness to metric selection and post-hoc choices.
Authors: We agree that the experimental reporting requires more detail for verifiability. While the results section contains model comparisons, specific numerical values, dataset sizes, error bars, and statistical tests are not presented with sufficient prominence. In the revision we will incorporate concrete performance numbers, dataset statistics, standard errors, and significance tests into both the abstract and results sections. revision: yes
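The agreement analysis promised in the first response could be reported with a chance-corrected statistic such as Cohen's kappa. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name `cohens_kappa` and the label strings are hypothetical, and it assumes one segmentation label and one human label per sampled query.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences over the same queries.

    labels_a: labels from the segmentation rule (hypothetical input).
    labels_b: labels from a human annotator (hypothetical input).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: fraction of queries given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Degenerate case: both raters used one identical label throughout.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["broad", "broad", "narrow", "narrow"], ["broad", "narrow", "narrow", "narrow"])` has observed agreement 0.75 and chance agreement 0.5, giving a kappa of 0.5.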
Circularity Check
No significant circularity; empirical model comparison on held-out data
full rationale
The paper describes an empirical workflow: a segmentation heuristic for broad vs. narrow queries, feature engineering (including auto-encoders and embeddings), and training/comparison of pointwise and pairwise LETOR models on fashion data, with performance evaluated on held-out sets. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim (specialized models outperform a combined model) is a direct empirical result rather than a reduction to inputs by construction. The segmentation step is an input assumption whose validity is external to the reported metrics, but this does not create definitional or self-referential circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption User queries can be meaningfully segmented into broad and narrow based on intent
Reference graph
Works this paper leans on
-
[1]
Learning to Rank Broad and Narrow Queries in E-Commerce
INTRODUCTION Users on an e-commerce platform typically discover products through search, browsing categories or marketing campaigns. On our platform, search functionality is key to product discovery as each of these channels translates to a search query in the back-end. Search ranking is a critical aspect of our business. Hence any improvement in th...
-
[2]
We provide an approach to segment our queries into broad and narrow based on how coherent the downstream sessions are. We show that segmenting queries and training different models for each can be a better approach than training a single model across the board
-
[3]
Apart from using typical query features, product features and query-product features, we propose a denoising autoencoder based architecture to reduce sparsity of product features and skip-gram based word embeddings for query-product features
-
[4]
We demonstrate the impact of various combinations of different types of features on the model’s performance. Further, we study the behaviour of various target variables - CTR, Add to cart ratio, conversion and Revenue Per Impression
-
[5]
We highlight the differences between broad and narrow queries in terms of modelling approach, feature importance etc. We also show how our model significantly improves NDCG compared to the baseline model built upon style popularity. Footnote 1: The mean search engine ranking position for all the clicked products
-
[6]
RELATED WORK LETOR methods have demonstrated their success in web search [3, 4]. Various LETOR models like RankNet, LambdaMart, AdaRank and RankBoost have been compared on Web Search data [3, 5–7]. Moreover search has been extensively studied in the e-commerce primarily from the retrieval perspective [8–12]. Karmaker et al. [2] attempted to apply LETO...
-
[7]
BROAD AND NARROW QUERIES We randomly sampled 100k queries and labelled them as Broad/Narrow. We have divided the queries into train and test in the ratio of 70:30. Later we have trained a SVM classifier with radial-basis kernel on the train queries. We have considered multiple sets of features - Word2Vec of query, result set size and identified query attr...
-
[8]
This corresponds to bin number 92; which further corresponds to a recall set size of 1910-2300. Clearly, it can be attributed to unnamed queries. So, a query with coherency score ≤ 0.58 is referred to as broad query while query with coherency score > 0.58 is referred to as narrow query. The table 1 shows various statistics regarding both the segments. It...
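The threshold rule quoted in this excerpt can be sketched in a few lines. This is only an illustration: the 0.58 cut-off comes from the excerpt, but how the coherency score itself is computed is not shown here, so the scores are treated as given inputs, and the names `segment_query` and `split_queries` are hypothetical.

```python
# Cut-off quoted in the excerpt: score <= 0.58 is broad, score > 0.58 is narrow.
COHERENCY_THRESHOLD = 0.58


def segment_query(coherency_score, threshold=COHERENCY_THRESHOLD):
    """Label one query as 'broad' or 'narrow' from its coherency score."""
    return "broad" if coherency_score <= threshold else "narrow"


def split_queries(scored_queries, threshold=COHERENCY_THRESHOLD):
    """Partition (query, coherency_score) pairs into the two segments.

    scored_queries is a hypothetical input; the scores are assumed to have
    been computed upstream by whatever procedure the paper defines.
    """
    segments = {"broad": [], "narrow": []}
    for query, score in scored_queries:
        segments[segment_query(score, threshold)].append(query)
    return segments
```

Keeping the threshold as a parameter makes it straightforward to check how sensitive the reported lift is to the exact cut-off, which bears on the referee's concern about the partition rule.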
-
[9]
SYSTEM DESIGN This section discusses the architecture (figure 3) involved in retrieval and ranking of search products on our platform. We use SOLR as the underlying search engine with over 3M fashion products indexed. [Figure 3: Architecture] When a user issues a query, the retrieval layer renders a set of top-K (typically 1000) products based on BM-25 score...
-
[10]
Catalogue Data: Structured information regarding each product’s physical features like brand, color, mrp etc. Footnote 3: The primary focus of this paper would be LETOR, instead
-
[11]
Transactional Data: Product’s output business metrics like daywise revenue, CTR etc
-
[12]
Query-Clickstream logs: Logs each query and information regarding query’s downstream sessions like products seen (impressions), clicked, added to cart, wishlisted, liked, purchased etc
-
[13]
MODEL Learning to rank is a popular approach that provides a principled way to optimize ranking of search results given various features. This section focuses on various features and target variables we used for training our LETOR models. From modelling perspective, we tried 2 pointwise models - Random Forests and Gradient Boosting Model and 2 pairwis...
-
[14]
Query Features: These are features specific to query like total length of query, number of words, is brand (eg Nike, Tommy Hilfiger) present in query, is article type (eg Dresses, Shoes) present in query, the identified article type, brand etc
-
[15]
Product Features: These are features specific to the products (documents). They can either be popularity related or physical features. The popularity features include features involving past performance of the product’s brand or article type (hereafter referred to as entity) like revenue in 15 days, quantity sold in 15 days etc. It is worth mentioning ...
-
[16]
Query Product Features: The query product features are features which involve both query and product - eg ctr of a tshirt product when the query is “batman printed tshirt”. The query product features can again be of 2 different types: popularity based and relevance based. The popularity features include past performance of product’s entity as a result...
-
[17]
Click-Through Rate: It is the probability of clicking on the listpage (the page as a result of search query). Better the ranking, higher would be the CTR. $\mathrm{CTR}_{qp} = \frac{C_{qp}}{I_{qp}}$ (5)
-
[18]
Add to cart Rate: It is the probability of adding a product to the cart post the click. It is the perceived utility of click page. $\mathrm{ATCR}_{qp} = \frac{B_{qp}}{C_{qp}}$ (6)
-
[19]
Conversion: It is the probability of purchasing a product from listpage. This can be considered as the overall satisfaction of the user. $\mathrm{Conv}_{qp} = \frac{B_{qp}}{I_{qp}}$ (7)
-
[20]
Revenue per Impression: It refers to the overall business value (revenue) from each impression as a result of the query. $\mathrm{RPI}_{qp} = \frac{R_{qp}}{I_{qp}}$ (8)
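The four target metrics in Eqs. (5) to (8) are simple ratios of per query-product counts, which the sketch below spells out. Several things here are assumptions rather than content from the paper: the class and field names are hypothetical; the excerpt writes $B_{qp}$ in both Eq. (6) and Eq. (7), and the sketch assumes these stand for cart additions and purchases respectively, as the surrounding prose suggests; and returning 0.0 for an empty denominator is a convention chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryProductCounts:
    """Aggregated log counts for one (query, product) pair (hypothetical schema)."""
    impressions: int   # I_qp
    clicks: int        # C_qp
    cart_adds: int     # assumed numerator of Eq. (6)
    purchases: int     # assumed numerator of Eq. (7)
    revenue: float     # R_qp


def _ratio(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def target_metrics(counts):
    """Compute CTR, ATCR, conversion and RPI for one query-product pair."""
    return {
        "ctr": _ratio(counts.clicks, counts.impressions),            # Eq. (5)
        "atcr": _ratio(counts.cart_adds, counts.clicks),             # Eq. (6)
        "conversion": _ratio(counts.purchases, counts.impressions),  # Eq. (7)
        "rpi": _ratio(counts.revenue, counts.impressions),           # Eq. (8)
    }
```

Note that ATCR is conditioned on clicks while the other three are per impression, so a pair with few clicks yields a much noisier ATCR than CTR; this is one reason the robustness of the targets can differ.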
-
[21]
RESULTS 6.1 Dataset We randomly sampled 100k queries. We have classified them into broad and narrow using the model described in Section 3. This resulted in 13k broad queries and 87k narrow queries. We sampled queries in 80-20 proportion in stratified manner to collate train (broad and narrow) and test (broad and narrow) queries. 6.1.1 Train and Test Data W...
-
[22]
products for the query and compute relevance scores for each query-product as above. Let $i_p$ be the predicted rank position for each product and $i_i$ be the ideal rank position. Now for each query $q$, compute $\mathrm{DCG} = \sum_{p} \frac{2^{rel_{qp}} - 1}{\log_2(i_p + 1)}$ (10), $\mathrm{IDCG} = \sum_{p} \frac{2^{rel_{qp}} - 1}{\log_2(i_i + 1)}$ (11), $\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$ (12). 6.2 Cross Target Learning In e-commerce, choosing one targ...
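Eqs. (10) to (12) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the relevance scores are treated as given (the excerpt's definition of them is truncated), the ideal rank positions are obtained by sorting relevances in descending order, and returning 0.0 when IDCG is zero is an assumed convention.

```python
import math


def dcg(relevances_in_rank_order):
    """DCG as in Eq. (10): sum of (2^rel - 1) / log2(position + 1).

    Positions are 1-based; the list is ordered by rank, best-ranked first.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 1)
        for position, rel in enumerate(relevances_in_rank_order, start=1)
    )


def ndcg(relevances_in_predicted_order):
    """NDCG as in Eq. (12) for one query.

    The ideal ordering for IDCG (Eq. 11) is the same relevances sorted
    from highest to lowest.
    """
    ideal_order = sorted(relevances_in_predicted_order, reverse=True)
    idcg = dcg(ideal_order)
    # Assumption: a query with no relevant products scores 0.0.
    return dcg(relevances_in_predicted_order) / idcg if idcg > 0 else 0.0
```

A ranking that already places products in descending relevance gives an NDCG of 1, while `ndcg([0, 1])` is penalized because the only relevant product sits at position 2.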
-
[23]
CONCLUSIONS We presented a framework for building LETOR models for an e-commerce platform - specifically for the unnamed queries. We proposed a notion of coherency score and used it to segment queries into broad and narrow. We discussed the challenges involved in feature representation (query, product and query-product) and target metrics (ctr,atcr,conv...
-
[24]
Did We Get It Right? Predicting Query Performance in E-commerce Search
R. Kumar, M. Kumar, N. Shah, and C. Faloutsos, “Did we get it right? predicting query performance in e-commerce search,” arXiv preprint arXiv:1808.00239, 2018
-
[25]
On application of learning to rank for e-commerce search,
S. K. Karmaker Santu, P. Sondhi, and C. Zhai, “On application of learning to rank for e-commerce search,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017, pp. 475–484
-
[26]
Yahoo! learning to rank challenge overview,
O. Chapelle and Y. Chang, “Yahoo! learning to rank challenge overview,” in Proceedings of the Learning to Rank Challenge, 2011, pp. 1–24
-
[27]
Advances in formal models of search and search behaviour,
L. Azzopardi and G. Zuccon, “Advances in formal models of search and search behaviour,” in Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval. ACM, 2016, pp. 1–4
-
[28]
Learning to rank for information retrieval and natural language processing,
H. Li, “Learning to rank for information retrieval and natural language processing,” Synthesis Lectures on Human Language Technologies, vol. 7, no. 3, pp. 1–121, 2014
-
[29]
Learning to rank for information retrieval,
T.-Y. Liu et al., “Learning to rank for information retrieval,” Foundations and Trends® in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009
-
[30]
From ranknet to lambdarank to lambdamart: An overview
C. J. Burges, “From ranknet to lambdarank to lambdamart: An overview.”
-
[31]
R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, “Diversifying search results,” in Proceedings of the second ACM international conference on web search and data mining. ACM, 2009, pp. 5–14
-
[32]
Towards a theory model for product search,
B. Li, A. Ghose, and P. G. Ipeirotis, “Towards a theory model for product search,” in Proceedings of the 20th international conference on World wide web. ACM, 2011, pp. 327–336
-
[33]
Enhancing product search by best-selling prediction in e-commerce,
B. Long, J. Bian, A. Dong, and Y. Chang, “Enhancing product search by best-selling prediction in e-commerce,” in Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012, pp. 2479–2482
-
[34]
Learning latent vector spaces for product search,
C. Van Gysel, M. de Rijke, and E. Kanoulas, “Learning latent vector spaces for product search,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 165–174
-
[35]
Latent dirichlet allocation based diversified retrieval for e-commerce search,
J. Yu, S. Mohan, D. P. Putthividhya, and W.-K. Wong, “Latent dirichlet allocation based diversified retrieval for e-commerce search,” in Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 2014, pp. 463–472
-
[36]
Narrow or broad?: Estimating subjective specificity in exploratory search,
K. Athukorala, A. Oulasvirta, D. Głowacka, J. Vreeken, and G. Jacucci, “Narrow or broad?: Estimating subjective specificity in exploratory search,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 2014, pp. 819–828
-
[37]
Query ambiguity identification based on user behavior information,
C. Luo, Y. Liu, M. Zhang, and S. Ma, “Query ambiguity identification based on user behavior information,” in Asia Information Retrieval Symposium. Springer, 2014, pp. 36–47
-
[38]
Decoding fashion contexts using word embeddings
S. Arora and D. Warrier, “Decoding fashion contexts using word embeddings.”
-
[39]
Wordnet: a lexical database for english,
G. A. Miller, “Wordnet: a lexical database for english,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995