arxiv: 1907.10710 · v1 · pith:ARJEFQVQnew · submitted 2019-07-24 · 💻 cs.IR · cs.CL

Generic Intent Representation in Web Search

Pith reviewed 2026-05-24 16:30 UTC · model grok-4.3

classification 💻 cs.IR cs.CL
keywords intent representation · web search · weak supervision · click logs · query similarity · neural encoder · multi-task learning · nearest neighbor search

The pith

GEN Encoder maps queries sharing clicks to similar embeddings for better intent representation in search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GEN Encoder to learn a distributed representation space for user intent in web search. It trains the model end-to-end on large-scale click logs so queries with shared clicks receive similar vectors, then fine-tunes on paraphrase tasks. Experiments on query intent similarity show consistent gains over prior methods. The approach also uses nearest-neighbor search to match new queries to past ones with the same intent, cutting the number of unseen queries in half. Distances in the learned space further align with observed user behaviors across search sessions.

Core claim

GEN Encoder is trained end-to-end to map queries with shared clicks to similar embeddings and is then fine-tuned on multiple paraphrase tasks. This yields robust advantages on query intent similarity modeling, reduces unseen queries via approximate nearest neighbor search, and reveals session-level behavior patterns in embedding distances.

What carries the argument

GEN Encoder, which uses click-based weak supervision to produce embeddings that place queries with common clicks nearby in vector space, followed by multi-task fine-tuning.
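
As a rough illustration of click-based weak supervision, the sketch below derives candidate positive pairs from a click log by grouping queries that led to clicks on the same URL. The record layout `(query, clicked_url)`, the function name `co_click_pairs`, the `min_shared` threshold, and the toy log are all assumptions made for this example; the summary above does not specify how the paper constructs its training pairs.

```python
from collections import defaultdict
from itertools import combinations


def co_click_pairs(click_log, min_shared=1):
    """Return query pairs that share at least `min_shared` clicked URLs.

    `click_log` is an iterable of (query, clicked_url) records. Both the
    record layout and the threshold are hypothetical, for illustration only.
    """
    # Collect the distinct queries that produced a click on each URL.
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        queries_by_url[url].add(query)

    # Count, for every unordered query pair, how many URLs they share.
    shared = defaultdict(int)
    for queries in queries_by_url.values():
        for a, b in combinations(sorted(queries), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy log invented for this example (not data from the paper).
toy_log = [
    ("harvard housing", "example.edu/housing"),
    ("harvard dorm", "example.edu/housing"),
    ("harvard dorm", "example.edu/dorms"),
    ("harvard housing", "example.edu/dorms"),
    ("cornell housing", "example.edu/cornell-housing"),
]
pairs = co_click_pairs(toy_log)
print(pairs)
```

Raising `min_shared` is one simple way to trade coverage for less noisy pairs, which bears on the load-bearing premise that shared clicks indicate shared intent.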

If this is right

  • Ablation studies confirm that removing click-based supervision sharply reduces representation quality for user intent.
  • Multi-task learning on paraphrase data increases the generality of the learned embeddings across different tasks.
  • Approximate nearest neighbor lookup on the embeddings identifies prior queries with matching intent and halves the rate of unseen queries.
  • Embedding distances between queries correlate with information-seeking patterns observed in real search sessions.
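
The nearest-neighbor matching described in the list above can be sketched as follows. This is a minimal stand-in, not the paper's system: it uses exact brute-force cosine search instead of an approximate index, and the function name `match_seen_query`, the similarity threshold of 0.9, and the toy vectors are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_seen_query(new_vec, seen, threshold=0.9):
    """Return (query, similarity) for the closest seen query, or None.

    `seen` maps previously observed queries to their embeddings. The
    threshold is a hypothetical cut-off for "same intent".
    """
    best_query, best_sim = None, -1.0
    for query, vec in seen.items():
        sim = cosine(new_vec, vec)
        if sim > best_sim:
            best_query, best_sim = query, sim
    if best_query is None or best_sim < threshold:
        return None
    return best_query, best_sim


# Toy embeddings invented for this example.
seen = {
    "harvard housing": [0.9, 0.1, 0.0],
    "cornell housing": [0.1, 0.9, 0.0],
}
match = match_seen_query([0.8, 0.2, 0.0], seen)
no_match = match_seen_query([0.0, 0.0, 1.0], seen)
print(match, no_match)
```

An unseen query counts as covered only when some earlier query clears the threshold. At log scale, the linear scan would be replaced by an approximate index.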

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same click-supervised training could be tested on logs from other search engines to check whether the gains hold outside the original data source.
  • The embeddings might directly improve downstream components such as query reformulation or result diversification by grouping intent-similar traffic.
  • Extending the approach to session-level sequences rather than single queries could capture shifts in intent within one visit.

Load-bearing premise

Large-scale user clicks from search logs reliably indicate when two queries share the same user intent.

What would settle it

A direct test showing that queries sharing many clicks but differing in intent receive distant embeddings, or that GEN Encoder produces no measurable gain on the query intent similarity task.

Figures

Figures reproduced from arXiv: 1907.10710 by Chenyan Xiong, Corby Rosset, Hongfei Zhang, Nick Craswell, Paul N. Bennett, Saurabh Tiwary, Xia Song.

Figure 1: GEN Encoder Architecture. Word Embedding maps words into a continuous space, which aligns words with similar intents, e.g. “housing” and “dorm”, and separates words with different intents, e.g. “Harvard” and “Cornell”. The query term t first goes through a standard embedding layer, t → t_emb, which learns embeddings for all terms in the vocabulary. The embeddings are fed into a highway network for…
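
The Figure 1 caption mentions a highway network but is cut off before describing it. For orientation, here is a minimal sketch of one standard highway layer, y = T(x) * H(x) + (1 - T(x)) * x. The ReLU transform, the parameter names, and the toy weights are assumptions, and nothing here is claimed to match the layer sizes or activations used in GEN Encoder.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: the gate T mixes the transform H(x) with the input x.

    H uses ReLU here, which is an assumption; T is a sigmoid gate.
    """
    h = [max(0.0, z) for z in affine(w_h, b_h, x)]
    t = [sigmoid(z) for z in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy parameters invented for this example.
x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate, so the input is carried through.
carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])
# A strongly positive gate bias opens the gate, so the output is the transform H(x).
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])
print(carried, transformed)
```

The two calls show the limiting cases of the gate: carrying the input unchanged and replacing it with the transformed value.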
Figure 2: The distributions of tail query frequency and their …
Figure 3: The distributions of cosine similarities between queries in search sessions that are adjacent (1), separated by one (2) …
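
The quantity plotted in Figure 3 can be sketched as follows: cosine similarities between query embeddings in one session, grouped by how far apart the queries are (gap 1 for adjacent queries, gap 2 for queries separated by one). The function name `similarity_by_gap`, the `max_gap` parameter, and the toy session are assumptions for illustration, not the paper's analysis code.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_by_gap(session_vectors, max_gap=3):
    """Group pairwise cosine similarities by positional gap within a session.

    `session_vectors` holds the embeddings of one session's queries in the
    order they were issued.
    """
    by_gap = defaultdict(list)
    for i in range(len(session_vectors)):
        for gap in range(1, max_gap + 1):
            j = i + gap
            if j < len(session_vectors):
                by_gap[gap].append(cosine(session_vectors[i], session_vectors[j]))
    return dict(by_gap)


# Toy session invented for this example: two near-identical queries, then a topic switch.
session = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
gaps = similarity_by_gap(session, max_gap=2)
print(gaps)
```

Pooling these per-gap lists over many sessions gives the kind of distributions that Figure 3 compares.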
Original abstract

This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a distributed representation space for user intent in search. Leveraging large scale user clicks from Bing search logs as weak supervision of user intent, GEN Encoder learns to map queries with shared clicks into similar embeddings end-to-end and then finetunes on multiple paraphrase tasks. Experimental results on an intrinsic evaluation task - query intent similarity modeling - demonstrate GEN Encoder's robust and significant advantages over previous representation methods. Ablation studies reveal the crucial role of learning from implicit user feedback in representing user intent and the contributions of multi-task learning in representation generality. We also demonstrate that GEN Encoder alleviates the sparsity of tail search traffic and cuts down half of the unseen queries by using an efficient approximate nearest neighbor search to effectively identify previous queries with the same search intent. Finally, we demonstrate distances between GEN encodings reflect certain information seeking behaviors in search sessions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the GEneric iNtent Encoder (GEN Encoder), which learns distributed representations for user search intent. It leverages large-scale Bing click logs as weak supervision to embed queries sharing clicks nearby in an end-to-end manner, followed by fine-tuning on paraphrase tasks. Claims include robust advantages on an intrinsic query intent similarity task, ablation insights on click data and multi-task learning, alleviation of tail-query sparsity via ANN search, and reflection of session behaviors in embedding distances.

Significance. If the central claims hold after addressing evaluation gaps, the work could advance intent modeling in IR by demonstrating scalable use of implicit feedback and multi-task fine-tuning. Strengths include the scale of the click-log supervision and the practical downstream demonstrations on tail queries and sessions. These elements would be valuable if the weak-supervision proxy is shown to align with intent similarity.

major comments (2)
  1. [Abstract] Abstract and evaluation description: the claim of 'robust and significant advantages' on the query intent similarity modeling task provides no metrics, baselines, statistical tests, dataset sizes, or quantitative results; evaluation is described only at the level of task names. This directly underpins the central experimental claim of superiority.
  2. [Method (weak supervision)] Weak-supervision construction (method section): queries with shared clicks are treated as positive pairs for intent similarity, yet no analysis validates that co-clicked queries exhibit higher human-judged intent similarity than non-co-clicked ones, nor quantifies noise from position bias or result overlap. This assumption is load-bearing for both the training objective and the reported gains.
minor comments (2)
  1. [Abstract] The statement that the approach 'cuts down half of the unseen queries' lacks the precise metric, baseline, or evaluation protocol used to arrive at this figure.
  2. [Method] Notation for the embedding space and loss functions should be introduced with explicit equations rather than prose descriptions to aid clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. Below we respond point-by-point to the major comments and indicate planned revisions.

  1. Referee: [Abstract] Abstract and evaluation description: the claim of 'robust and significant advantages' on the query intent similarity modeling task provides no metrics, baselines, statistical tests, dataset sizes, or quantitative results; evaluation is described only at the level of task names. This directly underpins the central experimental claim of superiority.

    Authors: We agree that the abstract would be stronger with quantitative support. The full experimental section already reports specific metrics, baselines, dataset sizes, and significance tests on the intent similarity task. We will revise the abstract to include representative numbers (e.g., relative improvements and dataset scale) while remaining within length limits. revision: yes

  2. Referee: [Method (weak supervision)] Weak-supervision construction (method section): queries with shared clicks are treated as positive pairs for intent similarity, yet no analysis validates that co-clicked queries exhibit higher human-judged intent similarity than non-co-clicked ones, nor quantifies noise from position bias or result overlap. This assumption is load-bearing for both the training objective and the reported gains.

    Authors: The comment correctly notes the absence of a direct human-judgment study validating the co-click proxy. Ablation results in the paper show that removing click-based training substantially hurts intent similarity performance, and downstream gains on tail queries and session modeling provide indirect support. We will add an explicit limitations paragraph in the method section acknowledging position bias and result overlap as sources of noise and noting that multi-task paraphrase fine-tuning is intended to increase robustness. A new large-scale human validation study is not feasible within the revision timeline. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external supervision signals

The paper trains the GEN Encoder end-to-end by mapping queries that share clicks in Bing logs to similar embeddings, then fine-tunes on paraphrase corpora, and evaluates on a separate intrinsic query intent similarity task. No equations, self-citations, or steps are shown that reduce the learned embeddings or reported performance gains to quantities defined by the same fitted parameters or by renaming the training objective itself. The weak supervision and evaluation data are external to the model outputs, so the claimed advantages rest on independent signals rather than self-definition or fitted-input predictions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to assumptions stated there.

free parameters (1)
  • embedding dimension and training hyperparameters
    Standard neural model choices not specified in abstract but required for the reported embeddings.
axioms (1)
  • domain assumption: Queries that share user clicks have similar underlying intent.
    Invoked in the abstract as the basis for weak supervision.

pith-pipeline@v0.9.0 · 5694 in / 1109 out tokens · 22439 ms · 2026-05-24T16:30:01.427038+00:00 · methodology

