arxiv: 2004.14503 · v3 · pith:7GCPOYTZnew · submitted 2020-04-29 · 💻 cs.IR · cs.CL

Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation

keywords: retrieval, domain, generation, large, models, neural, passage, question

A major obstacle to the widespread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on general-domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, yet noisy, sets of domain-specific question-passage relevance pairs. Furthermore, when this is coupled with a simple hybrid term-neural model, first-stage retrieval performance can be improved further. Empirically, we show that this is an effective strategy for building neural passage retrieval models in the absence of large training corpora. Depending on the domain, this technique can even approach the accuracy of supervised models.
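The two ideas in the abstract, pairing target-domain passages with generated questions and mixing a term-based score with a neural score, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `generate_question` stands in for a trained question generator, `term_overlap_score` is a toy substitute for a real term-based ranker such as BM25, and the linear interpolation in `hybrid_score` is one common way to combine scores, not necessarily the authors' formulation.

```python
from typing import Callable, List, Sequence, Tuple


def make_synthetic_pairs(
    passages: Sequence[str],
    generate_question: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each target-domain passage with one generated question.

    `generate_question` is a hypothetical callable; in practice it would
    wrap a question generation model trained on general-domain data.
    """
    return [(generate_question(p), p) for p in passages]


def term_overlap_score(query: str, passage: str) -> float:
    """Fraction of query tokens found in the passage (toy stand-in for BM25)."""
    query_tokens = query.lower().split()
    passage_tokens = set(passage.lower().split())
    if not query_tokens:
        return 0.0
    return sum(t in passage_tokens for t in query_tokens) / len(query_tokens)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def hybrid_score(term: float, neural: float, weight: float = 0.5) -> float:
    """Linear interpolation of a term-based and a neural relevance score.

    `weight` is an illustrative hyperparameter, not a value from the paper.
    """
    return weight * term + (1.0 - weight) * neural
```

In a real pipeline the synthetic pairs would serve as noisy training data for a dense retriever, and the neural score passed to `hybrid_score` would come from that retriever's query and passage embeddings (for example via `dot`). Term-based and neural scores usually live on different scales, so some normalization before mixing is typically needed.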

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval

    cs.IR 2026-04 unverdicted novelty 7.0

    UnIte selects target-domain documents for pseudo-query generation by filtering high aleatoric uncertainty and prioritizing high epistemic uncertainty, yielding +2.45 to +3.49 nDCG@10 gains on BEIR with ~4k samples.