
arxiv: 1811.08008 · v1 · submitted 2018-11-19 · cs.IR · cs.CL · cs.LG


End-to-End Retrieval in Continuous Space

keywords: retrieval, continuous, discrete, models, systems, embeddings, end-to-end, index

Most text-based information retrieval (IR) systems index objects by words or phrases. These discrete systems have been augmented by models that use embeddings to measure similarity in continuous space. But continuous-space models are typically used just to re-rank the top candidates. We consider the problem of end-to-end continuous retrieval, where standard approximate nearest neighbor (ANN) search replaces the usual discrete inverted index, and rely entirely on distances between learned embeddings. By training simple models specifically for retrieval, with an appropriate model architecture, we improve on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks. We also discuss the problem of evaluation for retrieval systems, and show how to modify existing pairwise similarity datasets for this purpose.
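As a rough illustration of the setup the abstract describes, the sketch below retrieves items purely by distance between embeddings and scores the ranking with average precision. Everything in it is an assumption made for illustration: the toy two-dimensional vectors stand in for learned embeddings, the exhaustive cosine search stands in for an ANN index, and the names `retrieve`, `average_precision`, and `mean_average_precision` are hypothetical. The exact MAP variant used in the paper is not stated in the abstract; here average precision is normalized by the number of relevant items.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_vec, index, k):
    """Return the ids of the k items closest to the query.

    This is exact, exhaustive search. A real system would swap in an
    approximate nearest neighbor (ANN) index for speed.
    """
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranking, normalized by the number of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, item_id in enumerate(ranked_ids, start=1):
        if item_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy "embeddings" for four stored questions (made-up values).
index = {
    "q1": [1.0, 0.0],
    "q2": [0.9, 0.1],
    "q3": [0.0, 1.0],
    "q4": [0.1, 0.9],
}

top3 = retrieve([1.0, 0.05], index, k=3)
ap = average_precision(top3, {"q1", "q4"})
```

In this toy run, `top3` contains the three stored questions whose vectors point most nearly in the query's direction, and `ap` rewards rankings that place the relevant ids early.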



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Atlas: Few-shot Learning with Retrieval Augmented Language Models

    cs.CL · 2022-08 · unverdicted · novelty 6.0

    Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.

  2. Unsupervised Dense Information Retrieval with Contrastive Learning

    cs.IR · 2021-12 · unverdicted · novelty 6.0

    Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.