pith. machine review for the scientific record.

arxiv: 2112.09118 · v4 · submitted 2021-12-16 · 💻 cs.IR · cs.AI · cs.CL

Recognition: 3 Lean theorem links

Unsupervised Dense Information Retrieval with Contrastive Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 13:15 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.CL
keywords dense retrieval · contrastive learning · unsupervised information retrieval · BEIR benchmark · cross-lingual retrieval · BM25 baseline · multilingual retrieval

The pith

Contrastive learning on unlabeled text trains dense retrievers that outperform BM25 on most BEIR datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how far contrastive learning can go in creating effective dense retrievers without any labeled training data. It finds that the resulting models achieve strong retrieval performance in many settings and surpass the unsupervised BM25 baseline on eleven of the fifteen BEIR datasets when measured by Recall at 100. A sympathetic reader would care because dense retrievers have usually demanded large amounts of supervised training data, making them impractical for new domains or languages. The work also shows that contrastive pre-training improves results after later fine-tuning and carries over across languages, including cross-lingual cases where term matching fails.

Core claim

We explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousands in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low resources language such as Swahili.

What carries the argument

Contrastive learning on dense neural encoders, trained solely on unlabeled text, produces vector representations whose similarities reflect semantic relevance.
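The training signal carrying this argument can be sketched as an InfoNCE objective with in-batch negatives: each query's paired document is the positive, and every other document in the batch serves as a negative. A minimal NumPy sketch, assuming L2-normalized embeddings and a temperature of 0.05 (illustrative values, not the paper's reported hyperparameters):

```python
import numpy as np

def info_nce_loss(queries, positives, temperature=0.05):
    """InfoNCE with in-batch negatives: for query i, row i of
    `positives` is its positive and every other row is a negative."""
    logits = queries @ positives.T / temperature      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy, targets on the diagonal

# Toy check: orthogonal embeddings whose positives match exactly give
# near-zero loss, while shuffled (mismatched) pairs are heavily penalized.
q = np.eye(4)
low = info_nce_loss(q, q)          # positives identical to queries
high = info_nce_loss(q, q[::-1])   # positives shuffled across the batch
```

Gradient descent on this loss pulls each query toward its positive and pushes it away from the other in-batch documents, which is the only supervision available without labels.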

If this is right

  • Such unsupervised dense retrievers become practical for new applications lacking labeled data.
  • Using the contrastive model as pre-training improves fine-tuned performance on both small domain-specific sets and large collections like MS MARCO.
  • The same models deliver strong results in multilingual retrieval and support cross-lingual transfer even when fine-tuned only on English data.
  • Cross-script retrieval becomes possible, such as finding English documents from Arabic queries, which term-based methods cannot do.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the contrastive representations capture semantics reliably, they could extend to other retrieval-adjacent tasks like question answering or clustering without labels.
  • Hybrid systems combining these dense vectors with sparse BM25 scores might yield further gains on heterogeneous data.
  • Applying the method to even lower-resource languages or domains would test how far the unsupervised advantage reaches.

Load-bearing premise

That the similarities learned by contrastive training on unlabeled text match human notions of relevance closely enough to beat term-frequency methods across diverse datasets and languages.

What would settle it

A new retrieval dataset or language pair on which the unsupervised contrastive model achieves lower Recall@100 than BM25 would indicate the performance does not generalize as claimed.
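The falsification criterion hinges on Recall@100: the fraction of a query's relevant documents that appear among its 100 highest-scoring results. A minimal sketch of the metric; the scores, relevance judgments, and k=3 cutoff below are made-up toy values for illustration:

```python
import numpy as np

def recall_at_k(scores, relevant, k=100):
    """Fraction of the relevant documents for one query that appear
    among the k highest-scoring documents."""
    top_k = set(np.argsort(-scores)[:k].tolist())
    return len(relevant & top_k) / len(relevant)

# Toy corpus: 10 documents, documents 1 and 3 judged relevant.
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.0, 0.3, 0.4, 0.5, 0.6, 0.7])
r = recall_at_k(scores, {1, 3}, k=3)  # top-3 is {1, 3, 9} → recall 1.0
```

A dataset counts as a win over BM25 when this quantity, averaged over queries, exceeds BM25's average; the referee's variance objection is about how stable that average is across training runs.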

read the original abstract

Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousands in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low resources language such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes training dense retrievers in a fully unsupervised manner via contrastive learning on unlabeled text corpora, using in-batch negatives. It reports that the resulting model outperforms the BM25 baseline on Recall@100 for 11 of 15 datasets in the BEIR zero-shot benchmark. The authors further show that the same unsupervised model improves downstream performance when used as pre-training before fine-tuning on either a few thousand in-domain examples or the full MS MARCO collection, and that it yields strong results for multilingual retrieval and cross-lingual transfer (including English-to-Swahili and cross-script Arabic-to-English).

Significance. If the reported numbers hold under full experimental verification, the work is significant because it supplies concrete evidence that contrastive objectives applied to raw text can produce dense representations whose similarity aligns with relevance well enough to beat a strong term-frequency baseline across heterogeneous domains and languages. The multilingual and cross-lingual results are particularly valuable for low-resource settings. The evaluation on the public BEIR suite and the explicit pre-training gains constitute reproducible, falsifiable claims that advance the practical deployment of dense retrievers without labeled data.

major comments (2)
  1. [§3] §3 (contrastive training details): the central empirical claim depends on the precise construction of in-batch negatives and the choice of temperature and batch size; without these hyperparameters and the exact unlabeled corpus statistics, it is impossible to verify that the reported Recall@100 gains are attributable to the contrastive objective rather than to corpus or implementation specifics.
  2. [Table 1] Table 1 (BEIR results): the 11/15 outperformance claim is load-bearing, yet the table does not report variance across random seeds or the exact negative-sampling procedure per dataset; a single-run comparison leaves open the possibility that the margin over BM25 is within noise on several of the 11 datasets.
minor comments (2)
  1. [Abstract] The abstract states that the model 'outperforms BM25 on 11 out of 15 datasets' but does not name the four datasets where it does not; adding this information would improve clarity.
  2. [Method] Notation for the contrastive loss (Eq. 1 or equivalent) uses 'in-batch negatives' without an explicit formula; a short equation would remove ambiguity about whether hard negatives or only random in-batch negatives are used.
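The formula the referee asks for is presumably the standard in-batch-negatives contrastive (InfoNCE) objective; a hedged reconstruction, where $B$ is the batch size, $\tau$ a temperature, and $s(\cdot,\cdot)$ the dot-product score between encoded query and document:

```latex
\mathcal{L} = -\frac{1}{B}\sum_{i=1}^{B}
  \log \frac{\exp\!\big(s(q_i, d_i^{+})/\tau\big)}
            {\sum_{j=1}^{B} \exp\!\big(s(q_i, d_j^{+})/\tau\big)}
```

Under this reading, the positives $d_j^{+}$ of the other $B-1$ in-batch pairs serve as the random negatives for query $q_i$; mined hard negatives, if used, would add extra terms to the denominator.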

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive review and recommendation of minor revision. We address each major comment below with clarifications and commitments to revisions that strengthen reproducibility without altering the core claims.

read point-by-point responses
  1. Referee: [§3] §3 (contrastive training details): the central empirical claim depends on the precise construction of in-batch negatives and the choice of temperature and batch size; without these hyperparameters and the exact unlabeled corpus statistics, it is impossible to verify that the reported Recall@100 gains are attributable to the contrastive objective rather than to corpus or implementation specifics.

    Authors: We agree that these implementation details are necessary for verification and reproducibility. In the revised manuscript we will expand Section 3 to report the exact batch size, temperature value, the construction of in-batch negatives (other in-batch examples serve as negatives for each positive pair), and the precise statistics of the unlabeled corpus used for training (size, source, and preprocessing). These additions will make it possible to confirm that performance differences arise from the contrastive objective. revision: yes

  2. Referee: [Table 1] Table 1 (BEIR results): the 11/15 outperformance claim is load-bearing, yet the table does not report variance across random seeds or the exact negative-sampling procedure per dataset; a single-run comparison leaves open the possibility that the margin over BM25 is within noise on several of the 11 datasets.

    Authors: We will revise the text and table caption to explicitly describe the negative-sampling procedure, which applies the same in-batch negative construction uniformly to every dataset. We acknowledge that multi-seed variance would increase statistical confidence. Our experiments used a single training run per configuration owing to the high computational cost of large-batch contrastive pre-training. The margins over BM25 are substantial on the majority of the 11 datasets, making noise an unlikely explanation, yet we will add an explicit discussion of this limitation in the revised paper. revision: partial

standing simulated objections not resolved
  • Reporting variance across random seeds for the unsupervised models in Table 1, as multiple independent training runs were not performed.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical study applying standard contrastive learning (in-batch negatives on unlabeled text) to produce dense retrievers. All performance claims are measured on the external held-out BEIR benchmark (and cross-lingual sets) that play no role in training or hyperparameter selection. No derivation, equation, or self-citation reduces the central result to a fitted quantity or to a prior result by the same authors; the contrastive objective and evaluation protocol are independent of the reported Recall@100 numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim rests on the effectiveness of standard contrastive objectives for learning semantic embeddings from unlabeled corpora; no explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1098 out tokens · 93975 ms · 2026-05-12T13:15:59.659047+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100.

  • Foundation.LawOfExistence existence_economically_inevitable · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Contrastive learning is an approach that relies on the fact that every document is, in some way, unique. This signal is the only information available in the absence of manual supervision.

  • Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    The relevance score between a query and a document is given by the dot product between their representations after applying the encoder.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 32 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    cs.CL 2026-05 unverdicted novelty 8.0

    REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

  2. MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

    cs.AI 2026-04 accept novelty 8.0

    MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmen...

  3. MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

    cs.MA 2026-05 unverdicted novelty 7.0

    MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

  4. Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders

    cs.CL 2026-05 unverdicted novelty 7.0

    EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.

  5. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  6. Skill Retrieval Augmentation for Agentic AI

    cs.CL 2026-04 unverdicted novelty 7.0

    Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.

  7. Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

    cs.CL 2026-04 unverdicted novelty 7.0

    RC-RAG boosts long-tail relation completion by infusing paraphrases into RAG stages, yielding up to 40.6 EM gains on benchmarks across five LLMs with no fine-tuning.

  8. Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

    cs.IR 2026-04 unverdicted novelty 7.0

    An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.

  9. HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

    cs.IR 2026-04 unverdicted novelty 7.0

    HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.

  10. Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

    cs.CR 2026-04 unverdicted novelty 7.0

    DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to induce soft-failure responses in RAG systems, achieving over 79% soft attack success rate with under 15% hard failures and high stealt...

  11. C-Pack: Packed Resources For General Chinese Embeddings

    cs.CL 2023-09 accept novelty 7.0

    C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.

  12. SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

    cs.AI 2026-05 unverdicted novelty 6.0

    SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...

  13. The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

    cs.LG 2026-05 unverdicted novelty 6.0

    Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.

  14. Reproducing Complex Set-Compositional Information Retrieval

    cs.CL 2026-05 unverdicted novelty 6.0

    Neural retrievers that double BM25 performance on QUEST collapse below 0.02 Recall@100 on the new LIMIT+ benchmark while lexical methods reach 0.96, with all methods degrading as compositional depth increases.

  15. RAG over Thinking Traces Can Improve Reasoning Tasks

    cs.IR 2026-05 unverdicted novelty 6.0

    RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.

  16. Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning

    cs.CL 2026-05 unverdicted novelty 6.0

    Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.

  17. Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation

    cs.CL 2026-05 unverdicted novelty 6.0

    CoRM-RAG uses a cognitive perturbation protocol to simulate biases and trains an Evidence Critic to retrieve documents that support correct decisions even under adversarial query changes.

  18. CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

    cs.CR 2026-05 unverdicted novelty 6.0

    CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

  19. UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

    cs.LG 2026-04 unverdicted novelty 6.0

    UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.

  20. ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval

    cs.IR 2026-04 unverdicted novelty 6.0

    ARHN refines hard-negative training data for dense retrieval by using LLMs to convert answer-containing passages into additional positives and exclude answer-containing passages from the negative set.

  21. Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

    cs.IR 2026-04 unverdicted novelty 6.0

    ACGM learns task-adaptive sparse graphs over multi-modal agent histories via policy-gradient optimization, reaching 82.7 nDCG@10 and 89.2% Precision@10 on WebShop, VisualWebArena, and Mind2Web while outperforming 19 b...

  22. Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

    cs.CL 2026-04 unverdicted novelty 6.0

    Personalized RewardBench reveals that state-of-the-art reward models reach only 75.94% accuracy on personalized preferences and shows stronger correlation with downstream BoN and PPO performance than prior benchmarks.

  23. Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

    cs.IR 2026-04 unverdicted novelty 6.0

    Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.

  24. Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

    cs.IR 2026-04 conditional novelty 6.0

    Generative retrieval beats dense retrieval and BM25 on the LIMIT dataset but degrades with hard negatives due to identifier ambiguity during decoding.

  25. Are LLM-Based Retrievers Worth Their Cost? An Empirical Study of Efficiency, Robustness, and Reasoning Overhead

    cs.IR 2026-04 accept novelty 6.0

    Empirical comparison across 14 retrievers on the BRIGHT benchmark shows reasoning-specialized models can match strong accuracy with competitive speed while many large LLM bi-encoders add latency for small gains and co...

  26. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

    cs.CL 2024-05 accept novelty 6.0

    NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.

  27. MemGPT: Towards LLMs as Operating Systems

    cs.AI 2023-10 unverdicted novelty 6.0

    MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.

  28. RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering

    eess.SP 2026-04 unverdicted novelty 5.0

    RECIPER improves procedure-oriented retrieval from materials papers by combining paragraph-level dense retrieval with LLM-extracted procedural summaries and lightweight reranking, yielding average gains of +3.73 Recal...

  29. Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions

    cs.IR 2026-04 accept novelty 5.0

    ConstBERT and ColBERT-v2 reproduce on MS-MARCO but drop 86-97% on long queries because MaxSim cannot filter filler noise, and extra fine-tuning or backend changes do not overcome the architectural constraint.

  30. Multilingual E5 Text Embeddings: A Technical Report

    cs.CL 2024-02 unverdicted novelty 5.0

    Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.

  31. Text Embeddings by Weakly-Supervised Contrastive Pre-training

    cs.CL 2022-12 unverdicted novelty 5.0

    E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.

  32. Galactica: A Large Language Model for Science

    cs.CL 2022-11 unverdicted novelty 5.0

    Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Reference graph

Works this paper leans on

148 extracted references · 148 canonical work pages · cited by 32 Pith papers · 15 internal anchors

  1. [1]

    Salient phrase aware dense retrieval: Can a dense retriever imitate a sparse one?, 2021

    Chen, Xilun and Lakhotia, Kushal and Oğuz, Barlas and Gupta, Anchit and Lewis, Patrick and Peshterliev, Stan and Mehdad, Yashar and Gupta, Sonal and Yih, Wen-tau , title =. doi:10.48550/ARXIV.2110.06918 , url =

  2. [2]

    Learning to retrieve passages without supervision, 2021

    Ram, Ori and Shachaf, Gal and Levy, Omer and Berant, Jonathan and Globerson, Amir , title =. doi:10.48550/ARXIV.2112.07708 , url =

  3. [4]

    Unsupervised Cross-lingual Representation Learning at Scale , journal =

    Alexis Conneau and Kartikay Khandelwal and Naman Goyal and Vishrav Chaudhary and Guillaume Wenzek and Francisco Guzm. Unsupervised Cross-lingual Representation Learning at Scale , journal =. 2019 , url =

  4. [5]

    CoRR , volume =

    Akari Asai and Xinyan Yu and Jungo Kasai and Hannaneh Hajishirzi , title =. CoRR , volume =. 2021 , url =

  5. [6]

    CoRR , volume =

    Shayne Longpre and Yi Lu and Joachim Daiber , title =. CoRR , volume =. 2020 , url =

  6. [7]

    CoRR , volume =

    Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin , title =. CoRR , volume =. 2021 , url =

  7. [8]

    Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki , title =

    Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki , title =. CoRR , volume =. 2020 , url =

  8. [14]

    Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Ves and Zettlemoyer, Luke , journal=

  9. [15]

    arXiv preprint arXiv:2007.00814 , year=

    Relevance-guided Supervision for OpenQA with ColBERT , author=. arXiv preprint arXiv:2007.00814 , year=

  10. [16]

    Bruce , title =

    Dehghani, Mostafa and Zamani, Hamed and Severyn, Aliaksei and Kamps, Jaap and Croft, W. Bruce , title =. 2017 , booktitle =

  11. [17]

    BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding. Proc. NAACL. 2019

  12. [18]

    Deep Contextualized Word Representations

    Peters, Matthew and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke. Deep Contextualized Word Representations. Proc. NAACL. 2018

  13. [19]

    Chen, Danqi and Fisch, Adam and Weston, Jason and Bordes, Antoine , booktitle =. Reading

  14. [20]

    How Much Knowledge Can You Pack Into the Parameters of a Language Model?

    How Much Knowledge Can You Pack Into the Parameters of a Language Model? , author=. arXiv preprint arXiv:2002.08910 , year=

  15. [21]

    Talmor, Alon and Elazar, Yanai and Goldberg, Yoav and Berant, Jonathan , journal=. o

  16. [22]

    F., Araki, J

    How Can We Know What Language Models Know? , author=. arXiv preprint arXiv:1911.12543 , year=

  17. [23]

    Language Models as Knowledge Bases?

    Petroni, Fabio and Rockt. Language Models as Knowledge Bases?. Proc. EMNLP-IJCNLP. 2019

  18. [24]

    OpenAI Technical Report , year=

    Language models are unsupervised multitask learners , author=. OpenAI Technical Report , year=

  19. [25]

    Language Models are Few-Shot Learners

    Language models are few-shot learners , author=. arXiv preprint arXiv:2005.14165 , year=

  20. [26]

    arXiv preprint arXiv:1911.03868 , year=

    Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering , author=. arXiv preprint arXiv:1911.03868 , year=

  21. [27]

    Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering , author=. Proc. ICLR , year=

  22. [28]

    arXiv preprint arXiv:2004.07202 , year=

    Entities as experts: Sparse memory access with entity supervision , author=. arXiv preprint arXiv:2004.07202 , year=

  23. [29]

    arXiv preprint arXiv:2005.04611 , year=

    How Context Affects Language Models' Factual Predictions , author=. arXiv preprint arXiv:2005.04611 , year=

  24. [30]

    Latent Retrieval for Weakly Supervised Open Domain Question Answering

    Lee, Kenton and Chang, Ming-Wei and Toutanova, Kristina. Latent Retrieval for Weakly Supervised Open Domain Question Answering. Proc. ACL. 2019

  25. [33]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. arXiv preprint arXiv:2005.11401 , year=

  26. [34]

    arXiv preprint arXiv:1911.02896 , year=

    Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering , author=. arXiv preprint arXiv:1911.02896 , year=

  27. [35]

    Multi-passage BERT : A Globally Normalized BERT Model for Open-domain Question Answering

    Wang, Zhiguo and Ng, Patrick and Ma, Xiaofei and Nallapati, Ramesh and Xiang, Bing. Multi-passage BERT : A Globally Normalized BERT Model for Open-domain Question Answering. Proc. EMNLP-IJCNLP. 2019

  28. [36]

    End-to-End Open-Domain Question Answering with BERT serini

    Yang, Wei and Xie, Yuqing and Lin, Aileen and Li, Xingyu and Tan, Luchen and Xiong, Kun and Li, Ming and Lin, Jimmy. End-to-End Open-Domain Question Answering with BERT serini. Proc. NAACL (Demonstrations). 2019

  29. [37]

    R ^3 : Reinforced ranker-reader for open-domain question answering , author=. Proc. AAAI , year=

  30. [38]

    Simple and Effective Multi-Paragraph Reading Comprehension

    Clark, Christopher and Gardner, Matt. Simple and Effective Multi-Paragraph Reading Comprehension. Proc. ACL. 2018

  31. [39]

    Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering , author=. Proc. ICLR , year=

  32. [40]

    Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering

    Lee, Jinhyuk and Yun, Seongjun and Kim, Hyunjae and Ko, Miyoung and Kang, Jaewoo. Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering. Proc. EMNLP. 2018

  33. [41]

    A Discrete Hard EM Approach for Weakly Supervised Question Answering

    Min, Sewon and Chen, Danqi and Hajishirzi, Hannaneh and Zettlemoyer, Luke. A Discrete Hard EM Approach for Weakly Supervised Question Answering. Proc. EMNLP-IJCNLP. 2019

  34. [42]

    arXiv preprint arXiv:1909.08041 , year=

    Revealing the importance of semantic retrieval for machine reading at scale , author=. arXiv preprint arXiv:1909.08041 , year=

  35. [43]

    Improving

    Grave, Edouard and Joulin, Armand and Usunier, Nicolas , date =. Improving

  36. [44]

    Generalization through

    Khandelwal, Urvashi and Levy, Omer and Jurafsky, Dan and Zettlemoyer, Luke and Lewis, Mike , date =. Generalization through

  37. [45]

    Adaptive

    Sukhbaatar, Sainbayar and Grave, Edouard and Bojanowski, Piotr and Joulin, Armand , date =. Adaptive

  38. [46]

    Advances in Neural Information Processing Systems 30 , pages =

    Attention is All you Need , author =. Advances in Neural Information Processing Systems 30 , pages =

  39. [47]

    Okapi at

    Robertson, Stephen E and Walker, Steve and Jones, Susan and Hancock-Beaulieu, Micheline M and Gatford, Mike and others , journal=. Okapi at

  40. [48]

    Voorhees, Ellen M and others , booktitle=. The

  41. [49]

    Natural Questions: A Benchmark for Question Answering Research

    Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Kelcey, Matthew and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina N. and Jones, Llion and Chang, Ming-Wei and Dai, Andrew and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav. Natural Questions: A Benchmark for Question Answering Research. TACL. 2019

  42. [51]

    TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

    Joshi, Mandar and Choi, Eunsol and Weld, Daniel S. and Zettlemoyer, Luke. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Proc. ACL. 2017

  43. [52]

    SQuAD: 100,000+ Questions for Machine Comprehension of Text

    Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy. SQuAD: 100,000+ Questions for Machine Comprehension of Text. Proc. EMNLP. 2016

  44. [53]

    Kočiský, Tomáš and others. The NarrativeQA Reading Comprehension Challenge. TACL. 2018

  45. [54]

    Reddy, Siva and Chen, Danqi and Manning, Christopher D. CoQA: A Conversational Question Answering Challenge. TACL. 2019

  46. [55]

    ELI5: Long Form Question Answering

    Fan, Angela and Jernite, Yacine and Perez, Ethan and Grangier, David and Weston, Jason and Auli, Michael. ELI5: Long Form Question Answering. Proc. ACL. 2019

  47. [56]

    Adam: A Method for Stochastic Optimization

    Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980

  48. [57]

    AmbigQA: Answering Ambiguous Open-Domain Questions

    AmbigQA: Answering Ambiguous Open-Domain Questions. arXiv preprint arXiv:2004.10645

  49. [58]

    Neural Ranking Models with Weak Supervision

    Dehghani, Mostafa and Zamani, Hamed and Severyn, Aliaksei and Kamps, Jaap and Croft, W. Bruce. Neural Ranking Models with Weak Supervision. Proc. SIGIR. 2017. doi:10.1145/3077136.3080832

  50. [59]

    A statistical interpretation of term specificity and its application in retrieval

    A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation

  51. [60]

    Learning deep structured semantic models for web search using clickthrough data

    Learning deep structured semantic models for web search using clickthrough data. Proceedings of the 22nd ACM International Conference on Information & Knowledge Management

  52. [61]

    Learning semantic representations using convolutional neural networks for web search

    Learning semantic representations using convolutional neural networks for web search. Proceedings of the 23rd International Conference on World Wide Web

  53. [62]

    A latent semantic model with convolutional-pooling structure for information retrieval

    A latent semantic model with convolutional-pooling structure for information retrieval. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

  54. [63]

    Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval

    Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2016

  55. [68]

    Billion-scale similarity search with GPUs

    Billion-scale similarity search with GPUs. IEEE Transactions on Big Data

  56. [70]

    The probabilistic relevance framework: BM25 and beyond

    The probabilistic relevance framework: BM25 and beyond. 2009

  57. [71]

    Indexing by latent semantic analysis

    Indexing by latent semantic analysis. Journal of the American Society for Information Science. 1990

  58. [73]

    An introduction to neural information retrieval

    An introduction to neural information retrieval. Foundations and Trends in Information Retrieval. 2018

  59. [74]

    Introduction to information retrieval

    Introduction to information retrieval. 2008

  60. [77]

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering. 2021

  61. [78]

    Distilling Knowledge from Reader to Retriever for Question Answering

    Distilling Knowledge from Reader to Retriever for Question Answering. 2020

  62. [85]

    Distributed representations of words and phrases and their compositionality

    Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems

  63. [86]

    Unsupervised feature learning via non-parametric instance discrimination

    Unsupervised feature learning via non-parametric instance discrimination. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  64. [87]

    Momentum contrast for unsupervised visual representation learning

    Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

  65. [88]

    A simple framework for contrastive learning of visual representations

    A simple framework for contrastive learning of visual representations. International Conference on Machine Learning. 2020

  66. [89]

    Unsupervised learning of visual features by contrasting cluster assignments

    Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882

  67. [90]

    Representation Learning with Contrastive Predictive Coding

    Representation Learning with Contrastive Predictive Coding. arXiv preprint arXiv:1807.03748

  68. [91]

    wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

    wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. arXiv preprint arXiv:2006.11477

  69. [96]

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144

  70. [97]

    Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

    Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. arXiv preprint arXiv:2006.07733

  71. [99]

    Decoupled Weight Decay Regularization

    Decoupled Weight Decay Regularization. 2019

  72. [100]

    CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

    Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzmán, Francisco and Joulin, Armand and Grave, Edouard. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data. Proceedings of the 12th Language Resources and Evaluation Conference. 2020

  73. [102]

    Bridging the lexical chasm: statistical approaches to answer-finding

    Bridging the lexical chasm: statistical approaches to answer-finding. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

  74. [103]

    Natural questions: a benchmark for question answering research

    Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics. 2019

  75. [104]

    MS MARCO: A Human Generated Machine Reading Comprehension Dataset

    MS MARCO: A Human Generated Machine Reading Comprehension Dataset. CoCo@NIPS

  76. [105]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv preprint arXiv:1910.10683

  77. [106]

    From doc2query to docTTTTTquery

    From doc2query to docTTTTTquery. Online preprint

  78. [107]

    SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

    SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval. arXiv preprint arXiv:2009.13013

  79. [110]

    A Replication Study of Dense Passage Retriever

    A Replication Study of Dense Passage Retriever. 2021

  80. [111]

    Simple Entity-Centric Questions Challenge Dense Retrievers

    Sciavolino, Christopher and Zhong, Zexuan and Lee, Jinhyuk and Chen, Danqi. Simple Entity-Centric Questions Challenge Dense Retrievers. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021

Showing first 80 references.