pith. machine review for the scientific record.

arxiv: 2112.09118 · v4 · submitted 2021-12-16 · 💻 cs.IR · cs.AI · cs.CL

Recognition: 3 Lean theorem links

Unsupervised Dense Information Retrieval with Contrastive Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 13:15 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.CL
keywords dense retrieval · contrastive learning · unsupervised information retrieval · BEIR benchmark · cross-lingual retrieval · BM25 baseline · multilingual retrieval

The pith

Contrastive learning on unlabeled text trains dense retrievers that outperform BM25 on most BEIR datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how far contrastive learning can go in creating effective dense retrievers without any labeled training data. It finds that the resulting models achieve strong retrieval performance in many settings and surpass the unsupervised BM25 baseline on eleven of the fifteen BEIR datasets when measured by Recall at 100. A sympathetic reader would care because dense retrievers have usually demanded large amounts of supervised training data, making them impractical for new domains or languages. The work also shows that contrastive pre-training improves results after later fine-tuning and carries over across languages, including cross-lingual cases where term matching fails.

Core claim

We explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousands in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low resources language such as Swahili.

What carries the argument

Contrastive learning on dense neural encoders, trained solely on unlabeled text, produces vector representations whose similarities reflect semantic relevance.
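The training signal carrying this argument can be sketched as an InfoNCE objective with in-batch negatives: each query's paired document is the positive, and every other document in the batch serves as a negative. A minimal NumPy sketch, assuming L2-normalized embeddings and a temperature of 0.05 (illustrative values, not the paper's reported hyperparameters):

```python
import numpy as np

def info_nce_loss(queries, positives, temperature=0.05):
    """InfoNCE with in-batch negatives: for query i, row i of
    `positives` is its positive and every other row is a negative."""
    logits = queries @ positives.T / temperature      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy, targets on the diagonal

# Toy check: orthogonal embeddings whose positives match exactly give
# near-zero loss, while shuffled (mismatched) pairs are heavily penalized.
q = np.eye(4)
low = info_nce_loss(q, q)          # positives identical to queries
high = info_nce_loss(q, q[::-1])   # positives shuffled across the batch
```

Gradient descent on this loss pulls each query toward its positive and pushes it away from the other in-batch documents, which is the only supervision available without labels.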

If this is right

  • Such unsupervised dense retrievers become practical for new applications lacking labeled data.
  • Using the contrastive model as pre-training improves fine-tuned performance on both small domain-specific sets and large collections like MS MARCO.
  • The same models deliver strong results in multilingual retrieval and support cross-lingual transfer even when fine-tuned only on English data.
  • Cross-script retrieval becomes possible, such as finding English documents from Arabic queries, which term-based methods cannot do.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the contrastive representations capture semantics reliably, they could extend to other retrieval-adjacent tasks like question answering or clustering without labels.
  • Hybrid systems combining these dense vectors with sparse BM25 scores might yield further gains on heterogeneous data.
  • Applying the method to even lower-resource languages or domains would test how far the unsupervised advantage reaches.

Load-bearing premise

That the similarities learned by contrastive training on unlabeled text match human notions of relevance closely enough to beat term-frequency methods across diverse datasets and languages.

What would settle it

A new retrieval dataset or language pair on which the unsupervised contrastive model achieves lower Recall@100 than BM25 would indicate the performance does not generalize as claimed.
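The falsification criterion hinges on Recall@100: the fraction of a query's relevant documents that appear among its 100 highest-scoring results. A minimal sketch of the metric; the scores, relevance judgments, and k=3 cutoff below are made-up toy values for illustration:

```python
import numpy as np

def recall_at_k(scores, relevant, k=100):
    """Fraction of the relevant documents for one query that appear
    among the k highest-scoring documents."""
    top_k = set(np.argsort(-scores)[:k].tolist())
    return len(relevant & top_k) / len(relevant)

# Toy corpus: 10 documents, documents 1 and 3 judged relevant.
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.0, 0.3, 0.4, 0.5, 0.6, 0.7])
r = recall_at_k(scores, {1, 3}, k=3)  # top-3 is {1, 3, 9} → recall 1.0
```

A dataset counts as a win over BM25 when this quantity, averaged over queries, exceeds BM25's average; the referee's variance objection is about how stable that average is across training runs.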

read the original abstract

Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousands in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low resources language such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes training dense retrievers in a fully unsupervised manner via contrastive learning on unlabeled text corpora, using in-batch negatives. It reports that the resulting model outperforms the BM25 baseline on Recall@100 for 11 of 15 datasets in the BEIR zero-shot benchmark. The authors further show that the same unsupervised model improves downstream performance when used as pre-training before fine-tuning on either a few thousand in-domain examples or the full MS MARCO collection, and that it yields strong results for multilingual retrieval and cross-lingual transfer (including English-to-Swahili and cross-script Arabic-to-English).

Significance. If the reported numbers hold under full experimental verification, the work is significant because it supplies concrete evidence that contrastive objectives applied to raw text can produce dense representations whose similarity aligns with relevance well enough to beat a strong term-frequency baseline across heterogeneous domains and languages. The multilingual and cross-lingual results are particularly valuable for low-resource settings. The evaluation on the public BEIR suite and the explicit pre-training gains constitute reproducible, falsifiable claims that advance the practical deployment of dense retrievers without labeled data.

major comments (2)
  1. [§3] §3 (contrastive training details): the central empirical claim depends on the precise construction of in-batch negatives and the choice of temperature and batch size; without these hyperparameters and the exact unlabeled corpus statistics, it is impossible to verify that the reported Recall@100 gains are attributable to the contrastive objective rather than to corpus or implementation specifics.
  2. [Table 1] Table 1 (BEIR results): the 11/15 outperformance claim is load-bearing, yet the table does not report variance across random seeds or the exact negative-sampling procedure per dataset; a single-run comparison leaves open the possibility that the margin over BM25 is within noise on several of the 11 datasets.
minor comments (2)
  1. [Abstract] The abstract states that the model 'outperforms BM25 on 11 out of 15 datasets' but does not name the four datasets where it does not; adding this information would improve clarity.
  2. [Method] Notation for the contrastive loss (Eq. 1 or equivalent) uses 'in-batch negatives' without an explicit formula; a short equation would remove ambiguity about whether hard negatives or only random in-batch negatives are used.
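The formula the referee asks for is presumably the standard in-batch-negatives contrastive (InfoNCE) objective; a hedged reconstruction, where $B$ is the batch size, $\tau$ a temperature, and $s(\cdot,\cdot)$ the dot-product score between encoded query and document:

```latex
\mathcal{L} = -\frac{1}{B}\sum_{i=1}^{B}
  \log \frac{\exp\!\big(s(q_i, d_i^{+})/\tau\big)}
            {\sum_{j=1}^{B} \exp\!\big(s(q_i, d_j^{+})/\tau\big)}
```

Under this reading, the positives $d_j^{+}$ of the other $B-1$ in-batch pairs serve as the random negatives for query $q_i$; mined hard negatives, if used, would add extra terms to the denominator.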

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive review and recommendation of minor revision. We address each major comment below with clarifications and commitments to revisions that strengthen reproducibility without altering the core claims.

read point-by-point responses
  1. Referee: [§3] §3 (contrastive training details): the central empirical claim depends on the precise construction of in-batch negatives and the choice of temperature and batch size; without these hyperparameters and the exact unlabeled corpus statistics, it is impossible to verify that the reported Recall@100 gains are attributable to the contrastive objective rather than to corpus or implementation specifics.

    Authors: We agree that these implementation details are necessary for verification and reproducibility. In the revised manuscript we will expand Section 3 to report the exact batch size, temperature value, the construction of in-batch negatives (other in-batch examples serve as negatives for each positive pair), and the precise statistics of the unlabeled corpus used for training (size, source, and preprocessing). These additions will make it possible to confirm that performance differences arise from the contrastive objective. revision: yes

  2. Referee: [Table 1] Table 1 (BEIR results): the 11/15 outperformance claim is load-bearing, yet the table does not report variance across random seeds or the exact negative-sampling procedure per dataset; a single-run comparison leaves open the possibility that the margin over BM25 is within noise on several of the 11 datasets.

    Authors: We will revise the text and table caption to explicitly describe the negative-sampling procedure, which applies the same in-batch negative construction uniformly to every dataset. We acknowledge that multi-seed variance would increase statistical confidence. Our experiments used a single training run per configuration owing to the high computational cost of large-batch contrastive pre-training. The margins over BM25 are substantial on the majority of the 11 datasets, making noise an unlikely explanation, yet we will add an explicit discussion of this limitation in the revised paper. revision: partial

standing simulated objections not resolved
  • Reporting variance across random seeds for the unsupervised models in Table 1, as multiple independent training runs were not performed.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical study applying standard contrastive learning (in-batch negatives on unlabeled text) to produce dense retrievers. All performance claims are measured on the external held-out BEIR benchmark (and cross-lingual sets) that play no role in training or hyperparameter selection. No derivation, equation, or self-citation reduces the central result to a fitted quantity or to a prior result by the same authors; the contrastive objective and evaluation protocol are independent of the reported Recall@100 numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim rests on the effectiveness of standard contrastive objectives for learning semantic embeddings from unlabeled corpora; no explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1098 out tokens · 93975 ms · 2026-05-12T13:15:59.659047+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100.

  • Foundation.LawOfExistence existence_economically_inevitable · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Contrastive learning is an approach that relies on the fact that every document is, in some way, unique. This signal is the only information available in the absence of manual supervision.

  • Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    The relevance score between a query and a document is given by the dot product between their representations after applying the encoder.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 32 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    cs.CL 2026-05 unverdicted novelty 8.0

    REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

  2. MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

    cs.AI 2026-04 accept novelty 8.0

    MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmen...

  3. MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

    cs.MA 2026-05 unverdicted novelty 7.0

    MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

  4. Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders

    cs.CL 2026-05 unverdicted novelty 7.0

    EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.

  5. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  6. Skill Retrieval Augmentation for Agentic AI

    cs.CL 2026-04 unverdicted novelty 7.0

    Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.

  7. Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

    cs.CL 2026-04 unverdicted novelty 7.0

    RC-RAG boosts long-tail relation completion by infusing paraphrases into RAG stages, yielding up to 40.6 EM gains on benchmarks across five LLMs with no fine-tuning.

  8. Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

    cs.IR 2026-04 unverdicted novelty 7.0

    An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.

  9. HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

    cs.IR 2026-04 unverdicted novelty 7.0

    HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.

  10. Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

    cs.CR 2026-04 unverdicted novelty 7.0

    DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to induce soft-failure responses in RAG systems, achieving over 79% soft attack success rate with under 15% hard failures and high stealt...

  11. C-Pack: Packed Resources For General Chinese Embeddings

    cs.CL 2023-09 accept novelty 7.0

    C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.

  12. SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

    cs.AI 2026-05 unverdicted novelty 6.0

    SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...

  13. The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

    cs.LG 2026-05 unverdicted novelty 6.0

    Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.

  14. Reproducing Complex Set-Compositional Information Retrieval

    cs.CL 2026-05 unverdicted novelty 6.0

    Neural retrievers that double BM25 performance on QUEST collapse below 0.02 Recall@100 on the new LIMIT+ benchmark while lexical methods reach 0.96, with all methods degrading as compositional depth increases.

  15. RAG over Thinking Traces Can Improve Reasoning Tasks

    cs.IR 2026-05 unverdicted novelty 6.0

    RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.

  16. Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning

    cs.CL 2026-05 unverdicted novelty 6.0

    Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.

  17. Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation

    cs.CL 2026-05 unverdicted novelty 6.0

    CoRM-RAG uses a cognitive perturbation protocol to simulate biases and trains an Evidence Critic to retrieve documents that support correct decisions even under adversarial query changes.

  18. CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

    cs.CR 2026-05 unverdicted novelty 6.0

    CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

  19. UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

    cs.LG 2026-04 unverdicted novelty 6.0

    UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.

  20. ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval

    cs.IR 2026-04 unverdicted novelty 6.0

    ARHN refines hard-negative training data for dense retrieval by using LLMs to convert answer-containing passages into additional positives and exclude answer-containing passages from the negative set.

  21. Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

    cs.IR 2026-04 unverdicted novelty 6.0

    ACGM learns task-adaptive sparse graphs over multi-modal agent histories via policy-gradient optimization, reaching 82.7 nDCG@10 and 89.2% Precision@10 on WebShop, VisualWebArena, and Mind2Web while outperforming 19 b...

  22. Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

    cs.CL 2026-04 unverdicted novelty 6.0

    Personalized RewardBench reveals that state-of-the-art reward models reach only 75.94% accuracy on personalized preferences and shows stronger correlation with downstream BoN and PPO performance than prior benchmarks.

  23. Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

    cs.IR 2026-04 unverdicted novelty 6.0

    Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.

  24. Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

    cs.IR 2026-04 conditional novelty 6.0

    Generative retrieval beats dense retrieval and BM25 on the LIMIT dataset but degrades with hard negatives due to identifier ambiguity during decoding.

  25. Are LLM-Based Retrievers Worth Their Cost? An Empirical Study of Efficiency, Robustness, and Reasoning Overhead

    cs.IR 2026-04 accept novelty 6.0

    Empirical comparison across 14 retrievers on the BRIGHT benchmark shows reasoning-specialized models can match strong accuracy with competitive speed while many large LLM bi-encoders add latency for small gains and co...

  26. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

    cs.CL 2024-05 accept novelty 6.0

    NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.

  27. MemGPT: Towards LLMs as Operating Systems

    cs.AI 2023-10 unverdicted novelty 6.0

    MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.

  28. RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering

    eess.SP 2026-04 unverdicted novelty 5.0

    RECIPER improves procedure-oriented retrieval from materials papers by combining paragraph-level dense retrieval with LLM-extracted procedural summaries and lightweight reranking, yielding average gains of +3.73 Recal...

  29. Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions

    cs.IR 2026-04 accept novelty 5.0

    ConstBERT and ColBERT-v2 reproduce on MS-MARCO but drop 86-97% on long queries because MaxSim cannot filter filler noise, and extra fine-tuning or backend changes do not overcome the architectural constraint.

  30. Multilingual E5 Text Embeddings: A Technical Report

    cs.CL 2024-02 unverdicted novelty 5.0

    Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.

  31. Text Embeddings by Weakly-Supervised Contrastive Pre-training

    cs.CL 2022-12 unverdicted novelty 5.0

    E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.

  32. Galactica: A Large Language Model for Science

    cs.CL 2022-11 unverdicted novelty 5.0

    Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Reference graph

Works this paper leans on

148 extracted references · 148 canonical work pages · cited by 32 Pith papers · 15 internal anchors

  1. [1]

    Salient phrase aware dense retrieval: Can a dense retriever imitate a sparse one?, 2021

    Chen, Xilun and Lakhotia, Kushal and Oğuz, Barlas and Gupta, Anchit and Lewis, Patrick and Peshterliev, Stan and Mehdad, Yashar and Gupta, Sonal and Yih, Wen-tau , title =. doi:10.48550/ARXIV.2110.06918 , url =

  2. [2]

    Learning to retrieve passages without supervision, 2021

    Ram, Ori and Shachaf, Gal and Levy, Omer and Berant, Jonathan and Globerson, Amir , title =. doi:10.48550/ARXIV.2112.07708 , url =

  3. [4]

    Unsupervised Cross-lingual Representation Learning at Scale , journal =

    Alexis Conneau and Kartikay Khandelwal and Naman Goyal and Vishrav Chaudhary and Guillaume Wenzek and Francisco Guzm. Unsupervised Cross-lingual Representation Learning at Scale , journal =. 2019 , url =

  4. [5]

    CoRR , volume =

    Akari Asai and Xinyan Yu and Jungo Kasai and Hannaneh Hajishirzi , title =. CoRR , volume =. 2021 , url =

  5. [6]

    CoRR , volume =

    Shayne Longpre and Yi Lu and Joachim Daiber , title =. CoRR , volume =. 2020 , url =

  6. [7]

    CoRR , volume =

    Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin , title =. CoRR , volume =. 2021 , url =

  7. [8]

    Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki , title =

    Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki , title =. CoRR , volume =. 2020 , url =

  8. [14]

    Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Ves and Zettlemoyer, Luke , journal=

  9. [15]

    arXiv preprint arXiv:2007.00814 , year=

    Relevance-guided Supervision for OpenQA with ColBERT , author=. arXiv preprint arXiv:2007.00814 , year=

  10. [16]

    Bruce , title =

    Dehghani, Mostafa and Zamani, Hamed and Severyn, Aliaksei and Kamps, Jaap and Croft, W. Bruce , title =. 2017 , booktitle =

  11. [17]

    BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding. Proc. NAACL. 2019

  12. [18]

    Deep Contextualized Word Representations

    Peters, Matthew and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke. Deep Contextualized Word Representations. Proc. NAACL. 2018

  13. [19]

    Chen, Danqi and Fisch, Adam and Weston, Jason and Bordes, Antoine , booktitle =. Reading

  14. [20]

    How Much Knowledge Can You Pack Into the Parameters of a Language Model?

    How Much Knowledge Can You Pack Into the Parameters of a Language Model? , author=. arXiv preprint arXiv:2002.08910 , year=

  15. [21]

    Talmor, Alon and Elazar, Yanai and Goldberg, Yoav and Berant, Jonathan , journal=. o

  16. [22]

    F., Araki, J

    How Can We Know What Language Models Know? , author=. arXiv preprint arXiv:1911.12543 , year=

  17. [23]

    Language Models as Knowledge Bases?

    Petroni, Fabio and Rockt. Language Models as Knowledge Bases?. Proc. EMNLP-IJCNLP. 2019

  18. [24]

    OpenAI Technical Report , year=

    Language models are unsupervised multitask learners , author=. OpenAI Technical Report , year=

  19. [25]

    Language Models are Few-Shot Learners

    Language models are few-shot learners , author=. arXiv preprint arXiv:2005.14165 , year=

  20. [26]

    arXiv preprint arXiv:1911.03868 , year=

    Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering , author=. arXiv preprint arXiv:1911.03868 , year=

  21. [27]

    Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering , author=. Proc. ICLR , year=

  22. [28]

    arXiv preprint arXiv:2004.07202 , year=

    Entities as experts: Sparse memory access with entity supervision , author=. arXiv preprint arXiv:2004.07202 , year=

  23. [29]

    arXiv preprint arXiv:2005.04611 , year=

    How Context Affects Language Models' Factual Predictions , author=. arXiv preprint arXiv:2005.04611 , year=

  24. [30]

    Latent Retrieval for Weakly Supervised Open Domain Question Answering

    Lee, Kenton and Chang, Ming-Wei and Toutanova, Kristina. Latent Retrieval for Weakly Supervised Open Domain Question Answering. Proc. ACL. 2019

  25. [33]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. arXiv preprint arXiv:2005.11401 , year=

  26. [34]

    arXiv preprint arXiv:1911.02896 , year=

    Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering , author=. arXiv preprint arXiv:1911.02896 , year=

  27. [35]

    Multi-passage BERT : A Globally Normalized BERT Model for Open-domain Question Answering

    Wang, Zhiguo and Ng, Patrick and Ma, Xiaofei and Nallapati, Ramesh and Xiang, Bing. Multi-passage BERT : A Globally Normalized BERT Model for Open-domain Question Answering. Proc. EMNLP-IJCNLP. 2019

  28. [36]

    End-to-End Open-Domain Question Answering with BERT serini

    Yang, Wei and Xie, Yuqing and Lin, Aileen and Li, Xingyu and Tan, Luchen and Xiong, Kun and Li, Ming and Lin, Jimmy. End-to-End Open-Domain Question Answering with BERT serini. Proc. NAACL (Demonstrations). 2019

  29. [37]

    R ^3 : Reinforced ranker-reader for open-domain question answering , author=. Proc. AAAI , year=

  30. [38]

    Simple and Effective Multi-Paragraph Reading Comprehension

    Clark, Christopher and Gardner, Matt. Simple and Effective Multi-Paragraph Reading Comprehension. Proc. ACL. 2018

  31. [39]

    Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering , author=. Proc. ICLR , year=

  32. [40]

    Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering

    Lee, Jinhyuk and Yun, Seongjun and Kim, Hyunjae and Ko, Miyoung and Kang, Jaewoo. Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering. Proc. EMNLP. 2018

  33. [41]

    A Discrete Hard EM Approach for Weakly Supervised Question Answering

    Min, Sewon and Chen, Danqi and Hajishirzi, Hannaneh and Zettlemoyer, Luke. A Discrete Hard EM Approach for Weakly Supervised Question Answering. Proc. EMNLP-IJCNLP. 2019

  34. [42]

    arXiv preprint arXiv:1909.08041 , year=

    Revealing the importance of semantic retrieval for machine reading at scale , author=. arXiv preprint arXiv:1909.08041 , year=

  35. [43]

    Improving

    Grave, Edouard and Joulin, Armand and Usunier, Nicolas , date =. Improving

  36. [44]

    Generalization through

    Khandelwal, Urvashi and Levy, Omer and Jurafsky, Dan and Zettlemoyer, Luke and Lewis, Mike , date =. Generalization through

  37. [45]

    Adaptive

    Sukhbaatar, Sainbayar and Grave, Edouard and Bojanowski, Piotr and Joulin, Armand , date =. Adaptive

  38. [46]

    Advances in Neural Information Processing Systems 30 , pages =

    Attention is All you Need , author =. Advances in Neural Information Processing Systems 30 , pages =

  39. [47]

    Okapi at

    Robertson, Stephen E and Walker, Steve and Jones, Susan and Hancock-Beaulieu, Micheline M and Gatford, Mike and others , journal=. Okapi at

  40. [48]

    Voorhees, Ellen M and others , booktitle=. The

  41. [49]

    Natural Questions: A Benchmark for Question Answering Research

    Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Kelcey, Matthew and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina N. and Jones, Llion and Chang, Ming-Wei and Dai, Andrew and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav. Natural Questions: A Benchmark for Question Answering Research. TACL. 2019

  42. [51]

    TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

    Joshi, Mandar and Choi, Eunsol and Weld, Daniel S. and Zettlemoyer, Luke. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Proc. ACL. 2017

  43. [52]

    SQuAD: 100,000+ Questions for Machine Comprehension of Text

    Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy. SQuAD: 100,000+ Questions for Machine Comprehension of Text. Proc. EMNLP. 2016

  44. [53]

    Kočiský, Tomáš and others. The NarrativeQA Reading Comprehension Challenge. TACL. 2018

  45. [54]

    Reddy, Siva and Chen, Danqi and Manning, Christopher D. CoQA: A Conversational Question Answering Challenge. TACL. 2019

  46. [55]

    ELI5: Long Form Question Answering

    Fan, Angela and Jernite, Yacine and Perez, Ethan and Grangier, David and Weston, Jason and Auli, Michael. ELI5: Long Form Question Answering. Proc. ACL. 2019

  47. [56]

    Adam: A Method for Stochastic Optimization

    Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980

  48. [57]

    AmbigQA: Answering Ambiguous Open-Domain Questions

    AmbigQA: Answering Ambiguous Open-Domain Questions. arXiv preprint arXiv:2004.10645

  49. [58]

    Neural Ranking Models with Weak Supervision

    Dehghani, Mostafa and Zamani, Hamed and Severyn, Aliaksei and Kamps, Jaap and Croft, W. Bruce. Neural Ranking Models with Weak Supervision. Proc. SIGIR. 2017. doi:10.1145/3077136.3080832

  50. [59]

    A statistical interpretation of term specificity and its application in retrieval

    A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation

  51. [60]

    Learning deep structured semantic models for web search using clickthrough data

    Learning deep structured semantic models for web search using clickthrough data. Proceedings of the 22nd ACM International Conference on Information & Knowledge Management

  52. [61]

    Learning semantic representations using convolutional neural networks for web search

    Learning semantic representations using convolutional neural networks for web search. Proceedings of the 23rd International Conference on World Wide Web

  53. [62]

    A latent semantic model with convolutional-pooling structure for information retrieval

    A latent semantic model with convolutional-pooling structure for information retrieval. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

  54. [63]

    Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval

    Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2016

  55. [68]

    Billion-scale similarity search with GPUs

    Billion-scale similarity search with GPUs. IEEE Transactions on Big Data

  56. [70]

    The probabilistic relevance framework: BM25 and beyond

    The probabilistic relevance framework: BM25 and beyond. 2009

  57. [71]

    Indexing by latent semantic analysis

    Indexing by latent semantic analysis. Journal of the American Society for Information Science. 1990

  58. [73]

    An introduction to neural information retrieval

    An introduction to neural information retrieval. Foundations and Trends in Information Retrieval. 2018

  59. [74]

    Introduction to information retrieval

    Introduction to information retrieval. 2008

  60. [77]

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering. 2021

  61. [78]

    Distilling Knowledge from Reader to Retriever for Question Answering

    Distilling Knowledge from Reader to Retriever for Question Answering. 2020

  62. [85]

    Distributed representations of words and phrases and their compositionality

    Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems

  63. [86]

    Unsupervised feature learning via non-parametric instance discrimination

    Unsupervised feature learning via non-parametric instance discrimination. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  64. [87]

    Momentum contrast for unsupervised visual representation learning

    Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

  65. [88]

    A simple framework for contrastive learning of visual representations

    A simple framework for contrastive learning of visual representations. International Conference on Machine Learning. 2020

  66. [89]

    Unsupervised learning of visual features by contrasting cluster assignments

    Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882

  67. [90]

    Representation Learning with Contrastive Predictive Coding

    Representation Learning with Contrastive Predictive Coding. arXiv preprint arXiv:1807.03748

  68. [91]

    wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

    wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. arXiv preprint arXiv:2006.11477

  69. [96]

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144

  70. [97]

    Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

    Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. arXiv preprint arXiv:2006.07733

  71. [99]

    Decoupled Weight Decay Regularization

    Decoupled Weight Decay Regularization. 2019

  72. [100]

    CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

    Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzmán, Francisco and Joulin, Armand and Grave, Edouard. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data. Proceedings of the 12th Language Resources and Evaluation Conference. 2020

  73. [102]

    Bridging the lexical chasm: statistical approaches to answer-finding

    Bridging the lexical chasm: statistical approaches to answer-finding. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

  74. [103]

    Natural questions: a benchmark for question answering research

    Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics. 2019

  75. [104]

    MS MARCO: A Human Generated Machine Reading Comprehension Dataset

    MS MARCO: A Human Generated Machine Reading Comprehension Dataset. CoCo@NIPS

  76. [105]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv preprint arXiv:1910.10683

  77. [106]

    From doc2query to docTTTTTquery

    From doc2query to docTTTTTquery. Online preprint

  78. [107]

    SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

    SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval. arXiv preprint arXiv:2009.13013

  79. [110]

    A Replication Study of Dense Passage Retriever

    A Replication Study of Dense Passage Retriever. 2021

  80. [111]

    Simple Entity-Centric Questions Challenge Dense Retrievers

    Sciavolino, Christopher and Zhong, Zexuan and Lee, Jinhyuk and Chen, Danqi. Simple Entity-Centric Questions Challenge Dense Retrievers. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021

Showing first 80 references.