Latent Retrieval for Weakly Supervised Open Domain Question Answering

2019 · cs.CL · arXiv 1906.00300

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a blackbox information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.

representative citing papers

Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

cs.CL · 2025-11-12 · conditional · novelty 7.0

TARG uses uncertainty scores from a short no-context draft to gate retrieval in RAG, matching Always-RAG accuracy while cutting retrievals by 70-90% on QA benchmarks.

Improving language models by retrieving from trillions of tokens

cs.CL · 2021-12-08 · unverdicted · novelty 7.0

RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.

R2MED: A Benchmark for Reasoning-Driven Medical Retrieval

cs.IR · 2025-05-20 · accept · novelty 6.0

R2MED is the first benchmark for reasoning-driven medical retrieval, where even top models reach only 41.4 nDCG@10 on queries requiring inference beyond lexical or semantic overlap.

An Empirical Study of Mamba-based Language Models

cs.LG · 2024-06-12 · accept · novelty 6.0

An 8B Mamba-2-Hybrid with 43% Mamba-2, 7% attention, and 50% MLP layers exceeds an 8B Transformer by 2.65 points on average across 12 tasks and matches it on 23 long-context tasks while enabling up to 8x faster inference.

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

cs.CL · 2020-02-10 · accept · novelty 6.0

Fine-tuned language models store knowledge in parameters to answer questions competitively with retrieval-based open-domain QA systems.

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

cs.AI · 2025-06-24 · unverdicted · novelty 5.0

KnowRL integrates a knowledge-verification factuality reward into RL training to enforce fact-based reasoning steps and lower hallucination rates in LLMs.

citing papers explorer

Showing 6 of 6 citing papers.

Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

cs.CL · 2025-11-12 · conditional · none · ref 9 · internal anchor

Improving language models by retrieving from trillions of tokens

cs.CL · 2021-12-08 · unverdicted · none · ref 36 · internal anchor

R2MED: A Benchmark for Reasoning-Driven Medical Retrieval

cs.IR · 2025-05-20 · accept · none · ref 6 · internal anchor

An Empirical Study of Mamba-based Language Models

cs.LG · 2024-06-12 · accept · none · ref 30 · internal anchor

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

cs.CL · 2020-02-10 · accept · none · ref 56 · internal anchor

KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

cs.AI · 2025-06-24 · unverdicted · none · ref 4 · internal anchor
