pith. machine review for the scientific record. sign in

arxiv: 2208.03299 · v3 · submitted 2022-08-05 · 💻 cs.CL

Recognition: 2 theorem links

Atlas: Few-shot Learning with Retrieval Augmented Language Models

Authors on Pith no claims yet

Pith reviewed 2026-05-16 13:44 UTC · model grok-4.3

classification 💻 cs.CL
keywords few-shot learningretrieval-augmented language modelsquestion answeringknowledge intensive tasksnatural questionslanguage models
0
0 comments X

The pith

Atlas, a retrieval-augmented language model, reaches over 42 percent accuracy on Natural Questions with only 64 examples while using 50 times fewer parameters than a 540 billion parameter model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Atlas as a pre-trained retrieval-augmented language model that learns knowledge-intensive tasks from very few examples. It combines a language model with an external document index so that facts are retrieved on demand rather than stored inside the model weights. Evaluations on Natural Questions, MMLU, and KILT show that this design supports strong few-shot performance on tasks where knowledge is central. The index itself can be swapped or updated to change what the model knows without retraining. This approach challenges the assumption that massive parameter counts are required to handle fact-based reasoning.

Core claim

Atlas is a carefully pre-trained retrieval-augmented language model that performs few-shot learning on knowledge-intensive tasks by retrieving relevant documents from an external index and conditioning its outputs on them, reaching over 42 percent accuracy on Natural Questions with 64 examples while outperforming a 540 billion parameter model.

What carries the argument

Retrieval-augmented language model that fetches documents from a fixed index and uses them to condition token predictions during both pre-training and few-shot inference.

If this is right

  • Knowledge can be updated by replacing the document index rather than retraining the entire model.
  • Few-shot performance on fact-heavy tasks improves when the index contains higher-quality or more domain-specific documents.
  • Smaller retrieval-augmented models can exceed the few-shot results of much larger non-retrieval models on question answering and fact-checking benchmarks.
  • The same pre-training recipe extends to other knowledge-intensive benchmarks such as MMLU and KILT with minimal additional examples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could lower the compute cost of maintaining up-to-date knowledge systems by shifting storage from model parameters to an editable index.
  • Similar retrieval augmentation may help in domains where facts change rapidly, such as current events or scientific literature.
  • The approach raises the question of how to build and maintain retrieval indices that remain reliable across many different tasks without manual curation.

Load-bearing premise

The external index always supplies accurate and relevant documents, and the pre-training plus few-shot procedure teaches the model to use those documents correctly without needing to memorize facts internally.

What would settle it

Run Atlas on a Natural Questions subset after removing every relevant document from the retrieval index; if accuracy falls close to zero while a non-retrieval baseline stays flat, the claim is supported.

read the original abstract

Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples. We perform evaluations on a wide range of tasks, including MMLU, KILT and NaturalQuestions, and study the impact of the content of the document index, showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B parameters model by 3% despite having 50x fewer parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Atlas, a pre-trained retrieval-augmented language model designed for few-shot learning on knowledge-intensive tasks. It reports strong results on MMLU, KILT, and Natural Questions, with the headline claim that Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters. The work also examines the effects of varying document index content and demonstrates that the index can be updated post-training.

Significance. If the results hold after addressing the index-overlap concern, the paper would show that retrieval augmentation enables parameter-efficient few-shot transfer for knowledge tasks without requiring the model to store facts internally. The explicit study of index updates is a concrete strength, as it provides a mechanism for knowledge editing that scales independently of model size.

major comments (2)
  1. [§4.2] §4.2 (Natural Questions experiments): the 42% accuracy result with 64 shots is presented without an ablation that holds the retrieval index fixed while removing any passage overlap or embedding similarity between the index and the 64 few-shot examples. The paper studies index content changes but does not isolate whether performance depends on distributional cues from the few-shot set itself.
  2. [§3.2] §3.2 (model architecture and training): the interaction between the frozen or jointly trained retriever and the language model during few-shot adaptation is not quantified with respect to retrieval precision on the test distribution; this is load-bearing for the claim that external retrieval substitutes for internal parameter storage.
minor comments (2)
  1. [Table 2] Table 2: baseline comparisons to PaLM-540B and other models should report standard deviations across multiple few-shot seeds to establish whether the 3% margin is statistically reliable.
  2. [§5] §5 (index ablations): the qualitative discussion of index updates would benefit from a quantitative metric such as retrieval recall@10 before and after update on a held-out validation split.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the contributions of our work on retrieval-augmented few-shot learning. We address each major comment below and will revise the manuscript to incorporate additional experiments and analysis as needed.

read point-by-point responses
  1. Referee: [§4.2] §4.2 (Natural Questions experiments): the 42% accuracy result with 64 shots is presented without an ablation that holds the retrieval index fixed while removing any passage overlap or embedding similarity between the index and the 64 few-shot examples. The paper studies index content changes but does not isolate whether performance depends on distributional cues from the few-shot set itself.

    Authors: We agree that an explicit ablation isolating overlap or embedding similarity between the 64 few-shot examples and the retrieval index would strengthen the claim. In the revised manuscript we will add this experiment: we will construct a modified index that removes any passages with high embedding similarity (using the same retriever) or exact overlap with the few-shot set, then re-evaluate the 64-shot Natural Questions performance while keeping the index otherwise fixed. We expect the result to remain robust given the scale of the index relative to the tiny few-shot set, but the new ablation will directly address the concern about distributional cues. revision: yes

  2. Referee: [§3.2] §3.2 (model architecture and training): the interaction between the frozen or jointly trained retriever and the language model during few-shot adaptation is not quantified with respect to retrieval precision on the test distribution; this is load-bearing for the claim that external retrieval substitutes for internal parameter storage.

    Authors: We appreciate this observation. While §3.2 describes the frozen versus jointly-trained retriever variants, we did not report retrieval precision metrics on the test distribution during few-shot adaptation. In the revision we will add these measurements (e.g., top-1 and top-5 passage accuracy where the gold answer appears in the retrieved context) for both settings on Natural Questions and KILT tasks. This will quantify how retrieval quality evolves during adaptation and directly support the interpretation that external retrieval can substitute for internal parameter storage. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claims rest on empirical evaluations of a retrieval-augmented model (Atlas) on public benchmarks such as Natural Questions, MMLU, and KILT using few-shot examples. The retrieval index is treated as an external, updatable knowledge source whose content impact is explicitly studied, with no equations or self-citations that reduce performance metrics to fitted parameters by construction or rename known results. The setup draws on independent pre-training and external document collections, rendering the reported accuracies (e.g., >42% on NQ with 64 shots) falsifiable against benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5491 in / 996 out tokens · 52401 ms · 2026-05-16T13:44:37.774736+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    cs.CL 2023-10 conditional novelty 8.0

    DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.

  2. LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

    cs.CL 2023-08 unverdicted novelty 8.0

    LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

  3. API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

    cs.CL 2023-04 conditional novelty 8.0

    API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

  4. PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

    cs.CL 2026-05 unverdicted novelty 7.0

    PlantMarkerBench supplies 5,550 literature sentences annotated for plant marker gene evidence validity and type across Arabidopsis, maize, rice and tomato, showing frontier LLMs handle direct expression evidence but s...

  5. PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

    cs.CL 2026-05 unverdicted novelty 7.0

    PlantMarkerBench is a new multi-species benchmark with 5,550 evidence instances for evaluating language models on literature-grounded plant marker gene reasoning across expression, localization, function, indirect, an...

  6. AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning

    cs.IR 2026-04 unverdicted novelty 7.0

    A single query-specific poisoned document, built by extracting and iteratively refining an adversarial chain-of-thought, can substantially degrade reasoning accuracy in retrieval-augmented LLM systems.

  7. MMSearch-R1: Incentivizing LMMs to Search

    cs.CV 2025-06 unverdicted novelty 7.0

    MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting searc...

  8. C-Pack: Packed Resources For General Chinese Embeddings

    cs.CL 2023-09 accept novelty 7.0

    C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.

  9. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

    cs.LG 2026-04 conditional novelty 6.0

    Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...

  10. Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

    cs.CL 2026-04 conditional novelty 6.0

    Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-paramete...

  11. Context Matters: Evaluating Context Strategies for Automated ADR Generation Using LLMs

    cs.SE 2026-04 unverdicted novelty 6.0

    A small recency window of 3-5 prior ADRs as context produces higher-fidelity LLM-generated Architecture Decision Records than no context, full history, or retrieval-augmented selection in typical sequential workflows.

  12. ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

    cs.AI 2025-03 unverdicted novelty 6.0

    ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.

  13. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

    cs.CL 2024-01 unverdicted novelty 6.0

    RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

  14. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    cs.CL 2023-10 unverdicted novelty 6.0

    Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.

  15. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

    cs.CL 2023-09 conditional novelty 6.0

    DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.

  16. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

    cs.LG 2026-04 unverdicted novelty 5.0

    Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering close the gap.

  17. FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

    cs.AI 2026-04 unverdicted novelty 5.0

    FSFM is a biologically-inspired selective forgetting framework for LLM agents that claims to boost access efficiency by 8.49%, content quality by 29.2% signal-to-noise, and eliminate security risks entirely through a ...

  18. DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation

    cs.CL 2026-04 unverdicted novelty 5.0

    DALM is a proposed language model architecture that enforces algebraic constraints via a three-phase process over domain lattices to prevent cross-domain knowledge contamination during generation.

  19. Retrieval-Augmented Generation for AI-Generated Content: A Survey

    cs.CV 2024-02 accept novelty 5.0

    A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.

  20. Towards General Text Embeddings with Multi-stage Contrastive Learning

    cs.CL 2023-08 unverdicted novelty 5.0

    GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.

  21. Memory as Metabolism: A Design for Companion Knowledge Systems

    cs.AI 2026-04 unverdicted novelty 4.0

    This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...

  22. Mitigating Hallucination on Hallucination in RAG via Ensemble Voting

    cs.CL 2026-03 unverdicted novelty 4.0

    VOTE-RAG applies retrieval voting across diverse queries and response voting across independent generations to mitigate hallucination-on-hallucination in RAG, matching or exceeding complex baselines on six benchmarks ...

Reference graph

Works this paper leans on

232 extracted references · 232 canonical work pages · cited by 20 Pith papers · 43 internal anchors

  1. [1]

    Re2g: Retrieve, rerank, generate, 2022

    Glass, Michael and Rossiello, Gaetano and Chowdhury, Md Faisal Mahbub and Naik, Ankita Rajaram and Cai, Pengshan and Gliozzo, Alfio , title =. doi:10.48550/ARXIV.2207.06300 , url =

  2. [2]

    Proofver: Natural logic theorem proving for fact verification, 2021

    Krishna, Amrith and Riedel, Sebastian and Vlachos, Andreas , title =. doi:10.48550/ARXIV.2108.11357 , url =

  3. [3]

    2018 , volume =

    Guo, Zhaochen and Barbosa, Denilson , title =. 2018 , volume =. doi:10.3233/SW-170273 , journal =

  4. [4]

    Robust Disambiguation of Named Entities in Text

    Hoffart, Johannes and Yosef, Mohamed Amir and Bordino, Ilaria and F. Robust Disambiguation of Named Entities in Text. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011

  5. [6]

    T - RE x: A Large Scale Alignment of Natural Language with Knowledge Base Triples

    Elsahar, Hady and Vougiouklis, Pavlos and Remaci, Arslen and Gravier, Christophe and Hare, Jonathon and Laforest, Frederique and Simperl, Elena. T - RE x: A Large Scale Alignment of Natural Language with Knowledge Base Triples. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018

  6. [7]

    HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

    Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D. , title =. doi:10.48550/ARXIV.1809.09600 , url =

  7. [8]

    doi:10.48550/ARXIV.2009.02252 , url =

    Petroni, Fabio and Piktus, Aleksandra and Fan, Angela and Lewis, Patrick and Yazdani, Majid and De Cao, Nicola and Thorne, James and Jernite, Yacine and Karpukhin, Vladimir and Maillard, Jean and Plachouras, Vassilis and Rocktäschel, Tim and Riedel, Sebastian , title =. doi:10.48550/ARXIV.2009.02252 , url =

  8. [10]

    doi:10.48550/ARXIV.2108.13934 , url =

    Glass, Michael and Rossiello, Gaetano and Chowdhury, Md Faisal Mahbub and Gliozzo, Alfio , title =. doi:10.48550/ARXIV.2108.13934 , url =

  9. [11]

    Autoregressive search engines: Generating substrings as document identifiers, 2022

    Bevilacqua, Michele and Ottaviano, Giuseppe and Lewis, Patrick and Yih, Wen-tau and Riedel, Sebastian and Petroni, Fabio , title =. doi:10.48550/ARXIV.2204.10628 , url =

  10. [12]

    The web is your oyster - knowledge-intensive nlp against a very large web corpus, 2021

    Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Okhonko, Dmytro and Broscheit, Samuel and Izacard, Gautier and Lewis, Patrick and Oğuz, Barlas and Grave, Edouard and Yih, Wen-tau and Riedel, Sebastian , title =. doi:10.48550/ARXIV.2112.09924 , url =

  11. [13]

    Proceedings of the International Conference on Learning Representations (ICLR) , year=

    Measuring Massive Multitask Language Understanding , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=

  12. [16]

    Finetuned Language Models Are Zero-Shot Learners

    Wei, Jason and Bosma, Maarten and Zhao, Vincent Y. and Guu, Kelvin and Yu, Adams Wei and Lester, Brian and Du, Nan and Dai, Andrew M. and Le, Quoc V. , title =. doi:10.48550/ARXIV.2109.01652 , url =

  13. [17]

    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

    Smith, Shaden and Patwary, Mostofa and Norick, Brandon and LeGresley, Patrick and Rajbhandari, Samyam and Casper, Jared and Liu, Zhun and Prabhumoye, Shrimai and Zerveas, George and Korthikanti, Vijay and Zhang, Elton and Child, Rewon and Aminabadi, Reza Yazdani and Bernauer, Julie and Song, Xia and Shoeybi, Mohammad and He, Yuxiong and Houston, Michael a...

  14. [18]

    Scaling Laws for Neural Language Models

    Kaplan, Jared and McCandlish, Sam and Henighan, Tom and Brown, Tom B. and Chess, Benjamin and Child, Rewon and Gray, Scott and Radford, Alec and Wu, Jeffrey and Amodei, Dario , title =. doi:10.48550/ARXIV.2001.08361 , url =

  15. [20]

    2022 , eprint=

    Improving Wikipedia Verifiability with AI , author=. 2022 , eprint=

  16. [21]

    Retrieval augmentation reduces hallucination in conversation, 2021

    Shuster, Kurt and Poff, Spencer and Chen, Moya and Kiela, Douwe and Weston, Jason , keywords =. Retrieval Augmentation Reduces Hallucination in Conversation , publisher =. doi:10.48550/ARXIV.2104.07567 , url =

  17. [22]

    International Conference on Learning Representations , year=

    Wizard of Wikipedia: Knowledge-Powered Conversational Agents , author=. International Conference on Learning Representations , year=

  18. [23]

    End-to-end training of multi-document reader and retriever for open-domain question answering, 2021

    Sachan, Devendra Singh and Reddy, Siva and Hamilton, William and Dyer, Chris and Yogatama, Dani , title =. doi:10.48550/ARXIV.2106.05346 , url =

  19. [24]

    2021 , eprint=

    You Only Need One Model for Open-domain Question Answering , author=. 2021 , eprint=

  20. [25]

    2022 , eprint=

    Unsupervised Dense Information Retrieval with Contrastive Learning , author=. 2022 , eprint=

  21. [26]

    Emergent Abilities of Large Language Models

    Wei, Jason and Tay, Yi and Bommasani, Rishi and Raffel, Colin and Zoph, Barret and Borgeaud, Sebastian and Yogatama, Dani and Bosma, Maarten and Zhou, Denny and Metzler, Donald and Chi, Ed H. and Hashimoto, Tatsunori and Vinyals, Oriol and Liang, Percy and Dean, Jeff and Fedus, William , title =. doi:10.48550/ARXIV.2206.07682 , url =

  22. [27]

    Brown, Tom B. and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel M. and Wu, Jeffrey and W...

  23. [28]

    Training Compute-Optimal Large Language Models

    Hoffmann, Jordan and Borgeaud, Sebastian and Mensch, Arthur and Buchatskaya, Elena and Cai, Trevor and Rutherford, Eliza and Casas, Diego de Las and Hendricks, Lisa Anne and Welbl, Johannes and Clark, Aidan and Hennigan, Tom and Noland, Eric and Millican, Katie and Driessche, George van den and Damoc, Bogdan and Guy, Aurelia and Osindero, Simon and Simony...

  24. [29]

    Chowdhery, Aakanksha and Narang, Sharan and Devlin, Jacob and Bosma, Maarten and Mishra, Gaurav and Roberts, Adam and Barham, Paul and Chung, Hyung Won and Sutton, Charles and Gehrmann, Sebastian and Schuh, Parker and Shi, Kensen and Tsvyashchenko, Sasha and Maynez, Joshua and Rao, Abhishek and Barnes, Parker and Tay, Yi and Shazeer, Noam and Prabhakaran,...

  25. [30]

    Rae, Jack W. and Borgeaud, Sebastian and Cai, Trevor and Millican, Katie and Hoffmann, Jordan and Song, Francis and Aslanides, John and Henderson, Sarah and Ring, Roman and Young, Susannah and Rutherford, Eliza and Hennigan, Tom and Menick, Jacob and Cassirer, Albin and Powell, Richard and Driessche, George van den and Hendricks, Lisa Anne and Rauh, Marib...

  26. [31]

    Lieber, Opher and Sharir, Or and Lenz, Barak and Shoham, Yoav , title =

  27. [32]

    , title =

    Paranjape, Ashwin and Khattab, Omar and Potts, Christopher and Zaharia, Matei and Manning, Christopher D. , title =. doi:10.48550/ARXIV.2110.07752 , url =

  28. [33]

    Unsupervised Cross-lingual Representation Learning at Scale , journal =

    Alexis Conneau and Kartikay Khandelwal and Naman Goyal and Vishrav Chaudhary and Guillaume Wenzek and Francisco Guzm. Unsupervised Cross-lingual Representation Learning at Scale , journal =. 2019 , url =

  29. [34]

    CoRR , volume =

    Akari Asai and Xinyan Yu and Jungo Kasai and Hannaneh Hajishirzi , title =. CoRR , volume =. 2021 , url =

  30. [35]

    CoRR , volume =

    Shayne Longpre and Yi Lu and Joachim Daiber , title =. CoRR , volume =. 2020 , url =

  31. [36]

    CoRR , volume =

    Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin , title =. CoRR , volume =. 2021 , url =

  32. [37]

    Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki , title =

    Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki , title =. CoRR , volume =. 2020 , url =

  33. [38]

    arXiv preprint arXiv:2109.10086 , year=

    SPLADE v2: Sparse lexical and expansion model for information retrieval , author=. arXiv preprint arXiv:2109.10086 , year=

  34. [39]

    On Sampling Strategies for Neural Network-based Collaborative Filtering

    Ting Chen and Yizhou Sun and Yue Shi and Liangjie Hong , title =. arXiv preprint arXiv:1706.07881 , year=

  35. [40]

    SimCSE: Simple Contrastive Learning of Sentence Embeddings

    SimCSE: Simple Contrastive Learning of Sentence Embeddings , author=. arXiv preprint arXiv:2104.08821 , year=

  36. [41]

    arXiv preprint arXiv:2002.03932 , year=

    Pre-training tasks for embedding-based large-scale retrieval , author=. arXiv preprint arXiv:2002.03932 , year=

  37. [43]

    Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Ves and Zettlemoyer, Luke , journal=

  38. [44]

    arXiv preprint arXiv:2007.00814 , year=

    Relevance-guided Supervision for OpenQA with ColBERT , author=. arXiv preprint arXiv:2007.00814 , year=

  39. [45]

    Bruce , title =

    Dehghani, Mostafa and Zamani, Hamed and Severyn, Aliaksei and Kamps, Jaap and Croft, W. Bruce , title =. 2017 , booktitle =

  40. [46]

    BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding. Proc. NAACL. 2019

  41. [47]

    Deep Contextualized Word Representations

    Peters, Matthew and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke. Deep Contextualized Word Representations. Proc. NAACL. 2018

  42. [48]

    Chen, Danqi and Fisch, Adam and Weston, Jason and Bordes, Antoine , booktitle =. Reading

  43. [49]

    How Much Knowledge Can You Pack Into the Parameters of a Language Model?

    How Much Knowledge Can You Pack Into the Parameters of a Language Model? , author=. arXiv preprint arXiv:2002.08910 , year=

  44. [50]

    Talmor, Alon and Elazar, Yanai and Goldberg, Yoav and Berant, Jonathan , journal=. o

  45. [51]

    arXiv preprint arXiv:1911.12543 , year=

    How Can We Know What Language Models Know? , author=. arXiv preprint arXiv:1911.12543 , year=

  46. [52]

    Language Models as Knowledge Bases?

    Petroni, Fabio and Rockt. Language Models as Knowledge Bases?. Proc. EMNLP-IJCNLP. 2019

  47. [53]

    OpenAI Technical Report , year=

    Language models are unsupervised multitask learners , author=. OpenAI Technical Report , year=

  48. [54]

    arXiv preprint arXiv:1911.03868 , year=

    Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering , author=. arXiv preprint arXiv:1911.03868 , year=

  49. [55]

    Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering , author=. Proc. ICLR , year=

  50. [56]

    arXiv preprint arXiv:2004.07202 , year=

    Entities as experts: Sparse memory access with entity supervision , author=. arXiv preprint arXiv:2004.07202 , year=

  51. [58]

    Latent Retrieval for Weakly Supervised Open Domain Question Answering

    Lee, Kenton and Chang, Ming-Wei and Toutanova, Kristina. Latent Retrieval for Weakly Supervised Open Domain Question Answering. Proc. ACL. 2019

  52. [62]

    arXiv preprint arXiv:1911.02896 , year=

    Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering , author=. arXiv preprint arXiv:1911.02896 , year=

  53. [64]

    End-to-End Open-Domain Question Answering with BERT serini

    Yang, Wei and Xie, Yuqing and Lin, Aileen and Li, Xingyu and Tan, Luchen and Xiong, Kun and Li, Ming and Lin, Jimmy. End-to-End Open-Domain Question Answering with BERT serini. Proc. NAACL (Demonstrations). 2019

  54. [65]

    R ^3 : Reinforced ranker-reader for open-domain question answering , author=. Proc. AAAI , year=

  55. [66]

    Simple and Effective Multi-Paragraph Reading Comprehension

    Clark, Christopher and Gardner, Matt. Simple and Effective Multi-Paragraph Reading Comprehension. Proc. ACL. 2018

  56. [67]

    Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering , author=. Proc. ICLR , year=

  57. [68]

    Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering

    Lee, Jinhyuk and Yun, Seongjun and Kim, Hyunjae and Ko, Miyoung and Kang, Jaewoo. Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering. Proc. EMNLP. 2018

  58. [69]

    A Discrete Hard EM Approach for Weakly Supervised Question Answering

    Min, Sewon and Chen, Danqi and Hajishirzi, Hannaneh and Zettlemoyer, Luke. A Discrete Hard EM Approach for Weakly Supervised Question Answering. Proc. EMNLP-IJCNLP. 2019

  59. [70]

    arXiv preprint arXiv:1909.08041 , year=

    Revealing the importance of semantic retrieval for machine reading at scale , author=. arXiv preprint arXiv:1909.08041 , year=

  60. [71]

    International Conference on Learning Representations , year=

    Improving Neural Language Models with a Continuous Cache , author=. International Conference on Learning Representations , year=

  61. [72]

    Unbounded cache model for online language modeling with open vocabulary

    Grave, Edouard and Cisse, Moustapha and Joulin, Armand , title =. doi:10.48550/ARXIV.1711.02604 , url =

  62. [73]

    International Conference on Learning Representations , year=

    Generalization through Memorization: Nearest Neighbor Language Models , author=. International Conference on Learning Representations , year=

  63. [74]

    Rae, Erich Elsen, and Laurent Sifre

    Borgeaud, Sebastian and Mensch, Arthur and Hoffmann, Jordan and Cai, Trevor and Rutherford, Eliza and Millican, Katie and Driessche, George van den and Lespiau, Jean-Baptiste and Damoc, Bogdan and Clark, Aidan and Casas, Diego de Las and Guy, Aurelia and Menick, Jacob and Ring, Roman and Hennigan, Tom and Huang, Saffron and Maggiore, Loren and Jones, Chri...

  64. [75]

    Adaptive

    Sukhbaatar, Sainbayar and Grave, Edouard and Bojanowski, Piotr and Joulin, Armand , date =. Adaptive

  65. [76]

    Advances in Neural Information Processing Systems 30 , pages =

    Attention is All you Need , author =. Advances in Neural Information Processing Systems 30 , pages =

  66. [77]

    Okapi at

    Robertson, Stephen E and Walker, Steve and Jones, Susan and Hancock-Beaulieu, Micheline M and Gatford, Mike and others , journal=. Okapi at

  67. [78]

    Voorhees, Ellen M and others , booktitle=. The

  68. [79]

    MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

    Ms marco: A human generated machine reading comprehension dataset , author=. arXiv preprint arXiv:1611.09268 , year=

  69. [80]

    and Zettlemoyer, Luke , title =

    Joshi, Mandar and Choi, Eunsol and Weld, Daniel S. and Zettlemoyer, Luke , title =. Proc. ACL , year =

  70. [81]

    SQ u AD : 100,000+ Questions for Machine Comprehension of Text

    Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy. SQ u AD : 100,000+ Questions for Machine Comprehension of Text. Proc. EMNLP. 2016

  71. [82]

    Ko. The. TACL , year=

  72. [83]

    Reddy, Siva and Chen, Danqi and Manning, Christopher D , journal=

  73. [84]

    ELI 5: Long Form Question Answering

    Fan, Angela and Jernite, Yacine and Perez, Ethan and Grangier, David and Weston, Jason and Auli, Michael. ELI 5: Long Form Question Answering. Proc. ACL. 2019

  74. [85]

    Adam: A Method for Stochastic Optimization

    Adam: A method for stochastic optimization , author=. arXiv preprint arXiv:1412.6980 , year=

  75. [87]

    Bruce , title =

    Dehghani, Mostafa and Zamani, Hamed and Severyn, Aliaksei and Kamps, Jaap and Croft, W. Bruce , title =. 2017 , publisher =. doi:10.1145/3077136.3080832 , booktitle =

  76. [88]

    Journal of documentation , year=

    A statistical interpretation of term specificity and its application in retrieval , author=. Journal of documentation , year=

  77. [89]

    Proceedings of the 22nd ACM international conference on Information & Knowledge Management , pages=

    Learning deep structured semantic models for web search using clickthrough data , author=. Proceedings of the 22nd ACM international conference on Information & Knowledge Management , pages=

  78. [90]

    Proceedings of the 23rd ACM international conference on conference on information and knowledge management , pages=

    A latent semantic model with convolutional-pooling structure for information retrieval , author=. Proceedings of the 23rd ACM international conference on conference on information and knowledge management , pages=

  79. [91]

    IEEE/ACM Transactions on Audio, Speech, and Language Processing , volume=

    Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval , author=. IEEE/ACM Transactions on Audio, Speech, and Language Processing , volume=. 2016 , publisher=

  80. [92]

    End-to-End Retrieval in Continuous Space

    End-to-end retrieval in continuous space , author=. arXiv preprint arXiv:1811.08008 , year=

Showing first 80 references.