Atlas: Few-shot Learning with Retrieval Augmented Language Models
Pith reviewed 2026-05-16 13:44 UTC · model grok-4.3
The pith
Atlas, a retrieval-augmented language model, reaches over 42 percent accuracy on Natural Questions with only 64 examples, outperforming a 540-billion-parameter model by 3 points despite having 50 times fewer parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Atlas is a carefully pre-trained retrieval-augmented language model that performs few-shot learning on knowledge-intensive tasks by retrieving relevant documents from an external index and conditioning its outputs on them. With only 64 examples it reaches over 42 percent accuracy on Natural Questions, outperforming a 540-billion-parameter model despite having 50 times fewer parameters.
What carries the argument
Retrieval-augmented language model that fetches documents from a fixed index and uses them to condition token predictions during both pre-training and few-shot inference.
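As a rough illustration only (not Atlas's actual Contriever retriever or Fusion-in-Decoder reader; the index, scoring, and prompt format below are invented for the sketch), the retrieve-then-condition loop looks like:

```python
import numpy as np

# Toy retrieve-then-condition sketch. The bag-of-words `embed` stands in
# for a learned dense retriever; the prompt stands in for the reader input.
INDEX = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
]

VOCAB = sorted({w for doc in INDEX for w in doc.lower().rstrip(".").split()})

def embed(text: str) -> np.ndarray:
    """Indicator vector over VOCAB; a crude stand-in for a dense encoder."""
    words = text.lower().rstrip(".?").split()
    return np.array([float(w in words) for w in VOCAB])

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ embed(d)) for d in INDEX]
    top = np.argsort(scores)[::-1][:k]
    return [INDEX[i] for i in top]

def build_prompt(query: str) -> str:
    # The language model conditions on the query plus retrieved passages;
    # replacing INDEX updates its knowledge without retraining.
    passages = retrieve(query)
    return "\n".join(passages) + f"\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the capital of France?"))
```

During Atlas pre-training the retriever and reader are trained jointly over the same loop; here both are frozen toys.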
If this is right
- Knowledge can be updated by replacing the document index rather than retraining the entire model.
- Few-shot performance on fact-heavy tasks improves when the index contains higher-quality or more domain-specific documents.
- Smaller retrieval-augmented models can exceed the few-shot results of much larger non-retrieval models on question answering and fact-checking benchmarks.
- The same pre-training recipe extends to other knowledge-intensive benchmarks such as MMLU and KILT with minimal additional examples.
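The first implication above can be demonstrated with a toy system: swap the document index and the answers change while the "model" is untouched. The lexical retriever, string-return reader, and both indices are illustrative stand-ins, not anything from the paper.

```python
# Swapping the index changes answers with no retraining. The retriever is a
# crude word-overlap scorer; a real reader would generate from the passage.
def answer(question: str, index: list[str]) -> str:
    qwords = set(question.lower().split())
    best = max(index, key=lambda d: len(qwords & set(d.lower().split())))
    return best

index_old = ["the tallest building is the burj khalifa"]
index_new = ["the tallest building is the example tower"]  # hypothetical update

q = "what is the tallest building"
print(answer(q, index_old))
print(answer(q, index_new))  # updated answer, same model
```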
Where Pith is reading between the lines
- This method could lower the compute cost of maintaining up-to-date knowledge systems by shifting storage from model parameters to an editable index.
- Similar retrieval augmentation may help in domains where facts change rapidly, such as current events or scientific literature.
- The approach raises the question of how to build and maintain retrieval indices that remain reliable across many different tasks without manual curation.
Load-bearing premise
The external index always supplies accurate and relevant documents, and the pre-training plus few-shot procedure teaches the model to use those documents correctly without needing to memorize facts internally.
What would settle it
Run Atlas on a Natural Questions subset after removing every relevant document from the retrieval index; if accuracy falls close to zero while a non-retrieval baseline stays flat, the claim is supported.
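This falsification test can be mocked end to end with toy stand-ins (the lexical retriever, string-match reader, and QA pairs below are illustrative, not the paper's components or data):

```python
# Evaluate with the full index, then again after deleting every passage
# containing a gold answer; a retrieval-dependent system should collapse.
def retrieve(index, question, k=1):
    # crude word-overlap retriever (illustrative only)
    qwords = set(question.lower().split())
    return sorted(index, key=lambda d: -len(qwords & set(d.lower().split())))[:k]

def answer_from_context(index, question, gold):
    # toy reader: "answers correctly" iff the gold string is retrievable
    docs = retrieve(index, question)
    return any(gold.lower() in d.lower() for d in docs)

def accuracy(index, qa_pairs):
    return sum(answer_from_context(index, q, a) for q, a in qa_pairs) / len(qa_pairs)

qa = [("capital of france", "Paris"), ("highest mountain", "Everest")]
full = ["Paris is the capital of France", "Everest is the highest mountain"]
ablated = [d for d in full if not any(a.lower() in d.lower() for _, a in qa)]

print(accuracy(full, qa), accuracy(ablated, qa))  # 1.0 then 0.0
```

A parametric (non-retrieval) baseline would be evaluated the same way on both conditions; the claim predicts its accuracy stays flat.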
read the original abstract
Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples. We perform evaluations on a wide range of tasks, including MMLU, KILT and NaturalQuestions, and study the impact of the content of the document index, showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B parameters model by 3% despite having 50x fewer parameters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Atlas, a pre-trained retrieval-augmented language model designed for few-shot learning on knowledge-intensive tasks. It reports strong results on MMLU, KILT, and Natural Questions, with the headline claim that Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters. The work also examines the effects of varying document index content and demonstrates that the index can be updated post-training.
Significance. If the results hold after addressing the index-overlap concern, the paper would show that retrieval augmentation enables parameter-efficient few-shot transfer for knowledge tasks without requiring the model to store facts internally. The explicit study of index updates is a concrete strength, as it provides a mechanism for knowledge editing that scales independently of model size.
major comments (2)
- [§4.2] (Natural Questions experiments): the 42% accuracy result with 64 shots is presented without an ablation that holds the retrieval index fixed while removing any passage overlap or embedding similarity between the index and the 64 few-shot examples. The paper studies index content changes but does not isolate whether performance depends on distributional cues from the few-shot set itself.
- [§3.2] (model architecture and training): the interaction between the frozen or jointly trained retriever and the language model during few-shot adaptation is not quantified with respect to retrieval precision on the test distribution; this is load-bearing for the claim that external retrieval substitutes for internal parameter storage.
minor comments (2)
- [Table 2] Baseline comparisons to PaLM-540B and other models should report standard deviations across multiple few-shot seeds to establish whether the 3% margin is statistically reliable.
- [§5] (index ablations): the qualitative discussion of index updates would benefit from a quantitative metric such as retrieval recall@10 before and after the update on a held-out validation split.
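The seed-variance check requested in the first minor comment amounts to comparing the baseline margin against spread across few-shot seeds. A minimal sketch, with invented per-seed accuracies (these numbers are hypothetical, not from the paper):

```python
import statistics

# Report mean +/- std of few-shot accuracy across seeds and compare the
# spread against the claimed margin over the baseline. All numbers invented.
atlas_runs = [42.4, 41.9, 42.8, 42.1, 42.6]  # 64-shot NQ accuracy per seed
palm_540b = 39.6                             # single reported baseline score

mean = statistics.mean(atlas_runs)
std = statistics.stdev(atlas_runs)
margin = mean - palm_540b
print(f"Atlas: {mean:.2f} +/- {std:.2f}, margin over baseline: {margin:.2f}")
# A margin several stdevs wide would suggest the gap is not seed noise.
```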
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the contributions of our work on retrieval-augmented few-shot learning. We address each major comment below and will revise the manuscript to incorporate additional experiments and analysis as needed.
read point-by-point responses
-
Referee: [§4.2] (Natural Questions experiments): the 42% accuracy result with 64 shots is presented without an ablation that holds the retrieval index fixed while removing any passage overlap or embedding similarity between the index and the 64 few-shot examples. The paper studies index content changes but does not isolate whether performance depends on distributional cues from the few-shot set itself.
Authors: We agree that an explicit ablation isolating overlap or embedding similarity between the 64 few-shot examples and the retrieval index would strengthen the claim. In the revised manuscript we will add this experiment: we will construct a modified index that removes any passages with high embedding similarity (using the same retriever) or exact overlap with the few-shot set, then re-evaluate the 64-shot Natural Questions performance while keeping the index otherwise fixed. We expect the result to remain robust given the scale of the index relative to the tiny few-shot set, but the new ablation will directly address the concern about distributional cues. revision: yes
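The filtering step this rebuttal promises can be sketched as follows; the bag-of-words embedding, threshold, and example strings are toy stand-ins for the actual dense retriever and corpus:

```python
import numpy as np

# Drop index passages whose embedding cosine similarity to any few-shot
# example exceeds a threshold. `embed` is a toy stand-in for the retriever.
def embed(text: str, vocab: list[str]) -> np.ndarray:
    words = text.lower().split()
    vec = np.array([float(words.count(w)) for w in vocab])
    return vec / (np.linalg.norm(vec) + 1e-9)

def filter_index(index: list[str], shots: list[str], threshold: float = 0.9):
    vocab = sorted({w for t in index + shots for w in t.lower().split()})
    shot_vecs = [embed(s, vocab) for s in shots]
    return [p for p in index
            if max(float(embed(p, vocab) @ s) for s in shot_vecs) < threshold]

index = ["who wrote hamlet shakespeare wrote hamlet",
         "the moon orbits the earth"]
shots = ["who wrote hamlet"]
print(filter_index(index, shots))  # only the unrelated passage survives
```

Exact-overlap removal would be a stricter special case (threshold on string equality rather than cosine similarity).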
-
Referee: [§3.2] (model architecture and training): the interaction between the frozen or jointly trained retriever and the language model during few-shot adaptation is not quantified with respect to retrieval precision on the test distribution; this is load-bearing for the claim that external retrieval substitutes for internal parameter storage.
Authors: We appreciate this observation. While §3.2 describes the frozen versus jointly-trained retriever variants, we did not report retrieval precision metrics on the test distribution during few-shot adaptation. In the revision we will add these measurements (e.g., top-1 and top-5 passage accuracy where the gold answer appears in the retrieved context) for both settings on Natural Questions and KILT tasks. This will quantify how retrieval quality evolves during adaptation and directly support the interpretation that external retrieval can substitute for internal parameter storage. revision: yes
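The metric promised here (top-k passage accuracy, i.e. the fraction of questions whose gold answer string appears in the top-k retrieved passages) is simple to state precisely; the retrieved passages and gold answers below are hard-coded toy data, not model outputs:

```python
# Answer-in-context hit rate at k: for each question, check whether the gold
# answer string appears in any of the top-k retrieved passages.
def answer_hit_at_k(retrieved: list[list[str]], golds: list[str], k: int) -> float:
    hits = 0
    for passages, gold in zip(retrieved, golds):
        if any(gold.lower() in p.lower() for p in passages[:k]):
            hits += 1
    return hits / len(golds)

retrieved = [
    ["The Eiffel Tower is in Paris.", "France borders Spain."],
    ["Everest lies in the Himalayas.", "K2 is in the Karakoram."],
]
golds = ["Paris", "Karakoram"]
print(answer_hit_at_k(retrieved, golds, k=1),  # 0.5: only q1 hits at rank 1
      answer_hit_at_k(retrieved, golds, k=2))  # 1.0: q2 hits at rank 2
```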
Circularity Check
No significant circularity in derivation chain
full rationale
The paper's central claims rest on empirical few-shot evaluations of the retrieval-augmented model Atlas on public benchmarks such as Natural Questions, MMLU, and KILT. The retrieval index is treated as an external, updatable knowledge source whose content is explicitly studied, and no derivation reduces the performance metrics to fitted parameters by construction or renames known results. Because the setup draws on independent pre-training and external document collections, the reported accuracies (e.g., >42% on NQ with 64 shots) are falsifiable against benchmarks rather than tautological.
Forward citations
Cited by 22 Pith papers
-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
-
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
-
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.
-
PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning
PlantMarkerBench supplies 5,550 literature sentences annotated for plant marker gene evidence validity and type across Arabidopsis, maize, rice and tomato, showing frontier LLMs handle direct expression evidence but s...
-
PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning
PlantMarkerBench is a new multi-species benchmark with 5,550 evidence instances for evaluating language models on literature-grounded plant marker gene reasoning across expression, localization, function, indirect, an...
-
AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning
A single query-specific poisoned document, built by extracting and iteratively refining an adversarial chain-of-thought, can substantially degrade reasoning accuracy in retrieval-augmented LLM systems.
-
MMSearch-R1: Incentivizing LMMs to Search
MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting searc...
-
C-Pack: Packed Resources For General Chinese Embeddings
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
-
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...
-
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-paramete...
-
Context Matters: Evaluating Context Strategies for Automated ADR Generation Using LLMs
A small recency window of 3-5 prior ADRs as context produces higher-fidelity LLM-generated Architecture Decision Records than no context, full history, or retrieval-augmented selection in typical sequential workflows.
-
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.
-
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
-
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
-
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.
-
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering close the gap.
-
FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory
FSFM is a biologically-inspired selective forgetting framework for LLM agents that claims to boost access efficiency by 8.49%, content quality by 29.2% signal-to-noise, and eliminate security risks entirely through a ...
-
DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation
DALM is a proposed language model architecture that enforces algebraic constraints via a three-phase process over domain lattices to prevent cross-domain knowledge contamination during generation.
-
Retrieval-Augmented Generation for AI-Generated Content: A Survey
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.
-
Memory as Metabolism: A Design for Companion Knowledge Systems
This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...
-
Mitigating Hallucination on Hallucination in RAG via Ensemble Voting
VOTE-RAG applies retrieval voting across diverse queries and response voting across independent generations to mitigate hallucination-on-hallucination in RAG, matching or exceeding complex baselines on six benchmarks ...