Recognition: 3 Lean theorem links
Unsupervised Dense Information Retrieval with Contrastive Learning
Pith reviewed 2026-05-12 13:15 UTC · model grok-4.3
The pith
Contrastive learning on unlabeled text trains dense retrievers that outperform BM25 on most BEIR datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only.
What carries the argument
Contrastive learning on dense neural encoders, trained solely on unlabeled text, produces vector representations whose similarities reflect semantic relevance.
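As a minimal sketch of this machinery, the snippet below encodes queries and documents with a shared encoder and scores them by dot product; the checkpoint name and mean pooling are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal bi-encoder retrieval sketch (illustrative; the checkpoint name and
# mean pooling are assumptions, not the paper's exact configuration).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Encode a batch of texts into one dense vector each via mean pooling.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (B, H)

queries = ["what is dense retrieval?"]
documents = ["Dense retrievers encode text into vectors.",
             "BM25 is a term-frequency baseline."]

# Relevance score = dot product between query and document embeddings.
scores = embed(queries) @ embed(documents).T               # (1, 2)
ranking = scores.argsort(dim=-1, descending=True)          # best match first
```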
If this is right
- Such unsupervised dense retrievers become practical for new applications lacking labeled data.
- Using the contrastive model as pre-training improves fine-tuned performance on both small domain-specific sets and large collections like MS MARCO.
- The same models deliver strong results in multilingual retrieval and support cross-lingual transfer even when fine-tuned only on English data.
- Cross-script retrieval becomes possible, such as finding English documents from Arabic queries, which term-based methods cannot do.
Where Pith is reading between the lines
- If the contrastive representations capture semantics reliably, they could extend to other retrieval-adjacent tasks like question answering or clustering without labels.
- Hybrid systems combining these dense vectors with sparse BM25 scores might yield further gains on heterogeneous data (a toy fusion sketch follows this list).
- Applying the method to even lower-resource languages or domains would test how far the unsupervised advantage reaches.
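As a toy illustration of the hybrid idea above, one simple fusion is a weighted sum of min-max-normalized BM25 and dense scores. The function below is hypothetical: the weight alpha and the normalization scheme are assumptions, not something the paper evaluates.

```python
# Hypothetical hybrid scoring: linear interpolation of min-max normalized
# BM25 and dense scores. The fusion weight alpha is an assumption; the
# review only suggests that hybrids "might yield further gains".
import numpy as np

def hybrid_scores(bm25, dense, alpha=0.5):
    def norm(x):
        # Min-max normalize one query's candidate scores to [0, 1].
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return alpha * norm(bm25) + (1 - alpha) * norm(dense)

# Example: fuse scores for three candidate documents of one query.
print(hybrid_scores([12.3, 8.1, 0.5], [0.71, 0.69, 0.12]))
```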
Load-bearing premise
That the similarities learned by contrastive training on unlabeled text match human notions of relevance closely enough to beat term-frequency methods across diverse datasets and languages.
What would settle it
A new retrieval dataset or language pair on which the unsupervised contrastive model achieves lower Recall@100 than BM25 would indicate the performance does not generalize as claimed.
Original abstract
Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes training dense retrievers in a fully unsupervised manner via contrastive learning on unlabeled text corpora, using in-batch negatives. It reports that the resulting model outperforms the BM25 baseline on Recall@100 for 11 of 15 datasets in the BEIR zero-shot benchmark. The authors further show that the same unsupervised model improves downstream performance when used as pre-training before fine-tuning on either a few thousand in-domain examples or the full MS MARCO collection, and that it yields strong results for multilingual retrieval and cross-lingual transfer (including English-to-Swahili and cross-script Arabic-to-English).
Significance. If the reported numbers hold under full experimental verification, the work is significant because it supplies concrete evidence that contrastive objectives applied to raw text can produce dense representations whose similarity aligns with relevance well enough to beat a strong term-frequency baseline across heterogeneous domains and languages. The multilingual and cross-lingual results are particularly valuable for low-resource settings. The evaluation on the public BEIR suite and the explicit pre-training gains constitute reproducible, falsifiable claims that advance the practical deployment of dense retrievers without labeled data.
Major comments (2)
- [§3] §3 (contrastive training details): the central empirical claim depends on the precise construction of in-batch negatives and the choice of temperature and batch size; without these hyperparameters and the exact unlabeled corpus statistics, it is impossible to verify that the reported Recall@100 gains are attributable to the contrastive objective rather than to corpus or implementation specifics.
- [Table 1] Table 1 (BEIR results): the 11/15 outperformance claim is load-bearing, yet the table does not report variance across random seeds or the exact negative-sampling procedure per dataset; a single-run comparison leaves open the possibility that the margin over BM25 is within noise on several of the 11 datasets.
Minor comments (2)
- [Abstract] The abstract states that the model 'outperforms BM25 on 11 out of 15 datasets' but does not name the four datasets where it does not; adding this information would improve clarity.
- [Method] Notation for the contrastive loss (Eq. 1 or equivalent) uses 'in-batch negatives' without an explicit formula; a short equation would remove ambiguity about whether hard negatives or only random in-batch negatives are used (a conventional formulation is sketched below).
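For orientation, the in-batch InfoNCE objective this comment refers to is conventionally written as follows; this is a standard formulation, not necessarily the paper's exact equation.

```latex
% Conventional in-batch InfoNCE loss (a standard formulation; the paper's
% actual Eq. 1 may differ). Here q_i and d_i are the encoded query and its
% positive document, s(q, d) = q^\top d is the dot-product score, \tau is
% the temperature, and the other N - 1 documents in the batch serve as
% random negatives.
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\bigl(s(q_i, d_i)/\tau\bigr)}
            {\sum_{j=1}^{N} \exp\bigl(s(q_i, d_j)/\tau\bigr)}
```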
Simulated Author's Rebuttal
We thank the referee for the constructive review and recommendation of minor revision. We address each major comment below with clarifications and commitments to revisions that strengthen reproducibility without altering the core claims.
Point-by-point responses
Referee: [§3] §3 (contrastive training details): the central empirical claim depends on the precise construction of in-batch negatives and the choice of temperature and batch size; without these hyperparameters and the exact unlabeled corpus statistics, it is impossible to verify that the reported Recall@100 gains are attributable to the contrastive objective rather than to corpus or implementation specifics.
Authors: We agree that these implementation details are necessary for verification and reproducibility. In the revised manuscript we will expand Section 3 to report the exact batch size, temperature value, the construction of in-batch negatives (other in-batch examples serve as negatives for each positive pair; see the sketch after these responses), and the precise statistics of the unlabeled corpus used for training (size, source, and preprocessing). These additions will make it possible to confirm that performance differences arise from the contrastive objective. revision: yes
Referee: [Table 1] Table 1 (BEIR results): the 11/15 outperformance claim is load-bearing, yet the table does not report variance across random seeds or the exact negative-sampling procedure per dataset; a single-run comparison leaves open the possibility that the margin over BM25 is within noise on several of the 11 datasets.
Authors: We will revise the text and table caption to explicitly describe the negative-sampling procedure, which applies the same in-batch negative construction uniformly to every dataset. We acknowledge that multi-seed variance would increase statistical confidence. Our experiments used a single training run per configuration owing to the high computational cost of large-batch contrastive pre-training. The margins over BM25 are substantial on the majority of the 11 datasets, making noise an unlikely explanation, yet we will add an explicit discussion of this limitation in the revised paper. revision: partial
- Not addressed in revision: reporting variance across random seeds for the unsupervised models in Table 1, as multiple independent training runs were not performed.
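To make the uniformly applied in-batch construction described in these responses concrete, here is a minimal PyTorch-style sketch of one InfoNCE training step; the temperature, batch size, and random stand-in embeddings are assumptions, not the paper's reported values.

```python
# Minimal in-batch InfoNCE training step (schematic only; the temperature,
# batch size, and random stand-in embeddings are assumptions, not the
# paper's reported values).
import torch
import torch.nn.functional as F

def info_nce_step(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (B, H) tensors where row i forms a positive pair.

    Every other document in the batch acts as a negative for query i, so
    the target for row i of the score matrix is simply index i.
    """
    scores = query_emb @ doc_emb.T / temperature   # (B, B) similarity matrix
    labels = torch.arange(query_emb.size(0))       # positives on the diagonal
    return F.cross_entropy(scores, labels)

# Toy usage: random tensors stand in for the two encoder outputs.
B, H = 8, 128
q = torch.randn(B, H, requires_grad=True)
d = torch.randn(B, H, requires_grad=True)
loss = info_nce_step(q, d)
loss.backward()  # gradients flow back to the (stand-in) embeddings
```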
Circularity Check
No significant circularity
Full rationale
The paper is an empirical study applying standard contrastive learning (in-batch negatives on unlabeled text) to produce dense retrievers. All performance claims are measured on the external held-out BEIR benchmark (and cross-lingual sets) that play no role in training or hyperparameter selection. No derivation, equation, or self-citation reduces the central result to a fitted quantity or to a prior result by the same authors; the contrastive objective and evaluation protocol are independent of the reported Recall@100 numbers.
Lean theorems connected to this paper
- Cost.FunctionalEquation.washburn_uniqueness_aczel · unclear
  Unclear: relation between the paper passage and the cited Recognition theorem.
  "We explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100."
- Foundation.LawOfExistence.existence_economically_inevitable · echoes
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  "Contrastive learning is an approach that relies on the fact that every document is, in some way, unique. This signal is the only information available in the absence of manual supervision."
- Foundation.HierarchyEmergence.hierarchy_emergence_forces_phi · unclear
  Unclear: relation between the paper passage and the cited Recognition theorem.
  "The relevance score between a query and a document is given by the dot product between their representations after applying the encoder."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 32 Pith papers
- REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
  REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...
- MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
  MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmen...
- MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
  MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.
- Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders
  EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.
- Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
  MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
- Skill Retrieval Augmentation for Agentic AI
  Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.
- Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion
  RC-RAG boosts long-tail relation completion by infusing paraphrases into RAG stages, yielding up to 40.6 EM gains on benchmarks across five LLMs with no fine-tuning.
- Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation
  An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.
- HaS: Accelerating RAG through Homology-Aware Speculative Retrieval
  HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.
- Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
  DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to induce soft-failure responses in RAG systems, achieving over 79% soft attack success rate with under 15% hard failures and high stealt...
- C-Pack: Packed Resources For General Chinese Embeddings
  C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
- SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory
  SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...
- The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
  Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
- Reproducing Complex Set-Compositional Information Retrieval
  Neural retrievers that double BM25 performance on QUEST collapse below 0.02 Recall@100 on the new LIMIT+ benchmark while lexical methods reach 0.96, with all methods degrading as compositional depth increases.
- RAG over Thinking Traces Can Improve Reasoning Tasks
  RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.
- Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning
  Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.
- Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation
  CoRM-RAG uses a cognitive perturbation protocol to simulate biases and trains an Evidence Critic to retrieve documents that support correct decisions even under adversarial query changes.
- CleanBase: Detecting Malicious Documents in RAG Knowledge Databases
  CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.
- UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
  UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.
- ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval
  ARHN refines hard-negative training data for dense retrieval by using LLMs to convert answer-containing passages into additional positives and exclude answer-containing passages from the negative set.
- Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory
  ACGM learns task-adaptive sparse graphs over multi-modal agent histories via policy-gradient optimization, reaching 82.7 nDCG@10 and 89.2% Precision@10 on WebShop, VisualWebArena, and Mind2Web while outperforming 19 b...
- Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
  Personalized RewardBench reveals that state-of-the-art reward models reach only 75.94% accuracy on personalized preferences and shows stronger correlation with downstream BoN and PPO performance than prior benchmarks.
- Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
  Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.
- Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity
  Generative retrieval beats dense retrieval and BM25 on the LIMIT dataset but degrades with hard negatives due to identifier ambiguity during decoding.
- Are LLM-Based Retrievers Worth Their Cost? An Empirical Study of Efficiency, Robustness, and Reasoning Overhead
  Empirical comparison across 14 retrievers on the BRIGHT benchmark shows reasoning-specialized models can match strong accuracy with competitive speed while many large LLM bi-encoders add latency for small gains and co...
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
  NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.
- MemGPT: Towards LLMs as Operating Systems
  MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.
- RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering
  RECIPER improves procedure-oriented retrieval from materials papers by combining paragraph-level dense retrieval with LLM-extracted procedural summaries and lightweight reranking, yielding average gains of +3.73 Recal...
- Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions
  ConstBERT and ColBERT-v2 reproduce on MS-MARCO but drop 86-97% on long queries because MaxSim cannot filter filler noise, and extra fine-tuning or backend changes do not overcome the architectural constraint.
- Multilingual E5 Text Embeddings: A Technical Report
  Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.
- Text Embeddings by Weakly-Supervised Contrastive Pre-training
  E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.
- Galactica: A Large Language Model for Science
  Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.