ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia · 2021 · arXiv 2112.01488

12 Pith papers cite this work.

representative citing papers

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

cs.CL · 2023-10-05 · conditional · novelty 8.0

DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization

cs.CL · 2025-10-06 · unverdicted · novelty 7.0

GQR is a test-time optimization technique that refines primary retriever query embeddings using complementary retriever scores to achieve high performance with smaller representations in multimodal visual document retrieval.

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation

cs.IR · 2026-04-14 · unverdicted · novelty 6.0

ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.

EmbeddingGemma: Powerful and Lightweight Text Representations

cs.CL · 2025-09-24 · unverdicted · novelty 6.0

EmbeddingGemma is a 300M-parameter open embedding model that sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

cs.IR · 2025-09-16 · conditional · novelty 6.0

LEAF distills teacher-aligned student embedding models that achieve new SOTA results on BEIR and MTEB for their size class while requiring only modest data and compute.

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

cs.LG · 2025-04-28 · unverdicted · novelty 6.0

TurboQuant achieves near-optimal vector quantization distortion for both MSE and inner products via random rotation and per-coordinate scalar quantization, with a formal proof that it matches lower bounds within a factor of approximately 2.7.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

cs.CL · 2024-01-31 · unverdicted · novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Text and Code Embeddings by Contrastive Pre-Training

cs.CL · 2022-01-24 · unverdicted · novelty 6.0

Contrastive pre-training on unsupervised data at scale creates text and code embeddings that set new state-of-the-art results on classification and semantic search benchmarks.

MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution

cs.LG · 2025-12-05 · unverdicted · novelty 5.0

MaxShapley computes fair document attributions in generative QA by reducing Shapley value calculation to polynomial time via a max-sum utility, matching exact Shapley quality on HotpotQA, MuSiQue, and MS MARCO while using up to 9x fewer resources.

Chronological Knowledge Retrieval: A Retrieval-Augmented Generation Approach to Construction Project Documentation

cs.CL · 2026-03-25 · unverdicted · novelty 4.0

A RAG framework integrates semantic search and LLMs to deliver time-annotated answers to natural-language questions on construction project meeting minutes, demonstrated on an industry dataset with public code and data release.

Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

cs.IR · 2026-05-01 · conditional · novelty 3.0

Structured negative mining with taxonomy and LLM judges improves offline category accuracy by 2.6% in IKEA search but yields no significant online engagement gains due to prevalent zero-click user behavior.

citing papers explorer

Showing 12 of 12 citing papers.

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
cs.CL · 2023-10-05 · conditional · none · ref 47

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
cs.CL · 2026-05-01 · unverdicted · none · ref 62

Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
cs.CL · 2025-10-06 · unverdicted · none · ref 29

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation
cs.IR · 2026-04-14 · unverdicted · none · ref 4

EmbeddingGemma: Powerful and Lightweight Text Representations
cs.CL · 2025-09-24 · unverdicted · none · ref 19

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations
cs.IR · 2025-09-16 · conditional · none · ref 32

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
cs.LG · 2025-04-28 · unverdicted · none · ref 46

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
cs.CL · 2024-01-31 · unverdicted · none · ref 106

Text and Code Embeddings by Contrastive Pre-Training
cs.CL · 2022-01-24 · unverdicted · none · ref 19

MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution
cs.LG · 2025-12-05 · unverdicted · none · ref 78

Chronological Knowledge Retrieval: A Retrieval-Augmented Generation Approach to Construction Project Documentation
cs.CL · 2026-03-25 · unverdicted · none · ref 16

Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
cs.IR · 2026-05-01 · conditional · none · ref 13
