Pith

arxiv: 2601.13525 · v2 · pith:KEHXG5EK · submitted 2026-01-20 · 💻 cs.IR

More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval

classification 💻 cs.IR
keywords domain, adaptation, embeddings, retrieval, compression, dense, efficiency, embedding

Dense retrievers built on pretrained embeddings are widely used for document retrieval but struggle in specialized domains because of the mismatch between the training and target domain distributions. Domain adaptation typically requires costly annotation of query-document pairs followed by retraining. In this work, we revisit an overlooked alternative: applying PCA to domain embeddings to derive lower-dimensional representations that preserve domain-relevant features while discarding non-discriminative components. Although PCA is traditionally used for efficiency, we show that this simple form of embedding compression can also improve retrieval performance. Evaluated across 9 retrievers and 14 MTEB datasets, PCA applied solely to query embeddings improves NDCG@10 in 75.4% of model-dataset pairs, offering a simple and lightweight method for domain adaptation.
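
The abstract does not spell out the exact procedure, so what follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that PCA is fit on a sample of target-domain document embeddings, that each query embedding is projected onto the top-k principal subspace and mapped back to the original dimension, that document embeddings are left unchanged, and that NDCG@10 uses linear gains (the relevance grade itself). The function names `fit_pca`, `compress_queries`, and `ndcg_at_10`, as well as any choice of k, are hypothetical.

```python
import numpy as np


def fit_pca(domain_embeddings: np.ndarray, k: int):
    """Return the mean and the top-k principal directions (as rows) of the domain embeddings.

    Assumption: PCA is fit on target-domain document embeddings.
    """
    mean = domain_embeddings.mean(axis=0)
    centered = domain_embeddings - mean
    # Rows of vt are orthonormal principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def compress_queries(queries: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project queries onto the top-k subspace, then map them back to the original dimension.

    Only queries are modified; documents keep their original embeddings, so
    the reconstructed queries can be scored directly against the existing index.
    This is one hypothetical reading of "PCA applied solely to query embeddings".
    """
    coords = (queries - mean) @ components.T  # shape (n_queries, k)
    return coords @ components + mean         # shape (n_queries, d)


def ndcg_at_10(scores: np.ndarray, relevance: np.ndarray) -> float:
    """NDCG@10 for a single query with linear gains and a log2 position discount."""
    k = 10
    order = np.argsort(-scores)[:k]                    # top-10 documents by score
    discounts = 1.0 / np.log2(np.arange(2, k + 2))     # 1/log2(rank + 1) for ranks 1..10
    gains = relevance[order]
    dcg = float(np.sum(gains * discounts[: len(gains)]))
    ideal = np.sort(relevance)[::-1][:k]               # best possible ordering
    idcg = float(np.sum(ideal * discounts[: len(ideal)]))
    return dcg / idcg if idcg > 0 else 0.0
```

In use, one would fit on corpus embeddings `docs`, score each query with `compress_queries(queries, mean, comps) @ docs.T`, and compare per-query `ndcg_at_10` against the scores from the uncompressed queries. Mapping the compressed queries back to the original dimension is a design choice that keeps the document index untouched.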

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Spectral Tempering for Embedding Compression in Dense Passage Retrieval

    cs.IR 2026-03 unverdicted novelty 7.0

    Spectral Tempering derives an adaptive scaling factor γ(k) from the embedding eigenspectrum via local SNR analysis and knee-point normalization to achieve near-optimal compression without training or validation.