When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression

Hayato Tsukagoshi; Riku Kisako; Ryohei Sasano

arxiv: 2606.01074 · v1 · pith:YZZGPREBnew · submitted 2026-05-31 · 💻 cs.CL


Pith reviewed 2026-06-28 17:19 UTC · model grok-4.3

classification 💻 cs.CL

keywords text embeddings · dimensionality reduction · quantization · embedding compression · MTEB tasks · performance evaluation · vector storage


The pith

Combining dimensionality reduction and quantization can compress text embeddings to as little as 0.1% of their original size with almost no performance loss in some of the tested settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether applying dimensionality reduction and quantization together produces stronger compression of high-dimensional text embeddings than either technique used separately. Experiments cover four MTEB task families and four pretrained embedding models. Results indicate that the paired methods reach compression ratios as low as 0.1 percent of original size while preserving task performance, and that the best strategy varies by task. High-dimensional embeddings create large storage and compute costs, so effective compression would lower barriers to deploying these models at scale. The work focuses on empirical behavior rather than new theory or algorithms.

Core claim

The experimental results demonstrate that combining dimensionality reduction and quantization enables substantially stronger compression than using either method alone, that in some settings embeddings can be reduced to as little as 0.1% of their original size with almost no performance degradation, and that the optimal compression strategy depends on the task.
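To make the 0.1% figure concrete, the sketch below computes the size of a compressed embedding as a fraction of a float32 original. The specific numbers (a 4096-dimensional float32 vector, 64 retained dimensions, 2 bits per value) are illustrative assumptions, not settings reported by the paper.

```python
def compression_ratio(orig_dim, new_dim, new_bits, orig_bits=32):
    """Return the compressed size as a fraction of the original size."""
    return (new_dim * new_bits) / (orig_dim * orig_bits)

# Hypothetical setting: 4096-dim float32 -> 64 dims at 2 bits each.
both = compression_ratio(orig_dim=4096, new_dim=64, new_bits=2)

# The same two knobs used one at a time (still hypothetical numbers).
dims_only = compression_ratio(orig_dim=4096, new_dim=64, new_bits=32)
bits_only = compression_ratio(orig_dim=4096, new_dim=4096, new_bits=1)

print(f"combined:  {both:.4%}")       # 0.0977%
print(f"dims only: {dims_only:.4%}")  # 1.5625%
print(f"bits only: {bits_only:.4%}")  # 3.1250%
```

Under these assumptions, per-value quantization of a float32 vector cannot go below 1/32 of the original size on its own, so reaching roughly 0.1% requires cutting dimensions as well.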

What carries the argument

Applying dimensionality reduction (shortening the vector) and then quantization (lowering the precision of each value) to fixed text embedding vectors from pretrained models.
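A minimal sketch of such a two-stage pipeline follows. It assumes that "Head-based" reduction means keeping the leading dimensions, and that an equal-count lookup-table quantizer means quantile bins shared across all dimensions with one stored value per bin; both readings are inferred from the figure captions and may differ from the paper's implementation. The function names are hypothetical and the input data are random stand-ins, not real embeddings.

```python
import bisect
import random

def head_reduce(vec, k):
    """Assumed Head-based reduction: keep only the first k dimensions."""
    return vec[:k]

def fit_equal_count_quantizer(values, bits):
    """Fit one global lookup table with 2**bits equal-count (quantile) bins.

    Returns (edges, table): interior bin edges and the mean of each bin,
    which is used as that bin's representative value (an assumption).
    """
    levels = 2 ** bits
    ordered = sorted(values)
    n = len(ordered)
    bounds = [i * n // levels for i in range(levels + 1)]
    edges = [ordered[b] for b in bounds[1:-1]]
    table = [sum(ordered[lo:hi]) / (hi - lo) for lo, hi in zip(bounds, bounds[1:])]
    return edges, table

def quantize(vec, edges):
    """Map each value to the integer code of the bin it falls into."""
    return [bisect.bisect_right(edges, x) for x in vec]

def dequantize(codes, table):
    """Replace each code by its bin's representative value."""
    return [table[c] for c in codes]

random.seed(0)
# Stand-in data: random Gaussian vectors, NOT real model embeddings.
embeddings = [[random.gauss(0, 1) for _ in range(256)] for _ in range(200)]

reduced = [head_reduce(v, 32) for v in embeddings]            # stage 1
pool = [x for v in reduced for x in v]
edges, table = fit_equal_count_quantizer(pool, bits=2)         # fit once
codes = [quantize(v, edges) for v in reduced]                  # stage 2
approx = dequantize(codes[0], table)                           # decode one vector
```

In this toy configuration each vector is stored as 32 codes of 2 bits, which is 64 bits against 8192 bits for the 256-dimensional float32 original, or about 0.78%.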

If this is right

Storage systems could index orders of magnitude more embeddings without added hardware.
Task-specific compression pipelines would replace uniform approaches in production retrieval setups.
Resource-limited devices could run embedding-based applications that currently require full-precision vectors.
Performance on classification, clustering, retrieval, and STS tasks can stay close to baseline under aggressive joint compression, at least in some settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Embedding models may carry redundant capacity in both dimension count and numeric precision for many current uses.
Training pipelines could add compression objectives so that models are optimized for size from the start.
Dynamic selection of reduction-plus-quantization settings per query type could become a standard runtime step.

Load-bearing premise

The four MTEB task families and four pretrained embedding models chosen are representative enough that the observed compression behavior and task-dependent optima will generalize to other embedding models and real-world applications not covered by these benchmarks.

What would settle it

Applying the same compression ratios to a new embedding model or task family outside the four tested ones and measuring clear performance drops at the 0.1% size level.

Figures

Figures reproduced from arXiv: 2606.01074 by Hayato Tsukagoshi, Riku Kisako, Ryohei Sasano.

**Figure 2.** Performance of Head-based dimensionality…

**Figure 4.** Performance of PCA without random orthog…

**Figure 5.** Effect of fixed quantization formats on gte-Qwen2 with Head-based dimensionality reduction.

Text from the paper captured alongside the figure: "… PCA-based dimensionality reduction with low-bit quantization. We observed the same qualitative tendency across models, indicating that the instability of PCA-only compression is not specific to gte-Qwen2." Section 5.2, "Sensitivity to Quantization Format", begins: "Our main experiments use a global equal-count lookup-table quantizer for…"

**Figure 6.** Performance of Head-based dimensionality reduction (top) and PCA+ROR (bottom) for Qwen3-Embedding across different bit-widths and task types. Panels: classification, clustering, retrieval, sts. x-axis: log2(dim x bits); y-axis: Score (%); one series per dimension (dim=2, dim=4, dim=8, …).

**Figure 8.** Performance of Head-based dimensionality…

Original abstract

Recent high-performing text embedding models often output high-dimensional real-valued vectors, resulting in substantial storage and computational costs. To address this issue, compression methods based on dimensionality reduction or quantization have been proposed; however, the effects of combining dimensionality reduction and quantization have not been sufficiently investigated. In this paper, we systematically examine the effectiveness of compressing text embeddings by combining dimensionality reduction and quantization, using four MTEB task families and four pretrained embedding models. The experimental results demonstrate that combining dimensionality reduction and quantization enables substantially stronger compression than using either method alone, that in some settings embeddings can be reduced to as little as 0.1% of their original size with almost no performance degradation, and that the optimal compression strategy depends on the task.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper shows combining dimensionality reduction and quantization can reach 0.1% embedding size with little loss on some MTEB tasks, but the results stay tied to four models and four task families.


The main thing to know is that stacking the two compression steps beats using either one alone in their tests, and the right mix shifts depending on the task.

The new part is the systematic look at how the techniques interact. The abstract points out that this combination has not been checked enough before, so running it across four pretrained models and four MTEB task families adds some fresh coverage on when the pair delivers bigger gains.

They handle the comparison cleanly enough to show the combined approach gives stronger size-to-performance trade-offs than the separate methods. The observation that optima are task-dependent is also straightforward and practical.

The clear soft spot is scope. All the numbers come from just those four models and four task families. If the way reduced dimensions affect quantization error or how the vector statistics line up changes for other models or task distributions, the 0.1% claim and the task-specific advice will not carry over. That matches the stress-test concern on generalization.

This is for teams that ship embedding models and need concrete numbers on storage cuts. A reader who works on production compression for retrieval or classification tasks will find usable guidance.

It deserves a serious referee. The empirical question is relevant and the setup is direct, even with the narrow test bed. I would send it for review.

Referee Report

2 major / 1 minor

Summary. The paper claims that combining dimensionality reduction and quantization enables substantially stronger compression of text embeddings than either technique alone. Experiments across four MTEB task families and four pretrained embedding models show that embeddings can be reduced to as little as 0.1% of original size with almost no performance degradation in some settings, and that the optimal compression strategy is task-dependent.

Significance. If the reported synergies and 0.1% compression results hold under a complete evaluation protocol with proper statistical controls, the work would offer practical value for deploying high-dimensional embedding models under storage and compute constraints. The empirical focus on interactions between the two compression stages is a clear strength.

major comments (2)

[Abstract and experimental sections] The central claims rest on experiments limited to four MTEB task families and four pretrained models. The interaction between dimensionality reduction and quantization (e.g., how reduced dimensionality affects quantization error) may differ for other model families or task distributions, undermining the generality of the task-dependent optima and 0.1% compression findings.
[Experimental protocol (throughout results sections)] No error bars, statistical tests, data exclusion rules, or full evaluation details are described, making it impossible to determine whether the 0.1% size claim with negligible degradation rests on post-hoc selection or holds under the full protocol.
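One way the requested error bars could be produced is a paired percentile bootstrap over per-task scores, sketched below. The function name and the score values are hypothetical and invented purely for illustration; the paper does not describe using this procedure.

```python
import random
import statistics

def paired_bootstrap_ci(baseline, compressed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-task score difference.

    Each task contributes one paired difference (compressed - baseline);
    tasks are resampled with replacement.
    """
    rng = random.Random(seed)
    diffs = [c - b for b, c in zip(baseline, compressed)]
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lo, hi

# Hypothetical per-task scores (%), NOT taken from the paper.
baseline = [71.2, 64.5, 58.9, 80.1, 49.3, 66.7]
compressed = [70.8, 64.9, 57.5, 79.6, 48.8, 66.1]

mean_diff, lo, hi = paired_bootstrap_ci(baseline, compressed)
print(f"mean change: {mean_diff:+.2f} points, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

An interval that excludes zero would indicate a measurable drop even when the mean change looks small, which is the kind of evidence the comment asks for.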

minor comments (1)

[Abstract] The abstract uses the phrase 'almost no performance degradation' without defining the threshold or reporting the exact metric values used to support it.
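A stated threshold could be as simple as a maximum relative drop from the uncompressed score. The sketch below shows one such definition; the 1% tolerance and the function names are assumptions made for illustration, not values or terms used by the paper.

```python
def relative_drop(baseline_score, compressed_score):
    """Performance loss as a fraction of the baseline score."""
    return (baseline_score - compressed_score) / baseline_score

def within_tolerance(baseline_score, compressed_score, max_rel_drop=0.01):
    """True if the loss stays within the tolerance (assumed here to be 1%)."""
    return relative_drop(baseline_score, compressed_score) <= max_rel_drop

# Hypothetical scores (%), chosen only to exercise both outcomes.
print(within_tolerance(70.0, 69.5))  # True: about a 0.7% relative drop
print(within_tolerance(70.0, 68.0))  # False: about a 2.9% relative drop
```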

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. Below we provide point-by-point responses to the major comments, indicating the revisions we intend to make to address the concerns raised.


Referee: [Abstract and experimental sections] The central claims rest on experiments limited to four MTEB task families and four pretrained models. The interaction between dimensionality reduction and quantization (e.g., how reduced dimensionality affects quantization error) may differ for other model families or task distributions, undermining the generality of the task-dependent optima and 0.1% compression findings.

Authors: We recognize the limitation in the breadth of our experimental evaluation. Our selection of four task families and four models aimed to provide initial insights into the combined compression effects across representative settings. To address the referee's concern, we will revise the abstract and add a limitations section to the manuscript that explicitly notes that the observed synergies and task-dependent optima may not generalize to all model families or task distributions. This will include a discussion of how reduced dimensionality might affect quantization error in other contexts.
Revision: partial

Referee: [Experimental protocol (throughout results sections)] No error bars, statistical tests, data exclusion rules, or full evaluation details are described, making it impossible to determine whether the 0.1% size claim with negligible degradation rests on post-hoc selection or holds under the full protocol.

Authors: We agree that the experimental protocol requires more detailed reporting to allow proper assessment of the results. In the revised manuscript, we will include a thorough description of the full evaluation protocol, data exclusion rules, and any preprocessing steps. We will also add error bars to the performance metrics (e.g., based on task-level variance) and conduct statistical tests where appropriate to support the claims of minimal degradation at 0.1% compression. This will help clarify that the findings are not an artifact of post-hoc selection.
Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical measurements on fixed benchmarks


The paper performs direct experiments: it applies dimensionality reduction and quantization to embeddings from four fixed pretrained models, evaluates on four MTEB task families, and reports observed size/performance trade-offs. There are no equations, fitted parameters presented as predictions, self-citations used to justify uniqueness, or derivations that reduce results to prior fitted quantities. The central claims are statements about the measured outcomes on the chosen benchmarks; they do not rely on any internal reduction or self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a purely empirical study; no mathematical derivations, free parameters, axioms, or new postulated entities are introduced.



