Spectral Tempering for Embedding Compression in Dense Passage Retrieval
Pith reviewed 2026-05-15 08:54 UTC · model grok-4.3 · Recognition: 1 Lean theorem link
The pith
Spectral Tempering derives an adaptive scaling factor from the embedding spectrum to compress dense retrieval vectors without training or tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The optimal scaling strength γ for spectral reweighting of retrieval embeddings varies systematically with the target dimensionality k and is governed by the signal-to-noise ratio of the retained subspace; Spectral Tempering estimates this strength from local SNR analysis and knee-point normalization performed solely on the corpus eigenspectrum, yielding a learning-free, model-agnostic γ(k) that matches the performance of grid-searched optima.
What carries the argument
Spectral Tempering (SpecTemp), which computes an adaptive γ(k) by local SNR analysis and knee-point normalization on the corpus eigenspectrum to set the scaling strength for each target dimensionality.
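For concreteness, spectral scaling reweights the principal directions of the corpus by a power of their eigenvalues. A minimal sketch in Python (NumPy), assuming the common parameterization in which direction i is scaled by λ_i^(−γ/2), so that γ = 0 recovers plain PCA truncation and γ = 1 recovers whitening; the paper's exact convention may differ:

```python
import numpy as np

def spectral_scale(X, k, gamma):
    """Compress row-vector embeddings X to k dimensions by PCA
    projection, then reweight direction i by lam_i ** (-gamma / 2).
    gamma = 0 -> plain PCA truncation; gamma = 1 -> whitening.
    (Assumed convention; the paper's definition may differ.)"""
    Xc = X - X.mean(axis=0)                  # center the corpus
    cov = (Xc.T @ Xc) / len(Xc)              # corpus covariance
    lam, V = np.linalg.eigh(cov)             # eigenvalues, ascending
    idx = np.argsort(lam)[::-1][:k]          # top-k eigenpairs
    lam, V = np.clip(lam[idx], 1e-12, None), V[:, idx]
    return (Xc @ V) * lam ** (-gamma / 2.0)  # (n, k) tempered codes
```

SpecTemp's contribution sits on top of this family: it chooses γ per target dimensionality k from the spectrum alone, rather than fixing γ by grid search.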
If this is right
- Dimensionality reduction for dense retrieval no longer requires per-task hyperparameter search or labeled validation sets.
- The same procedure applies unchanged to embeddings produced by any model, because it uses only the corpus eigenspectrum.
- Compressed vectors retain near-oracle retrieval quality for every chosen output dimension.
- Storage and query latency in production retrieval systems can be reduced while preserving accuracy without additional training.
Where Pith is reading between the lines
- The same spectrum-driven adaptation could be tested on other embedding tasks such as sentence clustering or recommendation ranking.
- If the SNR-knee pattern proves stable across domains, the method offers a template for making other post-hoc compression techniques parameter-free.
- The approach suggests that many high-dimensional embedding spaces share a predictable decay structure that can be exploited without supervision.
Load-bearing premise
The claim that the optimal gamma can be recovered accurately from the eigenspectrum alone via local SNR analysis and knee-point normalization without any task labels.
What would settle it
On a held-out query set, apply the derived γ(k) to compress embeddings and measure retrieval metrics; if accuracy falls below that of a simple fixed-γ baseline or PCA at the same k, the adaptive estimation fails. A sketch of this comparison follows.
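The sketch below assumes query and passage embeddings have already been compressed two ways at the same k (adaptive γ(k) versus a PCA or fixed-γ baseline, both fitted on corpus embeddings only) and uses a simplified single-positive relevance setup; real benchmarks would use graded qrels with nDCG@10 or MRR:

```python
import numpy as np

def recall_at_10(q_emb, d_emb, gold):
    """gold[i] is the index of the relevant passage for query i
    (simplified single-positive evaluation)."""
    top10 = np.argsort(-(q_emb @ d_emb.T), axis=1)[:, :10]
    return float(np.mean([g in row for g, row in zip(gold, top10)]))

# The test: at the same k, the adaptive estimation fails if
#   recall_at_10(q_spectemp, d_spectemp, gold)
#     < recall_at_10(q_baseline, d_baseline, gold)
```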
Original abstract
Dimensionality reduction is critical for deploying dense retrieval systems at scale, yet mainstream post-hoc methods face a fundamental trade-off: principal component analysis (PCA) preserves dominant variance but underutilizes representational capacity, while whitening enforces isotropy at the cost of amplifying noise in the heavy-tailed eigenspectrum of retrieval embeddings. Intermediate spectral scaling methods unify these extremes by reweighting dimensions with a power coefficient $\gamma$, but treat $\gamma$ as a fixed hyperparameter that requires task-specific tuning. We show that the optimal scaling strength $\gamma$ is not a global constant: it varies systematically with target dimensionality $k$ and is governed by the signal-to-noise ratio (SNR) of the retained subspace. Based on this insight, we propose Spectral Tempering (SpecTemp), a learning-free method that derives an adaptive $\gamma(k)$ directly from the corpus eigenspectrum using local SNR analysis and knee-point normalization, requiring no labeled data or validation-based search. Extensive experiments demonstrate that Spectral Tempering consistently achieves near-oracle performance relative to grid-searched $\gamma^*(k)$ while remaining fully learning-free and model-agnostic. Our code is publicly available at https://github.com/liyongkang123/SpecTemp.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Spectral Tempering (SpecTemp), a learning-free method for compressing dense retrieval embeddings. It claims that the optimal spectral scaling exponent γ is not fixed but varies systematically with target dimensionality k, and can be derived directly from the corpus covariance eigenspectrum via local signal-to-noise ratio analysis followed by knee-point normalization, yielding retrieval performance (nDCG/MRR) close to that of an oracle γ* obtained by grid search on labeled validation data, without any supervised tuning or model-specific training.
Significance. If the central claim holds, SpecTemp would remove a practical bottleneck in deploying dense retrievers at scale by eliminating validation-based hyperparameter search while still outperforming standard PCA and whitening baselines. The fully unsupervised, model-agnostic character and public code release are clear strengths that could influence production embedding pipelines.
Major comments (2)
- [§3.2] Derivation of γ(k): the mapping from the local SNR knee-point on the corpus eigenspectrum to the scaling strength γ(k) is introduced as a heuristic, without a derivation or proof that the resulting γ preserves query-document similarity rankings under the dot-product or cosine metrics used at inference; this step is load-bearing for the 'near-oracle' guarantee.
- [§4] Experimental validation: the abstract asserts 'near-oracle' performance, yet the reported results must include concrete deltas (e.g., nDCG@10 or MRR differences versus the grid-searched γ* and versus PCA/whitening) on standard benchmarks such as MS MARCO and Natural Questions, together with ablations on the knee-detection rule and sensitivity to finite-sample eigenspectrum estimation.
Minor comments (2)
- [Abstract] The phrase 'extensive experiments demonstrate' should be accompanied by at least one headline quantitative result (e.g., 'within 0.5% of oracle nDCG@10 on MS MARCO') so that readers can gauge the strength of the claim immediately.
- [§3.1] Notation: the precise definition of 'local SNR' and the knee-point detection algorithm (e.g., which curvature or slope threshold is used) should be stated with an explicit equation or pseudocode for reproducibility; an illustrative sketch of such pseudocode follows this list.
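The review does not reproduce the paper's detection rule. As an illustration of the kind of pseudocode being requested, here is a simplified distance-to-chord knee detector in the spirit of the 'Kneedle' method [35]; this is an assumed stand-in, not the authors' rule:

```python
import numpy as np

def knee_index(eigvals):
    """Knee of a descending eigenvalue curve: normalize the curve to
    the unit square and return the index that falls farthest below
    the chord joining its endpoints."""
    y = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    y = (y - y.min()) / (y.max() - y.min() + 1e-12)
    x = np.linspace(0.0, 1.0, len(y))
    chord = y[0] + (y[-1] - y[0]) * x   # straight line, first to last point
    return int(np.argmax(chord - y))    # largest drop below the chord
```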
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [§3.2] Derivation of γ(k): the mapping from the local SNR knee-point on the corpus eigenspectrum to the scaling strength γ(k) is introduced as a heuristic, without a derivation or proof that the resulting γ preserves query-document similarity rankings under the dot-product or cosine metrics used at inference; this step is load-bearing for the 'near-oracle' guarantee.
Authors: We acknowledge that the mapping from the local SNR knee-point to γ(k) is presented as a heuristic rather than a formally derived quantity. The motivation stems from the observation that the knee identifies the transition from signal-dominated to noise-dominated dimensions, after which a tempered scaling (γ(k) < 1) prevents noise amplification while preserving relative similarities under dot-product and cosine metrics. Although we do not supply a closed-form proof that this choice exactly preserves rankings, the method is grounded in the eigenspectrum properties of retrieval embeddings and is validated by consistently achieving near-oracle performance across benchmarks. In the revised manuscript we will expand §3.2 with additional intuition and a small-scale analytic example illustrating why the knee-normalized γ maintains the ordering of query-document scores. Revision: partial
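One way to make the promised analytic example concrete, assuming the per-dimension weights take the form $w_i = \lambda_i^{-\gamma/2}$ (an assumption; the paper's parameterization may differ): spectral scaling turns the dot-product score $q^\top d$ into

$$\langle Wq,\, Wd \rangle \;=\; \sum_{i=1}^{k} w_i^{2}\, q_i d_i, \qquad W = \mathrm{diag}(w_1, \dots, w_k),$$

so for two candidates $d$ and $d'$ the ranking flips exactly when $\sum_i w_i^{2} q_i (d_i - d'_i)$ and $\sum_i q_i (d_i - d'_i)$ disagree in sign. Tempering keeps $w_i^{2}$ near 1 on high-SNR directions and shrinks it on noisy ones, so any sign flips are confined to score differences dominated by low-signal coordinates. This motivates the heuristic but, as the referee notes, does not amount to a proof.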
- Referee: [§4] Experimental validation: the abstract asserts 'near-oracle' performance, yet the reported results must include concrete deltas (e.g., nDCG@10 or MRR differences versus the grid-searched γ* and versus PCA/whitening) on standard benchmarks such as MS MARCO and Natural Questions, together with ablations on the knee-detection rule and sensitivity to finite-sample eigenspectrum estimation.
Authors: We agree that explicit numerical deltas and additional ablations will make the experimental claims more precise. The current version reports that SpecTemp is close to the oracle but does not tabulate exact differences. In the revision we will add tables in §4 showing nDCG@10 and MRR deltas versus both the grid-searched γ* and the PCA/whitening baselines on MS MARCO and Natural Questions. We will also include ablations on alternative knee-detection procedures (e.g., curvature-based vs. threshold-based) and sensitivity experiments that subsample the corpus to assess finite-sample eigenspectrum stability. Revision: yes
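A sketch of the promised finite-sample sensitivity check, assuming a spectrum-only estimator with the hypothetical signature fit_gamma(X, k) -> float:

```python
import numpy as np

def gamma_stability(X, fit_gamma, k, n_trials=10, frac=0.1, seed=0):
    """Refit gamma(k) on random corpus subsamples and report the
    spread; a large standard deviation would indicate sensitivity to
    finite-sample eigenspectrum estimation. `fit_gamma` is a
    hypothetical stand-in for the paper's estimator."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = max(2, int(frac * n))
    gammas = [fit_gamma(X[rng.choice(n, size=m, replace=False)], k)
              for _ in range(n_trials)]
    return float(np.mean(gammas)), float(np.std(gammas))
```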
- Open point: a formal derivation or proof that the SNR knee-point heuristic exactly preserves query-document similarity rankings under dot-product or cosine metrics is not available; the claim rests on empirical evidence.
Circularity Check
No significant circularity: adaptive γ(k) derived directly from unlabeled corpus eigenspectrum
Full rationale
The paper's central derivation computes γ(k) via local SNR analysis and knee-point normalization on the corpus covariance eigenspectrum alone, using only unlabeled data. By construction, this procedure does not reduce to any fitted parameter, task label, or self-referential definition inside the paper; the output γ(k) is produced from spectral statistics without presupposing retrieval performance. No self-citations are load-bearing for the uniqueness or correctness of the mapping, and the method is explicitly learning-free. The derivation chain can therefore be checked end to end against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The eigenspectrum of the corpus embeddings reflects the signal-to-noise ratio across subspaces.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  unclear: the relation between the paper passage and the cited Recognition theorem could not be established.
  Linked passage: "We estimate the noise floor σ²_noise as the mean eigenvalue of the spectral tail... SNR(i) = max(0, (λ_i − σ²_noise)/σ²_noise)... γ(k) = min(1, SNR(k)/S_ref)"
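Read literally, the quoted passage pins down most of the estimator. A minimal sketch, assuming the 'spectral tail' means the bottom fraction of the sorted spectrum and that knee-point normalization sets S_ref to the SNR at the detected knee; both details are elided by the quote and assumed here:

```python
import numpy as np

def spectemp_gamma(eigvals, k, tail_frac=0.2):
    """Sketch of gamma(k) = min(1, SNR(k) / S_ref) from the quoted
    formulas. Assumed details: the noise floor is the mean of the
    bottom tail_frac eigenvalues, and S_ref is the SNR at the knee."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    noise = max(lam[int((1.0 - tail_frac) * len(lam)):].mean(), 1e-12)
    snr = np.maximum(0.0, (lam - noise) / noise)          # SNR(i)
    # Knee: point farthest below the chord of the normalized spectrum.
    y = (lam - lam.min()) / (lam.max() - lam.min() + 1e-12)
    x = np.linspace(0.0, 1.0, len(lam))
    knee = int(np.argmax(y[0] + (y[-1] - y[0]) * x - y))
    s_ref = snr[knee] if snr[knee] > 0 else 1.0           # guard degenerate knee
    return float(min(1.0, snr[k - 1] / s_ref))            # k is a 1-based dim count
```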
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, et al. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset. arXiv preprint arXiv:1611.09268 (2016).
- [2–3] Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 2318–2335. https://doi.org/10.18653/v1/2024.findings-acl.137
- [4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019…
- [5] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss library. arXiv:2401.08281 [cs.LG] (2024).
- [6–7] Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Sedigheh Eslami, Scott Martens, Bo Wang, Nan Wang, and Han Xiao. 2025. jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval. arXiv:2506.18902 [cs.AI]. https://arxiv.org/abs/2506.18902
- [8] Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. 2021. Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling. In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021. ACM, 113–122. https://doi.org/…
- [9] Junjie Huang, Duyu Tang, Wanjun Zhong, Shuai Lu, Linjun Shou, Ming Gong, Daxin Jiang, and Nan Duan. 2021. WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. In Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, 238–244. https://aclanthology.org/2021…
- [10] Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1 (2011), 117–128. https://doi.org/10.1109/TPAMI.2010.57
- [11] William B. Johnson, Joram Lindenstrauss, et al. 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26, 189-206 (1984), 1. https://api.semanticscholar.org/CorpusID:117819162
- [12] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick S. H. Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020. Association for Computational Linguistics, 6769–6781. https://doi.org/…
- [13] Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham M. Kakade, Prateek Jain, and Ali Farhadi. 2022. Matryoshka Representation Learning. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022…
- [14] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur P. Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: a Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics (2019). https://aclanthology.org/Q19-1026/
- [15] Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. 2020. On the Sentence Embeddings from Pre-trained Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 9119–9130. https://aclanthology.org/2020.emnlp-main.733/
- [16] Yongkang Li. 2026. Understanding and Enhancing Robustness in Dense Information Retrieval. In Advances in Information Retrieval - 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 - April 2, 2026…
- [17] Yongkang Li, Panagiotis Eustratiadis, and Evangelos Kanoulas. 2025. Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval. In Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part IV (Lecture Notes in Computer Science, Vol. 15575). Springer, 95–1…
- [18–19] Yongkang Li, Panagiotis Eustratiadis, Simon Lupart, and Evangelos Kanoulas. 2025. Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025, Padua, Italy, July 13-18, 2025. ACM, 2452–2462. https://doi.org/10.1145/3726302.3730110
- [20] Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. 2023. Towards General Text Embeddings with Multi-stage Contrastive Learning. arXiv:2308.03281 [cs.CL]. https://arxiv.org/abs/2308.03281
- [21] Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, and Xilun Chen. 2023. How to Train Your Dragon: Diverse Augmentation Towards Generalizable Dense Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023. Association for Computational Linguistics, 6385–6…
- [22] Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md. Akmal Haidar, and Mehdi Rezagholizadeh. 2020. Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 2774–2784. https://doi.org/1…
- [23] Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, and Xueqi Cheng. 2023. Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (Birmingham, United Kingdom)…
- [24] Zhenghao Liu, Han Zhang, Chenyan Xiong, Zhiyuan Liu, Yu Gu, and Xiaohua Li. 2022. Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 5692–5698. https://doi.org/10…
- [25]
- [26] Xueguang Ma, Minghan Li, Kai Sun, Ji Xin, and Jimmy Lin. 2021. Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2854–2859. https://…
- [27] Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, and Jimmy Lin. 2024. Fine-Tuning LLaMA for Multi-Stage Text Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Washington DC, USA, July 14-18, 2024. ACM, 2421–2425. https://doi.org/10.1145/3626772.3657951
- [28] Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. 2018. WWW'18 Open Challenge: Financial Opinion Mining and Question Answering. In Companion of the The Web Conference 2018, WWW 2018, Lyon, France, April 23-27, 2018. ACM, 1941–1942. https://doi.org/10.1145/318455…
- [29] Jiaqi Mu and Pramod Viswanath. 2018. All-but-the-Top: Simple and Effective Postprocessing for Word Representations. In International Conference on Learning Representations. https://openreview.net/forum?id=HkuGJ3kCb
- [30]
- [31] Gustavo Penha, Arthur Câmara, and Claudia Hauff. 2022. Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators. In Advances in Information Retrieval - 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 13185). Springer, 397–412…
- [32] Sara Rajaee and Mohammad Taher Pilehvar. 2021. A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics…
- [33] Vikas Raunak, Vivek Gupta, and Florian Metze. 2019. Effective Dimensionality Reduction for Word Embeddings. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). Association for Computational Linguistics, Florence, Italy, 235–243. https://aclanthology.org/W19-4328/
- [34] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019. Association for Computational Linguistics…
- [35] Ville Satopää, Jeannie R. Albrecht, David E. Irwin, and Barath Raghavan. 2011. Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior. In 31st IEEE International Conference on Distributed Computing Systems Workshops (ICDCS 2011 Workshops), 20-24 June 2011, Minneapolis, Minnesota, USA. IEEE Computer Society, 166–171. https://doi.org/10…
- [36] Jianlin Su. 2022. When BERT Whitening Introduces Hyperparameters: There Is Always One That Suits You. Chinese blog post. https://kexue.fm/archives/9079
- [37]
- [38–39] Sotaro Takeshita, Yurina Takeshita, Daniel Ruffinelli, and Simone Paolo Ponzetto. 2025. Randomly Removing 50% of Dimensions in Text Embeddings has Minimal Impact on Retrieval and Classification Tasks. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 27705–27726. https://doi.org/10.18653/v1/2025.emnlp-main.1410
- [40–41] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers). Association for Computational Linguistics, 809–819. https://doi.org/10.18653/v1/n18-1074
- [42] Henrique Schechter Vera, Sahil Dua, Biao Zhang, Daniel Salz, Ryan Mullins, Sindhu Raghuram Panyam, Sara Smoot, Iftekhar Naim, Joe Zou, Feiyang Chen, Daniel Cer, Alice Lisak, Min Choi, Lucas Gonzalez, Omar Sanseviero, Glenn Cameron, Ian Ballantyne, Kat Black, Kaifeng Chen, Weiyi Wang, Zhe Li, Gus Martins, Jinhyuk Lee, Mark Sherwood, Juyeong Ji, Renjie Wu, …
- [43] Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training. CoRR abs/2212.03533 (2022). https://doi.org/10.48550/ARXIV.2212.03533. arXiv:2212.03533
- [44] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=…
- [45] Gaifan Zhang, Yi Zhou, and Danushka Bollegala. 2024. Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia, 6530–6543. https://aclanthology.org/20…
- [46]
- [47–48] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv:2506.05176 [cs.CL]. https://arxiv.org/abs/2506.05176
- [49] Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. 2023. Poisoning Retrieval Corpora by Injecting Adversarial Passages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. Association for Computational Linguistics, 13764–13775. https://doi.org/10.18653/V1/2023.E…
- [50]