Pith · machine review for the scientific record

arxiv: 2605.02950 · v1 · submitted 2026-05-01 · 💻 cs.LG · cs.AI

Recognition: unknown

Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 19:40 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: kernel affine hull machines · semantic retrieval · query encoding · fixed teacher · RKHS · least mean squares · retrieval efficiency · legal document search

The pith

Kernel Affine Hull Machines replace neural query encoding with an 8.5-times-faster RKHS estimator while preserving or improving retrieval rankings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether a simple geometric model can stand in for a full neural network when the embedding model is fixed and only queries need encoding. It introduces Kernel Affine Hull Machines that use kernel methods and normalized least-mean-squares to project cheap lexical features into the same space as the expensive teacher model. If this works, semantic search systems could drop the cost of answering each user query dramatically without hurting how well relevant documents are ranked. The authors test the idea on a collection of Austrian legal texts and show that their estimator not only matches but exceeds the ranking performance of other lightweight adapters while cutting query time by a factor of eight and a half. They also provide an explicit breakdown of where the approximation error comes from, making the method more interpretable than black-box alternatives.

Core claim

Kernel Affine Hull Machines map inexpensive lexical features into a frozen semantic embedding space by estimating prototype-mixture weights inside a reproducing kernel Hilbert space and refining the prototypes with normalized least-mean-squares updates. This construction supplies a transparent decomposition of the total encoding error into posterior-approximation error, generalization error, and teacher-noise error. Evaluated on a controlled Austrian-law retrieval task containing 5,000 queries over 84 laws, the method records the lowest mean-squared reconstruction error against the teacher embeddings, the highest R-squared and cosine similarity, and the best rank-sensitive scores: MRR@20 of 0.504, Hit@20 of 0.694, and Top-1 accuracy of 0.411.
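The prototype-mixture step described above can be sketched in a few lines. This is an editorial illustration, not the authors' code: the Gaussian kernel, the bandwidth parameter, and all function names are assumptions; the paper's RKHS construction is specified rigorously in the manuscript itself.

```python
import numpy as np

def kahm_encode(x_lex, proto_lex, proto_emb, bandwidth=1.0):
    """Hypothetical sketch: map a lexical feature vector into the
    teacher's embedding space via kernel-weighted prototype mixing."""
    # Gaussian kernel similarity between the query and each lexical prototype
    sq_dist = np.sum((proto_lex - x_lex) ** 2, axis=1)
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    weights = k / k.sum()          # mixture weights summing to one (affine hull)
    return weights @ proto_emb     # weighted combination of teacher-space prototypes
```

As the bandwidth shrinks, a query that coincides with a lexical prototype recovers that prototype's teacher-space embedding exactly, which is the intuition behind anchoring the mixture on prototypes.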

What carries the argument

Kernel Affine Hull Machines, which perform prototype-mixture estimation in an RKHS followed by normalized LMS refinement to produce an explicit mapping from lexical features to a fixed teacher embedding space.
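The normalized-LMS refinement can be illustrated with the textbook NLMS rule applied to a linear feature-to-embedding map. How the paper applies the update to prototypes specifically is not visible from the abstract, so treat this as a generic stand-in rather than the authors' algorithm:

```python
import numpy as np

def nlms_step(W, x, y_teacher, mu=0.5, eps=1e-8):
    """One normalized least-mean-squares update of a linear map W
    so that W @ x tracks the teacher embedding y_teacher."""
    error = y_teacher - W @ x                        # instantaneous encoding error
    W = W + mu * np.outer(error, x) / (x @ x + eps)  # step normalized by input energy
    return W, error
```

Normalizing by the input energy makes the effective step size scale-invariant, which is the standard reason to prefer NLMS over plain LMS when feature magnitudes vary across queries.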

If this is right

  • Retrieval quality measured by MRR@20, Hit@20 and Top-1 accuracy can be maintained or improved when replacing neural query encoding with the proposed geometric estimator.
  • Per-query latency drops by a factor of 8.5 in fixed-teacher deployments.
  • The sources of encoding error become diagnosable through the explicit decomposition into posterior, generalization, and noise terms.
  • Learned adapters are outperformed by the analytically derived KAHM when the teacher model remains frozen.
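The three rank metrics named above are straightforward to compute from per-query rankings. A minimal sketch (editorial, not from the paper):

```python
import numpy as np

def rank_metrics(rankings, relevant, k=20):
    """MRR@k, Hit@k, and Top-1 accuracy over per-query rankings.
    rankings: list of doc-id lists (best first); relevant: list of sets."""
    rr, hit, top1 = [], [], []
    for ranked, rel in zip(rankings, relevant):
        # index of the first relevant document within the top k, if any
        first = next((i for i, d in enumerate(ranked[:k]) if d in rel), None)
        rr.append(0.0 if first is None else 1.0 / (first + 1))
        hit.append(first is not None)
        top1.append(ranked[0] in rel)
    return float(np.mean(rr)), float(np.mean(hit)), float(np.mean(top1))
```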

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may extend to other domains where cheap surface features exist but deep embeddings are expensive, such as recommendation systems or cross-modal search.
  • If the error decomposition proves stable across datasets, it could support adaptive systems that switch between the estimator and the full teacher based on predicted generalization error.
  • Larger-scale experiments would be needed to determine how the number of prototypes and kernel bandwidth choices scale with corpus size.

Load-bearing premise

That the RKHS-based estimator refined by normalized LMS can approximate the fixed teacher's mapping from lexical features to embeddings well enough to keep the relative ordering of documents unchanged for most queries.

What would settle it

A direct comparison against the frozen teacher on the same query set: if KAHM and the teacher disagree on the top-ranked document for a substantial fraction of queries, retrieval quality is not preserved; close agreement would support the substitution claim.
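Such a test reduces to measuring top-1 agreement between the two encoders over a shared document index. A hedged sketch of that check (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def top1_agreement(student_q, teacher_q, doc_emb):
    """Fraction of queries for which student and teacher encodings
    retrieve the same top-ranked document under cosine similarity."""
    docs = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    def top1(Q):
        Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
        return (Qn @ docs.T).argmax(axis=1)  # best document per query
    return float(np.mean(top1(student_q) == top1(teacher_q)))
```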

Figures

Figures reproduced from arXiv: 2605.02950 by Bernhard A. Moser, Manuela Geiß, Mohit Kumar, Somayeh Kargaran.

Figure 1. Compute-quality trade-off.
Figure 2. Main retrieval-quality metrics across cutoffs.
Figure 3. Consensus- and routing-sensitive metrics across cutoffs.
read the original abstract

Transformer-based semantic retrieval is highly effective, yet in many deployments the dominant cost lies in online query encoding rather than corpus indexing. We study the fixed-teacher query-adaptation problem and ask whether repeated neural inference can be replaced by a lightweight, analytically explicit estimator without degrading decision-relevant retrieval quality. We propose Kernel Affine Hull Machines (KAHMs), which map inexpensive lexical features into a frozen semantic embedding space by estimating prototype-mixture weights in a rigorously specified RKHS and refining prototypes via normalized least-mean-squares, yielding a transparent decomposition of encoding error into posterior-approximation, generalization, and teacher-noise components. On a controlled Austrian-law benchmark (5,000 queries; 84 laws; 10,762 units), KAHM attains the strongest teacher-space reconstruction among matched learned adapters (MSE 0.000091, R^2 0.9071, cosine 0.9536) and consistently leads rank-sensitive metrics, including mean reciprocal rank at 20 (MRR@20, the average inverse rank of the first relevant result within the top 20), Hit rate at 20 (Hit@20, the fraction of queries with at least one relevant result in the top 20), and Top-1 accuracy (the fraction of queries whose correct item is ranked first), with scores of 0.504, 0.694, and 0.411, respectively. It also reduces per-query latency by a factor of 8.5 relative to direct transformer encoding. These results demonstrate that, in fixed-teacher regimes, lightweight geometric estimators can substitute for online neural encoding, preserving retrieval performance while substantially improving efficiency and interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes Kernel Affine Hull Machines (KAHMs) as a lightweight, analytically explicit estimator to replace repeated transformer inference for query encoding in fixed-teacher semantic retrieval. KAHMs map lexical features into a frozen embedding space via an RKHS prototype-mixture model refined by normalized least-mean-squares, with an explicit decomposition of encoding error into posterior-approximation, generalization, and teacher-noise terms. On a 5,000-query Austrian-law benchmark (84 laws, 10,762 units), KAHM reports the best reconstruction among matched adapters (MSE 0.000091, R² 0.9071, cosine 0.9536) together with leading rank metrics (MRR@20 0.504, Hit@20 0.694, Top-1 0.411) and an 8.5× latency reduction versus direct transformer encoding.

Significance. If the central claim holds, the work shows that geometrically interpretable, parameter-light estimators can substitute for neural query encoders in retrieval pipelines while preserving decision-relevant ranking quality. This would offer substantial efficiency gains and improved transparency in fixed-teacher deployments, with the error decomposition providing a principled route to diagnose approximation effects.

major comments (1)
  1. Abstract and Results: the claim that KAHM substitutes for transformer inference 'without degrading decision-relevant retrieval quality' is not yet supported, because the frozen teacher's own MRR@20, Hit@20, and Top-1 accuracy on the identical 5,000-query Austrian-law set are not reported. Superiority among learned adapters does not establish that the reported approximation error leaves end-to-end retrieval performance intact; a direct oracle baseline is required to connect the RKHS error decomposition to rank preservation.
minor comments (1)
  1. Abstract: the inline parenthetical definitions of MRR@20, Hit@20, and Top-1 accuracy interrupt the flow; relocating them to a dedicated evaluation subsection would improve readability without altering content.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed review. We address the single major comment below and will revise the manuscript to incorporate the requested baseline.

read point-by-point responses
  1. Referee: Abstract and Results: the claim that KAHM substitutes for transformer inference 'without degrading decision-relevant retrieval quality' is not yet supported, because the frozen teacher's own MRR@20, Hit@20, and Top-1 accuracy on the identical 5,000-query Austrian-law set are not reported. Superiority among learned adapters does not establish that the reported approximation error leaves end-to-end retrieval performance intact; a direct oracle baseline is required to connect the RKHS error decomposition to rank preservation.

    Authors: We agree that the current manuscript does not report the frozen teacher's own MRR@20, Hit@20, and Top-1 accuracy on the 5,000-query Austrian-law benchmark. While KAHM outperforms the matched adapters on both reconstruction and ranking metrics, this does not by itself demonstrate that the approximation preserves the teacher's end-to-end retrieval quality. We will compute the teacher's performance on the identical query set and add these oracle numbers to the results table, abstract, and discussion. This addition will allow readers to assess directly whether the observed approximation error (posterior, generalization, and teacher-noise terms) leaves rank-sensitive metrics intact. revision: yes

Circularity Check

0 steps flagged

No circularity: analytical RKHS estimator with independent empirical validation

full rationale

The paper derives KAHM as an explicit RKHS-based estimator for prototype-mixture weights refined by normalized LMS, with an error decomposition into posterior-approximation, generalization, and teacher-noise terms. Reported reconstruction metrics (MSE, R², cosine) and rank metrics (MRR@20, Hit@20, Top-1) are downstream empirical measurements on held-out queries, not quantities forced by construction from the fitting procedure itself. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify the core method; the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; the method relies on standard RKHS and least-mean-squares, with no new entities postulated.

axioms (1)
  • domain assumption: Reproducing Kernel Hilbert Space (RKHS) properties allow mapping lexical features to semantic embeddings
    Central to the description of KAHMs mapping inexpensive lexical features into a frozen semantic embedding space.

pith-pipeline@v0.9.0 · 5612 in / 1309 out tokens · 42530 ms · 2026-05-09T19:40:46.400355+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

52 extracted references · 40 canonical work pages · 6 internal anchors

  1. [1]

Nachman Aronszajn. 1950. Theory of Reproducing Kernels. Trans. Amer. Math. Soc. 68, 3 (1950), 337–404. https://doi.org/10.1090/S0002-9947-1950-0051437-7

  2. [2]

Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York

  3. [3]

Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 2318–2335...

  4. [4]

Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

  5. [5]

Chenlong Deng, Kelong Mao, and Zhicheng Dou. 2024. Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Miami, Florida, USA, 1253–1265. https://doi.org/10.18653/v1/2024.emnlp-main.73

  6. [6]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186. https:...

  7. [7]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss Library. arXiv preprint arXiv:2401.08281 (2024). https://doi.org/10.48550/arXiv.2401.08281

  8. [8]

Yi Feng, Chuanyi Li, and Vincent Ng. 2024. Legal Case Retrieval: A Survey of the State of the Art. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 6472–6485. https://doi.org/10.18653/v1/2024.acl-long.350

  9. [9]

Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant. 2021. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 2288–2292. https://doi.org/10.1145/3404835.3463098

  10. [10]

Cheng Gao, Chaojun Xiao, Zhenghao Liu, Huimin Chen, Zhiyuan Liu, and Maosong Sun. 2024. Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.emnlp-main.402

  11. [11]

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Rev. 53, 2 (2011), 217–288. https://doi.org/10.1137/090771806

  12. [12]

Babak Hassibi, Ali H. Sayed, and Thomas Kailath. 1996. H∞ Optimality of the LMS Algorithm. IEEE Transactions on Signal Processing 44, 2 (1996), 267–280. https://doi.org/10.1109/78.485923

  13. [13]

Simon Haykin. 2014. Adaptive Filter Theory (5th ed.). Pearson

  14. [14]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network. arXiv:1503.02531 [cs.LG]

  15. [15]

Arthur E. Hoerl and Robert W. Kennard. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 12, 1 (1970), 55–67. https://doi.org/10.1080/00401706.1970.10488634

  16. [16]

Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. 2008. Kernel Methods in Machine Learning. The Annals of Statistics 36, 3 (2008), 1171–1220. https://doi.org/10.1214/009053607000000677

  17. [17]

Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. 2021. Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA,...

  18. [18]

intfloat. 2024. multilingual-e5-large. https://huggingface.co/intfloat/multilingual-e5-large Hugging Face model card

  19. [19]

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised Dense Information Retrieval with Contrastive Learning. Transactions on Machine Learning Research (2022). https://openreview.net/forum?id=jKN1pXi7b0

  20. [20]

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu

  21. [21]

TinyBERT: Distilling BERT for Natural Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 4163–4174. https://doi.org/10.18653/v1/2020.findings-emnlp.372

  22. [22]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550

  23. [23]

Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 39–48. https://doi.org/10.1145/3397271.3401075

  24. [24]

Mohit Kumar, Mathias Brucker, Alexander Valentinitsch, Adnan Husakovic, Ali Abbas, Manuela Geiß, and Bernhard A. Moser. 2025. Operator-Theoretic Framework for Gradient-Free Federated Learning. https://arxiv.org/abs/2512.01025

  25. [25]

Mohit Kumar, Bernhard A. Moser, and Lukas Fischer. 2024. On Mitigating the Utility-Loss in Differentially Private Learning: A New Perspective by a Geometrically Inspired Kernel Approach. Journal of Artificial Intelligence Research 79 (2024), 515–567. https://doi.org/10.1613/jair.1.15071

  26. [26]

Mohit Kumar, Alexander Valentinitsch, Magdalena Fuchs, Mathias Brucker, Juliana Bowles, Adnan Husakovic, Ali Abbas, and Bernhard A. Moser. 2025. Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent. Journal of Artificial Intelligence Research 83 (July 2025), 35 pages. https://doi.org/10.1613/jair.1.16821

  27. [27]

Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, and Ali Farhadi. 2022. Matryoshka Representation Learning. In Advances in Neural Information Processing Systems. https://arxiv.org/abs/2205.13147

  28. [28]

Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. 2025. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=lgsyLSsDRe Spotlight

  29. [29]

Sean Lee, Aamir Shakir, Julius Lipp, and Darius Koenig. 2024. Open Source Gets DE-licious: Mixedbread x deepset German/English Embeddings. https://www.mixedbread.com/blog/deepset-mxbai-embed-de-large-v1 Mixedbread blog post

  30. [30]

Xianming Li and Jing Li. 2023. AnglE-optimized Text Embeddings. arXiv preprint arXiv:2309.12871 (2023). https://doi.org/10.48550/arXiv.2309.12871

  31. [31]

J. MacQueen. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. University of California Press, Berkeley, CA, 281–297. https://projecteuclid.org/euclid.bsmsp/1200512992

  32. [32]

Mixedbread and deepset. 2024. mixedbread-ai/deepset-mxbai-embed-de-large-v1. https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1 Hugging Face model card

  33. [33]

Niklas Muennighoff, Nouamane Tazi, Loic Magne, and Nils Reimers. 2023. MTEB: Massive Text Embedding Benchmark. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Dubrovnik, Croatia, 2014–2037. https://doi.org/10.18653/v1/2023.eacl-main.148

  34. [34]

Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, and Yinfei Yang. 2022. Large Dual Encoders Are Generalizable Retrievers. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United ...

  35. [35]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3982–3992. https://doi.org/10.18653/v1/D19-1410

  36. [36]

Republic of Austria. 2026. Legal Information System of the Republic of Austria (RIS). https://www.ris.bka.gv.at/ Official portal for Austrian legal information

  37. [37]

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-Propagating Errors. Nature 323, 6088 (1986), 533–536. https://doi.org/10.1038/323533a0

  38. [38]

Tetsuya Sakai. 2006. Evaluating Evaluation Metrics Based on the Bootstrap. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 525–532. https://doi.org/10.1145/1148170.1148261

  39. [39]

Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523. https://doi.org/10.1016/0306-4573(88)90021-0

  40. [40]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv preprint arXiv:1910.01108 (2019). https://doi.org/10.48550/arXiv.1910.01108

  41. [41]

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2022. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Lin...

  42. [42]

    Ali H. Sayed. 2008. Adaptive Filters. Wiley, Hoboken, NJ

  43. [43]

Bernhard Schölkopf and Alexander J. Smola. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA

  44. [44]

Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, and Tao Yu. 2022. One Embedder, Any Task: Instruction-Finetuned Text Embeddings. arXiv preprint arXiv:2212.09741 (2022). https://doi.org/10.48550/arXiv.2212.09741

  45. [45]

Chongyang Tao, Chang Liu, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, and Daxin Jiang. 2024. ADAM: Dense Retrieval Distillation with Adaptive Dark Examples. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 11639–11651. https://doi.org/10.18653/v1/2024.findings-acl.692

  46. [46]

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. arXiv preprint arXiv:2104.08663 (2021). https://doi.org/10.48550/arXiv.2104.08663

  47. [47]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems, Vol. 30

  48. [48]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training. arXiv preprint arXiv:2212.03533 (2022). https://doi.org/10.48550/arXiv.2212.03533

  49. [49]

Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. 2024. Multilingual E5 Text Embeddings: A Technical Report. arXiv preprint arXiv:2402.05672 (2024). https://doi.org/10.48550/arXiv.2402.05672

  50. [50]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-o...

  51. [51]

Hansi Zeng, Hamed Zamani, and Vishwa Vinay. 2022. Curriculum Learning for Dense Retrieval Distillation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 1979–1983. https://doi.org/10.1145/3477495.3531791

  52. [52]

Lucia Zheng, Neel Guha, Javokhir Arifov, Sarah Zhang, Michal Skreta, Christopher D. Manning, Peter Henderson, and Daniel E. Ho. 2025. A Reasoning-Focused Legal Retrieval Benchmark. In Proceedings of the 4th ACM Symposium on Computer Science and Law. https://doi.org/10.1145/3709025.3712219