Pith · machine review for the scientific record

arxiv: 2605.02950 · v1 · submitted 2026-05-01 · 💻 cs.LG · cs.AI

Recognition: unknown

Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 19:40 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: kernel affine hull machines · semantic retrieval · query encoding · fixed teacher · RKHS · least mean squares · retrieval efficiency · legal document search

The pith

Kernel Affine Hull Machines replace neural query encoding with an 8.5-times-faster RKHS estimator while preserving or improving retrieval rankings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether a simple geometric model can stand in for a full neural network when the embedding model is fixed and only queries need encoding. It introduces Kernel Affine Hull Machines that use kernel methods and normalized least-mean-squares to project cheap lexical features into the same space as the expensive teacher model. If this works, semantic search systems could drop the cost of answering each user query dramatically without hurting how well relevant documents are ranked. The authors test the idea on a collection of Austrian legal texts and show that their estimator not only matches but exceeds the ranking performance of other lightweight adapters while cutting query time by a factor of eight and a half. They also provide an explicit breakdown of where the approximation error comes from, making the method more interpretable than black-box alternatives.

Core claim

Kernel Affine Hull Machines map inexpensive lexical features into a frozen semantic embedding space by estimating prototype-mixture weights inside a reproducing kernel Hilbert space and refining the prototypes with normalized least-mean-squares updates. This construction supplies a transparent decomposition of the total encoding error into posterior-approximation error, generalization error, and teacher-noise error. Evaluated on a controlled Austrian-law retrieval task containing 5,000 queries over 84 laws, the method records the lowest mean-squared reconstruction error against the teacher embeddings, the highest R-squared and cosine similarity, and the best rank-sensitive scores: MRR@20 of 0.504, Hit@20 of 0.694, and Top-1 accuracy of 0.411.
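The prototype-mixture step described above can be sketched in a few lines. This is an editorial illustration, not the authors' code: the Gaussian kernel, the bandwidth parameter, and all function names are assumptions; the paper's RKHS construction is specified rigorously in the manuscript itself.

```python
import numpy as np

def kahm_encode(x_lex, proto_lex, proto_emb, bandwidth=1.0):
    """Hypothetical sketch: map a lexical feature vector into the
    teacher's embedding space via kernel-weighted prototype mixing."""
    # Gaussian kernel similarity between the query and each lexical prototype
    sq_dist = np.sum((proto_lex - x_lex) ** 2, axis=1)
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    weights = k / k.sum()          # mixture weights summing to one (affine hull)
    return weights @ proto_emb     # weighted combination of teacher-space prototypes
```

As the bandwidth shrinks, a query that coincides with a lexical prototype recovers that prototype's teacher-space embedding exactly, which is the intuition behind anchoring the mixture on prototypes.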

What carries the argument

Kernel Affine Hull Machines, which perform prototype-mixture estimation in an RKHS followed by normalized LMS refinement to produce an explicit mapping from lexical features to a fixed teacher embedding space.
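The normalized-LMS refinement can be illustrated with the textbook NLMS rule applied to a linear feature-to-embedding map. How the paper applies the update to prototypes specifically is not visible from the abstract, so treat this as a generic stand-in rather than the authors' algorithm:

```python
import numpy as np

def nlms_step(W, x, y_teacher, mu=0.5, eps=1e-8):
    """One normalized least-mean-squares update of a linear map W
    so that W @ x tracks the teacher embedding y_teacher."""
    error = y_teacher - W @ x                        # instantaneous encoding error
    W = W + mu * np.outer(error, x) / (x @ x + eps)  # step normalized by input energy
    return W, error
```

Normalizing by the input energy makes the effective step size scale-invariant, which is the standard reason to prefer NLMS over plain LMS when feature magnitudes vary across queries.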

If this is right

  • Retrieval quality measured by MRR@20, Hit@20 and Top-1 accuracy can be maintained or improved when replacing neural query encoding with the proposed geometric estimator.
  • Per-query latency drops by a factor of 8.5 in fixed-teacher deployments.
  • The sources of encoding error become diagnosable through the explicit decomposition into posterior, generalization, and noise terms.
  • Learned adapters are outperformed by the analytically derived KAHM when the teacher model remains frozen.
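The three rank metrics named above are straightforward to compute from per-query rankings. A minimal sketch (editorial, not from the paper):

```python
import numpy as np

def rank_metrics(rankings, relevant, k=20):
    """MRR@k, Hit@k, and Top-1 accuracy over per-query rankings.
    rankings: list of doc-id lists (best first); relevant: list of sets."""
    rr, hit, top1 = [], [], []
    for ranked, rel in zip(rankings, relevant):
        # index of the first relevant document within the top k, if any
        first = next((i for i, d in enumerate(ranked[:k]) if d in rel), None)
        rr.append(0.0 if first is None else 1.0 / (first + 1))
        hit.append(first is not None)
        top1.append(ranked[0] in rel)
    return float(np.mean(rr)), float(np.mean(hit)), float(np.mean(top1))
```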

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may extend to other domains where cheap surface features exist but deep embeddings are expensive, such as recommendation systems or cross-modal search.
  • If the error decomposition proves stable across datasets, it could support adaptive systems that switch between the estimator and the full teacher based on predicted generalization error.
  • Larger-scale experiments would be needed to determine how the number of prototypes and kernel bandwidth choices scale with corpus size.

Load-bearing premise

That the RKHS-based estimator refined by normalized LMS can approximate the fixed teacher's mapping from lexical features to embeddings well enough to keep the relative ordering of documents unchanged for most queries.

What would settle it

A direct comparison against the frozen teacher on the same query set: if KAHM and the teacher disagree on the top-ranked document for a substantial fraction of queries, retrieval quality is not preserved; close agreement would support the substitution claim.
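Such a test reduces to measuring top-1 agreement between the two encoders over a shared document index. A hedged sketch of that check (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def top1_agreement(student_q, teacher_q, doc_emb):
    """Fraction of queries for which student and teacher encodings
    retrieve the same top-ranked document under cosine similarity."""
    docs = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    def top1(Q):
        Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
        return (Qn @ docs.T).argmax(axis=1)  # best document per query
    return float(np.mean(top1(student_q) == top1(teacher_q)))
```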

Figures

Figures reproduced from arXiv: 2605.02950 by Bernhard A. Moser, Manuela Geiß, Mohit Kumar, Somayeh Kargaran.

Figure 1. Compute-quality trade-off.
Figure 2. Main retrieval-quality metrics across cutoffs.
Figure 3. Consensus- and routing-sensitive metrics across cutoffs.
read the original abstract

Transformer-based semantic retrieval is highly effective, yet in many deployments the dominant cost lies in online query encoding rather than corpus indexing. We study the fixed-teacher query-adaptation problem and ask whether repeated neural inference can be replaced by a lightweight, analytically explicit estimator without degrading decision-relevant retrieval quality. We propose Kernel Affine Hull Machines (KAHMs), which map inexpensive lexical features into a frozen semantic embedding space by estimating prototype-mixture weights in a rigorously specified RKHS and refining prototypes via normalized least-mean-squares, yielding a transparent decomposition of encoding error into posterior-approximation, generalization, and teacher-noise components. On a controlled Austrian-law benchmark (5,000 queries; 84 laws; 10,762 units), KAHM attains the strongest teacher-space reconstruction among matched learned adapters (MSE 0.000091, R^2 0.9071, cosine 0.9536) and consistently leads rank-sensitive metrics, including mean reciprocal rank at 20 (MRR@20, the average inverse rank of the first relevant result within the top 20), Hit rate at 20 (Hit@20, the fraction of queries with at least one relevant result in the top 20), and Top-1 accuracy (the fraction of queries whose correct item is ranked first), with scores of 0.504, 0.694, and 0.411, respectively. It also reduces per-query latency by a factor of 8.5 relative to direct transformer encoding. These results demonstrate that, in fixed-teacher regimes, lightweight geometric estimators can substitute for online neural encoding, preserving retrieval performance while substantially improving efficiency and interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes Kernel Affine Hull Machines (KAHMs) as a lightweight, analytically explicit estimator to replace repeated transformer inference for query encoding in fixed-teacher semantic retrieval. KAHMs map lexical features into a frozen embedding space via an RKHS prototype-mixture model refined by normalized least-mean-squares, with an explicit decomposition of encoding error into posterior-approximation, generalization, and teacher-noise terms. On a 5,000-query Austrian-law benchmark (84 laws, 10,762 units), KAHM reports the best reconstruction among matched adapters (MSE 0.000091, R² 0.9071, cosine 0.9536) together with leading rank metrics (MRR@20 0.504, Hit@20 0.694, Top-1 0.411) and an 8.5× latency reduction versus direct transformer encoding.

Significance. If the central claim holds, the work shows that geometrically interpretable, parameter-light estimators can substitute for neural query encoders in retrieval pipelines while preserving decision-relevant ranking quality. This would offer substantial efficiency gains and improved transparency in fixed-teacher deployments, with the error decomposition providing a principled route to diagnose approximation effects.

major comments (1)
  1. Abstract and Results: the claim that KAHM substitutes for transformer inference 'without degrading decision-relevant retrieval quality' is not yet supported, because the frozen teacher's own MRR@20, Hit@20, and Top-1 accuracy on the identical 5,000-query Austrian-law set are not reported. Superiority among learned adapters does not establish that the reported approximation error leaves end-to-end retrieval performance intact; a direct oracle baseline is required to connect the RKHS error decomposition to rank preservation.
minor comments (1)
  1. Abstract: the inline parenthetical definitions of MRR@20, Hit@20, and Top-1 accuracy interrupt the flow; relocating them to a dedicated evaluation subsection would improve readability without altering content.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed review. We address the single major comment below and will revise the manuscript to incorporate the requested baseline.

read point-by-point responses
  1. Referee: Abstract and Results: the claim that KAHM substitutes for transformer inference 'without degrading decision-relevant retrieval quality' is not yet supported, because the frozen teacher's own MRR@20, Hit@20, and Top-1 accuracy on the identical 5,000-query Austrian-law set are not reported. Superiority among learned adapters does not establish that the reported approximation error leaves end-to-end retrieval performance intact; a direct oracle baseline is required to connect the RKHS error decomposition to rank preservation.

    Authors: We agree that the current manuscript does not report the frozen teacher's own MRR@20, Hit@20, and Top-1 accuracy on the 5,000-query Austrian-law benchmark. While KAHM outperforms the matched adapters on both reconstruction and ranking metrics, this does not by itself demonstrate that the approximation preserves the teacher's end-to-end retrieval quality. We will compute the teacher's performance on the identical query set and add these oracle numbers to the results table, abstract, and discussion. This addition will allow readers to assess directly whether the observed approximation error (posterior, generalization, and teacher-noise terms) leaves rank-sensitive metrics intact. revision: yes

Circularity Check

0 steps flagged

No circularity: analytical RKHS estimator with independent empirical validation

full rationale

The paper derives KAHM as an explicit RKHS-based estimator for prototype-mixture weights refined by normalized LMS, with an error decomposition into posterior-approximation, generalization, and teacher-noise terms. Reported reconstruction metrics (MSE, R², cosine) and rank metrics (MRR@20, Hit@20, Top-1) are downstream empirical measurements on held-out queries, not quantities forced by construction from the fitting procedure itself. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to justify the core method; the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; the method relies on standard RKHS and least-mean-squares, with no new entities postulated.

axioms (1)
  • domain assumption: Reproducing Kernel Hilbert Space (RKHS) properties allow mapping lexical features to semantic embeddings
    Central to the description of KAHMs mapping inexpensive lexical features into a frozen semantic embedding space.

pith-pipeline@v0.9.0 · 5612 in / 1309 out tokens · 42530 ms · 2026-05-09T19:40:46.400355+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

52 extracted references · 40 canonical work pages · 6 internal anchors

  1. [1]

Nachman Aronszajn. 1950. Theory of Reproducing Kernels. Trans. Amer. Math. Soc. 68, 3 (1950), 337–404. https://doi.org/10.1090/S0002-9947-1950-0051437-7

  2. [2]

Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York

  3. [3]

Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 2318–2335...

  4. [4]

Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

  5. [5]

Chenlong Deng, Kelong Mao, and Zhicheng Dou. 2024. Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Miami, Florida, USA, 1253–1265. https://doi.org/10.18653/v1/2024.emnlp-main.73

  6. [6]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186. https:...

  7. [7]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss Library. arXiv preprint arXiv:2401.08281 (2024). https://doi.org/10.48550/arXiv.2401.08281

  8. [8]

Yi Feng, Chuanyi Li, and Vincent Ng. 2024. Legal Case Retrieval: A Survey of the State of the Art. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 6472–6485. https://doi.org/10.18653/v1/2024.acl-long.350

  9. [9]

Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant. 2021. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 2288–2292. https://doi.org/10.1145/3404835.3463098

  10. [10]

Cheng Gao, Chaojun Xiao, Zhenghao Liu, Huimin Chen, Zhiyuan Liu, and Maosong Sun. 2024. Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.emnlp-main.402

  11. [11]

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Rev. 53, 2 (2011), 217–288. https://doi.org/10.1137/090771806

  12. [12]

Babak Hassibi, Ali H. Sayed, and Thomas Kailath. 1996. H∞ Optimality of the LMS Algorithm. IEEE Transactions on Signal Processing 44, 2 (1996), 267–280. https://doi.org/10.1109/78.485923

  13. [13]

Simon Haykin. 2014. Adaptive Filter Theory (5th ed.). Pearson

  14. [14]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network. arXiv:1503.02531 [cs.LG]

  15. [15]

Arthur E. Hoerl and Robert W. Kennard. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 12, 1 (1970), 55–67. https://doi.org/10.1080/00401706.1970.10488634

  16. [16]

Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. 2008. Kernel Methods in Machine Learning. The Annals of Statistics 36, 3 (2008), 1171–1220. https://doi.org/10.1214/009053607000000677

  17. [17]

Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. 2021. Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA,...

  18. [18]

intfloat. 2024. multilingual-e5-large. https://huggingface.co/intfloat/multilingual-e5-large Hugging Face model card

  19. [19]

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised Dense Information Retrieval with Contrastive Learning. Transactions on Machine Learning Research (2022). https://openreview.net/forum?id=jKN1pXi7b0

  20. [20]

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu

  21. [21]

TinyBERT: Distilling BERT for Natural Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 4163–4174. https://doi.org/10.18653/v1/2020.findings-emnlp.372

  22. [22]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550

  23. [23]

Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 39–48. https://doi.org/10.1145/3397271.3401075

  24. [24]

Mohit Kumar, Mathias Brucker, Alexander Valentinitsch, Adnan Husakovic, Ali Abbas, Manuela Geiß, and Bernhard A. Moser. 2025. Operator-Theoretic Framework for Gradient-Free Federated Learning. https://arxiv.org/abs/2512.01025

  25. [25]

Mohit Kumar, Bernhard A. Moser, and Lukas Fischer. 2024. On Mitigating the Utility-Loss in Differentially Private Learning: A New Perspective by a Geometrically Inspired Kernel Approach. Journal of Artificial Intelligence Research 79 (2024), 515–567. https://doi.org/10.1613/jair.1.15071

  26. [26]

Mohit Kumar, Alexander Valentinitsch, Magdalena Fuchs, Mathias Brucker, Juliana Bowles, Adnan Husakovic, Ali Abbas, and Bernhard A. Moser. 2025. Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent. Journal of Artificial Intelligence Research 83 (July 2025), 35 pages. https://doi.org/10.1613/jair.1.16821

  27. [27]

Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, and Ali Farhadi. 2022. Matryoshka Representation Learning. In Advances in Neural Information Processing Systems. https://arxiv.org/abs/2205.13147

  28. [28]

Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. 2025. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=lgsyLSsDRe Spotlight

  29. [29]

Sean Lee, Aamir Shakir, Julius Lipp, and Darius Koenig. 2024. Open Source Gets DE-licious: Mixedbread x deepset German/English Embeddings. https://www.mixedbread.com/blog/deepset-mxbai-embed-de-large-v1 Mixedbread blog post

  30. [30]

Xianming Li and Jing Li. 2023. AnglE-optimized Text Embeddings. arXiv preprint arXiv:2309.12871 (2023). https://doi.org/10.48550/arXiv.2309.12871

  31. [31]

J. MacQueen. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. University of California Press, Berkeley, CA, 281–297. https://projecteuclid.org/euclid.bsmsp/1200512992

  32. [32]

Mixedbread and deepset. 2024. mixedbread-ai/deepset-mxbai-embed-de-large-v1. https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1 Hugging Face model card

  33. [33]

Niklas Muennighoff, Nouamane Tazi, Loic Magne, and Nils Reimers. 2023. MTEB: Massive Text Embedding Benchmark. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Dubrovnik, Croatia, 2014–2037. https://doi.org/10.18653/v1/2023.eacl-main.148

  34. [34]

Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, and Yinfei Yang. 2022. Large Dual Encoders Are Generalizable Retrievers. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United ...

  35. [35]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3982–3992. https://doi.org/10.18653/v1/D19-1410

  36. [36]

Republic of Austria. 2026. Legal Information System of the Republic of Austria (RIS). https://www.ris.bka.gv.at/ Official portal for Austrian legal information

  37. [37]

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-Propagating Errors. Nature 323, 6088 (1986), 533–536. https://doi.org/10.1038/323533a0

  38. [38]

Tetsuya Sakai. 2006. Evaluating Evaluation Metrics Based on the Bootstrap. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 525–532. https://doi.org/10.1145/1148170.1148261

  39. [39]

Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523. https://doi.org/10.1016/0306-4573(88)90021-0

  40. [40]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv preprint arXiv:1910.01108 (2019). https://doi.org/10.48550/arXiv.1910.01108

  41. [41]

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2022. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Lin...

  42. [42]

    Ali H. Sayed. 2008. Adaptive Filters. Wiley, Hoboken, NJ

  43. [43]

Bernhard Schölkopf and Alexander J. Smola. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA

  44. [44]

Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, and Tao Yu. 2022. One Embedder, Any Task: Instruction-Finetuned Text Embeddings. arXiv preprint arXiv:2212.09741 (2022). https://doi.org/10.48550/arXiv.2212.09741

  45. [45]

Chongyang Tao, Chang Liu, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, and Daxin Jiang. 2024. ADAM: Dense Retrieval Distillation with Adaptive Dark Examples. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 11639–11651. https://doi.org/10.18653/v1/2024.findings-acl.692

  46. [46]

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. arXiv preprint arXiv:2104.08663 (2021). https://doi.org/10.48550/arXiv.2104.08663

  47. [47]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems, Vol. 30

  48. [48]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training. arXiv preprint arXiv:2212.03533 (2022). https://doi.org/10.48550/arXiv.2212.03533

  49. [49]

Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. 2024. Multilingual E5 Text Embeddings: A Technical Report. arXiv preprint arXiv:2402.05672 (2024). https://doi.org/10.48550/arXiv.2402.05672

  50. [50]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-o...

  51. [51]

Hansi Zeng, Hamed Zamani, and Vishwa Vinay. 2022. Curriculum Learning for Dense Retrieval Distillation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 1979–1983. https://doi.org/10.1145/3477495.3531791

  52. [52]

Lucia Zheng, Neel Guha, Javokhir Arifov, Sarah Zhang, Michal Skreta, Christopher D. Manning, Peter Henderson, and Daniel E. Ho. 2025. A Reasoning-Focused Legal Retrieval Benchmark. In Proceedings of the 4th ACM Symposium on Computer Science and Law. https://doi.org/10.1145/3709025.3712219