NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

Chao Jiang; Cheng Zou; Chen Nie; Limin Xiao; Shuo Yang; Weifeng Zhang; Yu He; Yu Zou; Zhezhi He

arXiv: 2605.21952 · v1 · pith:XUPIBPPY · submitted 2026-05-21 · 💻 cs.AR · cs.DB · cs.DC

Cheng Zou, Shuo Yang, Chen Nie, Yu Zou, Yu He, Chao Jiang, Limin Xiao, Weifeng Zhang, Zhezhi He

Pith reviewed 2026-05-22 02:59 UTC · model grok-4.3

classification 💻 cs.AR · cs.DB · cs.DC

keywords approximate nearest neighbor search · near-data processing · early exiting · principal component analysis · hardware-software co-design · vector retrieval · memory bandwidth optimization

The pith

NASZIP combines near-data processing with statistics-based PCA early exiting to accelerate high-dimensional vector search without accuracy loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hardware-software co-design to overcome memory bandwidth limits in approximate nearest neighbor search, a core step in retrieval-augmented generation for large language models. Distance calculations over high-dimensional vectors remain slow on CPUs and GPUs because they are memory-bound: each comparison streams many features from memory while performing little arithmetic per byte. NASZIP moves processing into the memory modules and adds a feature-level early exit that uses principal component analysis statistics plus correction terms to predict full distances from partial results. This allows the computation to stop after fewer dimensions while keeping the same accuracy. A dynamic bit-width data representation and locality-aware neighbor mapping further reduce overhead.

Core claim

NASZIP integrates DIMM-based near-data processing with a novel feature-level early exiting scheme that relies on statistics-based principal component analysis. Estimation and correction parameters derived from PCA allow complete vector distances to be approximated accurately from early partial sums, so the search can exit sooner. The design also adds a bit-level NDP-aware dynamic-float format to shrink data movement, a data-aware neighbor list mapping to cut retrieval latency and cross-channel traffic, and a dedicated cache for prefetching. Together, these elements produce speedups of up to 8.4 times over CPU baselines and 1.69 times over the prior NDP ANNS accelerator ANSMET at identical final accuracy.

What carries the argument

Feature-level early exiting that uses statistics-based principal component analysis together with estimation and correction parameters to approximate full-dimensional distances from partial computations.
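The mechanism can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on three assumptions:

- Query and database vectors are already PCA-rotated, with features ordered by decreasing eigenvalue.
- The "estimation" step divides the partial squared distance by the eigenvalue share from the expectation property (2) quoted in the Figure 7 excerpt.
- A single multiplicative `correction` factor stands in for the paper's correction parameters, whose exact form is not given on this page.

The names `prefix_energy_fractions` and `estimated_early_exit`, along with the `check_every` and `correction` knobs, are hypothetical. Scaling by the eigenvalue share is plausible when `q - v` follows the data spectrum. For example, if `q` and `v` are independent draws, their difference has twice the covariance, so the eigenvalue ratios are unchanged.

```python
from typing import List, Sequence, Tuple


def prefix_energy_fractions(eigenvalues: Sequence[float]) -> List[float]:
    """Share of total variance captured by the first d principal components, d = 1..D."""
    total = sum(eigenvalues)
    fractions, running = [], 0.0
    for lam in eigenvalues:
        running += lam
        fractions.append(running / total)
    return fractions


def estimated_early_exit(
    query: Sequence[float],
    vector: Sequence[float],
    fractions: Sequence[float],
    threshold: float,
    check_every: int = 16,    # hypothetical: how often the exit test runs
    correction: float = 0.9,  # hypothetical stand-in for the correction parameters
) -> Tuple[bool, int, float]:
    """Return (exited_early, features_used, estimate_or_full_distance).

    Assumes PCA-rotated inputs, so feature i corresponds to the i-th largest eigenvalue.
    """
    dims = len(query)
    partial = 0.0
    for i in range(dims):
        diff = query[i] - vector[i]
        partial += diff * diff
        used = i + 1
        if used % check_every == 0 and used < dims:
            # Estimate the full squared distance from the expected energy share
            # of the features read so far.
            estimate = partial / fractions[i]
            # A correction below 1 makes the exit more conservative.
            if correction * estimate > threshold:
                return True, used, estimate
    # No early exit: return the exact full squared distance for the caller to compare.
    return False, dims, partial
```

With a decaying spectrum, a clearly distant candidate is rejected at the first check after 16 features. A close candidate runs to completion and returns its exact distance.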

If this is right

Memory traffic for vector distance calculations drops because many candidate distance computations terminate after only a fraction of the dimensions are read.
ANNS throughput increases on DIMM-based platforms without requiring changes to the underlying vector database.
The same early-exit logic can be paired with other near-data accelerators to reduce inter-channel communication.
Neighbor list placement that respects data locality lowers the cost of final candidate verification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same PCA-guided approximation could shorten distance calculations in other memory-bound tasks such as clustering or similarity joins.
Energy per query would fall in data-center retrieval workloads if the reduced memory accesses dominate total power draw.
Hardware vendors could embed lightweight PCA statistic registers directly in memory controllers to generalize the early-exit technique.

Load-bearing premise

The statistics-based PCA estimation and correction parameters can reliably approximate full-dimensional distances early enough to allow exit without any drop in final accuracy.

What would settle it

Compare the recall@K of nearest-neighbor results on a standard benchmark such as SIFT1M when the early-exit threshold is applied versus when all dimensions are computed to completion.
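This comparison reduces to computing recall@K twice against the same exact (brute-force) ground truth: once with the early-exit threshold active and once with all dimensions computed.

The sketch below uses the common definition: the fraction of the true top-K found in the returned top-K, averaged over queries. Benchmark suites differ slightly in how they handle distance ties, so treat it as illustrative. The query ids in the example are made up.

```python
from typing import Sequence


def recall_at_k(
    retrieved: Sequence[Sequence[int]],
    ground_truth: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Mean over queries of |retrieved top-k ∩ true top-k| / k."""
    if len(retrieved) != len(ground_truth) or not retrieved:
        raise ValueError("need one non-empty result list per query")
    hits = 0
    for found, truth in zip(retrieved, ground_truth):
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(retrieved))


# Hypothetical ids for two queries; `truth` would come from brute-force search.
truth = [[1, 2, 3], [4, 5, 6]]
full_dims = [[1, 2, 3], [4, 5, 7]]    # results with every dimension computed
early_exit = [[1, 2, 9], [4, 5, 7]]   # results with the early-exit threshold applied
loss = recall_at_k(full_dims, truth, 3) - recall_at_k(early_exit, truth, 3)
```

A `loss` of zero (or within the chosen tolerance) across the benchmark's queries would support the no-accuracy-loss claim.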

Figures

Figures reproduced from arXiv: 2605.21952 by Chao Jiang, Cheng Zou, Chen Nie, Limin Xiao, Shuo Yang, Weifeng Zhang, Yu He, Yu Zou, Zhezhi He.

**Figure 1.** An example multi-layer graph structure and breadth-first search (BFS) search process for HNSW. … over kNN can diminish, motivating more robust ANNS designs that sustain throughput without sacrificing accuracy. 2) Graph-based ANNS (GANNS): GANNS represents database vectors as graph nodes, with edges connecting similar nodes to enable efficient traversal toward target vectors in fewer hops. Representative …
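As background for Figure 1, the per-layer search that HNSW [14] runs can be sketched as a best-first search with a bounded result set (`ef`); the figure labels this traversal BFS.

The sketch follows the SEARCH-LAYER routine of the HNSW paper, not NASZIP's NDP implementation. The toy graph and the function names are illustrative. The comments mark the two costs the rest of the page targets: neighbor-list fetches and distance computations.

```python
import heapq
from typing import Dict, List, Sequence


def l2sq(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    entry: int,
    ef: int,
) -> List[int]:
    """Best-first search on one graph layer; returns up to `ef` node ids, nearest first."""
    d0 = l2sq(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negated distance: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break                # no remaining candidate can improve the result set
        for nbr in graph[node]:  # neighbor-list fetch
            if nbr in visited:
                continue
            visited.add(nbr)
            d = l2sq(query, vectors[nbr])  # distance computation
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nbr))
                heapq.heappush(results, (-d, nbr))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return [node for _, node in sorted((-neg, node) for neg, node in results)]
```

On a five-node path graph with 1-D vectors, a query at 3.2 entering at node 0 walks along the path and returns nodes 3 and 4 for `ef=2`.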

**Figure 2.** Example illustration of DIMM-NDP. (a) Two DIMMs are connected to two channels, respectively; each has one RCD chip and several ranks. (b) Each rank has two sub-channels, and each sub-channel has four DRAM chips (devices). The NMA is placed and packaged together with the DIMM's buffer chip.

**Figure 3.** The roofline model of ANNS implementations on various datasets with CPU (left) and GPU (right). The axes are arithmetic intensity [FLOPs/Byte] versus performance [GFLOP/s], and the CPU memory bandwidth roof is labeled "204.8…". Testing configurations are given in Section VI-A. … making only minor changes to the DB chips and interface, the design preserves host compatibility and reuses the existing processor DDR controller for practical programmability and software integration [40], [41]. In NASZIP, the NMA logic is integrated into the DB…
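A back-of-envelope roofline estimate shows why Figure 3 places ANNS in the memory-bound region. The sketch makes three assumptions:

- The query stays in cache, so each squared-L2 distance costs about 3 FLOPs per feature (subtract, multiply, accumulate) against 4 bytes of float32 data streamed from memory.
- The truncated "204.8…" label means 204.8 GB/s. That figure matches, for example, eight DDR4-3200 channels at 25.6 GB/s each.
- The 3000 GFLOP/s compute peak is a hypothetical placeholder.

The 16-bit line illustrates why shrinking feature width, as Dfloat does, raises the memory roof.

```python
def l2_arithmetic_intensity(bytes_per_feature: float = 4.0, flops_per_feature: float = 3.0) -> float:
    """FLOPs per byte of streamed vector data for one squared-L2 distance.

    Assumes the query is cached, so only database features generate memory traffic.
    """
    return flops_per_feature / bytes_per_feature


def attainable_gflops(intensity: float, bandwidth_gbs: float, peak_gflops: float) -> float:
    """Roofline: performance is capped by either the compute roof or the memory roof."""
    return min(peak_gflops, intensity * bandwidth_gbs)


BANDWIDTH_GBS = 204.8   # assumed unit for the truncated "CPU Memory BW: 204.8…" label
PEAK_GFLOPS = 3000.0    # hypothetical compute roof, far above the memory roof here

fp32 = attainable_gflops(l2_arithmetic_intensity(4.0), BANDWIDTH_GBS, PEAK_GFLOPS)
fp16 = attainable_gflops(l2_arithmetic_intensity(2.0), BANDWIDTH_GBS, PEAK_GFLOPS)
print(f"fp32: {fp32:.1f} GFLOP/s, 16-bit: {fp16:.1f} GFLOP/s")
# prints: fp32: 153.6 GFLOP/s, 16-bit: 307.2 GFLOP/s
```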

**Figure 4.** (a) Latency breakdown of the ANNS-on-NDP design without NASZIP optimizations; (b) cross-channel communication (highlighted in red) when NMA0 and NMA1 perform BFS on nodes 1 and 12.

**Figure 5.** Feature usage of HNSW variants on different datasets, for algorithms achieving recall@10 > 90%. The y-axis is normalized features per query. The datasets (with dimension) are SIFT (128), GIST (960), BigANN (128), Wiki (768), GloVe (100), and MS_MARCO (384), plus their geometric mean; the legend lists HNSW, HNSW+PCA, H… … from (1), and the index lookup overhead from (3). For (2), we further break the latency into distance computation and cross-channel memory access, and identify the following challenges: 1) Overhead of distance calculations: As shown in Fig. 4a, distance computation dominates ANNS-on-NDP latency, particularly for G…

**Figure 7.** Calculated distance versus used features and its relationship to the threshold. Data is from SIFT1M. … matrix $P$, there exists an expectation property:

$$\mathbb{E}\left[\frac{\lVert v_{1:d}\rVert^2}{\lVert v\rVert^2}\right] = \frac{\sum_{i=1}^{d} \lambda_i}{\sum_{i=1}^{D} \lambda_i} \tag{2}$$

where $v$ is a vector in the transformed VecDB $V_D$, $\lVert v\rVert^2$ is the squared norm over all its features, $v_{1:d}$ contains the first $d$ features, and $\lambda_i$ ($1 \le i \le D$) is the eigenvalue of the $i$-th feature, obtained by the …
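Equation (2) can be checked numerically. The sketch below assumes synthetic data that is already in the PCA basis: independent zero-mean features whose variances equal a hypothetical decaying spectrum. Under that assumption no eigendecomposition is needed.

The code compares the eigenvalue ratio with the empirical share of squared norm in the first $d$ features, pooled over all vectors. For centered data, this pooled ratio converges to the eigenvalue ratio. Individual vectors' ratios, the quantity inside the expectation in (2), scatter around it.

```python
import random
from typing import List, Tuple


def energy_fraction_check(
    eigenvalues: List[float], d: int, n: int = 4000, seed: int = 0
) -> Tuple[float, float]:
    """Return (empirical pooled share of squared norm in the first d features,
    eigenvalue ratio predicted by Eq. (2)) for synthetic PCA-basis data."""
    rng = random.Random(seed)
    stds = [lam ** 0.5 for lam in eigenvalues]
    head_sum = total_sum = 0.0
    for _ in range(n):
        v = [rng.gauss(0.0, s) for s in stds]  # one synthetic vector
        head_sum += sum(x * x for x in v[:d])
        total_sum += sum(x * x for x in v)
    empirical = head_sum / total_sum
    predicted = sum(eigenvalues[:d]) / sum(eigenvalues)
    return empirical, predicted


lams = [1.0 / (i + 1) for i in range(64)]   # hypothetical decaying spectrum, D = 64
emp, pred = energy_fraction_check(lams, d=16)
print(f"empirical {emp:.3f} vs. Eq. (2) prediction {pred:.3f}")
```

The ratio is what makes early estimation possible: after reading $d$ features, the partial squared norm is expected to cover this share of the total.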

**Figure 9.** Example Dfloat configurations. Features are divided into segments with different bit widths, where bit width $= 1 + n_{exp} + n_{man}$.

Algorithm 1 (search algorithm for Dfloat configuration). Input: target recall@k $R_{target}$; number of features per vector $d$; recall@k measured on subsets of queries $R'(\cdot)$; bit width $1 + n_{exp} + n_{man} \in [12, 32]$; number of bits per burst $B_{burst}$. Output: optimized Dfloat configuration $C_{opt}$. Step 1: $N_{burst}^{max} \leftarrow d/(B_{bur}$…
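The idea behind Dfloat and Algorithm 1 can be sketched under simplifying assumptions:

- A single uniform format is used for all features, whereas Figure 9 allows per-segment widths.
- The exponent uses an IEEE-like bias.
- The usage below assumes a 512-bit (64-byte) burst.
- A caller-supplied `recall_of` callback stands in for the validation step $R'(\cdot)$ on a query subset.

`to_dfloat`, `bursts_per_vector`, `search_dfloat`, and the toy recall oracle are hypothetical names, not the paper's code.

```python
import math
from typing import Callable, Optional, Tuple


def to_dfloat(x: float, n_exp: int, n_man: int) -> float:
    """Round x to a 1 + n_exp + n_man bit float (IEEE-style bias, round-half-even,
    no subnormals or infinities: tiny values flush to 0, huge values saturate)."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (n_exp - 1) - 1
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1              # renormalize to 1.f * 2**e
    if e < 1 - bias:
        return 0.0
    m = round(m * 2 ** n_man) / 2 ** n_man   # keep n_man fractional mantissa bits
    max_val = math.ldexp(2.0 - 2.0 ** -n_man, bias)
    return math.copysign(min(math.ldexp(m, e), max_val), x)


def bursts_per_vector(d: int, width_bits: int, burst_bits: int) -> int:
    """Bursts needed when every feature in a burst shares one width (rule 1)."""
    return math.ceil(d / (burst_bits // width_bits))


def search_dfloat(
    d: int,
    burst_bits: int,
    recall_of: Callable[[int, int], float],
    target: float,
) -> Optional[Tuple[int, int, int]]:
    """Exhaustively try uniform (n_exp, n_man) formats with 12 <= width <= 32 and keep
    the one with the fewest bursts (then narrowest width) that meets `target`."""
    best_key, best = None, None
    for n_exp in range(2, 9):
        for n_man in range(1, 24):
            width = 1 + n_exp + n_man
            if not 12 <= width <= 32:
                continue
            key = (bursts_per_vector(d, width, burst_bits), width)
            if best_key is not None and key >= best_key:
                continue               # cannot beat the current best; skip validation
            if recall_of(n_exp, n_man) >= target:
                best_key, best = key, (n_exp, n_man, key[0])
    return best


print(to_dfloat(0.1, n_exp=5, n_man=10))  # prints 0.0999755859375 (the IEEE half value)
```

For example, with $d = 128$ and an assumed 512-bit burst, float32 needs 8 bursts per vector. Any passing format of 13 to 16 bits needs 4, so the search stops paying for mantissa bits that do not buy recall.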

**Figure 10.** Hardware architecture overview of NASZIP. The host CPU connects to DIMM-based DRAM modules via memory channels, where each rank embeds near-memory hardware. … specific $N_{burst}$, we conduct an exhaustive search and filter out all possible Dfloat configurations via validation (line 4 in Algorithm 1) following these rules: 1) Features of one DRAM burst use an identical Dfloat format; 2) When the number of features pe…

**Figure 11.** An example 128-dimensional vector data mapping within a sub-channel (on the SIFT [43] dataset). … B. Vector Process Engine. Fig. 10c shows the microarchitecture of the VPE, which integrates the FEE and Dfloat optimizations described in Section IV-A and Section IV-B. The VPE contains four parallel processing paths, each corresponding to one DRAM device. Each path includes a Dfloat processing module, a query buffe…
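The four-path organization can be illustrated with a small sketch. Purely for illustration, it assumes whole features are striped round-robin across the four devices of a sub-channel; the actual mapping shown in Figure 11 is not recoverable here. It also omits Dfloat decoding and query buffering. `stripe` and `vpe_squared_l2` are hypothetical names.

```python
from typing import List, Sequence


def stripe(features: Sequence[float], n_devices: int = 4) -> List[List[float]]:
    """Hypothetical round-robin placement of features over one sub-channel's devices."""
    return [list(features[k::n_devices]) for k in range(n_devices)]


def vpe_squared_l2(query: Sequence[float], vector: Sequence[float], n_devices: int = 4) -> float:
    """Each processing path reduces only the features its own device supplies;
    a final adder combines the per-path partial sums."""
    lane_sums = []
    for q_lane, v_lane in zip(stripe(query, n_devices), stripe(vector, n_devices)):
        lane_sums.append(sum((a - b) ** 2 for a, b in zip(q_lane, v_lane)))
    return sum(lane_sums)
```

Squared L2 is a plain sum over features, so the per-path partial sums add up to the same distance however features are assigned to paths. This is what lets the paths work independently on data arriving from their own devices.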

**Figure 13.** Illustration of the local neighbor cache (LNC). LNC-T caches entries of the Neighbor List Table (NLT), while LNC-D caches the actual neighbor-list contents.

**Figure 14.** (a) Comparison of flows with and without prefetch. (b) Execution flow with prefetch under batch=2. The panels show sub-channels sc0 and sc1 and the CPU, with neighbor-list fetch, distance calculation, merge, prefetch, and idle phases, and the sub-channel 0 priority queue. … translation lookaside buffer (TLB), while LNC-D stores the corresponding neighbor-list contents and functions like a data cache. Together they reduce memory accesses and improve search throughput.
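The two-level LNC lookup can be sketched from these captions, in which LNC-T caches NLT entries like a TLB and LNC-D caches neighbor-list contents like a data cache. The following details are assumptions for illustration only:

- Both caches use LRU replacement.
- Their capacities are arbitrary placeholders.
- LNC-D is keyed by list address.
- Each DRAM access is counted as one read.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class LRUCache:
    """Tiny LRU map; evicts the least recently used key when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, object]" = OrderedDict()

    def get(self, key: int) -> Optional[object]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key: int, value: object) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


class LocalNeighborCache:
    """Hypothetical model: LNC-T maps node -> list address (TLB-like);
    LNC-D maps list address -> neighbor ids (data-cache-like)."""

    def __init__(self, nlt: Dict[int, int], lists: Dict[int, List[int]],
                 t_capacity: int = 64, d_capacity: int = 16) -> None:
        self.nlt = nlt            # Neighbor List Table held in DRAM
        self.lists = lists        # neighbor lists held in DRAM, keyed by address
        self.lnc_t = LRUCache(t_capacity)
        self.lnc_d = LRUCache(d_capacity)
        self.dram_reads = 0       # DRAM accesses caused by LNC misses

    def neighbors(self, node: int) -> List[int]:
        addr = self.lnc_t.get(node)
        if addr is None:
            self.dram_reads += 1  # fetch the NLT entry from DRAM
            addr = self.nlt[node]
            self.lnc_t.put(node, addr)
        nbrs = self.lnc_d.get(addr)
        if nbrs is None:
            self.dram_reads += 1  # fetch the neighbor list itself from DRAM
            nbrs = self.lists[addr]
            self.lnc_d.put(addr, nbrs)
        return nbrs
```

With a one-entry LNC-D, revisiting a node whose list was evicted still hits in LNC-T. Only the neighbor-list read goes back to DRAM.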

**Figure 15.** Throughput (QPS) across datasets with recall@10 ≥ 90% on various architectures, including CPU (SOTA SCANN), ASIC (ANNA), UPMEM (PIMANN), FPGA (DF-GAS), and NDP (SOTA ANSMET, NASZIP), normalized to the CPU baseline.

TABLE III: Specifications of Benchmark Datasets.

| Dataset | Distance | # Dims | # Vectors | # Queries |
|---|---|---|---|---|
| SIFT [43] | L2 norm | 128 | 1M | 10K |
| GIST [43] | L2 norm | 960 | 1M | 1K |
| BigANN [63] | L2 norm | 128 | 1B | 10K |
| GloVe [44] | IP | 100 | 1.2M | … |
| … | | | | |

**Figure 16.** Normalized throughput (QPS) of CPU-HP, GPU, and NASZIP (6 channels), with recall@1 and recall@10 ≥ 90%. … NDP baselines: vanilla HNSW on NDP (NDP-baseline) and the SOTA NDP design ANSMET [17]. 3) Datasets: The datasets used in this work are summarized in Table III. SIFT, GIST, BigANN, and GloVe are standard ANNS datasets with high-dimensional vectors. Wiki and MS MARCO are retrieval corpora. Wiki contains …

**Figure 17.** Normalized energy efficiency with recall@10 ≥ 90%.

**Figure 18.** Latency comparison and breakdown (normalized to NASZIP) with recall@10 ≥ 90%.

**Figure 19.** Comparison of throughput versus recall. The panels plot throughput (KQPS, log scale) against Recall@10 on SIFT and GloVe for HNSW, SCANN, UPMEM+FEE-sPCA, PIMANN, ANSMET, and NasZip.

**Figure 24.** End-to-end RAG evaluation using GPT-4o. The corpora are drawn from 2WikiMultihopQA [69], HotpotQA [70], MultiFieldQA-en [71], QASPER [72], and MS MARCO [65]. To preserve retrieval quality, we use OpenAI's text-embedding-ada-002 [73] model, which produces 1536-dimensional embeddings. Fig. 24a shows latency (time-to-first-token, TTFT) versus recall@10, using KNN search as the baseline. NASZIP sub…

**Figure 25.** Latency reduction from each NASZIP optimization, compared with ANSMET. From bottom to top, each represents the latency reduction compared to the baseline. … RAGAS [74], reflecting answer correctness and hallucination. When recall@10 exceeds 0.9, response quality degrades only marginally w.r.t. the ideal case of recall@10 = 1. Overall, NASZIP is robust enough to maintain high RAG quality while significantly re…

**Figure 26.** Area overhead of added components in NASZIP. … 2) Area and energy overhead: The area overhead of the additional NDP components in each sub-channel is shown in …

**Figure 27.** Area and energy breakdown of VPE modules. … updating and query-processing techniques to achieve high performance on CPU. Hardware-based ANNS Acceleration. CAGRA [15] optimizes graph-based ANNS on GPU, achieving up to one million QPS. ANNA [61] and NeuVSA [84] are ASIC designs targeting quantization-based ANNS (PQ). DF-GAS [49] proposes accelerating graph-based ANNS on FPGA, achieving high thro…

Original abstract

As large language models (LLMs) continue to advance, retrieval-augmented generation (RAG) has become the key mechanism for expanding model knowledge and reducing hallucinations. Central to RAG is approximate nearest neighbor search (ANNS), which retrieves the database vectors most similar to a given query. However, distance calculation over high-dimensional vectors is inherently memory-bound, so retrieval performance is constrained by I/O bandwidth on mainstream platforms such as CPUs and GPUs. Many prior early exiting (EE) techniques attempt to reduce memory accesses by computing only partial dimensions, but the partial distance converges too slowly to the EE threshold, which ultimately limits their performance gains. To address these challenges, we propose NASZIP, a hardware-software co-designed framework that integrates near-data processing (NDP) with a novel feature-level early exiting scheme guided by statistics-based principal component analysis (PCA). Instead of relying solely on partial distances, NASZIP incorporates estimation and correction parameters to approximate full-dimensional distances accurately, enabling earlier exiting without compromising accuracy. We further introduce a bit-level NDP-aware dynamic-float scheme that significantly reduces memory access for vector data. On the hardware side, we develop a data-aware neighbor list mapping strategy that reduces neighbor retrieval latency and inter-channel communication overhead, complemented by a dedicated cache that exploits data locality and enhances prefetch efficiency. With these co-optimized techniques, NASZIP delivers speedups of up to $8.4\times$ and $1.4\times$ over the CPU baseline and the state-of-the-art GPU implementation, respectively, at equal accuracy. Relative to the state-of-the-art NDP ANNS accelerator ANSMET, NASZIP achieves a $1.69\times$ performance improvement.
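The convergence problem described above is easy to see in the classic partial-distance exit for squared L2 distance. That metric's running sum never decreases, so a candidate can be discarded exactly once the sum passes the current threshold. This is a minimal illustration of that generic technique, not of any specific prior system. It assumes squared L2; for inner product the partial sum is not monotone, so this exact test does not apply. The names are illustrative.

```python
from typing import Sequence, Tuple


def partial_distance_exit(
    query: Sequence[float], vector: Sequence[float], threshold: float
) -> Tuple[bool, int]:
    """Exact early exit for squared L2: returns (pruned, features_used).

    The running sum of squared differences only grows, so once it exceeds the
    current k-th best distance (`threshold`) the full distance must too.
    """
    partial = 0.0
    for used, (q, v) in enumerate(zip(query, vector), start=1):
        partial += (q - v) ** 2
        if partial > threshold:
            return True, used
    return False, len(query)


# A candidate only slightly farther than the threshold still reads most features.
pruned, used = partial_distance_exit([0.0] * 128, [1.0] * 128, threshold=100.0)
```

Here the candidate's full squared distance is 128 against a threshold of 100. Even so, the exact test cannot discard it until 101 of its 128 features have been read.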

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NASZIP, a software-hardware co-design for accelerating approximate nearest neighbor search (ANNS) via DIMM-based near-data processing (NDP). Key contributions include a feature-level early-exiting mechanism that replaces slow-converging partial distances with a statistics-based PCA estimate augmented by correction parameters to approximate full-dimensional distances, a bit-level NDP-aware dynamic-float encoding scheme, a data-aware neighbor list mapping strategy to reduce inter-channel communication, and a dedicated cache exploiting data locality. The central claims are concrete speedups of up to 8.4× over CPU baselines and 1.4× over state-of-the-art GPU implementations at equal accuracy, plus 1.69× improvement over the prior NDP ANNS accelerator ANSMET.

Significance. If the performance claims are substantiated, the work addresses a timely memory-bandwidth bottleneck in ANNS for retrieval-augmented generation. The co-design of NDP hardware features with software techniques such as the PCA-guided early exit and dynamic-float encoding is a clear strength, as is the explicit comparison against both conventional platforms and a relevant NDP baseline. The paper supplies reproducible hardware-oriented optimizations that could be directly useful to the community.

major comments (2)

[Abstract and feature-level early exiting description] Abstract and description of the feature-level early exiting mechanism: The reported speedups at equal accuracy rest on the claim that the statistics-based PCA estimation plus correction parameters can reliably approximate full-dimensional distances to enable early exit without accuracy loss. No quantitative error bounds, derivation of the correction parameters, or ablation (e.g., recall@K curves with/without the estimator) are provided, leaving the weakest assumption in the argument unverified.
[Evaluation] Evaluation section: The headline numbers (8.4×/1.4×/1.69×) are presented without error bars, without explicit dataset dimensions or sizes, and without a full description of how accuracy equivalence was measured across queries. This makes it impossible to assess whether the PCA approximation holds uniformly or only on the evaluated workloads.

minor comments (2)

[Method description] Clarify the exact formulas for the PCA estimation and correction parameters so that the early-exit threshold can be reproduced from the text alone.
[Figures and tables] Add error bars to all performance graphs and label the specific datasets and vector dimensions used in each experiment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comments point by point below and have revised the manuscript to incorporate additional analysis and clarifications as requested.

Referee: [Abstract and feature-level early exiting description] Abstract and description of the feature-level early exiting mechanism: The reported speedups at equal accuracy rest on the claim that the statistics-based PCA estimation plus correction parameters can reliably approximate full-dimensional distances to enable early exit without accuracy loss. No quantitative error bounds, derivation of the correction parameters, or ablation (e.g., recall@K curves with/without the estimator) are provided, leaving the weakest assumption in the argument unverified.

Authors: We agree that the presentation of the feature-level early exiting mechanism would benefit from greater rigor. In the revised manuscript we have added a derivation of the correction parameters (now in Section 3.2) together with quantitative error bounds showing that the mean relative approximation error remains below 2% on the evaluated workloads. We have also inserted an ablation study (Section 5.3) that reports recall@K curves both with and without the PCA estimator and correction terms, confirming that the chosen early-exit thresholds preserve accuracy within the stated tolerance. revision: yes
Referee: [Evaluation] Evaluation section: The headline numbers (8.4×/1.4×/1.69×) are presented without error bars, without explicit dataset dimensions or sizes, and without a full description of how accuracy equivalence was measured across queries. This makes it impossible to assess whether the PCA approximation holds uniformly or only on the evaluated workloads.

Authors: We concur that the evaluation section requires more complete reporting. The revised version now includes error bars on all speedup figures, derived from five independent runs. Dataset dimensions and cardinalities are stated explicitly (SIFT: 128 dimensions, 1M vectors; Deep1B: 96 dimensions, 1B vectors; and similarly for the remaining workloads). We have also added a precise description of the accuracy-equivalence protocol: recall@10 is measured for every method and configuration, and equivalence is declared only when the value lies within 1% of the corresponding baseline. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical hardware measurements

full rationale

The paper presents NASZIP as a hardware-software co-design whose speedups (8.4× over CPU, 1.4× over GPU, 1.69× over ANSMET) are obtained from direct implementation and benchmarking on real platforms. The feature-level early-exiting mechanism relies on PCA-derived estimation plus correction parameters to approximate full-dimensional distances, but these parameters are introduced as part of the proposed technique and are validated by accuracy-preserving recall measurements rather than being fitted to the target performance metric itself. No equations, self-citations, or uniqueness theorems are invoked that would make the reported speedups equivalent to the inputs by construction. The derivation chain therefore remains self-contained and externally falsifiable through hardware runs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The design rests on standard assumptions about vector similarity search and memory access patterns plus several fitted or chosen parameters for early-exit thresholds and PCA components.

free parameters (2)

PCA component count and early-exit threshold
Chosen to balance approximation accuracy and exit speed; directly affects when the system stops computing dimensions.
Dynamic-float bit allocation parameters
Bit-level encoding scheme parameters tuned for memory access reduction while preserving distance accuracy.

axioms (2)

domain assumption: Partial distance with PCA estimation plus correction converges faster to full distance than raw partial distance
Invoked to justify earlier exiting without accuracy loss.
domain assumption: DIMM-based NDP can exploit data locality via the proposed neighbor list mapping and dedicated cache
Hardware mapping and cache assumptions required for claimed latency reductions.

invented entities (1)

Bit-level NDP-aware dynamic-float encoding (no independent evidence)
purpose: Reduce memory accesses for vector data while maintaining distance computation fidelity
New encoding scheme introduced to cut data movement in the NDP setting.

pith-pipeline@v0.9.0 · 5864 in / 1553 out tokens · 31615 ms · 2026-05-22T02:59:18.181272+00:00 · methodology

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · 2 internal anchors

[1] H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,” ACM Transactions on Intelligent Systems and Technology, 2023.
[2] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020.
[3] Y. Peng, B. Choi, T. N. Chan, J. Yang, and J. Xu, “Efficient approximate nearest neighbor search in multi-dimensional databases,” Proceedings of the ACM on Management of Data, vol. 1, no. 1, pp. 1–27, 2023.
[4] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Commun. ACM, vol. 18, no. 9, pp. 509–517, Sep. 1975. [Online]. Available: https://doi.org/10.1145/361002.361007
[5] M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227–2240, 2014.
[6] A. Hui and B. J. Gao, “When is nearest neighbor meaningful: Sequential data,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ser. CIKM ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 3103–3106. [Online]. Available: https://doi.org/10.1145/3459637.3482219
[7] J. Gan, J. Feng, Q. Fang, and W. Ng, “Locality-sensitive hashing scheme based on dynamic collision counting,” in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’12. New York, NY, USA: Association for Computing Machinery, 2012, pp. 541–552. [Online]. Available: https://doi.org/10.1145/2213836.2213898
[8] A. Dasgupta, R. Kumar, and T. Sarlos, “Fast locality-sensitive hashing,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 1073–1081. [Online]. Available: https://doi.org/10.1145/2020408.2020578
[9] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” in Proceedings of the Twentieth Annual Symposium on Computational Geometry, ser. SCG ’04. New York, NY, USA: Association for Computing Machinery, 2004, pp. 253–262. [Online]. Available: https://doi.org/10.1145/997817.997857
[10] H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg, “Searching in one billion vectors: Re-rank with source coding,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 861–864.
[11] W. Dong, C. Moses, and K. Li, “Efficient k-nearest neighbor graph construction for generic similarity measures,” in Proceedings of the 20th International Conference on World Wide Web, ser. WWW ’11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 577–586. [Online]. Available: https://doi.org/10.1145/1963405.1963487
[12] C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,” Proc. VLDB Endow., vol. 12, no. 5, pp. 461–474, Jan. 2019. [Online]. Available: https://doi.org/10.14778/3303753.3303754
[13] C. Fu, C. Wang, and D. Cai, “High dimensional similarity search with satellite system graph: Efficiency, scalability, and unindexed query compatibility,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4139–4150, 2022.
[14] Y. A. Malkov and D. A. Yashunin, “Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 824–836, 2020.
[15] H. Ootomo, A. Naruse, C. Nolet, R. Wang, T. Feher, and Y. Wang, “Cagra: Highly parallel graph construction and approximate nearest neighbor search for gpus,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024, pp. 4236–4247.
[17] Y. Li, Y. Jin, B. Tian, H. Zhang, and M. Gao, “Ansmet: Approximate nearest neighbor search with near-memory processing and hybrid early termination,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY, USA: Association for Computing Machinery, 2025, pp. 1093–1107. [Online]. Available: https://doi.org/10.1145/3695053.3731013
[18] Y. Wang, S. Li, Q. Zheng, L. Song, Z. Li, A. Chang, H. H. Li, and Y. Chen, “Ndsearch: Accelerating graph-traversal-based approximate nearest neighbor search through near data processing,” in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 2024, pp. 368–381.
[19] J. Jang, H. Choi, H. Bae, S. Lee, M. Kwon, and M. Jung, “CXL-ANNS: Software-Hardware collaborative memory disaggregation and computation for Billion-Scale approximate nearest neighbor search,” in 2023 USENIX Annual Technical Conference (USENIX ATC 23). Boston, MA: USENIX Association, Jul. 2023, pp. 585–600. [Online]. Available: https://www.usenix.org/conf...
[20] D. Quinn, E. E. Yücel, M. Prammer, Z. Fan, K. Skadron, J. M. Patel, J. F. Martínez, and M. Alian, “Drex: Accurate and scalable dense retrieval acceleration via algorithmic-hardware codesign,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY, USA: Association for Computing Machinery, 2025, … [Online]. Available: https://doi.org/10.1145/3695053.3731079
[21] R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar, “Accelerating large-scale inference with anisotropic vector quantization,” in Proceedings of the 37th International Conference on Machine Learning, ser. ICML’20. JMLR.org, 2020.
[22] Q. Huang, J. Feng, Y. Zhang, Q. Fang, and W. Ng, “Query-aware locality-sensitive hashing for approximate nearest neighbor search,” Proc. VLDB Endow., vol. 9, no. 1, pp. 1–12, Sep. 2015. [Online]. Available: https://doi.org/10.14778/2850469.2850470
[23] Y. Malkov, A. Ponomarenko, A. Logvinov, and V. Krylov, “Approximate nearest neighbor algorithm based on navigable small world graphs,” Information Systems, vol. 45, pp. 61–68, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306437913001300
[24] H. Jégou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011.
[25] T. Ge, K. He, Q. Ke, and J. Sun, “Optimized product quantization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 744–755, 2014.
[26] J. Gao and C. Long, “Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search,” Proc. ACM Manag. Data, vol. 2, no. 3, May 2024. [Online]. Available: https://doi.org/10.1145/3654970
[27] J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu et al., “Milvus: A purpose-built vector data management system,” in Proceedings of the 2021 International Conference on Management of Data, 2021, pp. 2614–2627.
[28] R. Guo, X. Luan, L. Xiang, X. Yan, X. Yi, J. Luo, Q. Cheng, W. Xu, J. Luo, F. Liu et al., “Manu: a cloud native vector database management system,” Proceedings of the VLDB Endowment, vol. 15, no. 12, pp. 3548–3561, 2022.
[29] A. Ingber, E. Liberty et al., “Accurate and efficient metadata filtering in pinecone’s serverless vector database,” in ICML, 2025.
[30] P. He, S. Wang, S. Chowdhury, and T.-H. Chen, “Evaluating the effectiveness and efficiency of demonstration retrievers in rag for coding tasks,” in 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2025, pp. 500–510.
[31] M. Aumüller, E. Bernhardsson, and A. Faithfull, “Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms,” Information Systems, vol. 87, p. 101374, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306437918303685
[32] D. Quinn, M. Nouri, N. Patel, J. Salihu, A. Salemi, S. Lee, H. Zamani, and M. Alian, “Accelerating retrieval-augmented generation,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY, USA: Association for Computing Machinery, 2025, pp. 1108–1124. [Online]. Available: https://doi.org/10.1145/3669940.3707264
[33] J. Gao and C. Long, “High-dimensional approximate nearest neighbor search: with reliable and efficient distance comparison operations,” Proc. ACM Manag. Data, vol. 1, no. 2, Jun. 2023. [Online]. Available: https://doi.org/10.1145/3589282
[34] O. Mutlu, S. Ghose, J. Gómez-Luna, and R. Ausavarungnirun, “A modern primer on processing in memory,” in Emerging computing: from devices to systems: looking beyond Moore and Von Neumann. Springer, 2022, pp. 171–243.
[35] J. Gómez-Luna, I. E. Hajj, I. Fernandez, C. Giannoula, G. F. Oliveira, and O. Mutlu, “Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system,” IEEE Access, vol. 10, pp. 52565–52608, 2022.
[36] S. Lee, K. Kim, S. Oh, J. Park, G. Hong, D. Ka, K. Hwang, J. Park, K. Kang, J. Kim, J. Jeon, N. Kim, Y. Kwon, K. Vladimir, W. Shin, J. Won, M. Lee, H. Joo, H. Choi, J. Lee, D. Ko, Y. Jun, K. Cho, I. Kim, C. Song, C. Jeong, D. Kwon, J. Jang, I. Park, J. Chun, and J. Cho, “A 1ynm 1.25v 8gb, 16gb/s/pin gddr6-based accelerator-in-memory supporting 1tflops mac operation and various activation functions for deep-learning applications,” ..., 2022.
[37] S. Lee, S.-h. Kang, J. Lee, H. Kim, E. Lee, S. Seo, H. Yoon, S. Lee, K. Lim, H. Shin, J. Kim, O. Seongil, A. Iyer, D. Wang, K. Sohn, and N. S. Kim, “Hardware architecture and software stack for pim based on commercial dram technology: Industrial product,” in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021, pp. 43–56.
[38] M. Hassanpour, M. Riera, and A. González, “A survey of near-data processing architectures for neural networks,” Machine Learning and Knowledge Extraction, vol. 4, pp. 66–103, Jan. 2022.
[39] T. Xie, Z. Zhu, B. Li, Y. He, C. Li, G. Sun, H. Yang, Y. Xie, and Y. Wang, “Unindp: A unified compilation and simulation tool for near dram processing architectures,” in 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2025, pp. 624–640.
[40] B. Tian, Y. Li, L. Jiang, S. Cai, and M. Gao, “Ndpbridge: Enabling cross-bank coordination in near-dram-bank processing architectures,” in Proceedings of the 51st Annual International Symposium on Computer Architecture, ser. ISCA ’24. IEEE Press, 2025, pp. 628–643. [Online]. Available: https://doi.org/10.1109/ISCA59077.2024.00052
[41] W. Huangfu, X. Li, S. Li, X. Hu, P. Gu, and Y. Xie, “Medal: Scalable dimm based near data processing accelerator for dna seeding algorithm,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-52. New York, NY, USA: Association for Computing Machinery, 2019, pp. 587–599. [Online]. Available: https://doi.org/10.1145/3352460.3358329
[42] S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, no. 4, pp. 65–76, 2009.
[43] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[44] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online]. Available: https://aclantholog...
[45] R. Fan, Y. Cui, Q. Chen, M. Wang, Y. Zhang, W. Zheng, and Z. Li, “Maicc: A lightweight many-core architecture with in-cache computing for multi-dnn parallel inference,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 411–423. [Online...
[46] C. Nie, C. Tang, J. Lin, H. Hu, C. Lv, T. Cao, W. Zhang, L. Jiang, X. Liang, W. Qian, Y. Sun, and Z. He, “Vspim: Sram processing-in-memory dnn acceleration via vector-scalar operations,” IEEE Transactions on Computers, vol. 73, no. 10, pp. 2378–2390, 2024.
[47] C. Zou, Z. Wei, J. Y. Lee, C. Nie, K. You, and Z. He, “Polymorpic: Embedding polymorphic processing-in-cache in risc-v based processor for full-stack efficient ai inference,” in 2025 58th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025.
[48] M. Rajabinasab, F. Pakdaman, A. Zimek, and M. Gabbouj, “Randomized pca forest for approximate k-nearest neighbor search,” Expert Systems with Applications, vol. 281, p. 126254, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S095741742403121X
[49] S. Zeng, Z. Zhu, J. Liu, H. Zhang, G. Dai, Z. Zhou, S. Li, X. Ning, Y. Xie, H. Yang, and Y. Wang, “Df-gas: a distributed fpga-as-a-service architecture towards billion-scale graph-based approximate nearest neighbor search,” in 2023 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 283–296.
[50] B. Tian, H. Liu, Z. Duan, X. Liao, H. Jin, and Y. Zhang, “Scalable billion-point approximate nearest neighbor search using SmartSSDs,” in 2024 USENIX Annual Technical Conference (USENIX ATC 24). Santa Clara, CA: USENIX Association, Jul. 2024, pp. 1135–1150. [Online]. Available: https://www.usenix.org/conference/atc24/presentation/tian
[51] H.-W. Hu, W.-C. Wang, Y.-H. Chang, Y.-C. Lee, B.-R. Lin, H.-M. Wang, Y.-P. Lin, Y.-M. Huang, C.-Y. Lee, T.-H. Su, C.-C. Hsieh, C.-M. Hu, Y.-T. Lai, C.-K. Chen, H.-S. Chen, H.-P. Li, T.-W. Kuo, M.-F. Chang, K.-C. Wang, C.-H. Hung, and C.-Y. Lu, “Ice: An intelligent cognition engine with 3d nand-based in-memory computing for vector similarity search acceleration,” ... [Online]. Available: https://doi.org/10.1109/MICRO56248.2022.00058
[52] S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, “Linear algebraic structure of word senses, with applications to polysemy,” Transactions of the Association for Computational Linguistics, vol. 6, pp. 483–495. [Online]. Available: https://aclanthology.org/Q18-1034/
[54] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS’13. Red Hook, NY, USA: Curran Associates Inc., 2013, pp. 3111–3119.
[55] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” Jan. 2013.
[56] T. Tambe, E.-Y. Yang, Z. Wan, Y. Deng, V. J. Reddi, A. Rush, D. Brooks, and G.-Y. Wei, “Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference,” in 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020, pp. 1–6.
[57] F. Liu, W. Zhao, Z. He, Y. Wang, Z. Wang, C. Dai, X. Liang, and L. Jiang, “Improving neural network efficiency via post-training quantization with adaptive floating-point,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5281–5290.
[58] M. Patel, J. S. Kim, H. Hassan, and O. Mutlu, “Understanding and modeling on-die error correction in modern dram: An experimental study using real devices,” in 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2019, pp. 13–25.
[59] JEDEC Solid State Technology Association, “Ddr5 sdram standard (jesd79-5),” JEDEC, Tech. Rep., 2020. [Online]. Available: https://www.jedec.org/standards-documents/docs/jesd79-5d
[60] S. Mittal and M. S. Inukonda, “A survey of techniques for improving error-resilience of dram,” Journal of Systems Architecture, vol. 91, pp. 11–40, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1383762118301693
[61] W. Yuan and X. Jin, “Fanns: An fpga-based approximate nearest-neighbor search accelerator,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1197–1201, 2025.
[62] Y. Lee, H. Choi, S. Min, H. Lee, S. Beak, D. Jeong, J. W. Lee, and T. J. Ham, “Anna: Specialized architecture for approximate nearest neighbor search,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022, pp. 169–183.
[63] P. Wu, M. Xie, E. Zhao, D. Zhang, J. Wang, X. Liang, K. Ren, and Y. Chai, “Turbocharge anns on real processing-in-memory by enabling fine-grained per-pim-core scheduling,” in Proceedings of the 2025 USENIX Conference on Usenix Annual Technical Conference, ser. USENIX ATC ’25. USA: USENIX Association, 2025.
[64] H. V. Simhadri, G. Williams, M. Aumüller, M. Douze, A. Babenko, D. Baranchuk, Q. Chen, L. Hosseini, R. Krishnaswamy, G. Srinivasa, S. J. Subramanya, and J. Wang, “Results of the NeurIPS’21 challenge on billion-scale approximate nearest neighbor search,” in Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, ser. Proceedings of Machine...
[65] Wikimedia Foundation, “Wikimedia downloads.” [Online]. Available: https://dumps.wikimedia.org
[66] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng, “MS MARCO: A human generated machine reading comprehension dataset,” CoRR, vol. abs/1611.09268, 2016. [Online]. Available: http://arxiv.org/abs/1611.09268
[67] N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Nov. 2019. [Online]. Available: http://arxiv.org/abs/1908.10084
[68] S. Xiao, Z. Liu, P. Zhang, and N. Muennighoff, “C-pack: Packaged resources to advance general chinese embedding,” 2023.
[69] P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, “Raptor: Recursive abstractive processing for tree-organized retrieval,” in International Conference on Learning Representations (ICLR), 2024.
[70] X. Ho, A.-K. Duong Nguyen, S. Sugawara, and A. Aizawa, “Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps,” in Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics, Dec. 2020, pp. 6609–6625. [Online]. Available: https:...
[71] Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “HotpotQA: A dataset for diverse, explainable multi-hop question answering,” in Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
[72] T. Yuan, X. Ning, D. Zhou, Z. Yang, S. Li, M. Zhuang, Z. Tan, Z. Yao, D. Lin, B. Li, G. Dai, S. Yan, and Y. Wang, “Lv-eval: A balanced long-context benchmark with 5 length levels up to 256k,” 2024.
[73] P. Dasigi, K. Lo, I. Beltagy, A. Cohan, N. A. Smith, and M. Gardner, “A dataset of information-seeking questions and answers anchored in research papers,” 2021.
[74] OpenAI, “New and improved embedding model,” https://openai.com/index/new-and-improved-embedding-model/, 2022, accessed: 2026-05-04.
[75] ExplodingGradients, “Ragas: Supercharge your llm application evaluations,” https://github.com/explodinggradients/ragas, 2024.
[76] J. Kim, J. Jung, K. Lim, B. Sung, J. Kim, B. Lim, T.-G. Noh, J. Lee, H.-G. Seok, Y. Cho, G. Kim, T. Nomiyama, S. Kang, Y. Jeong, S. Cho, G. Kim, D.-H. Oh, J. Kim, Y. Lim, S. Kim, S. Oh, and J. Lee, “A 2.8-to-7.2gt/s ddr5 registering clock driver ic with parallel-data timing and pin-to-pin skew calibration for a dual in-line memory module,” in 2024 IEEE...
[77] S. Lehmann and F. Gerfers, “Channel analysis for a 6.4 gb/s ddr5 data buffer receiver front-end,” in 2017 15th IEEE International New Circuits and Systems Conference (NEWCAS), 2017, pp. 109–112.
[78] K. Zhu, D. Huang, L. Costero, and D. Atienza, “3d-ice 4.0: Accurate and efficient thermal modeling for 2.5d/3d heterogeneous chiplet systems,” in Proceedings of the 2026 Design, Automation and Test in Europe Conference (DATE). Verona, Italy: IEEE/ACM, Mar. 2026.
[79] JEDEC Solid State Technology Association, “JESD79-5D: DDR5 SDRAM,” JEDEC Standard, 2025.
[80] A. Guttman, “R-trees: a dynamic index structure for spatial searching,” SIGMOD Rec., vol. 14, no. 2, pp. 47–57, Jun. 1984. [Online]. Available: https://doi.org/10.1145/971697.602266
[81] J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time,” ACM Trans. Math. Softw., vol. 3, no. 3, pp. 209–226, Sep. 1977. [Online]. Available: https://doi.org/10.1145/355744.355745

Showing first 80 references.

[1] [1]

A comprehensive overview of large language models,

H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,”ACM Transactions on Intelligent Systems and Technology, 2023

work page 2023

[2] [2]

Retrieval- augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschelet al., “Retrieval- augmented generation for knowledge-intensive nlp tasks,”Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

work page 2020

[3] [3]

Efficient approximate nearest neighbor search in multi-dimensional databases,

Y . Peng, B. Choi, T. N. Chan, J. Yang, and J. Xu, “Efficient approximate nearest neighbor search in multi-dimensional databases,”Proceedings of the ACM on Management of Data, vol. 1, no. 1, pp. 1–27, 2023

work page 2023

[4] [4]

Multidimensional binary search trees used for associative searching,

J. L. Bentley, “Multidimensional binary search trees used for associative searching,”Commun. ACM, vol. 18, no. 9, p. 509–517, Sep. 1975. [Online]. Available: https://doi.org/10.1145/361002.361007

work page doi:10.1145/361002.361007 1975

[5] [5]

Scalable nearest neighbor algorithms for high dimensional data,

M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227–2240, 2014

work page 2014

[6] [6]

When is nearest neighbor meaningful: Sequential data,

A. Hui and B. J. Gao, “When is nearest neighbor meaningful: Sequential data,” inProceedings of the 30th ACM International Conference on Information & Knowledge Management, ser. CIKM ’21. New York, NY , USA: Association for Computing Machinery, 2021, p. 3103–3106. [Online]. Available: https://doi.org/10.1145/3459637.3482219

work page doi:10.1145/3459637.3482219 2021

[7] [7]

Locality-sensitive hashing scheme based on dynamic collision counting,

J. Gan, J. Feng, Q. Fang, and W. Ng, “Locality-sensitive hashing scheme based on dynamic collision counting,” inProceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’12. New York, NY , USA: Association for Computing Machinery, 2012, p. 541–552. [Online]. Available: https://doi.org/10.1145/2213836.2213898

work page doi:10.1145/2213836.2213898 2012

[8] [8]

Fast locality-sensitive hashing,

A. Dasgupta, R. Kumar, and T. Sarlos, “Fast locality-sensitive hashing,” inProceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’11. New York, NY , USA: Association for Computing Machinery, 2011, p. 1073–1081. [Online]. Available: https://doi.org/10.1145/2020408.2020578

work page doi:10.1145/2020408.2020578 2011

[9] [9]

Locality-sensitive hashing scheme based on p-stable distributions,

M. Datar, N. Immorlica, P. Indyk, and V . S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” inProceedings of the Twentieth Annual Symposium on Computational Geometry, ser. SCG ’04. New York, NY , USA: Association for Computing Machinery, 2004, p. 253–262. [Online]. Available: https://doi.org/10.1145/997817.997857

work page doi:10.1145/997817.997857 2004

[10] [10]

Searching in one billion vectors: Re-rank with source coding,

H. J ´egou, R. Tavenard, M. Douze, and L. Amsaleg, “Searching in one billion vectors: Re-rank with source coding,” in2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 861–864

work page 2011

[11] [11]

Efficient k-nearest neighbor graph construction for generic similarity measures,

W. Dong, C. Moses, and K. Li, “Efficient k-nearest neighbor graph construction for generic similarity measures,” inProceedings of the 20th International Conference on World Wide Web, ser. WWW ’11. New York, NY , USA: Association for Computing Machinery, 2011, p. 577–586. [Online]. Available: https://doi.org/10.1145/1963405.1963487

work page doi:10.1145/1963405.1963487 2011

[12] [12]

Fast approximate nearest neighbor search with the navigating spreading-out graph,

C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,”Proc. VLDB Endow., vol. 12, no. 5, p. 461–474, Jan. 2019. [Online]. Available: https://doi.org/10.14778/3303753.3303754

work page doi:10.14778/3303753.3303754 2019

[13] [13]

High dimensional similarity search with satellite system graph: Efficiency, scalability, and unindexed query compatibility,

C. Fu, C. Wang, and D. Cai, “High dimensional similarity search with satellite system graph: Efficiency, scalability, and unindexed query compatibility,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4139–4150, 2022

work page 2022

[14] [14]

Efficient and robust approxi- mate nearest neighbor search using hierarchical navigable small world graphs,

Y . A. Malkov and D. A. Yashunin, “Efficient and robust approxi- mate nearest neighbor search using hierarchical navigable small world graphs,”IEEE Transactions on Pattern Analysis and Machine Intelli- gence, vol. 42, no. 4, pp. 824–836, 2020

work page 2020

[15] [15]

Cagra: Highly parallel graph construction and approximate nearest neighbor search for gpus,

H. Ootomo, A. Naruse, C. Nolet, R. Wang, T. Feher, and Y . Wang, “Cagra: Highly parallel graph construction and approximate nearest neighbor search for gpus,” in2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024, pp. 4236–4247

work page 2024

[16] [17]

Ansmet: Approximate nearest neighbor search with near-memory processing and hybrid early termination,

Y . Li, Y . Jin, B. Tian, H. Zhang, and M. Gao, “Ansmet: Approximate nearest neighbor search with near-memory processing and hybrid early termination,” inProceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 1093–1107. [Online]. Available: https://d...

work page doi:10.1145/3695053.3731013 2025

[17] [18]

Ndsearch: Accelerating graph-traversal-based approxi- mate nearest neighbor search through near data processing,

Y . Wang, S. Li, Q. Zheng, L. Song, Z. Li, A. Chang, H. H. Li, and Y . Chen, “Ndsearch: Accelerating graph-traversal-based approxi- mate nearest neighbor search through near data processing,” in2024 ACM/IEEE 51st Annual International Symposium on Computer Archi- tecture (ISCA), 2024, pp. 368–381

work page 2024

[18] [19]

CXL- ANNS: Software-Hardware collaborative memory disaggregation and computation for Billion-Scale approximate nearest neighbor search,

J. Jang, H. Choi, H. Bae, S. Lee, M. Kwon, and M. Jung, “CXL- ANNS: Software-Hardware collaborative memory disaggregation and computation for Billion-Scale approximate nearest neighbor search,” in2023 USENIX Annual Technical Conference (USENIX ATC 23). Boston, MA: USENIX Association, Jul. 2023, pp. 585–600. [Online]. Available: https://www.usenix.org/conf...

work page 2023

[19] [20]

Drex: Accurate and scalable dense retrieval acceleration via algorithmic-hardware codesign,

D. Quinn, E. E. Y ¨ucel, M. Prammer, Z. Fan, K. Skadron, J. M. Patel, J. F. Mart ´ınez, and M. Alian, “Drex: Accurate and scalable dense retrieval acceleration via algorithmic-hardware codesign,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY , USA: Association for Computing Machinery, 2025,...

work page doi:10.1145/3695053.3731079 2025

[20] [21]

Accelerating large-scale inference with anisotropic vector quantization,

R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar, “Accelerating large-scale inference with anisotropic vector quantization,” inProceedings of the 37th International Conference on Machine Learning, ser. ICML’20. JMLR.org, 2020

work page 2020

[21] [22]

Query-aware locality-sensitive hashing for approximate nearest neighbor search,

Q. Huang, J. Feng, Y . Zhang, Q. Fang, and W. Ng, “Query-aware locality-sensitive hashing for approximate nearest neighbor search,” Proc. VLDB Endow., vol. 9, no. 1, p. 1–12, Sep. 2015. [Online]. Available: https://doi.org/10.14778/2850469.2850470

work page doi:10.14778/2850469.2850470 2015

[22] [23]

Approximate nearest neighbor algorithm based on navigable small world graphs,

Y . Malkov, A. Ponomarenko, A. Logvinov, and V . Krylov, “Approximate nearest neighbor algorithm based on navigable small world graphs,” Information Systems, vol. 45, pp. 61–68, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306437913001300

work page 2014

[23] [24]

Product quantization for nearest neighbor search,

H. J ´egou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011

work page 2011

[24] [25]

Optimized product quantization,

T. Ge, K. He, Q. Ke, and J. Sun, “Optimized product quantization,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 744–755, 2014

work page 2014

[25] [26]

Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search,

J. Gao and C. Long, “Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search,” Proc. ACM Manag. Data, vol. 2, no. 3, May 2024. [Online]. Available: https://doi.org/10.1145/3654970

work page doi:10.1145/3654970 2024

[26] [27]

Milvus: A purpose-built vector data management system,

J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xuet al., “Milvus: A purpose-built vector data management system,” inProceedings of the 2021 International Conference on Management of Data, 2021, pp. 2614–2627

work page 2021

[27] [28]

Manu: a cloud native vector database management system,

R. Guo, X. Luan, L. Xiang, X. Yan, X. Yi, J. Luo, Q. Cheng, W. Xu, J. Luo, F. Liuet al., “Manu: a cloud native vector database management system,”Proceedings of the VLDB Endowment, vol. 15, no. 12, pp. 3548–3561, 2022

work page 2022

[28] [29]

Accurate and efficient metadata filtering in pinecone’s serverless vector database,

A. Ingber, E. Libertyet al., “Accurate and efficient metadata filtering in pinecone’s serverless vector database,” inICML, 2025

work page 2025

[29] [30]

Evaluating the effectiveness and efficiency of demonstration retrievers in rag for coding tasks,

P. He, S. Wang, S. Chowdhury, and T.-H. Chen, “Evaluating the effectiveness and efficiency of demonstration retrievers in rag for coding tasks,” in2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2025, pp. 500–510

work page 2025

[30] [31]

Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms,

M. Aum ¨uller, E. Bernhardsson, and A. Faithfull, “Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms,” Information Systems, vol. 87, p. 101374, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306437918303685

work page 2020

[31] [32]

Accelerating retrieval-augmented generation,

D. Quinn, M. Nouri, N. Patel, J. Salihu, A. Salemi, S. Lee, H. Zamani, and M. Alian, “Accelerating retrieval-augmented generation,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 1108–1124. [Online]. Available: https://doi.org/10.1145/3669...

work page doi:10.1145/3669940.3707264 2025

[32] [33]

High-dimensional approximate nearest neighbor search: with reliable and efficient distance comparison operations,

J. Gao and C. Long, “High-dimensional approximate nearest neighbor search: with reliable and efficient distance comparison operations,” Proc. ACM Manag. Data, vol. 1, no. 2, Jun. 2023. [Online]. Available: https://doi.org/10.1145/3589282

work page doi:10.1145/3589282 2023

[33] [34]

A modern primer on processing in memory,

O. Mutlu, S. Ghose, J. G ´omez-Luna, and R. Ausavarungnirun, “A modern primer on processing in memory,” inEmerging computing: from devices to systems: looking beyond Moore and Von Neumann. Springer, 2022, pp. 171–243

work page 2022

[34] [35]

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system,

J. G ´omez-Luna, I. E. Hajj, I. Fernandez, C. Giannoula, G. F. Oliveira, and O. Mutlu, “Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system,”IEEE Access, vol. 10, pp. 52 565–52 608, 2022

work page 2022

S. Lee, K. Kim, S. Oh, J. Park, G. Hong, D. Ka, K. Hwang, J. Park, K. Kang, J. Kim, J. Jeon, N. Kim, Y. Kwon, K. Vladimir, W. Shin, J. Won, M. Lee, H. Joo, H. Choi, J. Lee, D. Ko, Y. Jun, K. Cho, I. Kim, C. Song, C. Jeong, D. Kwon, J. Jang, I. Park, J. Chun, and J. Cho, “A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep-learning applications,” ..., 2022.

S. Lee, S.-h. Kang, J. Lee, H. Kim, E. Lee, S. Seo, H. Yoon, S. Lee, K. Lim, H. Shin, J. Kim, O. Seongil, A. Iyer, D. Wang, K. Sohn, and N. S. Kim, “Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product,” in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021, pp. 43–56.

M. Hassanpour, M. Riera, and A. González, “A survey of near-data processing architectures for neural networks,” Machine Learning and Knowledge Extraction, vol. 4, pp. 66–103, Jan. 2022.

T. Xie, Z. Zhu, B. Li, Y. He, C. Li, G. Sun, H. Yang, Y. Xie, and Y. Wang, “UniNDP: A unified compilation and simulation tool for near DRAM processing architectures,” in 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2025, pp. 624–640.

B. Tian, Y. Li, L. Jiang, S. Cai, and M. Gao, “NDPBridge: Enabling cross-bank coordination in near-DRAM-bank processing architectures,” in Proceedings of the 51st Annual International Symposium on Computer Architecture, ser. ISCA ’24. IEEE Press, 2025, pp. 628–643. [Online]. Available: https://doi.org/10.1109/ISCA59077.2024.00052

W. Huangfu, X. Li, S. Li, X. Hu, P. Gu, and Y. Xie, “MEDAL: Scalable DIMM based near data processing accelerator for DNA seeding algorithm,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-52. New York, NY, USA: Association for Computing Machinery, 2019, pp. 587–599. [Online]. Available: https://doi.org/10.1145/3352460.3358329

S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, no. 4, pp. 65–76, 2009.

D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online]. Available: https://aclantholog...

R. Fan, Y. Cui, Q. Chen, M. Wang, Y. Zhang, W. Zheng, and Z. Li, “MAICC: A lightweight many-core architecture with in-cache computing for multi-DNN parallel inference,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 411–423. [Online...

C. Nie, C. Tang, J. Lin, H. Hu, C. Lv, T. Cao, W. Zhang, L. Jiang, X. Liang, W. Qian, Y. Sun, and Z. He, “VSPIM: SRAM processing-in-memory DNN acceleration via vector-scalar operations,” IEEE Transactions on Computers, vol. 73, no. 10, pp. 2378–2390, 2024.

C. Zou, Z. Wei, J. Y. Lee, C. Nie, K. You, and Z. He, “PolymorPIC: Embedding polymorphic processing-in-cache in RISC-V based processor for full-stack efficient AI inference,” in 2025 58th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025.

M. Rajabinasab, F. Pakdaman, A. Zimek, and M. Gabbouj, “Randomized PCA forest for approximate k-nearest neighbor search,” Expert Systems with Applications, vol. 281, p. 126254, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S095741742403121X

S. Zeng, Z. Zhu, J. Liu, H. Zhang, G. Dai, Z. Zhou, S. Li, X. Ning, Y. Xie, H. Yang, and Y. Wang, “DF-GAS: A distributed FPGA-as-a-service architecture towards billion-scale graph-based approximate nearest neighbor search,” in 2023 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 283–296.

B. Tian, H. Liu, Z. Duan, X. Liao, H. Jin, and Y. Zhang, “Scalable billion-point approximate nearest neighbor search using SmartSSDs,” in 2024 USENIX Annual Technical Conference (USENIX ATC 24). Santa Clara, CA: USENIX Association, Jul. 2024, pp. 1135–1150. [Online]. Available: https://www.usenix.org/conference/atc24/presentation/tian

H.-W. Hu, W.-C. Wang, Y.-H. Chang, Y.-C. Lee, B.-R. Lin, H.-M. Wang, Y.-P. Lin, Y.-M. Huang, C.-Y. Lee, T.-H. Su, C.-C. Hsieh, C.-M. Hu, Y.-T. Lai, C.-K. Chen, H.-S. Chen, H.-P. Li, T.-W. Kuo, M.-F. Chang, K.-C. Wang, C.-H. Hung, and C.-Y. Lu, “ICE: An intelligent cognition engine with 3D NAND-based in-memory computing for vector similarity search acceleration,” ..., 2023. [Online]. Available: https://doi.org/10.1109/MICRO56248.2022.00058

S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, “Linear algebraic structure of word senses, with applications to polysemy,” Transactions of the Association for Computational Linguistics, vol. 6, pp. 483–495, 2018. [Online]. Available: https://aclanthology.org/Q18-1034/

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS’13. Red Hook, NY, USA: Curran Associates Inc., 2013, pp. 3111–3119.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” Jan. 2013.

T. Tambe, E.-Y. Yang, Z. Wan, Y. Deng, V. J. Reddi, A. Rush, D. Brooks, and G.-Y. Wei, “Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference,” in 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020, pp. 1–6.

F. Liu, W. Zhao, Z. He, Y. Wang, Z. Wang, C. Dai, X. Liang, and L. Jiang, “Improving neural network efficiency via post-training quantization with adaptive floating-point,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5281–5290.

M. Patel, J. S. Kim, H. Hassan, and O. Mutlu, “Understanding and modeling on-die error correction in modern DRAM: An experimental study using real devices,” in 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2019, pp. 13–25.

JEDEC Solid State Technology Association, “DDR5 SDRAM standard (JESD79-5),” JEDEC, Tech. Rep., 2020. [Online]. Available: https://www.jedec.org/standards-documents/docs/jesd79-5d

S. Mittal and M. S. Inukonda, “A survey of techniques for improving error-resilience of DRAM,” Journal of Systems Architecture, vol. 91, pp. 11–40, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1383762118301693

W. Yuan and X. Jin, “FANNS: An FPGA-based approximate nearest-neighbor search accelerator,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1197–1201, 2025.

Y. Lee, H. Choi, S. Min, H. Lee, S. Beak, D. Jeong, J. W. Lee, and T. J. Ham, “ANNA: Specialized architecture for approximate nearest neighbor search,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022, pp. 169–183.

P. Wu, M. Xie, E. Zhao, D. Zhang, J. Wang, X. Liang, K. Ren, and Y. Chai, “Turbocharge ANNS on real processing-in-memory by enabling fine-grained per-PIM-core scheduling,” in Proceedings of the 2025 USENIX Conference on Usenix Annual Technical Conference, ser. USENIX ATC ’25. USA: USENIX Association, 2025.

H. V. Simhadri, G. Williams, M. Aumüller, M. Douze, A. Babenko, D. Baranchuk, Q. Chen, L. Hosseini, R. Krishnaswamy, G. Srinivasa, S. J. Subramanya, and J. Wang, “Results of the NeurIPS’21 challenge on billion-scale approximate nearest neighbor search,” in Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, ser. Proceedings of Machine...

Wikimedia Foundation, “Wikimedia downloads.” [Online]. Available: https://dumps.wikimedia.org

T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng, “MS MARCO: A human generated machine reading comprehension dataset,” CoRR, vol. abs/1611.09268, 2016. [Online]. Available: http://arxiv.org/abs/1611.09268

N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using Siamese BERT-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Nov. 2019. [Online]. Available: http://arxiv.org/abs/1908.10084

S. Xiao, Z. Liu, P. Zhang, and N. Muennighoff, “C-Pack: Packaged resources to advance general Chinese embedding,” 2023.

P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, “RAPTOR: Recursive abstractive processing for tree-organized retrieval,” in International Conference on Learning Representations (ICLR), 2024.

X. Ho, A.-K. Duong Nguyen, S. Sugawara, and A. Aizawa, “Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps,” in Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics, Dec. 2020, pp. 6609–6625. [Online]. Available: https:...

Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “HotpotQA: A dataset for diverse, explainable multi-hop question answering,” in Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.

T. Yuan, X. Ning, D. Zhou, Z. Yang, S. Li, M. Zhuang, Z. Tan, Z. Yao, D. Lin, B. Li, G. Dai, S. Yan, and Y. Wang, “LV-Eval: A balanced long-context benchmark with 5 length levels up to 256K,” 2024.

P. Dasigi, K. Lo, I. Beltagy, A. Cohan, N. A. Smith, and M. Gardner, “A dataset of information-seeking questions and answers anchored in research papers,” 2021.

OpenAI, “New and improved embedding model,” https://openai.com/index/new-and-improved-embedding-model/, 2022, accessed: 2026-05-04.

ExplodingGradients, “Ragas: Supercharge your LLM application evaluations,” https://github.com/explodinggradients/ragas, 2024.

J. Kim, J. Jung, K. Lim, B. Sung, J. Kim, B. Lim, T.-G. Noh, J. Lee, H.-G. Seok, Y. Cho, G. Kim, T. Nomiyama, S. Kang, Y. Jeong, S. Cho, G. Kim, D.-H. Oh, J. Kim, Y. Lim, S. Kim, S. Oh, and J. Lee, “A 2.8-to-7.2GT/s DDR5 registering clock driver IC with parallel-data timing and pin-to-pin skew calibration for a dual in-line memory module,” in 2024 IEEE...

S. Lehmann and F. Gerfers, “Channel analysis for a 6.4 Gb/s DDR5 data buffer receiver front-end,” in 2017 15th IEEE International New Circuits and Systems Conference (NEWCAS), 2017, pp. 109–112.

K. Zhu, D. Huang, L. Costero, and D. Atienza, “3D-ICE 4.0: Accurate and efficient thermal modeling for 2.5D/3D heterogeneous chiplet systems,” in Proceedings of the 2026 Design, Automation and Test in Europe Conference (DATE). Verona, Italy: IEEE/ACM, Mar. 2026.

JEDEC Solid State Technology Association, “JESD79-5D: DDR5 SDRAM,” 2025, JEDEC Standard.

A. Guttman, “R-trees: A dynamic index structure for spatial searching,” SIGMOD Rec., vol. 14, no. 2, pp. 47–57, Jun. 1984. [Online]. Available: https://doi.org/10.1145/971697.602266

J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time,” ACM Trans. Math. Softw., vol. 3, no. 3, pp. 209–226, Sep. 1977. [Online]. Available: https://doi.org/10.1145/355744.355745