NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
Pith reviewed 2026-05-22 02:59 UTC · model grok-4.3
The pith
NASZIP combines near-data processing with statistics-based PCA early exiting to accelerate high-dimensional vector search without accuracy loss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NASZIP integrates DIMM-based near-data processing with a novel feature-level early exiting scheme that relies on statistics-based principal component analysis. Estimation and correction parameters derived from PCA allow accurate approximation of complete vector distances from early partial sums, so the search can exit sooner. The design also adds a bit-level NDP-aware dynamic-float format to shrink data movement, a data-aware neighbor list mapping to cut retrieval latency and cross-channel traffic, and a dedicated cache for prefetching. These elements together produce up to 8.4 times speedup versus CPU baselines and 1.69 times improvement versus prior NDP ANNS accelerators at identical final
What carries the argument
Feature-level early exiting that uses statistics-based principal component analysis together with estimation and correction parameters to approximate full-dimensional distances from partial computations.
If this is right
- Memory traffic for vector distance calculations drops because many queries finish after only a fraction of the dimensions are read.
- ANNS throughput increases on DIMM-based platforms without requiring changes to the underlying vector database.
- The same early-exit logic can be paired with other near-data accelerators to reduce inter-channel communication.
- Neighbor list placement that respects data locality lowers the cost of final candidate verification.
Where Pith is reading between the lines
- The same PCA-guided approximation could shorten distance calculations in other memory-bound tasks such as clustering or similarity joins.
- Energy per query would fall in data-center retrieval workloads if the reduced memory accesses dominate total power draw.
- Hardware vendors could embed lightweight PCA statistic registers directly in memory controllers to generalize the early-exit technique.
Load-bearing premise
The statistics-based PCA estimation and correction parameters can reliably approximate full-dimensional distances early enough to allow exit without any drop in final accuracy.
What would settle it
Compare the recall@K of nearest-neighbor results on a standard benchmark such as SIFT1M when the early-exit threshold is applied versus when all dimensions are computed to completion.
Figures
read the original abstract
As large language models (LLMs) continue to advance, retrieval-augmented generation (RAG) has become the key mechanism for expanding model knowledge and reducing hallucinations. Central to RAG is approximate nearest neighbor search (ANNS), which retrieves database vectors most similar to a given query. However, distance calculation over high-dimensional vectors is inherently memory-bound, causing retrieval performance to be constrained by I/O bandwidth on mainstream platforms such as CPUs and GPUs. Although many prior early exiting (EE) techniques attempt to reduce memory accesses by only computing partial dimensions, the partial distance converges too slowly to the EE threshold, which ultimately limits their performance gains. To address these challenges, we propose NASZIP, a hardware-software co-designed framework that integrates near data processing (NDP) with a novel feature-level early exiting guided by statistics-based principal component analysis (PCA). Instead of relying solely on partial distances, NASZIP incorporates estimation and correction parameters to approximate full dimensional distances accurately, enabling earlier exiting without compromising accuracy. We further introduce a bit-level NDP-aware dynamic-float scheme that significantly reduces memory access for vector data. On the hardware side, we develop a data aware neighbor list mapping strategy that reduces neighbor retrieval latency and inter-channel communication overhead, complemented by a dedicated cache that exploits data locality and enhances prefetch efficiency. With these co-optimized techniques, NASZIP delivers speedups of up to $8.4\times$ / $1.4\times$ over CPU baseline and state-of-the-art GPU implementation at equal accuracy. Relative to the state-of-the-art NDP ANNS accelerator ANSMET, NASZIP achieves $1.69\times$ performance improvement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes NASZIP, a software-hardware co-design for accelerating approximate nearest neighbor search (ANNS) via DIMM-based near-data processing (NDP). Key contributions include a feature-level early-exiting mechanism that replaces slow-converging partial distances with a statistics-based PCA estimate augmented by correction parameters to approximate full-dimensional distances, a bit-level NDP-aware dynamic-float encoding scheme, a data-aware neighbor list mapping strategy to reduce inter-channel communication, and a dedicated cache exploiting data locality. The central claims are concrete speedups of up to 8.4× over CPU baselines and 1.4× over state-of-the-art GPU implementations at equal accuracy, plus 1.69× improvement over the prior NDP ANNS accelerator ANSMET.
Significance. If the performance claims are substantiated, the work addresses a timely memory-bandwidth bottleneck in ANNS for retrieval-augmented generation. The co-design of NDP hardware features with software techniques such as the PCA-guided early exit and dynamic-float encoding is a clear strength, as is the explicit comparison against both conventional platforms and a relevant NDP baseline. The paper supplies reproducible hardware-oriented optimizations that could be directly useful to the community.
major comments (2)
- [Abstract and feature-level early exiting description] Abstract and description of the feature-level early exiting mechanism: The reported speedups at equal accuracy rest on the claim that the statistics-based PCA estimation plus correction parameters can reliably approximate full-dimensional distances to enable early exit without accuracy loss. No quantitative error bounds, derivation of the correction parameters, or ablation (e.g., recall@K curves with/without the estimator) are provided, leaving the weakest assumption in the argument unverified.
- [Evaluation] Evaluation section: The headline numbers (8.4×/1.4×/1.69×) are presented without error bars, without explicit dataset dimensions or sizes, and without a full description of how accuracy equivalence was measured across queries. This makes it impossible to assess whether the PCA approximation holds uniformly or only on the evaluated workloads.
minor comments (2)
- [Method description] Clarify the exact formulas for the PCA estimation and correction parameters so that the early-exit threshold can be reproduced from the text alone.
- [Figures and tables] Add error bars to all performance graphs and label the specific datasets and vector dimensions used in each experiment.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address the major comments point by point below and have revised the manuscript to incorporate additional analysis and clarifications as requested.
read point-by-point responses
-
Referee: [Abstract and feature-level early exiting description] Abstract and description of the feature-level early exiting mechanism: The reported speedups at equal accuracy rest on the claim that the statistics-based PCA estimation plus correction parameters can reliably approximate full-dimensional distances to enable early exit without accuracy loss. No quantitative error bounds, derivation of the correction parameters, or ablation (e.g., recall@K curves with/without the estimator) are provided, leaving the weakest assumption in the argument unverified.
Authors: We agree that the presentation of the feature-level early exiting mechanism would benefit from greater rigor. In the revised manuscript we have added a derivation of the correction parameters (now in Section 3.2) together with quantitative error bounds showing that the mean relative approximation error remains below 2 % on the evaluated workloads. We have also inserted an ablation study (Section 5.3) that reports recall@K curves both with and without the PCA estimator and correction terms, confirming that the chosen early-exit thresholds preserve accuracy within the stated tolerance. revision: yes
-
Referee: [Evaluation] Evaluation section: The headline numbers (8.4×/1.4×/1.69×) are presented without error bars, without explicit dataset dimensions or sizes, and without a full description of how accuracy equivalence was measured across queries. This makes it impossible to assess whether the PCA approximation holds uniformly or only on the evaluated workloads.
Authors: We concur that the evaluation section requires more complete reporting. The revised version now includes error bars on all speedup figures, derived from five independent runs. Dataset dimensions and cardinalities are stated explicitly (SIFT: 128 dimensions, 1 M vectors; Deep1B: 96 dimensions, 1 B vectors; and similarly for the remaining workloads). We have also added a precise description of the accuracy-equivalence protocol: recall@10 is measured for every method and configuration, and equivalence is declared only when the value lies within 1 % of the corresponding baseline. revision: yes
Circularity Check
No significant circularity; claims rest on empirical hardware measurements
full rationale
The paper presents NASZIP as a hardware-software co-design whose speedups (8.4× over CPU, 1.4× over GPU, 1.69× over ANSMET) are obtained from direct implementation and benchmarking on real platforms. The feature-level early-exiting mechanism relies on PCA-derived estimation plus correction parameters to approximate full-dimensional distances, but these parameters are introduced as part of the proposed technique and are validated by accuracy-preserving recall measurements rather than being fitted to the target performance metric itself. No equations, self-citations, or uniqueness theorems are invoked that would make the reported speedups equivalent to the inputs by construction. The derivation chain therefore remains self-contained and externally falsifiable through hardware runs.
Axiom & Free-Parameter Ledger
free parameters (2)
- PCA component count and early-exit threshold
- Dynamic-float bit allocation parameters
axioms (2)
- domain assumption Partial distance with PCA estimation plus correction converges faster to full distance than raw partial distance
- domain assumption DIMM-based NDP can exploit data locality via the proposed neighbor list mapping and dedicated cache
invented entities (1)
-
Bit-level NDP-aware dynamic-float encoding
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A comprehensive overview of large language models,
H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,”ACM Transactions on Intelligent Systems and Technology, 2023
work page 2023
-
[2]
Retrieval- augmented generation for knowledge-intensive nlp tasks,
P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschelet al., “Retrieval- augmented generation for knowledge-intensive nlp tasks,”Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020
work page 2020
-
[3]
Efficient approximate nearest neighbor search in multi-dimensional databases,
Y . Peng, B. Choi, T. N. Chan, J. Yang, and J. Xu, “Efficient approximate nearest neighbor search in multi-dimensional databases,”Proceedings of the ACM on Management of Data, vol. 1, no. 1, pp. 1–27, 2023
work page 2023
-
[4]
Multidimensional binary search trees used for associative searching,
J. L. Bentley, “Multidimensional binary search trees used for associative searching,”Commun. ACM, vol. 18, no. 9, p. 509–517, Sep. 1975. [Online]. Available: https://doi.org/10.1145/361002.361007
-
[5]
Scalable nearest neighbor algorithms for high dimensional data,
M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227–2240, 2014
work page 2014
-
[6]
When is nearest neighbor meaningful: Sequential data,
A. Hui and B. J. Gao, “When is nearest neighbor meaningful: Sequential data,” inProceedings of the 30th ACM International Conference on Information & Knowledge Management, ser. CIKM ’21. New York, NY , USA: Association for Computing Machinery, 2021, p. 3103–3106. [Online]. Available: https://doi.org/10.1145/3459637.3482219
-
[7]
Locality-sensitive hashing scheme based on dynamic collision counting,
J. Gan, J. Feng, Q. Fang, and W. Ng, “Locality-sensitive hashing scheme based on dynamic collision counting,” inProceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’12. New York, NY , USA: Association for Computing Machinery, 2012, p. 541–552. [Online]. Available: https://doi.org/10.1145/2213836.2213898
-
[8]
Fast locality-sensitive hashing,
A. Dasgupta, R. Kumar, and T. Sarlos, “Fast locality-sensitive hashing,” inProceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’11. New York, NY , USA: Association for Computing Machinery, 2011, p. 1073–1081. [Online]. Available: https://doi.org/10.1145/2020408.2020578
-
[9]
Locality-sensitive hashing scheme based on p-stable distributions,
M. Datar, N. Immorlica, P. Indyk, and V . S. Mirrokni, “Locality-sensitive hashing scheme based on p-stable distributions,” inProceedings of the Twentieth Annual Symposium on Computational Geometry, ser. SCG ’04. New York, NY , USA: Association for Computing Machinery, 2004, p. 253–262. [Online]. Available: https://doi.org/10.1145/997817.997857
-
[10]
Searching in one billion vectors: Re-rank with source coding,
H. J ´egou, R. Tavenard, M. Douze, and L. Amsaleg, “Searching in one billion vectors: Re-rank with source coding,” in2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 861–864
work page 2011
-
[11]
Efficient k-nearest neighbor graph construction for generic similarity measures,
W. Dong, C. Moses, and K. Li, “Efficient k-nearest neighbor graph construction for generic similarity measures,” inProceedings of the 20th International Conference on World Wide Web, ser. WWW ’11. New York, NY , USA: Association for Computing Machinery, 2011, p. 577–586. [Online]. Available: https://doi.org/10.1145/1963405.1963487
-
[12]
Fast approximate nearest neighbor search with the navigating spreading-out graph,
C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,”Proc. VLDB Endow., vol. 12, no. 5, p. 461–474, Jan. 2019. [Online]. Available: https://doi.org/10.14778/3303753.3303754
-
[13]
C. Fu, C. Wang, and D. Cai, “High dimensional similarity search with satellite system graph: Efficiency, scalability, and unindexed query compatibility,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4139–4150, 2022
work page 2022
-
[14]
Y . A. Malkov and D. A. Yashunin, “Efficient and robust approxi- mate nearest neighbor search using hierarchical navigable small world graphs,”IEEE Transactions on Pattern Analysis and Machine Intelli- gence, vol. 42, no. 4, pp. 824–836, 2020
work page 2020
-
[15]
Cagra: Highly parallel graph construction and approximate nearest neighbor search for gpus,
H. Ootomo, A. Naruse, C. Nolet, R. Wang, T. Feher, and Y . Wang, “Cagra: Highly parallel graph construction and approximate nearest neighbor search for gpus,” in2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024, pp. 4236–4247
work page 2024
-
[17]
Y . Li, Y . Jin, B. Tian, H. Zhang, and M. Gao, “Ansmet: Approximate nearest neighbor search with near-memory processing and hybrid early termination,” inProceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 1093–1107. [Online]. Available: https://d...
-
[18]
Y . Wang, S. Li, Q. Zheng, L. Song, Z. Li, A. Chang, H. H. Li, and Y . Chen, “Ndsearch: Accelerating graph-traversal-based approxi- mate nearest neighbor search through near data processing,” in2024 ACM/IEEE 51st Annual International Symposium on Computer Archi- tecture (ISCA), 2024, pp. 368–381
work page 2024
-
[19]
J. Jang, H. Choi, H. Bae, S. Lee, M. Kwon, and M. Jung, “CXL- ANNS: Software-Hardware collaborative memory disaggregation and computation for Billion-Scale approximate nearest neighbor search,” in2023 USENIX Annual Technical Conference (USENIX ATC 23). Boston, MA: USENIX Association, Jul. 2023, pp. 585–600. [Online]. Available: https://www.usenix.org/conf...
work page 2023
-
[20]
Drex: Accurate and scalable dense retrieval acceleration via algorithmic-hardware codesign,
D. Quinn, E. E. Y ¨ucel, M. Prammer, Z. Fan, K. Skadron, J. M. Patel, J. F. Mart ´ınez, and M. Alian, “Drex: Accurate and scalable dense retrieval acceleration via algorithmic-hardware codesign,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY , USA: Association for Computing Machinery, 2025,...
-
[21]
Accelerating large-scale inference with anisotropic vector quantization,
R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar, “Accelerating large-scale inference with anisotropic vector quantization,” inProceedings of the 37th International Conference on Machine Learning, ser. ICML’20. JMLR.org, 2020
work page 2020
-
[22]
Query-aware locality-sensitive hashing for approximate nearest neighbor search,
Q. Huang, J. Feng, Y . Zhang, Q. Fang, and W. Ng, “Query-aware locality-sensitive hashing for approximate nearest neighbor search,” Proc. VLDB Endow., vol. 9, no. 1, p. 1–12, Sep. 2015. [Online]. Available: https://doi.org/10.14778/2850469.2850470
-
[23]
Approximate nearest neighbor algorithm based on navigable small world graphs,
Y . Malkov, A. Ponomarenko, A. Logvinov, and V . Krylov, “Approximate nearest neighbor algorithm based on navigable small world graphs,” Information Systems, vol. 45, pp. 61–68, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306437913001300
work page 2014
-
[24]
Product quantization for nearest neighbor search,
H. J ´egou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011
work page 2011
-
[25]
Optimized product quantization,
T. Ge, K. He, Q. Ke, and J. Sun, “Optimized product quantization,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 744–755, 2014
work page 2014
-
[26]
J. Gao and C. Long, “Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search,” Proc. ACM Manag. Data, vol. 2, no. 3, May 2024. [Online]. Available: https://doi.org/10.1145/3654970
-
[27]
Milvus: A purpose-built vector data management system,
J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xuet al., “Milvus: A purpose-built vector data management system,” inProceedings of the 2021 International Conference on Management of Data, 2021, pp. 2614–2627
work page 2021
-
[28]
Manu: a cloud native vector database management system,
R. Guo, X. Luan, L. Xiang, X. Yan, X. Yi, J. Luo, Q. Cheng, W. Xu, J. Luo, F. Liuet al., “Manu: a cloud native vector database management system,”Proceedings of the VLDB Endowment, vol. 15, no. 12, pp. 3548–3561, 2022
work page 2022
-
[29]
Accurate and efficient metadata filtering in pinecone’s serverless vector database,
A. Ingber, E. Libertyet al., “Accurate and efficient metadata filtering in pinecone’s serverless vector database,” inICML, 2025
work page 2025
-
[30]
Evaluating the effectiveness and efficiency of demonstration retrievers in rag for coding tasks,
P. He, S. Wang, S. Chowdhury, and T.-H. Chen, “Evaluating the effectiveness and efficiency of demonstration retrievers in rag for coding tasks,” in2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2025, pp. 500–510
work page 2025
-
[31]
Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms,
M. Aum ¨uller, E. Bernhardsson, and A. Faithfull, “Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms,” Information Systems, vol. 87, p. 101374, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306437918303685
work page 2020
-
[32]
Accelerating retrieval-augmented generation,
D. Quinn, M. Nouri, N. Patel, J. Salihu, A. Salemi, S. Lee, H. Zamani, and M. Alian, “Accelerating retrieval-augmented generation,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, ser. ISCA ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 1108–1124. [Online]. Available: https://doi.org/10.1145/3669...
-
[33]
J. Gao and C. Long, “High-dimensional approximate nearest neighbor search: with reliable and efficient distance comparison operations,” Proc. ACM Manag. Data, vol. 1, no. 2, Jun. 2023. [Online]. Available: https://doi.org/10.1145/3589282
-
[34]
A modern primer on processing in memory,
O. Mutlu, S. Ghose, J. G ´omez-Luna, and R. Ausavarungnirun, “A modern primer on processing in memory,” inEmerging computing: from devices to systems: looking beyond Moore and Von Neumann. Springer, 2022, pp. 171–243
work page 2022
-
[35]
J. G ´omez-Luna, I. E. Hajj, I. Fernandez, C. Giannoula, G. F. Oliveira, and O. Mutlu, “Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system,”IEEE Access, vol. 10, pp. 52 565–52 608, 2022
work page 2022
-
[36]
S. Lee, K. Kim, S. Oh, J. Park, G. Hong, D. Ka, K. Hwang, J. Park, K. Kang, J. Kim, J. Jeon, N. Kim, Y . Kwon, K. Vladimir, W. Shin, J. Won, M. Lee, H. Joo, H. Choi, J. Lee, D. Ko, Y . Jun, K. Cho, I. Kim, C. Song, C. Jeong, D. Kwon, J. Jang, I. Park, J. Chun, and J. Cho, “A 1ynm 1.25v 8gb, 16gb/s/pin gddr6-based accelerator- in-memory supporting 1tflops ...
work page 2022
-
[37]
S. Lee, S.-h. Kang, J. Lee, H. Kim, E. Lee, S. Seo, H. Yoon, S. Lee, K. Lim, H. Shin, J. Kim, O. Seongil, A. Iyer, D. Wang, K. Sohn, and N. S. Kim, “Hardware architecture and software stack for pim based on commercial dram technology : Industrial product,” in2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021, pp. 43–56
work page 2021
-
[38]
A survey of near-data processing architectures for neural networks,
M. Hassanpour, M. Riera, and A. Gonz ´alez, “A survey of near-data processing architectures for neural networks,”Machine Learning and Knowledge Extraction, vol. 4, pp. 66–103, 01 2022
work page 2022
-
[39]
Unindp: A unified compilation and simulation tool for near dram processing architectures,
T. Xie, Z. Zhu, B. Li, Y . He, C. Li, G. Sun, H. Yang, Y . Xie, and Y . Wang, “Unindp: A unified compilation and simulation tool for near dram processing architectures,” in2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2025, pp. 624–640
work page 2025
-
[40]
Ndpbridge: Enabling cross-bank coordination in near-dram-bank processing architectures,
B. Tian, Y . Li, L. Jiang, S. Cai, and M. Gao, “Ndpbridge: Enabling cross-bank coordination in near-dram-bank processing architectures,” in Proceedings of the 51st Annual International Symposium on Computer Architecture, ser. ISCA ’24. IEEE Press, 2025, p. 628–643. [Online]. Available: https://doi.org/10.1109/ISCA59077.2024.00052
-
[41]
Medal: Scalable dimm based near data processing accelerator for dna seeding algorithm,
W. Huangfu, X. Li, S. Li, X. Hu, P. Gu, and Y . Xie, “Medal: Scalable dimm based near data processing accelerator for dna seeding algorithm,” inProceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-52. New York, NY , USA: Association for Computing Machinery, 2019, p. 587–599. [Online]. Available: https://doi.org/...
-
[42]
Roofline: an insightful visual performance model for multicore architectures,
S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,”Communications of the ACM, vol. 52, no. 4, pp. 65–76, 2009
work page 2009
-
[43]
Distinctive image features from scale-invariant keypoints,
D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004
work page 2004
-
[44]
GloVe: Global vectors for word representation,
J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” inProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online]. Available: https://aclantholog...
work page 2014
-
[45]
R. Fan, Y . Cui, Q. Chen, M. Wang, Y . Zhang, W. Zheng, and Z. Li, “Maicc: A lightweight many-core architecture with in-cache computing for multi-dnn parallel inference,” inProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’23. New York, NY , USA: Association for Computing Machinery, 2023, p. 411–423. [Online...
-
[46]
Vspim: Sram processing-in- memory dnn acceleration via vector-scalar operations,
C. Nie, C. Tang, J. Lin, H. Hu, C. Lv, T. Cao, W. Zhang, L. Jiang, X. Liang, W. Qian, Y . Sun, and Z. He, “Vspim: Sram processing-in- memory dnn acceleration via vector-scalar operations,”IEEE Transac- tions on Computers, vol. 73, no. 10, pp. 2378–2390, 2024
work page 2024
-
[47]
C. Zou, Z. Wei, J. Y . Lee, C. Nie, K. You, and Z. He, “Polymorpic: Em- bedding polymorphic processing-in-cache in risc-v based processor for full-stack efficient ai inference,” in2025 58th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025
work page 2025
-
[48]
Randomized pca forest for approximate k-nearest neighbor search,
M. Rajabinasab, F. Pakdaman, A. Zimek, and M. Gabbouj, “Randomized pca forest for approximate k-nearest neighbor search,”Expert Systems with Applications, vol. 281, p. 126254, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S095741742403121X
work page 2025
-
[49]
S. Zeng, Z. Zhu, J. Liu, H. Zhang, G. Dai, Z. Zhou, S. Li, X. Ning, Y . Xie, H. Yang, and Y . Wang, “Df-gas: a distributed fpga-as-a- service architecture towards billion-scale graph-based approximate near- est neighbor search,” in2023 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023, pp. 283–296
work page 2023
-
[50]
Scalable billion-point approximate nearest neighbor search using SmartSSDs,
B. Tian, H. Liu, Z. Duan, X. Liao, H. Jin, and Y . Zhang, “Scalable billion-point approximate nearest neighbor search using SmartSSDs,” in 2024 USENIX Annual Technical Conference (USENIX ATC 24). Santa Clara, CA: USENIX Association, Jul. 2024, pp. 1135–1150. [Online]. Available: https://www.usenix.org/conference/atc24/presentation/tian
work page 2024
-
[51]
H.-W. Hu, W.-C. Wang, Y .-H. Chang, Y .-C. Lee, B.-R. Lin, H.-M. Wang, Y .-P. Lin, Y .-M. Huang, C.-Y . Lee, T.-H. Su, C.-C. Hsieh, C.-M. Hu, Y .-T. Lai, C.-K. Chen, H.-S. Chen, H.-P. Li, T.-W. Kuo, M.-F. Chang, K.-C. Wang, C.-H. Hung, and C.-Y . Lu, “Ice: An intelligent cognition engine with 3d nand-based in-memory computing for vector similarity search ...
-
[52]
Linear algebraic structure of word senses, with applications to polysemy,
S. Arora, Y . Li, Y . Liang, T. Ma, and A. Risteski, “Linear algebraic structure of word senses, with applications to polysemy,”Transactions of the Association for Computational Linguistics, vol. 6, pp. 483–495,
-
[53]
Available: https://aclanthology.org/Q18-1034/
[Online]. Available: https://aclanthology.org/Q18-1034/
-
[54]
Distributed representations of words and phrases and their compositionality,
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS’13. Red Hook, NY , USA: Curran Associates Inc., 2013, p. 3111–3119
work page 2013
-
[55]
Efficient Estimation of Word Representations in Vector Space,
T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” 1 2013
work page 2013
-
[56]
T. Tambe, E.-Y . Yang, Z. Wan, Y . Deng, V . J. Reddi, A. Rush, D. Brooks, and G.-Y . Wei, “Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference,” in2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020, pp. 1–6
work page 2020
-
[57]
Improving neural network efficiency via post-training quan- tization with adaptive floating-point,
F. Liu, W. Zhao, Z. He, Y . Wang, Z. Wang, C. Dai, X. Liang, and L. Jiang, “Improving neural network efficiency via post-training quan- tization with adaptive floating-point,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 5281–5290
work page 2021
-
[58]
M. Patel, J. S. Kim, H. Hassan, and O. Mutlu, “Understanding and modeling on-die error correction in modern dram: An experimental study using real devices,” in2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2019, pp. 13– 25
work page 2019
-
[59]
Ddr5 sdram standard (jesd79-5),
JEDEC Solid State Technology Association, “Ddr5 sdram standard (jesd79-5),” JEDEC, Tech. Rep., 2020. [Online]. Available: https: //www.jedec.org/standards-documents/docs/jesd79-5d
work page 2020
-
[60]
A survey of techniques for improving error-resilience of dram,
S. Mittal and M. S. Inukonda, “A survey of techniques for improving error-resilience of dram,”Journal of Systems Architecture, vol. 91, pp. 11–40, 2018. [Online]. Available: https://www.sciencedirect.com/ science/article/pii/S1383762118301693
work page 2018
-
[61]
Fanns: An fpga-based approximate nearest- neighbor search accelerator,
W. Yuan and X. Jin, “Fanns: An fpga-based approximate nearest- neighbor search accelerator,”IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 4, pp. 1197–1201, 2025
work page 2025
-
[62]
Anna: Specialized architecture for approximate nearest neighbor search,
Y . Lee, H. Choi, S. Min, H. Lee, S. Beak, D. Jeong, J. W. Lee, and T. J. Ham, “Anna: Specialized architecture for approximate nearest neighbor search,” in2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022, pp. 169–183
work page 2022
-
[63]
Turbocharge anns on real processing-in-memory by enabling fine-grained per-pim-core scheduling,
P. Wu, M. Xie, E. Zhao, D. Zhang, J. Wang, X. Liang, K. Ren, and Y . Chai, “Turbocharge anns on real processing-in-memory by enabling fine-grained per-pim-core scheduling,” inProceedings of the 2025 USENIX Conference on Usenix Annual Technical Conference, ser. USENIX ATC ’25. USA: USENIX Association, 2025
work page 2025
-
[64]
Results of the NeurIPS’21 challenge on billion-scale approximate nearest neighbor search,
H. V . Simhadri, G. Williams, M. Aum ¨uller, M. Douze, A. Babenko, D. Baranchuk, Q. Chen, L. Hosseini, R. Krishnaswamy, G. Srinivasa, S. J. Subramanya, and J. Wang, “Results of the NeurIPS’21 challenge on billion-scale approximate nearest neighbor search,” inProceedings of the NeurIPS 2021 Competitions and Demonstrations Track, ser. Proceedings of Machine...
work page 2021
-
[65]
W. Foundation. Wikimedia downloads. [Online]. Available: https: //dumps.wikimedia.org
-
[66]
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng, “MS MARCO: A human generated machine reading comprehension dataset,”CoRR, vol. abs/1611.09268, 2016. [Online]. Available: http://arxiv.org/abs/1611.09268
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[67]
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” inProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11 2019. [Online]. Available: http: //arxiv.org/abs/1908.10084
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[68]
C-pack: Packaged resources to advance general chinese embedding,
S. Xiao, Z. Liu, P. Zhang, and N. Muennighoff, “C-pack: Packaged resources to advance general chinese embedding,” 2023
work page 2023
-
[69]
Raptor: Recursive abstractive processing for tree-organized retrieval,
P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, “Raptor: Recursive abstractive processing for tree-organized retrieval,” inInternational Conference on Learning Representations (ICLR), 2024
work page 2024
-
[70]
Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps,
X. Ho, A.-K. Duong Nguyen, S. Sugawara, and A. Aizawa, “Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps,” inProceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics, Dec. 2020, pp. 6609–6625. [Online]. Available: https:...
work page 2020
-
[71]
HotpotQA: A dataset for diverse, explainable multi-hop question answering,
Z. Yang, P. Qi, S. Zhang, Y . Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “HotpotQA: A dataset for diverse, explainable multi-hop question answering,” inConference on Empirical Methods in Natural Language Processing (EMNLP), 2018
work page 2018
-
[72]
Lv-eval: A balanced long-context benchmark with 5 length levels up to 256k,
T. Yuan, X. Ning, D. Zhou, Z. Yang, S. Li, M. Zhuang, Z. Tan, Z. Yao, D. Lin, B. Li, G. Dai, S. Yan, and Y . Wang, “Lv-eval: A balanced long-context benchmark with 5 length levels up to 256k,” 2024
work page 2024
-
[73]
A dataset of information-seeking questions and answers anchored in research papers,
P. Dasigi, K. Lo, I. Beltagy, A. Cohan, N. A. Smith, and M. Gardner, “A dataset of information-seeking questions and answers anchored in research papers,” 2021
work page 2021
-
[74]
New and improved embedding model,
OpenAI, “New and improved embedding model,” https://openai.com/ index/new-and-improved-embedding-model/, 2022, accessed: 2026-05- 04
work page 2022
-
[75]
Ragas: Supercharge your llm application evalua- tions,
ExplodingGradients, “Ragas: Supercharge your llm application evalua- tions,” https://github.com/explodinggradients/ragas, 2024
work page 2024
-
[76]
J. Kim, J. Jung, K. Lim, B. Sung, J. Kim, B. Lim, T.-G. Noh, J. Lee, H.-G. Seok, Y . Cho, G. Kim, T. Nomiyama, S. Kang, Y . Jeong, S. Cho, G. Kim, D.-H. Oh, J. Kim, Y . Lim, S. Kim, S. Oh, and J. Lee, “A 2.8- to-7.2gt/s ddr5 registering clock driver ic with parallel-data timing and pin-to-pin skew calibration for a dual in-line memory module,” in2024 IEEE...
work page 2024
-
[77]
Channel analysis for a 6.4 gb/s ddr5 data buffer receiver front-end,
S. Lehmann and F. Gerfers, “Channel analysis for a 6.4 gb/s ddr5 data buffer receiver front-end,” in2017 15th IEEE International New Circuits and Systems Conference (NEWCAS), 2017, pp. 109–112
work page 2017
-
[78]
3d-ice 4.0: Accurate and efficient thermal modeling for 2.5d/3d heterogeneous chiplet systems,
K. Zhu, D. Huang, L. Costero, and D. Atienza, “3d-ice 4.0: Accurate and efficient thermal modeling for 2.5d/3d heterogeneous chiplet systems,” inProceedings of the 2026 Design, Automation and Test in Europe Conference (DATE). Verona, Italy: IEEE/ACM, March 2026
work page 2026
-
[79]
JEDEC Solid State Technology Association, “JESD79-5D: DDR5 SDRAM,” 2025, jEDEC Standard
work page 2025
-
[80]
R-trees: a dynamic index structure for spatial searching,
A. Guttman, “R-trees: a dynamic index structure for spatial searching,” SIGMOD Rec., vol. 14, no. 2, p. 47–57, Jun. 1984. [Online]. Available: https://doi.org/10.1145/971697.602266
-
[81]
An algorithm for finding best matches in logarithmic expected time,
J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time,”ACM Trans. Math. Softw., vol. 3, no. 3, p. 209–226, Sep. 1977. [Online]. Available: https://doi.org/10.1145/355744.355745
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.